Additional Functionality: LLMs
We offer a number of additional features for LLMs. Most of the examples below use the OpenAI LLM, but unless noted otherwise these features are available for all LLMs.
Additional Methods
LangChain provides a number of additional methods for interacting with LLMs:
import { OpenAI } from "langchain/llms/openai";
export const run = async () => {
const modelA = new OpenAI();
// `call` is a simple string-in, string-out method for interacting with the model.
const resA = await modelA.call(
"What would be a good company name for a company that makes colorful socks?"
);
console.log({ resA });
// { resA: '\n\nSocktastic Colors' }
// `generate` allows you to generate multiple completions for multiple prompts (in a single request for some models).
const resB = await modelA.generate([
"What would be a good company name for a company that makes colorful socks?",
"What would be a good company name for a company that makes colorful sweaters?",
]);
// `resB` is an `LLMResult` object with a `generations` field and an `llmOutput` field.
// `generations` is a `Generation[][]`, each `Generation` having a `text` field.
// Each input to the LLM could have multiple generations (depending on the `n` parameter), hence the list of lists.
console.log(JSON.stringify(resB, null, 2));
/*
{
"generations": [
[{
"text": "\n\nVibrant Socks Co.",
"generationInfo": {
"finishReason": "stop",
"logprobs": null
}
}],
[{
"text": "\n\nRainbow Knitworks.",
"generationInfo": {
"finishReason": "stop",
"logprobs": null
}
}]
],
"llmOutput": {
"tokenUsage": {
"completionTokens": 17,
"promptTokens": 29,
"totalTokens": 46
}
}
}
*/
// We can specify additional parameters the specific model provider supports, like `temperature`:
const modelB = new OpenAI({ temperature: 0.9 });
const resC = await modelB.call(
"What would be a good company name for a company that makes colorful socks?"
);
console.log({ resC });
// { resC: '\n\nKaleidoSox' }
// We can get the number of tokens for a given input for a specific model.
const numTokens = await modelB.getNumTokens("How many tokens are in this input?");
console.log({ numTokens });
// { numTokens: 8 }
};
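The nested generations array can be awkward to read at first. The sketch below shows one way to pull the first completion for each prompt out of a result with the shape printed above. The LLMResultLike and Generation types and the firstCompletions helper are local, illustrative stand-ins written for this example, not imports from LangChain.

```typescript
// Local types mirroring the result shape shown above (assumed for illustration).
interface Generation {
  text: string;
  generationInfo?: Record<string, unknown>;
}

interface LLMResultLike {
  generations: Generation[][];
  llmOutput?: {
    tokenUsage?: {
      completionTokens: number;
      promptTokens: number;
      totalTokens: number;
    };
  };
}

// Return the first completion for each prompt, with surrounding whitespace trimmed.
// A prompt with no generations yields an empty string.
function firstCompletions(result: LLMResultLike): string[] {
  return result.generations.map((perPrompt) => perPrompt[0]?.text.trim() ?? "");
}

// Sample data copied from the output above.
const sample: LLMResultLike = {
  generations: [
    [{ text: "\n\nVibrant Socks Co.", generationInfo: { finishReason: "stop" } }],
    [{ text: "\n\nRainbow Knitworks.", generationInfo: { finishReason: "stop" } }],
  ],
  llmOutput: {
    tokenUsage: { completionTokens: 17, promptTokens: 29, totalTokens: 46 },
  },
};

const names = firstCompletions(sample);
console.log(names);
// [ 'Vibrant Socks Co.', 'Rainbow Knitworks.' ]
```

The outer array has one entry per prompt and the inner array has one entry per completion, so index [i][0] is the first completion for prompt i.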
Streaming Responses
Some LLMs provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as tokens become available. This is useful if you want to display the response to the user, or otherwise process it, while it is still being generated.
LangChain currently provides streaming for the OpenAI LLM:
import { CallbackManager } from "langchain/callbacks";
import { OpenAI } from "langchain/llms/openai";
export const run = async () => {
// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a `CallbackManager` with a handler set up for the `handleLLMNewToken` event.
const chat = new OpenAI({
maxTokens: 25,
streaming: true,
callbackManager: CallbackManager.fromHandlers({
async handleLLMNewToken(token: string) {
console.log({ token });
},
}),
});
const response = await chat.call("Tell me a joke.");
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }
Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/
};
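Token-by-token output is often more useful once it is grouped into larger units. As a minimal sketch, the hypothetical LineBuffer class below collects streamed tokens and emits each complete line as soon as a newline arrives. It is not part of LangChain; the idea is that its push method could be called from a handleLLMNewToken handler like the one above. The token list is copied from the output shown above.

```typescript
// Collects streamed tokens and emits every non-empty line once it is complete.
class LineBuffer {
  private pending = "";
  private readonly onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  // Add one token; emit any lines that the token completes.
  push(token: string): void {
    this.pending += token;
    let newlineAt = this.pending.indexOf("\n");
    while (newlineAt !== -1) {
      const line = this.pending.slice(0, newlineAt);
      if (line.length > 0) this.onLine(line);
      this.pending = this.pending.slice(newlineAt + 1);
      newlineAt = this.pending.indexOf("\n");
    }
  }

  // Emit whatever is left once the stream has ended.
  flush(): void {
    if (this.pending.length > 0) this.onLine(this.pending);
    this.pending = "";
  }
}

const lines: string[] = [];
const buffer = new LineBuffer((line) => {
  lines.push(line);
});

// Tokens as they appear in the streaming output above.
const tokens = [
  "\n", "\n", "Q", ":", " Why", " did", " the", " chicken", " cross", " the",
  " playground", "?", "\n", "A", ":", " To", " get", " to", " the", " other",
  " slide", ".",
];
for (const token of tokens) buffer.push(token);
buffer.flush();

console.log(lines);
// [ 'Q: Why did the chicken cross the playground?', 'A: To get to the other slide.' ]
```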
Caching
LangChain provides an optional caching layer for LLMs. This is useful for two reasons:
- It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
- It can speed up your application by reducing the number of API calls you make to the LLM provider.
Caching in-memory
The default cache is stored in-memory. This means that if you restart your application, the cache will be cleared.
To enable it, pass cache: true when you instantiate the LLM. For example:
import { OpenAI } from "langchain/llms/openai";
const model = new OpenAI({ cache: true });
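To make the idea concrete, here is a minimal sketch of what an in-memory cache around a string-in, string-out call could look like. It is not LangChain's implementation: withMemoryCache and fakeCompletion are hypothetical names, the provider call is faked, and the cache key is just the prompt, whereas a real cache would also need to account for the model and its parameters.

```typescript
// Hypothetical stand-in for a provider call; it counts invocations so the
// effect of the cache is visible.
let providerCalls = 0;
async function fakeCompletion(prompt: string): Promise<string> {
  providerCalls += 1;
  return `completion for: ${prompt}`;
}

// Wrap an async string-in, string-out function with a cache keyed by prompt.
// The cache lives in this process only, so it is lost on restart.
function withMemoryCache(
  fn: (prompt: string) => Promise<string>
): (prompt: string) => Promise<string> {
  const cache = new Map<string, Promise<string>>();
  return (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit;
    const pending = fn(prompt).catch((err) => {
      cache.delete(prompt); // do not keep failed requests in the cache
      throw err;
    });
    cache.set(prompt, pending);
    return pending;
  };
}

const cachedCompletion = withMemoryCache(fakeCompletion);

async function demoCache(): Promise<number> {
  await cachedCompletion("colorful socks");
  await cachedCompletion("colorful socks"); // served from the cache
  await cachedCompletion("colorful sweaters");
  return providerCalls;
}

const cacheDemo = demoCache();
cacheDemo.then((calls) => console.log({ calls }));
// { calls: 2 }
```

Three requests result in only two provider calls, which is where both the cost saving and the speed-up come from.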
Caching with Redis
LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, install the redis package with npm, Yarn, or pnpm:
npm install redis
yarn add redis
pnpm add redis
Then, you can pass a cache option when you instantiate the LLM. For example:
import { OpenAI } from "langchain/llms/openai";
import { RedisCache } from "langchain/cache";
import { createClient } from "redis";
// See https://github.com/redis/node-redis for connection options
const client = createClient();
const cache = new RedisCache(client);
const model = new OpenAI({ cache });
Adding a timeout
By default, LangChain will wait indefinitely for a response from the model provider. If you want to add a timeout, you can pass a timeout option, in milliseconds, when you instantiate the model. For example, for OpenAI:
import { OpenAI } from "langchain/llms/openai";
export const run = async () => {
const model = new OpenAI(
{ temperature: 1, timeout: 1000 } // 1s timeout
);
const resA = await model.call(
"What would be a good company name for a company that makes colorful socks?"
);
console.log({ resA });
// { resA: '\n\nSocktastic Colors' }
};
Currently, the timeout option is only supported for OpenAI models.
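For calls that do not accept a timeout option, one general-purpose approach is to race the call against a timer. The sketch below illustrates that pattern under stated assumptions: withTimeout and slowCall are hypothetical helpers written for this example, not LangChain APIs, and the text above does not say how the built-in timeout is implemented. Note that this pattern only stops waiting; it does not cancel the underlying request.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([work, deadline]).finally(() => {
    // Clear the timer so it does not keep the process alive after `work` settles.
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical slow call standing in for a provider request.
function slowCall(delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("done"), delayMs));
}

async function demoTimeout(): Promise<string[]> {
  // Finishes well inside its 200 ms budget.
  const fast = await withTimeout(slowCall(10), 200);
  // Takes 300 ms but is only given 50 ms, so the timer wins the race.
  const slow = await withTimeout(slowCall(300), 50).catch(
    (err: Error) => err.message
  );
  return [fast, slow];
}

const timeoutDemo = demoTimeout();
timeoutDemo.then((results) => console.log(results));
// [ 'done', 'Timed out after 50 ms' ]
```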
Dealing with Rate Limits
Some LLM providers have rate limits. If you exceed the rate limit, you'll get an error. To help you deal with this, LangChain provides a maxConcurrency option when instantiating an LLM. This option specifies the maximum number of concurrent requests you want to make to the LLM provider. If you exceed this number, LangChain will automatically queue your requests and send them as previous requests complete.
For example, if you set maxConcurrency: 5, then LangChain will only send 5 requests to the LLM provider at a time. If you send 10 requests, the first 5 will be sent immediately, and the next 5 will be queued. Once one of the first 5 requests completes, the next request in the queue will be sent.
To use this feature, pass maxConcurrency: <number> when you instantiate the LLM. For example:
import { OpenAI } from "langchain/llms/openai";
const model = new OpenAI({ maxConcurrency: 5 });
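The queueing behavior described above can be sketched with a small limiter. This is an illustration of the idea rather than LangChain's implementation: ConcurrencyLimiter and fakeRequest are hypothetical names, and the requests are simulated with timers. The demo sends 10 requests through a limiter of 5 and records the highest number in flight at once.

```typescript
// Run at most `maxConcurrency` tasks at once; extra tasks wait in a FIFO queue.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrency: number;

  constructor(maxConcurrency: number) {
    this.maxConcurrency = maxConcurrency;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrency) {
      // Wait until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass this slot directly to the next queued task
      } else {
        this.active -= 1; // nobody is waiting, so free the slot
      }
    }
  }
}

async function demoLimiter(): Promise<number> {
  const limiter = new ConcurrencyLimiter(5);
  let inFlight = 0;
  let peak = 0;
  // Simulated request that tracks how many requests overlap.
  const fakeRequest = async (): Promise<void> => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 10));
    inFlight -= 1;
  };
  await Promise.all(Array.from({ length: 10 }, () => limiter.run(fakeRequest)));
  return peak;
}

const limiterDemo = demoLimiter();
limiterDemo.then((peak) => console.log({ peak }));
// { peak: 5 }
```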
Dealing with API Errors
If the model provider returns an error from their API, by default LangChain will retry up to 6 times with exponential backoff. This enables error recovery without any additional effort from you. If you want to change this behavior, you can pass a maxRetries option when you instantiate the model. For example:
import { OpenAI } from "langchain/llms/openai";
const model = new OpenAI({ maxRetries: 10 });
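For readers unfamiliar with the pattern, the sketch below shows what retrying with exponential backoff means in general. It is not LangChain's retry code: withRetries and flakyCall are hypothetical, and the base delay, the doubling factor, and retrying on every error are assumptions made for this example. The text above only states the default retry count.

```typescript
// Retry `fn` up to `maxRetries` additional times, doubling the wait after each failure.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  baseDelayMs = 100
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries, surface the error
      const delayMs = baseDelayMs * 2 ** attempt; // baseDelayMs, then 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      attempt += 1;
    }
  }
}

// Hypothetical flaky call: fails twice, then succeeds.
let attempts = 0;
async function flakyCall(): Promise<string> {
  attempts += 1;
  if (attempts < 3) throw new Error(`transient failure #${attempts}`);
  return "ok";
}

async function demoRetries(): Promise<{ result: string; attempts: number }> {
  // Short base delay so the demo finishes quickly.
  const result = await withRetries(flakyCall, 6, 1);
  return { result, attempts };
}

const retryDemo = demoRetries();
retryDemo.then((outcome) => console.log(outcome));
// { result: 'ok', attempts: 3 }
```

Because the wait doubles after each failure, repeated errors back off quickly instead of hammering the provider at a fixed rate.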
Logging for Debugging
Especially when using an agent, there can be a lot of back-and-forth behind the scenes as an LLM processes a chain. For agents, the response object contains an intermediateSteps object that you can print to see an overview of the steps taken. If that's not enough and you want to see every exchange with the LLM, you can use a CallbackManager to add custom logging (or any other handling you need) as the model goes through the steps:
import { LLMResult } from "langchain/schema";
import { CallbackManager } from "langchain/callbacks";
import { OpenAI } from "langchain/llms/openai";
export const run = async () => {
// We can pass in a `CallbackManager` to the LLM constructor to get callbacks for various events.
const callbackManager = CallbackManager.fromHandlers({
handleLLMStart: async (llm: { name: string }, prompts: string[]) => {
console.log(JSON.stringify(llm, null, 2));
console.log(JSON.stringify(prompts, null, 2));
},
handleLLMEnd: async (output: LLMResult) => {
console.log(JSON.stringify(output, null, 2));
},
handleLLMError: async (err: Error) => {
console.error(err);
},
});
const model = new OpenAI({
verbose: true,
callbackManager,
});
await model.call(
"What would be a good company name for a company that makes colorful socks?"
);
// {
// "name": "openai"
// }
// [
// "What would be a good company name for a company that makes colorful socks?"
// ]
// {
// "generations": [
// [
// {
// "text": "\n\nSocktastic Splashes.",
// "generationInfo": {
// "finishReason": "stop",
// "logprobs": null
// }
// }
// ]
// ],
// "llmOutput": {
// "tokenUsage": {
// "completionTokens": 9,
// "promptTokens": 14,
// "totalTokens": 23
// }
// }
// }
};