Additional Functionality: LLMs

We offer a number of additional features for LLMs. In most of the examples below, we'll be using the OpenAI LLM. Most of these features are available for all LLMs; where a feature is limited to specific models, the relevant section says so.

Additional Methods

LangChain provides a number of additional methods for interacting with LLMs:

import { OpenAI } from "langchain/llms/openai";

export const run = async () => {
  const modelA = new OpenAI();
  // `call` is a simple string-in, string-out method for interacting with the model.
  const resA = await modelA.call(
    "What would be a good company name for a company that makes colorful socks?"
  );
  console.log({ resA });
  // { resA: '\n\nSocktastic Colors' }

  // `generate` allows you to generate multiple completions for multiple prompts (in a single request for some models).
  const resB = await modelA.generate([
    "What would be a good company name for a company that makes colorful socks?",
    "What would be a good company name for a company that makes colorful sweaters?",
  ]);

  // `resB` is an `LLMResult` object with a `generations` field and an `llmOutput` field.
  // `generations` is a `Generation[][]`, each `Generation` having a `text` field.
  // Each input to the LLM could have multiple generations (depending on the `n` parameter), hence the list of lists.
  console.log(JSON.stringify(resB, null, 2));
  /*
  {
      "generations": [
          [{
              "text": "\n\nVibrant Socks Co.",
              "generationInfo": {
                  "finishReason": "stop",
                  "logprobs": null
              }
          }],
          [{
              "text": "\n\nRainbow Knitworks.",
              "generationInfo": {
                  "finishReason": "stop",
                  "logprobs": null
              }
          }]
      ],
      "llmOutput": {
          "tokenUsage": {
              "completionTokens": 17,
              "promptTokens": 29,
              "totalTokens": 46
          }
      }
  }
  */

  // We can specify additional parameters the specific model provider supports, like `temperature`:
  const modelB = new OpenAI({ temperature: 0.9 });
  const resC = await modelB.call(
    "What would be a good company name for a company that makes colorful socks?"
  );
  console.log({ resC });
  // { resC: '\n\nKaleidoSox' }

  // We can get the number of tokens for a given input for a specific model.
  // `getNumTokens` is asynchronous, so the result has to be awaited.
  const numTokens = await modelB.getNumTokens("How many tokens are in this input?");
  console.log({ numTokens });
  // { numTokens: 8 }
};
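
Because `generations` is a list of lists, pulling out just the completion text takes a nested map. The sketch below shows one way to do that. The `Generation` and `LLMResultLike` types and the `extractTexts` helper are declared locally for illustration (they mirror the shape of the example output above and are not imported from LangChain), so the snippet runs without an API key.

```typescript
// Local types mirroring the shape of the example output above.
// They are assumptions for illustration, not LangChain's own type definitions.
interface Generation {
  text: string;
  generationInfo?: Record<string, unknown>;
}

interface LLMResultLike {
  generations: Generation[][];
  llmOutput?: Record<string, unknown>;
}

// Hypothetical helper: for each prompt, return the trimmed text of every completion.
function extractTexts(result: LLMResultLike): string[][] {
  return result.generations.map((perPrompt) =>
    perPrompt.map((generation) => generation.text.trim())
  );
}

// Sample data copied from the `generate` output shown above.
const sample: LLMResultLike = {
  generations: [
    [{ text: "\n\nVibrant Socks Co." }],
    [{ text: "\n\nRainbow Knitworks." }],
  ],
};

console.log(extractTexts(sample));
// [ [ 'Vibrant Socks Co.' ], [ 'Rainbow Knitworks.' ] ]
```

The outer array has one entry per prompt and the inner array one entry per completion, so `extractTexts(resB)[0][0]` would be the first completion for the first prompt.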

Streaming Responses

Some LLMs provide a streaming response. Instead of waiting for the entire response to be returned, you can start processing it as soon as the first tokens are available. This is useful if you want to display the response to the user, or process it in some other way, while it is still being generated. LangChain currently provides streaming for the OpenAI LLM:

import { CallbackManager } from "langchain/callbacks";
import { OpenAI } from "langchain/llms/openai";

export const run = async () => {
  // To enable streaming, we pass in `streaming: true` to the LLM constructor.
  // Additionally, we pass in a `CallbackManager` with a handler set up for the `handleLLMNewToken` event.
  const model = new OpenAI({
    maxTokens: 25,
    streaming: true,
    callbackManager: CallbackManager.fromHandlers({
      async handleLLMNewToken(token: string) {
        console.log({ token });
      },
    }),
  });

  const response = await model.call("Tell me a joke.");
  console.log(response);
  /*
  { token: '\n' }
  { token: '\n' }
  { token: 'Q' }
  { token: ':' }
  { token: ' Why' }
  { token: ' did' }
  { token: ' the' }
  { token: ' chicken' }
  { token: ' cross' }
  { token: ' the' }
  { token: ' playground' }
  { token: '?' }
  { token: '\n' }
  { token: 'A' }
  { token: ':' }
  { token: ' To' }
  { token: ' get' }
  { token: ' to' }
  { token: ' the' }
  { token: ' other' }
  { token: ' slide' }
  { token: '.' }


  Q: Why did the chicken cross the playground?
  A: To get to the other slide.
  */
};
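
In practice you usually want to accumulate the tokens rather than log each one. The sketch below shows one possible way to do that. `createTokenCollector` is a hypothetical helper, not part of LangChain; it only assumes the `handleLLMNewToken(token: string)` handler signature used in the example above. The stream is simulated locally so the snippet runs without calling a model.

```typescript
// Hypothetical helper (not a LangChain API): builds a handler object with the same
// `handleLLMNewToken` signature as above and accumulates the tokens it receives.
function createTokenCollector(onUpdate: (partial: string) => void) {
  let text = "";
  return {
    handlers: {
      async handleLLMNewToken(token: string): Promise<void> {
        text += token;
        onUpdate(text); // e.g. re-render a UI element with the partial response
      },
    },
    getText: (): string => text,
  };
}

// Simulate a stream locally instead of calling a model.
async function streamingDemo(): Promise<string> {
  const partials: string[] = [];
  const collector = createTokenCollector((partial) => partials.push(partial));
  for (const token of ["Q", ":", " Why", " did", " the", " chicken"]) {
    await collector.handlers.handleLLMNewToken(token);
  }
  return collector.getText();
}

streamingDemo().then((text) => console.log(text));
// Q: Why did the chicken
```

Under these assumptions, `collector.handlers` could be passed to `CallbackManager.fromHandlers(...)` in place of the inline handler shown above.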

Caching

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, which helps if you often request the same completion multiple times.
It can speed up your application, because a cached completion is returned without a round trip to the LLM provider.

Caching in-memory

The default cache is stored in-memory. This means that if you restart your application, the cache will be cleared.

To enable it you can pass cache: true when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ cache: true });
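
To make the idea concrete, here is a minimal sketch of what a prompt cache does conceptually. It is not LangChain's implementation: `PromptCache`, `withCache`, and the `settings` string are hypothetical names chosen for illustration, and the provider call is replaced by a local stand-in that counts invocations.

```typescript
// Hypothetical in-memory cache keyed by prompt plus model settings.
// This illustrates the idea only; it is not LangChain's implementation.
class PromptCache {
  private store = new Map<string, string>();

  private key(prompt: string, settings: string): string {
    return JSON.stringify([prompt, settings]);
  }

  lookup(prompt: string, settings: string): string | undefined {
    return this.store.get(this.key(prompt, settings));
  }

  update(prompt: string, settings: string, value: string): void {
    this.store.set(this.key(prompt, settings), value);
  }
}

// Wraps a string-in, string-out function so repeated prompts skip the underlying call.
function withCache(
  complete: (prompt: string) => Promise<string>,
  settings: string,
  cache: PromptCache = new PromptCache()
): (prompt: string) => Promise<string> {
  return async (prompt: string): Promise<string> => {
    const hit = cache.lookup(prompt, settings);
    if (hit !== undefined) return hit;
    const result = await complete(prompt);
    cache.update(prompt, settings, result);
    return result;
  };
}

// Stand-in for a paid API call; counts how often it is actually invoked.
let apiCalls = 0;
const fakeComplete = async (prompt: string): Promise<string> => {
  apiCalls += 1;
  return `echo: ${prompt}`;
};

async function cacheDemo(): Promise<number> {
  const cached = withCache(fakeComplete, "temperature=0");
  await cached("Hello");
  await cached("Hello"); // served from the cache, no second call
  await cached("Goodbye");
  return apiCalls;
}

cacheDemo().then((calls) => console.log({ calls }));
// { calls: 2 }
```

Including the model settings in the key matters in this sketch: the same prompt sent with different parameters should not share a cached completion.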

Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, you'll need to install the redis package:

npm install redis

yarn add redis

pnpm add redis

Then, you can pass a cache option when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";
import { RedisCache } from "langchain/cache";
import { createClient } from "redis";

// See https://github.com/redis/node-redis for connection options
const client = createClient();
const cache = new RedisCache(client);

const model = new OpenAI({ cache });

Adding a timeout

By default, LangChain will wait indefinitely for a response from the model provider. If you want to add a timeout, you can pass a timeout option, in milliseconds, when you instantiate the model. For example, for OpenAI:

import { OpenAI } from "langchain/llms/openai";

export const run = async () => {
  const model = new OpenAI(
    { temperature: 1, timeout: 1000 } // 1s timeout
  );

  const resA = await model.call(
    "What would be a good company name for a company that makes colorful socks?"
  );

  console.log({ resA });
  // { resA: '\n\nSocktastic Colors' }
};

Currently, the timeout option is only supported for OpenAI models.
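
For models where the timeout option is not supported, a similar effect can be approximated in application code. The sketch below is a generic pattern under stated assumptions, not a description of how LangChain enforces `timeout` internally. `withTimeout` and `slowCall` are hypothetical names, and the provider call is simulated with a timer.

```typescript
// Rejects if `work` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Stand-in for a slow provider call.
const slowCall = (delayMs: number): Promise<string> =>
  new Promise<string>((resolve) => setTimeout(() => resolve("done"), delayMs));

async function timeoutDemo(): Promise<string[]> {
  const outcomes: string[] = [];
  outcomes.push(await withTimeout(slowCall(10), 200)); // finishes in time
  try {
    await withTimeout(slowCall(200), 10); // exceeds the limit
  } catch (err) {
    outcomes.push((err as Error).message);
  }
  return outcomes;
}

timeoutDemo().then((outcomes) => console.log(outcomes));
// [ 'done', 'Timed out after 10 ms' ]
```

With a real model, the wrapped promise would be something like `model.call(prompt)`; the request itself keeps running in the background after the wrapper gives up.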

Dealing with Rate Limits

Some LLM providers have rate limits. If you exceed the rate limit, you'll get an error. To help you deal with this, LangChain provides a maxConcurrency option when instantiating an LLM. This option allows you to specify the maximum number of concurrent requests you want to make to the LLM provider. If you exceed this number, LangChain will automatically queue up your requests to be sent as previous requests complete.

For example, if you set maxConcurrency: 5, then LangChain will only send 5 requests to the LLM provider at a time. If you send 10 requests, the first 5 will be sent immediately, and the next 5 will be queued up. Once one of the first 5 requests completes, the next request in the queue will be sent.

To use this feature, simply pass maxConcurrency: <number> when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ maxConcurrency: 5 });
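
The queueing behavior described above can be sketched in a few lines. This is an illustration of the general technique, not LangChain's implementation; `ConcurrencyLimiter` and `fakeRequest` are hypothetical names, and requests are simulated with timers.

```typescript
// Hypothetical limiter: at most `maxConcurrency` tasks run at once,
// and the rest wait in a first-in, first-out queue.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrency: number;

  constructor(maxConcurrency: number) {
    this.maxConcurrency = maxConcurrency;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrency) {
      // Park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass the slot directly to the next queued task
      } else {
        this.active -= 1;
      }
    }
  }
}

async function limiterDemo(): Promise<number> {
  const limiter = new ConcurrencyLimiter(5);
  let running = 0;
  let peak = 0;
  // Stand-in for a request to the provider; tracks how many run simultaneously.
  const fakeRequest = async (): Promise<void> => {
    running += 1;
    peak = Math.max(peak, running);
    await new Promise<void>((resolve) => setTimeout(resolve, 20));
    running -= 1;
  };
  await Promise.all(Array.from({ length: 10 }, () => limiter.run(fakeRequest)));
  return peak;
}

limiterDemo().then((peak) => console.log({ peak }));
// { peak: 5 }
```

Ten requests are submitted at once, but the observed peak is 5: the first five start immediately and each of the remaining five starts only when an earlier one completes.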

Dealing with API Errors

If the model provider returns an error from their API, by default LangChain will retry up to 6 times with exponential backoff. This enables error recovery without any additional effort from you. If you want to change this behavior, you can pass a maxRetries option when you instantiate the model. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ maxRetries: 10 });
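
For readers unfamiliar with the pattern, the sketch below shows what retrying with exponential backoff looks like in general. It is not LangChain's implementation: `retryWithBackoff`, the base delay, and the doubling factor are assumptions chosen for illustration, and the failing provider call is simulated locally.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying up to `maxRetries` times after a failure and doubling the wait
// before each retry. The base delay and factor of 2 are assumptions for illustration.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxRetries: number,
  baseDelayMs = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) {
        await sleep(baseDelayMs * 2 ** attempt); // baseDelayMs, then 2x, 4x, ...
      }
    }
  }
  throw lastError;
}

async function retryDemo(): Promise<{ result: string; attempts: number }> {
  let attempts = 0;
  // Stand-in for a provider call that fails twice before succeeding.
  const flaky = async (): Promise<string> => {
    attempts += 1;
    if (attempts < 3) throw new Error("temporary failure");
    return "ok";
  };
  const result = await retryWithBackoff(flaky, 6, 1);
  return { result, attempts };
}

retryDemo().then((outcome) => console.log(outcome));
// { result: 'ok', attempts: 3 }
```

Growing the delay between attempts gives a struggling or rate-limited API time to recover instead of hitting it again immediately.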

Logging for Debugging

Especially when using an agent, there can be a lot of back-and-forth going on behind the scenes as an LLM processes a chain. For agents, the response object contains an intermediateSteps object that you can print to see an overview of the steps it took to get there. If that's not enough and you want to see every exchange with the LLM, you can use the CallbackManager to write custom logging (or anything else you want to do) as the model goes through the steps:

import { LLMResult } from "langchain/schema";
import { CallbackManager } from "langchain/callbacks";
import { OpenAI } from "langchain/llms/openai";

export const run = async () => {
  // We can pass in a `CallbackManager` to the LLM constructor to get callbacks for various events.
  const callbackManager = CallbackManager.fromHandlers({
    handleLLMStart: async (llm: { name: string }, prompts: string[]) => {
      console.log(JSON.stringify(llm, null, 2));
      console.log(JSON.stringify(prompts, null, 2));
    },
    handleLLMEnd: async (output: LLMResult) => {
      console.log(JSON.stringify(output, null, 2));
    },
    handleLLMError: async (err: Error) => {
      console.error(err);
    },
  });

  const model = new OpenAI({
    verbose: true,
    callbackManager,
  });

  await model.call(
    "What would be a good company name for a company that makes colorful socks?"
  );
  // {
  //     "name": "openai"
  // }
  // [
  //     "What would be a good company name for a company that makes colorful socks?"
  // ]
  // {
  //   "generations": [
  //     [
  //         {
  //             "text": "\n\nSocktastic Splashes.",
  //             "generationInfo": {
  //                 "finishReason": "stop",
  //                 "logprobs": null
  //             }
  //         }
  //     ]
  //  ],
  //   "llmOutput": {
  //     "tokenUsage": {
  //         "completionTokens": 9,
  //          "promptTokens": 14,
  //          "totalTokens": 23
  //     }
  //   }
  // }
};
