Streaming
Streaming is critical in making applications based on LLMs feel responsive to end-users.
Please see the streaming how-to guides for specific examples of streaming in LangChain.
Why Streaming?
LLMs can take several seconds to produce a complete response, much longer than the sub-second response times typical of most APIs. This latency compounds quickly as you build more complex applications that involve multiple calls to a model.
Fortunately, LLMs generate output iteratively, which means it's possible to show sensible intermediate results before the final response is ready. Consuming output as soon as it becomes available has therefore become a vital part of the UX around building apps with LLMs to help alleviate latency issues, and LangChain aims to have first-class support for streaming.
Streaming APIs
Every LangChain component that implements the Runnable Interface supports streaming.
There are three main APIs for streaming in LangChain:
- The sync stream and async astream methods: yield the output of a Runnable as it is generated.
- The async astream_events API: streams intermediate steps and final results from a Runnable as a stream of events (sketched below).
- The legacy async astream_log API: an advanced streaming API for streaming intermediate steps from a Runnable. Avoid this API when writing new code.
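As an illustration of the events API, here is a minimal sketch that filters the event stream from a chat model down to its token chunks. It reuses the model from the example later on this page; the version argument selects the event schema:
import asyncio

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

async def main():
    # Each event is a dict with keys such as "event", "name", and "data"
    async for event in model.astream_events("what color is the sky?", version="v2"):
        if event["event"] == "on_chat_model_stream":
            print(event["data"]["chunk"].content, end="|", flush=True)

asyncio.run(main())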
Streaming with LangGraph
LangGraph compiled graphs are Runnables and support the same streaming APIs.
In LangGraph, the stream and astream methods are phrased in terms of changes to the graph state, and as a result are much more helpful for getting intermediate states of the graph as they are generated.
Please review the LangGraph streaming guide for more information on how to stream when working with LangGraph.
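For a rough sense of what that looks like, here is a minimal sketch (assuming the langgraph package; the state, node, and values are illustrative):
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    count: int

def increment(state: State) -> dict:
    # Each node returns a partial update to the graph state
    return {"count": state["count"] + 1}

builder = StateGraph(State)
builder.add_node("increment", increment)
builder.add_edge(START, "increment")
builder.add_edge("increment", END)
graph = builder.compile()

# stream_mode="values" yields the full graph state after each step
for state in graph.stream({"count": 0}, stream_mode="values"):
    print(state)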
.stream() and .astream()
The .stream() method returns an iterator, which you can consume with a simple for loop (.astream() returns an async iterator, consumed with async for). Here's an example with a chat model:
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")
for chunk in model.stream("what color is the sky?"):
    print(chunk.content, end="|", flush=True)
For models (or other components) that don't support streaming natively, this iterator would just yield a single chunk, but you could still use the same general pattern when calling them. Using .stream() will also automatically call the model in streaming mode without the need to provide additional config.
The type of each yielded chunk depends on the type of component - for example, chat models yield AIMessageChunks.
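Chunks of the same type can be merged with the + operator, which is handy when you want to accumulate the partial output into a single message as it streams. A brief sketch, reusing the model from the example above:
full = None
for chunk in model.stream("what color is the sky?"):
    # Merging chunks produces a progressively longer accumulated message
    full = chunk if full is None else full + chunk

print(full.content)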
Because this method is part of LangChain Expression Language, you can handle formatting differences from different outputs using an output parser to transform each yielded chunk.
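For example, piping the chat model from above into StrOutputParser yields plain strings instead of message chunks; a brief sketch:
from langchain_core.output_parsers import StrOutputParser

# The parser transforms each AIMessageChunk into its string content
chain = model | StrOutputParser()

for text in chain.stream("what color is the sky?"):
    print(text, end="|", flush=True)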
Dispatching Custom Events
You can dispatch custom callback events if you want to add custom data to the event stream of astream_events.
Custom events can provide additional information about the progress of a long-running task.
For example, if you have a long-running tool that involves multiple steps (e.g., multiple API calls), you can dispatch custom events between the steps and use them to monitor progress. You could also surface these custom events to an end user of your application to show them how the current task is progressing.
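Here is a minimal sketch of that pattern, assuming Python 3.11+ so that the runnable config propagates to the event dispatcher automatically; the task and event names are hypothetical:
import asyncio

from langchain_core.callbacks.manager import adispatch_custom_event
from langchain_core.runnables import RunnableLambda

async def slow_task(query: str) -> str:
    # Hypothetical multi-step task; each step reports progress via a custom event
    await adispatch_custom_event("task_progress", {"step": 1, "status": "first call done"})
    await adispatch_custom_event("task_progress", {"step": 2, "status": "second call done"})
    return f"result for {query}"

task = RunnableLambda(slow_task)

async def main():
    # Custom events surface in astream_events as "on_custom_event" events
    async for event in task.astream_events("some query", version="v2"):
        if event["event"] == "on_custom_event":
            print(event["name"], event["data"])

asyncio.run(main())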
Async throughout
Important LangChain primitives like chat models, output parsers, prompts, retrievers, and agents implement the LangChain Runnable Interface, which comes with both sync and async variants of its methods (e.g., invoke/ainvoke and stream/astream), so streaming works the same way in asynchronous code.
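For instance, the earlier .stream() example translates directly to async code; a brief sketch using the same model:
import asyncio

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-3-sonnet-20240229")

async def main():
    # async for is the asynchronous counterpart of the for loop used with .stream()
    async for chunk in model.astream("what color is the sky?"):
        print(chunk.content, end="|", flush=True)

asyncio.run(main())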