From de4efa74f06389b857d8d07b810f4a6bff17ac45 Mon Sep 17 00:00:00 2001
From: Tom Li
Date: Sun, 19 Apr 2026 21:24:12 -0700
Subject: [PATCH] docs(streaming): add stream_final_turn_only documentation

Add documentation for the new stream_final_turn_only parameter on
Agent.stream_async(). This parameter allows callers to suppress
intermediate turn text events and only receive the final answer, which
is useful for production chat UIs and SSE endpoints.

The new section is added to the async-iterators page with a before/after
code example and use case descriptions.

Resolves: #2055
---
 .../concepts/streaming/async-iterators.mdx | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx b/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
index 5ac093e69..75dfedf49 100644
--- a/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
+++ b/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
@@ -104,6 +104,46 @@ curl localhost:3000/stream -d '{"prompt": "Hello"}' -H "Content-Type: applicatio
+## Streaming Only the Final Turn (Python)
+
+When using `stream_async` with tool-using agents, text events are yielded from every model turn, including the intermediate text a model emits before calling tools. For production chat UIs and SSE endpoints, this intermediate text is often noise. The `stream_final_turn_only` parameter lets you suppress it at the SDK level.
+
+When `stream_final_turn_only=True`:
+- Text events from intermediate turns (where the model calls tools) are buffered and discarded
+- Text events from the final turn (where `stop_reason` is `"end_turn"`) are yielded to the caller and forwarded to the callback handler
+- Non-text events (lifecycle, tool use, reasoning, citations, model stream chunks) pass through unchanged in all turns
+
+The default is `False`, which is fully backward compatible: behavior does not change unless you opt in.
+
+```python
+import asyncio
+from strands import Agent
+from strands_tools import calculator
+
+agent = Agent(tools=[calculator], callback_handler=None)
+
+async def main():
+    # Without stream_final_turn_only: receives text from ALL turns,
+    # including intermediate "Let me calculate that..." reasoning
+    async for event in agent.stream_async("What is 25 * 48?"):
+        if "data" in event:
+            print(event["data"], end="")
+
+    # With stream_final_turn_only: receives text only from the final answer
+    async for event in agent.stream_async(
+        "What is 25 * 48?", stream_final_turn_only=True
+    ):
+        if "data" in event:
+            print(event["data"], end="")  # Only the final answer
+
+asyncio.run(main())
+```
+
+This is particularly useful for:
+- Chat applications streaming via SSE where users should only see the final answer
+- API endpoints wrapping agents where downstream consumers expect a single coherent streamed response
+- Any deployment where intermediate model output is noise for the end user
+
 ### Agentic Loop
 
 This async stream processor illustrates the event loop lifecycle events and how they relate to each other. It's useful for understanding the flow of execution in the Strands agent: