From de4efa74f06389b857d8d07b810f4a6bff17ac45 Mon Sep 17 00:00:00 2001
From: Tom Li <zhifanli@amazon.com>
Date: Sun, 19 Apr 2026 21:24:12 -0700
Subject: [PATCH] docs(streaming): add stream_final_turn_only documentation

Add documentation for the new stream_final_turn_only parameter
on Agent.stream_async(). This parameter allows callers to suppress
intermediate turn text events and only receive the final answer,
which is useful for production chat UIs and SSE endpoints.

The new section is added to the async-iterators page with a
before/after code example and use case descriptions.

Resolves: #2055
---
 .../concepts/streaming/async-iterators.mdx    | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
diff --git a/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx b/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
index 5ac093e69..75dfedf49 100644
--- a/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
+++ b/src/content/docs/user-guide/concepts/streaming/async-iterators.mdx
@@ -104,6 +104,46 @@ curl localhost:3000/stream -d '{"prompt": "Hello"}' -H "Content-Type: applicatio
 </Tab>
 </Tabs>
 
+## Streaming Only the Final Turn (Python)
+
+When a tool-using agent runs through `stream_async`, text events are yielded from every model turn, including the intermediate text the model emits before a tool call (for example, "Let me calculate that..."). For production chat UIs and SSE endpoints, this intermediate text is often noise. The `stream_final_turn_only` parameter lets you suppress it at the SDK level.
+
+When `stream_final_turn_only=True`:
+- Text events from intermediate turns (where the model calls tools) are buffered and discarded
+- Text events from the final turn (where `stop_reason` is `"end_turn"`) are yielded to the caller and forwarded to the callback handler
+- Non-text events (lifecycle, tool use, reasoning, citations, model stream chunks) pass through unchanged in all turns
+
+The default is `False`, so existing behavior is unchanged unless you opt in.
+
+```python
+import asyncio
+from strands import Agent
+from strands_tools import calculator
+
+agent = Agent(tools=[calculator], callback_handler=None)
+
+async def main():
+    # Without the flag: text from ALL turns, including pre-tool-call text
+    async for event in agent.stream_async("What is 25 * 48?"):
+        if "data" in event:
+            print(event["data"], end="")
+
+    # With stream_final_turn_only: text only from the final answer
+    async for event in agent.stream_async(
+        "What is 25 * 48?",
+        stream_final_turn_only=True,
+    ):
+        if "data" in event:
+            print(event["data"], end="")
+
+asyncio.run(main())
+```
+
+This is particularly useful for:
+- Chat applications streaming via SSE where users should only see the final answer
+- API endpoints wrapping agents where downstream consumers expect a single coherent streamed response
+- Any deployment where intermediate model text is noise for the end user
+
 ### Agentic Loop
 
 This async stream processor illustrates the event loop lifecycle events and how they relate to each other. It's useful for understanding the flow of execution in the Strands agent: