Problem Statement
The SDK has several pieces of human-in-the-loop infrastructure that don't connect into a working system:
Agent-level pause (between turns)
- The
require_input built-in tool exists with types: clarification, decision, information, approval
FinishReasonInputRequired and InputRequiredToolIDs exist on InvocationEndEvent
StatusStageInputRequired exists as a status stage
- But:
executeTools() never detects that require_input was called. The tool executes, returns a JSON response, and the agent loop continues — it never yields FinishReasonInputRequired. The TODO comment at llmagent.go:496 acknowledges this gap.
Tool-level pause (mid-execution, MCP elicitation)
- The MCP spec defines an elicitation feature where MCP servers can request structured user input during tool execution
- The SDK's MCP client (
tool/mcp/client.go) has no elicitation callback support
Tool.Execute() is synchronous with no channel for requesting user input mid-execution
Reconciler gap
- Multiple built-in tools (
require_input, todo_update, plot, artifact_emit) return structured responses referencing a "reconciler" that should process them
- This reconciler doesn't exist as a public API
- Without it, these tools are partial protocols — they produce structured output that nothing consumes
No async resume
- The tool approval example (
agent_interceptors) blocks a goroutine on stdin — works for CLI, not for web apps, Slack bots, or any async context
- No mechanism to serialize pause state, wait asynchronously, and resume when input arrives
Proposed Solution
1. Wire up agent-level pause
Add detection in executeTools() for tools that signal they need user input. When require_input (or any tool marked as input-requiring) executes, the agent should:
- Detect the tool response indicates input is needed
- Yield
StatusEvent{Stage: StatusStageInputRequired}
- Return
FinishReasonInputRequired with the relevant tool IDs in InvocationEndEvent.InputRequiredToolIDs
- The runner saves the session; the next
runner.Run() call with the user's answer resumes naturally
2. Implement the reconciler as a public API
Define a Reconciler interface (or event handler) that processes structured tool responses:
// Reconciler processes tool responses that require special handling
// (pausing for input, creating artifacts, updating task state).
type Reconciler interface {
// ProcessToolResponse inspects a tool response and returns an action.
// Actions: continue (default), pause (need input), emit artifact, etc.
ProcessToolResponse(ctx context.Context, resp *llm.ToolResponse) (Action, error)
}
The runner (or agent) would invoke the reconciler after each tool response to determine if execution should pause, an artifact should be stored, etc.
3. Add MCP elicitation support
Add an elicitation callback to the MCP client configuration:
type ElicitationHandler func(ctx context.Context, req ElicitationRequest) (ElicitationResponse, error)
type ClientConfig struct {
// ...existing fields...
OnElicitation ElicitationHandler
}
When an MCP server requests elicitation during tool execution, the client calls this handler. The handler implementation decides whether to prompt the user synchronously (CLI) or pause execution and resume later (web/async).
4. Support async pause/resume
The key insight: the session already contains the conversation history. When execution pauses:
- The agent yields
InvocationEndEvent{FinishReason: FinishReasonInputRequired, InputRequiredToolIDs: [...]}
- The runner saves the session (already does this)
- The consumer presents the input request to the user via their UI
- When the user responds, the consumer calls
runner.Run() with the user's response as the new message
- The agent sees the conversation history and continues naturally
This mostly works today — the missing piece is step 1 (detecting that pause is needed).
Use Case Example
Agent asks for clarification:
// Agent calls require_input tool: "There are 3 clusters named 'prod'. Which one?"
// SDK detects this and pauses:
for evt, err := range runner.Run(ctx, userID, sessionID, userMsg) {
switch e := evt.(type) {
case agent.InvocationEndEvent:
if e.FinishReason == agent.FinishReasonInputRequired {
// Present e.InputRequiredToolIDs to user, collect response
userResponse := presentChoiceToUser(e)
// Resume with user's answer
for evt2, _ := range runner.Run(ctx, userID, sessionID, userResponse) { ... }
}
}
}
MCP tool needs user input:
mcpClient := mcp.NewClient(serverConfig, mcp.WithElicitation(func(ctx context.Context, req mcp.ElicitationRequest) (mcp.ElicitationResponse, error) {
// Present req.Schema to user, collect input
return collectUserInput(req)
}))
Tool approval (async):
// Instead of blocking on stdin, the approval interceptor returns FinishReasonInputRequired
// The web app presents an approval button
// When clicked, resumes the session with an approval message
Why This Matters
- Real-world agents need human oversight: Production agents that execute dangerous actions (deployments, data mutations, financial transactions) must support approval workflows. The current stdin-blocking approach only works in toy examples.
- MCP ecosystem compatibility: MCP servers increasingly use elicitation for authentication flows, configuration prompts, and data collection. Without this, the SDK can't fully participate in the MCP ecosystem.
- Built-in tools are half-implemented:
require_input, todo, plot, and artifact_emit all depend on a reconciler that doesn't exist. Shipping tools that reference non-public infrastructure creates a confusing developer experience.
- Async-first: Web applications, Slack bots, API servers, and other async contexts are the primary deployment targets for agents. Blocking a goroutine for user input doesn't scale.
Problem Statement
The SDK has several pieces of human-in-the-loop infrastructure that don't connect into a working system:
Agent-level pause (between turns)
require_inputbuilt-in tool exists with types: clarification, decision, information, approvalFinishReasonInputRequiredandInputRequiredToolIDsexist onInvocationEndEventStatusStageInputRequiredexists as a status stageexecuteTools()never detects thatrequire_inputwas called. The tool executes, returns a JSON response, and the agent loop continues — it never yieldsFinishReasonInputRequired. The TODO comment atllmagent.go:496acknowledges this gap.Tool-level pause (mid-execution, MCP elicitation)
tool/mcp/client.go) has no elicitation callback supportTool.Execute()is synchronous with no channel for requesting user input mid-executionReconciler gap
require_input,todo_update,plot,artifact_emit) return structured responses referencing a "reconciler" that should process themNo async resume
agent_interceptors) blocks a goroutine on stdin — works for CLI, not for web apps, Slack bots, or any async contextProposed Solution
1. Wire up agent-level pause
Add detection in
executeTools()for tools that signal they need user input. Whenrequire_input(or any tool marked as input-requiring) executes, the agent should:StatusEvent{Stage: StatusStageInputRequired}FinishReasonInputRequiredwith the relevant tool IDs inInvocationEndEvent.InputRequiredToolIDsrunner.Run()call with the user's answer resumes naturally2. Implement the reconciler as a public API
Define a
Reconcilerinterface (or event handler) that processes structured tool responses:The runner (or agent) would invoke the reconciler after each tool response to determine if execution should pause, an artifact should be stored, etc.
3. Add MCP elicitation support
Add an elicitation callback to the MCP client configuration:
When an MCP server requests elicitation during tool execution, the client calls this handler. The handler implementation decides whether to prompt the user synchronously (CLI) or pause execution and resume later (web/async).
4. Support async pause/resume
The key insight: the session already contains the conversation history. When execution pauses:
InvocationEndEvent{FinishReason: FinishReasonInputRequired, InputRequiredToolIDs: [...]}runner.Run()with the user's response as the new messageThis mostly works today — the missing piece is step 1 (detecting that pause is needed).
Use Case Example
Agent asks for clarification:
MCP tool needs user input:
Tool approval (async):
Why This Matters
require_input,todo,plot, andartifact_emitall depend on a reconciler that doesn't exist. Shipping tools that reference non-public infrastructure creates a confusing developer experience.