Long-running (async) tool support: pause, poll, and resume across turns #104

@weeco

Problem Statement

All tool execution in the SDK is synchronous — Tool.Execute() blocks until the tool completes and returns a result. There is no concept of a tool that takes minutes, hours, or days to complete. The current 30-second default timeout (configurable) is the only concession to duration.

This is a significant limitation for real-world agent systems where tools commonly invoke operations that cannot return immediately:

  • CI/CD pipelines: "Deploy this to staging" → takes 10 minutes, returns a deployment URL
  • Data processing: "Run this query against the data warehouse" → takes 5 minutes, returns results
  • External approvals: "Create a PR and wait for review" → takes hours/days
  • Batch operations: "Process these 10,000 records" → takes 30 minutes, reports progress
  • Third-party APIs: "Generate a video from this prompt" → takes 2 minutes, returns a URL

Today, agents dealing with these operations must either:

  1. Block: Set a very long timeout and hold a goroutine hostage (wasteful, fragile, no progress visibility)
  2. Wrap manually: Build custom fire-and-poll tool pairs (start_deployment + check_deployment_status) with no SDK support for correlating them
  3. Give up: Return immediately with "I've started the process, check back later" and lose the ability to continue the conversation with the result

None of these are satisfactory. The agent loop should natively support tools that return a "pending" result, yield control to the caller, and resume when the result arrives.

Proposed Solution

1. Long-running tool declaration

Add an IsLongRunning() method to the tool interface (or a marker interface) that signals the tool won't complete immediately:

// LongRunningTool is a tool that may not return a final result immediately.
// When called, it returns an initial status (e.g., a task ID or "pending").
// The final result arrives on a subsequent invocation via the same call ID.
type LongRunningTool interface {
    Tool
    IsLongRunning() bool
}

For tools created from functions, a configuration flag:

registry.Register(myDeployTool, tool.WithLongRunning(true))
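One way the marker interface and the registration flag could fit together is sketched below. The `Registry`, `registration`, and `Option` types are hypothetical stand-ins, not the SDK's actual API, and letting an explicit option override the marker interface is an assumption of this sketch rather than something the proposal specifies.

```go
package main

import "fmt"

// Tool is a hypothetical, minimal stand-in for the SDK's tool interface.
type Tool interface {
	Name() string
}

// LongRunningTool mirrors the marker interface proposed above.
type LongRunningTool interface {
	Tool
	IsLongRunning() bool
}

// registration holds per-tool settings (hypothetical).
type registration struct {
	tool        Tool
	longRunning bool
}

// Option adjusts a registration; WithLongRunning mirrors the proposed flag.
type Option func(*registration)

func WithLongRunning(v bool) Option {
	return func(r *registration) { r.longRunning = v }
}

// Registry is a hypothetical tool registry keyed by tool name.
type Registry struct {
	tools map[string]registration
}

func NewRegistry() *Registry {
	return &Registry{tools: make(map[string]registration)}
}

// Register takes the default from the marker interface (if implemented),
// then applies options, so an explicit option wins. That precedence is an
// assumption of this sketch.
func (r *Registry) Register(t Tool, opts ...Option) {
	reg := registration{tool: t}
	if lr, ok := t.(LongRunningTool); ok {
		reg.longRunning = lr.IsLongRunning()
	}
	for _, opt := range opts {
		opt(&reg)
	}
	r.tools[t.Name()] = reg
}

// IsLongRunning reports false for unknown tool names.
func (r *Registry) IsLongRunning(name string) bool {
	return r.tools[name].longRunning
}

// funcTool stands in for a tool created from a plain function.
type funcTool struct{ name string }

func (f funcTool) Name() string { return f.name }

// deployTool opts in through the marker interface.
type deployTool struct{}

func (deployTool) Name() string        { return "deploy_to_staging" }
func (deployTool) IsLongRunning() bool { return true }

func main() {
	reg := NewRegistry()
	reg.Register(deployTool{})
	reg.Register(funcTool{name: "run_query"}, WithLongRunning(true))
	reg.Register(funcTool{name: "echo"})
	for _, name := range []string{"deploy_to_staging", "run_query", "echo"} {
		fmt.Printf("%s long-running: %v\n", name, reg.IsLongRunning(name))
	}
}
```

With this shape, the agent loop only needs to ask the registry whether a called tool is long-running and never has to inspect the concrete tool type.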

When a tool is marked as long-running, the SDK automatically appends guidance to its description for the LLM:

"NOTE: This is a long-running operation. After calling this tool, wait for the result to be provided — do not call the tool again for the same operation."

2. Agent loop: yield on long-running tool calls

When executeTools() detects that a long-running tool has been called:

  1. Execute the tool normally — it returns an initial result (e.g., {"task_id": "deploy-123", "status": "pending"})
  2. Add the tool response to the session as usual
  3. Mark the event with the long-running tool call IDs
  4. Return FinishReasonInputRequired with the long-running call IDs in InvocationEndEvent.InputRequiredToolIDs
  5. The runner saves the session and yields to the caller

The agent loop does not wait for the long-running tool to complete. It returns control immediately so the caller can manage the wait externally (poll, webhook, user notification, etc.).
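The steps above can be sketched roughly as follows. `ToolCall`, `ToolResult`, `registeredTool`, `turnOutcome`, and `FinishReasonNone` are hypothetical simplifications introduced for illustration; only `FinishReasonInputRequired` and `InputRequiredToolIDs` are names taken from the proposal, and the string value assigned to the constant is a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall and ToolResult are hypothetical simplifications of the SDK types.
type ToolCall struct {
	ID   string
	Name string
	Args json.RawMessage
}

type ToolResult struct {
	CallID string
	Result json.RawMessage
}

type FinishReason string

const (
	FinishReasonNone          FinishReason = "" // hypothetical: no yield, the loop continues
	FinishReasonInputRequired FinishReason = "input_required"
)

// registeredTool pairs an execute function with the long-running flag.
type registeredTool struct {
	longRunning bool
	execute     func(args json.RawMessage) (json.RawMessage, error)
}

// turnOutcome is what this sketch hands back to the runner.
type turnOutcome struct {
	Results              []ToolResult
	InputRequiredToolIDs []string
	Reason               FinishReason
}

// executeTools runs every call exactly once, records every response, and
// collects the call IDs of long-running tools. It never waits for them.
func executeTools(tools map[string]registeredTool, calls []ToolCall) (turnOutcome, error) {
	var out turnOutcome
	for _, c := range calls {
		t, ok := tools[c.Name]
		if !ok {
			return turnOutcome{}, fmt.Errorf("unknown tool %q", c.Name)
		}
		res, err := t.execute(c.Args)
		if err != nil {
			return turnOutcome{}, fmt.Errorf("tool %q: %w", c.Name, err)
		}
		out.Results = append(out.Results, ToolResult{CallID: c.ID, Result: res})
		if t.longRunning {
			out.InputRequiredToolIDs = append(out.InputRequiredToolIDs, c.ID)
		}
	}
	if len(out.InputRequiredToolIDs) > 0 {
		out.Reason = FinishReasonInputRequired
	}
	return out, nil
}

// demoTools returns one long-running and one synchronous tool.
func demoTools() map[string]registeredTool {
	return map[string]registeredTool{
		"deploy_to_staging": {longRunning: true, execute: func(json.RawMessage) (json.RawMessage, error) {
			return json.RawMessage(`{"task_id":"deploy-123","status":"pending"}`), nil
		}},
		"get_version": {execute: func(json.RawMessage) (json.RawMessage, error) {
			return json.RawMessage(`{"version":"v2.5.0"}`), nil
		}},
	}
}

func main() {
	out, err := executeTools(demoTools(), []ToolCall{
		{ID: "call_1", Name: "get_version"},
		{ID: "call_2", Name: "deploy_to_staging"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Reason, out.InputRequiredToolIDs) // prints: input_required [call_2]
}
```

Saving the session and yielding to the caller (step 5) is left to the runner and is not shown here.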

3. Resume with results

When the long-running operation completes, the caller resumes the agent with a tool response message that reuses the call ID from the original tool request:

// Original tool call had ID "call_abc123"
// External system has the result now
toolResult := llm.NewMessage(llm.RoleUser,
    llm.NewToolResponsePart(&llm.ToolResponse{
        ID:     "call_abc123",  // Same ID as original request
        Name:   "deploy_to_staging",
        Result: json.RawMessage(`{"status": "completed", "url": "https://staging.example.com"}`),
    }),
)

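// Resume the agent; it sees the tool result in the session and continues
for evt, err := range runner.Run(ctx, userID, sessionID, toolResult) {
    if err != nil {
        log.Printf("resume failed: %v", err)
        break
    }
    // Agent processes the deployment result and continues the conversation
    _ = evt // handle the event here (agent text, further tool calls, etc.)
}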

4. Session history consolidation

When the agent resumes, the session history may look like:

[..., assistant(tool_call: deploy), user(tool_response: pending), user(text: "checking..."), user(tool_response: completed)]

The agent should consolidate this so the LLM sees a clean sequence:

[..., assistant(tool_call: deploy), user(tool_response: completed)]

A consolidateLongRunningResponses() function would rewrite the event list so that the latest tool response for each call ID replaces the intermediate ones, keeping the LLM's context clean and avoiding tokens spent on status-polling messages.
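A minimal sketch of one possible consolidation pass follows. The `Event` type and the `Kind*` constants are hypothetical simplifications of the session's event model. This version only touches tool responses: for each call ID it puts the latest response where the first one was and drops the rest. Unlike the example sequence above, it leaves text events in place, which is a deliberately conservative assumption to avoid discarding genuine user messages.

```go
package main

import "fmt"

// Event kinds used by this sketch (hypothetical).
const (
	KindToolCall     = "tool_call"
	KindToolResponse = "tool_response"
	KindText         = "text"
)

// Event is a hypothetical, flattened session event.
type Event struct {
	Role    string
	Kind    string
	CallID  string
	Content string
}

// consolidateLongRunningResponses returns a new slice in which each call ID
// has at most one tool response: the latest one, placed at the position of
// the first response for that call. The input slice is not modified.
func consolidateLongRunningResponses(events []Event) []Event {
	// Find the index of the last tool response for each call ID.
	last := make(map[string]int)
	for i, e := range events {
		if e.Kind == KindToolResponse {
			last[e.CallID] = i
		}
	}

	out := make([]Event, 0, len(events))
	emitted := make(map[string]bool)
	for _, e := range events {
		if e.Kind != KindToolResponse {
			out = append(out, e)
			continue
		}
		if emitted[e.CallID] {
			continue // a response for this call is already in the output
		}
		emitted[e.CallID] = true
		out = append(out, events[last[e.CallID]])
	}
	return out
}

func main() {
	history := []Event{
		{Role: "assistant", Kind: KindToolCall, CallID: "call_abc123", Content: "deploy"},
		{Role: "user", Kind: KindToolResponse, CallID: "call_abc123", Content: "pending"},
		{Role: "user", Kind: KindText, Content: "checking..."},
		{Role: "user", Kind: KindToolResponse, CallID: "call_abc123", Content: "completed"},
	}
	for _, e := range consolidateLongRunningResponses(history) {
		fmt.Printf("%s(%s: %s)\n", e.Role, e.Kind, e.Content)
	}
}
```

Placing the final response directly after the tool call keeps the call/response pairing that chat-style LLM APIs generally expect. Whether status text such as "checking..." should also be dropped is a policy choice this sketch leaves open.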

5. Mixed execution (sync + async tools in same turn)

When the LLM requests multiple tools in the same turn and some are long-running:

  • Sync tools: Execute immediately, collect results
  • Long-running tools: Execute (get initial status), mark as pending
  • All tool responses (completed + pending) are added to the session
  • Agent yields with the pending tool IDs

On resume, only the long-running results need to be provided — the sync results are already in the session.
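On the resume side, the runner needs some way to match incoming tool responses against the calls it is still waiting on. The sketch below shows one possible policy, under the assumption that every response passed in is a final result (not a progress update) and that responses for unknown or already-resolved call IDs should be rejected. `applyResume` is a hypothetical helper, not part of the proposal.

```go
package main

import "fmt"

// applyResume removes the call IDs answered by responseIDs from pending and
// returns the IDs still outstanding, in their original order. A response
// whose ID is not pending (for example a sync tool's ID, or a duplicate)
// is treated as an error in this sketch.
func applyResume(pending, responseIDs []string) ([]string, error) {
	waiting := make(map[string]bool, len(pending))
	for _, id := range pending {
		waiting[id] = true
	}
	for _, id := range responseIDs {
		if !waiting[id] {
			return nil, fmt.Errorf("tool response %q does not match a pending long-running call", id)
		}
		delete(waiting, id)
	}
	remaining := make([]string, 0, len(waiting))
	for _, id := range pending {
		if waiting[id] {
			remaining = append(remaining, id)
		}
	}
	return remaining, nil
}

func main() {
	remaining, err := applyResume([]string{"call_deploy", "call_video"}, []string{"call_video"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("still waiting on:", remaining) // prints: still waiting on: [call_deploy]
}
```

If any IDs remain, the agent would presumably yield again with FinishReasonInputRequired and the reduced ID list instead of calling the LLM.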

Use Case Example

CI/CD deployment agent

// Tool that starts a deployment (returns immediately with task ID)
type DeployTool struct{}

func (d *DeployTool) IsLongRunning() bool { return true }

func (d *DeployTool) Execute(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
    var req DeployRequest
    if err := json.Unmarshal(args, &req); err != nil {
        return nil, fmt.Errorf("invalid deploy arguments: %w", err)
    }

    // Start the deployment (non-blocking)
    taskID, err := startDeployment(ctx, req.Environment, req.Version)
    if err != nil {
        return nil, err
    }

    // Return immediately with task ID
    return json.Marshal(map[string]any{
        "task_id": taskID,
        "status":  "pending",
        "message": "Deployment started. Result will be provided when complete.",
    })
}

Agent interaction:

User: "Deploy v2.5.0 to staging"

Turn 1:
  Agent calls deploy_to_staging({"version": "v2.5.0", "environment": "staging"})
  Tool returns: {"task_id": "deploy-789", "status": "pending"}
  Agent yields: FinishReasonInputRequired, InputRequiredToolIDs: ["call_abc123"]

  // SDK returns control to caller
  // Caller starts polling the deployment system...

  // 8 minutes later, deployment completes

Turn 2 (resume):
  Caller sends tool response: {"status": "completed", "url": "https://staging.example.com", "duration": "8m12s"}
  Agent: "Deployment of v2.5.0 to staging completed successfully!
          URL: https://staging.example.com
          Duration: 8 minutes 12 seconds.
          Would you like me to run the smoke tests?"
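The "caller starts polling" step between the two turns is outside the SDK, but a caller-side wait loop might look like the sketch below. `statusFunc` is a hypothetical client call into the deployment system, and the fake in `runDemo` simply completes on the third check.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// statusFunc is a hypothetical client call that reports whether a task has
// finished and, if so, its JSON result.
type statusFunc func(ctx context.Context, taskID string) (done bool, result string, err error)

// waitForTask checks the task immediately and then once per interval until
// it is done, the check fails, or ctx is cancelled.
func waitForTask(ctx context.Context, taskID string, interval time.Duration, check statusFunc) (string, error) {
	if interval <= 0 {
		return "", errors.New("poll interval must be positive")
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, result, err := check(ctx, taskID)
		if err != nil {
			return "", fmt.Errorf("checking task %s: %w", taskID, err)
		}
		if done {
			return result, nil
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-ticker.C:
		}
	}
}

// runDemo polls a fake task that completes on the third status check.
func runDemo() (result string, checks int, err error) {
	check := func(ctx context.Context, taskID string) (bool, string, error) {
		checks++
		if checks < 3 {
			return false, "", nil
		}
		return true, `{"status":"completed","url":"https://staging.example.com"}`, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	result, err = waitForTask(ctx, "deploy-789", 10*time.Millisecond, check)
	return result, checks, err
}

func main() {
	result, checks, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("completed after %d checks: %s\n", checks, result)
}
```

The returned JSON would then be sent back as the Result of a tool response carrying the original call ID, as shown in the "Resume with results" section. A webhook-driven caller would skip the loop and build the same tool response from the callback payload.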

Data pipeline with progress

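// Tool starts the job and returns an initial "running" status.
// Progress updates arrive later as intermediate tool responses from the caller.
func (d *DataPipelineTool) IsLongRunning() bool { return true }

func (d *DataPipelineTool) Execute(ctx context.Context, args json.RawMessage) (json.RawMessage, error) {
    jobID, err := startPipelineJob(ctx, args)
    if err != nil {
        return nil, err
    }
    return json.Marshal(map[string]any{
        "job_id":  jobID,
        "status":  "running",
        "message": "Processing 10,000 records...",
    })
}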

// Caller can send intermediate status updates:
// {"job_id": "job-456", "status": "running", "progress": "5,000/10,000 records"}
// And then the final result:
// {"job_id": "job-456", "status": "completed", "records_processed": 10000}

Why This Matters

  • Real-world operations take time: Deployments, data processing, external API calls, and approval workflows are the bread and butter of production agent systems. An SDK that can only handle tools that complete in seconds excludes many of these production use cases.
  • Resource efficiency: Blocking a goroutine for 10 minutes while a deployment runs is wasteful. Yielding control allows the caller to manage resources efficiently — serve other requests, update UIs, or simply wait without holding server resources.
  • A2A protocol alignment: The Agent-to-Agent protocol has native support for TaskStateInputRequired and long-running task tracking. Long-running tools map naturally to this protocol, enabling cross-agent orchestration of time-consuming operations.
  • Composability: Long-running tools compose with other SDK features — interceptors can track duration and cost, the OTel plugin can create spans that cover the full operation lifecycle, and the session store preserves state across the pause/resume boundary.
  • Distinct from human-in-the-loop: While the pause/resume mechanism overlaps with human-in-the-loop support (see #93, "Human-in-the-loop: pause/resume, MCP elicitation, and reconciler"), the intent is different. HITL is about getting human decisions; long-running tools are about waiting for automated operations. The UX, polling patterns, and timeout semantics are different enough to warrant separate treatment.

Labels: agent (Agent framework), enhancement (New feature or request), tool (Tool system)