Skip to content

fix(agent): gate tool dispatch on ToolUse terminal to ignore truncated calls#1676

Merged
sytone merged 1 commit into
mainfrom
fix/agent-loop-truncated-toolcall-gate
Jun 28, 2026
Merged

fix(agent): gate tool dispatch on ToolUse terminal to ignore truncated calls#1676
sytone merged 1 commit into
mainfrom
fix/agent-loop-truncated-toolcall-gate

Conversation

@agent-farnsworth

Copy link
Copy Markdown
Contributor

Closes #1666

Summary

A truncated assistant turn (provider stop reason Length/MAX_TOKENS, a content
filter such as Refusal/Sensitive, or a stream that ended without a terminal
event) that nonetheless surfaced a parsed ToolCall was being executed. The
agent loop decided dispatch purely on the presence of a tool call, with no
gate on the finish reason, so a half-formed tool call cut off at the token limit
could run with incomplete/garbled arguments.

This is the BotNexus analogue of the OpenClaw fix that ignores truncated tool
calls (fix(agent-core): ignore truncated tool calls). The posture is
conservative: only a ToolUse terminal authorizes execution.

Changes

  • AgentLoopRunner.RunLoopAsync (primary guard): tool dispatch is now gated
    on the terminal reason in addition to presence:

    hasMoreToolCalls = assistantMessage.FinishReason == StopReason.ToolUse
        && assistantMessage.ToolCalls is { Count: > 0 };

    A truncated turn no longer reaches ToolExecutor.ExecuteAsync.

  • StreamAccumulator (backstop / replay-safety): added
    StripNonExecutableToolCalls, applied on both the DoneEvent final-message
    path and the stream-EOF fallback path. When the terminal is not ToolUse, any
    surfaced tool call is dropped from both AssistantAgentMessage.ToolCalls
    and the ContentBlocks mirror, while the visible text and finish reason
    are retained. This matters because MessageConverter.ToAgentMessage populates
    ContentBlocks with the raw provider content, and
    ToProviderAssistantMessage prefers ContentBlocks when re-serializing a
    message for a later LLM call -- so nulling only ToolCalls would leave a
    re-dispatchable tool call hiding in ContentBlocks. A genuine ToolUse turn
    is returned unchanged.

The two changes are complementary defense-in-depth: the loop guard blocks live
dispatch; the accumulator strip ensures the recorded/replayed message cannot be
re-dispatched by a later code path. The new loop regression test asserts both
properties, so it fails if either fix is missing.

Stop-vs-Length posture decision (why the guard is == ToolUse)

The issue notes that "a normal Stop with a complete tool call should keep its
existing promotion behaviour at the provider/parser layer; the loop-level guard
is the backstop", and asks to narrow the strip to only truncation terminals if
any provider legitimately emits a complete tool call under a non-ToolUse
terminal. I verified every provider's stop-reason mapping before choosing the
broad guard:

  • Anthropic (AnthropicProvider.MapStopReason): tool_use -> ToolUse,
    max_tokens -> Length.
  • Copilot (CopilotMessagesProvider): tool_use -> ToolUse,
    max_tokens -> Length.
  • OpenAI Completions (CompletionsStreamEngine.MapStopReason):
    tool_calls/function_call -> ToolUse, length -> Length.
  • OpenAI Responses (ResponsesStreamParser): explicitly promotes a
    complete tool call from Stop to ToolUse
    (if (contentBlocks.OfType<ToolCallContent>().Any() && stopReason == Stop) stopReason = ToolUse;),
    and maps incomplete -> Length.
  • OpenAICompat (OpenAICompatProvider): tool_calls/function_call -> ToolUse,
    and null + hasToolCalls -> ToolUse; length -> Length.
  • IntegrationMock: tool events snapshot ToolUse; length -> Length.

Conclusion: every legitimate, complete tool call is promoted to
StopReason.ToolUse at the provider/parser layer; a truncated call retains
Length (or Refusal/Sensitive for content filters, Stop only when no
tool call is present). No provider emits a complete tool call under a
non-ToolUse terminal. Therefore the broad guard FinishReason == ToolUse
(block "anything not ToolUse" that carries a tool call) is correct -- it blocks
only the truncated/filtered case and never a real tool turn -- and it mirrors
OpenClaw's conservative posture without needing to enumerate Length /
content-filter / EOF explicitly. The strip is symmetric (also "not ToolUse"),
which is safe for the same reason and additionally covers the stream-EOF
fallback the issue calls out.

Tests (TDD, RED -> GREEN)

New tests written before the fix:

  • AgentLoopRunnerTruncatedToolCallTests
    • TruncatedToolCallTurn_IsNotDispatchedAndIsStrippedFromPersistedMessage:
      a scripted [StopReason.Length, content=[text, partial toolCall]] turn must
      NOT invoke the recording tool (ExecuteCount == 0) and the persisted
      assistant message must retain text + Length finish reason but carry no tool
      call (neither ToolCalls nor a ToolCallContent block), and no tool-result
      message is produced.
    • CompleteToolUseTurn_IsDispatched: a normal ToolUse turn still dispatches
      (ExecuteCount == 1, one tool-result message) -- no regression of legitimate
      tool execution.
  • StreamAccumulatorTruncatedToolCallTests: unit-level strip behaviour for
    Length and Sensitive terminals (tool call stripped, text + reason kept) and
    ToolUse (tool call retained).

RED -> GREEN proof (manual, non-incremental rebuild between each toggle to avoid
the stale-DLL trap):

  1. Both fixes off (original bug): dispatch test fails -- ExecuteCount is 1.
  2. Guard restored, strip still off: ExecuteCount passes (guard blocks live
    dispatch) but the persisted-message assertion fails -- proving the strip is
    independently load-bearing for replay-safety.
  3. Both fixes on: all 5 new tests green.

Full impacted gate: scripts/repo/test-impacted.ps1 -> All impacted tests
passed
(22 projects, including the always-run Architecture.Tests and
Scenarios.Tests; BotNexus.Agent.Core.Tests reports 217 passed / 0 failed).

Merge Notes

This change is file-disjoint from the other open PRs (#1672, #1673, #1674,
#1675): it touches only BotNexus.Agent.Core loop code
(Loop/AgentLoopRunner.cs, Loop/StreamAccumulator.cs) and adds two new test
files under BotNexus.Agent.Core.Tests/Loop/. No shared files, no provider or
extension changes. Safe to merge in parallel with those PRs.

@sytone sytone merged commit 1490a1e into main Jun 28, 2026
12 checks passed
@sytone sytone deleted the fix/agent-loop-truncated-toolcall-gate branch June 28, 2026 07:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] agent loop dispatches tool calls from truncated (length/MAX_TOKENS) turns

1 participant