reflect: gpt-oss harmony tool-call tokens leak into answer text (litellm MaaS, v0.8.2) #2222

@ximihoque

Summary

reflect intermittently returns the model's raw harmony tool-call scaffolding as the answer text instead of a synthesized answer, when the reflect LLM is a gpt-oss / harmony-format model (e.g. vertex_ai/openai/gpt-oss-120b-maas via litellm). The text field comes back as e.g.:

<|start|>assistant<|channel|>commentary to=functions.recall<|message|>{"query": "..."}<|call|>

with based_on empty (no tool was actually executed). A second variant leaks the done(...) payload as a bare JSON blob:

{"answer":"I'm sorry, but I don't have that information."}

Reproduced on v0.8.2.

Impact

The reflect endpoint silently returns corrupt, unsynthesized output as if it were a valid answer. Downstream consumers can't distinguish it from a real low-confidence answer (it has empty citations + plausible-looking text), so the garbage surfaces to end users.

Root cause

Two layers combine:

  1. litellm provider doesn't parse harmony tool calls. engine/providers/litellm_llm.py::call_with_tools (around lines 360 / 366) reads content = message.content and only populates tool_calls when litellm sets message.tool_calls. The vertex_ai/openai/gpt-oss-*-maas adapter intermittently fails to parse the model's harmony tool-call turn into structured message.tool_calls, so the raw <|channel|>...to=functions...<|call|> text lands in message.content and tool_calls is empty.

  2. The reflect agent treats leftover content as a final answer. engine/reflect/agent.py:744 (if not result.tool_calls:) interprets empty tool_calls plus non-empty content as "the LLM wants to respond with text," and at lines 753-754 returns _clean_answer_text(result.content.strip()). But _clean_answer_text (lines 101-109) only strips done(...) patterns; it has no handling for harmony channel/tool-call tokens, so the scaffolding passes through verbatim. With zero tools executed, based_on.memories is empty too, which yields low confidence and no citations.
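To make the interaction of the two layers concrete, here is a simplified model of the branch described above. It is not the code from agent.py: LLMResult, clean_answer_text and final_answer_or_none are hypothetical stand-ins, and the done(...) regex is only an assumption about what such a cleaner might look like.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LLMResult:
    """Hypothetical stand-in for what the provider hands back to the agent."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


# Assumed shape of a cleaner that only knows about done(...) wrappers.
_DONE_RE = re.compile(r"done\s*\(.*?\)", re.DOTALL)


def clean_answer_text(text: str) -> str:
    return _DONE_RE.sub("", text).strip()


def final_answer_or_none(result: LLMResult):
    """Model of the described branch: no tool calls + content => final answer."""
    if not result.tool_calls and result.content.strip():
        return clean_answer_text(result.content.strip())
    return None


# The provider failed to parse the harmony turn, so it arrives as plain content.
leaked = LLMResult(
    content='<|start|>assistant<|channel|>commentary to=functions.recall'
            '<|message|>{"query": "..."}<|call|>'
)
answer = final_answer_or_none(leaked)
```

In this model the harmony turn contains no done(...) pattern, so the cleaner has nothing to strip and the scaffolding is returned unchanged as the "answer".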

Reproduction

Against a bank with data, with the reflect model set to vertex_ai/openai/gpt-oss-120b-maas (litellm), call reflect repeatedly:

curl -s -X POST http://localhost:8888/v1/default/banks/<bank>/reflect \
  -H 'Content-Type: application/json' \
  -d '{"query":"summarize recent activity","budget":"high"}'

It is probabilistic and budget-correlated: in our measurements high leaked ~4/6 runs, low ~1/3, and some queries were clean 5/5. More iterations (higher budget, larger context) mean more tool-call turns, and each turn is another chance for litellm to drop one. So mid and low budgets are not safe either; they just leak less often.
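Because the failure is intermittent, a small loop that repeats the request and counts leaked responses can help compare budgets. The following is a rough sketch, not a tested harness: it assumes the response JSON exposes the answer under a text key (as described in the summary), and is_leaked, post_reflect and count_leaks are hypothetical helper names.

```python
import json
import urllib.request

HARMONY_MARKERS = ("<|start|>", "<|channel|>", "<|message|>", "<|call|>", "to=functions")


def is_leaked(text: str) -> bool:
    """True if text contains harmony markers or is a bare {"answer": ...} blob."""
    if any(marker in text for marker in HARMONY_MARKERS):
        return True
    try:
        parsed = json.loads(text)
    except ValueError:
        return False
    return isinstance(parsed, dict) and set(parsed) == {"answer"}


def post_reflect(bank: str, query: str, budget: str,
                 base_url: str = "http://localhost:8888") -> dict:
    """Send one reflect request, equivalent to the curl command above."""
    body = json.dumps({"query": query, "budget": budget}).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/default/banks/{bank}/reflect",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def count_leaks(bank: str, query: str, budget: str, runs: int, post=post_reflect) -> int:
    """Call reflect `runs` times and count responses whose text looks leaked."""
    leaks = 0
    for _ in range(runs):
        response = post(bank, query, budget)
        if is_leaked(response.get("text", "")):  # "text" key is an assumption
            leaks += 1
    return leaks
```

For example, count_leaks("<bank>", "summarize recent activity", "high", runs=6) would reproduce the high-budget measurement against a local server; the post parameter exists only so the counter can be exercised without a network.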

Suggested fix (either, ideally both)

  • A (real fix): ensure litellm parses gpt-oss harmony channels into structured tool_calls for the Vertex MaaS gpt-oss endpoint (correct custom_llm_provider / parsing), so the agent never sees raw harmony tokens.
  • B (defense in depth, in agent.py): at the if not result.tool_calls: branch, before accepting result.content as the answer, detect harmony / raw-tool-call markers (<|start|>, <|channel|>, <|message|>, <|call|>, to=functions) or a bare {"answer":...} blob, and treat it as a failed tool-parse — re-parse the harmony turn into a real tool call and continue the loop, or fall through to the existing build_final_prompt final-synthesis path (which already handles the no-content case at lines 817+). Extending _clean_answer_text to also strip/reject harmony scaffolding would at minimum stop the corrupt text from being surfaced.
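As a rough illustration of option B, the check could look something like the sketch below. It is not a patch against agent.py: classify_content and parse_harmony_tool_call are hypothetical names, the regex covers only the single-call shape shown in the summary, and the returned tags are placeholders for "continue the loop with this tool call", "fall through to final synthesis" and "accept as answer".

```python
import json
import re

HARMONY_MARKERS = ("<|start|>", "<|channel|>", "<|message|>", "<|call|>", "to=functions")

# Matches "... to=functions.<name> ... <|message|>{json}<|call|>".
_TOOL_CALL_RE = re.compile(
    r"to=functions\.(\w+).*?<\|message\|>(.*?)(?:<\|call\|>|$)", re.DOTALL
)


def parse_harmony_tool_call(content: str):
    """Return (tool_name, arguments) if content is a raw harmony tool call, else None."""
    match = _TOOL_CALL_RE.search(content)
    if match is None:
        return None
    try:
        args = json.loads(match.group(2))
    except ValueError:
        return None
    if not isinstance(args, dict):
        return None
    return match.group(1), args


def _is_bare_answer_blob(text: str) -> bool:
    """True for a JSON object whose only key is "answer"."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return False
    return isinstance(parsed, dict) and set(parsed) == {"answer"}


def classify_content(content: str):
    """Decide what to do with content that arrived without structured tool calls."""
    text = content.strip()
    call = parse_harmony_tool_call(text)
    if call is not None:
        return ("tool_call", call)      # re-parsed: execute it and continue the loop
    if any(m in text for m in HARMONY_MARKERS) or _is_bare_answer_blob(text):
        return ("synthesize", None)     # unparseable scaffolding: use final synthesis
    return ("answer", text)             # ordinary text answer
```

Trying the re-parse first keeps the tool call the model actually asked for; the final-synthesis fallback is reserved for scaffolding that cannot be recovered, so corrupt text is never returned as the answer.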

Happy to provide more raw samples if useful.
