Skip to content

fix: suppress MCP connection errors and handle LLM double-serialization#2434

Open
wintrover wants to merge 4 commits into
MoonshotAI:mainfrom
wintrover:fix/llm-serialization-and-mcp-errors-clean
Open

fix: suppress MCP connection errors and handle LLM double-serialization#2434
wintrover wants to merge 4 commits into
MoonshotAI:mainfrom
wintrover:fix/llm-serialization-and-mcp-errors-clean

Conversation

@wintrover

@wintrover wintrover commented Jun 5, 2026

Copy link
Copy Markdown

Summary

This PR fixes 3 issues discovered during heavy MCP tool usage:

1. Suppress MCP connection errors in event loop handler

File: src/kimi_cli/telemetry/crash.py

When an MCP server (Notion, code-index, etc.) connection drops, the event loop cleanup code tries to close the already-closed connection, printing "Unhandled exception in event loop" to the terminal.

Fix: Add MCP connection pattern detection to _asyncio_handler. Matching exceptions (connection closed, broken pipe, etc.) are silently suppressed instead of being delegated to the default handler.

2. Handle LLM double-serialization of todos array

File: src/kimi_cli/tools/todo/__init__.py

The LLM sometimes emits array values as JSON-encoded strings:

  • Wrong: {"todos": "[{\\"title\\": ...}]"}
  • Correct: {"todos": [{"title": ...}]}

Fix: Add field_validator(mode="before") on Params.todos that detects string values and parses them as JSON before Pydantic validation.

3. Handle LLM double-serialization of edit parameter

File: src/kimi_cli/tools/file/replace.py

Same pattern as #2 — the LLM sometimes emits Edit objects as JSON strings.

Fix: Add field_validator(mode="before") on Params.edit that detects string values and parses them as JSON before validation.

Context7 Verification

All fixes verified against official documentation:

  • asyncio.set_exception_handler() — Python docs §12.7.2.4
  • Pydantic field_validator(mode="before") — runs before standard validation for type coercion

Testing

All 3 fixes tested locally:

  • MCP connection drops no longer print to terminal
  • SetTodoList works with both proper array and string-serialized array
  • StrReplaceFile works with both proper object and string-serialized object

Open in Devin Review

wintrover added 3 commits June 5, 2026 22:33
When an MCP server (Notion, code-index, etc.) connection drops, the
event loop's cleanup code tries to close the already-closed connection,
raising 'Connection closed' / 'broken pipe' exceptions that print as
'Unhandled exception in event loop' in the terminal.

This fix adds MCP connection pattern detection to _asyncio_handler in
telemetry/crash.py. Matching exceptions are silently suppressed instead
of being delegated to the default handler (which prints to stderr).

Patterns suppressed:
- connection closed
- connection reset
- connection refused
- broken pipe
- eof

Context7 verified: asyncio.set_exception_handler() is the official API
for custom event loop exception handling (Python docs).
The LLM sometimes emits array values as JSON-encoded strings:
  Wrong:   {"todos": "[{\"title\": ...}]"}
  Correct: {"todos": [{"title": ...}]}

This causes Pydantic validation to reject the parameter because
todos is str instead of list[Todo].

Fix: Add field_validator(mode='before') on Params.todos that
detects string values and parses them as JSON before validation.

Context7 verified: Pydantic field_validator(mode='before') runs
before standard validation, allowing type coercion at the boundary.

Fixes: SetTodoList tool failing with 'Input should be a valid list'
Same pattern as SetTodoList fix: LLM sometimes emits Edit|list[Edit]
as JSON-encoded string instead of actual JSON structure.

  Wrong:   {"edit": "{\"old\": \"foo\", \"new\": \"bar\"}"}
  Correct: {"edit": {"old": "foo", "new": "bar"}}

Fix: Add field_validator(mode='before') on Params.edit that detects
string values and parses them as JSON before Pydantic validation.

Context7 verified: Pydantic field_validator(mode='before') runs before
standard validation, allowing type coercion at the boundary.

Fixes: StrReplaceFile failing with 'Input should be a valid dictionary'

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment on lines +127 to +128
if any(p in exc_msg for p in _mcp_connection_patterns):
return # Silently suppress — not a programming bug

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Early return bypasses delegation to original asyncio exception handler, silently swallowing non-MCP exceptions

The return at line 128 exits _asyncio_handler before the delegation block at lines 142-146 runs. This completely swallows exceptions matching the string patterns — no telemetry, no logging, and no call to the original/default exception handler.

This breaks the function's documented contract (line 142: "Delegate so the original logging behavior (or custom handler) runs") and is inconsistent with how _excepthook handles filtered exceptions — in _excepthook (lines 85-87), ignored exceptions (KeyboardInterrupt, SystemExit) still get delegated to the original handler, and this is verified by test_original_handler_called_for_ignored_exceptions.

Furthermore, the pattern matching is applied to ALL asyncio exceptions regardless of origin, not just MCP. The patterns like "connection closed", "connection refused", and "eof" could match exceptions from non-MCP sources such as the LLM API (APIConnectionError), the telemetry transport, web tools, or the wire server — all of which use asyncio networking. These would be silently discarded with no logging or diagnostics.

Prompt for agents
The early `return` at line 128 in _asyncio_handler skips the delegation to the original/default exception handler (lines 142-146). This is inconsistent with how _excepthook handles filtered exceptions (it always delegates).

To fix this while preserving the intent of suppressing telemetry for MCP connection errors:

Option A: Move the MCP check so it only skips telemetry tracking but still delegates. For example, add a flag like `is_mcp_conn_error` and gate only the `track()` call on it:

    if exc is not None and not isinstance(exc, asyncio.CancelledError):
        exc_msg = str(exc).lower()
        _mcp_connection_patterns = (...)
        is_mcp_conn_error = any(p in exc_msg for p in _mcp_connection_patterns)
        if not is_mcp_conn_error:
            try:
                track(...)
            except Exception:
                pass

    # Delegation always runs
    if _original_asyncio_handler is not None:
        _original_asyncio_handler(loop, context)
    else:
        loop.default_exception_handler(context)

Option B: If truly wanting to suppress these entirely (no logging), keep the early return but be more precise about what gets suppressed — e.g., check the exception type or source rather than just substring-matching error messages. The pattern 'eof' is especially broad and could match unrelated errors.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

LLMs sometimes emit array/object values as JSON-encoded strings:
  Wrong:   {"todos": "[{\"title\": ...}]"}
  Correct: {"todos": [{"title": ...}]}

This caused 'Error validating JSON arguments: Input should be a valid list'
because jsonschema.validate() and Pydantic model_validate() both rejected
the string type before the existing field_validator could run.

Fix: Add string->array/object coercion at two validation boundaries:
1. kosong CallableTool.call - before jsonschema.validate()
2. soul KimiToolset.handle - after json.loads(), before tool.call()

Both coerce any string value starting with '[' or '{' back to its parsed
JSON type, matching the existing Pydantic field_validator in SetTodoList.

Verified: SetTodoList with string-serialized todos now works correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant