fix: suppress MCP connection errors and handle LLM double-serialization#2434
fix: suppress MCP connection errors and handle LLM double-serialization#2434wintrover wants to merge 4 commits into
Conversation
When an MCP server (Notion, code-index, etc.) connection drops, the event loop's cleanup code tries to close the already-closed connection, raising 'Connection closed' / 'broken pipe' exceptions that print as 'Unhandled exception in event loop' in the terminal. This fix adds MCP connection pattern detection to _asyncio_handler in telemetry/crash.py. Matching exceptions are silently suppressed instead of being delegated to the default handler (which prints to stderr). Patterns suppressed: - connection closed - connection reset - connection refused - broken pipe - eof Context7 verified: asyncio.set_exception_handler() is the official API for custom event loop exception handling (Python docs).
The LLM sometimes emits array values as JSON-encoded strings:
Wrong: {"todos": "[{\"title\": ...}]"}
Correct: {"todos": [{"title": ...}]}
This causes Pydantic validation to reject the parameter because
todos is str instead of list[Todo].
Fix: Add field_validator(mode='before') on Params.todos that
detects string values and parses them as JSON before validation.
Context7 verified: Pydantic field_validator(mode='before') runs
before standard validation, allowing type coercion at the boundary.
Fixes: SetTodoList tool failing with 'Input should be a valid list'
Same pattern as SetTodoList fix: LLM sometimes emits Edit|list[Edit]
as JSON-encoded string instead of actual JSON structure.
Wrong: {"edit": "{\"old\": \"foo\", \"new\": \"bar\"}"}
Correct: {"edit": {"old": "foo", "new": "bar"}}
Fix: Add field_validator(mode='before') on Params.edit that detects
string values and parses them as JSON before Pydantic validation.
Context7 verified: Pydantic field_validator(mode='before') runs before
standard validation, allowing type coercion at the boundary.
Fixes: StrReplaceFile failing with 'Input should be a valid dictionary'
| if any(p in exc_msg for p in _mcp_connection_patterns): | ||
| return # Silently suppress — not a programming bug |
There was a problem hiding this comment.
🟡 Early return bypasses delegation to original asyncio exception handler, silently swallowing non-MCP exceptions
The return at line 128 exits _asyncio_handler before the delegation block at lines 142-146 runs. This completely swallows exceptions matching the string patterns — no telemetry, no logging, and no call to the original/default exception handler.
This breaks the function's documented contract (line 142: "Delegate so the original logging behavior (or custom handler) runs") and is inconsistent with how _excepthook handles filtered exceptions — in _excepthook (lines 85-87), ignored exceptions (KeyboardInterrupt, SystemExit) still get delegated to the original handler, and this is verified by test_original_handler_called_for_ignored_exceptions.
Furthermore, the pattern matching is applied to ALL asyncio exceptions regardless of origin, not just MCP. The patterns like "connection closed", "connection refused", and "eof" could match exceptions from non-MCP sources such as the LLM API (APIConnectionError), the telemetry transport, web tools, or the wire server — all of which use asyncio networking. These would be silently discarded with no logging or diagnostics.
Prompt for agents
The early `return` at line 128 in _asyncio_handler skips the delegation to the original/default exception handler (lines 142-146). This is inconsistent with how _excepthook handles filtered exceptions (it always delegates).
To fix this while preserving the intent of suppressing telemetry for MCP connection errors:
Option A: Move the MCP check so it only skips telemetry tracking but still delegates. For example, add a flag like `is_mcp_conn_error` and gate only the `track()` call on it:
if exc is not None and not isinstance(exc, asyncio.CancelledError):
exc_msg = str(exc).lower()
_mcp_connection_patterns = (...)
is_mcp_conn_error = any(p in exc_msg for p in _mcp_connection_patterns)
if not is_mcp_conn_error:
try:
track(...)
except Exception:
pass
# Delegation always runs
if _original_asyncio_handler is not None:
_original_asyncio_handler(loop, context)
else:
loop.default_exception_handler(context)
Option B: If truly wanting to suppress these entirely (no logging), keep the early return but be more precise about what gets suppressed — e.g., check the exception type or source rather than just substring-matching error messages. The pattern 'eof' is especially broad and could match unrelated errors.
Was this helpful? React with 👍 or 👎 to provide feedback.
LLMs sometimes emit array/object values as JSON-encoded strings:
Wrong: {"todos": "[{\"title\": ...}]"}
Correct: {"todos": [{"title": ...}]}
This caused 'Error validating JSON arguments: Input should be a valid list'
because jsonschema.validate() and Pydantic model_validate() both rejected
the string type before the existing field_validator could run.
Fix: Add string->array/object coercion at two validation boundaries:
1. kosong CallableTool.call - before jsonschema.validate()
2. soul KimiToolset.handle - after json.loads(), before tool.call()
Both coerce any string value starting with '[' or '{' back to its parsed
JSON type, matching the existing Pydantic field_validator in SetTodoList.
Verified: SetTodoList with string-serialized todos now works correctly.
Summary
This PR fixes 3 issues discovered during heavy MCP tool usage:
1. Suppress MCP connection errors in event loop handler
File:
src/kimi_cli/telemetry/crash.pyWhen an MCP server (Notion, code-index, etc.) connection drops, the event loop cleanup code tries to close the already-closed connection, printing "Unhandled exception in event loop" to the terminal.
Fix: Add MCP connection pattern detection to
_asyncio_handler. Matching exceptions (connection closed, broken pipe, etc.) are silently suppressed instead of being delegated to the default handler.2. Handle LLM double-serialization of todos array
File:
src/kimi_cli/tools/todo/__init__.pyThe LLM sometimes emits array values as JSON-encoded strings:
{"todos": "[{\\"title\\": ...}]"}{"todos": [{"title": ...}]}Fix: Add
field_validator(mode="before")onParams.todosthat detects string values and parses them as JSON before Pydantic validation.3. Handle LLM double-serialization of edit parameter
File:
src/kimi_cli/tools/file/replace.pySame pattern as #2 — the LLM sometimes emits Edit objects as JSON strings.
Fix: Add
field_validator(mode="before")onParams.editthat detects string values and parses them as JSON before validation.Context7 Verification
All fixes verified against official documentation:
asyncio.set_exception_handler()— Python docs §12.7.2.4Pydantic field_validator(mode="before")— runs before standard validation for type coercionTesting
All 3 fixes tested locally: