
fix: graceful degradation when MCP server fails to connect #1769

Open: he-yufeng wants to merge 1 commit into MoonshotAI:main from he-yufeng:fix/mcp-graceful-degradation



he-yufeng commented Apr 6, 2026


Summary

  • When an MCP server fails to start (e.g. port conflict between TUI and Web UI sessions), MCPRuntimeError propagates uncaught through _agent_loop(), crashing the worker and leaving the frontend stuck in "thinking" forever
  • kimisoul.py: catch MCPRuntimeError in _agent_loop() so the conversation continues without the failed MCP server's tools
  • process.py: on unexpected _read_loop exceptions, clear in-flight prompt IDs and emit "error" status so the frontend can recover
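
The kimisoul.py change described above can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `MCPRuntimeError` is redefined locally as a stand-in, and `wait_for_mcp_tools` and `agent_loop` are hypothetical names chosen for the example. The point is only that the exception is caught where MCP tools are loaded, so the loop continues with whatever tools remain.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class MCPRuntimeError(Exception):
    """Local stand-in for the real exception so the sketch runs on its own."""


async def wait_for_mcp_tools(fail: bool) -> list[str]:
    # Hypothetical helper: pretends to connect to an MCP server.
    if fail:
        raise MCPRuntimeError("address already in use")
    return ["mcp_search"]


async def agent_loop(builtin_tools: list[str], mcp_fails: bool) -> list[str]:
    """Return the tool names available for this turn."""
    tools = list(builtin_tools)
    try:
        tools += await wait_for_mcp_tools(mcp_fails)
    except MCPRuntimeError as exc:
        # Degrade gracefully: log the failure and keep the built-in tools
        # instead of letting the exception crash the worker.
        logger.warning("MCP unavailable, continuing without its tools: %s", exc)
    return tools
```

With a failing server, `asyncio.run(agent_loop(["shell"], True))` still returns the built-in tools, so the conversation can proceed.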

Test plan

  • Start kimi-cli TUI with an MCP server using a fixed port
  • Switch to Web UI via /web
  • Send a message: previously this crashed the worker; now it degrades gracefully
  • Verify the conversation proceeds normally without MCP tools
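
For readers unfamiliar with the failure mode in the first step, the snippet below shows why a fixed port can conflict. It assumes the MCP server binds a TCP port on localhost, which the PR does not spell out; `port_conflicts` is a hypothetical helper written for this illustration. Two sockets bind the same port, and the second bind fails, much as a second session starting the same fixed-port server would.

```python
import socket


def port_conflicts(host: str = "127.0.0.1") -> bool:
    """Return True if a second bind to an already-bound port is rejected."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same port as the first socket
        except OSError:
            return True  # typically EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```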

Fixes #1766



@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


When an MCP server fails to start (e.g. port conflict between TUI and
Web UI sessions), the MCPRuntimeError propagated uncaught through
_agent_loop(), crashing the worker process and leaving the frontend
message stuck in 'thinking' state forever.

Two changes:

1. kimisoul.py: catch MCPRuntimeError in _agent_loop() so the
   conversation continues without the failed MCP server's tools.

2. process.py: on unexpected _read_loop exceptions, clear in-flight
   prompt IDs and emit 'error' status so the frontend can recover
   instead of hanging indefinitely.

Fixes MoonshotAI#1766
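
The second change in the commit message (read-loop recovery) might look something like the sketch below. It is an assumption-laden illustration rather than the code in process.py: `SessionProcess`, `in_flight_prompt_ids`, `_emit_status`, and `fake_worker_output` are hypothetical names. On an unexpected exception, the loop clears the in-flight prompt IDs and emits an 'error' status so a frontend has something to react to.

```python
import asyncio


async def fake_worker_output():
    # Hypothetical worker stream; the second line triggers a failure.
    yield "ok"
    yield "boom"


class SessionProcess:
    """Hypothetical stand-in for the web session's worker wrapper."""

    def __init__(self, lines):
        self._lines = lines
        self.in_flight_prompt_ids = {"prompt-1"}
        self.status_events = []

    def _emit_status(self, status, detail=""):
        # A real implementation would notify the frontend; here we record it.
        self.status_events.append((status, detail))

    def _handle(self, line):
        if line == "boom":
            raise RuntimeError("worker crashed")

    async def _read_loop(self):
        try:
            async for line in self._lines:
                self._handle(line)
        except Exception as exc:
            # Recover instead of hanging: forget pending prompts and
            # report an error status.
            self.in_flight_prompt_ids.clear()
            self._emit_status("error", str(exc))
```

Running `asyncio.run(SessionProcess(fake_worker_output())._read_loop())` leaves no in-flight prompt IDs and records a single 'error' status.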
@he-yufeng he-yufeng force-pushed the fix/mcp-graceful-degradation branch from 08aaf2e to fdacf08 Compare June 7, 2026 00:59
Author

Rebased onto current main and resolved the MCP loading conflict by keeping the new telemetry path while preserving the MCPRuntimeError graceful-degradation behavior. The web read-loop recovery hunk is already present upstream, so the rebased PR now only changes kimisoul.py.

Validation:

  • pytest tests/web/test_session_error_recovery.py -q passed (4 passed)
  • ruff check on touched files passed
  • py_compile on touched files passed
  • git diff --check upstream/main..HEAD passed



Development

Successfully merging this pull request may close these issues.

MCP connection failure crashes Web UI worker instead of graceful degradation
