chore(router): switch Omni default to Qwen3-235B-A22B-Instruct-2507 #2280
Open
gary149 wants to merge 1 commit into
The default route was hitting Kimi-K2.6 via Fireworks, which enters reasoning mode and burns the entire token budget on hidden thinking: users see a 13-15 s wait with no visible answer.

Swap in Qwen/Qwen3-235B-A22B-Instruct-2507 as the primary. On the same prompt: 0.6 s end-to-end (~24x faster), ~800 t/s, 100% visible content. Served by 5 live providers (cerebras 577 t/s top-end, 4 of 5 tools-capable).

Add openai/gpt-oss-120b as the first fallback (1.1 s, ~1000 t/s, visible). Keep Kimi-K2.6 in the chain since the multimodal/agentic bypasses still target it via env vars.
Summary
Switch the Omni `default` route primary from `moonshotai/Kimi-K2.6` to `Qwen/Qwen3-235B-A22B-Instruct-2507`. Multimodal and agentic routes are unchanged (their env-var bypasses still target Kimi-K2.6).
Why
Benchmarked Kimi-K2.6 via the router on the same prompt (default provider, `max_tokens=300`):
Kimi enters reasoning mode by default on Fireworks and spends the entire token budget on hidden thinking — users see a long "Thinking…" block followed by no answer. Verified end-to-end in hf.co/chat: 15.7 s total request, 100 % spent in reasoning.
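The "100 % spent in reasoning" observation can be checked mechanically. The following is a minimal sketch, not the benchmark script used for this PR: it assumes each streamed chunk has already been parsed into a hypothetical `StreamChunk` shape that separates hidden reasoning text from visible content, and it uses character counts as a rough proxy for tokens.

```typescript
// Hypothetical chunk shape (an assumption for illustration, not a provider API):
// hidden reasoning text and visible answer text arrive in separate fields.
interface StreamChunk {
  reasoning?: string; // hidden "thinking" text, if any
  content?: string; // visible answer text, if any
  atMs: number; // arrival time in ms since the request started
}

interface StreamStats {
  visibleChars: number;
  reasoningChars: number;
  visibleShare: number; // 0..1, share of characters that were visible
  firstVisibleMs: number | null; // null if no visible text ever arrived
}

// Summarize how much of a streamed response was visible to the user.
function summarize(chunks: StreamChunk[]): StreamStats {
  let visibleChars = 0;
  let reasoningChars = 0;
  let firstVisibleMs: number | null = null;

  for (const chunk of chunks) {
    reasoningChars += chunk.reasoning?.length ?? 0;
    const visibleLen = chunk.content?.length ?? 0;
    if (visibleLen > 0 && firstVisibleMs === null) {
      firstVisibleMs = chunk.atMs;
    }
    visibleChars += visibleLen;
  }

  const total = visibleChars + reasoningChars;
  const visibleShare = total === 0 ? 0 : visibleChars / total;
  return { visibleChars, reasoningChars, visibleShare, firstVisibleMs };
}
```

A response that exhausts `max_tokens` while still thinking would show `visibleShare === 0` and `firstVisibleMs === null`, which matches the "long Thinking block followed by no answer" symptom described above.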
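Qwen3-235B is:
- ~24x faster end-to-end on the same prompt (0.6 s)
- generating at ~800 t/s
- returning 100 % visible content
- served by 5 live providers (cerebras 577 t/s top-end, 4 of 5 tools-capable)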
`openai/gpt-oss-120b` is added as the first fallback (1.1 s total, ~1 k t/s, 100 % visible). Kimi-K2.6 stays in the chain since the multimodal/agentic env-var bypasses still target it.
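The resulting chain can be pictured as follows. This is an illustrative sketch only: the `Route` shape, its field names, and the `pickModel` helper are hypothetical and not taken from the router's actual config or code. The position of Kimi-K2.6 after the first fallback is also an assumption, since the description only says it stays in the chain.

```typescript
// Hypothetical route shape; field names are assumptions for illustration.
interface Route {
  name: string;
  primary: string;
  fallbacks: string[];
}

// The `default` route as described in this PR. Kimi's exact position
// after the first fallback is assumed, and other entries may exist.
const defaultRoute: Route = {
  name: "default",
  primary: "Qwen/Qwen3-235B-A22B-Instruct-2507",
  fallbacks: ["openai/gpt-oss-120b", "moonshotai/Kimi-K2.6"],
};

// Models in the order they would be tried: primary first, then fallbacks.
function candidateChain(route: Route): string[] {
  return [route.primary, ...route.fallbacks];
}

// Return the first model the caller reports as available, or undefined
// when every candidate in the chain is unavailable.
function pickModel(
  route: Route,
  isAvailable: (model: string) => boolean,
): string | undefined {
  return candidateChain(route).find(isAvailable);
}
```

For example, `pickModel(defaultRoute, (m) => !m.startsWith("Qwen/"))` returns `"openai/gpt-oss-120b"`, so a Qwen outage would fall through to the non-reasoning fallback before reaching Kimi-K2.6.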
Test plan