chore(router): switch Omni default to Qwen3-235B-A22B-Instruct-2507 by gary149 · Pull Request #2280 · huggingface/chat-ui

gary149 · 2026-05-22T16:10:20Z

Summary

Switch the Omni `default` route primary from `moonshotai/Kimi-K2.6` to `Qwen/Qwen3-235B-A22B-Instruct-2507`. Multimodal and agentic routes are unchanged (their env-var bypasses still target Kimi-K2.6).

Why

Benchmarked Kimi-K2.6 via the router on the same prompt (default provider, `max_tokens=300`):

| Model | TTFT | Total | t/s | Visible content |
| --- | --- | --- | --- | --- |
| Kimi-K2.6 (current) | 1.2 s | 13 s | 25 | 0 % (all reasoning) |
| Qwen3-235B-A22B-Instruct-2507 | 0.3 s | 0.6 s | 792 | 100 % |

Kimi enters reasoning mode by default on Fireworks and spends the entire token budget on hidden thinking — users see a long "Thinking…" block followed by no answer. Verified end-to-end in hf.co/chat: 15.7 s total request, 100 % spent in reasoning.
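For anyone who wants to re-run this kind of comparison, here is a minimal sketch of how the four columns could be derived from a recorded token stream. It is not the script used for the numbers in this PR: the `Chunk` record, the `summarize` helper, and the choice to measure t/s over the decode window (total time minus TTFT) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One streamed chunk (hypothetical record, not a chat-ui type)."""
    t: float         # seconds elapsed since the request was sent
    tokens: int      # number of tokens carried by this chunk
    reasoning: bool  # True for hidden reasoning, False for visible answer text


def summarize(chunks: list[Chunk]) -> dict[str, float]:
    """Compute TTFT, total time, throughput and visible share for one request."""
    if not chunks:
        raise ValueError("no chunks received")

    total_tokens = sum(c.tokens for c in chunks)
    visible_tokens = sum(c.tokens for c in chunks if not c.reasoning)

    ttft = chunks[0].t    # time to first token of any kind
    total = chunks[-1].t  # time at which the last chunk arrived
    decode_window = total - ttft

    return {
        "ttft_s": ttft,
        "total_s": total,
        # Assumption: throughput is measured after the first token arrives.
        "tokens_per_s": total_tokens / decode_window if decode_window > 0 else 0.0,
        "visible_pct": 100.0 * visible_tokens / total_tokens if total_tokens else 0.0,
    }
```

A request whose chunks are all flagged `reasoning=True` yields `visible_pct == 0.0`, which is the failure mode described above: the token budget is exhausted before any answer text is produced.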

Qwen3-235B is:

- ~24× faster end-to-end on the same prompt
- 100 % visible content (no reasoning consumption)
- 235B MoE flagship-class quality
- Served by 5 live providers (cerebras top end at ~577 t/s; 4 of 5 are tools-capable, so the route stays usable even when the agentic bypass falls through)

`openai/gpt-oss-120b` is added as the first fallback (1.1 s total, ~1 k t/s, 100 % visible). Kimi-K2.6 stays in the chain since the multimodal/agentic env-var bypasses still target it.
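The resulting chain can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual route file or router code: the field names `primary_model` and `fallback_models`, the `is_available` callback, and the placement of Kimi-K2.6 after gpt-oss-120b are hypothetical and only illustrate the "primary, then fallbacks in order" behaviour.

```python
from typing import Callable

# Hypothetical shape of the Omni default route after this change.
DEFAULT_ROUTE = {
    "name": "default",
    "primary_model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    # Assumption: Kimi-K2.6 sits after the new first fallback.
    "fallback_models": ["openai/gpt-oss-120b", "moonshotai/Kimi-K2.6"],
}


def candidates(route: dict) -> list[str]:
    """Return the primary model followed by its fallbacks, in order."""
    return [route["primary_model"], *route["fallback_models"]]


def first_available(route: dict, is_available: Callable[[str], bool]) -> str:
    """Pick the first model in the chain that the caller reports as available."""
    for model in candidates(route):
        if is_available(model):
            return model
    raise RuntimeError(f"no model available for route {route['name']!r}")
```

With every model healthy, `first_available(DEFAULT_ROUTE, lambda m: True)` returns the Qwen model; if the primary is reported unavailable, the next entry in the list is tried.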

Test plan

- Roll out → send a casual chat message via Omni → confirm response starts within ~1 s and produces visible text
- Confirm `routerMetadata` shows `default → Qwen/Qwen3-235B-A22B-Instruct-2507` in the UI
- Image-attached chat still routes to Kimi-K2.6 (multimodal bypass unchanged)
- MCP-tools chat still routes to Kimi-K2.6 (agentic bypass unchanged)
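The routing expectations in this plan can be written down as a small decision function. This is only an illustrative sketch: the env-var names `MULTIMODAL_MODEL` and `TOOLS_MODEL`, the `pick_model` helper, and the precedence of the image check over the tools check are assumptions, not the router's real configuration keys or logic.

```python
KIMI = "moonshotai/Kimi-K2.6"
QWEN = "Qwen/Qwen3-235B-A22B-Instruct-2507"


def pick_model(has_image: bool, has_tools: bool, env: dict[str, str],
               default_model: str) -> str:
    """Apply the env-var bypasses first, then fall back to the default route."""
    # Hypothetical variable names; the real deployment may use different keys.
    multimodal = env.get("MULTIMODAL_MODEL")
    tools = env.get("TOOLS_MODEL")

    if has_image and multimodal:
        return multimodal  # multimodal bypass
    if has_tools and tools:
        return tools       # agentic bypass
    return default_model   # plain chat goes through the default route


# Example environment mirroring the state described in this PR.
ENV = {"MULTIMODAL_MODEL": KIMI, "TOOLS_MODEL": KIMI}
```

Under these assumptions, a plain chat message resolves to the Qwen model while image-attached and MCP-tools chats keep resolving to Kimi-K2.6. If a bypass variable is unset, the request falls through to the default route, which is why the tools-capable providers noted earlier matter.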

Default route was hitting Kimi-K2.6 via Fireworks, which enters reasoning mode and burns the entire token budget on hidden thinking — users see a 13-15s wait with no visible answer. Swap in Qwen/Qwen3-235B-A22B-Instruct-2507 as the primary. On the same prompt: 0.6s end-to-end (~24x faster), ~800 t/s, 100% visible content. Served by 5 live providers (cerebras 577 t/s top-end, 4 of 5 tools-capable). Add openai/gpt-oss-120b as first fallback (1.1s, ~1000 t/s, visible). Keep Kimi-K2.6 in the chain since multimodal/agentic bypasses still target it via env vars.

gary149 wants to merge 1 commit into `main` from `chore/omni-default-qwen3-235b`
