
DO NOT MERGE - pin vllm to main commit 9ea3a4015 for FP8 MoE+LoRA fix (vllm#42120) #2849

Draft
JohannesHa wants to merge 2 commits into main from pin-vllm-moe-lora-fp8



@JohannesHa


Swap the v0.22.0 release wheels for per-commit cu129 wheels from wheels.vllm.ai built at 9ea3a4015b41 (merge of vllm-project/vllm#42120), which fixes FP8 MoE + LoRA output corruption and base-model contamination. This unblocks LoRA targeting FP8 MoE experts under expert parallelism on the GLM-5.1 stack. Revert to a tagged release once the fix lands in one.

vllm: 0.22.0+cu129 -> 0.23.1rc1.dev189+g9ea3a4015.cu129. uv.lock regenerated (this main build also bumps flashinfer, compressed-tensors, starlette and CUDA toolkit deps transitively). Wheel URLs use %2B-encoded '+' as required by wheels.vllm.ai.
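The '+' encoding mentioned above can be checked with a small sketch. It percent-encodes the local-version separator in a wheel filename before that filename is placed in a URL path. The helper name `encode_wheel_filename` and the `-py3-none-any.whl` suffix are hypothetical and only illustrate the idea; the real platform tags and URL layout on wheels.vllm.ai are not given here.

```python
from urllib.parse import quote, unquote


def encode_wheel_filename(filename: str) -> str:
    """Percent-encode a wheel filename for use as one URL path segment.

    With safe="" every reserved character is escaped, so the '+' that
    separates the local version label becomes '%2B'. Letters, digits,
    '.', '-', '_' and '~' are left untouched.
    """
    return quote(filename, safe="")


# Version string taken from the PR description.
version = "0.23.1rc1.dev189+g9ea3a4015.cu129"

# Hypothetical filename: the tag suffix is an assumption for illustration.
filename = f"vllm-{version}-py3-none-any.whl"

encoded = encode_wheel_filename(filename)
print(encoded)

# Decoding restores the original filename, so the encoding is lossless.
assert unquote(encoded) == filename
```

Only the '+' changes in this filename, which matches the description: the rest of the version string is already URL-safe.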

JohannesHa and others added 2 commits June 21, 2026 14:27
Swap the v0.22.0 release wheels for per-commit cu129 wheels from
wheels.vllm.ai built at 9ea3a4015b41 (merge of vllm-project/vllm#42120),
which fixes FP8 MoE + LoRA output corruption and base-model contamination.
This unblocks LoRA targeting FP8 MoE experts under expert parallelism on
the GLM-5.1 stack. Revert to a tagged release once the fix lands in one.

vllm: 0.22.0+cu129 -> 0.23.1rc1.dev189+g9ea3a4015.cu129. uv.lock regenerated
(this main build also bumps flashinfer, compressed-tensors, starlette and
CUDA toolkit deps transitively). Wheel URLs use %2B-encoded '+' as required
by wheels.vllm.ai.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
vllm moved get_max_tokens from vllm.entrypoints.utils to
vllm.entrypoints.serve.utils.api_utils (same signature). The old module
no longer exists in the 0.23.1rc1 main build pinned for the #42120 FP8
MoE+LoRA fix, so the inference APIServer crashed at import time:
  ModuleNotFoundError: No module named 'vllm.entrypoints.utils'

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
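
One way to tolerate a moved symbol like this across versions is to try the new module path first and fall back to the old one. The sketch below is not the change made in this PR; `import_first` is a hypothetical helper, and only the two module paths and the name `get_max_tokens` come from the commit message. It assumes the function keeps the same signature in both locations, as the commit states.

```python
import importlib
from typing import Any, Sequence


def import_first(attr: str, module_paths: Sequence[str]) -> Any:
    """Return `attr` from the first module in `module_paths` that provides it.

    Modules that cannot be imported (ImportError, which includes
    ModuleNotFoundError) or that lack the attribute are skipped. If no
    candidate works, an ImportError listing every failure is raised.
    """
    failures = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            failures.append(f"{path}: {exc}")
    raise ImportError(
        f"could not import {attr!r} from any of: " + "; ".join(failures)
    )


# Candidate locations from the commit message, newest first.
GET_MAX_TOKENS_PATHS = (
    "vllm.entrypoints.serve.utils.api_utils",
    "vllm.entrypoints.utils",
)

# Usage, in an environment where vllm is installed:
#     get_max_tokens = import_first("get_max_tokens", GET_MAX_TOKENS_PATHS)
```

Since this branch pins a single vllm build, a direct import from the new path is enough here; a fallback like this only pays off if the code must run against both the old release and the pinned main build.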