DO NOT MERGE - pin vllm to main commit 9ea3a4015 for FP8 MoE+LoRA fix (vllm#42120) #2849
Draft
JohannesHa wants to merge 2 commits into
Swap the v0.22.0 release wheels for per-commit cu129 wheels from wheels.vllm.ai built at 9ea3a4015b41 (merge of vllm-project/vllm#42120), which fixes FP8 MoE + LoRA output corruption and base-model contamination. This unblocks LoRA targeting FP8 MoE experts under expert parallelism on the GLM-5.1 stack. Revert to a tagged release once the fix lands in one.

vllm: 0.22.0+cu129 -> 0.23.1rc1.dev189+g9ea3a4015.cu129. uv.lock regenerated (this main build also bumps flashinfer, compressed-tensors, starlette and CUDA toolkit deps transitively). Wheel URLs use %2B-encoded '+' as required by wheels.vllm.ai.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
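For reviewers unfamiliar with the %2B requirement: the '+' in a local version such as 0.23.1rc1.dev189+g9ea3a4015.cu129 has to be percent-encoded when it appears in a wheel URL. A minimal sketch of that encoding follows, using only the Python standard library. The helper name and the wheel filename tags (cp38-abi3-linux_x86_64) are assumptions for illustration; the actual filenames and directory layout on wheels.vllm.ai are not shown in this PR.

```python
from urllib.parse import quote, unquote


def encode_wheel_filename(filename: str) -> str:
    """Percent-encode a wheel filename for use as a URL path segment.

    safe="" makes quote() encode every reserved character, so the '+'
    that separates the local version label becomes '%2B'.
    """
    return quote(filename, safe="")


# Version string taken from the commit message above.
VERSION = "0.23.1rc1.dev189+g9ea3a4015.cu129"

# Hypothetical filename: the platform tags are placeholders, not the real ones.
EXAMPLE_FILENAME = f"vllm-{VERSION}-cp38-abi3-linux_x86_64.whl"

encoded = encode_wheel_filename(EXAMPLE_FILENAME)
print(encoded)

# Decoding restores the original filename, so the mapping is lossless.
assert unquote(encoded) == EXAMPLE_FILENAME
```

Only the '+' changes here, since letters, digits, '.', '-' and '_' are left untouched by quote().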
vllm moved get_max_tokens from vllm.entrypoints.utils to vllm.entrypoints.serve.utils.api_utils (same signature). The old module was removed in the 0.23.1rc1 main build pinned for the #42120 FP8 MoE+LoRA fix, which crashed the inference APIServer at import time:

ModuleNotFoundError: No module named 'vllm.entrypoints.utils'

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
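If the import ever needs to work against both the tagged release and the pinned main build, one option is a small fallback import. The sketch below is not necessarily what this PR does (the commit may simply switch the import path); import_first is a hypothetical helper, and the two module paths are the ones named in the commit message above.

```python
import importlib


def import_first(attr, module_paths):
    """Return `attr` from the first importable module in `module_paths`.

    Modules that fail to import (ModuleNotFoundError is a subclass of
    ImportError) or that lack the attribute are skipped; if none match,
    an ImportError listing every attempt is raised.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{path}: no attribute {attr!r}")
    raise ImportError(f"could not locate {attr!r}; tried: " + "; ".join(errors))


# New location first, old location as the fallback for older vllm builds.
GET_MAX_TOKENS_LOCATIONS = (
    "vllm.entrypoints.serve.utils.api_utils",
    "vllm.entrypoints.utils",
)
```

With vllm installed, the call site would be get_max_tokens = import_first("get_max_tokens", GET_MAX_TOKENS_LOCATIONS). Since the pin is meant to be temporary, a plain import of the new path may be the simpler choice.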