[Feature] Add Rebellions (RBLN) NPU evaluation backend #1566
Open
rebel-hwkim wants to merge 5 commits into
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
This PR adds an opt-in evaluation backend for Rebellions NPU hardware via optimum-rbln. It runs the standard VLMEvalKit benchmark suite against a model compiled for RBLN NPUs, in-process, with the same datasets / prompts / scorers as the CUDA path.
The default device is unchanged (`nvidia`); RBLN is reached only with `--device rbln`, so existing GPU users are unaffected. There is no static model registry: any HF model id or compiled directory is auto-dispatched to the right wrapper, so the backend works for arbitrary image-text-to-text models without per-model config entries.
What this adds
- `vlmeval/vlm/rbln/`: 11 model-family wrappers on a shared base (`RBLNVLMBase` / `RBLNChatVLMBase`): Qwen2-VL/Qwen2.5-VL, Qwen3-VL, Cosmos-Reason1, LLaVA-1.5, LLaVA-Next, Idefics3, Gemma3, Pixtral, PaliGemma, PaliGemma2, BLIP-2. Each wrapper reuses the same prompt construction as its upstream CUDA counterpart (shared `*PromptMixin` where one exists, otherwise a verbatim port) so RBLN and GPU feed the model byte-identical prompts.
- `run.py`: three additive CLI flags:
  - `--device {nvidia,rbln}` (default `nvidia`): opt-in backend selector.
  - `--limit N`: truncate each dataset to the first N rows (smoke testing).
  - `--rbln-kwargs '<json>'`: pass compile/runtime `rbln_config` overrides.
- `vlmeval/vlm/rbln/auto.py`: with `--device rbln`, the wrapper class is selected from the model's `config.json` `architectures`, and per-family compile defaults (`visual.max_seq_lens`, `tensor_parallel_size`, …, mirroring rbln-model-zoo's `compile.py`) are seeded automatically.
- `requirements/rbln.txt`, `docs/en/Quickstart.md`: optional deps + usage.
- `vlmeval/config.py` is essentially untouched (whitespace-only); RBLN classes are re-exported from `vlmeval/vlm/__init__.py`.

Design notes
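To make the auto-dispatch idea concrete, here is a minimal sketch of selecting a wrapper from the `architectures` field of `config.json` and seeding per-family defaults that user overrides can replace. The table contents, the wrapper names (`Qwen2VLWrapperSketch`, `LlavaWrapperSketch`), the default values, and the `resolve_wrapper` helper are all hypothetical illustrations and are not taken from `vlmeval/vlm/rbln/auto.py`.

```python
import json
from pathlib import Path

# Hypothetical dispatch table: (HF architecture name, wrapper name, compile defaults).
# Wrapper names and default values are placeholders, not the PR's actual table.
DISPATCH_TABLE = [
    ("Qwen2_5_VLForConditionalGeneration", "Qwen2VLWrapperSketch", {"tensor_parallel_size": 8}),
    ("LlavaForConditionalGeneration", "LlavaWrapperSketch", {"tensor_parallel_size": 4}),
]


def resolve_wrapper(model_dir, overrides=None):
    """Return (wrapper_name, rbln_config) for the model stored in model_dir."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    architectures = config.get("architectures") or []
    # The table is scanned in order, so earlier entries take precedence.
    for arch, wrapper_name, defaults in DISPATCH_TABLE:
        if arch in architectures:
            # User overrides win over the seeded per-family defaults (shallow merge).
            return wrapper_name, {**defaults, **(overrides or {})}
    raise ValueError(f"no wrapper registered for architectures {architectures!r}")
```

The merge here is shallow; nested keys such as `visual.max_seq_lens` would need a recursive merge, which this sketch leaves out.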
- `optimum-rbln` import is lazy (inside wrapper methods), and the auto-dispatch loop is gated behind `args.device == 'rbln'`. Importing `vlmeval` or running `--device nvidia` never imports the RBLN runtime; a test asserts that `optimum.rbln` stays out of `sys.modules`.
- Loads the `*.rbln` artifact if present, else compiles and caches it under `./<basename(model_path)>/` (rbln-model-zoo convention). Compile defaults come from the auto-dispatch table, so a first compile works without any manual `rbln_config`.
- `vllm-rbln` serving: `vllm-rbln` exposes the same OpenAI-compatible server as `vllm serve`, so a served model is evaluated with VLMEvalKit's existing `--base-url`.

How to use
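As a rough illustration of how the three flags combine, the sketch below assembles a `run.py` invocation in Python. The `build_command` helper, the model id `Qwen/Qwen2.5-VL-7B-Instruct`, the row limit, and the top-level `tensor_parallel_size` override key are assumptions chosen for the example; `--data` and `--model` are VLMEvalKit's existing dataset and model flags.

```python
import json
import shlex


def build_command(model, dataset, limit=None, rbln_kwargs=None):
    """Assemble an argv list for an RBLN evaluation run (illustrative helper)."""
    argv = ["python", "run.py", "--data", dataset, "--model", model, "--device", "rbln"]
    if limit is not None:
        # --limit N truncates each dataset to its first N rows (smoke testing).
        argv += ["--limit", str(limit)]
    if rbln_kwargs:
        # --rbln-kwargs takes a JSON string of rbln_config overrides.
        argv += ["--rbln-kwargs", json.dumps(rbln_kwargs)]
    return argv


cmd = build_command(
    "Qwen/Qwen2.5-VL-7B-Instruct",  # assumed HF model id for the example
    "ChartQA_TEST",
    limit=20,
    rbln_kwargs={"tensor_parallel_size": 4},  # assumed override key placement
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Omitting `--device rbln` leaves the default `nvidia` path in effect, per the summary above.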
Testing & verification
- Offline unit suite (`tests/rbln/`, no NPU / no API key, CPU-only, CI-able): 32 test functions across 6 files (≈47 parametrized cases) covering prompt parity per family, auto-dispatch & ordering, `rbln_config` merge + cached-load guard, lazy-import invariant, mock-model harness, and a `--mode eval` rule-based scorer lane. All green under `pytest tests/rbln/`; clean under the repo's pre-commit (flake8 / isort / yapf).
- Per-family NPU inference smoke (`scripts/rbln_smoke.sh`): all 11 families compiled and run on real RBLN hardware (tp=1/4/8), each producing non-empty predictions.
- Judge-free score parity (`tests/rbln/E1_RESULTS.md`): Qwen2.5-VL-7B on ChartQA_TEST (2500, relaxed_accuracy, no LLM judge). Δ is within a ±3–5 percentage-point NPU/GPU tolerance band, so RBLN reproduces the GPU/official ChartQA score.
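The lazy-import invariant listed in the offline suite can be checked along the following lines. This is a sketch only: the `LazyBackend` class and `is_loaded` helper are hypothetical, and the stdlib module `colorsys` stands in for `optimum.rbln` so the example runs without the RBLN runtime installed.

```python
import importlib
import sys


class LazyBackend:
    """Stand-in for a wrapper that defers its heavy import until first use."""

    def __init__(self, module_name):
        self.module_name = module_name
        self._module = None

    def load(self):
        # The import happens here, not at construction or module import time.
        if self._module is None:
            self._module = importlib.import_module(self.module_name)
        return self._module


def is_loaded(prefix):
    """True if the module or any of its submodules is present in sys.modules."""
    return any(name == prefix or name.startswith(prefix + ".") for name in sys.modules)


# colorsys is a placeholder for optimum.rbln; drop it first so the check is deterministic.
sys.modules.pop("colorsys", None)
backend = LazyBackend("colorsys")
assert not is_loaded("colorsys")  # constructing the wrapper imports nothing
backend.load()
assert is_loaded("colorsys")  # the import happens only on first use
```

In the PR's setting the same check would be made against `optimum.rbln` after importing `vlmeval` or running the `nvidia` path.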
Verification gaps (honest scope)
`OPENAI_API_KEY` / `--judge-base-url`). Per-family inference (not scored) is verified for all 11 families.

Notes
`main`; 5 logical commits, diff is RBLN-only (new `vlmeval/vlm/rbln/`, `tests/rbln/`, `scripts/rbln_smoke.sh`, `requirements/rbln.txt`, plus additive edits to `run.py`, `vlmeval/vlm/__init__.py`, `docs/en/Quickstart.md`; `config.py` whitespace-only).
`--device` / `--limit` CLI first, then the backend) if preferred.