[Feature] Add Rebellions (RBLN) NPU evaluation backend by rebel-hwkim · Pull Request #1566 · open-compass/VLMEvalKit

rebel-hwkim · 2026-06-02T13:18:35Z

Summary

This PR adds an opt-in evaluation backend for Rebellions NPU hardware via optimum-rbln. It runs the standard VLMEvalKit benchmark suite against a model compiled for RBLN NPUs, in-process, with the same datasets / prompts / scorers as the CUDA path.

The default device is unchanged (nvidia); RBLN is reached only with --device rbln, so existing GPU users are unaffected. There is no static model registry: any HF model id or compiled directory is auto-dispatched to the right wrapper, so the backend works for arbitrary image-text-to-text models without per-model config entries.

What this adds

vlmeval/vlm/rbln/ — 11 model-family wrappers on a shared base (RBLNVLMBase / RBLNChatVLMBase):
Qwen2-VL/Qwen2.5-VL, Qwen3-VL, Cosmos-Reason1, LLaVA-1.5, LLaVA-Next, Idefics3, Gemma3, Pixtral, PaliGemma, PaliGemma2, BLIP-2. Each wrapper reuses the same prompt construction as its upstream CUDA counterpart (shared *PromptMixin where one exists, otherwise a verbatim port) so RBLN and GPU feed the model byte-identical prompts.
run.py — three additive CLI flags:
- --device {nvidia,rbln} (default nvidia): opt-in backend selector.
- --limit N: truncate each dataset to the first N rows (smoke testing).
- --rbln-kwargs '<json>': pass compile/runtime rbln_config overrides.
Auto-dispatch (vlmeval/vlm/rbln/auto.py): with --device rbln, the wrapper class is selected from the model's config.json architectures, and per-family compile defaults (visual.max_seq_lens, tensor_parallel_size, …, mirroring rbln-model-zoo's compile.py) are seeded automatically.
requirements/rbln.txt, docs/en/Quickstart.md — optional deps + usage.

vlmeval/config.py has whitespace-only changes; RBLN classes are re-exported from vlmeval/vlm/__init__.py.
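
The auto-dispatch and override merge described above can be sketched roughly as follows. This is a minimal illustration, not the code in vlmeval/vlm/rbln/auto.py: the table name DISPATCH, the wrapper names, the default values, and the function resolve_wrapper are all hypothetical, and the sketch only handles a local directory that already contains config.json (resolving an HF model id would need a download step first).

```python
import json
from pathlib import Path

# Hypothetical dispatch table: architecture string -> (wrapper name, compile defaults).
# Wrapper names and default values are placeholders, not the PR's real table.
DISPATCH = {
    "Qwen2_5_VLForConditionalGeneration": (
        "RBLNQwen2_5_VL",
        {"tensor_parallel_size": 8, "visual": {"max_seq_lens": 6400}},
    ),
    "LlavaForConditionalGeneration": ("RBLNLlava", {"tensor_parallel_size": 4}),
}


def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_wrapper(model_dir, rbln_kwargs_json="{}"):
    """Pick a wrapper from config.json 'architectures' and merge user overrides."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    overrides = json.loads(rbln_kwargs_json)
    for arch in config.get("architectures", []):
        if arch in DISPATCH:
            wrapper_name, defaults = DISPATCH[arch]
            return wrapper_name, deep_merge(defaults, overrides)
    raise ValueError(f"no RBLN wrapper for {config.get('architectures')}")
```

In this sketch the JSON string passed as rbln_kwargs_json plays the role of --rbln-kwargs: user-supplied keys win over the seeded defaults, and nested keys such as visual.max_seq_lens are merged rather than replaced wholesale.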

Design notes

Zero impact on the default path. Every optimum-rbln import is lazy (inside wrapper methods), and the auto-dispatch loop is gated behind args.device == 'rbln'. Importing vlmeval or running --device nvidia never imports the RBLN runtime; a test asserts this by checking that optimum.rbln stays out of sys.modules.
Compile-or-load. The wrapper loads a cached *.rbln artifact if present, else compiles and caches it under ./<basename(model_path)>/ (rbln-model-zoo convention). Compile defaults come from the auto-dispatch table, so a first compile works without any manual rbln_config.
vllm-rbln serving. vllm-rbln exposes the same OpenAI-compatible server as vllm serve, so a served model is evaluated with VLMEvalKit's existing --base-url.
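
The compile-or-load behaviour from the design notes can be illustrated with a short sketch. It assumes only what the notes state (a cached *.rbln artifact under ./<basename(model_path)>/ is reused, otherwise the model is compiled and cached there); the function name compile_or_load and the compile_fn callback are hypothetical stand-ins for the real wrapper and optimum-rbln calls.

```python
from pathlib import Path
from typing import Callable, Tuple


def compile_or_load(
    model_path: str,
    compile_fn: Callable[[str, Path], None],
    cache_root: Path = Path("."),
) -> Tuple[Path, bool]:
    """Return (artifact_dir, compiled_now).

    The cache directory is named after the last path component of model_path,
    so "Qwen/Qwen2.5-VL-7B-Instruct" maps to <cache_root>/Qwen2.5-VL-7B-Instruct.
    """
    cache_dir = cache_root / Path(model_path.rstrip("/")).name
    # Reuse the cache when at least one compiled artifact is already present.
    if cache_dir.is_dir() and any(cache_dir.glob("*.rbln")):
        return cache_dir, False
    cache_dir.mkdir(parents=True, exist_ok=True)
    # compile_fn is a placeholder: it must write its *.rbln output into cache_dir.
    compile_fn(model_path, cache_dir)
    return cache_dir, True
```

Because the cache directory is derived from the basename, passing the compiled directory itself (for example ./Qwen2.5-VL-7B-Instruct) resolves to the same location and takes the load branch.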

How to use

# install (RBLN + PyTorch CPU indexes, not PyPI)
pip install \
  --extra-index-url https://pypi.rbln.ai/simple \
  --extra-index-url https://download.pytorch.org/whl/cpu \
  -r requirements/rbln.txt

# in-process eval on NPU — pass an HF id or a compiled directory
python run.py --device rbln --model Qwen/Qwen2.5-VL-7B-Instruct --data MMBench_DEV_EN
python run.py --device rbln --model ./Qwen2.5-VL-7B-Instruct       --data ChartQA_TEST

Testing & verification

Offline unit suite (tests/rbln/, no NPU / no API key — CPU-only, CI-able): 32 test functions across 6 files (≈47 parametrized cases) — prompt parity per family, auto-dispatch & ordering, rbln_config merge + cached-load guard, lazy-import invariant, mock-model harness, and a --mode eval rule-based scorer lane. All green under pytest tests/rbln/; clean under the repo's pre-commit (flake8 / isort / yapf).
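
The lazy-import invariant mentioned above can be checked along these lines. This is a sketch of the pattern only, not the test in tests/rbln/: the class LazyWrapper is hypothetical, and the stdlib module colorsys stands in for optimum.rbln so the example runs without the RBLN runtime installed.

```python
import importlib
import sys

# Assumption: colorsys is a stand-in for the heavy optimum.rbln dependency.
RUNTIME_MODULE = "colorsys"


class LazyWrapper:
    """Defers importing the runtime module until the first generate() call."""

    def __init__(self):
        self._runtime = None

    def generate(self):
        if self._runtime is None:
            # The import happens inside the method, not at module import time.
            self._runtime = importlib.import_module(RUNTIME_MODULE)
        return self._runtime.__name__


def runtime_loaded():
    """True once the runtime module has been imported in this process."""
    return RUNTIME_MODULE in sys.modules
```

A test in this style constructs the wrapper, asserts that runtime_loaded() is still false, and only expects it to become true after generate() is called.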

Per-family NPU inference smoke (scripts/rbln_smoke.sh): all 11 families compiled and run on real RBLN hardware (tp=1/4/8), each producing non-empty predictions.

Judge-free score parity (tests/rbln/E1_RESULTS.md): Qwen2.5-VL-7B on ChartQA_TEST (2500, relaxed_accuracy, no LLM judge):

| Setup | Overall | test_human | test_augmented | infer_fail |
|---|---|---|---|---|
| ATOM-Max FP16 (RBLN NPU) | 86.56 | 78.32 | 94.80 | 0% (0/2500) |
| Reference (Qwen2.5-VL report / A100 FP16) | 87.3–87.84 | 80.72 | 94.96 | n/a |
| Δ | −0.7 to −1.3 %p | −2.4 | −0.16 | n/a |

Δ is within a ±3–5 %p NPU/GPU tolerance band, so RBLN reproduces the GPU/official ChartQA score.
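
The Δ row can be reproduced from the two score rows with a few lines of arithmetic. The scores below are copied from the table; the function names and the choice of 3.0 (the stricter edge of the stated ±3–5 %p band) as the threshold are assumptions made for this illustration.

```python
# Scores in percent, copied from the ChartQA_TEST table above.
RBLN = {"overall": 86.56, "test_human": 78.32, "test_augmented": 94.80}
# Reference values as (low, high); single-valued splits repeat the number.
REFERENCE = {
    "overall": (87.3, 87.84),
    "test_human": (80.72, 80.72),
    "test_augmented": (94.96, 94.96),
}
TOLERANCE_PP = 3.0  # assumption: stricter edge of the stated tolerance band


def deltas(rbln, reference):
    """Per-split (RBLN - reference) differences in percentage points."""
    result = {}
    for split, score in rbln.items():
        low, high = reference[split]
        result[split] = (round(score - low, 2), round(score - high, 2))
    return result


def within_band(delta_map, tolerance):
    """True if every difference is within +/- tolerance percentage points."""
    return all(abs(d) <= tolerance for pair in delta_map.values() for d in pair)


if __name__ == "__main__":
    d = deltas(RBLN, REFERENCE)
    print(d)
    print("within band:", within_band(d, TOLERANCE_PP))
```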

Verification gaps (honest scope)

Scored accuracy parity is verified for one model × one judge-free benchmark (above). Judge-based benchmarks (MMVet/MMMU/MMBench/…) and scored parity for the other 10 families are not yet measured — those need an LLM judge (OPENAI_API_KEY / --judge-base-url). Per-family inference (not scored) is verified for all 11 families.
No CI workflow is added for the RBLN tests; the offline suite is CPU-runnable and can be wired into CI on request (NPU smoke needs hardware).

Notes

Branch is based on the latest main; 5 logical commits, diff is RBLN-only (new vlmeval/vlm/rbln/, tests/rbln/, scripts/rbln_smoke.sh, requirements/rbln.txt, plus additive edits to run.py, vlmeval/vlm/__init__.py, docs/en/Quickstart.md; config.py whitespace-only).
Happy to split further (e.g. the generic --device/--limit CLI first, then the backend) if preferred.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rebel-hwkim and others added 5 commits June 2, 2026 04:42

[RBLN] Add NPU backend wrappers (base + 11 model families)

54f8a7d

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[RBLN] Add --device/--limit/--rbln-kwargs CLI and auto-dispatch

eab46d2

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[RBLN] Add offline test suite and NPU smoke script

2997113

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[RBLN] Add optional dependencies and Quickstart documentation

a12dda9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

[RBLN] Add judge-free score-parity evidence (ChartQA, Qwen2.5-VL-7B)

94717ed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rebel-hwkim changed the title from "Rbln backend" to "[Feature] Add Rebellions (RBLN) NPU evaluation backend" on Jun 2, 2026

rebel-hwkim wants to merge 5 commits into open-compass:main from rebellions-sw:rbln-backend
