[Bench][Manifest] Mark norm MoE and argreduce benchmarks driven#1589
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors MoEExpertsBenchmark to accept an operator parameter and cache roofline evaluation results for FLOPs and memory calculations. Additionally, it updates various operator manifests in moe.yaml, normalization.yaml, and reduction.yaml to enable bench_manifest_driven: true. The feedback suggests initializing the _roofline_cache attribute inside the __init__ constructor rather than as a class attribute to align with idiomatic Python practices and prevent shared state issues.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
RMLYC
left a comment
There was a problem hiding this comment.
I reviewed this against the stated benchmark-manifest migration plan. The normalization flips look consistent with the existing manifest-driven benchmark files, and I do not see obvious over-engineering in the small custom MoE benchmark adapter.
I found two issues that should be addressed before this is merged:
- The argreduce benchmark still skips known large-N manifest workloads at runtime, so marking Argmax/Argmin as
bench_manifest_driven: trueis premature under the plan gate. - The fused MoE experts benchmark derives pytest ids from manifest dtype entries, but the test still hardcodes bf16. That is fine for current bf16-only workloads, but after this flag is true it will silently mis-benchmark any future fp16 workload.
Validation I checked locally on the PR worktree:
python scripts/validate_manifest.py --levels schema,shape,dtype,benchpasses.PYTHONPATH=$PWD python scripts/manifest_stats.py --format textreportsbench_manifest_driven 124/144.git diff --check 283f41476bb16aa40c07f1fe813f80e2bbcdd09e cccada04a0e0250b7f3058e4ed841b27462c5bb2passes.
I could not rerun the PR ruff or pytest collect-only commands in this local environment because the active Python env is missing ruff and pytest.
Ibuki-wind
left a comment
There was a problem hiding this comment.
Overall
One manifest-driven flip is premature because a manifest workload set still has expected-failure benchmark cases.
|
Thanks for the review. Addressed in Changes made:
Validated in the nightly docker environment ( Results:
|
Ibuki-wind
left a comment
There was a problem hiding this comment.
Overall
Approval is blocked on PR metadata, not the code diff: the body still says argmax/argmin are marked manifest-driven and reports 108/126 -> 124/126, but the current head leaves those entries and FusedAddRMSNormFwdOp unflipped and manifest_stats.py reports bench_manifest_driven 121/144; update the PR body to describe the final merged state and edit the latest author reply to outcome-only form such as Done in e12a854..
Summary
op.eval_roofline()instead of duplicated formulasThis moves implemented benchmark manifest coverage from
108/126to124/126. The two remaining implemented gaps areConv1dFwdOpandConv1dBiasFwdOp, which are outside this PR scope.Closes #1561
Closes #1562
Closes #1563
Validation
python -m ruff check benchmarks/ops/bench_fused_moe_experts.pypython scripts/validate_manifest.py --levels schema,shape,dtype,benchPYTHONPATH=$PWD python scripts/manifest_stats.py --format textpython -m pytest --collect-only -q benchmarks/ops/bench_ada_layer_norm.py benchmarks/ops/bench_batch_norm.py benchmarks/ops/bench_fused_add_layer_norm.py benchmarks/ops/bench_fused_add_rms_norm.py benchmarks/ops/bench_group_norm.py benchmarks/ops/bench_instance_norm.py benchmarks/ops/bench_layer_norm.py benchmarks/ops/bench_rms_norm.py benchmarks/ops/bench_fused_moe_experts.py benchmarks/ops/bench_moe_grouped_gemm_nopad.py benchmarks/ops/bench_argreduce.py