[Bench][Manifest] Mark norm MoE and argreduce benchmarks driven #1589

Merged: lcy-seso merged 3 commits into tile-ai:main from superAngGao:bench/issue-1561-1563/manifest-coverage on Jun 18, 2026

Conversation

@superAngGao (Collaborator)

Summary

  • mark normalization benchmark entries as manifest-driven
  • mark the remaining MoE/grouped-GEMM benchmark entries as manifest-driven
  • mark argmax/argmin benchmark entries as manifest-driven
  • route the 3WG fused MoE experts benchmark roofline through op.eval_roofline() instead of duplicated formulas

This moves implemented benchmark manifest coverage from 108/126 to 124/126. The two remaining implemented gaps are Conv1dFwdOp and Conv1dBiasFwdOp, which are outside this PR's scope.
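
The roofline change in the last summary bullet can be pictured with a minimal sketch. Only the name op.eval_roofline() comes from this PR; FakeOp, DelegatingBenchmark, calculate_flops, calculate_memory, and the (flops, bytes) return shape are assumptions made for illustration, not the actual TileOps code.

```python
class FakeOp:
    """Hypothetical stand-in for an op; the return shape is an assumption."""

    def __init__(self, m, n, k):
        self.m, self.n, self.k = m, n, k
        self.calls = 0  # counts how often the roofline is evaluated

    def eval_roofline(self):
        self.calls += 1
        flops = 2 * self.m * self.n * self.k
        # Assumes 2 bytes per element (e.g. bf16) for A, B, and C.
        bytes_moved = 2 * (self.m * self.k + self.k * self.n + self.m * self.n)
        return flops, bytes_moved


class DelegatingBenchmark:
    """Asks the op for its roofline instead of repeating the formulas."""

    def __init__(self, op):
        self.op = op
        self._roofline_cache = None  # per-instance cache

    def _roofline(self):
        if self._roofline_cache is None:
            self._roofline_cache = self.op.eval_roofline()
        return self._roofline_cache

    def calculate_flops(self):
        return self._roofline()[0]

    def calculate_memory(self):
        return self._roofline()[1]


op = FakeOp(4, 8, 16)
bench = DelegatingBenchmark(op)
```

With this shape, the benchmark and the op cannot drift apart, because the formula lives in one place and both metrics read from a single evaluation.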

Closes #1561
Closes #1562
Closes #1563

Validation

  • python -m ruff check benchmarks/ops/bench_fused_moe_experts.py
  • python scripts/validate_manifest.py --levels schema,shape,dtype,bench
  • PYTHONPATH=$PWD python scripts/manifest_stats.py --format text
  • python -m pytest --collect-only -q benchmarks/ops/bench_ada_layer_norm.py benchmarks/ops/bench_batch_norm.py benchmarks/ops/bench_fused_add_layer_norm.py benchmarks/ops/bench_fused_add_rms_norm.py benchmarks/ops/bench_group_norm.py benchmarks/ops/bench_instance_norm.py benchmarks/ops/bench_layer_norm.py benchmarks/ops/bench_rms_norm.py benchmarks/ops/bench_fused_moe_experts.py benchmarks/ops/bench_moe_grouped_gemm_nopad.py benchmarks/ops/bench_argreduce.py

@superAngGao requested a review from a team on June 16, 2026, 07:33
@github-actions (Bot) added the bench (Benchmark updates) label on Jun 16, 2026

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors MoEExpertsBenchmark to accept an operator parameter and cache roofline evaluation results for FLOPs and memory calculations. Additionally, it updates various operator manifests in moe.yaml, normalization.yaml, and reduction.yaml to enable bench_manifest_driven: true. The feedback suggests initializing the _roofline_cache attribute inside the __init__ constructor rather than as a class attribute to align with idiomatic Python practices and prevent shared state issues.
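
The shared-state concern behind that suggestion can be shown with a minimal sketch. It assumes the cache is a mutable dict; the attribute's real type and the class bodies are not shown in this thread, so both classes below are hypothetical.

```python
class SharedCacheBenchmark:
    # Class-level dict: a single object shared by every instance.
    _roofline_cache = {}

    def __init__(self, name):
        self.name = name

    def roofline(self):
        if "value" not in self._roofline_cache:
            self._roofline_cache["value"] = f"roofline({self.name})"
        return self._roofline_cache["value"]


class PerInstanceCacheBenchmark:
    def __init__(self, name):
        self.name = name
        # Created in __init__, so each instance owns its own cache.
        self._roofline_cache = {}

    def roofline(self):
        if "value" not in self._roofline_cache:
            self._roofline_cache["value"] = f"roofline({self.name})"
        return self._roofline_cache["value"]


# The second shared-cache instance sees the first instance's result.
shared_results = (
    SharedCacheBenchmark("op_a").roofline(),
    SharedCacheBenchmark("op_b").roofline(),
)
# Per-instance caches keep the two results separate.
isolated_results = (
    PerInstanceCacheBenchmark("op_a").roofline(),
    PerInstanceCacheBenchmark("op_b").roofline(),
)
```

If the class attribute were an immutable default such as None that each instance rebinds, no state would leak, which is why the suggestion is framed as idiom and prevention rather than a confirmed bug.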

Comment thread benchmarks/ops/bench_fused_moe_experts.py Outdated

@Gabbering (Contributor) left a comment

goose skimmed cccada0: nothing to honk about.

@superAngGao requested a review from RMLYC on June 16, 2026, 09:33

@RMLYC (Collaborator) left a comment

I reviewed this against the stated benchmark-manifest migration plan. The normalization flips look consistent with the existing manifest-driven benchmark files, and I do not see obvious over-engineering in the small custom MoE benchmark adapter.

I found two issues that should be addressed before this is merged:

  1. The argreduce benchmark still skips known large-N manifest workloads at runtime, so marking Argmax/Argmin as bench_manifest_driven: true is premature under the plan gate.
  2. The fused MoE experts benchmark derives pytest ids from manifest dtype entries, but the test still hardcodes bf16. That is fine for current bf16-only workloads, but after this flag is true it will silently mis-benchmark any future fp16 workload.
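
One way to avoid the mismatch in item 2 is to carry the dtype through the parametrized case itself, so the id and the tensor dtype cannot disagree. The sketch below uses only the standard library; the workload records, field names, and expand_cases helper are hypothetical, since the manifest schema is not shown in this thread.

```python
# Hypothetical workload records; the real manifest schema may differ.
WORKLOADS = [
    {"label": "small", "dtypes": ["bfloat16"]},
    {"label": "large", "dtypes": ["bfloat16", "float16"]},
]

# Dtypes this (hypothetical) benchmark knows how to run.
SUPPORTED_DTYPES = {"bfloat16", "float16"}


def expand_cases(workloads):
    """Return (test_id, label, dtype) triples, one per workload/dtype pair."""
    cases = []
    for workload in workloads:
        for dtype in workload["dtypes"]:
            if dtype not in SUPPORTED_DTYPES:
                # Fail loudly instead of silently falling back to bf16.
                raise ValueError(f"unsupported dtype: {dtype}")
            cases.append((f"{workload['label']}-{dtype}", workload["label"], dtype))
    return cases


cases = expand_cases(WORKLOADS)
```

In a pytest benchmark, triples like these could feed pytest.mark.parametrize, with the first element used as the id and the third used to build the input tensors.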

Validation I checked locally on the PR worktree:

  • python scripts/validate_manifest.py --levels schema,shape,dtype,bench passes.
  • PYTHONPATH=$PWD python scripts/manifest_stats.py --format text reports bench_manifest_driven 124/144.
  • git diff --check 283f41476bb16aa40c07f1fe813f80e2bbcdd09e cccada04a0e0250b7f3058e4ed841b27462c5bb2 passes.

I could not rerun the PR ruff or pytest collect-only commands in this local environment because the active Python env is missing ruff and pytest.

Comment thread tileops/manifest/reduction.yaml Outdated
Comment thread tileops/manifest/reduction.yaml Outdated
Comment thread benchmarks/ops/bench_fused_moe_experts.py Outdated

@Gabbering (Contributor) left a comment

goose skimmed 7cbc3ac: nothing to honk about.

@superAngGao requested a review from RMLYC on June 16, 2026, 11:25

@Ibuki-wind (Contributor) left a comment

Overall

One manifest-driven flip is premature because a manifest workload set still has expected-failure benchmark cases.

Comment thread tileops/manifest/normalization.yaml Outdated
@superAngGao (Collaborator, Author)

Thanks for the review. Addressed in e12a854.

Changes made:

  • Removed bench_manifest_driven: true from FusedAddRMSNormFwdOp while bench_fused_add_rms_norm.py still xfails the llama-3.1-405b-* manifest workloads.
  • Moved FusedAddRMSNormBenchmark._roofline_cache into instance initialization for consistency with the MoE benchmark adapter cleanup.
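
The rule the reviewers apply in this thread appears to be that an op should only carry bench_manifest_driven: true once none of its manifest workloads is skipped or expected to fail. A minimal sketch of such a check follows; the function name, status strings, and workload names are hypothetical and are not taken from scripts/validate_manifest.py.

```python
# Statuses that mean a manifest workload is not actually benchmarked.
BLOCKING_STATUSES = {"skip", "xfail"}


def manifest_driven_gate(workload_status):
    """Return (eligible, blocking_workloads) for one op.

    workload_status maps a manifest workload name to the outcome its
    benchmark case is expected to have: "run", "skip", or "xfail".
    """
    blocking = sorted(
        name
        for name, status in workload_status.items()
        if status in BLOCKING_STATUSES
    )
    return (not blocking, blocking)


# Hypothetical workload names and statuses, for illustration only.
status = {"workload-small": "run", "workload-large": "xfail"}
eligible, blocking = manifest_driven_gate(status)
```

Returning the blocking workload names alongside the boolean makes it easy to report why a flag was withheld.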

Validated in the nightly docker environment (tileops-runner:nightly-tl019-fullstack-no-tileops-ldfix):

python -m ruff check benchmarks/ops/bench_fused_add_rms_norm.py benchmarks/ops/bench_fused_moe_experts.py
python scripts/validate_manifest.py --levels schema,shape,dtype,bench
PYTHONPATH=$PWD python scripts/manifest_stats.py --format text
python -m pytest --collect-only -q benchmarks/ops/bench_fused_add_rms_norm.py benchmarks/ops/bench_fused_moe_experts.py benchmarks/ops/bench_argreduce.py

Results:

  • ruff passed
  • manifest validation passed in advisory mode
  • manifest stats now reports bench_manifest_driven 121/144
  • collect-only: 21 tests collected

@Gabbering (Contributor) left a comment

goose skimmed e12a854: nothing to honk about.

@Ibuki-wind (Contributor) left a comment

Overall

Approval is blocked on PR metadata, not the code diff. The body still says argmax/argmin are marked manifest-driven and reports 108/126 -> 124/126, but the current head leaves those entries and FusedAddRMSNormFwdOp unflipped, and manifest_stats.py reports bench_manifest_driven 121/144. Update the PR body to describe the final merged state, and edit the latest author reply to outcome-only form, such as "Done in e12a854."

@lcy-seso merged commit cdb6b0b into tile-ai:main on Jun 18, 2026
13 checks passed

Labels

bench Benchmark updates

5 participants