
Update benchmark times. #162

Merged: maleadt merged 3 commits into main from tb/benchmarks on Apr 1, 2026


maleadt (Member) commented Apr 1, 2026

Much better performance with all of the recent work.

maleadt and others added 3 commits April 1, 2026 10:17
apply_rewrite! computed `pos` before calling resolve_rhs, but
resolve_rhs can insert new instructions via insert_before! (e.g. negf
in the subf→fma rule), shifting positions. The stale pos then
overwrote the newly-inserted instruction instead of the original root,
producing self-referential IR and leaving the root unrewritten.

Recompute pos after resolve_rhs completes. This fixes FMA fusion for
subf(mulf(x,y), z) → fma(x, y, negf(z)), and more generally every
pattern whose RHS contains nested RCall nodes, all of which were
silently broken.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
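The stale-position bug described in the commit message can be modelled in a few lines. The sketch below is a hypothetical Python simplification, not the Julia code from this PR: an IR block is a list of `(result, opcode, operands)` tuples, and `index_of`, `insert_before`, `resolve_rhs`, `apply_rewrite`, `make_block` and the `%`-prefixed value names are illustrative stand-ins for `apply_rewrite!`, `insert_before!` and `resolve_rhs`. It applies the subf→fma rule (using x*y - z = fma(x, y, -z)) once with the root position computed up front and once with it recomputed after `resolve_rhs`.

```python
# Hypothetical model of an IR block: a list of (result, opcode, operands) tuples.
# A Python stand-in for the rewrite pass, not the cuTile.jl implementation.

def index_of(block, name):
    """Current position of the instruction that defines `name`."""
    return next(i for i, (res, _, _) in enumerate(block) if res == name)


def insert_before(block, pos, instr):
    """Insert `instr` at `pos`, shifting every later instruction right by one."""
    block.insert(pos, instr)


def resolve_rhs(block, root_pos, z):
    """Materialize the nested RHS node negf(z) just before the root."""
    neg = ("%neg", "negf", (z,))
    insert_before(block, root_pos, neg)
    return neg[0]


def apply_rewrite(block, root, recompute_pos):
    """Rewrite root = subf(mulf(x, y), z) into root = fma(x, y, negf(z))."""
    pos = index_of(block, root)              # position computed up front
    _, _, (mul, z) = block[pos]
    _, _, (x, y) = block[index_of(block, mul)]
    neg = resolve_rhs(block, pos, z)         # shifts the root one slot right
    if recompute_pos:
        pos = index_of(block, root)          # the fix: look the root up again
    result = block[pos][0]                   # the slot keeps its SSA name
    block[pos] = (result, "fma", (x, y, neg))
    return block


def make_block():
    return [("%m", "mulf", ("%x", "%y")),
            ("%r", "subf", ("%m", "%z"))]


buggy = apply_rewrite(make_block(), "%r", recompute_pos=False)
fixed = apply_rewrite(make_block(), "%r", recompute_pos=True)
print(buggy)
print(fixed)
```

With `recompute_pos=False` the rewrite lands on the freshly inserted `negf` slot, yielding `%neg = fma(%x, %y, %neg)` (self-referential) while `%r` is still a `subf`. Recomputing the index sends the rewrite to `%r` as intended.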
AntonOresten (Contributor) commented:

Amazing work, Tim!! 🙏

AntonOresten (Contributor) commented Apr 1, 2026

Any reason for the difference in timing of the new Python baseline? In the diff I see matmul going from 50.2 TFLOPS to 43.5 TFLOPS, and batched matmul from 40.0 TFLOPS to 30.9 TFLOPS.

Since the problem sizes were unchanged, did the runs happen under different conditions / hardware, or maybe a regression in tileiras?
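For scale, the relative throughput loss in the two quoted Python baselines can be worked out directly. A quick sketch in plain Python; `relative_drop` is a hypothetical helper and the numbers are the ones quoted in the comment above:

```python
# Old vs. new cuTile Python baseline throughput (TFLOPS), as quoted above.
baseline = {
    "matmul": (50.2, 43.5),
    "batched matmul": (40.0, 30.9),
}


def relative_drop(old, new):
    """Percentage of throughput lost going from `old` to `new`."""
    return (old - new) / old * 100


for name, (old, new) in baseline.items():
    print(f"{name}: {old} -> {new} TFLOPS ({relative_drop(old, new):.1f}% lower)")
```

That is roughly a 13% drop for matmul and about 23% for batched matmul.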

maleadt (Member, Author) commented Apr 1, 2026

> maybe a regression in tileiras?

I suspect this. I upgraded from CTK 13.1 to 13.2, and had already noticed register allocation differences when comparing against cuTile.jl using the tileiras from CTK 13.2, so I guess this now applies to cuTile Python as well.

maleadt merged commit 7d3d638 into main on Apr 1, 2026
9 checks passed
maleadt deleted the tb/benchmarks branch on April 1, 2026 09:27