
Add a LICM pass#165

Merged
maleadt merged 6 commits into main from tb/licm
Apr 6, 2026

Conversation

@maleadt
Member

@maleadt maleadt commented Apr 1, 2026

This was an experiment for #163. The for loops we generate contain a broadcast + reshape coming from an expression of the form something .+ one(T) (oh how I've come to hate 1-based indexing), but hoisting it outside the loop doesn't improve performance, so I'm not sure we want this. cuTile Python does have it, though, and the implementation here is based on that.

Depends on maleadt/IRStructurizer.jl#24

maleadt and others added 5 commits April 6, 2026 11:28
The previous LICM pass hoisted all loop-invariant operations (arithmetic,
broadcasts, view constructors, etc.) — all of which are marked Pure in the
MLIR Tile IR dialect and already hoisted by MLIR's built-in LICM at
optLevel >= 2. Benchmarks confirmed zero performance difference when the
pass was disabled entirely.

The new pass focuses on what MLIR structurally cannot do: hoisting memory
loads out of loops. After token ordering, loads have token dependencies
that anchor them inside loops. By hoisting before token insertion, we
avoid creating unnecessary token carries.

Key changes:
- Run alias_analysis_pass! before licm_pass! (was after)
- Only hoist loads, not pure ops (MLIR handles those)
- Verify alias safety: a load is only hoisted when no store in the loop
  body writes to an overlapping alias set
- Simplified from 200 to 150 lines with clearer structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
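The alias-safety rule described in this commit can be sketched as follows. This is a minimal Python illustration, not the actual Julia implementation: MemOp, aliases, and safe_to_hoist are hypothetical names, and alias sets are modeled as plain frozensets assumed to come from an earlier alias analysis.

```python
from dataclasses import dataclass

@dataclass
class MemOp:
    kind: str            # "load" or "store"
    aliases: frozenset   # alias set assigned by a prior alias analysis

def safe_to_hoist(load: MemOp, loop_body: list) -> bool:
    """A load may be hoisted only if no store in the loop body
    writes to an overlapping alias set."""
    assert load.kind == "load"
    return not any(op.kind == "store" and op.aliases & load.aliases
                   for op in loop_body)
```

A load aliasing only regions that no store in the loop touches passes the check; any overlap between the load's alias set and a store's alias set blocks hoisting.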
The previous LICM only targeted loads and ran before token ordering,
but failed to hoist anything because load dependencies (make_partition_view,
Core.tuple) were always generated inline inside the loop body.

The new approach mirrors cuTile Python's code_motion.py: run after
token_order_pass! and hoist ALL loop-invariant operations based on data
dependencies. Token dependencies naturally prevent unsafe hoisting of
loads that alias with stores — no separate alias analysis needed for LICM.

This correctly hoists loop-invariant loads and their entire dependency
chain (tensor_view → partition_view → load → reshape → broadcast).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite LICM from a 200-line stack-based depth-tracking algorithm to a
simple fixpoint loop using IRStructurizer's is_defined_outside,
move_before!, and operands. Processes innermost loops first (post-order),
repeatedly hoisting ops whose operands are all defined outside the loop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
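The fixpoint algorithm this commit describes can be sketched as follows. A minimal Python illustration over a toy IR, not the actual Julia code: Op, Loop, defined_in, and the pinned flag are hypothetical stand-ins for IRStructurizer's operands, is_defined_outside, and move_before!, and where the real pass moves ops to just before the loop, this sketch simply appends to a destination list.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    operands: list = field(default_factory=list)
    pinned: bool = False   # e.g. the induction variable, or token-carrying ops

@dataclass
class Loop:
    body: list = field(default_factory=list)   # mix of Ops and nested Loops

def defined_in(loop):
    """ids of all ops defined anywhere inside the loop body."""
    ids = set()
    for item in loop.body:
        if isinstance(item, Loop):
            ids |= defined_in(item)
        else:
            ids.add(id(item))
    return ids

def licm(loop, dest):
    """Hoist loop-invariant ops from `loop` into `dest`.

    Post-order: nested loops are processed first, so ops hoisted out of
    an inner loop land in this loop's body and become candidates for the
    fixpoint below.
    """
    for item in list(loop.body):
        if isinstance(item, Loop):
            licm(item, loop.body)
    changed = True
    while changed:   # fixpoint: hoisting one op may unblock its users
        changed = False
        inside = defined_in(loop)
        for item in list(loop.body):
            if isinstance(item, Loop) or item.pinned:
                continue
            # Invariant iff every operand is defined outside the loop.
            if all(id(o) not in inside for o in item.operands):
                loop.body.remove(item)
                dest.append(item)
                changed = True
```

The fixpoint matters for dependency chains like tensor_view → partition_view → load: hoisting the first op makes its users invariant on the next pass.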
@maleadt maleadt marked this pull request as ready for review April 6, 2026 17:49
@maleadt
Member Author

maleadt commented Apr 6, 2026

Even though this doesn't improve performance of any of the examples we have, it's what cuTile Python does, and with some IRStructurizer utilities the implementation is really short. So let's merge this.

@maleadt maleadt merged commit 958aa7d into main Apr 6, 2026
9 of 17 checks passed
@maleadt maleadt deleted the tb/licm branch April 6, 2026 17:49