
Add loop parallel store optimization and DCE pass#149

Merged
maleadt merged 1 commit into main from tb/layernorm
Mar 28, 2026

Conversation

@maleadt
Member

@maleadt maleadt commented Mar 28, 2026

Implements two compiler passes that clean up unnecessary token overhead from the alias-aware token ordering pass (#89),
matching cuTile Python's output:

  1. Loop parallel store optimization: stores in for-loops whose indices are injective in the induction variable use the parent scope's token instead of a loop-carried token, breaking the token dependency chain through the loop. Matches Python's _try_loop_parallel_store.
  2. Dead code elimination: General-purpose DCE using dependency graph reachability analysis. Removes dead token carries, join_tokens, and unused instructions left behind by the parallel store optimization. Uses Julia's efunc effect annotations to classify intrinsic side effects. Matches Python's dead_code_elimination_pass.
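As an illustration of the condition behind pass 1, an index that is an affine function of the induction variable with a nonzero stride writes a distinct location on every iteration, so the store needs no loop-carried ordering. The sketch below is illustrative only; `AffineIndex`, `is_injective_in_iv`, and `pick_store_token` are hypothetical names, not the PR's actual IR types:

```python
# Hedged sketch of the loop-parallel-store condition. If a store's index
# is injective in the induction variable, iterations write disjoint
# locations, so the store may take the parent scope's token instead of
# the loop-carried one. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class AffineIndex:
    stride: int  # coefficient of the induction variable
    offset: int

def is_injective_in_iv(index: AffineIndex) -> bool:
    # i -> stride*i + offset is injective over the integers iff stride != 0
    return index.stride != 0

def pick_store_token(index: AffineIndex, parent_token: str, loop_token: str) -> str:
    # Disjoint writes need no ordering through the loop: use the parent token.
    return parent_token if is_injective_in_iv(index) else loop_token

print(pick_store_token(AffineIndex(stride=1, offset=0), "tok_root", "tok_carry"))
# tok_root
```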

Together these eliminate all dead token loop carries and join_tokens from memory-bound kernels like layernorm, producing token IR structurally identical to cuTile Python (all ops use the root token, zero loop-carried tokens).
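The reachability-based DCE described in pass 2 can be sketched roughly as follows: seed a worklist with side-effecting instructions, walk the dependency graph backwards, and drop everything unreached. `Instr`, the `effectful` flag, and the example program are stand-ins, not the PR's actual data structures:

```python
# Minimal sketch of dead code elimination via dependency-graph
# reachability: anything not transitively used by a side-effecting
# instruction is dead. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    operands: list = field(default_factory=list)  # names of values this uses
    effectful: bool = False                       # e.g. a store

def dce(instrs):
    """Keep only instructions reachable from side-effecting roots."""
    defs = {i.name: i for i in instrs}
    live, work = set(), [i.name for i in instrs if i.effectful]
    while work:
        n = work.pop()
        if n in live or n not in defs:
            continue
        live.add(n)
        work.extend(defs[n].operands)
    return [i for i in instrs if i.name in live]

# Example: a token carry and join_tokens left dead by the
# parallel-store optimization get swept away.
prog = [
    Instr("tok0"),
    Instr("ptr"),
    Instr("val"),
    Instr("store", ["ptr", "val", "tok0"], effectful=True),
    Instr("tok1", ["store"]),          # unused result token -> dead
    Instr("join", ["tok0", "tok1"]),   # unused join_tokens   -> dead
]
print([i.name for i in dce(prog)])
# ['tok0', 'ptr', 'val', 'store']
```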

Closes #146

@maleadt maleadt marked this pull request as ready for review March 28, 2026 09:14
@maleadt
Member Author

maleadt commented Mar 28, 2026

Not perfect yet, but I have a couple of things building on top of this, so let's merge it already.

@maleadt maleadt merged commit ee913f3 into main Mar 28, 2026
9 checks passed
@maleadt maleadt deleted the tb/layernorm branch March 28, 2026 09:15


Development

Successfully merging this pull request may close these issues.

Layernorm regression: Token threading requires loop parallel store optimization
