
Fix divisibility computation exceeding `max_divisor` #153

Merged
maleadt merged 1 commit into main from tb/align on Mar 30, 2026


maleadt (Member) commented Mar 30, 2026

`compute_divisibility` used `<=` instead of `<` in its loop bound, allowing the result to overshoot `max_divisor`. With the default `max_divisor=16`, any shape divisible by 32 (common in GPU workloads) would produce a `DivBy(32)` assume in the Tile IR bytecode instead of the intended `DivBy(16)`. This mismatched cuTile Python's behavior and caused unnecessary kernel respecialization via `ArraySpec` type parameters.
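The PR does not include the source of `compute_divisibility`, so the following is a hypothetical Python sketch, not the project's actual implementation. It assumes the function starts from a divisor of 1 and keeps doubling it while the doubled value still divides the input, and that `max_divisor` is a power of two. The sketch shows how the loop bound alone decides whether the result can exceed the cap.

```python
def compute_divisibility(value: int, max_divisor: int = 16) -> int:
    """Hypothetical fixed version: largest power of two dividing `value`,
    never exceeding `max_divisor` (assumed to be a power of two)."""
    d = 1
    # Strict `<`: stop doubling once d has reached the cap.
    while d < max_divisor and value % (2 * d) == 0:
        d *= 2
    return d


def compute_divisibility_buggy(value: int, max_divisor: int = 16) -> int:
    """Hypothetical pre-fix version using `<=` in the loop bound."""
    d = 1
    # With `<=`, the loop still runs when d == max_divisor and can double
    # d one more time, yielding up to 2 * max_divisor.
    while d <= max_divisor and value % (2 * d) == 0:
        d *= 2
    return d


if __name__ == "__main__":
    for shape in (16, 32, 64, 48):
        print(shape, compute_divisibility_buggy(shape), compute_divisibility(shape))
```

In this sketch, the buggy bound maps shapes 16 and 48 to 16 but shapes 32 and 64 to 32. Any specialization key derived from that value would therefore split kernels that the fixed bound (which maps all four shapes to 16) treats as identical.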

maleadt merged commit afa7f39 into main on Mar 30, 2026
9 checks passed
maleadt deleted the tb/align branch on March 30, 2026 at 10:53
