Commit afa7f39
Fix divisibility computation exceeding max_divisor. (#153)
`compute_divisibility` used `<=` instead of `<` in its loop bound,
allowing the result to overshoot `max_divisor`. With the default
`max_divisor=16`, any shape divisible by 32 (common in GPU workloads)
would produce a `DivBy(32)` assume in the Tile IR bytecode instead of
the intended `DivBy(16)`, mismatching cuTile Python's behavior and
causing unnecessary kernel respecialization via ArraySpec type params.

1 parent: c373daa; commit afa7f39
1 file changed, +1 -1 lines: line 82 replaced (context lines 79-85 otherwise unchanged).
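The off-by-one described in the commit message can be illustrated with a small sketch. It is written in Python and is not the project's actual `compute_divisibility`. It assumes the function doubles a candidate power-of-two divisor while the shape stays divisible by the next power, and that the bug was a `<=` comparison against `max_divisor` in that loop condition. The names `compute_divisibility_buggy` and `specialization_keys` are hypothetical.

```python
def compute_divisibility_buggy(n: int, max_divisor: int = 16) -> int:
    """Hypothetical pre-fix loop: `<=` lets d double once past max_divisor."""
    d = 1
    while d <= max_divisor and n % (d * 2) == 0:
        d *= 2
    return d


def compute_divisibility(n: int, max_divisor: int = 16) -> int:
    """Hypothetical post-fix loop: `<` stops doubling at max_divisor."""
    d = 1
    while d < max_divisor and n % (d * 2) == 0:
        d *= 2
    return d


def specialization_keys(shapes, fn) -> set:
    """Distinct divisibility values, standing in for distinct kernel specializations."""
    return {fn(n) for n in shapes}


if __name__ == "__main__":
    shapes = [16, 32, 48, 64]
    print(compute_divisibility_buggy(32))  # 32, i.e. a DivBy(32) assume
    print(compute_divisibility(32))        # 16, i.e. the intended DivBy(16)
    print(specialization_keys(shapes, compute_divisibility_buggy))  # {16, 32}
    print(specialization_keys(shapes, compute_divisibility))        # {16}
```

Under these assumptions, the buggy loop assigns different divisibility values to shapes 48 and 64, so they would land in different specializations. The fixed loop caps every one of these shapes at 16, so they would share one.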