Commit afa7f39

Fix divisibility computation exceeding max_divisor. (#153)
`compute_divisibility` used `<=` instead of `<` in its loop bound, which let the result overshoot `max_divisor`. With the default `max_divisor=16`, any shape divisible by 32 (common in GPU workloads) produced a `DivBy(32)` assume in the Tile IR bytecode instead of the intended `DivBy(16)`. This mismatched cuTile Python's behavior and caused unnecessary kernel respecialization via `ArraySpec` type parameters.
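To make the off-by-one concrete, the sketch below places hypothetical copies of the loop before and after the fix side by side. The names `compute_divisibility_old` and `compute_divisibility_fixed` are illustrative only, and the bodies are transcribed from the diff. When `divisor` has already reached `max_divisor`, the old `<=` condition still permits one more doubling.

```julia
# Hypothetical side-by-side copies of the loop before and after the fix.
# Only the loop condition differs: `<=` (old) vs `<` (fixed).

function compute_divisibility_old(value::Integer, max_divisor::Int=16)
    value == 0 && return 0
    divisor = 1
    # Old bound: when divisor == max_divisor the loop may still double once more.
    while divisor <= max_divisor && value % (divisor * 2) == 0
        divisor *= 2
    end
    return divisor >= 2 ? divisor : 0  # 0 unless divisible by at least 2
end

function compute_divisibility_fixed(value::Integer, max_divisor::Int=16)
    value == 0 && return 0
    divisor = 1
    # Fixed bound: stop doubling as soon as divisor reaches max_divisor.
    while divisor < max_divisor && value % (divisor * 2) == 0
        divisor *= 2
    end
    return divisor >= 2 ? divisor : 0  # 0 unless divisible by at least 2
end

# Compare both versions on a few shapes with the default cap of 16.
for n in (24, 48, 32, 64, 1024)
    println(n, ": old = ", compute_divisibility_old(n),
            ", fixed = ", compute_divisibility_fixed(n))
end
```

Values that are not divisible by 32, such as 24 and 48, give the same result under both bounds. Every multiple of 32 reports 32 under the old bound and 16 under the fixed one, which matches the `DivBy(32)` vs `DivBy(16)` discrepancy described above.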
1 parent c373daa, commit afa7f39


1 file changed (+1, -1 lines)

src/language/types.jl

Lines changed: 1 addition & 1 deletion
```diff
@@ -79,7 +79,7 @@ Returns 0 if value is 0 or not divisible by any power of 2.
 function compute_divisibility(value::Integer, max_divisor::Int=16)
     value == 0 && return 0
     divisor = 1
-    while divisor <= max_divisor && value % (divisor * 2) == 0
+    while divisor < max_divisor && value % (divisor * 2) == 0
         divisor *= 2
     end
     return divisor >= 2 ? divisor : 0  # Only return if at least divisible by 2
```
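A regression test for this fix could check the invariant that the old bound violated: the result never exceeds `max_divisor`. The sketch below uses Julia's `Test` stdlib and a standalone copy of the patched function. The test itself is hypothetical and is not part of the commit. It restricts itself to power-of-two caps. For a cap such as 12, the loop can still return 16 (it doubles from 8 because `8 < 12`), so the invariant holds only when `max_divisor` is a power of two, as with the default of 16.

```julia
using Test

# Standalone copy of the patched function from src/language/types.jl.
function compute_divisibility(value::Integer, max_divisor::Int=16)
    value == 0 && return 0
    divisor = 1
    while divisor < max_divisor && value % (divisor * 2) == 0
        divisor *= 2
    end
    return divisor >= 2 ? divisor : 0  # 0 unless divisible by at least 2
end

# Hypothetical regression test; only power-of-two caps are exercised.
@testset "compute_divisibility respects max_divisor" begin
    for max_divisor in (2, 4, 8, 16, 32), value in -1024:1024
        d = compute_divisibility(value, max_divisor)
        # Never exceeds the cap (the property the old `<=` bound broke).
        @test d <= max_divisor
        if d != 0
            # A nonzero result is a power of two that divides the value.
            @test ispow2(d)
            @test value % d == 0
        end
    end
    # Spot checks with the default cap of 16.
    @test compute_divisibility(32) == 16
    @test compute_divisibility(24) == 8
    @test compute_divisibility(0) == 0
    @test compute_divisibility(3) == 0
end
```

With the old `<=` bound, the `d <= max_divisor` check would fail for every multiple of `2 * max_divisor` in the range.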
