Commit 716c91c
fix: prevent inductor from fusing away bf16→fp32 cast in RoPE (pytorch#2575)
With compile.enable=true, Qwen3 produces different outputs from eager
(max diff ~1.56 in bfloat16). Inductor traces the whole transformer
block as one graph and legally eliminates the .to(dtype=xq.dtype)
downcast between q_norm/k_norm and RoPE, keeping the multiply-add in
fp32. Valid algebra, wrong dtype boundary relative to eager.
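
The dtype-boundary effect described above can be reproduced without PyTorch. The sketch below emulates bfloat16 rounding with the standard library and compares an eager-like path (inputs and every intermediate rounded to bf16) with a fused-like path (fp32 throughout, one rounding at the end). The helper names `to_bf16`, `eager_like`, and `fused_like` are hypothetical, and the input values are chosen so each step is exact in binary, not to form a real rotation.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Emulate a bfloat16 cast: keep the top 16 bits of the float32
    encoding, rounding to nearest even. NaN/inf/overflow are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def eager_like(x1, x2, c, s):
    """Downcast the inputs first, then round after every bf16 op."""
    x1b, x2b, cb, sb = (to_bf16(v) for v in (x1, x2, c, s))
    return to_bf16(to_bf16(x1b * cb) - to_bf16(x2b * sb))

def fused_like(x1, x2, c, s):
    """Skip the input downcast: multiply-add in fp32, round once at the end."""
    acc = to_f32(to_f32(x1 * c) - to_f32(x2 * s))
    return to_bf16(acc)

# x1 = 1 + 2**-8 sits exactly between two bf16 values, so the early
# downcast in the eager-like path rounds it to 1.0.
x1, x2, c, s = 1.00390625, 0.5, 0.75, 0.5
print(eager_like(x1, x2, c, s))  # 0.5
print(fused_like(x1, x2, c, s))  # 0.50390625
```

The two results differ by one bf16 step at that magnitude (2**-8), which is the kind of divergence that appears when the same algebra is rounded at different points.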
Fix borrowed from apply_rotary_emb_complex: upcast xq/xk to float32
before the multiply-add instead of downcasting cos/sin to match. The
fp32 compute is now unconditional in the graph so Inductor has nothing
to fuse away. Cast back with type_as at the end as before.
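
A minimal sketch of the approach described above, not the actual patch: the function name `apply_rope_fp32`, the `rotate_half` layout, and the tensor shapes are assumptions made for illustration. Inputs are upcast to float32 unconditionally, the multiply-add runs in float32, and the result is cast back with `type_as`.

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    """Split the last dimension in two halves and return (-second, first)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_fp32(xq, xk, cos, sin):
    """Hypothetical stand-in for apply_rotary_emb_cos_sin.

    Assumes xq/xk have shape (batch, seq, heads, head_dim) and cos/sin
    broadcast against them, e.g. (1, seq, 1, head_dim).
    """
    # Upcast the activations instead of downcasting cos/sin, so the
    # fp32 compute is explicit in both eager and compiled graphs.
    xq_f, xk_f = xq.float(), xk.float()
    cos_f, sin_f = cos.float(), sin.float()
    xq_out = xq_f * cos_f + rotate_half(xq_f) * sin_f
    xk_out = xk_f * cos_f + rotate_half(xk_f) * sin_f
    # Cast back to the callers' dtype only at the very end.
    return xq_out.type_as(xq), xk_out.type_as(xk)
```

With bfloat16 inputs the outputs are bfloat16 again, so callers see the same dtypes as before; only the intermediate precision changes.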
Fixes Qwen3 and GPT-OSS (the only callers of apply_rotary_emb_cos_sin).
RoPE now always computes in fp32 in eager too: slightly more accurate,
matches HF Qwen3 behavior, no checkpoint impact.
1 parent 766f181 commit 716c91c
2 files changed
Lines changed: 74 additions & 4 deletions
[Diff 1 of 2: 68 lines added (diff lines 1-68); diff content not captured]
[Diff 2 of 2: original lines 350-353 removed, 6 lines added as new lines 350-355; diff content not captured]