Added planar types to speed up complex half precision GEMMs #1142
cliffburdick wants to merge 8 commits into main
Greptile Summary

This PR introduces

Confidence Score: 5/5

Safe to merge: all three P0/P1 concerns from prior rounds are resolved; remaining findings are P2 style nits.

The three blocking issues from previous review threads (SetOp EPT regression, TotalSize non-contiguous access, c_adj pointer mismatch) are fully addressed. The new planar GEMM logic is mathematically consistent with the pre-existing non-planar path, contiguity is enforced at construction time, and the cache key correctly differentiates planar vs. interleaved configurations. The only new findings are a dead ternary in the JIT string and an unused lambda return, both P2.

include/matx/operators/planar.h (dead ternary in JIT Size), include/matx/core/allocator.h (unused is_cuda_free return for HOST_MALLOC)

Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["matmul called with complex-half types"] --> B{is_complex_half_v?}
B -- No --> Z["Standard GEMM path"]
B -- Yes --> C{a_is_planar?}
C -- No --> D["Alloc a_hp, Convert A to planar"]
C -- Yes --> E["Use A buffer directly"]
D --> F{b_is_planar?}
E --> F
F -- No --> G["Alloc b_hp, Convert B to planar"]
F -- Yes --> H["Use B buffer directly"]
G --> I{c_is_planar?}
H --> I
I -- No --> J["Alloc c_hp, c_adj to c_hp"]
I -- Yes --> K["c_adj.Reset to c.Data()"]
J --> L["cuBLASLt GEMM"]
K --> L
L --> M{c_is_planar?}
M -- No --> N["Convert C planar to interleaved"]
M -- Yes --> O["Done"]
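
The conversion path in the flowchart (interleaved inputs converted to planar, GEMM run on planar buffers, result converted back) can be sketched on the host in plain C++. This is only an illustration under stated assumptions, not the code in this PR: it uses `float` in place of half precision, a naive triple loop in place of cuBLASLt, and assumes a planar layout in which all real parts precede all imaginary parts. The helper names `to_planar`, `to_interleaved`, `planar_gemm`, and `matmul_interleaved` are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: split interleaved complex values into a planar buffer
// laid out as [re_0 .. re_{n-1}, im_0 .. im_{n-1}] (assumed layout).
std::vector<float> to_planar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();
    out[n + i] = in[i].imag();
  }
  return out;
}

// Hypothetical helper: inverse of to_planar.
std::vector<std::complex<float>> to_interleaved(const std::vector<float>& in) {
  const std::size_t n = in.size() / 2;
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::complex<float>(in[i], in[n + i]);
  }
  return out;
}

// Naive row-major GEMM on planar buffers: C (m x n) = A (m x k) * B (k x n).
// Stands in for the GEMM step in the flowchart; it is not a cuBLASLt call.
std::vector<float> planar_gemm(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t m, std::size_t k, std::size_t n) {
  assert(a.size() == 2 * m * k);
  assert(b.size() == 2 * k * n);
  const float* ar = a.data();
  const float* ai = ar + m * k;  // imaginary plane of A
  const float* br = b.data();
  const float* bi = br + k * n;  // imaginary plane of B
  std::vector<float> c(2 * m * n, 0.0f);
  float* cr = c.data();
  float* ci = cr + m * n;        // imaginary plane of C
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      for (std::size_t p = 0; p < k; ++p) {
        // (ar + i*ai) * (br + i*bi) accumulated per plane
        cr[i * n + j] += ar[i * k + p] * br[p * n + j] - ai[i * k + p] * bi[p * n + j];
        ci[i * n + j] += ar[i * k + p] * bi[p * n + j] + ai[i * k + p] * br[p * n + j];
      }
    }
  }
  return c;
}

// Mirrors the "No" branches of the flowchart: every operand is interleaved,
// so A and B are converted to planar and C is converted back afterwards.
std::vector<std::complex<float>> matmul_interleaved(
    const std::vector<std::complex<float>>& a,
    const std::vector<std::complex<float>>& b,
    std::size_t m, std::size_t k, std::size_t n) {
  return to_interleaved(planar_gemm(to_planar(a), to_planar(b), m, k, n));
}
```

When an operand is already planar, the corresponding `to_planar` or `to_interleaved` call is simply skipped, which is the saving the "Yes" branches of the flowchart describe.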
Reviews (7): Last reviewed commit: "Fixed issue with teardown where context ..."
/build
No description provided.