Added planar types to speed up complex half precision GEMMs by cliffburdick · Pull Request #1142 · NVIDIA/MatX

cliffburdick · 2026-03-19T20:08:30Z

No description provided.

copy-pr-bot · 2026-03-19T20:08:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-03-19T20:15:43Z

Greptile Summary

This PR introduces matxFp16ComplexPlanar and matxBf16ComplexPlanar marker types to allow pre-converted planar buffers to be passed directly into complex-half-precision cuBLASLt GEMMs, skipping the per-call interleaved→planar conversion overhead. All three P0/P1 issues from prior review threads are resolved: the SetOp EPT regression is gated on planar output type, non-contiguous planar views are rejected at tensor-construction time via ValidatePlanarLayoutOnCreate_(), and c_adj is correctly reset for the planar-C path.
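To make the skipped conversion concrete, here is a minimal standalone sketch of an interleaved-to-planar conversion and its inverse. It assumes a planar buffer stores all real parts followed by all imaginary parts in one contiguous allocation; `float` stands in for the half-precision types, and `ToPlanar` and `ToInterleaved` are hypothetical helper names rather than MatX APIs.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: split interleaved (re, im, re, im, ...) samples into a
// planar buffer laid out as [all real parts | all imaginary parts].
std::vector<float> ToPlanar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane occupies [0, n)
    out[n + i] = in[i].imag();  // imaginary plane occupies [n, 2n)
  }
  return out;
}

// Hypothetical inverse: rebuild interleaved samples from a planar buffer.
std::vector<std::complex<float>> ToInterleaved(const std::vector<float>& planar) {
  assert(planar.size() % 2 == 0);  // both planes must have the same length
  const std::size_t n = planar.size() / 2;
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = std::complex<float>(planar[i], planar[n + i]);
  }
  return out;
}
```

A caller that keeps its data in the planar form across many GEMM calls would pay for `ToPlanar` once rather than on every call, which is the saving the PR targets.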

Confidence Score: 5/5

Safe to merge — all three P0/P1 concerns from prior rounds are resolved; remaining findings are P2 style nits.

The three blocking issues from previous review threads (SetOp EPT regression, TotalSize non-contiguous access, c_adj pointer mismatch) are fully addressed. The new planar GEMM logic is mathematically consistent with the pre-existing non-planar path, contiguity is enforced at construction time, and the cache key correctly differentiates planar vs. interleaved configurations. The only new findings are a dead ternary in the JIT string and an unused lambda return — both P2.
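The cache-key point can be illustrated with a small standalone sketch. The struct, hash, and field layout below are hypothetical and heavily reduced (only the `a_planar`/`b_planar`/`c_planar` booleans are named in this review); the idea is simply that two GEMMs with identical shapes but different layouts must not share a cached plan.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Hypothetical, reduced GEMM cache key: shape plus one layout flag per operand.
struct GemmKey {
  std::size_t m, n, k;
  bool a_planar, b_planar, c_planar;

  bool operator==(const GemmKey& o) const {
    return m == o.m && n == o.n && k == o.k && a_planar == o.a_planar &&
           b_planar == o.b_planar && c_planar == o.c_planar;
  }
};

// Hash that mixes the layout flags in alongside the dimensions.
struct GemmKeyHash {
  std::size_t operator()(const GemmKey& key) const {
    const std::size_t flags = (key.a_planar ? 4u : 0u) |
                              (key.b_planar ? 2u : 0u) |
                              (key.c_planar ? 1u : 0u);
    std::size_t h = 0;
    for (std::size_t v : {key.m, key.n, key.k, flags}) {
      h ^= std::hash<std::size_t>{}(v) + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
  }
};

// A string stands in for the cached plan object in this sketch.
using PlanCache = std::unordered_map<GemmKey, std::string, GemmKeyHash>;
```

Because equality compares the flags, a planar and an interleaved GEMM of the same shape occupy separate cache entries even if their hashes happened to collide.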

include/matx/operators/planar.h (dead ternary in JIT Size), include/matx/core/allocator.h (unused is_cuda_free return for HOST_MALLOC)

Important Files Changed

Filename	Overview
include/matx/transforms/matmul/matmul_cuda.h	Planar A/B/C fast-path skips per-call allocation and conversion. c_adj correctly reset to c.Data() for planar-C. ldc fixed to c.Size(RANK-1) for all complex-half paths. Cache key extended with a_planar/b_planar/c_planar booleans.
include/matx/core/tensor_impl.h	Adds PlanarComplexProxy, LoadPlanarComplex/StorePlanarComplex, and planar-aware operator() overloads. Contiguity is validated at construction time via tensor.h. Return types in array-indexed operator() overloads are correctly relaxed to decltype(auto).
include/matx/operators/planar.h	New ComplexPlanarOp operator with correct size doubling and scalar-only EPT. JIT Size() generates a dead ternary (both branches return the same value), but pre-computed out_dims_ values are correct so output is unaffected.
include/matx/operators/set.h	EPT regression fixed: scalar EPT is now only forced when the output is a planar-complex type; non-planar SetOp retains normal vectorization negotiation.
include/matx/core/half_complex.h	Adds matxFp16ComplexPlanar and matxBf16ComplexPlanar marker structs inheriting from interleaved counterparts with no extra data fields.
include/matx/core/allocator.h	Guards CUDA-runtime free calls behind a cudaGetDevice() liveness check to avoid crashes during static teardown.
include/matx/operators/interleaved.h	Adds InnerOp() accessor and two cancellation overloads: interleaved(planar(x)) and planar(interleaved(x)) both short-circuit to the inner operator.
include/matx/operators/base_operator.h	Disables cudaMemcpyAsync fast-path when LHS and RHS value types differ, preventing raw-byte copies between planar and interleaved tensors.
test/00_transform/MatMul.cu	Adds two typed test suites for planar GEMM: interleaved-reference comparison and raw-buffer validation.
test/00_operators/planar_test.cu	Validates per-element real/imag reads from a cudaMemcpy-populated planar tensor against the raw host buffer for both Fp16 and Bf16.
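The tensor_impl.h entry above notes that contiguity is validated at construction time. A minimal sketch of such a check, assuming a row-major layout with strides counted in elements, might look like the following. `IsContiguousRowMajor` is a hypothetical stand-in and not the body of `ValidatePlanarLayoutOnCreate_()`, which is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: returns true when `strides` (in elements) describe a
// dense row-major layout for `shape`. Dimensions of size 1 are ignored,
// since their stride never affects addressing.
bool IsContiguousRowMajor(const std::vector<std::size_t>& shape,
                          const std::vector<std::size_t>& strides) {
  assert(shape.size() == strides.size());
  std::size_t expected = 1;  // stride a dense layout would have at this dim
  for (std::size_t d = shape.size(); d-- > 0;) {
    if (shape[d] != 1 && strides[d] != expected) {
      return false;
    }
    expected *= shape[d];
  }
  return true;
}
```

A sliced or transposed view fails this test. Rejecting it up front is a plausible design choice for a planar type if the imaginary plane is located at a fixed offset from the real plane, though that motivation is an inference from the review text rather than something stated in it.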

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["matmul called with complex-half types"] --> B{is_complex_half_v?}
    B -- No --> Z["Standard GEMM path"]
    B -- Yes --> C{a_is_planar?}
    C -- No --> D["Alloc a_hp, Convert A to planar"]
    C -- Yes --> E["Use A buffer directly"]
    D --> F{b_is_planar?}
    E --> F
    F -- No --> G["Alloc b_hp, Convert B to planar"]
    F -- Yes --> H["Use B buffer directly"]
    G --> I{c_is_planar?}
    H --> I
    I -- No --> J["Alloc c_hp, c_adj to c_hp"]
    I -- Yes --> K["c_adj.Reset to c.Data()"]
    J --> L["cuBLASLt GEMM"]
    K --> L
    L --> M{c_is_planar?}
    M -- No --> N["Convert C planar to interleaved"]
    M -- Yes --> O["Done"]
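
The decision flow above can be summarized as a small planning function. This is only a sketch of the control flow shown in the flowchart for the complex-half path: `PlanComplexHalfGemm` and the step strings are hypothetical, and no allocation or cuBLASLt call is actually made.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planner: lists the work the complex-half GEMM path would do
// for a given combination of planar and interleaved operands.
std::vector<std::string> PlanComplexHalfGemm(bool a_planar, bool b_planar,
                                             bool c_planar) {
  std::vector<std::string> steps;
  // Interleaved inputs need a temporary planar copy; planar inputs are used
  // directly and add no step.
  if (!a_planar) steps.push_back("alloc a_hp, convert A to planar");
  if (!b_planar) steps.push_back("alloc b_hp, convert B to planar");
  // The output either goes to a temporary or straight into C's own buffer.
  if (!c_planar) {
    steps.push_back("alloc c_hp, point c_adj at c_hp");
  } else {
    steps.push_back("reset c_adj to c.Data()");
  }
  steps.push_back("cuBLASLt GEMM");
  // Only an interleaved output needs converting back after the GEMM.
  if (!c_planar) steps.push_back("convert C planar to interleaved");
  return steps;
}
```

With all three operands planar the plan shrinks to the pointer reset and the GEMM itself, while the fully interleaved case keeps all three allocations and conversions.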

Reviews (7): Last reviewed commit: "Fixed issue with teardown where context ..."

include/matx/operators/set.h

include/matx/core/tensor_impl.h

include/matx/transforms/matmul/matmul_cuda.h

cliffburdick · 2026-03-19T21:04:14Z

/build

cliffburdick · 2026-03-20T15:41:57Z

/build

cliffburdick · 2026-03-20T21:05:22Z

/build

cliffburdick · 2026-04-03T16:16:17Z

/build

cliffburdick · 2026-04-06T22:43:02Z

/build

cliffburdick · 2026-04-08T16:35:55Z

/build

cliffburdick · 2026-04-10T18:56:51Z

/build

cliffburdick added 2 commits March 19, 2026 13:04

Added planar types to speed up complex half precision GEMMs

33ec90f

Cleanup

2507608

greptile-apps bot reviewed Mar 19, 2026

include/matx/operators/set.h (resolved)

include/matx/core/tensor_impl.h (resolved)

include/matx/transforms/matmul/matmul_cuda.h (resolved)

cliffburdick added 2 commits March 19, 2026 13:29

Code review updates

c47a6cc

Code review updates

59d5320

Compilation error

de287c9

Fix failing sparse and reshape unit tests

4da48da

More changes for affine indexing

10902a4

Fixed issue with teardown where context may die in tests before memory is freed

1ad93b0

cliffburdick wants to merge 8 commits into main from planar_tensor
