Wrapper for Blocksparse CuTensor code #3057
Open
kmp5VT wants to merge 33 commits into JuliaGPU:master from kmp5VT:kmp5/feature/wrap_blocksparse_cutensor
33 commits

- c4918d5 (kmp5VT) Working on implementing the wrapper for the new blocksparse cutensor …
- c15fea2 (kmp5VT) Revert to cutensor_jll.libcutensor as this has the blocksparse cutens…
- 82752ad (kmp5VT) Remove redudant convert function
- 9678ecf (kmp5VT) Merge branch 'JuliaGPU:master' into kmp5/feature/wrap_blocksparse_cut…
- affc3d4 (kmp5VT) Make blocksparse code more generic (generic case). Would it be better…
- a3a3f07 (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- 67013c8 (kmp5VT) Merge branch 'kmp5/feature/wrap_blocksparse_cutensor' of github.com:k…
- f6f5c5f (kmp5VT) Merge branch 'JuliaGPU:master' into kmp5/feature/wrap_blocksparse_cut…
- 1ec69cf (kmp5VT) Working on simplyfying and making accessors
- 8f5ef88 (kmp5VT) Fix problem with stride
- 9285b07 (kmp5VT) Small comment reminder
- cda4a4e (kmp5VT) Add a contraction test for the blocksparse system (not comprehensive …
- 94b8152 (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- 138edaf (kmp5VT) Closer to clang.jl construction
- ce2eeec (kmp5VT) Merge branch 'kmp5/feature/wrap_blocksparse_cutensor' of github.com:k…
- 3c11bec (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- cc4b826 (kshyatt) Update cutensor.toml for block sparse contraction
- 3316f63 (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- c493659 (kmp5VT) Apply suggestion from @lkdvos
- f6fb806 (kmp5VT) Merge branch 'JuliaGPU:master' into kmp5/feature/wrap_blocksparse_cut…
- 4231b4e (kmp5VT) Document C_NULL cutensorBSDescriptor
- f9ca018 (kmp5VT) Remove comment
- 21a5c81 (kmp5VT) Merge branch 'JuliaGPU:master' into kmp5/feature/wrap_blocksparse_cut…
- 327f6d7 (kmp5VT) Merge branch 'kmp5/feature/wrap_blocksparse_cutensor' of github.com:k…
- 04accbe (kmp5VT) Fix issues with new CUDA organization
- 3ebd626 (kmp5VT) Add type restrictions to CuTensorBS type to make downstream easier
- 26735e0 (kmp5VT) I believe this is the "generic" stride (i.e. all blocks are packed in…
- 8a74b2b (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- 52e58c2 (kmp5VT) Skip blocksparse tests for failing versions.
- aaee14e (kmp5VT) More broken versions. Will send Mathias a message
- d1be8ae (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- 5d6044b (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
- 04a39d7 (kmp5VT) Merge branch 'master' into kmp5/feature/wrap_blocksparse_cutensor
New file, @@ -0,0 +1,15 @@:

```julia
## LinearAlgebra

using LinearAlgebra

function LinearAlgebra.mul!(C::CuTensorBS, A::CuTensorBS, B::CuTensorBS, α::Number, β::Number)
    contract!(α,
              A, A.inds, CUTENSOR_OP_IDENTITY,
              B, B.inds, CUTENSOR_OP_IDENTITY,
              β,
              C, C.inds, CUTENSOR_OP_IDENTITY,
              CUTENSOR_OP_IDENTITY; jit=CUTENSOR_JIT_MODE_DEFAULT)
    return C
end
```
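For orientation, the five-argument `mul!` above follows the standard LinearAlgebra convention: `C` is overwritten with `α*A*B + β*C`. A quick refresher on plain dense CPU arrays (no `CuTensorBS` or GPU involved; the data here is arbitrary toy input):

```julia
using LinearAlgebra

# Dense analogue of the five-argument mul! convention the wrapper implements:
# C is overwritten in place with α*A*B + β*C.
A = [1.0 0.0; 0.0 2.0]
B = [3.0 4.0; 5.0 6.0]
C = ones(2, 2)
mul!(C, A, B, 2.0, 0.5)   # C now holds 2*A*B + 0.5*ones(2, 2)
```

The block-sparse method keeps exactly this contract, dispatching to `contract!` with identity unary operators on all three tensors.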
New file, @@ -0,0 +1,104 @@:

```julia
function nonzero_blocks(A::CuTensorBS)
    return A.nonzero_data
end

function contract!(
        @nospecialize(alpha::Number),
        @nospecialize(A), Ainds::ModeType, opA::cutensorOperator_t,
        @nospecialize(B), Binds::ModeType, opB::cutensorOperator_t,
        @nospecialize(beta::Number),
        @nospecialize(C), Cinds::ModeType, opC::cutensorOperator_t,
        opOut::cutensorOperator_t;
        jit::cutensorJitMode_t=JIT_MODE_NONE,
        workspace::cutensorWorksizePreference_t=WORKSPACE_DEFAULT,
        algo::cutensorAlgo_t=ALGO_DEFAULT,
        compute_type::Union{DataType, cutensorComputeDescriptorEnum, Nothing}=nothing,
        plan::Union{CuTensorPlan, Nothing}=nothing)
    actual_plan = if plan === nothing
        plan_contraction(A, Ainds, opA, B, Binds, opB, C, Cinds, opC, opOut;
                         jit, workspace, algo, compute_type)
    else
        plan
    end

    contractBS!(actual_plan, alpha, nonzero_blocks(A), nonzero_blocks(B), beta, nonzero_blocks(C))

    if plan === nothing
        CUDACore.unsafe_free!(actual_plan)
    end

    return C
end

## This function assumes A, B, and C are arrays of pointers to CuArrays.
## Overload `nonzero_blocks` for your datatype to reach this function through `contract!`.
function contractBS!(plan::CuTensorPlan,
                     @nospecialize(alpha::Number),
                     @nospecialize(A::AbstractArray),
                     @nospecialize(B::AbstractArray),
                     @nospecialize(beta::Number),
                     @nospecialize(C::AbstractArray))
    scalar_type = plan.scalar_type

    # Extract GPU pointers from each CuArray block;
    # cuTENSOR expects a host-accessible array of GPU pointers.
    A_ptrs = CuPtr{Cvoid}[pointer(block) for block in A]
    B_ptrs = CuPtr{Cvoid}[pointer(block) for block in B]
    C_ptrs = CuPtr{Cvoid}[pointer(block) for block in C]

    cutensorBlockSparseContract(handle(), plan,
                                Ref{scalar_type}(alpha), A_ptrs, B_ptrs,
                                Ref{scalar_type}(beta), C_ptrs, C_ptrs,
                                plan.workspace, sizeof(plan.workspace), stream())
    synchronize(stream())
    return C
end

function plan_contraction(
        @nospecialize(A), Ainds::ModeType, opA::cutensorOperator_t,
        @nospecialize(B), Binds::ModeType, opB::cutensorOperator_t,
        @nospecialize(C), Cinds::ModeType, opC::cutensorOperator_t,
        opOut::cutensorOperator_t;
        jit::cutensorJitMode_t=JIT_MODE_NONE,
        workspace::cutensorWorksizePreference_t=WORKSPACE_DEFAULT,
        algo::cutensorAlgo_t=ALGO_DEFAULT,
        compute_type::Union{DataType, cutensorComputeDescriptorEnum, Nothing}=nothing)
    !is_unary(opA) && throw(ArgumentError("opA must be a unary op!"))
    !is_unary(opB) && throw(ArgumentError("opB must be a unary op!"))
    !is_unary(opC) && throw(ArgumentError("opC must be a unary op!"))
    !is_unary(opOut) && throw(ArgumentError("opOut must be a unary op!"))

    descA = CuTensorBSDescriptor(A)
    descB = CuTensorBSDescriptor(B)
    descC = CuTensorBSDescriptor(C)
    # for now, D must be identical to C (and thus, descD must be identical to descC)

    modeA = collect(Cint, Ainds)
    modeB = collect(Cint, Binds)
    modeC = collect(Cint, Cinds)

    actual_compute_type = if compute_type === nothing
        contraction_compute_types[(eltype(A), eltype(B), eltype(C))]
    else
        compute_type
    end

    desc = Ref{cutensorOperationDescriptor_t}()
    cutensorCreateBlockSparseContraction(handle(),
                                         desc,
                                         descA, modeA, opA,
                                         descB, modeB, opB,
                                         descC, modeC, opC,
                                         descC, modeC, actual_compute_type)

    plan_pref = Ref{cutensorPlanPreference_t}()
    cutensorCreatePlanPreference(handle(), plan_pref, algo, jit)

    plan = CuTensorPlan(desc[], plan_pref[]; workspacePref=workspace)
    # cutensorDestroyOperationDescriptor(desc[])
    cutensorDestroyPlanPreference(plan_pref[])
    return plan
end
```
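`contract!` accepts an optional precomputed plan, so repeated contractions over tensors with identical block structure can amortize the planning cost. A minimal sketch of that reuse pattern, assuming `tA`, `tB`, `tC` are hypothetical already-constructed `CuTensorBS` tensors (this requires a CUDA device with a block-sparse-capable cuTENSOR, so it is illustrative rather than runnable here):

```julia
# tA, tB, tC: hypothetical CuTensorBS inputs with mode labels in their .inds fields.
# Plan once up front; contract! skips planning when a plan is passed, and the
# caller then owns the plan and is responsible for freeing it.
plan = plan_contraction(tA, tA.inds, CUTENSOR_OP_IDENTITY,
                        tB, tB.inds, CUTENSOR_OP_IDENTITY,
                        tC, tC.inds, CUTENSOR_OP_IDENTITY,
                        CUTENSOR_OP_IDENTITY)
for _ in 1:10
    contract!(1.0, tA, tA.inds, CUTENSOR_OP_IDENTITY,
              tB, tB.inds, CUTENSOR_OP_IDENTITY,
              0.0, tC, tC.inds, CUTENSOR_OP_IDENTITY,
              CUTENSOR_OP_IDENTITY; plan)
end
CUDACore.unsafe_free!(plan)
```

When no plan is supplied, `contract!` builds one internally and frees it before returning, which is simpler but repeats descriptor creation on every call.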
New file, @@ -0,0 +1,128 @@:

```julia
## tensor

export CuTensorBS

## TODO add checks to see if size of data matches expected block size
mutable struct CuTensorBS{T, N}
    nonzero_data::Vector{<:CuArray}
    inds::Vector{Int}
    blocks_per_mode::Vector{Int32}
    ## This expects a Vector{Tuple(Int)} right now
    block_extents::Vector{<:Tuple}
    ## This expects a Vector{Tuple(Int)} right now
    nonzero_block_coords::Vector{NTuple{N,Int32}}

    function CuTensorBS{T, N}(nonzero_data,
                              blocks_per_mode,
                              block_extents,
                              nonzero_block_coords,
                              inds) where {T<:Number, N}
        CuArrayT = eltype(nonzero_data)
        @assert eltype(CuArrayT) == T
        # @assert ndims(CuArrayT) == N
        @assert length(block_extents) == N
        new(nonzero_data, inds, blocks_per_mode, block_extents, nonzero_block_coords)
    end
end

function CuTensorBS(nonzero_data::Vector{<:CuArray{T}},
                    blocks_per_mode, block_extents, nonzero_block_coords, inds) where {T<:Number}
    CuTensorBS{T,length(block_extents)}(nonzero_data,
                                        blocks_per_mode, block_extents, nonzero_block_coords, inds)
end

# array interface
function Base.size(T::CuTensorBS)
    return tuple(sum.(T.block_extents)...)
end
Base.length(T::CuTensorBS) = prod(size(T))
nonzero_length(T::CuTensorBS) = sum(length.(T.nonzero_data))
Base.ndims(T::CuTensorBS) = Int32(length(T.inds))

## This tells how far away each block is from the other blocks in memory.
Base.strides(T::CuTensorBS) = strides(T.nonzero_data)
Base.eltype(T::CuTensorBS) = eltype(eltype(T.nonzero_data))

function block_extents(T::CuTensorBS)
    extents = Vector{Int64}()
    for ex in T.block_extents
        extents = vcat(extents, ex...)
    end
    return extents
end

nblocks_per_mode(T::CuTensorBS) = T.blocks_per_mode

num_nonzero_blocks(T::CuTensorBS) = length(T.nonzero_block_coords)

## This function turns the tuples of block coordinates into a single
## flat list of blocks.
function list_nonzero_block_coords(T::CuTensorBS)
    block_list = Vector{Int64}()
    for block in T.nonzero_block_coords
        block_list = vcat(block_list, block...)
    end
    return block_list
end

## descriptor
mutable struct CuTensorBSDescriptor
    handle::cutensorBlockSparseTensorDescriptor_t
    # inner constructor handles creation and finalization of the descriptor
    function CuTensorBSDescriptor(
            numModes::Int32,
            numNonZeroBlocks::Int64,
            numSectionsPerMode::Vector{Int32},
            extent::Vector{Int64},
            nonZeroCoordinates::Vector{Int32},
            stride, ## Union{Vector{Int64}, C_NULL}
            eltype::Type)
        desc = Ref{cuTENSOR.cutensorBlockSparseTensorDescriptor_t}()
        cutensorCreateBlockSparseTensorDescriptor(handle(), desc,
            numModes, numNonZeroBlocks, numSectionsPerMode, extent, nonZeroCoordinates,
            stride, eltype)

        obj = new(desc[])
        finalizer(unsafe_destroy!, obj)
        return obj
    end
end

## This method assumes that strides are C_NULL, i.e. the canonical stride.
function CuTensorBSDescriptor(
        numModes::Int32,
        numNonZeroBlocks::Int64,
        numSectionsPerMode::Vector{Int32},
        extent::Vector{Int64},
        nonZeroCoordinates::Vector{Int32},
        eltype::Type)
    return CuTensorBSDescriptor(numModes, numNonZeroBlocks, numSectionsPerMode, extent, nonZeroCoordinates, C_NULL, eltype)
end

Base.show(io::IO, desc::CuTensorBSDescriptor) = @printf(io, "CuTensorBSDescriptor(%p)", desc.handle)

Base.unsafe_convert(::Type{cutensorBlockSparseTensorDescriptor_t}, obj::CuTensorBSDescriptor) = obj.handle

function unsafe_destroy!(obj::CuTensorBSDescriptor)
    cutensorDestroyBlockSparseTensorDescriptor(obj)
end

## Descriptor function for the CuTensorBS type. Overload for custom objects.
function CuTensorBSDescriptor(A::CuTensorBS)
    numModes = ndims(A)
    numNonZeroBlocks = length(A.nonzero_block_coords)
    numSectionsPerMode = collect(nblocks_per_mode(A))
    extent = block_extents(A)
    nonZeroCoordinates = collect(Base.Iterators.flatten(A.nonzero_block_coords)) .- Int32(1)
    st = strides(A)
    @assert all(st .== 1)

    dataType = eltype(A)

    ## For now, assume stride is NULL. Not sure if stride works; need to discuss with the cuTENSOR team.
    CuTensorBSDescriptor(numModes, numNonZeroBlocks,
                         numSectionsPerMode, extent, nonZeroCoordinates, dataType)
end
```
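One detail of `CuTensorBSDescriptor(A::CuTensorBS)` worth spelling out is the coordinate conversion: the 1-based `NTuple` block coordinates stored on the Julia side are flattened and shifted to the 0-based `Int32` list the cuTENSOR C API expects. A standalone illustration on toy data (the coordinates here are made up for the example):

```julia
# Two nonzero blocks of a 2-mode tensor, in Julia's 1-based convention.
nonzero_block_coords = [(Int32(1), Int32(2)), (Int32(3), Int32(1))]

# Flatten mode-by-mode and shift to 0-based indexing, exactly as the
# descriptor constructor does before calling into the C API.
nonZeroCoordinates = collect(Base.Iterators.flatten(nonzero_block_coords)) .- Int32(1)
# nonZeroCoordinates is Int32[0, 1, 2, 0]
```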