cuSPARSELt Functions¶
Library Management Functions¶
cusparseLtInit¶
cusparseStatus_t
cusparseLtInit(cusparseLtHandle_t* handle)
The function initializes the cuSPARSELt library handle (cusparseLtHandle_t) which holds the cuSPARSELt library context. It allocates light hardware resources on the host, and must be called prior to making any other cuSPARSELt library calls. Calling any cusparseLt function which uses cusparseLtHandle_t without a previous call of cusparseLtInit() will return an error.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | OUT | cuSPARSELt library handle |
See cusparseStatus_t for the description of the return status.
cusparseLtDestroy¶
cusparseStatus_t
cusparseLtDestroy(const cusparseLtHandle_t* handle)
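The function releases the resources held by the cuSPARSELt library handle. Calling any cusparseLt function which uses cusparseLtHandle_t after cusparseLtDestroy() will return an error.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |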
See cusparseStatus_t for the description of the return status.
Matmul Functions¶
cusparseLtDenseDescriptorInit¶
cusparseStatus_t
cusparseLtDenseDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatDescriptor_t* matDescr,
int64_t rows,
int64_t cols,
int64_t ld,
uint32_t alignment,
cudaDataType valueType,
cusparseOrder_t order)
The function initializes the descriptor of a dense matrix.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matDescr | Host | OUT | Dense matrix descriptor |
rows | Host | IN | Number of rows |
cols | Host | IN | Number of columns |
ld | Host | IN | Leading dimension |
alignment | Host | IN | Memory alignment in bytes |
valueType | Host | IN | Data type of the matrix |
order | Host | IN | Memory layout |
See cusparseStatus_t for the description of the return status.
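The rows, cols, ld, and order arguments together determine where each element lives in memory. The following host-side sketch shows the conventional mapping from (row, col) to a linear offset; layout_t, element_offset, and min_leading_dimension are hypothetical names used only for illustration and are not part of cuSPARSELt.

```c
#include <stdint.h>

/* Hypothetical stand-in for the layout argument; NOT the library's cusparseOrder_t. */
typedef enum { LAYOUT_ROW_MAJOR, LAYOUT_COL_MAJOR } layout_t;

/* Linear index of element (row, col) in a buffer with leading dimension ld.
   Row-major: consecutive elements of a row are adjacent, rows are ld apart.
   Column-major: consecutive elements of a column are adjacent, columns are ld apart. */
int64_t element_offset(int64_t row, int64_t col, int64_t ld, layout_t layout)
{
    return (layout == LAYOUT_ROW_MAJOR) ? row * ld + col : col * ld + row;
}

/* Smallest leading dimension that still fits the given shape (no padding). */
int64_t min_leading_dimension(int64_t rows, int64_t cols, layout_t layout)
{
    return (layout == LAYOUT_ROW_MAJOR) ? cols : rows;
}
```

A leading dimension larger than the minimum simply leaves padding between consecutive rows (or columns), which is one common way to keep each row or column start aligned.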
cusparseLtStructuredDescriptorInit¶
cusparseStatus_t
cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatDescriptor_t* matDescr,
int64_t rows,
int64_t cols,
int64_t ld,
uint32_t alignment,
cudaDataType valueType,
cusparseOrder_t order,
cusparseLtSparsity_t sparsity)
The function initializes the descriptor of a structured matrix.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matDescr | Host | OUT | Structured matrix descriptor |
rows | Host | IN | Number of rows |
cols | Host | IN | Number of columns |
ld | Host | IN | Leading dimension |
alignment | Host | IN | Memory alignment in bytes |
valueType | Host | IN | Data type of the matrix |
order | Host | IN | Memory layout |
sparsity | Host | IN | Matrix sparsity ratio |
Sparsity ratio
Value | Description |
---|---|
CUSPARSELT_SPARSITY_50_PERCENT | 50% Sparsity Ratio (2:4 Sparse MMA) |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulDescriptorInit¶
cusparseStatus_t CUSPARSELT_API
cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulDescriptor_t* matMulDescr,
cusparseOperation_t opA,
cusparseOperation_t opB,
const cusparseLtMatDescriptor_t* matA,
const cusparseLtMatDescriptor_t* matB,
const cusparseLtMatDescriptor_t* matC,
const cusparseLtMatDescriptor_t* matD,
cusparseComputeType computeType)
The function initializes the matrix multiplication descriptor.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matMulDescr | Host | OUT | Matrix multiplication descriptor |
opA | Host | IN | Operation applied to the matrix A |
opB | Host | IN | Operation applied to the matrix B |
matA | Host | IN | Matrix A descriptor |
matB | Host | IN | Matrix B descriptor |
matC | Host | IN | Matrix C descriptor |
matD | Host | IN | Matrix D descriptor |
computeType | Host | IN | Compute precision |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulAlgSelectionInit¶
cusparseStatus_t
cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulAlgSelection_t* algSelection,
const cusparseLtMatmulDescriptor_t* matmulDescr,
cusparseLtMatmulAlg_t alg)
The function initializes the algorithm selection descriptor.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | OUT | Algorithm selection descriptor |
matmulDescr | Host | IN | Matrix multiplication descriptor |
alg | Host | IN | Algorithm mode |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulAlgSetAttribute¶
cusparseStatus_t
cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t* handle,
cusparseLtMatmulAlgSelection_t* algSelection,
cusparseLtMatmulAlgAttribute_t attribute,
const void* data,
size_t dataSize)
The function sets the value of the specified attribute belonging to the algorithm selection descriptor.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | OUT | Algorithm selection descriptor |
attribute | Host | IN | The attribute that will be set by this function |
data | Host | IN | Pointer to the value to which the specified attribute will be set |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulAlgGetAttribute¶
cusparseStatus_t
cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t* handle,
const cusparseLtMatmulAlgSelection_t* algSelection,
cusparseLtMatmulAlgAttribute_t attribute,
void* data,
size_t dataSize)
The function returns the value of the queried attribute belonging to the algorithm selection descriptor.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | IN | Algorithm selection descriptor |
attribute | Host | IN | The attribute that will be retrieved by this function |
data | Host | OUT | Memory address containing the attribute value retrieved by this function |
dataSize | Host | IN | Size in bytes of the attribute value used for verification |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulGetWorkspace¶
cusparseStatus_t
cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t* handle,
const cusparseLtMatmulAlgSelection_t* algSelection,
size_t* workspaceSize)
The function determines the workspace size required by the selected algorithm.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
algSelection | Host | IN | Algorithm selection descriptor |
workspaceSize | Host | OUT | Workspace size in bytes |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulPlanInit¶
cusparseStatus_t
cusparseLtMatmulPlanInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulPlan_t* plan,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const cusparseLtMatmulAlgSelection_t* algSelection,
size_t workspaceSize)
The function initializes the matrix multiplication plan.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | OUT | Matrix multiplication plan |
matmulDescr | Host | IN | Matrix multiplication descriptor |
algSelection | Host | IN | Algorithm selection descriptor |
workspaceSize | Host | IN | Workspace size in bytes |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulPlanDestroy¶
cusparseStatus_t
cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)
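The function releases the resources held by a matrix multiplication plan. Calling any cusparseLt function which uses cusparseLtMatmulPlan_t after cusparseLtMatmulPlanDestroy() will return an error.
Parameter | Memory | In/Out | Description |
---|---|---|---|
plan | Host | IN | Matrix multiplication plan |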
See cusparseStatus_t for the description of the return status.
cusparseLtMatmul¶
cusparseStatus_t
cusparseLtMatmul(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
const void* alpha,
const void* d_A,
const void* d_B,
const void* beta,
const void* d_C,
void* d_D,
void* workspace,
cudaStream_t* streams,
int32_t numStreams)
The function computes the matrix multiplication of matrices A and B to produce the output matrix D, according to the following operation:
$$D = \alpha \, op(A) \cdot op(B) + \beta \, C$$
where A, B, and C are input matrices, and $\alpha$ and $\beta$ are input scalars.
Constraint: C == D
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
alpha | Host | IN | Scalar $\alpha$ used for multiplication |
d_A | Device | IN | Pointer to the structured matrix A |
d_B | Device | IN | Pointer to the dense matrix B |
beta | Host | IN | Scalar $\beta$ used for multiplication |
d_C | Device | IN | Pointer to the dense matrix C |
d_D | Device | OUT | Pointer to the dense matrix D |
workspace | Device | IN | Pointer to workspace |
streams | Host | IN | Pointer to CUDA stream array for the computation |
numStreams | Host | IN | Number of CUDA streams in streams |
Datatypes Supported:
Input |
Output |
Compute |
---|---|---|
|
|
|
|
|
|
|
|
The structured matrix A (compressed) must respect the following constraints:
For opA = CUSPARSE_OPERATION_NON_TRANSPOSE, each row must have at least two zero values every four elements
For opA = CUSPARSE_OPERATION_TRANSPOSE, each column must have at least two zero values every four elements
The correctness of the pruning result (matrix A) can be checked with the function cusparseLtSpMMAPruneCheck().
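The row-wise form of this constraint can be illustrated with a small host-side reference check. This is only a sketch, assuming a row-major float matrix whose column count is a multiple of four and reading the 2:4 pattern as "at most two non-zero values in every aligned group of four". The function name rows_are_2to4_sparse is hypothetical, and the code is not how cusparseLtSpMMAPruneCheck() is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every aligned group of four consecutive elements in each
   row contains at least two zeros (i.e. at most two non-zero values).
   Assumptions: row-major storage with leading dimension ld, cols % 4 == 0. */
bool rows_are_2to4_sparse(const float* a, size_t rows, size_t cols, size_t ld)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c + 4 <= cols; c += 4) {
            int zeros = 0;
            for (size_t k = 0; k < 4; ++k)
                zeros += (a[r * ld + c + k] == 0.0f);
            if (zeros < 2)
                return false; /* this group has three or four non-zero values */
        }
    }
    return true;
}
```

The column-wise case for a transposed operand would apply the same test to groups of four consecutive elements along each column.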
Limitations:
All pointers must be aligned to 16 bytes
For CUDA_R_16F and CUDA_R_16BF data types, the total size of the matrix cannot exceed […] elements
For CUDA_R_8I data type, the total size of the matrix cannot exceed […] elements
CUDA_R_8I data type only supports:
  opA/opB = TN if the matrix orders are orderA/orderB = Col/Col
  opA/opB = NT if the matrix orders are orderA/orderB = Row/Row
  opA/opB = NN if the matrix orders are orderA/orderB = Row/Col
  opA/opB = TT if the matrix orders are orderA/orderB = Col/Row
Given A of size […], B of size […], and C of size […] (regardless of opA, opB):
  […] must be a multiple of 32
  For CUDA_R_16F and CUDA_R_16BF data types, […] must be a multiple of 8
  For CUDA_R_8I, […] must be a multiple of 16
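Two of the limitations above lend themselves to a quick host-side pre-check before calling the library: the 16-byte pointer alignment and the four operation/order combinations listed for CUDA_R_8I. The sketch below encodes them with hypothetical stand-in enums (op_t, order_t) rather than the library's own cusparseOperation_t and cusparseOrder_t. It only restates the listed combinations and does not replace the library's own validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the cuSPARSE enum types. */
typedef enum { OP_N, OP_T } op_t;
typedef enum { ORDER_ROW, ORDER_COL } order_t;

/* True when the pointer value is a multiple of 16 bytes. */
bool is_aligned_16(const void* p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Encodes the four listed CUDA_R_8I combinations:
   TN with Col/Col, NT with Row/Row, NN with Row/Col, TT with Col/Row.
   Equivalently: A is transposed exactly when it is column-major, and
   B is transposed exactly when it is row-major. */
bool int8_layout_supported(op_t opA, op_t opB, order_t orderA, order_t orderB)
{
    bool a_ok = (opA == OP_T) == (orderA == ORDER_COL);
    bool b_ok = (opB == OP_T) == (orderB == ORDER_ROW);
    return a_ok && b_ok;
}
```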
Properties
The routine requires no extra storage
The routine supports asynchronous execution with respect to streams[0]
Provides deterministic (bit-wise) results for each run
cusparseLtMatmul supports the following optimizations:
CUDA graph capture
Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
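When validating results from cusparseLtMatmul, a plain host-side reference is often handy. The sketch below assumes the operation has the usual GEMM form D = alpha * A * B + beta * C, with both operands non-transposed, row-major storage without padding, and float data. The function name reference_matmul is hypothetical and the code says nothing about how the library computes the product.

```c
#include <stddef.h>

/* Host reference for D = alpha * A * B + beta * C.
   Assumptions: A is m x k, B is k x n, C and D are m x n, all row-major with
   no padding. d may point to the same buffer as c (in-place update), because
   each c element is read before the matching d element is written. */
void reference_matmul(size_t m, size_t n, size_t k,
                      float alpha, const float* a, const float* b,
                      float beta, const float* c, float* d)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            d[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Comparing against such a reference normally needs a tolerance for half-precision inputs, since accumulation order and precision differ from the GPU computation.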
cusparseLtMatmulSearch¶
cusparseStatus_t
cusparseLtMatmulSearch(const cusparseLtHandle_t* handle,
cusparseLtMatmulPlan_t* plan,
const void* alpha,
const void* d_A,
const void* d_B,
const void* beta,
const void* d_C,
void* d_D,
void* workspace,
cudaStream_t* streams,
int32_t numStreams)
The function evaluates the available algorithms for the matrix multiplication and updates the plan by selecting the fastest one. The functionality is intended to be used for auto-tuning purposes when the same operation is repeated multiple times over different inputs.
The function is NOT asynchronous with respect to streams[0] (blocking call)
The number of iterations for the evaluation can be set by using cusparseLtMatmulAlgSetAttribute() with CUSPARSELT_MATMUL_SEARCH_ITERATIONS.
The selected algorithm id can be retrieved by using cusparseLtMatmulAlgGetAttribute() with CUSPARSELT_MATMUL_ALG_CONFIG_ID.
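The auto-tuning idea behind this function can be sketched in a few lines of host code: run every candidate a fixed number of times, accumulate the elapsed time, and keep the fastest. Everything in the sketch is hypothetical (the callback type run_candidate_fn, select_fastest, and the table_lookup_timer stub that returns canned timings); it illustrates the selection loop only and is not the library's search procedure.

```c
#include <float.h>

/* Hypothetical callback: runs candidate alg_id once and returns the elapsed
   time in milliseconds. */
typedef double (*run_candidate_fn)(int alg_id, void* user_data);

/* Returns the id of the candidate with the smallest total time over
   `iterations` runs, or -1 when there are no candidates. */
int select_fastest(int num_candidates, int iterations,
                   run_candidate_fn run, void* user_data)
{
    int best = -1;
    double best_time = DBL_MAX;
    for (int alg = 0; alg < num_candidates; ++alg) {
        double total = 0.0;
        for (int it = 0; it < iterations; ++it)
            total += run(alg, user_data);
        if (total < best_time) {
            best_time = total;
            best = alg;
        }
    }
    return best;
}

/* Stub timer for demonstration: user_data points to an array of canned
   per-candidate timings instead of measuring real work. */
double table_lookup_timer(int alg_id, void* user_data)
{
    const double* timings = (const double*)user_data;
    return timings[alg_id];
}
```

Because every candidate must finish before its time is known, a search of this kind is inherently blocking, which matches the note that the call is not asynchronous.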
Helper Functions¶
cusparseLtSpMMAPrune¶
cusparseStatus_t
cusparseLtSpMMAPrune(const cusparseLtHandle_t* handle,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const void* d_in,
void* d_out,
cusparseLtPruneAlg_t pruneAlg,
cudaStream_t stream)
The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix multiplication descriptor |
d_in | Device | IN | Pointer to the dense matrix |
d_out | Device | OUT | Pointer to the pruned matrix |
pruneAlg | Host | IN | Pruning algorithm |
stream | Host | IN | CUDA stream for the computation |
Pruning Algorithms
Value | Description |
---|---|
CUSPARSELT_PRUNE_SPMMA_TILE | Zero-out eight values in a 4x4 tile to maximize the L1-norm of the resulting tile, under the constraint of selecting exactly two elements for each row and column |
CUSPARSELT_PRUNE_SPMMA_STRIP | Zero-out two values in a 1x4 strip to maximize the L1-norm of the resulting strip. The strip direction is chosen according to the operation |
Properties
The routine requires no extra storage
The routine supports asynchronous execution with respect to stream
Provides deterministic (bit-wise) results for each run
cusparseLtSpMMAPrune supports the following optimizations:
CUDA graph capture
Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
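The strip pruning rule described above (zero out two values in each 1x4 strip so that the L1-norm of what remains is maximal) amounts to keeping the two largest-magnitude values of each strip. The following host-side sketch shows that rule for strips laid along the rows of a row-major float matrix. The names magnitude and prune_strip_rows are hypothetical, ties are broken in favor of the earlier element as an arbitrary choice, and the code is an illustration rather than the library's implementation.

```c
#include <stddef.h>

/* Absolute value without pulling in the math library. */
static float magnitude(float x)
{
    return x < 0.0f ? -x : x;
}

/* For each aligned 1x4 strip along a row, keep the two largest-magnitude
   values and write zeros for the other two.
   Assumptions: row-major storage with leading dimension ld, cols % 4 == 0.
   out may be the same buffer as in. */
void prune_strip_rows(const float* in, float* out,
                      size_t rows, size_t cols, size_t ld)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c + 4 <= cols; c += 4) {
            const float* src = in + r * ld + c;
            float* dst = out + r * ld + c;
            /* first/second track the indices of the two largest magnitudes. */
            size_t first = 0, second = 1;
            if (magnitude(src[1]) > magnitude(src[0])) {
                first = 1;
                second = 0;
            }
            for (size_t k = 2; k < 4; ++k) {
                float v = magnitude(src[k]);
                if (v > magnitude(src[first])) {
                    second = first;
                    first = k;
                } else if (v > magnitude(src[second])) {
                    second = k;
                }
            }
            for (size_t k = 0; k < 4; ++k)
                dst[k] = (k == first || k == second) ? src[k] : 0.0f;
        }
    }
}
```

The tile variant is a harder selection problem, since the two-per-row and two-per-column constraints have to hold simultaneously across the 4x4 tile.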
cusparseLtSpMMAPruneCheck¶
cusparseStatus_t
cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t* handle,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const void* d_in,
int* valid,
cudaStream_t stream)
The function checks the correctness of the pruning structure for a given matrix.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix multiplication descriptor |
d_in | Device | IN | Pointer to the matrix to check |
valid | Host | OUT | Validation results |
stream | Host | IN | CUDA stream for the computation |
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMACompressedSize¶
cusparseStatus_t
cusparseLtSpMMACompressedSize(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
size_t* compressedSize)
The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress().
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
compressedSize | Host | OUT | Size in bytes of the compressed matrix |
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMACompress¶
cusparseStatus_t
cusparseLtSpMMACompress(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
const void* d_dense,
void* d_compressed,
cudaStream_t stream)
The function compresses a dense matrix d_dense. The compressed matrix is intended to be used as the first operand A in the cusparseLtMatmul() function.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
d_dense | Device | IN | Pointer to the dense matrix |
d_compressed | Device | OUT | Pointer to the compressed matrix |
stream | Host | IN | CUDA stream for the computation |
Properties
The routine requires no extra storage
The routine supports asynchronous execution with respect to stream
Provides deterministic (bit-wise) results for each run
cusparseLtSpMMACompress supports the following optimizations:
CUDA graph capture
Hardware Memory Compression
See cusparseStatus_t for the description of the return status.