cuSPARSELt Functions
Library Management Functions
cusparseLtInit
cusparseStatus_t
cusparseLtInit(cusparseLtHandle_t* handle)
The function initializes the cuSPARSELt library handle (cusparseLtHandle_t), which holds the cuSPARSELt library context. It allocates light hardware resources on the host and must be called before any other cuSPARSELt library call. Calling any cuSPARSELt function that uses cusparseLtHandle_t without a previous call to cusparseLtInit() returns an error.

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | OUT | cuSPARSELt library handle |
See cusparseStatus_t for the description of the return status.
cusparseLtDestroy
cusparseStatus_t
cusparseLtDestroy(const cusparseLtHandle_t* handle)
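The function releases the resources associated with the cuSPARSELt library handle. Calling any cuSPARSELt function that uses cusparseLtHandle_t after cusparseLtDestroy() returns an error.

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |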
See cusparseStatus_t for the description of the return status.
cusparseLtGetVersion
cusparseStatus_t
cusparseLtGetVersion(const cusparseLtHandle_t* handle,
int* version)
The function returns the version number of the cuSPARSELt library.

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
version | Host | OUT | Version number of the library |
See cusparseStatus_t for the description of the return status.
cusparseLtGetProperty
cusparseStatus_t
cusparseLtGetProperty(libraryPropertyType propertyType,
int* value)
The function returns the value of the requested property. See libraryPropertyType for the supported types.

Parameter | Memory | In/Out | Description |
---|---|---|---|
propertyType | Host | IN | Requested property |
value | Host | OUT | Value of the requested property |

libraryPropertyType (defined in library_types.h):

Value | Meaning |
---|---|
MAJOR_VERSION | Enumerator to query the major version |
MINOR_VERSION | Enumerator to query the minor version |
PATCH_LEVEL | Number to identify the patch level |
See cusparseStatus_t for the description of the return status.
Matrix Descriptor Functions
cusparseLtDenseDescriptorInit
cusparseStatus_t
cusparseLtDenseDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatDescriptor_t* matDescr,
int64_t rows,
int64_t cols,
int64_t ld,
uint32_t alignment,
cudaDataType valueType,
cusparseOrder_t order)
The function initializes the descriptor of a dense matrix.
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matDescr | Host | OUT | Dense matrix descriptor | |
rows | Host | IN | Number of rows | |
cols | Host | IN | Number of columns | |
ld | Host | IN | Leading dimension | ≥ rows if column-major, ≥ cols if row-major |
alignment | Host | IN | Memory alignment in bytes | Multiple of 16 |
valueType | Host | IN | Data type of the matrix | |
order | Host | IN | Memory layout | CUSPARSE_ORDER_COL, CUSPARSE_ORDER_ROW |

Constraints: rows, cols, and ld must be a multiple of:
- 16 if valueType is CUDA_R_8I, CUDA_R_8F_E4M3, or CUDA_R_8F_E5M2
- 8 if valueType is CUDA_R_16F or CUDA_R_16BF
- 4 if valueType is CUDA_R_32F
See cusparseStatus_t for the description of the return status.
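The constraints above can be checked on the host before calling cusparseLtDenseDescriptorInit(). The following C sketch is illustrative only: elem_type, dense_multiple(), dense_desc_ok(), and elem_offset() are hypothetical helpers, not cuSPARSELt API, and elem_type stands in for cudaDataType. It relies on the observation that the required multiple is always 16 bytes' worth of elements, and it also spells out how the leading dimension addresses element (i, j) in each layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the cudaDataType values listed above. */
typedef enum {
    ELEM_8I, ELEM_8F_E4M3, ELEM_8F_E5M2, /* 1-byte types */
    ELEM_16F, ELEM_16BF,                 /* 2-byte types */
    ELEM_32F                             /* 4-byte type  */
} elem_type;

/* Element size in bytes (0 for an unknown enumerator). */
static int elem_bytes(elem_type t) {
    switch (t) {
    case ELEM_8I: case ELEM_8F_E4M3: case ELEM_8F_E5M2: return 1;
    case ELEM_16F: case ELEM_16BF: return 2;
    case ELEM_32F: return 4;
    }
    return 0;
}

/* Required multiple for rows, cols, and ld of a dense matrix:
   16 bytes / element size, i.e. 16, 8, or 4 elements. */
static int64_t dense_multiple(elem_type t) {
    int b = elem_bytes(t);
    return b ? 16 / b : 0;
}

/* True if the arguments satisfy the documented dense-descriptor rules. */
static bool dense_desc_ok(int64_t rows, int64_t cols, int64_t ld,
                          uint32_t alignment, elem_type t, bool row_major) {
    int64_t m = dense_multiple(t);
    if (m == 0) return false;
    if (rows % m != 0 || cols % m != 0 || ld % m != 0) return false;
    if (alignment % 16 != 0) return false;
    /* ld >= cols for row-major, ld >= rows for column-major */
    return row_major ? ld >= cols : ld >= rows;
}

/* Offset, in elements, of element (i, j) for the given layout. */
static int64_t elem_offset(int64_t i, int64_t j, int64_t ld, bool row_major) {
    return row_major ? i * ld + j : i + j * ld;
}
```

For example, a 64 x 32 CUDA_R_16F column-major matrix with ld = 64 and 16-byte alignment passes, while the same matrix with ld = 32 fails because ld must be at least rows in column-major order.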
cusparseLtStructuredDescriptorInit
cusparseStatus_t
cusparseLtStructuredDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatDescriptor_t* matDescr,
int64_t rows,
int64_t cols,
int64_t ld,
uint32_t alignment,
cudaDataType valueType,
cusparseOrder_t order,
cusparseLtSparsity_t sparsity)
The function initializes the descriptor of a structured matrix.
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matDescr | Host | OUT | Sparse matrix descriptor | |
rows | Host | IN | Number of rows | |
cols | Host | IN | Number of columns | |
ld | Host | IN | Leading dimension | ≥ rows if column-major, ≥ cols if row-major |
alignment | Host | IN | Memory alignment in bytes | Multiple of 16 |
valueType | Host | IN | Data type of the matrix | |
order | Host | IN | Memory layout | CUSPARSE_ORDER_COL, CUSPARSE_ORDER_ROW |
sparsity | Host | IN | Matrix sparsity ratio | CUSPARSELT_SPARSITY_50_PERCENT |

Constraints: rows, cols, and ld must be a multiple of:
- 32 if valueType is CUDA_R_8I, CUDA_R_8F_E4M3, or CUDA_R_8F_E5M2
- 16 if valueType is CUDA_R_16F or CUDA_R_16BF
- 8 if valueType is CUDA_R_32F
See cusparseStatus_t for the description of the return status.
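For structured matrices the required multiples are twice the dense ones, which amounts to 32 bytes' worth of elements. A minimal sketch of the host-side check, keyed by element size in bytes; structured_multiple() and structured_desc_ok() are hypothetical helpers, not part of cuSPARSELt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required multiple for rows, cols, and ld of a structured matrix:
   32 for 1-byte types (CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2),
   16 for 2-byte types (CUDA_R_16F, CUDA_R_16BF), 8 for CUDA_R_32F.
   Returns 0 for any other element size. */
static int64_t structured_multiple(int elem_bytes) {
    switch (elem_bytes) {
    case 1: return 32;
    case 2: return 16;
    case 4: return 8;
    default: return 0;
    }
}

/* True if the arguments satisfy the documented structured-descriptor rules. */
static bool structured_desc_ok(int64_t rows, int64_t cols, int64_t ld,
                               uint32_t alignment, int elem_bytes,
                               bool row_major) {
    int64_t m = structured_multiple(elem_bytes);
    if (m == 0) return false;
    if (rows % m != 0 || cols % m != 0 || ld % m != 0) return false;
    if (alignment % 16 != 0) return false;
    return row_major ? ld >= cols : ld >= rows;
}
```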
cusparseLtMatDescriptorDestroy
cusparseStatus_t
cusparseLtMatDescriptorDestroy(const cusparseLtMatDescriptor_t* matDescr)
The function releases the resources associated with a matrix descriptor.

Parameter | Memory | In/Out | Description |
---|---|---|---|
matDescr | Host | IN | Matrix descriptor |
See cusparseStatus_t for the description of the return status.
cusparseLtMatDescSetAttribute
cusparseStatus_t
cusparseLtMatDescSetAttribute(const cusparseLtHandle_t* handle,
cusparseLtMatDescriptor_t* matmulDescr,
cusparseLtMatDescAttribute_t matAttribute,
const void* data,
size_t dataSize)
The function sets the value of the specified attribute of a matrix descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | OUT | Matrix descriptor | |
matAttribute | Host | IN | Attribute to set | |
data | Host | IN | Pointer to the value to which the specified attribute will be set | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |
See cusparseStatus_t for the description of the return status.
cusparseLtMatDescGetAttribute
cusparseStatus_t
cusparseLtMatDescGetAttribute(const cusparseLtHandle_t* handle,
const cusparseLtMatDescriptor_t* matmulDescr,
cusparseLtMatDescAttribute_t matAttribute,
void* data,
size_t dataSize)
The function retrieves the value of the specified attribute of a matrix descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | IN | Matrix descriptor | |
matAttribute | Host | IN | Attribute to retrieve | |
data | Host | OUT | Memory address where this function stores the retrieved attribute value | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |
See cusparseStatus_t for the description of the return status.
Matmul Descriptor Functions
cusparseLtMatmulDescriptorInit
cusparseStatus_t
cusparseLtMatmulDescriptorInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulDescriptor_t* matmulDescr,
cusparseOperation_t opA,
cusparseOperation_t opB,
const cusparseLtMatDescriptor_t* matA,
const cusparseLtMatDescriptor_t* matB,
const cusparseLtMatDescriptor_t* matC,
const cusparseLtMatDescriptor_t* matD,
cusparseComputeType computeType)
The function initializes the matrix multiplication descriptor.
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | OUT | Matrix multiplication descriptor | |
opA | Host | IN | Operation applied to the matrix A | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE |
opB | Host | IN | Operation applied to the matrix B | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE |
matA | Host | IN | Structured or dense matrix descriptor A | |
matB | Host | IN | Structured or dense matrix descriptor B | |
matC | Host | IN | Dense matrix descriptor C | |
matD | Host | IN | Dense matrix descriptor D | |
computeType | Host | IN | Compute precision | |
The structured matrix descriptor can be used for matA or matB, but not both.

Constraints:
- See cusparseLtMatmul() for the supported data types.
- The CUDA_R_8I, CUDA_R_8F_E4M3, and CUDA_R_8F_E5M2 data types support only the following combinations when A is structured (the opposite holds if B is structured):
  - opA/opB = TN if the matrix orders are orderA/orderB = Col/Col
  - opA/opB = NT if the matrix orders are orderA/orderB = Row/Row
  - opA/opB = NN if the matrix orders are orderA/orderB = Row/Col
  - opA/opB = TT if the matrix orders are orderA/orderB = Col/Row
- C and D must have the same leading dimension and memory layout (see cusparseOrder_t for the different memory layouts).
- The maximum number of elements in each dimension (rows and columns) of matrices C and D is 2097120.
See cusparseStatus_t for the description of the return status.
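Reading the four allowed combinations for the 8-bit types as effective layouts suggests a single rule: in every case op(A) is laid out row-major and op(B) column-major. The C sketch below encodes that reading for the case where A is structured; op_kind, order_kind, and the helper functions are hypothetical stand-ins for cusparseOperation_t and cusparseOrder_t, and the B-structured case is not modeled. It also checks the 2097120 limit on the dimensions of C and D.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for cusparseOperation_t and cusparseOrder_t. */
typedef enum { OP_N, OP_T } op_kind;
typedef enum { ORDER_COL, ORDER_ROW } order_kind;

/* True if op(X) is stored row-major: a row-major X that is not
   transposed, or a column-major X that is transposed. */
static bool effective_row_major(op_kind op, order_kind order) {
    return (order == ORDER_ROW) != (op == OP_T);
}

/* Col/Col -> TN, Row/Row -> NT, Row/Col -> NN, Col/Row -> TT all reduce
   to: op(A) effectively row-major and op(B) effectively column-major. */
static bool int8_ops_allowed(op_kind opA, op_kind opB,
                             order_kind orderA, order_kind orderB) {
    return effective_row_major(opA, orderA) &&
           !effective_row_major(opB, orderB);
}

/* Documented upper bound on the rows and columns of C and D. */
#define MAX_CD_DIM 2097120

static bool cd_dims_ok(int64_t rows, int64_t cols) {
    return rows <= MAX_CD_DIM && cols <= MAX_CD_DIM;
}
```

Because op is fully determined by the order under this rule, each (orderA, orderB) pair admits exactly one (opA, opB) pair, matching the list above.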
cusparseLtMatmulDescSetAttribute
cusparseStatus_t
cusparseLtMatmulDescSetAttribute(const cusparseLtHandle_t* handle,
cusparseLtMatmulDescriptor_t* matmulDescr,
cusparseLtMatmulDescAttribute_t matmulAttribute,
const void* data,
size_t dataSize)
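The function sets the value of the specified attribute of a matrix multiplication descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | OUT | Matrix multiplication descriptor | |
matmulAttribute | Host | IN | Attribute to set | |
data | Host | IN | Pointer to the value to which the specified attribute will be set | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |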
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulDescGetAttribute
cusparseStatus_t
cusparseLtMatmulDescGetAttribute(const cusparseLtHandle_t* handle,
const cusparseLtMatmulDescriptor_t* matmulDescr,
cusparseLtMatmulDescAttribute_t matmulAttribute,
void* data,
size_t dataSize)
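The function retrieves the value of the specified attribute of a matrix multiplication descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | IN | Matrix multiplication descriptor | |
matmulAttribute | Host | IN | Attribute to retrieve | |
data | Host | OUT | Memory address where this function stores the retrieved attribute value | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |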
See cusparseStatus_t for the description of the return status.
Matmul Algorithm Functions
cusparseLtMatmulAlgSelectionInit
cusparseStatus_t
cusparseLtMatmulAlgSelectionInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulAlgSelection_t* algSelection,
const cusparseLtMatmulDescriptor_t* matmulDescr,
cusparseLtMatmulAlg_t alg)
The function initializes the algorithm selection descriptor.
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
algSelection | Host | OUT | Algorithm selection descriptor | |
matmulDescr | Host | IN | Matrix multiplication descriptor | |
alg | Host | IN | Algorithm mode | CUSPARSELT_MATMUL_ALG_DEFAULT |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulAlgSetAttribute
cusparseStatus_t
cusparseLtMatmulAlgSetAttribute(const cusparseLtHandle_t* handle,
cusparseLtMatmulAlgSelection_t* algSelection,
cusparseLtMatmulAlgAttribute_t attribute,
const void* data,
size_t dataSize)
The function sets the value of the specified attribute of the algorithm selection descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
algSelection | Host | OUT | Algorithm selection descriptor | |
attribute | Host | IN | Attribute to set | |
data | Host | IN | Pointer to the value to which the specified attribute will be set | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulAlgGetAttribute
cusparseStatus_t
cusparseLtMatmulAlgGetAttribute(const cusparseLtHandle_t* handle,
const cusparseLtMatmulAlgSelection_t* algSelection,
cusparseLtMatmulAlgAttribute_t attribute,
void* data,
size_t dataSize)
The function returns the value of the queried attribute of the algorithm selection descriptor.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
algSelection | Host | IN | Algorithm selection descriptor | |
attribute | Host | IN | Attribute to retrieve | |
data | Host | OUT | Memory address where this function stores the retrieved attribute value | |
dataSize | Host | IN | Size in bytes of the attribute value, used for verification | |
See cusparseStatus_t for the description of the return status.
Matmul Functions
cusparseLtMatmulGetWorkspace
cusparseStatus_t
cusparseLtMatmulGetWorkspace(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
size_t* workspaceSize)
The function determines the workspace size required by the selected algorithm.

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
workspaceSize | Host | OUT | Workspace size in bytes |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulPlanInit
cusparseStatus_t
cusparseLtMatmulPlanInit(const cusparseLtHandle_t* handle,
cusparseLtMatmulPlan_t* plan,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const cusparseLtMatmulAlgSelection_t* algSelection)
The function initializes the matrix multiplication plan.

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | OUT | Matrix multiplication plan |
matmulDescr | Host | IN | Matrix multiplication descriptor |
algSelection | Host | IN | Algorithm selection descriptor |
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulPlanDestroy
cusparseStatus_t
cusparseLtMatmulPlanDestroy(const cusparseLtMatmulPlan_t* plan)
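The function releases the resources associated with a matrix multiplication plan. Calling any cuSPARSELt function that uses cusparseLtMatmulPlan_t after cusparseLtMatmulPlanDestroy() returns an error.

Parameter | Memory | In/Out | Description |
---|---|---|---|
plan | Host | IN | Matrix multiplication plan |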
See cusparseStatus_t for the description of the return status.
cusparseLtMatmul
cusparseStatus_t
cusparseLtMatmul(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
const void* alpha,
const void* d_A,
const void* d_B,
const void* beta,
const void* d_C,
void* d_D,
void* workspace,
cudaStream_t* streams,
int32_t numStreams)
The function computes the matrix multiplication of matrices A and B to produce the output matrix D, according to the following operation:

$$D = \alpha \, \mathrm{op}(A) \cdot \mathrm{op}(B) + \beta \, C$$

where A, B, and C are input matrices, and α and β are input scalars or vectors of scalars (device-side pointers). The structured sparse matrix pointer (d_A or d_B) must be the output of cusparseLtSpMMACompress() or cusparseLtSpMMACompress2(). D has the same shape as C.
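To validate results on small problems, it can help to evaluate the same formula on the host. The C sketch below is a hypothetical reference helper, not part of cuSPARSELt; it assumes CUDA_R_32F data, scalar α and β, op(A) = A and op(B) = B, and row-major storage with tightly packed rows. The pruned A is kept in its dense (uncompressed) form, so its zeros simply contribute nothing. Small differences from the GPU result are expected because the accumulation order differs.

```c
#include <stddef.h>

/* Host reference for D = alpha * A * B + beta * C (no transposition).
   A is m x k, B is k x n, C and D are m x n; all row-major with the
   leading dimension equal to the number of columns. */
static void ref_matmul(int m, int n, int k, float alpha, const float *A,
                       const float *B, float beta, const float *C,
                       float *D) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            /* dot product of row i of A with column j of B */
            for (int p = 0; p < k; ++p)
                acc += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            D[(size_t)i * n + j] = alpha * acc + beta * C[(size_t)i * n + j];
        }
    }
}
```

With the 2:4-pruned row A = [1 0 0 2], B = [1 2 3 4] as a column, α = 2, β = 1, and C = 10, the helper produces D = 2 · 9 + 10 = 28.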
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
alpha | Host/Device | IN | Scalar or vector of scalars used for multiplication (α) |
d_A | Device | IN | Pointer to the structured or dense matrix A |
d_B | Device | IN | Pointer to the structured or dense matrix B |
beta | Host/Device | IN | Scalar or vector of scalars used for multiplication (β) |
d_C | Device | IN | Pointer to the dense matrix C |
d_D | Device | OUT | Pointer to the dense matrix D |
workspace | Device | IN | Pointer to workspace |
streams | Host | IN | Pointer to the CUDA stream array for the computation |
numStreams | Host | IN | Number of CUDA streams in streams |
Supported data types:

Input A/B | Input C | Output D | Compute |
---|---|---|---|
For a detailed list of which GPU compute capabilities support which data type combinations, see Key Features.
Constraints:
- All pointers must be aligned to 16 bytes.
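A quick host-side check for the alignment constraint, as a sketch; is_aligned16() is a hypothetical helper, not part of cuSPARSELt. Pointers returned by cudaMalloc() are aligned to at least 256 bytes, so the check matters mostly when passing an offset into a larger allocation, such as a sub-matrix.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p lies on a 16-byte boundary (low four address bits are zero). */
static bool is_aligned16(const void *p) {
    return ((uintptr_t)p & (uintptr_t)15) == 0;
}
```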
Properties:
- The routine requires no extra storage.
- The routine supports asynchronous execution with respect to streams[0].
- The routine provides deterministic (bit-wise) results for each run.

cusparseLtMatmul supports the following optimizations:
- CUDA graph capture
- Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
cusparseLtMatmulSearch
cusparseStatus_t
cusparseLtMatmulSearch(const cusparseLtHandle_t* handle,
cusparseLtMatmulPlan_t* plan,
const void* alpha,
const void* d_A,
const void* d_B,
const void* beta,
const void* d_C,
void* d_D,
void* workspace,
cudaStream_t* streams,
int32_t numStreams)
The function evaluates the available algorithms for the matrix multiplication described by plan and automatically updates the cusparseLtMatmulAlgSelection_t used to initialize plan by selecting the fastest one. It is intended for auto-tuning when the same operation is repeated many times over different inputs.

If the operation is performed in place (d_C = d_D), the values of d_D may accumulate across the evaluation runs.

The function is NOT asynchronous with respect to streams[0] (it is a blocking call). The number of iterations for the evaluation can be set with cusparseLtMatmulAlgSetAttribute() and CUSPARSELT_MATMUL_SEARCH_ITERATIONS. The selected algorithm ID can be retrieved with cusparseLtMatmulAlgGetAttribute() and CUSPARSELT_MATMUL_ALG_CONFIG_ID. The function also searches for the optimal combination of Split-K parameters; the selected values can be retrieved with cusparseLtMatmulAlgGetAttribute().
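Conceptually, this kind of search is exhaustive auto-tuning: time each candidate over several iterations and keep the fastest. The C sketch below illustrates only that idea and is not cuSPARSELt's implementation: time_fn, pick_fastest(), and table_cost() are hypothetical, the candidates are plain integer IDs, and the Split-K search is ignored. In the real API, the candidate evaluation and the iteration count (CUSPARSELT_MATMUL_SEARCH_ITERATIONS) are handled inside cusparseLtMatmulSearch().

```c
#include <assert.h>
#include <float.h>

/* Hypothetical timing callback: returns the measured time of one run
   of candidate `config_id`. */
typedef float (*time_fn)(int config_id, void *ctx);

/* Times each of `num_configs` candidates `iterations` times and returns
   the ID with the lowest average time. */
static int pick_fastest(int num_configs, int iterations, time_fn run,
                        void *ctx) {
    assert(num_configs > 0 && iterations > 0);
    int best_id = 0;
    float best_avg = FLT_MAX;
    for (int id = 0; id < num_configs; ++id) {
        float total = 0.0f;
        for (int it = 0; it < iterations; ++it)
            total += run(id, ctx);
        float avg = total / (float)iterations;
        if (avg < best_avg) { /* strictly faster replaces the best */
            best_avg = avg;
            best_id = id;
        }
    }
    return best_id;
}

/* Example callback for testing: ctx points to a table of fixed costs. */
static float table_cost(int config_id, void *ctx) {
    return ((const float *)ctx)[config_id];
}
```

Ties keep the lower ID, because only a strictly smaller average replaces the current best.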
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | OUT | Matrix multiplication plan |
alpha | Host | IN | Scalar or vector of scalars used for multiplication (α) |
d_A | Device | IN | Pointer to the structured or dense matrix A |
d_B | Device | IN | Pointer to the structured or dense matrix B |
beta | Host | IN | Scalar or vector of scalars used for multiplication (β) |
d_C | Device | IN | Pointer to the dense matrix C |
d_D | Device | OUT | Pointer to the dense matrix D |
workspace | Device | IN | Pointer to workspace |
streams | Host | IN | Pointer to the CUDA stream array for the computation |
numStreams | Host | IN | Number of CUDA streams in streams |
Helper Functions
cusparseLtSpMMAPrune
cusparseStatus_t
cusparseLtSpMMAPrune(const cusparseLtHandle_t* handle,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const void* d_in,
void* d_out,
cusparseLtPruneAlg_t pruneAlg,
cudaStream_t stream)
The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.
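As an illustration of what pruning produces, the following C sketch applies magnitude-based 2:4 pruning along rows on the host: in every group of four consecutive elements it keeps the two largest in absolute value and zeroes the rest. This is an assumption-labeled example, not the algorithm behind CUSPARSELT_PRUNE_SPMMA_STRIP, whose exact selection rule is not described here. prune_2of4_rows() is hypothetical; it uses float storage for simplicity even though the 2:4 pattern applies to the 16-bit and 8-bit types (CUDA_R_32F uses 1:2), and its output matches the row-wise pattern expected for a non-transposed structured matrix.

```c
#include <assert.h>
#include <stddef.h>

static float magnitude(float x) { return x < 0.0f ? -x : x; }

/* In-place 2:4 pruning of a row-major matrix with leading dimension ld:
   in each group of four consecutive elements of a row, keep the two
   largest magnitudes and zero the other two. cols must be a multiple of 4. */
static void prune_2of4_rows(float *M, int rows, int cols, int ld) {
    assert(cols % 4 == 0);
    for (int i = 0; i < rows; ++i) {
        for (int g = 0; g < cols; g += 4) {
            float *v = M + (size_t)i * ld + g;
            /* a = index of the largest magnitude, b = second largest */
            int a = 0, b = 1;
            if (magnitude(v[1]) > magnitude(v[0])) { a = 1; b = 0; }
            for (int t = 2; t < 4; ++t) {
                if (magnitude(v[t]) > magnitude(v[a])) { b = a; a = t; }
                else if (magnitude(v[t]) > magnitude(v[b])) { b = t; }
            }
            /* zero the two elements that were not selected */
            for (int t = 0; t < 4; ++t)
                if (t != a && t != b) v[t] = 0.0f;
        }
    }
}
```

For the row [1, -5, 3, 2] the sketch keeps -5 and 3 and produces [0, -5, 3, 0].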
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
matmulDescr | Host | IN | Matrix multiplication descriptor | |
d_in | Device | IN | Pointer to the dense matrix | |
d_out | Device | OUT | Pointer to the pruned matrix | |
pruneAlg | Host | IN | Pruning algorithm | CUSPARSELT_PRUNE_SPMMA_TILE, CUSPARSELT_PRUNE_SPMMA_STRIP |
stream | Host | IN | CUDA stream for the computation | |

Properties:
- The routine requires no extra storage.
- The routine supports asynchronous execution with respect to stream.
- The routine provides deterministic (bit-wise) results for each run.

cusparseLtSpMMAPrune() supports the following optimizations:
- CUDA graph capture
- Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMAPrune2 [DEPRECATED]
cusparseStatus_t
cusparseLtSpMMAPrune2(const cusparseLtHandle_t* handle,
const cusparseLtMatDescriptor_t* sparseMatDescr,
int isSparseA,
cusparseOperation_t op,
const void* d_in,
void* d_out,
cusparseLtPruneAlg_t pruneAlg,
cudaStream_t stream);
The function prunes a dense matrix d_in according to the specified algorithm pruneAlg.

Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor | |
isSparseA | Host | IN | Specifies whether the structured (sparse) matrix is the first operand (A) or the second operand (B) | |
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE |
d_in | Device | IN | Pointer to the dense matrix | |
d_out | Device | OUT | Pointer to the pruned matrix | |
pruneAlg | Host | IN | Pruning algorithm | CUSPARSELT_PRUNE_SPMMA_TILE, CUSPARSELT_PRUNE_SPMMA_STRIP |
stream | Host | IN | CUDA stream for the computation | |
If CUSPARSELT_PRUNE_SPMMA_TILE is used, isSparseA and op are not relevant.

The function has the same properties as cusparseLtSpMMAPrune().
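One way to see why isSparseA and op may not matter for tile pruning: if every 4x4 tile has at most two non-zero values in each row and in each column, the 2:4 constraint holds both row-wise and column-wise, so the result is valid whether or not the matrix is transposed. The C sketch below checks that property for a single tile. It assumes this reading of CUSPARSELT_PRUNE_SPMMA_TILE rather than quoting the library's definition, and tile_ok_both_ways() is a hypothetical helper.

```c
#include <stdbool.h>

/* True if the 4x4 tile starting at `tile` (row-major, leading dimension
   ld) has at most two non-zero values in every row and every column. */
static bool tile_ok_both_ways(const float *tile, int ld) {
    for (int r = 0; r < 4; ++r) {
        int row_nz = 0, col_nz = 0;
        for (int c = 0; c < 4; ++c) {
            row_nz += (tile[r * ld + c] != 0.0f); /* walk row r */
            col_nz += (tile[c * ld + r] != 0.0f); /* walk column r */
        }
        if (row_nz > 2 || col_nz > 2) return false;
    }
    return true;
}
```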
cusparseLtSpMMAPruneCheck
cusparseStatus_t
cusparseLtSpMMAPruneCheck(const cusparseLtHandle_t* handle,
const cusparseLtMatmulDescriptor_t* matmulDescr,
const void* d_in,
int* d_valid,
cudaStream_t stream)
The function checks the correctness of the pruning structure for a given matrix. Data pruned with cusparseLtSpMMAPrune() is guaranteed to be correct and this function can be skipped.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
matmulDescr | Host | IN | Matrix multiplication descriptor |
d_in | Device | IN | Pointer to the matrix to check |
d_valid | Device | OUT | Validation result (0 if the matrix is correctly pruned, nonzero otherwise) |
stream | Host | IN | CUDA stream for the computation |
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMAPruneCheck2 [DEPRECATED]
cusparseStatus_t
cusparseLtSpMMAPruneCheck2(const cusparseLtHandle_t* handle,
const cusparseLtMatDescriptor_t* sparseMatDescr,
int isSparseA,
cusparseOperation_t op,
const void* d_in,
int* d_valid,
cudaStream_t stream)
The function checks the correctness of the pruning structure for a given matrix.
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor | |
isSparseA | Host | IN | Specifies whether the structured (sparse) matrix is the first operand (A) or the second operand (B) | |
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE |
d_in | Device | IN | Pointer to the matrix to check | |
d_valid | Device | OUT | Validation result (0 if the matrix is correctly pruned, nonzero otherwise) | |
stream | Host | IN | CUDA stream for the computation | |

The function has the same properties as cusparseLtSpMMAPruneCheck().
cusparseLtSpMMACompressedSize
cusparseStatus_t
cusparseLtSpMMACompressedSize(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
size_t* compressedSize,
size_t* compressBufferSize)
The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress().
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
compressedSize | Host | OUT | Size in bytes of the compressed matrix |
compressBufferSize | Host | OUT | Size in bytes of the buffer needed for the matrix compression |
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMACompressedSize2 [DEPRECATED]
cusparseStatus_t
cusparseLtSpMMACompressedSize2(const cusparseLtHandle_t* handle,
const cusparseLtMatDescriptor_t* sparseMatDescr,
size_t* compressedSize,
size_t* compressBufferSize)
The function provides the size of the compressed matrix to be allocated before calling cusparseLtSpMMACompress2(). It must be called after cusparseLtMatmulPlanInit().

Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor |
compressedSize | Host | OUT | Size in bytes of the compressed matrix |
compressBufferSize | Host | OUT | Size in bytes of the buffer needed for the matrix compression |

The function has the same properties as cusparseLtSpMMACompressedSize().
cusparseLtSpMMACompress
cusparseStatus_t
cusparseLtSpMMACompress(const cusparseLtHandle_t* handle,
const cusparseLtMatmulPlan_t* plan,
const void* d_dense,
void* d_compressed,
void* d_compressed_buffer,
cudaStream_t stream)
The function compresses a dense matrix d_dense. The compressed matrix is intended to be used as the first or second operand (A or B) in cusparseLtMatmul() or cusparseLtMatmulSearch().
The input matrix d_dense must be pruned either with cusparseLtSpMMAPrune() or with a custom function. The pruned data must respect the following constraints, which depend on the operation applied to this matrix in cusparseLtMatmul(), as defined by the cusparseLtMatmulDescriptor_t created with cusparseLtMatmulDescriptorInit():

- For op = CUSPARSE_OPERATION_NON_TRANSPOSE:
  - CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2: each row must have at most two non-zero values in every group of four elements
  - CUDA_R_32F: each row must have at most one non-zero value in every group of two elements
- For op = CUSPARSE_OPERATION_TRANSPOSE:
  - CUDA_R_16F, CUDA_R_16BF, CUDA_R_8I, CUDA_R_8F_E4M3, CUDA_R_8F_E5M2: each column must have at most two non-zero values in every group of four elements
  - CUDA_R_32F: each column must have at most one non-zero value in every group of two elements
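The constraints above can be verified on the host with a small N-of-M check, similar in purpose to what cusparseLtSpMMAPruneCheck() does on the device. The C sketch below is hypothetical: check_n_of_m() is not part of cuSPARSELt, it reads float data regardless of the actual valueType, and it assumes row-major storage with leading dimension ld. Pass group = 4 and max_nz = 2 for the 16-bit and 8-bit types, or group = 2 and max_nz = 1 for CUDA_R_32F; along_rows selects the row-wise (non-transposed) or column-wise (transposed) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every aligned group of `group` consecutive elements along each
   row (along_rows = true) or each column (along_rows = false) holds at
   most `max_nz` non-zero values. M is row-major with leading dimension ld. */
static bool check_n_of_m(const float *M, int rows, int cols, int ld,
                         int group, int max_nz, bool along_rows) {
    int lines = along_rows ? rows : cols; /* number of rows or columns */
    int len = along_rows ? cols : rows;   /* elements in each of them */
    assert(group > 0 && len % group == 0);
    for (int l = 0; l < lines; ++l) {
        for (int g = 0; g < len; g += group) {
            int nz = 0;
            for (int t = g; t < g + group; ++t) {
                size_t idx = along_rows ? (size_t)l * ld + t
                                        : (size_t)t * ld + l;
                nz += (M[idx] != 0.0f);
            }
            if (nz > max_nz) return false;
        }
    }
    return true;
}
```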
int8, e4m3, and e5m2 kernels should run at high SM clocks to maximize performance.

The correctness of the pruning result (matrix A or B) can be checked with cusparseLtSpMMAPruneCheck(). Pruning with cusparseLtSpMMAPrune() is guaranteed to be correct.
Parameter | Memory | In/Out | Description |
---|---|---|---|
handle | Host | IN | cuSPARSELt library handle |
plan | Host | IN | Matrix multiplication plan |
d_dense | Device | IN | Pointer to the dense matrix |
d_compressed | Device | OUT | Pointer to the compressed matrix |
d_compressed_buffer | Device | OUT | Pointer to the temporary buffer for the compression |
stream | Host | IN | CUDA stream for the computation |

Properties:
- The routine supports asynchronous execution with respect to stream.
- The routine provides deterministic (bit-wise) results for each run.

cusparseLtSpMMACompress() must be called again each time the algorithm ID is updated with cusparseLtMatmulAlgSetAttribute().

cusparseLtSpMMACompress() supports the following optimizations:
- CUDA graph capture
- Hardware Memory Compression
See cusparseStatus_t for the description of the return status.
cusparseLtSpMMACompress2 [DEPRECATED]
cusparseStatus_t
cusparseLtSpMMACompress2(const cusparseLtHandle_t* handle,
const cusparseLtMatDescriptor_t* sparseMatDescr,
int isSparseA,
cusparseOperation_t op,
const void* d_dense,
void* d_compressed,
void* d_compressed_buffer,
cudaStream_t stream)
Parameter | Memory | In/Out | Description | Possible Values |
---|---|---|---|---|
handle | Host | IN | cuSPARSELt library handle | |
sparseMatDescr | Host | IN | Structured (sparse) matrix descriptor | |
isSparseA | Host | IN | Specifies whether the structured (sparse) matrix is the first operand (A) or the second operand (B) | |
op | Host | IN | Operation that will be applied to the structured (sparse) matrix in the multiplication | CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE |
d_dense | Device | IN | Pointer to the dense matrix | |
d_compressed | Device | OUT | Pointer to the compressed matrix | |
d_compressed_buffer | Device | OUT | Pointer to the temporary buffer for the compression | |
stream | Host | IN | CUDA stream for the computation | |

The function has the same properties as cusparseLtSpMMACompress().