Sparse Linear Algebra#

Overview#

Sparse tensors are vectors, matrices, and higher-dimensional generalizations thereof in which most entries are zero. Such tensors are crucial in fields such as scientific computing, signal processing, and deep learning, because exploiting sparsity saves storage, computation, and power. Sparse linear algebra refers to any linear algebra in which the vectors or matrices involved are sparse; higher-dimensional sparse tensors occur, for example, in deep learning.

The sparse linear algebra module nvmath.sparse in nvmath-python leverages various NVIDIA math libraries to support sparse [1] linear algebra computations. As of the current Beta release, we offer the Universal Sparse Tensor (UST) to simplify managing sparsity, as well as a specialized sparse direct solver API based on the cuDSS library. Distributed (multi-node multi-GPU) execution is not currently supported.

Universal Sparse Tensor#

The Universal Sparse Tensor (UST) simplifies managing sparsity by decoupling a tensor’s inherent sparsity from its memory storage representation. With roots in sparse compiler technology (see e.g. MT1 or TACO), the UST uses a tensor format DSL (Domain Specific Language) to describe how the sparse tensor should be represented and uses type polymorphism on a small set of base operations to define the vast space of instances for these operations. Developers merely focus on the sparsity of a tensor. Runtime inspection of the eventually chosen format decides whether to dispatch to a heavily optimized library or kernel, or to fill in the “holes” with automatic sparse code generation when no such solution exists yet.

The nvmath-python UST implementation provides interoperability with tensors of PyTorch, SciPy, CuPy, and NumPy. The interoperability is zero-cost, which means that viewing dense or sparse formats like COO, CSR, CSC, BSR, BSC, and DIA as a UST object or back is done without data movement or copying. Instead, the UST object references the storage buffers of the original data structure.
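
To see why such views can be zero-cost, the sketch below (plain SciPy and NumPy, no nvmath involved) shows the three storage buffers a CSR matrix already carries; a zero-cost UST view would simply reference these arrays rather than copy them.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small 3x4 matrix with 5 nonzeros.
dense = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0, 0.0],
                  [4.0, 5.0, 0.0, 0.0]])
a = csr_matrix(dense)

# CSR stores exactly three buffers; a UST view of `a` would
# reference them in place, without data movement or copying.
positions = a.indptr     # per-row start offsets: [0, 2, 3, 5]
coordinates = a.indices  # column coordinate of each stored nonzero
values = a.data          # the nonzero values themselves
```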

UST DSL#

The tensor format DSL of the Universal Sparse Tensor maps tensor dimensions (logical tensor indices) to storage levels (physical memory indices) through an invertible function, and defines how each level should be stored. For CSC, for example, the dimensions (i, j) map to the levels (j, i), with the outer column level stored dense and the inner row level stored compressed.

The DSL consists of:

  • An ordered sequence of dimension specifications, each of which includes:

    • a dimension expression, which provides a reference to each dimension

  • An ordered sequence of level specifications, each of which includes:

    • a level expression, which defines what is stored in each level

    • a level type, which defines how the level is stored, including:

      • a required level format

      • a collection of level properties

The following level formats are supported:

  • dense: the level is dense; entries along the level are stored contiguously, without any metadata, using indexing arithmetic only

  • batch: a variant of the dense format, indicating that any subsequent compression is not linearized

  • range: a variant of the dense format, restricting the range based on a compression expression in the previous level

  • compressed: the level is sparse; only the nonzeros along the level are stored, using two arrays at that level: positions and coordinates

  • singleton: a variant of the compressed format for when coordinates have no siblings, which means that the positions array can be omitted at that level

  • delta(b): a variant of the compressed format in which coordinates denote the distance to the next stored entry; zero padding is used when this distance does not fit in b bits
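
The singleton format is easiest to see in COO, where the outer row level is compressed (and nonunique, since a row coordinate repeats for each of its nonzeros) and the inner column level is singleton: every stored row coordinate has exactly one column coordinate child, so no positions array is needed at that level. The following pure-Python sketch, independent of the nvmath API, builds these coordinate arrays by hand.

```python
def to_coo(dense):
    """Build COO coordinate arrays from a dense list-of-lists matrix.

    Level 0 (rows): compressed, nonunique -- a row coordinate may repeat.
    Level 1 (cols): singleton -- exactly one child per parent coordinate,
    so a positions array at this level would be redundant.
    """
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                values.append(v)
    return rows, cols, values

rows, cols, values = to_coo([[1, 0, 2, 0],
                             [0, 0, 3, 0],
                             [4, 5, 0, 0]])
# rows   -> [0, 0, 1, 2, 2]   (nonunique: 0 and 2 repeat)
# cols   -> [0, 2, 2, 0, 1]   (one sibling per parent)
# values -> [1, 2, 3, 4, 5]
```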

Level formats have, at least, the following level properties:

  • non/unique : duplicates are (not) allowed at that level (unique by default)

  • un/ordered : coordinates (not) sorted at that level (ordered by default)

The UST type can easily define many common storage formats (such as dense vectors and matrices, sparse vectors, and sparse matrices in formats like COO, CSR, CSC, DCSR, DCSC, BSR, BSC, and DIA, with generalizations to sparse tensors), as well as many less common and rather novel storage formats, as demonstrated in the UST blog post.

In nvmath-python, we adopt a Pythonic way of presenting the DSL, trading some inspection performance for extreme runtime flexibility. The CSC format, for instance, is expressed as follows:

CSC = TensorFormat([i, j], {j: LevelFormat.DENSE, i: LevelFormat.COMPRESSED})

A major advantage of such objects is that everything can be constructed dynamically at runtime (including parsing from strings). Performing format-specific tasks, however, requires inspecting the actual contents at runtime. Since such decisions generally happen outside performance-critical paths, trading off for generality seems an acceptable choice.

UST DSL Grammar#

The grammar of the tensor format DSL in Backus-Naur Form is as follows:

<tensor_format> ::= ( <dim_specs> ) -> ( <lvl_specs> )

<dim_specs> ::= <empty>  | <dim_list>
<dim_list>  ::= <dim_spec> | <dim_spec> , <dim_list>

<dim_spec>  ::= <dim_expr>
<dim_expr>  ::= <dim_var>  # only simple for now
<dim_var>   ::= <id>       # e.g. i or d0

<lvl_specs> ::= <empty>  | <lvl_list>
<lvl_list>  ::= <lvl_spec> | <lvl_spec> , <lvl_list>

<lvl_spec>  ::= <lvl_expr> : <lvl_type>

<lvl_expr>  ::= <lvl_expr> + <lvl_expr>    | <lvl_expr> - <lvl_expr>  |
                <lvl_expr> / <lvl_expr>    | <lvl_expr> % <lvl_expr>  |
                <lvl_expr> count <lvl_expr> |
                <dim_var>                   |
                <const>  # e.g. 4 or 200

<lvl_type>   ::= <lvl_format> | <lvl_format> ( <lvl_props> )
<lvl_format> ::= dense | batch | range | compressed | singleton | delta(<const>)
<lvl_props>  ::= <lvl_prop> | <lvl_prop> , <lvl_props>
<lvl_prop>   ::= ordered | unordered | unique | nonunique

Even though the grammar allows many formats that are syntactically correct, only specific grammar instances actually make sense. For example, dimension variables may appear only once in the dimension specification (e.g. (i,i) -> ... is invalid), and level expressions must be either a single dimension variable (like i), a non-nested addition or subtraction of two such variables (like i-j), or a division or modulo operation with a constant (like i/4). Also, the mapping must always remain an invertible, one-to-one function between dimensions and levels. Lastly, not all property combinations make sense: dense formats, for example, cannot be nonunique or unordered. The UST constructor raises an error for tensor formats that violate these constraints.
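
To illustrate that formats can be parsed from strings at runtime, here is a minimal, hypothetical parser for a small fragment of the grammar above (level expressions restricted to single dimension variables, no level properties, no batch or delta formats). It is not the nvmath-python parser; it merely sketches the validity checks just described.

```python
import re

def parse_format(text):
    """Parse '(dims) -> (lvl: fmt, ...)' for a restricted DSL fragment."""
    m = re.fullmatch(r"\s*\((?P<dims>[^)]*)\)\s*->\s*\((?P<lvls>[^)]*)\)\s*",
                     text)
    if m is None:
        raise ValueError("not a tensor format")
    dims = [d.strip() for d in m.group("dims").split(",") if d.strip()]
    if len(set(dims)) != len(dims):
        raise ValueError("dimension variables must be distinct")
    levels, seen = [], set()
    for spec in m.group("lvls").split(","):
        var, _, fmt = spec.partition(":")
        var, fmt = var.strip(), fmt.strip()
        if var not in dims:
            raise ValueError(f"unknown dimension variable: {var}")
        if var in seen:  # invertibility: the mapping must be one-to-one
            raise ValueError("mapping must be one-to-one")
        if fmt not in ("dense", "compressed", "singleton", "range"):
            raise ValueError(f"unsupported level format: {fmt}")
        seen.add(var)
        levels.append((var, fmt))
    if seen != set(dims):
        raise ValueError("every dimension must map to a level")
    return dims, levels

# CSC: dense outer column level, compressed inner row level.
dims, levels = parse_format("(i, j) -> (j: dense, i: compressed)")
```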

Note

The grammar can be expanded in the future to further generalize the UST to include storage formats that cannot be expressed currently.

API Reference#

Universal Sparse Tensor (UST) APIs (nvmath.sparse.ust)#

Tensor(extents, *[, tensor_format, ...])

The universal sparse tensor binds extents, data type, index type, and so on, with a universal sparse tensor format object.

Dimension(*, dimension_name)

A dimension object encapsulates a dimension name.

LevelFormat(value)

LevelProperty(value)

LevelExpr(*, expression1, operator, expression2)

A (binary) level expression is a triple consisting of a dimension object or level expression, an operator, and a dimension object, level expression, or int.

TensorFormat(dimensions, levels, *[, name])

A universal sparse tensor format maps dimension specifications (dimensions for short) to level specifications (levels for short).

NamedFormats()

A number of pre-defined common tensor formats (direct or using builders).

UST Torch Interface (nvmath.sparse.ust.interfaces.torch_interface)#

TorchUST(shape, device, dtype, layout, ...)

This class wraps the universal sparse tensor as a torch.Tensor.

prune_model(model, *[, local, amount])

This is a convenience wrapper that uses torch.nn.utils.prune to prune the weights of all linear layers in a model, either locally (per layer) or globally (over all layers), by the given amount.

reformat_model(model, *[, func])

This function potentially converts the linear weights in a model into UST format.

Generic Linear Algebra APIs (nvmath.sparse)#

matmul_matrix_qualifiers_dtype

NumPy dtype object that encapsulates the matmul matrix qualifiers in nvmath.sparse.

compile_matmul_prolog(prolog, *, ...[, ...])

Compile a unary Python function with argument type dtype returning a value of type dtype to device code.

compile_matmul_epilog(epilog, *, dtype[, ...])

Compile a unary Python function with argument type dtype returning a value of type dtype to device code.

compile_matmul_add(add, *, dtype[, ...])

Compile a binary Python function with arguments of type dtype returning a value of type dtype to device code.

compile_matmul_atomic_add(add, *, dtype[, ...])

Compile a binary Python function with arguments of type dtype returning a value of type dtype to device code.

compile_matmul_mul(add, *, dtype[, ...])

Compile a binary Python function with arguments of type dtype returning a value of type dtype to device code.

Matmul(a, b, /[, c, alpha, beta, ...])

Create a stateful object encapsulating the specified matrix multiplication computation, which is one of \(epilog(\alpha \, op_h(a) \, @ \, op_h(b) + \beta \, c)\) or \(epilog(prolog_a(op_t(a)) \, @ \, prolog_b(op_t(b)) + prolog_c(c))\), along with the required resources to perform the operation.

matmul(a, b, /[, c, alpha, beta, ...])

Perform the specified sparse matrix multiplication computation, which is one of \(epilog(\alpha \, op_h(a) \, @ \, op_h(b) + \beta \, c)\) or \(epilog(prolog_a(op_t(a)) \, @ \, prolog_b(op_t(b)) + prolog_c(c))\).

ComputeType(value)

The subset of cudaDataType_t that is supported by the generic sparse APIs.

ExecutionCUDA(*[, device_id])

A data class for providing GPU execution options.

MatmulOptions([codegen, compute_type, ...])

A data class for providing options to the Matmul object and the wrapper function matmul().

Specialized Linear Algebra APIs (nvmath.sparse.advanced)#

direct_solver(a, b, /, *[, options, ...])

Solve \(a @ x = b\) for \(x\).

DirectSolver(a, b, *[, options, execution, ...])

Create a stateful object that encapsulates the specified sparse direct solver computations and required resources.

DirectSolverFactorizationConfig(solver)

An interface to configure nvmath.sparse.advanced.DirectSolver.factorize().

DirectSolverFactorizationInfo(solver)

An interface to query information returned by nvmath.sparse.advanced.DirectSolver.factorize().

DirectSolverPlanConfig(solver)

An interface to configure nvmath.sparse.advanced.DirectSolver.plan().

DirectSolverPlanInfo(solver)

An interface to query information returned by nvmath.sparse.advanced.DirectSolver.plan().

DirectSolverSolutionConfig(solver)

An interface to configure nvmath.sparse.advanced.DirectSolver.solve().

memory_estimates_dtype

NumPy dtype object that encapsulates the memory estimates in nvmath.sparse.advanced.

DirectSolverAlgType

alias of AlgType

DirectSolverMatrixType

alias of MatrixType

DirectSolverMatrixViewType

alias of MatrixViewType

DirectSolverOptions([sparse_system_type, ...])

A data class for providing options to the DirectSolver object and the wrapper function direct_solver().

ExecutionCUDA(device_id, ...)

A data class for providing GPU execution options to the DirectSolver object and the wrapper function direct_solver().

ExecutionHybrid([device_id, num_threads])

A data class for providing hybrid (CPU-GPU) execution options to the DirectSolver object and the wrapper function direct_solver().

HybridMemoryModeOptions([...])

A data class for providing options related to the use of hybrid (CPU-GPU) memory to those execution spaces that support it.

DirectSolverPlanPreferences(*[, ...])

A data class for providing plan preferences to the direct_solver() function.

DirectSolverFactorizationPreferences(*[, ...])

A data class for providing factorization preferences to the direct_solver() function.

DirectSolverSolutionPreferences(*[, ...])

A data class for providing solution preferences to the direct_solver() function.

Footnotes