Sparse Linear Algebra#
Overview#
Sparse tensors are vectors, matrices, and higher-dimensional generalizations thereof in which most entries are zero. Such tensors are crucial in fields like scientific computing, signal processing, and deep learning, because exploiting sparsity saves storage, computation, and power. Sparse linear algebra refers to any linear algebra in which the vectors or matrices involved are sparse. Higher-dimensional sparse tensors may also occur, for example in deep learning.
The sparse linear algebra module nvmath.sparse in nvmath-python leverages various NVIDIA math libraries to support sparse [1] linear algebra computations. As of the current Beta release, we offer the Universal Sparse Tensor (UST) to simplify managing sparsity, as well as a specialized sparse direct solver API based on the cuDSS library. Distributed (multi-node multi-GPU) execution is not currently supported.
Universal Sparse Tensor#
The Universal Sparse Tensor (UST) simplifies managing sparsity by decoupling a tensor’s inherent sparsity from its memory storage representation. With roots in sparse compiler technology (see e.g. MT1 or TACO), the UST uses a tensor format DSL (Domain Specific Language) to describe how the sparse tensor should be represented and uses type polymorphism on a small set of base operations to define the vast space of instances for these operations. Developers merely focus on the sparsity of a tensor. Runtime inspection of the eventually chosen format decides whether to dispatch to a heavily optimized library or kernel, or to fill in the “holes” with automatic sparse code generation when no such solution exists yet.
The nvmath-python UST implementation provides interoperability with tensors of PyTorch, SciPy, CuPy, and NumPy. The interoperability is zero-cost, which means that viewing dense or sparse formats like COO, CSR, CSC, BSR, BSC, and DIA as a UST object or back is done without data movement or copying. Instead, the UST object references the storage buffers of the original data structure.
UST DSL#
The tensor format DSL of the Universal Sparse Tensor maps tensor dimensions (logical tensor indices) to storage levels (physical memory indices) using an invertible function that defines how each level should be stored.
The DSL consists of:
- an ordered sequence of dimension specifications, each of which includes:
  - a dimension expression, which provides a reference to each dimension
- an ordered sequence of level specifications, each of which includes:
  - a level expression, which defines what is stored in each level
  - a level type, which defines how the level is stored, including:
    - a required level format
    - a collection of level properties
The following level formats are supported:
- dense: the level is dense; entries along the level are stored contiguously (linearized) without any metadata, using indexing arithmetic only
- batch: a variant of the dense format, indicating that any subsequent compression is not linearized
- range: a variant of the dense format, restricting the range based on a compression expression in the previous level
- compressed: the level is sparse; only nonzeros along the level are stored, with positions and coordinates kept in two arrays at that level
- singleton: a variant of the compressed format for when coordinates have no siblings, which means that the positions array can be omitted at that level
- delta(b): a variant of the compressed format where coordinates denote the distance to the next stored entry; zero padding is used when this distance exceeds b bits
Level formats have, at least, the following level properties:
- unique / nonunique: duplicates are not / are allowed at that level (unique by default)
- ordered / unordered: coordinates are / are not sorted at that level (ordered by default)
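To make the dense and compressed level formats concrete, the following standalone sketch (plain Python, independent of nvmath-python) builds the positions and coordinates arrays of a (i : dense, j : compressed) format, i.e. CSR, for a small matrix:

```python
# Standalone illustration (no nvmath-python required): build the per-level
# storage of a (i : dense, j : compressed) format, i.e. CSR. The dense level
# over i needs no metadata; the compressed level over j stores a positions
# array (where each row's nonzeros start/end) and a coordinates array
# (the column index of each stored nonzero).

matrix = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0],
]

positions = [0]     # row r's nonzeros live in [positions[r], positions[r+1])
coordinates = []    # column coordinate of each stored entry
values = []         # the stored nonzero values themselves

for row in matrix:
    for j, v in enumerate(row):
        if v != 0.0:
            coordinates.append(j)
            values.append(v)
    positions.append(len(coordinates))

print(positions)    # [0, 2, 2, 4]
print(coordinates)  # [0, 2, 1, 2]
print(values)       # [1.0, 2.0, 3.0, 4.0]
```

Because each row is scanned left to right and no column is visited twice, the compressed level produced here is both ordered and unique, the defaults listed above.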
The UST type can easily define many common storage formats (such as dense vectors and matrices, sparse vectors, sparse matrices in formats like COO, CSR, CSC, DCSR, DCSC, BSR, BSC, DIA, and with generalizations to sparse tensors), as well as many less common and rather novel storage formats, as was demonstrated in this UST blog posting.
In nvmath-python, we adopt a Pythonic way of presenting the DSL, trading some inspection performance for extreme runtime flexibility. The CSC format, for instance, is expressed as follows:
CSC = TensorFormat([i, j], {j: LevelFormat.DENSE, i: LevelFormat.COMPRESSED})
A major advantage of such objects is that everything can be constructed dynamically at runtime (including parsing from strings). Performing format-specific tasks, however, requires inspecting the actual contents at runtime. Since such decisions generally happen outside performance-critical paths, trading off for generality seems an acceptable choice.
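The TensorFormat and LevelFormat names appear in the CSC example above; the sketch below is a simplified, hypothetical stand-in for such objects, not the actual nvmath-python implementation (which is richer, covering level properties, builders, and validation):

```python
from enum import Enum

# Simplified, hypothetical stand-in for the Pythonic DSL objects; the real
# nvmath-python classes offer more (level properties, builders, validation).
class LevelFormat(Enum):
    DENSE = "dense"
    COMPRESSED = "compressed"
    SINGLETON = "singleton"

class TensorFormat:
    def __init__(self, dims, levels):
        self.dims = list(dims)        # ordered dimension specifications
        self.levels = dict(levels)    # level expression -> level format

    def __repr__(self):
        lvls = ", ".join(f"{d} : {f.value}" for d, f in self.levels.items())
        return f"({', '.join(self.dims)}) -> ({lvls})"

i, j = "i", "j"
CSR = TensorFormat([i, j], {i: LevelFormat.DENSE, j: LevelFormat.COMPRESSED})
CSC = TensorFormat([i, j], {j: LevelFormat.DENSE, i: LevelFormat.COMPRESSED})

print(CSR)  # (i, j) -> (i : dense, j : compressed)
```

Because these are ordinary Python objects, formats can be built programmatically or parsed from strings at runtime, which is exactly the flexibility the text above describes.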
UST DSL Grammar#
The grammar of the tensor format DSL in Backus-Naur Form is as follows:
<tensor_format> ::= ( <dim_specs> ) -> ( <lvl_specs> )
<dim_specs>     ::= <empty> | <dim_list>
<dim_list>      ::= <dim_spec> | <dim_spec> , <dim_list>
<dim_spec>      ::= <dim_expr>
<dim_expr>      ::= <dim_var>               # only simple for now
<dim_var>       ::= <id>                    # e.g. i or d0
<lvl_specs>     ::= <empty> | <lvl_list>
<lvl_list>      ::= <lvl_spec> | <lvl_spec> , <lvl_list>
<lvl_spec>      ::= <lvl_expr> : <lvl_type>
<lvl_expr>      ::= <lvl_expr> + <lvl_expr> | <lvl_expr> - <lvl_expr> |
                    <lvl_expr> / <lvl_expr> | <lvl_expr> % <lvl_expr> |
                    <lvl_expr> count <lvl_expr> |
                    <dim_var> |
                    <const>                 # e.g. 4 or 200
<lvl_type>      ::= <lvl_format> | <lvl_format> ( <lvl_props> )
<lvl_format>    ::= dense | batch | compressed | singleton | range | delta(<const>)
<lvl_props>     ::= <lvl_prop> | <lvl_prop> , <lvl_props>
<lvl_prop>      ::= ordered | unordered | unique | nonunique
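To make the grammar concrete, here are a few illustrative instances. The CSR and CSC spellings follow directly from the level formats introduced earlier; the COO and blocked spellings are our own constructions following common sparse-compiler conventions, not spellings taken verbatim from this documentation:

```
(i, j) -> (i : dense, j : compressed)                  # CSR
(i, j) -> (j : dense, i : compressed)                  # CSC
(i, j) -> (i : compressed(nonunique), j : singleton)   # COO
(i, j) -> (i/2 : dense, j/2 : compressed,
           i%2 : dense, j%2 : dense)                   # 2x2 blocked (BSR-style)
```

The blocked example exercises the div/mod level expressions with a constant that the grammar permits.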
Even though the grammar admits many syntactically correct formats, only specific instances actually make sense. For example, dimension variables may appear only once in the dimension specification (e.g. (i,i) -> ... is invalid), and level expressions must be either a single dimension variable (like i), a non-nested add/sub operation of two such variables (like i-j), or a div/mod operation with a constant (like i/4). Also, the mapping must always remain an invertible, one-to-one function between dimensions and levels. Lastly, not all property combinations make sense; for example, dense formats cannot be nonunique or unordered. The UST constructor raises an error for tensor formats that violate these constraints.
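The following hypothetical sketch illustrates the kind of validation described above; it is not the nvmath-python implementation (whose checks are more complete), and it covers only the simple case where each level expression is a single dimension variable:

```python
# Hypothetical sketch of the constraint checks described above; the real
# nvmath-python UST constructor performs its own, more complete validation.
# For simplicity, each level expression here is a single dimension variable.
def validate(dims, levels):
    # Dimension variables may appear only once in the dimension specification.
    if len(set(dims)) != len(dims):
        raise ValueError(f"duplicate dimension variable in ({', '.join(dims)})")
    # In this simple case, invertibility means the levels are exactly a
    # permutation of the dimensions: every dimension covered exactly once.
    if sorted(levels) != sorted(dims):
        raise ValueError("dimension-to-level mapping is not one-to-one")

validate(["i", "j"], ["j", "i"])       # CSC-style permutation: valid

try:
    validate(["i", "i"], ["i", "i"])   # (i, i) -> ... : invalid
except ValueError as e:
    print(e)  # duplicate dimension variable in (i, i)
```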
Note
The grammar can be expanded in the future to further generalize the UST to include storage formats that cannot be expressed currently.
API Reference#
Universal Sparse Tensor (UST) APIs (nvmath.sparse.ust)#
- The universal sparse tensor binds extents, data type, index type, etc. with a universal sparse tensor format object.
- A dimension object encapsulates a dimension name.
- A (binary) level expression is a triple: dimension object or level expression, operator, dimension or level expression or int.
- A universal sparse tensor format maps dimension specifications (dimensions for short) to level specifications (levels for short).
- A number of pre-defined common tensor formats (direct or using builders).
UST Torch Interface (nvmath.sparse.ust.interfaces.torch_interface)#
- This class wraps the universal sparse tensor as a
- This is a convenience wrapper that uses the framework
- This function potentially converts the linear weights in a model into UST format.
Generic Linear Algebra APIs (nvmath.sparse)#
- NumPy dtype object that encapsulates the matmul matrix qualifiers in sparse.
- Compile a unary Python function with argument type
- Compile a unary Python function with argument type
- Compile a binary Python function with arguments of type
- Compile a binary Python function with arguments of type
- Compile a binary Python function with arguments of type
- Create a stateful object encapsulating the specified matrix multiplication computation, which is one of \(epilog(\alpha \, op_h(a) \, @ \, op_h(b) + \beta \, c)\) or \(epilog(prolog_a(op_t(a)) \, @ \, prolog_b(op_t(b)) + prolog_c(c))\), along with the required resources to perform the operation.
- Perform the specified sparse matrix multiplication computation, which is one of \(epilog(\alpha \, op_h(a) \, @ \, op_h(b) + \beta \, c)\) or \(epilog(prolog_a(op_t(a)) \, @ \, prolog_b(op_t(b)) + prolog_c(c))\).
- The subset of
- A data class for providing GPU execution options.
- A data class for providing options to the
Specialized Linear Algebra APIs (nvmath.sparse.advanced)#
- Solve \(a @ x = b\) for \(x\).
- Create a stateful object that encapsulates the specified sparse direct solver computations and required resources.
- An interface to configure
- An interface to query information returned by
- An interface to configure
- An interface to query information returned by
- An interface to configure
- NumPy dtype object that encapsulates the memory estimates in sparse.advanced.
- alias of
- alias of
- alias of
- A data class for providing options to the
- A data class for providing GPU execution options to the
- A data class for providing hybrid (CPU-GPU) execution options to the
- A data class for providing options related to the use of hybrid (CPU-GPU) memory to those execution spaces that support it.
- A data class for providing plan preferences to the
- A data class for providing factorization preferences to the
- A data class for providing solution preferences to the
Footnotes