Dense Arrays
Most NVIDIA cuVS APIs operate on dense vectors and matrices: datasets, queries, labels, distances, neighbors, centroids, and intermediate buffers. Each language binding exposes those arrays in the form that is most natural for that language, while the lower-level NVIDIA cuVS C API uses DLPack-compatible tensor metadata to describe the same shape, dtype, device, and layout. For APIs that explicitly accept sparse inputs, see Sparse Arrays.
Most APIs expect row-major matrices. A dataset is usually shaped n_rows x n_features, a query matrix is shaped n_queries x n_features, and top-k outputs are shaped n_queries x k.
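To make the row-major convention concrete, here is a minimal host-side sketch that uses only the Python standard library, so it runs without a GPU and calls no cuVS API. The helper `row_major_offset` and the sample sizes are hypothetical names chosen for illustration. The point it shows is that element (i, j) of an n_rows x n_features matrix sits at flat offset i * n_features + j.

```python
from array import array

n_rows, n_features = 3, 4  # hypothetical sizes for illustration

# Flat buffer of C floats holding the matrix contents in row-major (C) order.
flat = array("f", range(n_rows * n_features))

def row_major_offset(row, col, n_cols):
    """Flat index of element (row, col) in a row-major matrix."""
    return row * n_cols + col

# Two-dimensional view over the same bytes. Casting through bytes is how
# memoryview attaches a shape to a one-dimensional buffer.
matrix = memoryview(flat).cast("B").cast("f", shape=[n_rows, n_features])

# Each row is one vector; the features of a vector are adjacent in memory.
second_vector = matrix.tolist()[1]
assert matrix[1, 2] == flat[row_major_offset(1, 2, n_features)]
print(matrix.shape, matrix.c_contiguous, second_vector)
```

The same offset rule applies to query matrices (n_queries x n_features) and top-k outputs (n_queries x k).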
NVIDIA cuVS uses RMM for arrays allocated in device-accessible memory, including device memory, managed memory, and pinned host memory. See Memory Management for allocator configuration and memory-resource guidance.
Common types and interoperability
Allocating dense device arrays
The examples below allocate a row-major n_rows x n_features float32 matrix that can be used as a dense NVIDIA cuVS input. Python has several interoperability paths: CuPy arrays can be passed directly, PyTorch and TensorFlow tensors can be shared through DLPack, and NumPy arrays are host arrays that need to be copied to device memory before GPU-only calls.
C
C++
Python: CuPy
Python: PyTorch
Python: TensorFlow
Python: DLPack-compatible
Python: NumPy
Java
Rust
Go
Using dense arrays in cuVS APIs
The examples below all pass a two-dimensional dataset into the same kind of NVIDIA cuVS operation. The array container changes by language, but the logical shape is the same: rows are vectors and columns are features.
C
C++
Python
Java
Rust
Go
Passing outputs
Many NVIDIA cuVS APIs allocate outputs for the caller in higher-level bindings and require explicit output arrays in lower-level bindings. Output arrays should have the expected shape before the API call.
The Java binding returns results as SearchResults, so this section does not include a Java explicit-output example.
C
C++
Python
Rust
Go
Dense arrays in C++
If you use NVIDIA cuVS from C++, dense inputs and outputs are usually described with RAFT multi-dimensional array types. These types make it clear whether the memory is on the host or device, what shape it has, and whether the NVIDIA cuVS call can write to it.
There are two main families:
- Non-owning views, such as raft::device_matrix_view, raft::device_vector_view, raft::host_matrix_view, raft::pinned_matrix_view, raft::managed_mdspan, and raft::span. These wrap memory owned somewhere else.
- Owning arrays, such as raft::device_matrix, raft::device_vector, raft::host_matrix, raft::managed_matrix, and raft::pinned_matrix. These allocate and free their own storage.
Use views when your data already exists in memory. Use owning arrays when you want RAFT to allocate storage for an input, output, or staging buffer.
Creating non-owning views
Use device views when memory already exists on the GPU. Use host views for CPU-resident memory. A view stores a pointer, extents, layout, and accessor, but it does not own the allocation.
Device
Host
Use const element types for read-only inputs. This documents intent and lets C++ APIs reject accidental writes at compile time.
Use raft::make_const_mdspan() when you already have a mutable view or owning array and need to pass it to an API as read-only.
Creating owning arrays
Use device_matrix and device_vector when RAFT should allocate GPU memory. Use managed arrays when both host and device code need to access the same allocation through CUDA Unified Memory. Device, managed, and pinned arrays work with the memory policies described in Memory Management.
Device
Host
Managed
Pinned Host
An owning mdarray should outlive any view created from it. Passing dataset.view() to an API does not transfer ownership.
Creating one-dimensional spans
Use raft::span for simple one-dimensional buffers when the shape does not need matrix or vector metadata. For NVIDIA cuVS public APIs, prefer device_vector_view or host_vector_view when the memory space matters.
Prefer device_vector and host_vector types over span. Vector aliases provide the best API consistency with higher-dimensional RAFT array abstractions.
span is one-dimensional and does not encode row count, column count, layout, or memory space. Use it for lightweight buffer utilities, not for matrix-shaped NVIDIA cuVS inputs.
How dense arrays work
mdspan and span are views. They do not allocate memory and do not free memory. They only describe existing memory.
mdarray is an owning container. It allocates memory in a specific memory space and releases that memory when the object is destroyed. The .view() method returns an mdspan that refers to the same allocation.
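The owner-versus-view relationship can be illustrated on the host with the Python standard library. This is an analogy, not RAFT code: `array` stands in for an owning mdarray and `memoryview` for an mdspan, and the variable names are hypothetical. One difference to keep in mind is that Python keeps the buffer alive while a view exists, whereas a C++ view that outlives its allocation is simply invalid.

```python
from array import array

# Owning container: allocates storage and frees it when no longer referenced.
dataset = array("f", [0.0] * 6)

# Non-owning view: describes the same bytes as a 2 x 3 row-major matrix.
view = memoryview(dataset).cast("B").cast("f", shape=[2, 3])

# Writing through the view changes the owner's storage; nothing was copied.
view[1, 0] = 5.0
assert dataset[3] == 5.0

# The view spans exactly the bytes that the owner allocated.
assert view.nbytes == len(dataset) * dataset.itemsize
```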
The most important properties are:
- Memory space: device, host, managed, or pinned host.
- Shape: vector length, matrix rows and columns, or higher-dimensional extents.
- Layout: usually raft::row_major for NVIDIA cuVS matrices unless an API explicitly requests another layout.
- Constness: read-only inputs should use const element types or raft::make_const_mdspan().
- Lifetime: views must not outlive the memory they describe.
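As a rough host-side illustration of these properties, the sketch below inspects shape, element type, layout, and constness on a standard-library `memoryview`. The `describe` helper is hypothetical and is not part of cuVS or RAFT. Memory space has no equivalent here because the buffer always lives on the host, and the write is rejected at run time rather than at compile time as it would be in C++.

```python
from array import array

def describe(view):
    """Collect the properties a dense-array API typically checks."""
    return {
        "shape": view.shape,
        "dtype": view.format,          # struct format code; "f" is a C float
        "row_major": view.c_contiguous,
        "read_only": view.readonly,
    }

storage = array("f", [1.0, 2.0, 3.0, 4.0])
mutable = memoryview(storage).cast("B").cast("f", shape=[2, 2])
read_only = mutable.toreadonly()  # requires Python 3.8 or newer

# A read-only view refuses writes, so the input cannot be modified by accident.
try:
    read_only[0, 0] = 9.0
    write_rejected = False
except TypeError:
    write_rejected = True

info = describe(read_only)
```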
Choosing C++ array types
Using arrays safely
Check the API page for the expected shape, dtype, layout, and memory location before passing an array. Most NVIDIA cuVS matrices are row-major unless the API says otherwise.
Keep the backing allocation alive for as long as a view is used. A view does not own memory, so destroying the original allocation makes the view invalid.
Allocate output arrays with the exact shape requested by the API when the binding requires explicit outputs. For top-k search, that usually means n_queries x k arrays for neighbors and distances.
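The shape rule for top-k outputs can be sketched on the host as follows. Both helpers are hypothetical and use only the Python standard library. The element types (64-bit integer indices, C float distances) are assumptions for illustration, so check the API page for the types a given index actually expects.

```python
from array import array

def allocate_topk_outputs(n_queries, k):
    """Allocate zero-filled neighbors and distances shaped n_queries x k."""
    if n_queries <= 0 or k <= 0:
        raise ValueError("n_queries and k must be positive")
    shape = [n_queries, k]
    # Assumed dtypes: 64-bit integer indices and C float distances.
    neighbors = array("q", [0]) * (n_queries * k)
    distances = array("f", [0.0]) * (n_queries * k)
    return (
        memoryview(neighbors).cast("B").cast("q", shape=shape),
        memoryview(distances).cast("B").cast("f", shape=shape),
    )

def check_topk_shape(output, n_queries, k):
    """Raise ValueError unless output is a row-major n_queries x k array."""
    if output.shape != (n_queries, k) or not output.c_contiguous:
        raise ValueError(
            f"expected row-major {n_queries} x {k}, got {output.shape}"
        )

neighbors, distances = allocate_topk_outputs(n_queries=5, k=10)
check_topk_shape(neighbors, 5, 10)
check_topk_shape(distances, 5, 10)
```

Validating the shape before the call turns a silent out-of-bounds write into an early, readable error.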
Synchronize the stream before reading results on the host. Many NVIDIA cuVS operations enqueue GPU work asynchronously on the stream owned by raft::device_resources, so outputs may not be ready when the call returns.
Use pinned host arrays when the performance of repeated host-device transfers matters. Ordinary host arrays are simpler and are usually the right choice for CPU-only data.