Dense Arrays

Most NVIDIA cuVS APIs operate on dense vectors and matrices: datasets, queries, labels, distances, neighbors, centroids, and intermediate buffers. Each language binding exposes those arrays in the form that is most natural for that language, while the lower-level NVIDIA cuVS C API uses DLPack-compatible tensor metadata to describe the same shape, dtype, device, and layout. For APIs that explicitly accept sparse inputs, see Sparse Arrays.

Most APIs expect row-major matrices. A dataset is usually shaped n_rows x n_features, a query matrix is shaped n_queries x n_features, and top-k outputs are shaped n_queries x k.
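
As a rough illustration of what row-major means for these shapes, the sketch below computes the flat offset of element (row, col) in an n_rows x n_features matrix. It uses only the C++ standard library. The helper names row_major_offset and make_example_matrix are hypothetical and are not part of NVIDIA cuVS or RAFT.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: flat offset of element (row, col) in a row-major
// matrix with n_features columns. The features of one vector are adjacent
// in memory, and consecutive vectors follow one another.
inline std::size_t row_major_offset(std::int64_t row,
                                    std::int64_t col,
                                    std::int64_t n_features)
{
  return static_cast<std::size_t>(row) * static_cast<std::size_t>(n_features) +
         static_cast<std::size_t>(col);
}

// Hypothetical helper: build a small n_rows x n_features host matrix where
// element (r, c) holds r * 10 + c, so the layout is easy to inspect.
inline std::vector<float> make_example_matrix(std::int64_t n_rows,
                                              std::int64_t n_features)
{
  std::vector<float> data(static_cast<std::size_t>(n_rows) *
                          static_cast<std::size_t>(n_features));
  for (std::int64_t r = 0; r < n_rows; ++r) {
    for (std::int64_t c = 0; c < n_features; ++c) {
      data[row_major_offset(r, c, n_features)] = static_cast<float>(r * 10 + c);
    }
  }
  return data;
}
```

With n_features = 4, element (1, 2) sits at offset 1 * 4 + 2 = 6, directly after the four features of row 0 and the first two features of row 1.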

NVIDIA cuVS uses RMM for arrays allocated in device-accessible memory, including device memory, managed memory, and pinned host memory. See Memory Management for allocator configuration and memory-resource guidance.

Common types and interoperability

| Language | Common array type | What to watch |
| --- | --- | --- |
| C | DLManagedTensor* | The tensor must describe shape, dtype, device, and layout. The caller owns the memory and lifetime. |
| C++ | RAFT mdspan views and mdarray containers | C++ APIs usually accept non-owning views. Owning arrays are convenient when C++ should allocate storage. |
| Python | CuPy, NumPy, PyTorch, TensorFlow, DLPack-compatible, or other array-interface objects | GPU APIs commonly expect CUDA array interface inputs; some build or extend paths can accept host arrays. |
| Java | CuVSMatrix | Prefer CuVSMatrix for matrix-shaped inputs so large datasets can live outside the Java heap. |
| Rust | ndarray arrays and ManagedTensor | Host arrays can be wrapped and copied to device memory before device-only calls. |
| Go | cuvs.Tensor[T] | Tensors wrap DLPack metadata and can be copied between host and device with resource-aware helpers. |

Use the specific API page to confirm whether a given argument must be on the GPU or can be in host memory. Search inputs and outputs usually need device memory. Build inputs vary by algorithm and language binding.

Allocating dense device arrays

The examples below allocate a row-major n_rows x n_features float32 matrix that can be used as a dense NVIDIA cuVS input. Python has several interoperability paths: CuPy arrays can be passed directly, PyTorch and TensorFlow tensors can be shared through DLPack, and NumPy arrays are host arrays that need to be copied to device memory before GPU-only calls.

    #include <cuvs/core/c_api.h>
    #include <dlpack/dlpack.h>

    #include <stddef.h>
    #include <stdint.h>

    void allocate_device_matrix(int64_t n_rows, int64_t n_features)
    {
      cuvsResources_t res;
      void* data       = NULL;
      int64_t shape[2] = {n_rows, n_features};
      size_t bytes     = (size_t)n_rows * (size_t)n_features * sizeof(float);

      cuvsResourcesCreate(&res);
      cuvsRMMAlloc(res, &data, bytes);

      DLManagedTensor dataset       = {0};
      dataset.dl_tensor.data        = data;
      dataset.dl_tensor.device      = (DLDevice){kDLCUDA, 0};
      dataset.dl_tensor.ndim        = 2;
      dataset.dl_tensor.dtype       = (DLDataType){kDLFloat, 32, 1};
      dataset.dl_tensor.shape       = shape;
      dataset.dl_tensor.strides     = NULL;
      dataset.dl_tensor.byte_offset = 0;

      // Pass &dataset to NVIDIA cuVS C APIs while shape and data remain alive.

      cuvsRMMFree(res, data, bytes);
      cuvsResourcesDestroy(res);
    }

Using dense arrays in cuVS APIs

The examples below all pass a two-dimensional dataset into the same kind of NVIDIA cuVS operation. The array container changes by language, but the logical shape is the same: rows are vectors and columns are features.

    #include <cuvs/core/c_api.h>
    #include <cuvs/neighbors/brute_force.h>
    #include <dlpack/dlpack.h>

    cuvsResources_t res;
    cuvsBruteForceIndex_t index;
    DLManagedTensor dataset;

    // load_dataset is a placeholder for application code that fills in the
    // tensor so that it describes a dense matrix with shape [n_rows, n_features].
    load_dataset(&dataset);

    cuvsResourcesCreate(&res);
    cuvsBruteForceIndexCreate(&index);

    cuvsBruteForceBuild(res, &dataset, L2Expanded, 0.0f, index);

    cuvsBruteForceIndexDestroy(index);
    cuvsResourcesDestroy(res);

Passing outputs

Many NVIDIA cuVS APIs allocate outputs for the caller in higher-level bindings and require explicit output arrays in lower-level bindings. Output arrays should have the expected shape before the API call.

The Java search APIs currently allocate output storage inside the binding and return SearchResults, so this section does not include a Java explicit-output example.

    #include <cuvs/core/c_api.h>
    #include <cuvs/neighbors/brute_force.h>
    #include <dlpack/dlpack.h>

    #include <stddef.h>
    #include <stdint.h>

    void search_with_outputs(cuvsResources_t res,
                             cuvsBruteForceIndex_t index,
                             DLManagedTensor* queries,
                             int64_t n_queries,
                             int64_t k)
    {
      void* neighbors_data    = NULL;
      void* distances_data    = NULL;
      int64_t output_shape[2] = {n_queries, k};
      size_t neighbors_bytes  = (size_t)n_queries * (size_t)k * sizeof(int64_t);
      size_t distances_bytes  = (size_t)n_queries * (size_t)k * sizeof(float);

      cuvsRMMAlloc(res, &neighbors_data, neighbors_bytes);
      cuvsRMMAlloc(res, &distances_data, distances_bytes);

      DLManagedTensor neighbors  = {0};
      neighbors.dl_tensor.data   = neighbors_data;
      neighbors.dl_tensor.device = (DLDevice){kDLCUDA, 0};
      neighbors.dl_tensor.ndim   = 2;
      neighbors.dl_tensor.dtype  = (DLDataType){kDLInt, 64, 1};
      neighbors.dl_tensor.shape  = output_shape;

      DLManagedTensor distances  = {0};
      distances.dl_tensor.data   = distances_data;
      distances.dl_tensor.device = (DLDevice){kDLCUDA, 0};
      distances.dl_tensor.ndim   = 2;
      distances.dl_tensor.dtype  = (DLDataType){kDLFloat, 32, 1};
      distances.dl_tensor.shape  = output_shape;

      cuvsFilter no_filter = {.addr = 0, .type = NO_FILTER};

      cuvsBruteForceSearch(
        res, index, queries, &neighbors, &distances, no_filter);

      // Copy or consume neighbors_data and distances_data before freeing them.

      cuvsRMMFree(res, distances_data, distances_bytes);
      cuvsRMMFree(res, neighbors_data, neighbors_bytes);
    }

Dense arrays in C++

If you use NVIDIA cuVS from C++, dense inputs and outputs are usually described with RAFT multi-dimensional array types. These types make it clear whether the memory is on the host or device, what shape it has, and whether the NVIDIA cuVS call can write to it.

There are two main families:

  • Non-owning views, such as raft::device_matrix_view, raft::device_vector_view, raft::host_matrix_view, raft::pinned_matrix_view, raft::managed_mdspan, and raft::span. These wrap memory owned somewhere else.
  • Owning arrays, such as raft::device_matrix, raft::device_vector, raft::host_matrix, raft::managed_matrix, and raft::pinned_matrix. These allocate and free their own storage.

Use views when your data already exists in memory. Use owning arrays when you want RAFT to allocate storage for an input, output, or staging buffer.

Creating non-owning views

Use device views when memory already exists on the GPU. Use host views for CPU-resident memory. A view stores a pointer, extents, layout, and accessor, but it does not own the allocation.

    #include <raft/core/device_mdspan.hpp>

    #include <cstdint>

    void use_existing_device_buffers(float const* dataset_ptr,
                                     int64_t* labels_ptr,
                                     int64_t n_rows,
                                     int64_t n_features)
    {
      auto dataset = raft::make_device_matrix_view<const float, int64_t, raft::row_major>(
        dataset_ptr, n_rows, n_features);

      auto labels = raft::make_device_vector_view<int64_t, int64_t>(
        labels_ptr, n_rows);

      // dataset is a non-owning read-only matrix view.
      // labels is a non-owning mutable vector view.
    }

Use const element types for read-only inputs. This documents intent and lets C++ APIs reject accidental writes at compile time.
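
To make the compile-time rejection concrete, here is a minimal sketch that uses only the C++ standard library. The matrix_view_sketch template and sum_all function are hypothetical stand-ins, not RAFT types; they only mirror the idea that constness lives in the view's element type.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a non-owning row-major matrix view.
// It is NOT a RAFT type.
template <typename T>
struct matrix_view_sketch {
  T* data;
  std::int64_t n_rows;
  std::int64_t n_cols;

  // Element access returns T&, so a view of const float yields const float&.
  T& operator()(std::int64_t r, std::int64_t c) const { return data[r * n_cols + c]; }
};

// Reads every element; the signature accepts only a read-only view.
inline float sum_all(matrix_view_sketch<const float> m)
{
  float total = 0.0f;
  for (std::int64_t r = 0; r < m.n_rows; ++r) {
    for (std::int64_t c = 0; c < m.n_cols; ++c) { total += m(r, c); }
  }
  return total;
}

// A mutable view yields assignable references. A const view does not, so a
// statement such as `read_only(0, 0) = 1.0f;` fails to compile.
static_assert(
  std::is_assignable<decltype(std::declval<matrix_view_sketch<float>>()(0, 0)), float>::value,
  "elements of a mutable view can be assigned");
static_assert(
  !std::is_assignable<decltype(std::declval<matrix_view_sketch<const float>>()(0, 0)), float>::value,
  "elements of a const view cannot be assigned");
```

The same reasoning applies to the RAFT view above: because dataset is declared with const float elements, code that tries to write through it is rejected by the compiler rather than failing at run time.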

Use raft::make_const_mdspan() when you already have a mutable view or owning array and need to pass it to an API as read-only:

    #include <raft/core/device_mdarray.hpp>
    #include <raft/core/device_resources.hpp>  // declares raft::device_resources
    #include <raft/core/mdspan.hpp>

    #include <cstdint>

    void use_read_only_view(raft::device_resources const& res,
                            int64_t n_rows,
                            int64_t n_features)
    {
      // Owning device allocation with mutable float elements.
      auto dataset = raft::make_device_matrix<float, int64_t>(
        res, n_rows, n_features);

      auto mutable_view   = dataset.view();
      auto read_only_view = raft::make_const_mdspan(mutable_view);

      // Pass read_only_view to APIs that should not modify the dataset.
    }

Creating owning arrays

Use device_matrix and device_vector when RAFT should allocate GPU memory. Use managed arrays when both host and device code need to access the same allocation through CUDA Unified Memory. Device, managed, and pinned arrays work with the memory policies described in Memory Management.

    #include <raft/core/device_mdarray.hpp>
    #include <raft/core/device_resources.hpp>  // declares raft::device_resources

    #include <cstdint>

    void allocate_device_arrays(raft::device_resources const& res,
                                int64_t n_rows,
                                int64_t n_features,
                                int64_t k)
    {
      // Each make_device_matrix call allocates device memory owned by the
      // returned array.
      auto dataset = raft::make_device_matrix<float, int64_t>(
        res, n_rows, n_features);

      auto neighbors = raft::make_device_matrix<int64_t, int64_t>(
        res, n_rows, k);

      auto distances = raft::make_device_matrix<float, int64_t>(
        res, n_rows, k);

      // Views refer to the same allocations and do not copy data.
      auto dataset_view   = dataset.view();
      auto neighbors_view = neighbors.view();
      auto distances_view = distances.view();
    }

An owning mdarray should outlive any view created from it. Passing dataset.view() to an API does not transfer ownership.

Creating one-dimensional spans

Use raft::span for simple one-dimensional buffers when the shape does not need matrix or vector metadata. For NVIDIA cuVS public APIs, prefer device_vector_view or host_vector_view when the memory space matters.

For most one-dimensional NVIDIA cuVS API arguments, prefer the device_vector and host_vector types over span. The vector aliases are the most consistent with the higher-dimensional RAFT array abstractions.

    #include <raft/core/host_span.hpp>

    #include <cstddef>
    #include <cstdint>

    void normalize_ids(int64_t* ids, std::size_t n_ids)
    {
      // raft::host_span<T> is the host-memory alias of raft::span; the loop
      // below runs on the CPU, so ids must point to host-accessible memory.
      raft::host_span<int64_t> id_span(ids, n_ids);

      // Clamp negative ids to zero in place.
      for (auto& id : id_span) {
        if (id < 0) { id = 0; }
      }
    }

span is one-dimensional and does not encode row count, column count, layout, or memory space. Use it for lightweight buffer utilities, not for matrix-shaped NVIDIA cuVS inputs.

How dense arrays work

mdspan and span are views. They do not allocate memory and do not free memory. They only describe existing memory.

mdarray is an owning container. It allocates memory in a specific memory space and releases that memory when the object is destroyed. The .view() method returns an mdspan that refers to the same allocation.

The most important properties are:

  • Memory space: device, host, managed, or pinned host.
  • Shape: vector length, matrix rows and columns, or higher-dimensional extents.
  • Layout: usually raft::row_major for NVIDIA cuVS matrices unless an API explicitly requests another layout.
  • Constness: read-only inputs should use const element types or raft::make_const_mdspan().
  • Lifetime: views must not outlive the memory they describe.
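
The relationship between an owning array and its views can be sketched with the C++ standard library alone. The types below are hypothetical analogues of mdarray and mdspan (they are not RAFT types, and they use host memory through std::vector). They only show that a view shares the owner's allocation and depends on the owner's lifetime.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view: a pointer plus extents, no allocation.
struct float_matrix_view_sketch {
  float* data;
  std::int64_t n_rows;
  std::int64_t n_cols;

  float& operator()(std::int64_t r, std::int64_t c) const { return data[r * n_cols + c]; }
};

// Hypothetical owning container: allocates in its constructor and releases
// the memory in its destructor (through std::vector).
class float_matrix_sketch {
 public:
  float_matrix_sketch(std::int64_t n_rows, std::int64_t n_cols)
    : storage_(static_cast<std::size_t>(n_rows) * static_cast<std::size_t>(n_cols), 0.0f),
      n_rows_(n_rows),
      n_cols_(n_cols)
  {
  }

  // Returns a view of the same allocation; no copy is made.
  float_matrix_view_sketch view() { return {storage_.data(), n_rows_, n_cols_}; }

  float const* data() const { return storage_.data(); }

 private:
  std::vector<float> storage_;
  std::int64_t n_rows_;
  std::int64_t n_cols_;
};

// Writes through a view. The owner observes the change because both refer
// to the same memory.
inline void fill_diagonal(float_matrix_view_sketch m, float value)
{
  for (std::int64_t i = 0; i < m.n_rows && i < m.n_cols; ++i) { m(i, i) = value; }
}
```

If a float_matrix_sketch is destroyed while one of its views is still in use, the view's pointer dangles. That is the same hazard the lifetime rule above guards against for RAFT views.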

Choosing C++ array types

| Type | Owns memory? | Typical use |
| --- | --- | --- |
| raft::device_matrix_view, raft::device_vector_view, raft::device_scalar_view | No | GPU inputs and outputs already allocated by RAFT, RMM, CuPy, or another CUDA-aware library. |
| raft::host_matrix_view, raft::host_vector_view, raft::host_scalar_view | No | CPU-resident buffers that are passed into host-side APIs or copied to device arrays. |
| raft::pinned_matrix_view, raft::pinned_vector_view, raft::pinned_scalar_view | No | Existing page-locked host buffers used for transfers or host-device coordination. |
| raft::managed_mdspan | No | Existing CUDA Unified Memory allocations that need a non-owning RAFT view. |
| raft::span | No | One-dimensional utility buffers where matrix shape and memory-space metadata are unnecessary. |
| raft::device_matrix, raft::device_vector, raft::device_scalar | Yes | Owning GPU allocations for NVIDIA cuVS C++ inputs, outputs, temporary arrays, and indexes. |
| raft::host_matrix, raft::host_vector, raft::host_scalar | Yes | Owning CPU allocations for host data, small results, and CPU-side staging. |
| raft::managed_matrix, raft::managed_vector, raft::managed_scalar | Yes | Owning CUDA Unified Memory allocations that can be accessed from host and device when that trade-off is useful. |
| raft::pinned_matrix, raft::pinned_vector, raft::pinned_scalar | Yes | Owning page-locked host allocations for repeated host-device transfers or host-device coordination. |
| raft::device_mdarray, raft::host_mdarray, raft::managed_mdarray, raft::pinned_mdarray | Yes | Generic owning arrays for ranks beyond scalar, vector, and matrix aliases. |

Using arrays safely

Check the API page for the expected shape, dtype, layout, and memory location before passing an array. Most NVIDIA cuVS matrices are row-major unless the API says otherwise.

Keep the backing allocation alive for as long as a view is used. A view does not own memory, so destroying the original allocation makes the view invalid.

Allocate output arrays with the exact shape requested by the API when the binding requires explicit outputs. For top-k search, that usually means n_queries x k arrays for neighbors and distances.
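
As a small worked example of sizing those outputs, the sketch below computes element counts and byte sizes for n_queries x k neighbors and distances. It assumes int64 neighbor indices and float32 distances, matching the C search example earlier on this page; other APIs may use different dtypes. The topk_output_sizes struct and compute_topk_output_sizes function are hypothetical helpers, not part of NVIDIA cuVS.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper struct: sizes of the two top-k output buffers.
struct topk_output_sizes {
  std::size_t n_elements;       // n_queries * k entries in each output
  std::size_t neighbors_bytes;  // assumes int64 neighbor indices
  std::size_t distances_bytes;  // assumes float32 distances
};

// Casts each factor before multiplying so the product is computed in size_t.
inline topk_output_sizes compute_topk_output_sizes(std::int64_t n_queries, std::int64_t k)
{
  std::size_t n = static_cast<std::size_t>(n_queries) * static_cast<std::size_t>(k);
  return {n, n * sizeof(std::int64_t), n * sizeof(float)};
}
```

For 10 queries and k = 5, each output holds 50 entries, which comes to 400 bytes of int64 neighbors and 200 bytes of float32 distances.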

Synchronize appropriately before reading data on the host. Many NVIDIA cuVS operations enqueue GPU work asynchronously on the stream owned by raft::device_resources.

Use pinned host arrays when repeated host-device transfers are important. Ordinary host arrays are simpler and are usually the right choice for CPU-only data.