Resources | cuVS

NVIDIA cuVS APIs use a resources object to keep track of CUDA execution state that should be reused across related calls. In C++ this is usually raft::device_resources. In C it is the opaque cuvsResources_t handle, and most language bindings wrap that handle.

For simple examples, the default resource behavior is usually enough. Create and pass an explicit resources object when you want to chain several operations together, control the CUDA stream used by NVIDIA cuVS, reuse expensive CUDA library handles, configure temporary memory, or keep setup costs out of repeated calls.

Common Concepts

A CUDA stream is an ordered queue of GPU work. Kernel launches, copies, and library calls enqueued into the same stream run in order. Work in different streams can overlap when the GPU and workload allow it.

Most NVIDIA cuVS algorithms enqueue work on the stream stored in the resources object and return control to the host before that GPU work has necessarily finished. This is intentional: it lets users chain operations without forcing a synchronization point between every call.
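The enqueue-then-return behavior can be modeled on the host without a GPU. The sketch below is an analogy only: `ToyStream` is a hypothetical class built on std::thread, not a CUDA, RAFT, or cuVS type. It shows the two properties described above, assuming a single background worker stands in for the GPU: tasks run in the order they were enqueued, and `enqueue` returns before the task has run.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Host-only stand-in for a CUDA stream (hypothetical, for illustration).
// Tasks run one at a time, in enqueue order, on a background thread.
class ToyStream {
 public:
  ToyStream() : worker_([this] { run(); }) {}

  ~ToyStream()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    worker_.join();  // the worker drains any remaining tasks before exiting
  }

  ToyStream(const ToyStream&)            = delete;
  ToyStream& operator=(const ToyStream&) = delete;

  // Returns immediately; the task runs later on the worker thread.
  void enqueue(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push_back(std::move(task));
    }
    wake_.notify_all();
  }

  // Blocks until every task enqueued so far has finished.
  void synchronize()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_.wait(lock, [this] { return pending_.empty() && !busy_; });
  }

 private:
  void run()
  {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
      if (pending_.empty()) { return; }  // stopping and nothing left to run
      auto task = std::move(pending_.front());
      pending_.pop_front();
      busy_ = true;
      lock.unlock();
      task();  // run outside the lock so enqueue() never waits on a task
      lock.lock();
      busy_ = false;
      if (pending_.empty()) { idle_.notify_all(); }
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::condition_variable idle_;
  std::deque<std::function<void()>> pending_;
  bool busy_     = false;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};
```

A caller would enqueue several tasks back-to-back and call `synchronize()` once, just before reading anything those tasks wrote. Reading earlier is the host-side equivalent of reading GPU results before the stream has finished.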

Synchronization means waiting until queued GPU work has completed. The CUDA Runtime API provides cudaStreamSynchronize() for waiting on a single stream, and NVIDIA cuVS resource wrappers expose stream synchronization helpers so that users do not need to fetch the raw CUDA stream in common cases.

A stream pool is a small group of CUDA streams attached to a resources object. Some algorithms can use a stream pool to run independent pieces of work concurrently. Most users do not need to configure one unless an algorithm guide or benchmark suggests it.

Workspace memory is temporary memory used inside an algorithm. Configuring workspace resources can make allocation behavior more predictable for allocation-heavy workloads. See Memory Management for allocator choices such as device pools, pinned host memory, managed memory, and host memory.

Example API Usage

C resources API | Python resources API | Java resources API | Rust resources API | Go resources API

Creating and reusing resources

Create one resources object near the beginning of a workflow and pass it to related NVIDIA cuVS calls. Reusing the same object keeps those calls ordered on the same CUDA stream and lets NVIDIA cuVS reuse expensive state.

C

#include <cuvs/core/c_api.h>

int main(void)
{
  cuvsResources_t resources;

  if (cuvsResourcesCreate(&resources) != CUVS_SUCCESS) { return 1; }

  // Pass resources to cuVS C APIs, such as index build and search calls.

  if (cuvsStreamSync(resources) != CUVS_SUCCESS) { return 1; }
  if (cuvsResourcesDestroy(resources) != CUVS_SUCCESS) { return 1; }

  return 0;
}

Synchronizing GPU work

Most NVIDIA cuVS algorithms do not synchronize before returning. This lets you enqueue several operations back-to-back on the same resources object, such as build, search, refine, and copy, without stopping the GPU between steps.

Synchronize explicitly before reading results on the host, measuring elapsed time, handing data to work on a different CUDA stream when nothing else orders the two streams, or destroying objects that may still be used by queued GPU work.

C

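#include <cuvs/core/c_api.h>

int main(void)
{
  cuvsResources_t resources;
  if (cuvsResourcesCreate(&resources) != CUVS_SUCCESS) { return 1; }

  // Enqueue one or more cuVS C API calls on resources.

  // Block the host until all work queued on the resources stream has finished.
  cuvsError_t status = cuvsStreamSync(resources);
  cuvsResourcesDestroy(resources);

  return status == CUVS_SUCCESS ? 0 : 1;
}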
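Measuring elapsed time is the case where a missing synchronization is easiest to overlook, because the host clock keeps running while the GPU is still busy. A minimal sketch of the timing pattern follows. `timed_seconds` is a hypothetical helper, not part of cuVS or RAFT, and it assumes the caller supplies one callable that enqueues the work and one that blocks until the stream is idle.

```cpp
#include <chrono>

// Times asynchronous work (hypothetical helper, for illustration).
// `enqueue` queues the work; `sync` blocks until the queue has drained.
template <typename Enqueue, typename Sync>
double timed_seconds(Enqueue&& enqueue, Sync&& sync)
{
  sync();  // drain earlier work so it is not charged to this measurement
  const auto start = std::chrono::steady_clock::now();
  enqueue();  // returns once the work is queued, not once it has finished
  sync();     // wait for completion before stopping the clock
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

With a C++ resources object, the `sync` argument could be a lambda that calls raft::resource::sync_stream on it. With the C API it could call cuvsStreamSync. Without the second `sync()`, the result would mostly reflect how long it takes to enqueue the work.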

In Python, many APIs create and synchronize an internal resources object when you do not pass one. When you pass your own Resources, you are also taking responsibility for calling resources.sync() at the point where your application needs the results to be complete.

Multi-threaded Applications

raft::resources and the resource wrappers built on top of it are not thread-safe for concurrent use. If an application uses multiple host threads, give each worker thread its own resources object. This keeps each thread’s CUDA stream, library handles, temporary memory, and queued work separate.

Do not share one resources object across threads unless the application uses its own locking and understands that the lock serializes access to that resource. For most workloads, one resources object per worker thread is simpler and faster.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/cuda_stream.hpp>

#include <thread>
#include <vector>

// Application-defined placeholder for the cuVS calls each worker makes.
void run_cuvs_workload(int worker_id, raft::device_resources const& resources);

void worker(int worker_id)
{
  // Each thread constructs its own resources object; nothing is shared.
  raft::device_resources resources;

  // Run this thread's cuVS work with its own resources object.
  run_cuvs_workload(worker_id, resources);

  // Wait for this thread's queued GPU work before resources is destroyed.
  raft::resource::sync_stream(resources);
}

int main()
{
  std::vector<std::thread> threads;
  for (int i = 0; i < 4; ++i) {
    threads.emplace_back(worker, i);
  }

  for (auto& thread : threads) {
    thread.join();
  }
}
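When worker threads come from a long-lived pool instead of being spawned per task, one way to keep one resources object per thread is a `thread_local` accessor. The following is a sketch under that assumption. `this_thread_resources` and `other_thread_gets_its_own` are hypothetical names, and the template parameter stands in for a type such as raft::device_resources so that the sketch compiles without RAFT.

```cpp
#include <thread>

// Returns a resources object that belongs to the calling thread only.
// It is constructed on first use in each thread and destroyed at thread exit.
template <typename Resources>
Resources& this_thread_resources()
{
  thread_local Resources resources;
  return resources;
}

// Demonstration: a second thread sees a different object than the caller.
template <typename Resources>
bool other_thread_gets_its_own()
{
  Resources* mine = &this_thread_resources<Resources>();
  bool distinct   = false;
  std::thread other([&] { distinct = (&this_thread_resources<Resources>() != mine); });
  other.join();
  return distinct;
}
```

Because the object is destroyed when its thread exits, each thread should still synchronize its own queued work before it finishes. Whether destruction at thread exit is acceptable depends on how the application shuts down, so treat this as a starting point and not a recommendation.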

Using an Application CUDA Stream

Most users can let NVIDIA cuVS create and own the CUDA stream inside the resources object. Supplying an application stream is useful when NVIDIA cuVS work must be ordered with other CUDA kernels, copies, or library calls from the same application.

C

#include <cuvs/core/c_api.h>
#include <cuda_runtime_api.h>

int main(void)
{
  cudaStream_t stream;
  cuvsResources_t resources;

  if (cudaStreamCreate(&stream) != cudaSuccess) { return 1; }
  if (cuvsResourcesCreate(&resources) != CUVS_SUCCESS) { return 1; }
  if (cuvsStreamSet(resources, stream) != CUVS_SUCCESS) { return 1; }

  // cuVS calls using resources are enqueued on stream.

  // Drain the stream, then release the resources before the stream they use.
  cuvsStreamSync(resources);
  cuvsResourcesDestroy(resources);
  cudaStreamDestroy(stream);

  return 0;
}

Optional Advanced Configuration

Stream pools

A stream pool is useful only when an algorithm has independent work that can run concurrently. For example, an algorithm might search several independent shards or launch separate preprocessing work on different streams. Start with the default behavior, then configure a small stream pool only when benchmarks show it helps.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/cuda_stream_pool.hpp>
#include <rmm/cuda_stream_pool.hpp>

#include <memory>

int main()
{
  raft::device_resources resources;

  // Create a pool of four streams and attach it to the resources object.
  auto stream_pool = std::make_shared<rmm::cuda_stream_pool>(4);
  raft::resource::set_cuda_stream_pool(resources, stream_pool);

  // Algorithms that use the stream pool can now run independent work
  // across up to four CUDA streams.
}

Workspace resources

Workspace resources control where algorithms allocate temporary buffers. The default behavior is usually fine for small examples, but production workloads can benefit from configuring workspace memory explicitly before arrays, indexes, or algorithm state are created.

NVIDIA cuVS uses two related workspace concepts:

The workspace resource handles ordinary temporary allocations used during a computation.
The large workspace resource handles bigger temporary allocations that should stay separate from the ordinary workspace pool.

For allocator choices such as device pools, pinned host memory, managed memory, and host memory, see the Memory Management guide.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/workspace_resource.hpp>

int main()
{
  raft::device_resources resources;

  // Give ordinary temporary allocations a bounded pool of 2 GiB.
  raft::resource::set_workspace_to_pool_resource(resources, 2ull * 1024 * 1024 * 1024);

  // Call cuVS algorithms after the workspace has been configured.
}
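Byte counts such as the pool limit above are easy to get wrong when written as a product of `int` literals, because a limit of 4 GiB or more overflows `int` before it is widened. A small helper that does the arithmetic in `std::size_t` avoids that. `gib` is a hypothetical name used here for illustration and is not part of RAFT or cuVS.

```cpp
#include <cstddef>

// Converts a size in GiB to bytes using std::size_t arithmetic throughout,
// so the multiplication never happens in int.
constexpr std::size_t gib(std::size_t n) { return n * 1024 * 1024 * 1024; }

static_assert(gib(2) == 2147483648ull, "2 GiB is 2^31 bytes");
```

The limit passed to the workspace pool could then be written as `gib(2)`.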

The large workspace resource is the RAFT-side hook for what many applications treat as a big memory resource: memory reserved for especially large temporary buffers. This is useful in services or benchmarking harnesses that run workloads with very different temporary memory shapes.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/workspace_resource.hpp>

raft::device_resources resources;

// get_or_create_shared_large_workspace_resource() is an application-defined
// helper that returns the memory resource reserved for large temporary buffers.
auto large_workspace = get_or_create_shared_large_workspace_resource();
raft::resource::set_large_workspace_resource(
    resources, raft::mr::device_resource{large_workspace});

// Run algorithms that may need large temporary allocations.

Important Resource Types

| Type | Where it appears | Purpose |
| --- | --- | --- |
| `raft::device_resources` | Single-GPU C++ APIs | The usual C++ resources object for one GPU. |
| `raft::resources` | Lower-level C++ RAFT and NVIDIA cuVS APIs | A resource container used by C++ algorithms and advanced applications. |
| `raft::device_resources_snmg` | Single-node multi-GPU C++ APIs | A convenience layer for one process controlling multiple GPUs. See Multi-GPU. |
| `cuvsResources_t` | C API and language bindings | Opaque handle over RAFT resources for ABI-stable bindings. |
| `Resources` | Python and Rust | Language wrapper around `cuvsResources_t`. |
| `Resource` | Go | Go wrapper around `cuvsResources_t`. |
| `CuVSResources` | Java | Java `AutoCloseable` wrapper around native NVIDIA cuVS resources. |

Practical Guidance

Use the default resources behavior for simple one-off calls.

Create and reuse a resources object when a workflow has multiple related calls. Operations enqueued on the same resources object run in stream order, so you can chain work and synchronize once at the end.

Use one resources object per host thread. Resources are not thread-safe for concurrent use.

Synchronize explicitly before the host reads GPU results or before measuring runtime. Avoid synchronizing between every NVIDIA cuVS call unless the application actually needs the intermediate result on the host.

Configure memory resources, workspace resources, and stream pools before allocating inputs or building indexes. Changing resource configuration halfway through a workflow makes ownership and lifetime harder to reason about.

For multi-GPU workloads, choose the resource type first: raft::device_resources_snmg for single-node multi-GPU, or a RAFT resources object with an NCCL communicator for multi-node work. The Multi-GPU guide covers those setup patterns.