
Copyright © 2026, NVIDIA Corporation.

Resources

NVIDIA cuVS APIs use a resources object to keep track of CUDA execution state that should be reused across related calls. In C++ this is usually raft::device_resources; in C and most language bindings it appears as a wrapper around cuvsResources_t.

For simple examples, the default resource behavior is usually enough. Create and pass an explicit resources object when you want to chain several operations together, control the CUDA stream used by NVIDIA cuVS, reuse expensive CUDA library handles, configure temporary memory, or keep setup costs out of repeated calls.

Common Concepts

A CUDA stream is an ordered queue of GPU work. Kernel launches, copies, and library calls enqueued into the same stream run in order. Work in different streams can overlap when the GPU and workload allow it.

Most NVIDIA cuVS algorithms enqueue work on the stream stored in the resources object and return control to the host before that GPU work has necessarily finished. This is intentional: it lets users chain operations without forcing a synchronization point between every call.

Synchronization means waiting until queued GPU work has completed. CUDA provides explicit synchronization, and the CUDA Runtime API includes cudaStreamSynchronize() for waiting on one stream. NVIDIA cuVS resource wrappers expose stream synchronization helpers so users do not need to fetch the raw CUDA stream in common cases.
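
The ordering and synchronization rules above can be illustrated with a host-only toy model. This is a sketch, not the CUDA or NVIDIA cuVS API: ToyStream and run_chained_steps are hypothetical names, and the model defers all work until the sync, whereas a real GPU may start queued work immediately. It only captures two points: work in one stream runs in FIFO order, and the host can rely on results only after synchronizing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Host-only toy model of a CUDA stream (hypothetical, not the CUDA API):
// work is queued in order and is guaranteed complete only after synchronize().
class ToyStream {
 public:
  // Enqueue work and return immediately, like an asynchronous launch.
  void enqueue(std::function<void()> task) { pending_.push(std::move(task)); }

  // Run everything queued so far in FIFO order, like a stream sync.
  void synchronize()
  {
    while (!pending_.empty()) {
      std::function<void()> task = std::move(pending_.front());
      pending_.pop();
      task();
    }
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  std::queue<std::function<void()>> pending_;
};

// Chain three dependent steps and synchronize once at the end.
std::vector<int> run_chained_steps()
{
  ToyStream stream;
  std::vector<int> log;

  stream.enqueue([&log] { log.push_back(1); });  // stands in for "build"
  stream.enqueue([&log] { log.push_back(2); });  // stands in for "search"
  stream.enqueue([&log] { log.push_back(3); });  // stands in for "copy"

  // In this model nothing has run yet, so the host must not read results.
  assert(log.empty());
  assert(stream.pending() == 3);

  stream.synchronize();
  return log;
}
```

Calling run_chained_steps() yields the steps in the order they were enqueued, with a single synchronization point at the end instead of one per step.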

A stream pool is a small group of CUDA streams attached to a resources object. Some algorithms can use a stream pool to run independent pieces of work concurrently. Most users do not need to configure one unless an algorithm guide or benchmark suggests it.

Workspace memory is temporary memory used inside an algorithm. Configuring workspace resources can make allocation behavior more predictable for allocation-heavy workloads. See Memory Management for allocator choices such as device pools, pinned host memory, managed memory, and host memory.

Example API Usage

C resources API | Python resources API | Java resources API | Rust resources API | Go resources API

Creating and reusing resources

Create one resources object near the beginning of a workflow and pass it to related NVIDIA cuVS calls. Reusing the same object keeps those calls ordered on the same CUDA stream and lets NVIDIA cuVS reuse expensive state.

C

#include <cuvs/core/c_api.h>

int main(void)
{
  cuvsResources_t resources;

  if (cuvsResourcesCreate(&resources) != CUVS_SUCCESS) { return 1; }

  // Pass resources to cuVS C APIs, such as index build and search calls.

  if (cuvsStreamSync(resources) != CUVS_SUCCESS) { return 1; }
  if (cuvsResourcesDestroy(resources) != CUVS_SUCCESS) { return 1; }

  return 0;
}
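
C++ callers of a C-style handle API often wrap the create/destroy pair in a scope guard so the handle is released on every exit path. The following is a minimal sketch of that pattern under stated assumptions: mock_handle_t, mock_create, mock_use, mock_destroy, and ScopedResources are hypothetical stand-ins so the example compiles without cuVS. In a real wrapper the constructor and destructor would call cuvsResourcesCreate and cuvsResourcesDestroy, and the application would still synchronize before the handle goes out of scope.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in C-style API so the sketch builds without cuVS.
// All mock_* names are hypothetical; they are not cuVS functions.
using mock_handle_t = int*;
int g_live_handles = 0;

int mock_create(mock_handle_t* out)
{
  *out = new int(0);  // the int counts calls made with this handle
  ++g_live_handles;
  return 0;
}

int mock_use(mock_handle_t handle) { return ++*handle; }

int mock_destroy(mock_handle_t handle)
{
  delete handle;
  --g_live_handles;
  return 0;
}

// Owns one handle for its whole lifetime and releases it on scope exit.
class ScopedResources {
 public:
  ScopedResources()
  {
    if (mock_create(&handle_) != 0) { throw std::runtime_error("create failed"); }
    assert(handle_ != nullptr);
  }
  ~ScopedResources() { mock_destroy(handle_); }

  // A handle has a single owner, so copying is disabled.
  ScopedResources(const ScopedResources&) = delete;
  ScopedResources& operator=(const ScopedResources&) = delete;

  mock_handle_t get() const { return handle_; }

 private:
  mock_handle_t handle_ = nullptr;
};

// Reuse one handle across several related calls; returns the call count.
int run_workflow()
{
  ScopedResources resources;
  int calls_seen = 0;
  for (int i = 0; i < 3; ++i) { calls_seen = mock_use(resources.get()); }
  return calls_seen;
}
```

The point of the design is that one object is created once, passed to every related call, and destroyed exactly once, even if an early return or exception occurs between the calls.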

Synchronizing GPU work

Most NVIDIA cuVS algorithms do not synchronize before returning. This lets you enqueue several operations back-to-back on the same resources object, such as build, search, refine, and copy, without stopping the GPU between steps.

Synchronize explicitly before reading results on the host, measuring elapsed time, handing data to work on a different CUDA stream without another dependency, or destroying objects that may still be used by queued GPU work.

C

#include <cuvs/core/c_api.h>

cuvsResources_t resources;
cuvsResourcesCreate(&resources);

// Enqueue one or more cuVS C API calls on resources.

cuvsStreamSync(resources);
cuvsResourcesDestroy(resources);

In Python, many APIs create and synchronize an internal resources object when you do not pass one. When you pass your own Resources, you are also taking responsibility for calling resources.sync() at the point where your application needs the results to be complete.

Multi-threaded Applications

raft::resources and the resource wrappers built on top of it are not thread-safe for concurrent use. If an application uses multiple host threads, give each worker thread its own resources object. This keeps each thread’s CUDA stream, library handles, temporary memory, and queued work separate.

Do not share one resources object across threads unless the application uses its own locking and understands that the lock serializes access to that resource. For most workloads, one resources object per worker thread is simpler and faster.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/cuda_stream.hpp>

#include <thread>
#include <vector>

// Application-defined placeholder for the cuVS calls each worker makes.
void run_cuvs_workload(int worker_id, raft::device_resources const& resources);

void worker(int worker_id)
{
  // Each thread constructs its own resources object; nothing is shared.
  raft::device_resources resources;

  // Run this thread's cuVS work with its own resources object.
  run_cuvs_workload(worker_id, resources);

  // Wait for this thread's queued GPU work before resources is destroyed.
  raft::resource::sync_stream(resources);
}

int main()
{
  std::vector<std::thread> threads;
  for (int i = 0; i < 4; ++i) {
    threads.emplace_back(worker, i);
  }

  for (auto& thread : threads) {
    thread.join();
  }
}
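
The one-object-per-thread rule can be checked in isolation with a host-only sketch. WorkerContext and run_workers are hypothetical names, and WorkerContext merely stands in for a per-thread resources object. Because each thread touches only its own context, the example needs no mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a per-thread resources object.
struct WorkerContext {
  int worker_id         = 0;
  std::size_t completed = 0;  // written only by the owning thread
};

// Give every worker thread its own context, so no locking is needed.
std::vector<WorkerContext> run_workers(int num_workers, std::size_t tasks_per_worker)
{
  assert(num_workers >= 0);
  std::vector<WorkerContext> contexts(static_cast<std::size_t>(num_workers));
  std::vector<std::thread> threads;
  threads.reserve(contexts.size());

  for (std::size_t i = 0; i < contexts.size(); ++i) {
    contexts[i].worker_id = static_cast<int>(i);
    // The vector is never resized after this point, so the pointer stays valid.
    WorkerContext* ctx = &contexts[i];
    threads.emplace_back([ctx, tasks_per_worker] {
      for (std::size_t t = 0; t < tasks_per_worker; ++t) { ++ctx->completed; }
    });
  }

  // Joining is the only synchronization point between the workers and the caller.
  for (auto& thread : threads) { thread.join(); }
  return contexts;
}
```

Sharing a single context between the threads instead would require a lock around every update, which serializes the workers.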

Using an Application CUDA Stream

Most users can let NVIDIA cuVS create and own the CUDA stream inside the resources object. Supplying an application stream is useful when NVIDIA cuVS work must be ordered with other CUDA kernels, copies, or library calls from the same application.

C

#include <cuvs/core/c_api.h>
#include <cuda_runtime_api.h>

cudaStream_t stream;
cuvsResources_t resources;

cudaStreamCreate(&stream);
cuvsResourcesCreate(&resources);
cuvsStreamSet(resources, stream);

// cuVS calls using resources are enqueued on stream.

cuvsStreamSync(resources);
cuvsResourcesDestroy(resources);
cudaStreamDestroy(stream);

Optional Advanced Configuration

Stream pools

A stream pool is useful only when an algorithm has independent work that can run concurrently. For example, an algorithm might search several independent shards or launch separate preprocessing work on different streams. Start with the default behavior, then configure a small stream pool only when benchmarks show it helps.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/cuda_stream_pool.hpp>
#include <rmm/cuda_stream_pool.hpp>

#include <memory>

raft::device_resources resources;

auto stream_pool = std::make_shared<rmm::cuda_stream_pool>(4);
raft::resource::set_cuda_stream_pool(resources, stream_pool);

// Algorithms that use the stream pool can now run independent work
// across up to four CUDA streams.
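
To see why a small pool is usually enough, it can help to model how independent pieces of work map onto a fixed number of streams. The sketch below is host-only and assumes round-robin assignment. ToyStreamPool and assign_shards are hypothetical names, not the rmm::cuda_stream_pool API, and an actual algorithm may distribute work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only toy pool (hypothetical): hands out stream indices round-robin.
class ToyStreamPool {
 public:
  explicit ToyStreamPool(std::size_t size) : size_(size) { assert(size > 0); }

  // Return the next stream index, wrapping around at the pool size.
  std::size_t next_stream()
  {
    const std::size_t current = next_;
    next_                     = (next_ + 1) % size_;
    return current;
  }

  std::size_t size() const { return size_; }

 private:
  std::size_t size_;
  std::size_t next_ = 0;
};

// Assign independent shards to streams; returns the number of shards per stream.
std::vector<std::size_t> assign_shards(std::size_t pool_size, std::size_t num_shards)
{
  ToyStreamPool pool(pool_size);
  std::vector<std::size_t> per_stream(pool.size(), 0);
  for (std::size_t shard = 0; shard < num_shards; ++shard) {
    ++per_stream[pool.next_stream()];
  }
  return per_stream;
}
```

With four streams and ten shards, the per-stream counts are 3, 3, 2, and 2: at most four shards can be in flight at once, and the rest queue behind earlier work on the same stream.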

Workspace resources

Workspace resources control where algorithms allocate temporary buffers. The default behavior is usually fine for small examples, but production workloads can benefit from configuring workspace memory explicitly before arrays, indexes, or algorithm state are created.

NVIDIA cuVS uses two related workspace concepts:

  • The workspace resource handles ordinary temporary allocations used during a computation.
  • The large workspace resource handles bigger temporary allocations that should stay separate from the ordinary workspace pool.

For allocator choices such as device pools, pinned host memory, managed memory, and host memory, see the Memory Management guide.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/workspace_resource.hpp>

raft::device_resources resources;

// Give ordinary temporary allocations a bounded pool.
raft::resource::set_workspace_to_pool_resource(resources, 2 * 1024 * 1024 * 1024ull);

// Call cuVS algorithms after the workspace has been configured.
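
The 2 GiB limit above only works because the expression is evaluated in 64-bit arithmetic; the same product in plain int would overflow. The host-only sketch below makes that explicit and models what "bounded" means for temporary allocations. WorkspaceBudget, fits, and kTwoGiB are hypothetical names, not RAFT or RMM APIs, and how a real pool reacts to exhaustion depends on the allocator in use.

```cpp
#include <cassert>
#include <new>

// 2 GiB in bytes, computed in 64-bit arithmetic from the first operand on.
constexpr unsigned long long kTwoGiB = 2ull * 1024 * 1024 * 1024;
static_assert(kTwoGiB == 2147483648ull, "2 GiB must be 2^31 bytes");

// Host-only toy model of a bounded workspace (hypothetical): it tracks bytes
// in use and refuses any request that would exceed the limit.
class WorkspaceBudget {
 public:
  explicit WorkspaceBudget(unsigned long long limit_bytes) : limit_(limit_bytes) {}

  // Reserve bytes for a temporary buffer; throws std::bad_alloc when over the limit.
  void reserve(unsigned long long bytes)
  {
    if (bytes > limit_ - in_use_) { throw std::bad_alloc(); }
    in_use_ += bytes;
  }

  // Return bytes to the budget when the temporary buffer is freed.
  void release(unsigned long long bytes)
  {
    assert(bytes <= in_use_);
    in_use_ -= bytes;
  }

  unsigned long long in_use() const { return in_use_; }

 private:
  unsigned long long limit_;
  unsigned long long in_use_ = 0;
};

// Returns true and reserves the bytes when the request fits, false otherwise.
bool fits(WorkspaceBudget& budget, unsigned long long bytes)
{
  try {
    budget.reserve(bytes);
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```

A bounded budget turns an unexpected spike in temporary memory into an immediate, local failure instead of competing with long-lived arrays and indexes for device memory.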

The large workspace resource is the RAFT-side hook for what many applications treat as a big memory resource: memory reserved for especially large temporary buffers. This is useful in services or benchmarking harnesses that run workloads with very different temporary memory shapes.

C++

#include <raft/core/device_resources.hpp>
#include <raft/core/resource/workspace_resource.hpp>

raft::device_resources resources;

// get_or_create_shared_large_workspace_resource() is an application-defined
// helper that returns the memory resource shared across workloads.
auto large_workspace = get_or_create_shared_large_workspace_resource();
raft::resource::set_large_workspace_resource(
  resources, raft::mr::device_resource{large_workspace});

// Run algorithms that may need large temporary allocations.

Important Resource Types

| Type | Where it appears | Purpose |
| --- | --- | --- |
| raft::device_resources | Single-GPU C++ APIs | The usual C++ resources object for one GPU. |
| raft::resources | Lower-level C++ RAFT and NVIDIA cuVS APIs | A resource container used by C++ algorithms and advanced applications. |
| raft::device_resources_snmg | Single-node multi-GPU C++ APIs | A convenience layer for one process controlling multiple GPUs. See Multi-GPU. |
| cuvsResources_t | C API and language bindings | Opaque handle over RAFT resources for ABI-stable bindings. |
| Resources | Python and Rust | Language wrapper around cuvsResources_t. |
| Resource | Go | Go wrapper around cuvsResources_t. |
| CuVSResources | Java | Java AutoCloseable wrapper around native NVIDIA cuVS resources. |

Practical Guidance

Use the default resources behavior for simple one-off calls.

Create and reuse a resources object when a workflow has multiple related calls. Operations enqueued on the same resources object run in stream order, so you can chain work and synchronize once at the end.

Use one resources object per host thread. Resources are not thread-safe for concurrent use.

Synchronize explicitly before the host reads GPU results or before measuring runtime. Avoid synchronizing between every NVIDIA cuVS call unless the application actually needs the intermediate result on the host.

Configure memory resources, workspace resources, and stream pools before allocating inputs or building indexes. Changing resource configuration halfway through a workflow makes ownership and lifetime harder to reason about.

For multi-GPU workloads, choose the resource type first: raft::device_resources_snmg for single-node multi-GPU, or a RAFT resources object with an NCCL communicator for multi-node work. The Multi-GPU guide covers those setup patterns.