1. NVIDIA cuObject Server Release Notes#
Release information for NVIDIA® cuObject server.
2. Release information#
| Version | Date | cuObject server libraries with CUDA Release |
|---|---|---|
| v1.2.0 | May 26, 2026 | 13.3 |
| v1.0.0 | Jan 12, 2026 | 13.1.1 |
3. Introduction#
Release information for NVIDIA® cuObject server for developers and users.
cuObject is a high-performance suite of libraries designed to enable direct data transfers between GPU memory or system memory and an object storage (S3-compatible) solution via RDMA. By relying on RDMA operations, rather than TCP transfer methods, cuObject avoids using CPU kernel code for TCP processing and can bypass the CPU for data payloads. cuObject eliminates the traditional bottleneck of staging data in local scratch file systems, enabling high-throughput data ingestion for AI training and inference at scale.
3.1. Architecture overview#
cuObject consists of two primary components:
cuObjClient: Provides client-side APIs for GET and PUT operations with user-defined callbacks for control path communication, while data path operations leverage RDMA for high-performance transfers.
cuObjServer: Implements the RDMA-accelerated server side for S3-compatible object storage services, supporting multi-threaded concurrency, automatic connection management, and Dynamically Connected (DC) transport.
Refer to the following guides for more information about cuObject:
4. Key features and changes#
4.1. v1.2.0#
The following features were added in v1.2.0:
IPv6 support
Support for the `max_rd_atomic` parameter in the RDMA configuration
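With IPv6 support, an application may want to check which address family a configured endpoint string belongs to before using it. The following is a minimal sketch that uses only the POSIX `inet_pton` call. The `AddrFamily` enum and `classify_address` helper are hypothetical names introduced for illustration and are not part of the cuObject API.

```cpp
#include <arpa/inet.h>   // inet_pton (POSIX)
#include <netinet/in.h>  // in_addr, in6_addr
#include <sys/socket.h>  // AF_INET, AF_INET6
#include <string>

// Hypothetical helper type, not a cuObject type.
enum class AddrFamily { IPv4, IPv6, Invalid };

// Reports whether `text` is a numeric IPv4 address, a numeric IPv6 address,
// or neither. Host names are not resolved here.
AddrFamily classify_address(const std::string& text) {
    in_addr v4{};
    in6_addr v6{};
    if (inet_pton(AF_INET, text.c_str(), &v4) == 1) {
        return AddrFamily::IPv4;
    }
    if (inet_pton(AF_INET6, text.c_str(), &v6) == 1) {
        return AddrFamily::IPv6;
    }
    return AddrFamily::Invalid;
}
```

For example, `classify_address("2001:db8::1")` yields `AddrFamily::IPv6` and `classify_address("192.0.2.10")` yields `AddrFamily::IPv4`. Note that `inet_pton` does not accept a zone suffix such as `%eth0`, so link-local addresses written that way would need `getaddrinfo` instead.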
4.2. v1.0.0#
The following features were added in v1.0.0:
Multi-threaded, channel-based concurrency
Scatter-gather I/O support (up to 10 entries)
Asynchronous operation and polling
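Because scatter-gather I/O is limited to 10 entries, an application holding more buffers than that has to group them before issuing operations. The sketch below shows one way to do the grouping, assuming the application is free to issue several operations for one logical transfer. The `SgEntry` struct and `batch_entries` function are hypothetical and do not correspond to cuObject types or calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one contiguous buffer; not a cuObject type.
struct SgEntry {
    void*       addr;
    std::size_t length;
};

// Cap on scatter-gather entries per operation, as stated in the feature list.
constexpr std::size_t kMaxSgEntries = 10;

// Splits `entries` into consecutive groups of at most kMaxSgEntries each,
// preserving the original order.
std::vector<std::vector<SgEntry>> batch_entries(const std::vector<SgEntry>& entries) {
    std::vector<std::vector<SgEntry>> batches;
    for (std::size_t i = 0; i < entries.size(); i += kMaxSgEntries) {
        const std::size_t end = std::min(i + kMaxSgEntries, entries.size());
        batches.emplace_back(entries.begin() + i, entries.begin() + end);
    }
    return batches;
}
```

With 23 buffers, this produces three groups holding 10, 10, and 3 entries.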
5. Known limitations#
5.1. Memory and size constraints#
Maximum operation size: 1 GiB per `handleGetObject()` or `handlePutObject()` call
Maximum scatter-gather entries: 10 entries per operation
Maximum SGE per operation: configurable via `max_sge` parameter (default: 10)
Async operations: `poll()` caps `max_events` to `16`
MaxRdAtomic: For optimal PUT performance using `RDMA_READ`, set `MaxRdAtomic` by using `setMaxRdAtomic(16)`; `0` enables auto-detection
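The limits above can be checked before a request reaches the server. The following is a minimal sketch of such a pre-flight check. The constants mirror the documented values, while `check_operation` and `effective_max_events` are hypothetical helper names and are not part of the cuObject API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Values taken from the documented limits.
constexpr std::uint64_t kMaxOperationBytes = 1ULL << 30;  // 1 GiB per call
constexpr std::size_t   kMaxSgePerOp       = 10;          // default max_sge
constexpr int           kMaxPollEvents     = 16;          // cap applied by poll()

// Hypothetical pre-flight check. Returns an empty string when the request
// fits the documented limits, otherwise a short description of the problem.
std::string check_operation(std::uint64_t total_bytes, std::size_t sg_entries) {
    if (total_bytes > kMaxOperationBytes) {
        return "operation exceeds 1 GiB; split it into smaller calls";
    }
    if (sg_entries > kMaxSgePerOp) {
        return "too many scatter-gather entries; limit is " +
               std::to_string(kMaxSgePerOp);
    }
    return "";
}

// Number of events a caller can expect to be honored for one poll, given
// that requests above the cap are reduced to it.
int effective_max_events(int requested) {
    return std::min(requested, kMaxPollEvents);
}
```

A caller that asks for 64 events per poll should therefore size its loop for 16 (`effective_max_events(64)` returns 16), and a transfer of exactly 1 GiB with 10 entries passes the check.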
6. Getting started#
6.1. Hardware requirements#
x86_64 or ARM64 CPU architecture
Mellanox ConnectX-5 or later InfiniBand HCA, or a RoCE-capable Ethernet adapter
Minimum 16 GB system RAM recommended
6.2. Software requirements#
Linux operating system (tested on Ubuntu 22.04 and 24.04, RHEL 8, 9, and 10)
InfiniBand and RDMA drivers and libraries (`libibverbs`, `librdmacm`)
C++14 or later compatible compiler
6.3. Installation#
6.3.1. Install DOCA and RDMA libraries#
Use the DOCA installation guide to install DOCA.
6.3.2. Install cuObject server library#
Follow the instructions from cuObject Server library downloads.
6.3.3. Verify installation#
# Check library presence
ldconfig -p | grep cuobj
# Verify cuObjServer installation
rpm -qa | grep cuobjserver # RHEL/CentOS
dpkg -l | grep cuobjserver # Ubuntu/Debian
7. Fixed issues#
7.1. v1.2.0#
Fixed `app_wr_id_dci` allocation to use the configured `cq_depth` instead of a fixed send queue depth.
Added `max_rd_atomic` handling so that the RDMA READ depth is configurable or auto-detected instead of fixed at `1`.
Improved RDMA error reporting to include actual `errno` strings in many failure paths.
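To illustrate what including `errno` strings in an error report looks like, here is a minimal sketch using the C++ standard library. It does not reproduce the cuObject server's actual message format. The `describe_errno` helper is a hypothetical name introduced for this example.

```cpp
#include <cerrno>
#include <string>
#include <system_error>

// Hypothetical helper: builds "<what>: <message> (errno N)".
// std::generic_category().message() maps a POSIX errno value to its text
// without relying on the shared buffer that std::strerror may use.
std::string describe_errno(const std::string& what, int err) {
    return what + ": " + std::generic_category().message(err) +
           " (errno " + std::to_string(err) + ")";
}
```

An application calling a libibverbs function such as `ibv_reg_mr`, which sets `errno` on failure, could pass the saved `errno` value to this helper. On Linux with glibc, `describe_errno("ibv_reg_mr", ENOMEM)` produces a message along the lines of `ibv_reg_mr: Cannot allocate memory (errno 12)`.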
7.2. v1.0.0#
v1.0.0 is the initial release of cuObject server. No prior versions exist.
8. Open issues#
None reported for the v1.0.0 initial release.