1. cuObjServer API Specification v1.0.0#
1.1. Overview#
The cuObjServer library provides server-side APIs for handling RDMA-based object operations with GPUDirect Storage support. It enables high-performance data transfers through RDMA protocols, specifically RDMA Dynamically Connected (DC) transports.
1.1.1. Key features#
Server-side RDMA connection management
Memory registration for RDMA transfers (single buffer and scatter-gather)
Synchronous and asynchronous RDMA operations (GET and PUT)
Multi-threaded operation support with channel management
Configurable RDMA tuning parameters
Telemetry and logging capabilities
Maximum operation size: 1 GiB per handleGetObject and handlePutObject call
1.1.2. Protocol support#
CUOBJ_PROTO_RDMA_DC_V1 (RDMA Dynamically Connected version 1)
1.2. I/O flow#
The cuObjServer library follows a connection-based architecture with channel multiplexing:
Server initialization: Create cuObjServer with IP, port, and protocol.
Connection setup: Establish an RDMA connection using InfiniBand verbs.
Memory registration: Register host buffers for RDMA operations.
Channel allocation: Allocate unique channels for multi-threaded access.
RDMA operations: Handle GET (RDMA_WRITE) and PUT (RDMA_READ) requests.
Completion: Poll for asynchronous operation completions.
Cleanup: Deregister buffers, free channels, close the connection.
1.2.1. Key concepts#
GET operation: Server performs RDMA_WRITE to client memory.
PUT operation: Server performs RDMA_READ from client memory.
Channels: Isolated RDMA queues for lock-free concurrent access.
Scatter-gather: Support for non-contiguous memory regions.
1.3. Core Types and Enumerations#
1.3.1. Error types#
typedef enum cuObjErr_enum {
CU_OBJ_SUCCESS = 0, /* Operation successfully completed */
CU_OBJ_FAIL = 1 /* Operation failed */
} cuObjErr_t;
1.3.2. Protocol types#
typedef enum cuObjProto_enum {
CUOBJ_PROTO_RDMA_DC_V1 = 1001, /* RDMA Dynamically Connected version 1 */
CUOBJ_PROTO_MAX
} cuObjProto_t;
1.3.3. Operation types#
typedef enum cuObjOpType_enum {
CUOBJ_GET = 0, /* GET operation (server writes to client) */
CUOBJ_PUT = 1, /* PUT operation (server reads from client) */
CUOBJ_INVALID = 9999
} cuObjOpType_t;
1.3.4. Channel ID states#
typedef enum cuObjChannelIdState_enum {
CHANNEL_ID_FREE = 0, /* Channel is available for allocation */
CHANNEL_ID_ALLOCATED = 1, /* Channel is allocated to a thread */
CHANNEL_ID_IN_USE = 2, /* Channel is actively processing operations */
INVALID_CHANNEL_ID = UINT16_MAX /* Invalid channel ID marker */
} cuObjChannelIdState_t;
1.3.5. Delay modes#
typedef enum cuObjDelayMode {
CUOBJ_DELAY_NONE = 0, /* No delay between polling attempts */
CUOBJ_DELAY_BATCH = 1, /* Delay after each batch of polling attempts (default) */
CUOBJ_DELAY_ENTRY = 2, /* Delay after each polling attempt */
CUOBJ_DELAY_ADAPTIVE = 3, /* Adaptively adjust delay between polling attempts */
CUOBJ_DELAY_INVALID = 4 /* Invalid delay mode */
} cuObjDelayMode_t;
Note
Delay modes are used in cuObjRDMATunableParam.delay_mode to control polling behavior.
CUOBJ_DELAY_BATCH is the default and recommended for most workloads.
1.4. RDMAConnection Class#
1.4.1. Class declaration#
RDMAConnection is a base class for RDMA connection management.
class RDMAConnection {
public:
cuObjRDMATunable params;
RDMAConnection(const char* ip, unsigned short port);
~RDMAConnection();
int startRDMASession();
void initRDMAConfigParams(cuObjRDMATunable config_params);
void closeRDMASession();
void handleDisconnectEvent(struct rdma_cm_id* id);
char* getChannelIP(struct rdma_cm_id* id);
int getChannelPort(struct rdma_cm_id* id);
void getConfigTunableParam(struct cuObjRDMATunableParam* param);
protected:
const char* myip;
struct sockaddr_in addr;
struct rdma_device* rdma_dev;
};
1.4.2. Notes#
cuObjServer inherits from RDMAConnection.
Manages RDMA device and connection lifecycle.
Provides configuration and status query methods.
1.5. cuObjServer Class API#
1.5.1. Class declaration#
class cuObjServer : public RDMAConnection {
public:
/* Constructors and destructor */
cuObjServer(const char* ip, unsigned short port, unsigned proto);
cuObjServer(const char* ip, unsigned short port, unsigned proto,
cuObjRDMATunable params);
cuObjServer() = default;
~cuObjServer();
/* Memory management */
void* allocHostBuffer(size_t size);
struct rdma_buffer* registerBuffer(void* ptr, size_t size);
struct rdma_buffer* registerBuffer(std::vector<cuObjScatterGatherEntry_t> sglist);
void deRegisterBuffer(struct rdma_buffer* rdma_buff);
/* Scatter-gather support */
struct rdma_buffer* getRDMABufferFromSgList(
struct rdma_buffer* rdma_buffer,
std::vector<cuObjScatterGatherEntryWithOffset_t> sglist);
void putRDMABufferFromSgList(struct rdma_buffer* rdma_buffer);
/* I/O operations (synchronous) */
ssize_t handleGetObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
ssize_t handlePutObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
/* I/O operations (with poll delay) */
ssize_t handleGetObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint32_t poll_delay,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
ssize_t handlePutObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint32_t poll_delay,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
/* Asynchronous operations */
int poll(cuObjAsyncEvent_t* events, int max_events, uint16_t channel = 0);
/* Channel management */
uint16_t allocateChannelId();
void freeChannelId(uint16_t channel_id);
/* Connection status */
bool isConnected();
/* Static telemetry */
static void setupTelemetry(bool use_OTEL, std::ostream* os);
static void shutdownTelemetry();
static void setTelemFlags(unsigned flags);
};
1.5.2. Constructors#
1.5.2.1. Basic constructor#
Creates a cuObjServer with default RDMA parameters.
cuObjServer(const char* ip, unsigned short port, unsigned proto);
Parameters
ip: Server IP address for RDMA connection.
port: Server port for RDMA connection.
proto: RDMA protocol (typically CUOBJ_PROTO_RDMA_DC_V1).
Notes
Uses default tunable parameters.
Automatically starts the RDMA session.
Call isConnected() after construction to verify success.
1.5.2.2. Advanced constructor with tuning parameters#
Creates a cuObjServer with custom RDMA configuration.
cuObjServer(const char* ip, unsigned short port, unsigned proto,
cuObjRDMATunable params);
Parameters
ip: Server IP address for RDMA connection.
port: Server port for RDMA connection.
proto: RDMA protocol (typically CUOBJ_PROTO_RDMA_DC_V1).
params: RDMA tuning parameters (see cuObjRDMATunableParam).
1.5.3. Memory Management APIs#
1.5.3.1. allocHostBuffer#
Allocates a 4 KB aligned host buffer suitable for RDMA operations.
void* allocHostBuffer(size_t size);
Parameters
size: Size of buffer to allocate in bytes.
Returns
Pointer to allocated buffer on success.
NULL on failure.
Notes
Buffer is aligned to 4 KB boundaries for performance.
Caller is responsible for freeing the buffer using free().
1.5.3.2. registerBuffer (single buffer)#
Registers a contiguous host buffer for RDMA operations.
struct rdma_buffer* registerBuffer(void* ptr, size_t size);
Parameters
ptr: Start address of host buffer.
size: Size of buffer in bytes.
Returns
Opaque RDMA handle for use in handleGetObject and handlePutObject.
NULL on failure.
Notes
Buffer must remain valid until deRegisterBuffer() is called.
Must be host memory (system RAM).
1.5.3.3. registerBuffer (scatter-gather list)#
Registers a scatter-gather list of non-contiguous host buffers.
struct rdma_buffer* registerBuffer(std::vector<cuObjScatterGatherEntry_t> sglist);
Parameters
sglist: Vector of scatter-gather entries (maximum 10 entries).
Returns
Opaque RDMA handle for use in handleGetObject and handlePutObject.
NULL on failure.
Notes
Total size is the sum of all entry sizes.
When used with handleGetObject and handlePutObject, local_offset is ignored.
1.5.3.4. getRDMABufferFromSgList#
Creates an RDMA buffer view from a registered buffer using scatter-gather offsets.
struct rdma_buffer* getRDMABufferFromSgList(
struct rdma_buffer* rdma_buffer,
std::vector<cuObjScatterGatherEntryWithOffset_t> sglist);
Parameters
rdma_buffer: Previously registered RDMA buffer.
sglist: Vector specifying offset and size within the registered buffer.
Returns
New opaque RDMA handle representing the scatter-gather view.
NULL on failure.
Notes
Creates a derived view without re-registering memory.
Must call putRDMABufferFromSgList() to free the view.
1.5.3.5. putRDMABufferFromSgList#
Frees an RDMA buffer view created by getRDMABufferFromSgList.
void putRDMABufferFromSgList(struct rdma_buffer* rdma_buffer);
Parameters
rdma_buffer: RDMA buffer returned by getRDMABufferFromSgList().
Notes
Does not deregister the underlying buffer.
1.5.3.6. deRegisterBuffer#
Deregisters an RDMA buffer and releases associated resources.
void deRegisterBuffer(struct rdma_buffer* rdma_buff);
Parameters
rdma_buff: Handle returned by registerBuffer().
Notes
Complete all operations before deregistration.
Does not free the underlying memory buffer.
1.5.4. I/O Operations#
1.5.4.1. handleGetObject (synchronous and async submission)#
Handles a GET operation by performing RDMA_WRITE to client memory.
ssize_t handleGetObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
Parameters
key: Request identifier used for logging and telemetry.
local_rdma_buff: Registered local memory handle (source data).
remote_buf_start: Starting address in remote client memory.
size: Size of operation in bytes (maximum 1 GiB).
rdma_descr: RDMA descriptor string received from the client.
channel: Channel ID from allocateChannelId() (default: 0).
local_offset: Offset into local buffer (ignored for scatter-gather buffers).
status: Optional pointer to receive InfiniBand verbs completion status.
async_handle: User handle for async operation (non-NULL enables async mode).
Returns
Positive: Number of bytes transferred (synchronous success).
Negative: Error code.
0: Successful async submission (when async_handle is non-NULL).
Notes
Synchronous by default.
Populate source data in local_rdma_buff before calling.
In async mode, call poll() to obtain completion.
1.5.4.2. handleGetObject (with poll delay)#
Per-request override for polling delay.
ssize_t handleGetObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint32_t poll_delay,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
Additional parameter
poll_delay: Sleep poll_delay nanoseconds before polling (overrides server config).
1.5.4.3. handlePutObject (synchronous and async submission)#
Handles a PUT operation by performing RDMA_READ from client memory.
ssize_t handlePutObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
Parameters
key: Request identifier used for logging and telemetry.
local_rdma_buff: Registered local memory handle (destination buffer).
remote_buf_start: Starting address in remote client memory.
size: Size of operation in bytes (maximum 1 GiB).
rdma_descr: RDMA descriptor string received from the client.
channel: Channel ID from allocateChannelId() (default: 0).
local_offset: Offset into local buffer (ignored for scatter-gather buffers).
status: Optional pointer to receive InfiniBand verbs completion status.
async_handle: User handle for async operation (non-NULL enables async mode).
Returns
Positive: Number of bytes transferred (synchronous success).
Negative: Error code.
0: Successful async submission (when async_handle is non-NULL).
Notes
In synchronous mode, data is available in local_rdma_buff after completion.
In async mode, call poll() to obtain completion.
1.5.4.4. handlePutObject (with poll delay)#
ssize_t handlePutObject(const std::string& key,
struct rdma_buffer* local_rdma_buff,
uint64_t remote_buf_start, size_t size,
const std::string& rdma_descr,
uint32_t poll_delay,
uint16_t channel = 0, uint64_t local_offset = 0,
ibv_wc_status* status = nullptr,
void* async_handle = nullptr);
1.5.5. Asynchronous Operations#
1.5.5.1. cuObjAsyncEvent_t#
typedef struct cuObjAsyncEvent_s {
void* async_handle; /* User-provided handle from async operation */
ssize_t status; /* Positive: bytes, Negative: error code */
} cuObjAsyncEvent_t;
1.5.5.2. poll#
Polls for completion of async handleGetObject and handlePutObject operations.
int poll(cuObjAsyncEvent_t* events, int max_events, uint16_t channel = 0);
Parameters
events: Array to store completion events.
max_events: Maximum number of events to return.
channel: Channel ID to poll (must match async operation channel).
Returns
Number of completed events (0 to max_events).
Negative value on error:
-EINVAL if events is nullptr or channel is invalid.
-EIO if a polled entry does not complete successfully.
Notes
Non-blocking and returns immediately with available completions.
Events are returned in completion order.
1.5.6. Channel Management#
1.5.6.1. allocateChannelId#
Allocates a unique channel ID for use in multi-threaded applications.
uint16_t allocateChannelId();
Returns
Allocated channel ID for exclusive use by a thread.
INVALID_CHANNEL_ID on failure.
1.5.6.2. freeChannelId#
Releases a previously allocated channel ID.
void freeChannelId(uint16_t channel_id);
Parameters
channel_id: Channel ID to release.
Note
Do not free the default channel (0).
1.5.7. Connection Management#
1.5.7.1. isConnected#
Checks if the server is connected and ready for operations.
bool isConnected();
Returns
true if connected and operational.
false otherwise.
1.5.8. Telemetry Management#
1.5.8.1. setupTelemetry#
Configures telemetry output stream for logging and monitoring.
static void setupTelemetry(bool use_OTEL, std::ostream* os);
Parameters
use_OTEL: Enables OpenTelemetry integration.
os: Output stream for telemetry data.
Notes
The output stream must remain valid until shutdownTelemetry() is called.
Affects all cuObjServer instances.
1.5.8.2. shutdownTelemetry#
static void shutdownTelemetry();
1.5.8.3. setTelemFlags#
static void setTelemFlags(unsigned flags);
1.6. Supporting Structures#
1.6.1. cuObjScatterGatherEntry_t#
typedef struct cuObjScatterGatherEntry {
void* addr; /* Memory address */
size_t size; /* Size of memory segment */
} cuObjScatterGatherEntry_t;
1.6.2. cuObjScatterGatherEntryWithOffset_t#
typedef struct cuObjScatterGatherEntryWithOffset {
loff_t offset; /* Offset within registered buffer */
size_t size; /* Size of memory segment */
} cuObjScatterGatherEntryWithOffset_t;
1.6.3. cuObjRDMATunableParam#
RDMA connection tunable parameters for advanced configuration.
struct cuObjRDMATunableParam {
int num_dcis; /* default: 128 */
unsigned cq_depth; /* default: 640 */
unsigned long dc_key; /* default: 0xffeeddcc */
int ibv_poll_max_comp_event; /* unused */
int service_level; /* default: 0 */
uint8_t timeout; /* default: 16 */
unsigned hop_limit; /* default: 4 */
int pkey_index; /* default: 0 */
int max_wr; /* unused */
int max_sge; /* default: 10 */
uint32_t delay_interval; /* default: 5000 ns */
cuObjDelayMode_t delay_mode; /* default: CUOBJ_DELAY_BATCH */
bool qp_reset_on_failure; /* default: true */
uint8_t retry_cnt; /* default: 7 */
unsigned traffic_class; /* default: 96 */
};
1.6.4. Key parameters#
num_dcis: Number of Dynamically Connected Initiators (affects concurrency).
cq_depth: Completion Queue depth (must accommodate outstanding operations).
timeout: QP timeout value (higher is more tolerant of network delays).
retry_cnt: Number of retries before declaring failure.
delay_interval: Polling delay in nanoseconds (CPU versus latency tradeoff).
qp_reset_on_failure: Automatically reset the QP on errors (recommended: true).
traffic_class: Absolute traffic class value, including dscp2prio and ECN bits.
1.7. Error Handling#
1.7.1. Return value conventions#
I/O operations return ssize_t: positive means bytes transferred, negative means an error code, and 0 means a successful async submission (when async_handle is non-NULL).
Memory management returns pointers (NULL on error) or void.
Channel management returns a channel ID (INVALID_CHANNEL_ID on error).
Connection status returns a boolean.
1.7.2. InfiniBand verbs status#
The optional ibv_wc_status* status parameter in handleGetObject and handlePutObject provides IB verbs completion status such as:
IBV_WC_SUCCESS
IBV_WC_LOC_LEN_ERR
IBV_WC_LOC_PROT_ERR
IBV_WC_REM_ACCESS_ERR
IBV_WC_RETRY_EXC_ERR
IBV_WC_RNR_RETRY_EXC_ERR
1.7.3. Example#
ibv_wc_status ib_status;
ssize_t result = server.handleGetObject(key, buf, addr, size, desc,
channel, 0, &ib_status, nullptr);
if (result < 0) {
printf("RDMA error: %zd (IB status: %d)\n", result, (int)ib_status);
}
1.7.4. Best practices#
Always check return values for error conditions.
Use status for detailed RDMA error diagnostics.
Implement cleanup in error paths.
Enable qp_reset_on_failure for automatic recovery when appropriate.
Check isConnected() after construction.
1.8. Usage Patterns#
1.8.1. Basic server usage (single-threaded)#
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
if (!server.isConnected()) {
/* Handle error */
}
void* buffer = server.allocHostBuffer(1024 * 1024);
struct rdma_buffer* rdma_buf = server.registerBuffer(buffer, 1024 * 1024);
if (!rdma_buf) {
/* Handle error */
}
memcpy(buffer, data_source, data_size);
ssize_t result = server.handleGetObject("request_123",
rdma_buf,
remote_addr,
data_size,
rdma_descriptor,
0);
if (result < 0) {
/* Handle error */
}
result = server.handlePutObject("request_456",
rdma_buf,
remote_addr,
data_size,
rdma_descriptor,
0);
if (result > 0) {
process_data(buffer, (size_t)result);
}
server.deRegisterBuffer(rdma_buf);
free(buffer);
1.8.2. Multi-threaded server usage#
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
void worker_thread(cuObjServer& server) {
uint16_t channel = server.allocateChannelId();
if (channel == INVALID_CHANNEL_ID) {
return;
}
void* buffer = server.allocHostBuffer(1024 * 1024);
struct rdma_buffer* rdma_buf = server.registerBuffer(buffer, 1024 * 1024);
while (running) {
Request req = get_next_request();
ssize_t result = server.handleGetObject(req.key,
rdma_buf,
req.remote_addr,
req.size,
req.rdma_descr,
channel);
/* Handle result */
}
server.deRegisterBuffer(rdma_buf);
free(buffer);
server.freeChannelId(channel);
}
1.8.3. Asynchronous operations#
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
uint16_t channel = server.allocateChannelId();
const int NUM_REQ = 10;
struct rdma_buffer* bufs[NUM_REQ];
for (int i = 0; i < NUM_REQ; i++) {
void* buf = server.allocHostBuffer(1024 * 1024);
bufs[i] = server.registerBuffer(buf, 1024 * 1024);
}
for (int i = 0; i < NUM_REQ; i++) {
void* handle = (void*)(uintptr_t)(i + 1);
ssize_t submit = server.handleGetObject("async_req",
bufs[i],
remote_addrs[i],
sizes[i],
rdma_descriptors[i],
channel,
0,
nullptr,
handle);
if (submit < 0) {
/* Submission failed */
}
}
cuObjAsyncEvent_t events[NUM_REQ];
int completed = 0;
while (completed < NUM_REQ) {
    int n = server.poll(events, NUM_REQ, channel);
    if (n < 0) {
        /* Poll error: handle it and stop draining */
        break;
    }
    for (int i = 0; i < n; i++) {
        /* events[i].async_handle identifies the request */
        /* events[i].status contains bytes or error */
    }
    completed += n;
}
for (int i = 0; i < NUM_REQ; i++) {
server.deRegisterBuffer(bufs[i]);
}
server.freeChannelId(channel);
1.8.4. Scatter-gather buffer usage#
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
void* buf1 = server.allocHostBuffer(4096);
void* buf2 = server.allocHostBuffer(8192);
void* buf3 = server.allocHostBuffer(4096);
std::vector<cuObjScatterGatherEntry_t> sglist = {
{ buf1, 4096 },
{ buf2, 8192 },
{ buf3, 4096 }
};
struct rdma_buffer* rdma_buf = server.registerBuffer(sglist);
ssize_t result = server.handleGetObject("sg_request",
rdma_buf,
remote_addr,
16384,
rdma_descr,
0);
server.deRegisterBuffer(rdma_buf);
free(buf1);
free(buf2);
free(buf3);
1.8.5. Advanced: RDMA buffer views#
void* large_buffer = server.allocHostBuffer(1024 * 1024);
struct rdma_buffer* base_buf = server.registerBuffer(large_buffer, 1024 * 1024);
std::vector<cuObjScatterGatherEntryWithOffset_t> view_sglist = {
{ (loff_t)0, 4096 },
{ (loff_t)16384, 8192 },
{ (loff_t)32768, 4096 }
};
struct rdma_buffer* view_buf = server.getRDMABufferFromSgList(base_buf, view_sglist);
server.handleGetObject("view_req", view_buf, remote_addr, 16384, rdma_descr, 0);
server.putRDMABufferFromSgList(view_buf);
server.deRegisterBuffer(base_buf);
free(large_buffer);
1.8.6. Custom RDMA tuning#
cuObjRDMATunable params;
params.num_dcis = 256;
params.cq_depth = 1024;
params.timeout = 20;
params.retry_cnt = 10;
params.delay_interval = 10000;
params.qp_reset_on_failure = true;
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1, params);
1.8.7. Telemetry configuration#
std::ofstream log_file("cuobj_server.log");
cuObjServer::setupTelemetry(false, &log_file);
cuObjServer::setTelemFlags(0xFFFF);
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
cuObjServer::shutdownTelemetry();
log_file.close();
1.9. Constants and Limits#
1.9.1. Operation limits#
Maximum operation size: 1 GiB per handleGetObject and handlePutObject call.
Maximum scatter-gather entries: 10 per registerBuffer(sglist) call.
Maximum SGE per operation: configurable via max_sge (default: 10).
1.9.2. Channel limits#
Default channel: 0 (always available, no allocation required).
Channel ID range: 0 to 65534.
INVALID_CHANNEL_ID: 65535.
1.9.3. RDMA defaults#
Number of DCIs: 128
CQ depth: 640
DC key: 0xffeeddcc
Service level: 0
Timeout: 16
Hop limit: 4
Partition key index: 0
Retry count: 7
Polling delay: 5000 ns
1.9.4. Memory requirements#
Buffer alignment: 4 KB recommended (allocHostBuffer).
Memory type: host memory (system RAM).
Registered buffers are pinned for RDMA.
1.10. Thread Safety and Channel Management#
1.10.1. Thread safety model#
Constructor and destructor: not thread-safe. Create and destroy from a single thread.
Memory registration: thread-safe when different buffers are registered and deregistered concurrently.
I/O operations: thread-safe only when each thread uses a unique channel ID for concurrent calls.
Channel management: thread-safe.
allocateChannelId() and freeChannelId() use internal synchronization.
Static methods: thread-safe with internal synchronization.
1.10.2. Channel usage rules#
1.10.2.1. Default channel (0)#
Always available and no allocation needed.
Suitable for single-threaded applications.
Do not use in multi-threaded applications without external synchronization.
1.10.2.2. Allocated channels#
Each thread must allocate its own channel via allocateChannelId().
A channel can only be used by one thread at a time.
Free channels via freeChannelId() when done.
1.10.2.3. Async operations#
poll() must be called on the same channel used for async submissions.
Only the owning thread should poll its channel.
Do not share channels between threads.
1.10.3. Correct and incorrect patterns#
/* Correct: each thread has its own channel */
void thread_func(cuObjServer& server) {
uint16_t my_channel = server.allocateChannelId();
server.handleGetObject(..., my_channel, ...);
server.handlePutObject(..., my_channel, ...);
server.freeChannelId(my_channel);
}
/* Incorrect: multiple threads sharing the same channel */
void bad_thread_func(cuObjServer& server) {
server.handleGetObject(..., 0, ...);
}
/* Correct: single-threaded use of default channel */
void single_thread_app(cuObjServer& server) {
server.handleGetObject(..., 0, ...);
}
1.11. Best Practices#
Memory alignment: Use allocHostBuffer() for 4 KB aligned buffers.
Error recovery: Set qp_reset_on_failure = true for automatic recovery.
Polling strategy:
Lower delay_interval reduces latency but increases CPU usage.
Higher delay_interval reduces CPU usage but increases latency.
Tune based on workload characteristics.
Async versus sync:
Use async for batch operations and throughput.
Use sync for simplicity and low-latency single operations.
Resource limits:
Limit concurrent operations based on cq_depth.
Ensure num_dcis is at least the number of concurrent channels.
Buffer reuse:
Reuse buffers after completion.
For async, wait for poll() completion before reuse.
Cleanup order:
Complete all operations before deregistering buffers.
Deregister buffers before freeing memory.
Free channels before destroying the server.
Network configuration:
Ensure RDMA devices are configured.
Verify the InfiniBand subnet manager is running.
Check firewall rules for RDMA ports.