1. cuObjServer API Specification v1.0.0

1.1. Overview

The cuObjServer library provides server-side APIs for handling RDMA-based object operations with GPUDirect Storage support. It enables high-performance data transfers over RDMA, specifically the RDMA Dynamically Connected (DC) transport.

1.1.1. Key features

  • Server-side RDMA connection management

  • Memory registration for RDMA transfers (single buffer and scatter-gather)

  • Synchronous and asynchronous RDMA operations (GET and PUT)

  • Multi-threaded operation support with channel management

  • Configurable RDMA tuning parameters

  • Telemetry and logging capabilities

  • Maximum operation size: 1 GiB per handleGetObject and handlePutObject call

1.1.2. Protocol support

  • CUOBJ_PROTO_RDMA_DC_V1 (RDMA Dynamically Connected version 1)

1.2. I/O flow

The cuObjServer library follows a connection-based architecture with channel multiplexing:

  1. Server initialization: Create a cuObjServer instance with an IP address, port, and protocol.

  2. Connection setup: The constructor starts the RDMA session (RDMA connection manager and InfiniBand verbs). Call isConnected() to confirm that it succeeded.

  3. Memory registration: Register host buffers for RDMA operations.

  4. Channel allocation: Allocate a unique channel per thread for multi-threaded access.

  5. RDMA operations: Handle GET (RDMA_WRITE) and PUT (RDMA_READ) requests.

  6. Completion: Synchronous calls complete before returning. For asynchronous submissions, poll the channel for completions.

  7. Cleanup: Deregister buffers, free channels, and close the connection.

1.2.1. Key concepts

  • GET operation: Server performs RDMA_WRITE to client memory.

  • PUT operation: Server performs RDMA_READ from client memory.

  • Channels: Isolated RDMA queues for lock-free concurrent access.

  • Scatter-gather: Support for non-contiguous memory regions.

1.3. Core Types and Enumerations

1.3.1. Error types

typedef enum cuObjErr_enum {
    CU_OBJ_SUCCESS = 0,    /* Operation successfully completed */
    CU_OBJ_FAIL    = 1     /* Operation failed */
} cuObjErr_t;

1.3.2. Protocol types

typedef enum cuObjProto_enum {
    CUOBJ_PROTO_RDMA_DC_V1 = 1001,  /* RDMA Dynamically Connected version 1 */
    CUOBJ_PROTO_MAX
} cuObjProto_t;

1.3.3. Operation types

typedef enum cuObjOpType_enum {
    CUOBJ_GET     = 0,     /* GET operation (server writes to client) */
    CUOBJ_PUT     = 1,     /* PUT operation (server reads from client) */
    CUOBJ_INVALID = 9999
} cuObjOpType_t;

1.3.4. Channel ID states

typedef enum cuObjChannelIdState_enum {
    CHANNEL_ID_FREE      = 0,          /* Channel is available for allocation */
    CHANNEL_ID_ALLOCATED = 1,          /* Channel is allocated to a thread */
    CHANNEL_ID_IN_USE    = 2,          /* Channel is actively processing operations */
    INVALID_CHANNEL_ID   = UINT16_MAX  /* Invalid channel ID marker */
} cuObjChannelIdState_t;

1.3.5. Delay modes

typedef enum cuObjDelayMode {
    CUOBJ_DELAY_NONE     = 0,  /* No delay between polling attempts */
    CUOBJ_DELAY_BATCH    = 1,  /* Delay after each batch of polling attempts (default) */
    CUOBJ_DELAY_ENTRY    = 2,  /* Delay after each polling attempt */
    CUOBJ_DELAY_ADAPTIVE = 3,  /* Adaptively adjust delay between polling attempts */
    CUOBJ_DELAY_INVALID  = 4   /* Invalid delay mode */
} cuObjDelayMode_t;

Note

Delay modes are used in cuObjRDMATunableParam.delay_mode to control polling behavior. CUOBJ_DELAY_BATCH is the default and recommended for most workloads.
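The following sketch makes the difference between the polling modes concrete. It drives a polling loop according to the mode descriptions above and is illustrative only:

  • pollWithDelay, PollStats, and the batch_size argument are hypothetical names.

  • The enum is copied locally so the sketch compiles on its own.

  • CUOBJ_DELAY_ADAPTIVE is not modeled, because its adjustment policy is not specified here.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Local copy of the delay-mode enum from this specification.
typedef enum cuObjDelayMode {
    CUOBJ_DELAY_NONE     = 0,
    CUOBJ_DELAY_BATCH    = 1,
    CUOBJ_DELAY_ENTRY    = 2,
    CUOBJ_DELAY_ADAPTIVE = 3,
    CUOBJ_DELAY_INVALID  = 4
} cuObjDelayMode_t;

// How many polling attempts were made and how often the loop slept.
struct PollStats {
    int  attempts  = 0;
    int  sleeps    = 0;
    bool completed = false;
};

// Calls try_poll() until it reports a completion or max_attempts is reached.
// NONE never sleeps, ENTRY sleeps after every failed attempt, and BATCH sleeps
// whenever the attempt count reaches a multiple of batch_size.
// ADAPTIVE is not modeled and behaves like NONE in this sketch.
PollStats pollWithDelay(const std::function<bool()>& try_poll,
                        cuObjDelayMode_t mode, uint32_t delay_ns,
                        int batch_size, int max_attempts) {
    PollStats stats;
    while (stats.attempts < max_attempts) {
        ++stats.attempts;
        if (try_poll()) {
            stats.completed = true;
            break;
        }
        bool sleep_now =
            (mode == CUOBJ_DELAY_ENTRY) ||
            (mode == CUOBJ_DELAY_BATCH && batch_size > 0 &&
             stats.attempts % batch_size == 0);
        if (sleep_now) {
            ++stats.sleeps;
            std::this_thread::sleep_for(std::chrono::nanoseconds(delay_ns));
        }
    }
    return stats;
}
```

Consider a completion that appears on the seventh attempt. CUOBJ_DELAY_ENTRY sleeps six times, while CUOBJ_DELAY_BATCH with a batch of four sleeps once. Batching trades a little latency for fewer sleeps.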

1.4. RDMAConnection Class

1.4.1. Class declaration

RDMAConnection is a base class for RDMA connection management.

class RDMAConnection {
public:
    cuObjRDMATunable params;

    RDMAConnection(const char* ip, unsigned short port);
    ~RDMAConnection();

    int  startRDMASession();
    void initRDMAConfigParams(cuObjRDMATunable config_params);
    void closeRDMASession();
    void handleDisconnectEvent(struct rdma_cm_id* id);

    char* getChannelIP(struct rdma_cm_id* id);
    int   getChannelPort(struct rdma_cm_id* id);
    void  getConfigTunableParam(struct cuObjRDMATunableParam* param);

protected:
    const char* myip;
    struct sockaddr_in addr;
    struct rdma_device* rdma_dev;
};

1.4.2. Notes

  • cuObjServer inherits from RDMAConnection.

  • Manages RDMA device and connection lifecycle.

  • Provides configuration and status query methods.

1.5. cuObjServer Class API

1.5.1. Class declaration

class cuObjServer : public RDMAConnection {
public:
    /* Constructors and destructor */
    cuObjServer(const char* ip, unsigned short port, unsigned proto);
    cuObjServer(const char* ip, unsigned short port, unsigned proto,
                cuObjRDMATunable params);
    cuObjServer() = default;
    ~cuObjServer();

    /* Memory management */
    void* allocHostBuffer(size_t size);
    struct rdma_buffer* registerBuffer(void* ptr, size_t size);
    struct rdma_buffer* registerBuffer(std::vector<cuObjScatterGatherEntry_t> sglist);
    void deRegisterBuffer(struct rdma_buffer* rdma_buff);

    /* Scatter-gather support */
    struct rdma_buffer* getRDMABufferFromSgList(
        struct rdma_buffer* rdma_buffer,
        std::vector<cuObjScatterGatherEntryWithOffset_t> sglist);
    void putRDMABufferFromSgList(struct rdma_buffer* rdma_buffer);

    /* I/O operations (synchronous, or asynchronous when async_handle is non-NULL) */
    ssize_t handleGetObject(const std::string& key,
                            struct rdma_buffer* local_rdma_buff,
                            uint64_t remote_buf_start, size_t size,
                            const std::string& rdma_descr,
                            uint16_t channel = 0, uint64_t local_offset = 0,
                            ibv_wc_status* status = nullptr,
                            void* async_handle = nullptr);

    ssize_t handlePutObject(const std::string& key,
                            struct rdma_buffer* local_rdma_buff,
                            uint64_t remote_buf_start, size_t size,
                            const std::string& rdma_descr,
                            uint16_t channel = 0, uint64_t local_offset = 0,
                            ibv_wc_status* status = nullptr,
                            void* async_handle = nullptr);

    /* I/O operations (with poll delay) */
    ssize_t handleGetObject(const std::string& key,
                            struct rdma_buffer* local_rdma_buff,
                            uint64_t remote_buf_start, size_t size,
                            const std::string& rdma_descr,
                            uint32_t poll_delay,
                            uint16_t channel = 0, uint64_t local_offset = 0,
                            ibv_wc_status* status = nullptr,
                            void* async_handle = nullptr);

    ssize_t handlePutObject(const std::string& key,
                            struct rdma_buffer* local_rdma_buff,
                            uint64_t remote_buf_start, size_t size,
                            const std::string& rdma_descr,
                            uint32_t poll_delay,
                            uint16_t channel = 0, uint64_t local_offset = 0,
                            ibv_wc_status* status = nullptr,
                            void* async_handle = nullptr);

    /* Asynchronous operations */
    int poll(cuObjAsyncEvent_t* events, int max_events, uint16_t channel = 0);

    /* Channel management */
    uint16_t allocateChannelId();
    void freeChannelId(uint16_t channel_id);

    /* Connection status */
    bool isConnected();

    /* Static telemetry */
    static void setupTelemetry(bool use_OTEL, std::ostream* os);
    static void shutdownTelemetry();
    static void setTelemFlags(unsigned flags);
};

1.5.2. Constructors

1.5.2.1. Basic constructor

Creates a cuObjServer with default RDMA parameters.

cuObjServer(const char* ip, unsigned short port, unsigned proto);

Parameters

  • ip: Server IP address for RDMA connection.

  • port: Server port for RDMA connection.

  • proto: RDMA protocol (currently only CUOBJ_PROTO_RDMA_DC_V1 is defined).

Notes

  • Uses default tunable parameters.

  • Automatically starts the RDMA session.

  • Call isConnected() after construction to verify success.

1.5.2.2. Advanced constructor with tuning parameters

Creates a cuObjServer with custom RDMA configuration.

cuObjServer(const char* ip, unsigned short port, unsigned proto,
            cuObjRDMATunable params);

Parameters

  • ip: Server IP address for RDMA connection.

  • port: Server port for RDMA connection.

  • proto: RDMA protocol (currently only CUOBJ_PROTO_RDMA_DC_V1 is defined).

  • params: RDMA tuning parameters (see cuObjRDMATunableParam).

1.5.3. Memory Management APIs

1.5.3.1. allocHostBuffer

Allocates a 4 KB-aligned host buffer suitable for RDMA operations.

void* allocHostBuffer(size_t size);

Parameters

  • size: Size of buffer to allocate in bytes.

Returns

  • Pointer to allocated buffer on success.

  • NULL on failure.

Notes

  • Buffer is aligned to 4 KB boundaries for performance.

  • Caller is responsible for freeing the buffer using free().
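The sketch below illustrates the alignment guarantee and the free() contract described above. It obtains an equivalent 4 KB-aligned block with POSIX posix_memalign(). allocAligned4K is a hypothetical name, and this is not necessarily how allocHostBuffer is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns a 4 KiB-aligned block that can be released with free(),
// or nullptr on failure or when size is 0.
void* allocAligned4K(size_t size) {
    void* ptr = nullptr;
    if (size == 0 || posix_memalign(&ptr, 4096, size) != 0) {
        return nullptr;
    }
    return ptr;
}
```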

1.5.3.2. registerBuffer (single buffer)

Registers a contiguous host buffer for RDMA operations.

struct rdma_buffer* registerBuffer(void* ptr, size_t size);

Parameters

  • ptr: Start address of host buffer.

  • size: Size of buffer in bytes.

Returns

  • Opaque RDMA handle for use in handleGetObject and handlePutObject.

  • NULL on failure.

Notes

  • Buffer must remain valid until deRegisterBuffer() is called.

  • Must be host memory (system RAM).

1.5.3.3. registerBuffer (scatter-gather list)

Registers a scatter-gather list of non-contiguous host buffers.

struct rdma_buffer* registerBuffer(std::vector<cuObjScatterGatherEntry_t> sglist);

Parameters

  • sglist: Vector of scatter-gather entries (maximum 10 entries).

Returns

  • Opaque RDMA handle for use in handleGetObject and handlePutObject.

  • NULL on failure.

Notes

  • Total size is the sum of all entry sizes.

  • When used with handleGetObject and handlePutObject, local_offset is ignored.
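The documented limits on registerBuffer(sglist) are easy to check before calling it: at most 10 entries, with a total size equal to the sum of the entry sizes. In this sketch:

  • sgListTotalSize and kMaxSgEntries are hypothetical names.

  • The entry struct is copied from Supporting Structures.

  • Rejecting null or zero-sized entries is an extra precaution added here, not a documented rule.

```cpp
#include <cstddef>
#include <vector>

// Local copy of the entry type from "Supporting Structures".
typedef struct cuObjScatterGatherEntry {
    void*  addr;
    size_t size;
} cuObjScatterGatherEntry_t;

constexpr size_t kMaxSgEntries = 10;  // documented limit for registerBuffer(sglist)

// Returns the total size the list would register, or 0 if the list is empty,
// has more than kMaxSgEntries entries, or contains a null or zero-sized entry.
size_t sgListTotalSize(const std::vector<cuObjScatterGatherEntry_t>& sglist) {
    if (sglist.empty() || sglist.size() > kMaxSgEntries) {
        return 0;
    }
    size_t total = 0;
    for (const auto& e : sglist) {
        if (e.addr == nullptr || e.size == 0) {
            return 0;
        }
        total += e.size;
    }
    return total;
}
```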

1.5.3.4. getRDMABufferFromSgList

Creates an RDMA buffer view from a registered buffer using scatter-gather offsets.

struct rdma_buffer* getRDMABufferFromSgList(
    struct rdma_buffer* rdma_buffer,
    std::vector<cuObjScatterGatherEntryWithOffset_t> sglist);

Parameters

  • rdma_buffer: Previously registered RDMA buffer.

  • sglist: Vector of (offset, size) entries, each describing a region within the registered buffer.

Returns

  • New opaque RDMA handle representing the scatter-gather view.

  • NULL on failure.

Notes

  • Creates a derived view without re-registering memory.

  • Call putRDMABufferFromSgList() to release the view when it is no longer needed.
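A view is only meaningful if every (offset, size) pair falls inside the registered buffer. The helper below checks this with an overflow-safe comparison and returns the view's total size. viewTotalSize is a hypothetical name, and the entry struct is copied locally. This specification does not say whether getRDMABufferFromSgList performs the same validation internally.

```cpp
#include <cstddef>
#include <sys/types.h>
#include <vector>

// Local copy of the entry type from "Supporting Structures".
typedef struct cuObjScatterGatherEntryWithOffset {
    loff_t offset;
    size_t size;
} cuObjScatterGatherEntryWithOffset_t;

// Returns the total size of the view if every entry lies within a registered
// buffer of base_size bytes; returns 0 if any entry is out of bounds or empty.
size_t viewTotalSize(const std::vector<cuObjScatterGatherEntryWithOffset_t>& sglist,
                     size_t base_size) {
    size_t total = 0;
    for (const auto& e : sglist) {
        if (e.offset < 0 || e.size == 0) {
            return 0;
        }
        size_t off = static_cast<size_t>(e.offset);
        // Compare against the remaining space so off + size cannot overflow.
        if (off > base_size || e.size > base_size - off) {
            return 0;
        }
        total += e.size;
    }
    return total;
}
```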

1.5.3.5. putRDMABufferFromSgList

Frees an RDMA buffer view created by getRDMABufferFromSgList.

void putRDMABufferFromSgList(struct rdma_buffer* rdma_buffer);

Parameters

  • rdma_buffer: RDMA buffer returned by getRDMABufferFromSgList().

Notes

  • Does not deregister the underlying buffer.

1.5.3.6. deRegisterBuffer

Deregisters an RDMA buffer and releases associated resources.

void deRegisterBuffer(struct rdma_buffer* rdma_buff);

Parameters

  • rdma_buff: Handle returned by registerBuffer().

Notes

  • Complete all operations that use this buffer, including outstanding asynchronous operations, before deregistering it.

  • Does not free the underlying memory buffer.

1.5.4. I/O Operations

1.5.4.1. handleGetObject (synchronous and async submission)

Handles a GET operation by performing RDMA_WRITE to client memory.

ssize_t handleGetObject(const std::string& key,
                        struct rdma_buffer* local_rdma_buff,
                        uint64_t remote_buf_start, size_t size,
                        const std::string& rdma_descr,
                        uint16_t channel = 0, uint64_t local_offset = 0,
                        ibv_wc_status* status = nullptr,
                        void* async_handle = nullptr);

Parameters

  • key: Request identifier used for logging and telemetry.

  • local_rdma_buff: Registered local memory handle (source data).

  • remote_buf_start: Starting address in remote client memory.

  • size: Size of operation in bytes (maximum 1 GiB).

  • rdma_descr: RDMA descriptor string received from the client.

  • channel: Channel ID from allocateChannelId() (default: 0).

  • local_offset: Offset into local buffer (ignored for scatter-gather buffers).

  • status: Optional pointer to receive InfiniBand verbs completion status.

  • async_handle: User handle for async operation (non-NULL enables async mode).

Returns

  • Positive: Number of bytes transferred (synchronous success).

  • Negative: Error code.

  • 0: Successful async submission (when async_handle is non-NULL).

Notes

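  • Synchronous by default (async_handle is nullptr): the call returns after the transfer completes.

  • Populate the source data in the buffer registered as local_rdma_buff before calling.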

  • In async mode, call poll() to obtain completion.
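Each call moves at most 1 GiB, so a larger transfer has to be split across several calls. The sketch below computes the pieces. It makes these assumptions:

  • splitTransfer, Chunk, and kMaxOpSize are hypothetical names.

  • The local side is a single-buffer registration covering the whole range (local_offset is ignored for scatter-gather buffers).

  • The client region is contiguous and described by one rdma_descr.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kMaxOpSize = size_t(1) << 30;  // 1 GiB per handleGetObject/handlePutObject call

// One sub-request: where it starts in remote and local memory, and its length.
struct Chunk {
    uint64_t remote_addr;
    uint64_t local_offset;
    size_t   size;
};

// Splits a transfer of `total` bytes into pieces no larger than kMaxOpSize,
// advancing the remote address and local offset together.
std::vector<Chunk> splitTransfer(uint64_t remote_start, uint64_t local_offset, size_t total) {
    std::vector<Chunk> chunks;
    for (size_t done = 0; done < total; ) {
        size_t len = (total - done < kMaxOpSize) ? total - done : kMaxOpSize;
        chunks.push_back({remote_start + done, local_offset + done, len});
        done += len;
    }
    return chunks;
}
```

Each Chunk maps onto one call, with remote_buf_start = chunk.remote_addr, local_offset = chunk.local_offset, and size = chunk.size. In async mode, give each call its own async_handle.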

1.5.4.2. handleGetObject (with poll delay)

Variant of handleGetObject that overrides the server's configured polling delay for this request.

ssize_t handleGetObject(const std::string& key,
                        struct rdma_buffer* local_rdma_buff,
                        uint64_t remote_buf_start, size_t size,
                        const std::string& rdma_descr,
                        uint32_t poll_delay,
                        uint16_t channel = 0, uint64_t local_offset = 0,
                        ibv_wc_status* status = nullptr,
                        void* async_handle = nullptr);

Additional parameter

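  • poll_delay: Time in nanoseconds to sleep before polling for completion; overrides the server's delay_interval setting for this request. Because poll_delay sits where the other overload takes channel, a bare integer literal in that position (for example 0) makes the call ambiguous. Pass channel as a uint16_t variable, or poll_delay as a uint32_t.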
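The poll_delay overloads take a uint32_t where the other overloads take a uint16_t channel, so the argument's type decides which overload runs. The stand-in below reproduces only that trailing-parameter shape and shows which overload a given argument selects. OverloadShape and call are hypothetical, and the leading handleGetObject parameters are dropped.

```cpp
#include <cstdint>
#include <string>

// Stand-in with the same trailing-parameter shape as the two handleGetObject
// overloads. Each member returns a number identifying which one was chosen.
struct OverloadShape {
    // Mirrors (..., uint16_t channel = 0, ...)
    int call(const std::string& /*key*/, uint16_t /*channel*/ = 0) {
        return 1;
    }
    // Mirrors (..., uint32_t poll_delay, uint16_t channel = 0, ...)
    int call(const std::string& /*key*/, uint32_t /*poll_delay*/, uint16_t /*channel*/ = 0) {
        return 2;
    }
};
```

A plain int literal such as 0 in the second position converts equally well to uint16_t and uint32_t. As a result, s.call("k", 0) does not compile. The same ambiguity affects the real overloads when 0 is passed as the sixth argument.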

1.5.4.3. handlePutObject (synchronous and async submission)

Handles a PUT operation by performing RDMA_READ from client memory.

ssize_t handlePutObject(const std::string& key,
                        struct rdma_buffer* local_rdma_buff,
                        uint64_t remote_buf_start, size_t size,
                        const std::string& rdma_descr,
                        uint16_t channel = 0, uint64_t local_offset = 0,
                        ibv_wc_status* status = nullptr,
                        void* async_handle = nullptr);

Parameters

  • key: Request identifier used for logging and telemetry.

  • local_rdma_buff: Registered local memory handle (destination buffer).

  • remote_buf_start: Starting address in remote client memory.

  • size: Size of operation in bytes (maximum 1 GiB).

  • rdma_descr: RDMA descriptor string received from the client.

  • channel: Channel ID from allocateChannelId() (default: 0).

  • local_offset: Offset into local buffer (ignored for scatter-gather buffers).

  • status: Optional pointer to receive InfiniBand verbs completion status.

  • async_handle: User handle for async operation (non-NULL enables async mode).

Returns

  • Positive: Number of bytes transferred (synchronous success).

  • Negative: Error code.

  • 0: Successful async submission (when async_handle is non-NULL).

Notes

  • In synchronous mode, the data is available in local_rdma_buff when the call returns successfully.

  • In async mode, call poll() to obtain completion.

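1.5.4.4. handlePutObject (with poll delay)

Variant of handlePutObject that overrides the server's configured polling delay for this request. poll_delay has the same meaning as in handleGetObject (with poll delay).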

ssize_t handlePutObject(const std::string& key,
                        struct rdma_buffer* local_rdma_buff,
                        uint64_t remote_buf_start, size_t size,
                        const std::string& rdma_descr,
                        uint32_t poll_delay,
                        uint16_t channel = 0, uint64_t local_offset = 0,
                        ibv_wc_status* status = nullptr,
                        void* async_handle = nullptr);

1.5.5. Asynchronous Operations

1.5.5.1. cuObjAsyncEvent_t

typedef struct cuObjAsyncEvent_s {
    void* async_handle;  /* User-provided handle from async operation */
    ssize_t status;      /* Positive: bytes, Negative: error code */
} cuObjAsyncEvent_t;

1.5.5.2. poll

Polls for completion of async handleGetObject and handlePutObject operations.

int poll(cuObjAsyncEvent_t* events, int max_events, uint16_t channel = 0);

Parameters

  • events: Array to store completion events.

  • max_events: Maximum number of events to return.

  • channel: Channel ID to poll. It must be the channel the asynchronous operations were submitted on.

Returns

  • Number of completed events (0 to max_events).

  • Negative value on error:

    • -EINVAL if events is nullptr or channel is invalid.

    • -EIO if a polled entry does not complete successfully.

Notes

  • Non-blocking and returns immediately with available completions.

  • Events are returned in completion order.
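poll() returns events in completion order rather than submission order, so async_handle is the only link back to a request. The following is a minimal bookkeeping sketch, assuming one tracker per channel. OutstandingOps is a hypothetical helper, and the event struct is copied locally.

```cpp
#include <cstddef>
#include <cstdint>
#include <sys/types.h>
#include <unordered_set>

// Local copy of the event type from this specification.
typedef struct cuObjAsyncEvent_s {
    void*   async_handle;
    ssize_t status;
} cuObjAsyncEvent_t;

// Tracks the async handles still outstanding on one channel.
class OutstandingOps {
public:
    void submitted(void* handle) { pending_.insert(handle); }

    // Consumes n events returned by poll(); returns how many reported errors.
    int complete(const cuObjAsyncEvent_t* events, int n) {
        int errors = 0;
        for (int i = 0; i < n; i++) {
            pending_.erase(events[i].async_handle);
            if (events[i].status < 0) {
                ++errors;
            }
        }
        return errors;
    }

    bool   idle() const { return pending_.empty(); }
    size_t pending() const { return pending_.size(); }

private:
    std::unordered_set<void*> pending_;
};
```

Do not reuse or deregister a buffer whose handle is still pending. Once idle() returns true, every submitted operation on the channel has reported a completion.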

1.5.6. Channel Management

1.5.6.1. allocateChannelId

Allocates a unique channel ID for use in multi-threaded applications.

uint16_t allocateChannelId();

Returns

  • Allocated channel ID for exclusive use by a thread.

  • INVALID_CHANNEL_ID on failure.

1.5.6.2. freeChannelId

Releases a previously allocated channel ID.

void freeChannelId(uint16_t channel_id);

Parameters

  • channel_id: Channel ID to release.

Note

Do not free the default channel (0).
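The channel contract has three parts:

  • Channel 0 is always available and is never allocated.

  • INVALID_CHANNEL_ID signals exhaustion.

  • Allocation and release are internally synchronized.

This contract can be modeled with a mutex-protected pool. The sketch below is hypothetical and is not the library's implementation. ChannelIdPool and its pool size are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

constexpr uint16_t kInvalidChannelId = UINT16_MAX;  // same value as INVALID_CHANNEL_ID

// Channel 0 is reserved as the default channel and never handed out;
// IDs 1..max_channels-1 are allocated on demand under a mutex.
class ChannelIdPool {
public:
    explicit ChannelIdPool(uint16_t max_channels) : in_use_(max_channels, false) {
        if (max_channels > 0) {
            in_use_[0] = true;  // reserve default channel 0
        }
    }

    uint16_t allocate() {
        std::lock_guard<std::mutex> lock(mu_);
        for (size_t id = 1; id < in_use_.size(); id++) {
            if (!in_use_[id]) {
                in_use_[id] = true;
                return static_cast<uint16_t>(id);
            }
        }
        return kInvalidChannelId;  // pool exhausted
    }

    void release(uint16_t id) {
        std::lock_guard<std::mutex> lock(mu_);
        if (id != 0 && id < in_use_.size()) {  // never free the default channel
            in_use_[id] = false;
        }
    }

private:
    std::mutex mu_;
    std::vector<bool> in_use_;
};
```

A pool of 65535 slots yields allocatable IDs 1 through 65534, which matches the documented channel ID range. The value 65535 remains the invalid marker.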

1.5.7. Connection Management

1.5.7.1. isConnected

Checks if the server is connected and ready for operations.

bool isConnected();

Returns

  • true if connected and operational.

  • false otherwise.

1.5.8. Telemetry Management

1.5.8.1. setupTelemetry

Configures the telemetry output stream used for logging and monitoring.

static void setupTelemetry(bool use_OTEL, std::ostream* os);

Parameters

  • use_OTEL: Enables OpenTelemetry integration.

  • os: Output stream for telemetry data.

Notes

  • The output stream must remain valid until shutdownTelemetry() is called.

  • Affects all cuObjServer instances.

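1.5.8.2. shutdownTelemetry

Shuts down telemetry. The output stream passed to setupTelemetry() must remain valid until this call returns.

static void shutdownTelemetry();

1.5.8.3. setTelemFlags

Sets the telemetry flags. The method is static, so no cuObjServer instance is needed to call it.

static void setTelemFlags(unsigned flags);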

1.6. Supporting Structures

1.6.1. cuObjScatterGatherEntry_t

typedef struct cuObjScatterGatherEntry {
    void*  addr;  /* Memory address */
    size_t size;  /* Size of memory segment */
} cuObjScatterGatherEntry_t;

1.6.2. cuObjScatterGatherEntryWithOffset_t

typedef struct cuObjScatterGatherEntryWithOffset {
    loff_t offset;  /* Offset within registered buffer */
    size_t size;    /* Size of memory segment */
} cuObjScatterGatherEntryWithOffset_t;

1.6.3. cuObjRDMATunableParam

RDMA connection tunable parameters for advanced configuration.

struct cuObjRDMATunableParam {
    int            num_dcis;                /* default: 128 */
    unsigned       cq_depth;                /* default: 640 */
    unsigned long  dc_key;                  /* default: 0xffeeddcc */
    int            ibv_poll_max_comp_event; /* unused */
    int            service_level;           /* default: 0 */
    uint8_t        timeout;                 /* default: 16 */
    unsigned       hop_limit;               /* default: 4 */
    int            pkey_index;              /* default: 0 */
    int            max_wr;                  /* unused */
    int            max_sge;                 /* default: 10 */
    uint32_t       delay_interval;          /* default: 5000 ns */
    cuObjDelayMode_t delay_mode;            /* default: CUOBJ_DELAY_BATCH */
    bool           qp_reset_on_failure;     /* default: true */
    uint8_t        retry_cnt;               /* default: 7 */
    unsigned       traffic_class;           /* default: 96 */
};

1.6.4. Key parameters

  • num_dcis: Number of Dynamically Connected Initiators (affects concurrency).

  • cq_depth: Completion Queue depth (must accommodate outstanding operations).

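  • timeout: QP timeout value (higher is more tolerant to network delays). If the value is passed to the QP as the InfiniBand verbs timeout attribute, it is an exponent: the local ACK timeout is 4.096 µs × 2^timeout.

  • retry_cnt: Number of retries before declaring failure. The InfiniBand verbs QP retry count is a 3-bit field, so 7 is the maximum.

  • delay_interval: Polling delay in nanoseconds. Lower values reduce latency at the cost of more CPU usage.

  • qp_reset_on_failure: Automatically reset the QP on errors (recommended: true).

  • traffic_class: Absolute traffic class value, including both the DSCP bits (used by the dscp2prio mapping) and the ECN bits.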
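The timeout and traffic_class values are encoded rather than literal. The helpers below convert them to readable quantities. They assume that:

  • timeout is applied as the InfiniBand verbs QP timeout attribute (4.096 µs × 2^timeout; 0 means infinite).

  • traffic_class is the IP traffic class byte, with DSCP in the upper 6 bits and ECN in the lower 2.

ackTimeoutMicros and trafficClass are hypothetical names.

```cpp
#include <cstdint>

// Local ACK timeout in microseconds for a verbs timeout exponent.
// Returns a negative value for 0 (infinite) or values outside the 5-bit range.
double ackTimeoutMicros(uint8_t timeout) {
    if (timeout == 0 || timeout > 31) {
        return -1.0;
    }
    return 4.096 * static_cast<double>(uint64_t(1) << timeout);
}

// Builds the traffic class byte: DSCP in the upper 6 bits, ECN in the lower 2.
unsigned trafficClass(unsigned dscp, unsigned ecn) {
    return ((dscp & 0x3F) << 2) | (ecn & 0x3);
}
```

Applied to the defaults, ackTimeoutMicros(16) is about 268 ms and trafficClass(24, 0) yields 96. The default traffic_class therefore corresponds to DSCP 24 with the ECN bits cleared.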

1.7. Error Handling

1.7.1. Return value conventions

  • I/O operations return ssize_t:

    • Positive: bytes transferred.

    • Negative: error code.

    • 0: async operation submitted (when async_handle is non-NULL).

  • Memory management returns pointers (NULL on error) or void.

  • Channel management returns channel ID (INVALID_CHANNEL_ID on error).

  • Connection status returns boolean.
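A single ssize_t carries three meanings for I/O operations. A small helper keeps callers from confusing an async submission (0) with a completed transfer. classifyIoResult and IoOutcome are hypothetical. The convention does not describe a 0 from a synchronous call or a positive value from an async submission, so this sketch reports both as Unexpected.

```cpp
#include <sys/types.h>

enum class IoOutcome { Transferred, AsyncSubmitted, Failed, Unexpected };

// Interprets a handleGetObject/handlePutObject return value.
// async_mode is true when the call was made with a non-NULL async_handle.
IoOutcome classifyIoResult(ssize_t result, bool async_mode) {
    if (result < 0) {
        return IoOutcome::Failed;  // negative: error code
    }
    if (async_mode) {
        return result == 0 ? IoOutcome::AsyncSubmitted : IoOutcome::Unexpected;
    }
    return result > 0 ? IoOutcome::Transferred : IoOutcome::Unexpected;
}
```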

1.7.2. InfiniBand verbs status

The optional ibv_wc_status* status parameter in handleGetObject and handlePutObject receives the InfiniBand verbs work completion status, for example:

  • IBV_WC_SUCCESS

  • IBV_WC_LOC_LEN_ERR

  • IBV_WC_LOC_PROT_ERR

  • IBV_WC_REM_ACCESS_ERR

  • IBV_WC_RETRY_EXC_ERR

  • IBV_WC_RNR_RETRY_EXC_ERR

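1.7.3. Example

#include <cstdio>
#include <infiniband/verbs.h>

/* Initialized in case the call fails before any completion is polled */
ibv_wc_status ib_status = IBV_WC_SUCCESS;
ssize_t result = server.handleGetObject(key, buf, addr, size, desc,
                                        channel, 0, &ib_status, nullptr);
if (result < 0) {
    fprintf(stderr, "RDMA error: %zd (IB status: %s)\n",
            result, ibv_wc_status_str(ib_status));
}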

1.7.4. Best practices

  • Always check return values for error conditions.

  • Use status for detailed RDMA error diagnostics.

  • Implement cleanup in error paths.

  • Enable qp_reset_on_failure for automatic recovery when appropriate.

  • Check isConnected() after construction.

1.8. Usage Patterns

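1.8.1. Basic server usage (single-threaded)

#include <cstdlib>
#include <cstring>

/* remote_addr and rdma_descriptor come from the client request; data_source,
   data_size (at most BUF_SIZE), and process_data() are application-defined. */
const size_t BUF_SIZE = 1024 * 1024;
const uint16_t channel = 0;  /* default channel; a uint16_t variable avoids an ambiguous call */

cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
if (!server.isConnected()) {
    /* Handle error */
}

void* buffer = server.allocHostBuffer(BUF_SIZE);
if (!buffer) {
    /* Handle error */
}
struct rdma_buffer* rdma_buf = server.registerBuffer(buffer, BUF_SIZE);
if (!rdma_buf) {
    free(buffer);
    /* Handle error */
}

/* GET: stage the object in the registered buffer, then RDMA_WRITE it to the client */
memcpy(buffer, data_source, data_size);

ssize_t result = server.handleGetObject("request_123",
                                        rdma_buf,
                                        remote_addr,
                                        data_size,
                                        rdma_descriptor,
                                        channel);
if (result < 0) {
    /* Handle error */
}

/* PUT: RDMA_READ the client's data into the registered buffer */
result = server.handlePutObject("request_456",
                                rdma_buf,
                                remote_addr,
                                data_size,
                                rdma_descriptor,
                                channel);
if (result > 0) {
    process_data(buffer, (size_t)result);
}

/* Deregister before freeing the memory */
server.deRegisterBuffer(rdma_buf);
free(buffer);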

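1.8.2. Multi-threaded server usage

#include <atomic>
#include <cstdlib>
#include <functional>
#include <thread>
#include <vector>

/* Request and get_next_request() are application-defined */
std::atomic<bool> running{true};

void worker_thread(cuObjServer& server) {
    /* Each thread allocates its own channel and never shares it */
    uint16_t channel = server.allocateChannelId();
    if (channel == INVALID_CHANNEL_ID) {
        return;
    }

    void* buffer = server.allocHostBuffer(1024 * 1024);
    struct rdma_buffer* rdma_buf =
        buffer ? server.registerBuffer(buffer, 1024 * 1024) : nullptr;
    if (!rdma_buf) {
        free(buffer);
        server.freeChannelId(channel);
        return;
    }

    while (running) {
        Request req = get_next_request();
        ssize_t result = server.handleGetObject(req.key, rdma_buf,
                                                req.remote_addr, req.size,
                                                req.rdma_descr, channel);
        /* Handle result */
    }

    server.deRegisterBuffer(rdma_buf);
    free(buffer);
    server.freeChannelId(channel);
}

int main() {
    cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
    if (!server.isConnected()) {
        return 1;
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; i++) {
        workers.emplace_back(worker_thread, std::ref(server));
    }
    /* Set running = false to stop the workers; join before the server is destroyed */
    for (auto& t : workers) {
        t.join();
    }
}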

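1.8.3. Asynchronous operations

#include <cstdint>
#include <cstdlib>

/* remote_addrs, sizes (each at most BUF_SIZE), and rdma_descriptors are
   application-provided per-request values received from clients */
cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
uint16_t channel = server.allocateChannelId();
if (channel == INVALID_CHANNEL_ID) {
    /* Handle error */
}

const int NUM_REQ = 10;
const size_t BUF_SIZE = 1024 * 1024;
void* host_bufs[NUM_REQ];
struct rdma_buffer* bufs[NUM_REQ];

for (int i = 0; i < NUM_REQ; i++) {
    host_bufs[i] = server.allocHostBuffer(BUF_SIZE);
    bufs[i] = server.registerBuffer(host_bufs[i], BUF_SIZE);
    /* Check both for NULL, then stage the object data in host_bufs[i] */
}

int submitted = 0;
for (int i = 0; i < NUM_REQ; i++) {
    /* A non-NULL handle selects async mode and identifies the request in poll() */
    void* handle = (void*)(uintptr_t)(i + 1);
    ssize_t submit = server.handleGetObject("async_req",
                                            bufs[i],
                                            remote_addrs[i],
                                            sizes[i],
                                            rdma_descriptors[i],
                                            channel,
                                            0,
                                            nullptr,
                                            handle);
    if (submit < 0) {
        /* Submission failed; do not wait for a completion for this request */
        continue;
    }
    submitted++;
}

cuObjAsyncEvent_t events[NUM_REQ];
int completed = 0;

/* Wait only for successfully submitted requests so the loop terminates */
while (completed < submitted) {
    int n = server.poll(events, NUM_REQ, channel);
    if (n < 0) {
        /* Handle poll error (for example, -EINVAL or -EIO) */
        break;
    }
    for (int i = 0; i < n; i++) {
        /* events[i].async_handle identifies the request */
        /* events[i].status contains bytes (positive) or an error code (negative) */
    }
    completed += n;
}

/* Deregister and free buffers only after their operations have completed */
for (int i = 0; i < NUM_REQ; i++) {
    server.deRegisterBuffer(bufs[i]);
    free(host_bufs[i]);
}
server.freeChannelId(channel);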

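1.8.4. Scatter-gather buffer usage

#include <cstdlib>
#include <vector>

cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);

void* buf1 = server.allocHostBuffer(4096);
void* buf2 = server.allocHostBuffer(8192);
void* buf3 = server.allocHostBuffer(4096);
/* Check each allocation for NULL */

/* At most 10 entries; the registered size is 4096 + 8192 + 4096 = 16384 bytes */
std::vector<cuObjScatterGatherEntry_t> sglist = {
    { buf1, 4096 },
    { buf2, 8192 },
    { buf3, 4096 }
};

struct rdma_buffer* rdma_buf = server.registerBuffer(sglist);
if (!rdma_buf) {
    /* Handle error */
}

/* Omitting channel and local_offset uses the defaults (channel 0); local_offset is
   ignored for scatter-gather buffers. remote_addr and rdma_descr come from the client. */
ssize_t result = server.handleGetObject("sg_request",
                                        rdma_buf,
                                        remote_addr,
                                        16384,
                                        rdma_descr);
if (result < 0) {
    /* Handle error */
}

server.deRegisterBuffer(rdma_buf);
free(buf1);
free(buf2);
free(buf3);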

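1.8.5. Advanced: RDMA buffer views

/* Assumes a connected cuObjServer named server; remote_addr and rdma_descr come from the client */
void* large_buffer = server.allocHostBuffer(1024 * 1024);
struct rdma_buffer* base_buf = server.registerBuffer(large_buffer, 1024 * 1024);
/* Check both for NULL */

/* Three regions of the registered buffer, 16384 bytes in total */
std::vector<cuObjScatterGatherEntryWithOffset_t> view_sglist = {
    { (loff_t)0,     4096  },
    { (loff_t)16384, 8192  },
    { (loff_t)32768, 4096  }
};

/* The view reuses base_buf's registration; no memory is re-registered */
struct rdma_buffer* view_buf = server.getRDMABufferFromSgList(base_buf, view_sglist);
if (view_buf) {
    ssize_t result = server.handleGetObject("view_req", view_buf, remote_addr,
                                            16384, rdma_descr);
    /* Handle result */
    server.putRDMABufferFromSgList(view_buf);  /* release the view before deregistering the base */
}

server.deRegisterBuffer(base_buf);
free(large_buffer);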

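1.8.6. Custom RDMA tuning

/* Assumes fields not set below keep the defaults listed in cuObjRDMATunableParam */
cuObjRDMATunable params;
params.num_dcis = 256;
params.cq_depth = 1024;
params.timeout = 20;
params.retry_cnt = 7;           /* 3-bit verbs field: 7 is the maximum */
params.delay_interval = 10000;  /* nanoseconds */
params.qp_reset_on_failure = true;

cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1, params);
if (!server.isConnected()) {
    /* Handle error */
}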

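1.8.7. Telemetry configuration

#include <fstream>

std::ofstream log_file("cuobj_server.log");
cuObjServer::setupTelemetry(false, &log_file);  /* false: OpenTelemetry disabled */
cuObjServer::setTelemFlags(0xFFFF);

{
    cuObjServer server("192.168.1.100", 8080, CUOBJ_PROTO_RDMA_DC_V1);
    /* Serve requests */
}   /* The server is destroyed before telemetry is shut down */

/* log_file must stay open until shutdownTelemetry() returns */
cuObjServer::shutdownTelemetry();
log_file.close();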

1.9. Constants and Limits

1.9.1. Operation limits

  • Maximum operation size: 1 GiB per handleGetObject and handlePutObject call.

  • Maximum scatter-gather entries: 10 per registerBuffer(sglist) call.

  • Maximum SGE per operation: configurable via max_sge (default: 10).

1.9.2. Channel limits

  • Default channel: 0 (always available, no allocation required).

  • Channel ID range: 0 to 65534.

  • INVALID_CHANNEL_ID: 65535.

1.9.3. RDMA defaults

  • Number of DCIs: 128

  • CQ depth: 640

  • DC key: 0xffeeddcc

  • Service level: 0

  • Timeout: 16

  • Hop limit: 4

  • Partition key index: 0

  • Retry count: 7

  • Polling delay: 5000 ns

  • Delay mode: CUOBJ_DELAY_BATCH

  • QP reset on failure: true

  • Traffic class: 96

  • Maximum SGE: 10

1.9.4. Memory requirements

  • Buffer alignment: 4 KB recommended (allocHostBuffer).

  • Memory type: host memory (system RAM).

  • Registered buffers are pinned for RDMA.

1.10. Thread Safety and Channel Management

1.10.1. Thread safety model

  • Constructor and destructor: not thread-safe. Create and destroy from a single thread.

  • Memory registration: thread-safe when different buffers are registered and deregistered concurrently.

  • I/O operations: thread-safe only when each thread uses a unique channel ID for concurrent calls.

  • Channel management: thread-safe. allocateChannelId() and freeChannelId() use internal synchronization.

  • Static methods: thread-safe with internal synchronization.

1.10.2. Channel usage rules

1.10.2.1. Default channel (0)

  • Always available and no allocation needed.

  • Suitable for single-threaded applications.

  • Do not use in multi-threaded applications without external synchronization.

1.10.2.2. Allocated channels

  • Each thread must allocate its own channel via allocateChannelId().

  • A channel can only be used by one thread at a time.

  • Free channels via freeChannelId() when done.

1.10.2.3. Async operations

  • poll() must be called on the same channel used for async submissions.

  • Only the owning thread should poll its channel.

  • Do not share channels between threads.

1.10.3. Correct and incorrect patterns

/* Correct: each thread allocates and uses its own channel */
void thread_func(cuObjServer& server) {
    uint16_t my_channel = server.allocateChannelId();
    if (my_channel == INVALID_CHANNEL_ID) {
        return;
    }
    server.handleGetObject(..., my_channel, ...);
    server.handlePutObject(..., my_channel, ...);
    server.freeChannelId(my_channel);
}

/* Incorrect: when this runs on several threads at once, they all share channel 0 */
void bad_thread_func(cuObjServer& server) {
    server.handleGetObject(..., 0, ...);
}

/* Correct: a single-threaded application may use the default channel 0 */
void single_thread_app(cuObjServer& server) {
    server.handleGetObject(..., 0, ...);
}

1.11. Best Practices

  • Memory alignment: Use allocHostBuffer() for 4 KB aligned buffers.

  • Error recovery: Keep qp_reset_on_failure = true (the default) for automatic recovery.

  • Polling strategy:

    • Lower delay_interval reduces latency and increases CPU usage.

    • Higher delay_interval reduces CPU usage and increases latency.

    • Tune based on workload characteristics.

  • Async versus sync:

    • Use async for batch operations and throughput.

    • Use sync for simplicity and low-latency single operations.

  • Resource limits:

    • Keep the number of outstanding operations within cq_depth.

    • Ensure num_dcis is at least the number of concurrent channels.

  • Buffer reuse:

    • Reuse buffers after completion.

    • For async, wait for poll() completion before reuse.

  • Cleanup order:

    • Complete all operations before deregistering buffers.

    • Deregister buffers before freeing memory.

    • Free channels before destroying the server.

  • Network configuration:

    • Ensure RDMA devices are configured.

    • On InfiniBand fabrics, verify that a subnet manager is running.

    • Check firewall rules for RDMA ports.
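The cleanup order above is last-acquired, first-released, which maps naturally onto a small RAII helper. CleanupStack is a hypothetical sketch and is not part of cuObjServer. It runs the registered actions in reverse order when it goes out of scope.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Runs cleanup actions in reverse order of registration (LIFO) on destruction.
class CleanupStack {
public:
    CleanupStack() = default;
    CleanupStack(const CleanupStack&) = delete;             // copying would run actions twice
    CleanupStack& operator=(const CleanupStack&) = delete;

    void push(std::function<void()> fn) { actions_.push_back(std::move(fn)); }

    ~CleanupStack() {
        while (!actions_.empty()) {
            auto fn = std::move(actions_.back());
            actions_.pop_back();
            fn();
        }
    }

private:
    std::vector<std::function<void()>> actions_;
};
```

Push each cleanup action right after the corresponding acquisition succeeds:

  • freeChannelId(channel) after allocateChannelId().

  • free(buffer) after allocHostBuffer().

  • deRegisterBuffer(rdma_buf) after registerBuffer().

The actions then run as deregister, free, free channel. Declare the CleanupStack after the cuObjServer object so it is destroyed first, before the server.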