Function TRITONSERVER_ResponseAllocatorNew

Function Documentation

TRITONSERVER_Error *TRITONSERVER_ResponseAllocatorNew(TRITONSERVER_ResponseAllocator **allocator, TRITONSERVER_ResponseAllocatorAllocFn_t alloc_fn, TRITONSERVER_ResponseAllocatorReleaseFn_t release_fn, TRITONSERVER_ResponseAllocatorStartFn_t start_fn)

Create a new response allocator object.

The response allocator object is used by Triton to allocate buffers to hold the output tensors in inference responses. Most models generate a single response for each inference request (TRITONSERVER_TXN_ONE_TO_ONE). For these models the order of callbacks will be:

TRITONSERVER_ServerInferAsync called

  • start_fn: optional (and typically not required)

  • alloc_fn: called once for each output tensor in the response

TRITONSERVER_InferenceResponseDelete called

  • release_fn: called once for each output tensor in the response

For models that generate multiple responses for each inference request (TRITONSERVER_TXN_DECOUPLED), the start_fn callback can be used to determine sets of alloc_fn callbacks that belong to the same response:

TRITONSERVER_ServerInferAsync called

  • start_fn

  • alloc_fn: called once for each output tensor in the response

  • start_fn

  • alloc_fn: called once for each output tensor in the response

  • …

For each response, TRITONSERVER_InferenceResponseDelete called

  • release_fn: called once for each output tensor in the response
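The grouping role of start_fn in the decoupled sequence above can be sketched as follows. This is a minimal illustration, not Triton code: `ResponseGroups`, `OnStart`, `OnAlloc`, and `SimulateDecoupledRequest` are hypothetical names, the callbacks are reduced to the arguments needed for grouping (the real callback types take more parameters), and the tensor names `OUT0` and `OUT1` are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-request state. In real code a pointer to such an object
// would be supplied as the request's response allocator user pointer.
struct ResponseGroups {
  // One inner vector of tensor names per response, in arrival order.
  std::vector<std::vector<std::string>> responses;
};

// Stand-in for the body of a start_fn: marks the beginning of a new response.
void OnStart(void* userp) {
  auto* state = static_cast<ResponseGroups*>(userp);
  state->responses.emplace_back();
}

// Stand-in for the body of an alloc_fn: files the tensor under the response
// opened by the most recent start_fn call.
void OnAlloc(void* userp, const char* tensor_name) {
  auto* state = static_cast<ResponseGroups*>(userp);
  if (state->responses.empty()) {
    // start_fn is optional; without it, treat everything as one response.
    state->responses.emplace_back();
  }
  state->responses.back().push_back(tensor_name);
}

// Replays the decoupled callback order for a request with two responses,
// each carrying two output tensors.
ResponseGroups SimulateDecoupledRequest() {
  ResponseGroups state;
  OnStart(&state);
  OnAlloc(&state, "OUT0");
  OnAlloc(&state, "OUT1");
  OnStart(&state);
  OnAlloc(&state, "OUT0");
  OnAlloc(&state, "OUT1");
  return state;
}
```

After the simulated request, `responses` holds two groups of two tensor names each. Without the start_fn calls, the four alloc_fn calls would be indistinguishable from a single response with four outputs.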

In all cases the start_fn, alloc_fn and release_fn callback functions must be thread-safe. Typically, making these functions thread-safe does not require explicit locking. The recommended approach is for each inference request to supply, through TRITONSERVER_InferenceRequestSetResponseCallback, a ‘response_allocator_userp’ object that is unique to that request. The callback functions then operate only on this per-request state. Locking is required only when a callback function needs to access state that is shared across inference requests (for example, a common allocation pool).
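The split between per-request state and shared state might look like the sketch below. It assumes a simple byte counter as the shared "pool". `SharedPool`, `RequestState`, `AllocForRequest`, and `ReleaseForRequest` are hypothetical helpers with simplified arguments, not the Triton callback signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical state shared by all inference requests.
struct SharedPool {
  std::mutex mu;                 // guards bytes_in_use
  std::size_t bytes_in_use = 0;  // total bytes currently handed out
};

// Hypothetical per-request state, unique to one inference request.
struct RequestState {
  SharedPool* pool;             // shared across requests: lock before use
  std::size_t bytes_allocated;  // private to this request: no lock needed
};

// Stand-in for the body of an alloc_fn; userp is the per-request object.
void* AllocForRequest(void* userp, std::size_t byte_size) {
  auto* req = static_cast<RequestState*>(userp);
  void* buffer = std::malloc(byte_size);
  if (buffer == nullptr) {
    return nullptr;
  }
  // Per-request state is touched without any locking.
  req->bytes_allocated += byte_size;
  {
    // Shared state is touched only while holding the pool's mutex.
    std::lock_guard<std::mutex> lock(req->pool->mu);
    req->pool->bytes_in_use += byte_size;
  }
  return buffer;
}

// Stand-in for the body of a release_fn: frees the buffer and updates the
// shared counter under the lock.
void ReleaseForRequest(SharedPool* pool, void* buffer, std::size_t byte_size) {
  std::free(buffer);
  std::lock_guard<std::mutex> lock(pool->mu);
  pool->bytes_in_use -= byte_size;
}
```

If the callbacks used only `bytes_allocated`, the mutex could be dropped entirely. It is the shared counter that makes locking necessary here.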

Return

A TRITONSERVER_Error indicating success or failure (a null pointer indicates success).

Parameters
  • allocator: Returns the new response allocator object.

  • alloc_fn: The function to call to allocate buffers for result tensors.

  • release_fn: The function to call when the server no longer holds a reference to an allocated buffer.

  • start_fn: The function to call to indicate that the subsequent ‘alloc_fn’ calls are for a new response. This callback is optional (use nullptr to indicate that it should not be invoked).