Function TRITONSERVER_ResponseAllocatorNew
Defined in File tritonserver.h
Function Documentation
TRITONSERVER_Error *TRITONSERVER_ResponseAllocatorNew(
    TRITONSERVER_ResponseAllocator **allocator,
    TRITONSERVER_ResponseAllocatorAllocFn_t alloc_fn,
    TRITONSERVER_ResponseAllocatorReleaseFn_t release_fn,
    TRITONSERVER_ResponseAllocatorStartFn_t start_fn)

Create a new response allocator object.
The response allocator object is used by Triton to allocate buffers to hold the output tensors in inference responses. Most models generate a single response for each inference request (TRITONSERVER_TXN_ONE_TO_ONE). For these models, the callbacks occur in the following order:
TRITONSERVER_ServerInferAsync called
 - start_fn : optional (and typically not required)
 - alloc_fn : called once for each output tensor in response
TRITONSERVER_InferenceResponseDelete called
 - release_fn: called once for each output tensor in response
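The one-to-one sequence can be sketched as a CPU-only alloc_fn/release_fn pair. The callback signatures are written from memory of tritonserver.h and reproduced with stand-in type declarations so the sketch compiles without the Triton library; check them against your copy of the header. CpuAlloc and CpuRelease are hypothetical names, and always allocating host memory (ignoring the preferred memory type) is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring tritonserver.h so this sketch compiles on its own.
   In a real build, include the Triton header and delete these lines. */
typedef struct TRITONSERVER_Error TRITONSERVER_Error;
typedef struct TRITONSERVER_ResponseAllocator TRITONSERVER_ResponseAllocator;
typedef enum {
  TRITONSERVER_MEMORY_CPU,
  TRITONSERVER_MEMORY_CPU_PINNED,
  TRITONSERVER_MEMORY_GPU
} TRITONSERVER_MemoryType;

/* alloc_fn (hypothetical name): called once per output tensor. It ignores
   the preferred memory type, allocates host memory, and reports that choice
   through actual_memory_type / actual_memory_type_id. */
static TRITONSERVER_Error *CpuAlloc(
    TRITONSERVER_ResponseAllocator *allocator, const char *tensor_name,
    size_t byte_size, TRITONSERVER_MemoryType preferred_memory_type,
    int64_t preferred_memory_type_id, void *userp, void **buffer,
    void **buffer_userp, TRITONSERVER_MemoryType *actual_memory_type,
    int64_t *actual_memory_type_id)
{
  (void)allocator; (void)tensor_name; (void)userp;
  (void)preferred_memory_type; (void)preferred_memory_type_id;
  *actual_memory_type = TRITONSERVER_MEMORY_CPU;
  *actual_memory_type_id = 0;
  /* Zero-byte tensors need no storage. Real code should return an error
     (for example one created with TRITONSERVER_ErrorNew) if malloc fails. */
  *buffer = (byte_size == 0) ? NULL : malloc(byte_size);
  *buffer_userp = NULL;
  return NULL; /* NULL means success */
}

/* release_fn (hypothetical name): called once per output tensor after the
   response has been deleted. */
static TRITONSERVER_Error *CpuRelease(
    TRITONSERVER_ResponseAllocator *allocator, void *buffer,
    void *buffer_userp, size_t byte_size,
    TRITONSERVER_MemoryType memory_type, int64_t memory_type_id)
{
  (void)allocator; (void)buffer_userp;
  (void)memory_type; (void)memory_type_id;
  assert(buffer != NULL || byte_size == 0); /* only empty tensors lack a buffer */
  free(buffer); /* free(NULL) is a no-op */
  return NULL;
}

/* With the real header, the pair would be registered like this; start_fn is
   NULL because one-to-one models typically do not need it:

     TRITONSERVER_ResponseAllocator *allocator = NULL;
     TRITONSERVER_Error *err = TRITONSERVER_ResponseAllocatorNew(
         &allocator, CpuAlloc, CpuRelease, NULL);
*/
```

Reporting the real placement through actual_memory_type lets the allocator decline the preferred type without failing the request.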
For models that generate multiple responses for each inference request (TRITONSERVER_TXN_DECOUPLED), the start_fn callback can be used to determine sets of alloc_fn callbacks that belong to the same response:
TRITONSERVER_ServerInferAsync called
 - start_fn
 - alloc_fn : called once for each output tensor in response
 - start_fn
 - alloc_fn : called once for each output tensor in response
 - …
For each response, TRITONSERVER_InferenceResponseDelete called
 - release_fn: called once for each output tensor in the response
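For decoupled models, a per-request state object can count start_fn calls so that each alloc_fn call knows which response it belongs to. This is a minimal sketch, assuming the same per-request pointer (the response_allocator_userp set with TRITONSERVER_InferenceRequestSetResponseCallback) reaches both start_fn and alloc_fn. RequestAllocState, OnResponseStart, GroupedAlloc and MAX_RESPONSES are hypothetical, and the Triton types are stand-ins written from memory of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring tritonserver.h; replace with the real header. */
typedef struct TRITONSERVER_Error TRITONSERVER_Error;
typedef struct TRITONSERVER_ResponseAllocator TRITONSERVER_ResponseAllocator;
typedef enum {
  TRITONSERVER_MEMORY_CPU,
  TRITONSERVER_MEMORY_CPU_PINNED,
  TRITONSERVER_MEMORY_GPU
} TRITONSERVER_MemoryType;

#define MAX_RESPONSES 8 /* hypothetical cap that keeps the sketch simple */

/* Hypothetical per-request state, passed as response_allocator_userp. */
typedef struct {
  int current_response;                   /* -1 until the first start_fn */
  int tensors_in_response[MAX_RESPONSES]; /* alloc_fn calls per response */
} RequestAllocState;

/* start_fn: the alloc_fn calls that follow belong to a new response. */
static TRITONSERVER_Error *OnResponseStart(
    TRITONSERVER_ResponseAllocator *allocator, void *userp)
{
  (void)allocator;
  RequestAllocState *state = (RequestAllocState *)userp;
  state->current_response++;
  return NULL;
}

/* alloc_fn: attributes each output tensor to the current response. */
static TRITONSERVER_Error *GroupedAlloc(
    TRITONSERVER_ResponseAllocator *allocator, const char *tensor_name,
    size_t byte_size, TRITONSERVER_MemoryType preferred_memory_type,
    int64_t preferred_memory_type_id, void *userp, void **buffer,
    void **buffer_userp, TRITONSERVER_MemoryType *actual_memory_type,
    int64_t *actual_memory_type_id)
{
  (void)allocator; (void)tensor_name;
  (void)preferred_memory_type; (void)preferred_memory_type_id;
  RequestAllocState *state = (RequestAllocState *)userp;
  /* Real code should return an error here instead of asserting. */
  assert(state->current_response >= 0 &&
         state->current_response < MAX_RESPONSES);
  state->tensors_in_response[state->current_response]++;
  *actual_memory_type = TRITONSERVER_MEMORY_CPU;
  *actual_memory_type_id = 0;
  *buffer = (byte_size == 0) ? NULL : malloc(byte_size);
  *buffer_userp = NULL;
  return NULL;
}
```

A matching release_fn only needs to free each buffer; OnResponseStart would be passed as the start_fn argument of TRITONSERVER_ResponseAllocatorNew.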
In all cases, the start_fn, alloc_fn and release_fn callback functions must be thread-safe. Making them thread-safe typically does not require explicit locking. The recommended approach is for each inference request to pass, via TRITONSERVER_InferenceRequestSetResponseCallback, a ‘response_allocator_userp’ object that is unique to that request; the callback functions then operate only on this per-request state. Locking is required only when a callback needs to access state that is shared across inference requests (for example, a common allocation pool).
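The locking rule can be illustrated by separating per-request state, which needs no lock under the recommended pattern, from a pool shared across requests, which does. SharedPool, RequestState, PooledAlloc and PooledRelease are hypothetical names; the "pool" here only tracks bytes in use (a real pool would recycle buffers), and the Triton types are stand-ins written from memory of the header. Because release_fn receives buffer_userp but not the request's userp, the sketch carries the pool pointer through buffer_userp.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins mirroring tritonserver.h; replace with the real header. */
typedef struct TRITONSERVER_Error TRITONSERVER_Error;
typedef struct TRITONSERVER_ResponseAllocator TRITONSERVER_ResponseAllocator;
typedef enum {
  TRITONSERVER_MEMORY_CPU,
  TRITONSERVER_MEMORY_CPU_PINNED,
  TRITONSERVER_MEMORY_GPU
} TRITONSERVER_MemoryType;

/* Hypothetical state shared by all requests: total bytes outstanding. */
typedef struct {
  pthread_mutex_t mu;
  size_t bytes_in_use;
} SharedPool;

/* Hypothetical per-request state, passed as response_allocator_userp. */
typedef struct {
  SharedPool *pool;
  size_t bytes_allocated; /* touched only by this request's callbacks */
} RequestState;

static TRITONSERVER_Error *PooledAlloc(
    TRITONSERVER_ResponseAllocator *allocator, const char *tensor_name,
    size_t byte_size, TRITONSERVER_MemoryType preferred_memory_type,
    int64_t preferred_memory_type_id, void *userp, void **buffer,
    void **buffer_userp, TRITONSERVER_MemoryType *actual_memory_type,
    int64_t *actual_memory_type_id)
{
  (void)allocator; (void)tensor_name;
  (void)preferred_memory_type; (void)preferred_memory_type_id;
  RequestState *req = (RequestState *)userp;
  req->bytes_allocated += byte_size; /* unique to this request: no lock */

  pthread_mutex_lock(&req->pool->mu); /* shared across requests: lock */
  req->pool->bytes_in_use += byte_size;
  pthread_mutex_unlock(&req->pool->mu);

  *actual_memory_type = TRITONSERVER_MEMORY_CPU;
  *actual_memory_type_id = 0;
  *buffer = (byte_size == 0) ? NULL : malloc(byte_size);
  *buffer_userp = req->pool; /* release_fn finds the pool through this */
  return NULL;
}

static TRITONSERVER_Error *PooledRelease(
    TRITONSERVER_ResponseAllocator *allocator, void *buffer,
    void *buffer_userp, size_t byte_size,
    TRITONSERVER_MemoryType memory_type, int64_t memory_type_id)
{
  (void)allocator; (void)memory_type; (void)memory_type_id;
  SharedPool *pool = (SharedPool *)buffer_userp;
  pthread_mutex_lock(&pool->mu);
  assert(pool->bytes_in_use >= byte_size);
  pool->bytes_in_use -= byte_size;
  pthread_mutex_unlock(&pool->mu);
  free(buffer);
  return NULL;
}
```

Only the shared counter is guarded; the per-request field is updated without a lock, which is the split the recommendation above relies on.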
- Return
a TRITONSERVER_Error indicating success or failure; a nullptr return indicates success.
- Parameters
allocator
: Returns the new response allocator object.
alloc_fn
: The function to call to allocate buffers for result tensors.
release_fn
: The function to call when the server no longer holds a reference to an allocated buffer.
start_fn
: The function to call to indicate that the subsequent ‘alloc_fn’ calls are for a new response. This callback is optional (use nullptr to indicate that it should not be invoked).