Class CudaObjectHandler
Defined in File gxf_cuda.hpp
-
class CudaObjectHandler
This class handles usage of CUDA streams for operators.
When using CUDA operations, the default stream ‘0’ synchronizes with all other streams in the same context (see https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior). This implicit synchronization can reduce performance. The CudaObjectHandler class manages CUDA streams and events across operators and makes sure that CUDA operations are properly chained.
Usage:
This class is automatically added as an internal data member of each operator’s ExecutionContext. It is automatically configured by a call to ExecutionContext::init_cuda_object_handler(op) made from GXFWrapper::start(), just before Operator::start is called.
A stream pool for use by CudaObjectHandler can be added to the operator either by explicitly adding a parameter with type std::shared_ptr<CudaStreamPool> and name cuda_stream_pool, or by passing an Arg<std::shared_ptr<CudaStreamPool>> to Fragment::make_operator when creating the operator. It is not required to provide a stream pool, but allocation of an internal stream or allocation of additional streams via allocate_cuda_stream is only possible if a stream pool is present.
This class is not intended for direct use by application authors, but instead to support the public methods available on InputContext, OutputContext and ExecutionContext, as described below.
When the InputContext::receive method is called for a given port, the operator’s CudaObjectHandler will update its internal mapping of the streams available on the input ports.
When InputContext::receive_cuda_stream is called, any received streams found by the prior receive call for the specified port will be synchronized to the operator’s internal stream, and that internal stream will then be returned as a standard CUDA Runtime API cudaStream_t. If no CudaStreamPool was configured, it will not be possible to create the internal stream, so in that case the first CUDA stream found on the input will be returned and any remaining streams on the input are synchronized to it. If there are no streams on the input port and there is no internal CudaStreamPool, then cudaStreamDefault is returned. When a non-default stream is returned, this method calls cudaSetDevice to set the active device to match the stream that is returned. In that case it also automatically configures the output ports of the operator to emit that stream, so manually calling OutputContext::set_cuda_stream is not necessary when using this method.
The InputContext::receive_cuda_streams method is intended for advanced use cases where the user wants to handle all streams found and their synchronization manually. It just returns a std::vector<std::optional<cudaStream_t>> whose size is equal to the number of messages found on the input port. Any messages without a stream will have a std::nullopt entry in the vector.
The ExecutionContext::allocate_cuda_stream method can be used if it is necessary to allocate an additional stream for use by the operator. In most cases, this will not be necessary and the stream that is returned by InputContext::receive_cuda_stream can be used.
The ExecutionContext::device_from_stream method can be used to determine which CUDA device id a given cudaStream_t returned by InputContext::receive_cuda_stream or InputContext::receive_cuda_streams belongs to.
The OutputContext::set_cuda_stream method can be used to emit specific streams on specific output ports. Any non-default stream received by InputContext::receive_cuda_stream would already automatically be output, so this method is mainly useful if doing manual management of the streams received via InputContext::receive_cuda_streams or if additional internal streams were allocated via ExecutionContext::allocate_cuda_stream.
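The rules that InputContext::receive_cuda_stream follows when choosing which stream to return can be summarized in a small stand-alone model. This is only a sketch of the behavior documented above, not the SDK implementation: StreamId, kDefaultStream, StreamChoice and select_stream are hypothetical names, streams are modeled as plain integers, and no CUDA calls are made.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for cudaStream_t; 0 plays the role of cudaStreamDefault.
using StreamId = int;
constexpr StreamId kDefaultStream = 0;

// Hypothetical summary of what a receive_cuda_stream-style call decides.
struct StreamChoice {
  StreamId returned;             // stream handed back to the operator
  std::vector<StreamId> synced;  // input streams synchronized to `returned`
  bool emitted_on_outputs;       // true only when a non-default stream is returned
};

// Model of the documented rules:
//  1. internal stream available (a stream pool was configured):
//     sync every received stream to it and return it
//  2. no internal stream, but streams on the input:
//     return the first one and sync the remaining ones to it
//  3. neither: return the default stream
StreamChoice select_stream(const std::vector<std::optional<StreamId>>& received,
                           std::optional<StreamId> internal_stream) {
  // Keep only the messages that actually carried a stream.
  std::vector<StreamId> found;
  for (const auto& s : received) {
    if (s.has_value()) { found.push_back(*s); }
  }

  if (internal_stream.has_value()) {
    assert(*internal_stream != kDefaultStream);
    return {*internal_stream, found, true};
  }
  if (!found.empty()) {
    std::vector<StreamId> rest(found.begin() + 1, found.end());
    return {found.front(), rest, true};
  }
  return {kDefaultStream, {}, false};
}
```

In this model, calling select_stream with received streams {7, 9} and no internal stream returns stream 7 with 9 synchronized to it, while an empty input without an internal stream falls back to the default stream.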
Public Functions
-
~CudaObjectHandler()
Destroy the CudaObjectHandler object.
-
void init_from_operator(Operator *op)
Use a CudaStreamPool from the specified Operator if one is present.
- Parameters
op – The operator this instance of CudaObjectHandler is attached to. This operator must have already been initialized.
-
gxf_result_t add_stream(const CudaStreamHandle &stream_handle, const std::string &output_port_name)
Add stream to output port (must be called before any emit call using that port)
- Parameters
stream_handle – The stream to add
output_port_name – The name of the output port
- Returns
gxf_result_t
-
gxf_result_t add_stream(const cudaStream_t stream, const std::string &output_port_name)
Add stream to output port (must be called before any emit call using that port)
- Parameters
stream – The stream to add
output_port_name – The name of the output port
- Returns
gxf_result_t
-
expected<CudaStreamHandle, RuntimeError> get_cuda_stream_handle(gxf_context_t context, const std::string &input_port_name, bool allocate = true, bool sync_to_default = false)
Get the CUDA stream handle which should be used for CUDA commands involving data from the specified input port.
For multi-receivers or input ports with queue size > 1, the first stream found is returned after any remaining streams are synchronized to it.
See get_cuda_stream_handles() instead to receive a vector of (optional) CUDA stream handles (one for each message).
If no message stream is set and the allocate flag is true, a stream will be allocated from the internal CudaStreamPool. An unexpected is returned only if this allocation fails.
- Parameters
context – The GXF context of the operator.
input_port_name – The name of the input port from which to retrieve the stream.
allocate – If true, allocate a new stream via a cuda_stream_pool parameter if no stream is found.
sync_to_default – If true, synchronize any streams to the default stream. If false, synchronization is done to the internal stream instead.
- Returns
CudaStreamHandle
-
expected<std::vector<std::optional<CudaStreamHandle>>, RuntimeError> get_cuda_stream_handles(gxf_context_t context, const std::string &input_port_name)
Get the CUDA stream handles which should be used for CUDA commands involving data from the specified input port.
The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a std::nullopt in the vector.
- Parameters
context – The GXF context of the operator.
input_port_name – The name of the input port from which to retrieve the stream.
- Returns
vector<std::optional<CudaStreamHandle>>
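Because the returned vector has one optional entry per message, a caller that manages streams manually has to decide what to do with the empty entries. The following is a minimal sketch of one possible policy (fall back to the default stream), written against stand-in types so that it compiles without the SDK or CUDA: StreamId, kDefaultStream, streams_or_default and count_without_stream are hypothetical names, not part of the documented API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a stream handle; 0 plays the role of the default stream.
using StreamId = int;
constexpr StreamId kDefaultStream = 0;

// Produce one stream per message: the message's own stream if it carried one,
// otherwise the default stream.
std::vector<StreamId> streams_or_default(
    const std::vector<std::optional<StreamId>>& per_message) {
  std::vector<StreamId> resolved;
  resolved.reserve(per_message.size());
  for (const auto& entry : per_message) {
    resolved.push_back(entry.value_or(kDefaultStream));
  }
  assert(resolved.size() == per_message.size());
  return resolved;
}

// Count the messages that arrived without a stream (std::nullopt entries).
std::size_t count_without_stream(
    const std::vector<std::optional<StreamId>>& per_message) {
  std::size_t missing = 0;
  for (const auto& entry : per_message) {
    if (!entry.has_value()) { ++missing; }
  }
  return missing;
}
```

Keeping the result the same length as the input preserves the one-entry-per-message correspondence, so index i still refers to message i.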
-
cudaStream_t get_cuda_stream(gxf_context_t context, const std::string &input_port_name, bool allocate = false, bool sync_to_default = true)
Get the CUDA stream which should be used for CUDA commands involving data from the specified input port.
For multi-receivers or input ports with queue size > 1, see get_cuda_streams() instead to receive a vector of CUDA streams (one for each message).
If no message stream is set and no stream can be allocated from the internal CudaStreamPool, returns cudaStreamDefault.
- Parameters
context – The GXF context of the operator.
input_port_name – The name of the input port from which to retrieve the stream
allocate – If true, allocate a new stream via a cuda_stream_pool parameter if none is found on the input port. Otherwise, cudaStreamDefault will be returned.
sync_to_default – If true, synchronize any streams to the default stream. If false, synchronization is done to the first stream found on the port instead.
- Returns
cudaStream_t
-
std::vector<std::optional<cudaStream_t>> get_cuda_streams(gxf_context_t context, const std::string &input_port_name)
Get the CUDA streams which should be used for CUDA commands involving data from the specified input port.
The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a std::nullopt in the vector.
- Parameters
context – The GXF context of the operator.
input_port_name – The name of the input port from which to retrieve the stream
- Returns
vector<std::optional<cudaStream_t>>
-
gxf_result_t synchronize_streams(std::vector<std::optional<CudaStreamHandle>> stream_handles, CudaStreamHandle target_stream_handle, bool sync_to_default_stream = true)
Sync all streams in stream_handles with target_stream_handle.
Any streams in stream_handles that are not valid will be ignored.
- Parameters
stream_handles – The vector of streams to sync.
target_stream_handle – The stream to sync to.
sync_to_default_stream – If true, also synchronize the target stream to the default stream
- Returns
gxf_result_t GXF_SUCCESS if all streams were successfully synced.
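The bookkeeping implied by this description (skip invalid entries, sync the rest to the target, optionally add the default stream) can be sketched as a planning function. This is a model under stated assumptions, not the SDK implementation: it performs no CUDA calls, StreamId, kDefaultStream, SyncStep and plan_sync are hypothetical names, and skipping an entry equal to the target is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a stream; 0 plays the role of the default stream.
using StreamId = int;
constexpr StreamId kDefaultStream = 0;

// One planned step: `source` is synchronized to `target`.
struct SyncStep {
  StreamId source;
  StreamId target;
};

// List the synchronization steps the description implies.
std::vector<SyncStep> plan_sync(const std::vector<std::optional<StreamId>>& streams,
                                StreamId target,
                                bool sync_to_default_stream = true) {
  std::vector<SyncStep> steps;
  for (const auto& s : streams) {
    if (!s.has_value()) { continue; }  // invalid entries are ignored
    if (*s == target) { continue; }    // assumption: nothing to do for the target itself
    steps.push_back({*s, target});
  }
  // Optionally also synchronize the target stream to the default stream.
  if (sync_to_default_stream && target != kDefaultStream) {
    steps.push_back({target, kDefaultStream});
  }
  assert(steps.size() <= streams.size() + 1);
  return steps;
}
```

With streams {4, nullopt, 6} and target 2, the plan contains the steps 4 to 2, 6 to 2 and, when sync_to_default_stream is true, 2 to the default stream.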
-
gxf_result_t synchronize_streams(std::vector<cudaStream_t> cuda_streams, cudaStream_t target_stream, bool sync_to_default_stream = true)
Sync all streams in cuda_streams with target_stream.
Any streams in cuda_streams that are not valid will be ignored.
- Parameters
cuda_streams – The vector of streams to sync.
target_stream – The stream to sync to.
sync_to_default_stream – If true, also synchronize the target stream to the default stream
- Returns
gxf_result_t GXF_SUCCESS if all streams were successfully synced.
-
cudaStream_t stream_from_stream_handle(CudaStreamHandle stream_handle)
Get the cudaStream_t value corresponding to a CudaStreamHandle.
- Parameters
stream_handle – The CudaStreamHandle
- Returns
The CUDA stream contained within the CudaStream object
-
expected<CudaStreamHandle, RuntimeError> stream_handle_from_stream(cudaStream_t stream)
Get the CudaStreamHandle corresponding to a cudaStream_t.
- Parameters
stream – The CUDA stream
- Returns
GXF Handle to the CudaStream object if found, otherwise an unexpected is returned.
-
expected<gxf_uid_t, ErrorCode> get_output_stream_cid(const std::string &output_port_name)
Get the GXF component ID for any stream to be emitted on the specified output port.
- Parameters
output_port_name – The name of the output port
- Returns
expected<gxf_uid_t>
-
gxf_result_t streams_from_message(gxf_context_t context, const nvidia::gxf::Entity &message, const std::string &input_name)
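Find any CUDA streams contained in the given message and record them for the specified input port.
- Parameters
context – The GXF context of the operator.
message – The message entity to inspect for streams.
input_name – The name of the input port the message was received on.
- Returns
gxf_result_t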
-
expected<CudaStreamHandle, RuntimeError> allocate_internal_stream(gxf_context_t context, const std::string &stream_name)
Allocate an internal CUDA stream and store it in the mapping for the given input port.
- Parameters
context – The GXF context
stream_name – The name of the input port
- Returns
GXF Handle to the allocated CudaStream component
-
gxf_result_t release_internal_streams(gxf_context_t context)
Release all internally allocated CUDA streams.
-
void clear_received_streams()
Retain the existing unordered_maps and vectors of received streams, but clear the contents.
This is used to refresh the state of the received streams before each Operator::compute call.
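The idea of keeping the containers while emptying them can be illustrated with standard containers. This is a minimal sketch, assuming the received streams are held in a map from port name to a vector of streams; ReceivedStreams, StreamId and clear_contents are hypothetical names and the class's actual members are not shown in this reference.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a stream id and a per-port list of received streams.
using StreamId = int;
using ReceivedStreams = std::unordered_map<std::string, std::vector<StreamId>>;

// Empty every per-port vector but keep the map entries themselves.
// std::vector::clear() leaves the capacity unchanged, so the next compute call
// can refill the vectors without reallocating or re-inserting the port keys.
void clear_contents(ReceivedStreams& received) {
  for (auto& entry : received) {
    entry.second.clear();
  }
}
```

Calling received.clear() on the map instead would also drop the port keys and free the vectors, which is the work this approach avoids repeating on every compute call.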