holoscan::gxf::CudaObjectHandler
This class handles usage of CUDA streams for operators.
When using CUDA operations the default stream ‘0’ synchronizes with all other streams in the same context, see https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior. This can reduce performance. The CudaObjectHandler class manages CUDA streams and events across operators and makes sure that CUDA operations are properly chained.
Usage:
- Each operator’s CudaObjectHandler is tied to its ExecutionContext. It is configured automatically by ExecutionContext::init_cuda_object_handler(op), which is called by GXFWrapper::start() just before Operator::start is called.
- A stream pool for the CudaObjectHandler can be added to the operator either by explicitly adding a parameter with type std::shared_ptr<CudaStreamPool> and name cuda_stream_pool, or by passing an Arg<std::shared_ptr<CudaStreamPool>> to Fragment::make_operator when creating the operator. Providing a stream pool is not required, but allocation of an internal stream, or of additional streams via allocate_cuda_stream, is only possible if a stream pool is present.
- Operators use the handler through the methods of InputContext, OutputContext and ExecutionContext as described below.
- When the InputContext::receive method is called for a given port, the operator’s CudaObjectHandler class updates its internal mapping of the streams available on the input ports.
- When InputContext::receive_cuda_stream is called, any received streams found by the prior receive call for the specified port are synchronized to the operator’s internal stream, and that internal stream is then returned as a standard CUDA Runtime API cudaStream_t. If no CudaStreamPool was configured, the internal stream cannot be created; in that case, the first CUDA stream found on the input is returned and any remaining streams on the input are synchronized to it. If there are no streams on the input port and there is no internal CudaStreamPool, then cudaStreamDefault is returned. When a non-default stream is returned, this method calls cudaSetDevice to set the active device to match the stream that is returned. It also automatically configures the output ports of the operator to emit that stream, so manually calling OutputContext::set_cuda_stream is not necessary when using this method.
- The InputContext::receive_cuda_streams method is intended for advanced use cases where the user wants to handle all streams found, and their synchronization, manually. It returns a vector<std::optional<cudaStream_t>> whose size is equal to the number of messages found on the input port. Any message without a stream has a std::nullopt entry in the vector.
- The ExecutionContext::allocate_cuda_stream method can be used if it is necessary to allocate an additional stream for use by the operator. In most cases this is not necessary, and the stream returned by InputContext::receive_cuda_stream can be used.
- The ExecutionContext::device_from_stream method can be used to determine which CUDA device id a given cudaStream_t returned by InputContext::receive_cuda_stream or InputContext::receive_cuda_streams belongs to.
- The OutputContext::set_cuda_stream method can be used to emit specific streams on specific output ports. Any non-default stream received by InputContext::receive_cuda_stream is already output automatically, so this method is mainly useful when managing the streams received via InputContext::receive_cuda_streams manually, or when additional internal streams were allocated via ExecutionContext::allocate_cuda_stream.

Inherits from: holoscan::CudaObjectHandler (public)
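The fallback order that the usage notes describe for InputContext::receive_cuda_stream can be summarized with a small stand-alone model. This is only a sketch of the documented rules, not Holoscan code: StreamId, kDefaultStream, kInternalStream, Selection and select_stream are hypothetical names, and plain integers stand in for cudaStream_t so that the example builds without CUDA or Holoscan.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins (assumptions, not Holoscan or CUDA types).
using StreamId = int;
constexpr StreamId kDefaultStream = 0;     // plays the role of cudaStreamDefault
constexpr StreamId kInternalStream = 100;  // plays the role of the operator's internal stream

struct Selection {
  StreamId stream;               // stream handed back to the caller
  std::vector<StreamId> synced;  // received streams synchronized to `stream`
};

// Models the documented selection rules for one input port.
// `received` has one entry per message; std::nullopt means "no stream attached".
Selection select_stream(const std::vector<std::optional<StreamId>>& received,
                        bool has_stream_pool) {
  std::vector<StreamId> found;
  for (const auto& s : received) {
    if (s.has_value()) { found.push_back(*s); }
  }

  // Rule 1: with a stream pool, every received stream is synchronized to the
  // internal stream, and the internal stream is returned.
  if (has_stream_pool) { return {kInternalStream, found}; }

  // Rule 2: without a pool, the first received stream is returned and the
  // remaining ones are synchronized to it.
  if (!found.empty()) {
    const StreamId first = found.front();
    found.erase(found.begin());
    return {first, found};
  }

  // Rule 3: no pool and no received streams, so fall back to the default stream.
  return {kDefaultStream, {}};
}
```

For instance, under this model `select_stream({7, std::nullopt, 9}, false)` selects stream 7 and lists stream 9 as synchronized to it, while the same input with a stream pool selects the internal stream.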
Destroy the CudaObjectHandler object.
Use a CudaStreamPool from the specified Operator if one is present.
Parameters
The operator this instance of CudaObjectHandler is attached to. This operator must have already been initialized.
Check if GPU capability is present on the system.
Returns: true if GPU(s) are available, false if no GPU is present
Add stream to output port (must be called before any emit call using that port).
Returns: gxf_result_t
Parameters
The stream to add
The name of the output port
Get the CUDA stream handle which should be used for CUDA commands involving data from the specified input port.
For multi-receivers or input ports with queue size > 1, the first stream found is returned after any remaining streams are synchronized to it.
See get_cuda_stream_handles() instead to receive a vector of (optional) CUDA stream handles (one for each message).
If no message stream is set and the allocate flag is true, a stream will be allocated from the internal CudaStreamPool. An unexpected is returned only if this allocation fails.
Returns: CudaStreamHandle
Parameters
The GXF context of the operator.
The name of the input port from which to retrieve the stream.
If true, allocate a new stream via a cuda_stream_pool parameter if no stream is found.
If true, synchronize any streams to the default stream. If false, synchronization is done to the internal stream instead.
Get the CUDA stream handles which should be used for CUDA commands involving data from the specified input port.
The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a std::nullopt in the vector.
Returns: vector<std::optional<CudaStreamHandle>>
Parameters
The GXF context of the operator.
The name of the input port from which to retrieve the stream.
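Because the returned vector has one entry per message, callers need to check each entry before using it. The following is a minimal sketch of that pattern in plain C++; Handle is a hypothetical integer stand-in for CudaStreamHandle, and messages_without_stream is an illustrative helper, not part of the Holoscan API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for CudaStreamHandle (assumption for illustration).
using Handle = int;

// Returns the indices of the messages that arrived without a stream,
// i.e. the positions holding std::nullopt in the per-message vector.
std::vector<std::size_t> messages_without_stream(
    const std::vector<std::optional<Handle>>& handles) {
  std::vector<std::size_t> missing;
  for (std::size_t i = 0; i < handles.size(); ++i) {
    if (!handles[i].has_value()) { missing.push_back(i); }
  }
  return missing;
}
```

With four messages where only the first and third carry a stream, this helper reports indices 1 and 3, which an operator could then treat as work to run on a stream of its own choosing.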
Get the CUDA stream which should be used for CUDA commands involving data from the specified input port.
For multi-receivers or input ports with queue size > 1, see get_cuda_streams() instead to receive a vector of CUDA streams (one for each message).
If no message stream is set and no stream can be allocated from the internal CudaStreamPool, returns cudaStreamDefault.
Returns: cudaStream_t
Parameters
The GXF context of the operator.
The name of the input port from which to retrieve the stream
If true, allocate a new stream via a cuda_stream_pool parameter if none is found on the input port. Otherwise, cudaStreamDefault will be returned.
If true, synchronize any streams to the default stream. If false, synchronization is done to the first stream found on the port instead.
Get the CUDA streams which should be used for CUDA commands involving data from the specified input port.
The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a cudaStreamDefault in the vector.
Returns: vector<std::optional<cudaStream_t>>
Parameters
The GXF context of the operator.
The name of the input port from which to retrieve the stream
Sync all streams in stream_handles with target_stream_handle.
Any streams in stream_handles that are not valid will be ignored.
Returns: gxf_result_t GXF_SUCCESS if all streams were successfully synced.
Parameters
The vector of streams to sync.
The stream to sync to.
If true, also synchronize the target stream to the default stream
Get the cudaStream_t value corresponding to a CudaStreamHandle.
Returns: The CUDA stream contained within the CudaStream object
Parameters
The CudaStreamHandle
Get the CudaStreamHandle corresponding to a cudaStream_t.
Returns: GXF Handle to the CudaStream object if found, otherwise an unexpected is returned.
Parameters
The CUDA stream
Get the GXF component ID for any stream to be emitted on the specified output port.
Returns: expected<gxf_uid_t>
Parameters
The name of the output port
Get the GXF component IDs for any events to be emitted on the specified output port.
Returns: expected<std::vector<gxf_uid_t>>
Parameters
The GXF context
The GXF message entity
The name of the input port
Allocate an internal CUDA stream and store it in the mapping for the given input port.
Returns: GXF Handle to the allocated CudaStream component
Parameters
The GXF context
The name of the stream
Release all internally allocated CUDA streams.
Retain the existing unordered_maps and vectors of received streams, but clear the contents.
This is used to refresh the state of the received streams before each Operator::compute call.
Allocate a new stream from the internal stream pool.