
Copyright © 2026, NVIDIA Corporation.


holoscan::gxf::CudaObjectHandler

Beta

This class handles usage of CUDA streams for operators.

When using CUDA operations, the default stream ‘0’ synchronizes with all other streams in the same context (see https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior), which can reduce performance. The CudaObjectHandler class manages CUDA streams and events across operators and makes sure that CUDA operations are properly chained.

Usage:

  • This class is automatically added as an internal data member of each operator’s ExecutionContext. It is configured automatically via ExecutionContext::init_cuda_object_handler(op), which is called from GXFWrapper::start() just before Operator::start is called.
  • A stream pool for use by CudaObjectHandler can be added to the operator either by explicitly adding a parameter with type std::shared_ptr<CudaStreamPool> and name cuda_stream_pool or by passing an Arg<std::shared_ptr<CudaStreamPool>> to Fragment::make_operator when creating the operator. It is not required to provide a stream pool, but allocation of an internal stream or allocation of additional streams via allocate_cuda_stream is only possible if a stream pool is present.
  • This class is not intended for direct use by Application authors, but instead to support the public methods available on InputContext, OutputContext and ExecutionContext as described below.
  • When the InputContext::receive method is called for a given port, the operator’s CudaObjectHandler class will update its internal mapping of the streams available on the input ports.
  • When InputContext::receive_cuda_stream is called, any received streams found by the prior receive call for the specified port will be synchronized to the operator’s internal stream, and that internal stream will then be returned as a standard CUDA Runtime API cudaStream_t. If no CudaStreamPool was configured, the internal stream cannot be created, so in that case the first CUDA stream found on the input will be returned and any remaining streams on the input are synchronized to it. If there are no streams on the input port and there is no internal CudaStreamPool, then cudaStreamDefault is returned. When a non-default stream is returned, this method calls cudaSetDevice to set the active device to match the stream that is returned. It also automatically configures the output ports of the operator to emit that stream, so manually calling OutputContext::set_cuda_stream is not necessary when using this method.
  • The InputContext::receive_cuda_streams method is intended for advanced use cases where the user wants to handle all streams found and their synchronization manually. It just returns a std::vector<std::optional<cudaStream_t>> whose size is equal to the number of messages found on the input port. Any message without a stream will have a std::nullopt entry in the vector.
  • The ExecutionContext::allocate_cuda_stream method can be used if it is necessary to allocate an additional stream for use by the operator. In most cases, this will not be necessary and the stream that is returned by InputContext::receive_cuda_stream can be used.
  • The ExecutionContext::device_from_stream method can be used to determine which CUDA device id a given cudaStream_t returned by InputContext::receive_cuda_stream or InputContext::receive_cuda_streams belongs to.
  • The OutputContext::set_cuda_stream method can be used to emit specific streams on specific output ports. Any non-default stream received by InputContext::receive_cuda_stream would already automatically be output, so this method is mainly useful if doing manual management of the streams received via InputContext::receive_cuda_streams or if additional internal streams were allocated via ExecutionContext::allocate_cuda_stream.
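The three outcomes described above for InputContext::receive_cuda_stream can be summarized with a small standalone model. This is a sketch only and does not call Holoscan or the CUDA runtime: StreamId, kDefaultStream, Selection and select_stream are hypothetical names introduced for illustration, and the internal stream is assumed to be non-default whenever a stream pool is configured.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a CUDA stream; 0 plays the role of cudaStreamDefault.
using StreamId = int;
constexpr StreamId kDefaultStream = 0;

struct Selection {
  StreamId stream;               // stream handed back to the operator
  std::vector<StreamId> synced;  // received streams synchronized to `stream`
  bool emitted_on_outputs;       // true when a non-default stream is returned
};

// Models the three cases described for InputContext::receive_cuda_stream.
// `internal` holds a value only when a CudaStreamPool was configured.
inline Selection select_stream(const std::vector<std::optional<StreamId>>& received,
                               std::optional<StreamId> internal) {
  // Collect the streams actually present on the input port.
  std::vector<StreamId> found;
  for (const auto& s : received) {
    if (s.has_value()) { found.push_back(*s); }
  }
  if (internal.has_value()) {
    // Case 1: every received stream is synchronized to the internal stream.
    return {*internal, found, true};
  }
  if (!found.empty()) {
    // Case 2: no pool, so the first received stream is returned
    // and the remaining ones are synchronized to it.
    std::vector<StreamId> rest(found.begin() + 1, found.end());
    return {found.front(), rest, true};
  }
  // Case 3: no streams on the port and no pool.
  return {kDefaultStream, {}, false};
}
```

For example, under these assumptions select_stream({std::nullopt, 7, 9}, std::nullopt) selects stream 7 and reports stream 9 as synchronized to it.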
#include <holoscan/gxf/gxf_cuda.hpp>

Inherits from: holoscan::CudaObjectHandler (public)


Constructors

Destructor

~CudaObjectHandler

holoscan::gxf::CudaObjectHandler::~CudaObjectHandler() override

Destroy the CudaObjectHandler object.


Methods

init_from_operator

void holoscan::gxf::CudaObjectHandler::init_from_operator(
Operator *op
) override

Use a CudaStreamPool from the specified Operator if one is present.

Parameters

op
Operator *

: The operator this instance of CudaObjectHandler is attached to. This operator must have already been initialized.

is_gpu_available

bool holoscan::gxf::CudaObjectHandler::is_gpu_available() const override

Check if GPU capability is present on the system.

Returns: true if GPU(s) are available, false if no GPU is present

add_stream

Overload 1 of 2
gxf_result_t holoscan::gxf::CudaObjectHandler::add_stream(
const CudaStreamHandle &stream_handle,
const std::string &output_port_name
)

Add stream to output port (must be called before any emit call using that port).

Returns: gxf_result_t

Parameters

stream_handle
const CudaStreamHandle &

The stream to add

output_port_name
const std::string &

The name of the output port

get_cuda_stream_handle

expected<CudaStreamHandle, RuntimeError> holoscan::gxf::CudaObjectHandler::get_cuda_stream_handle(
gxf_context_t context,
const std::string &input_port_name,
bool allocate = true,
bool sync_to_default = false
)

Get the CUDA stream handle which should be used for CUDA commands involving data from the specified input port.

For multi-receivers or input ports with queue size > 1, the first stream found is returned after any remaining streams are synchronized to it.

See get_cuda_stream_handles() instead to receive a vector of (optional) CUDA stream handles (one for each message).

If no message stream is set and the allocate flag is true, a stream will be allocated from the internal CudaStreamPool. An unexpected is returned only if this allocation fails.

Returns: CudaStreamHandle

Parameters

context
gxf_context_t

The GXF context of the operator.

input_port_name
const std::string &

The name of the input port from which to retrieve the stream.

allocate
bool (defaults to true)

If true, allocate a new stream via a cuda_stream_pool parameter if no stream is found.

sync_to_default
bool (defaults to false)

If true, synchronize any streams to the default stream. If false, synchronization is done to the internal stream instead.

get_cuda_stream_handles

expected<std::vector<std::optional<CudaStreamHandle>>, RuntimeError> holoscan::gxf::CudaObjectHandler::get_cuda_stream_handles(
gxf_context_t context,
const std::string &input_port_name
)

Get the CUDA stream handles which should be used for CUDA commands involving data from the specified input port.

The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a std::nullopt in the vector.

Returns: vector<std::optional<CudaStreamHandle>>

Parameters

context
gxf_context_t

The GXF context of the operator.

input_port_name
const std::string &

The name of the input port from which to retrieve the stream.
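Because the returned vector has one entry per message, a caller typically has to decide what to do with the std::nullopt entries. The following standalone sketch shows one way to do that by substituting a fallback stream. It does not use Holoscan types: StreamId, PerMessageStreams and resolve_per_message are hypothetical names, and StreamId merely stands in for CudaStreamHandle or cudaStream_t.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a stream handle.
using StreamId = int;

struct PerMessageStreams {
  std::vector<StreamId> resolved;            // exactly one entry per message
  std::vector<std::size_t> missing_indices;  // messages whose entry was std::nullopt
};

// Replace every std::nullopt entry with `fallback` and remember which
// messages arrived without a stream.
inline PerMessageStreams resolve_per_message(
    const std::vector<std::optional<StreamId>>& streams, StreamId fallback) {
  PerMessageStreams out;
  out.resolved.reserve(streams.size());
  for (std::size_t i = 0; i < streams.size(); ++i) {
    if (streams[i].has_value()) {
      out.resolved.push_back(*streams[i]);
    } else {
      out.resolved.push_back(fallback);
      out.missing_indices.push_back(i);
    }
  }
  return out;
}
```

Keeping the result the same length as the input preserves the one-to-one correspondence between messages and streams.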

get_cuda_stream

cudaStream_t holoscan::gxf::CudaObjectHandler::get_cuda_stream(
void *context,
const std::string &input_port_name,
bool allocate = false,
bool sync_to_default = true
) override

Get the CUDA stream which should be used for CUDA commands involving data from the specified input port.

For multi-receivers or input ports with queue size > 1, see get_cuda_streams() instead to receive a vector of CUDA streams (one for each message).

If no message stream is set and no stream can be allocated from the internal CudaStreamPool, returns cudaStreamDefault.

Returns: cudaStream_t

Parameters

context
void *

The GXF context of the operator.

input_port_name
const std::string &

The name of the input port from which to retrieve the stream

allocate
bool (defaults to false)

If true, allocate a new stream via a cuda_stream_pool parameter if none is found on the input port. Otherwise, cudaStreamDefault will be returned.

sync_to_default
bool (defaults to true)

If true, synchronize any streams to the default stream. If false, synchronization is done to the first stream found on the port instead.

get_cuda_streams

std::vector<std::optional<cudaStream_t>> holoscan::gxf::CudaObjectHandler::get_cuda_streams(
void *context,
const std::string &input_port_name
) override

Get the CUDA streams which should be used for CUDA commands involving data from the specified input port.

The size of the vector returned will be equal to the number of messages received on the input port. Any messages which did not contain a stream will result in a cudaStreamDefault in the vector.

Returns: vector<std::optional<cudaStream_t>>

Parameters

context
void *

The GXF context of the operator.

input_port_name
const std::string &

The name of the input port from which to retrieve the stream

synchronize_streams

Overload 1 of 2
gxf_result_t holoscan::gxf::CudaObjectHandler::synchronize_streams(
std::vector<std::optional<CudaStreamHandle>> stream_handles,
CudaStreamHandle target_stream_handle,
bool sync_to_default_stream = true
)

Sync all streams in stream_handles with target_stream_handle.

Any streams in stream_handles that are not valid will be ignored.

Returns: gxf_result_t GXF_SUCCESS if all streams were successfully synced.

Parameters

stream_handles
std::vector<std::optional<CudaStreamHandle>>

The vector of streams to sync.

target_stream_handle
CudaStreamHandle

The stream to sync to.

sync_to_default_stream
bool (defaults to true)

If true, also synchronize the target stream to the default stream.
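The rule that invalid entries are ignored while valid streams are chained to the target can be illustrated with a standalone model that logs the steps instead of touching the GPU. This is a sketch under stated assumptions, not the library's implementation: StreamId, SyncLog and sync_streams are hypothetical names, skipping an entry equal to the target is an assumption, and the two logged steps only indicate where CUDA runtime calls such as cudaEventRecord and cudaStreamWaitEvent would plausibly be used.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream handle.
using StreamId = int;

// Records the order of operations instead of calling the CUDA runtime.
struct SyncLog {
  std::vector<std::string> ops;
};

// Chain every valid stream in `streams` to `target`.
// Returns the number of streams that were chained.
inline int sync_streams(const std::vector<std::optional<StreamId>>& streams,
                        StreamId target, SyncLog& log) {
  int synced = 0;
  for (const auto& s : streams) {
    // Entries without a stream are ignored. Skipping the target itself
    // is an assumption made for this sketch.
    if (!s.has_value() || *s == target) { continue; }
    // Step 1: mark the point reached by the source stream.
    log.ops.push_back("record event on stream " + std::to_string(*s));
    // Step 2: make the target stream wait for that point.
    log.ops.push_back("stream " + std::to_string(target) + " waits on event");
    ++synced;
  }
  return synced;
}
```

Recording an event and waiting on it from the target stream orders the work on the device without blocking the host thread, which is the usual reason to prefer it over a host-side stream synchronization.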

stream_from_stream_handle

cudaStream_t holoscan::gxf::CudaObjectHandler::stream_from_stream_handle(
CudaStreamHandle stream_handle
)

Get the cudaStream_t value corresponding to a CudaStreamHandle.

Returns: The CUDA stream contained within the CudaStream object

Parameters

stream_handle
CudaStreamHandle

The CudaStreamHandle

stream_handle_from_stream

expected<CudaStreamHandle, RuntimeError> holoscan::gxf::CudaObjectHandler::stream_handle_from_stream(
cudaStream_t stream
)

Get the CudaStreamHandle corresponding to a cudaStream_t.

Returns: GXF Handle to the CudaStream object if found, otherwise an unexpected is returned.

Parameters

stream
cudaStream_t

The CUDA stream
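A reverse lookup of this kind, from a raw stream to its owning handle, can be sketched with a hash map keyed by the raw stream value. The example below is a standalone illustration and not the class's implementation: RawStream, Handle and StreamRegistry are hypothetical names, and std::optional is used here in place of the expected<..., RuntimeError> return type to signal a missing entry.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// cudaStream_t is an opaque pointer type; a const void* stands in for it here.
using RawStream = const void*;

// Hypothetical stand-in for CudaStreamHandle.
struct Handle {
  std::string name;
};

class StreamRegistry {
 public:
  // Remember which handle owns a given raw stream.
  void add(RawStream stream, Handle handle) { map_[stream] = std::move(handle); }

  // Return the handle for `stream`, or std::nullopt if it was never registered.
  std::optional<Handle> handle_for(RawStream stream) const {
    auto it = map_.find(stream);
    if (it == map_.end()) { return std::nullopt; }
    return it->second;
  }

 private:
  std::unordered_map<RawStream, Handle> map_;
};
```

A stream that was not registered through the registry, for example one created directly by the caller, yields an empty result rather than a default-constructed handle.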

get_output_stream_cid

expected<gxf_uid_t, ErrorCode> holoscan::gxf::CudaObjectHandler::get_output_stream_cid(
const std::string &output_port_name
)

Get the GXF component ID for any stream to be emitted on the specified output port.

Returns: expected<gxf_uid_t>

Parameters

output_port_name
const std::string &

The name of the output port

streams_from_message

gxf_result_t holoscan::gxf::CudaObjectHandler::streams_from_message(
gxf_context_t context,
const nvidia::gxf::Entity &message,
const std::string &input_name
)

Get the GXF component IDs for any events to be emitted on the specified output port.

Returns: gxf_result_t

Parameters

context
gxf_context_t

The GXF context

message
const nvidia::gxf::Entity &

The GXF message entity

input_name
const std::string &

The name of the input port

allocate_internal_stream

expected<CudaStreamHandle, RuntimeError> holoscan::gxf::CudaObjectHandler::allocate_internal_stream(
gxf_context_t context,
const std::string &stream_name
)

Allocate an internal CUDA stream and store it in the mapping for the given input port.

Returns: GXF Handle to the allocated CudaStream component

Parameters

context
gxf_context_t

The GXF context

stream_name
const std::string &

The name of the stream

release_internal_streams

int holoscan::gxf::CudaObjectHandler::release_internal_streams(
void *context
) override

Release all internally allocated CUDA streams.

clear_received_streams

void holoscan::gxf::CudaObjectHandler::clear_received_streams() override

Retain the existing unordered_maps and vectors of received streams, but clear the contents.

This is used to refresh the state of the received streams before each Operator::compute call.
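One reading of "retain the containers but clear the contents" is that the per-port entries stay in the map while each vector is emptied, so that no reallocation is needed on the next compute call. The standalone sketch below illustrates that reading only. ReceivedStreams and clear_contents are hypothetical names, and StreamId stands in for the stream types stored by the class.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a stream id or handle.
using StreamId = int;

// Mapping from input port name to the streams found in its messages.
using ReceivedStreams =
    std::unordered_map<std::string, std::vector<std::optional<StreamId>>>;

// Empty every per-port vector while keeping the map keys and the
// memory already allocated by each vector.
inline void clear_contents(ReceivedStreams& received) {
  for (auto& entry : received) {
    entry.second.clear();  // size becomes 0; capacity is left unchanged
  }
}
```

Calling clear() on the map itself would instead drop the port keys and release the vectors, so the next call would have to allocate them again.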

allocate_cuda_stream

expected<CudaStreamHandle, RuntimeError> holoscan::gxf::CudaObjectHandler::allocate_cuda_stream(
gxf_context_t context
)

Allocate a new stream from the internal stream pool.

from_messages

gxf_result_t holoscan::gxf::CudaObjectHandler::from_messages(
gxf_context_t context,
size_t message_count,
const nvidia::gxf::Entity *messages
)

cuda_stream_pool_handle

expected<nvidia::gxf::Handle<nvidia::gxf::CudaStreamPool>, RuntimeError> holoscan::gxf::CudaObjectHandler::cuda_stream_pool_handle(
gxf_context_t context
)

Member variables

| Name | Type | Description |
| --- | --- | --- |
| cuda_stream_pool_ | Parameter< std::shared_ptr< CudaStreamPool > > | CUDA stream pool used to allocate the internal CUDA stream. |
| cuda_green_context_ | Parameter< std::shared_ptr< CudaGreenContext > > | CUDA green context used to create the CUDA stream pool. |
| cuda_green_context_pool_ | Parameter< std::shared_ptr< CudaGreenContextPool > > | CUDA green context pool used to allocate the internal CUDA stream using green context partitions. |
| default_stream_warning_ | bool | If the CUDA stream pool is not set and we can’t use the incoming CUDA stream, issue a warning once. |
| cuda_event_ | cudaEvent_t | CUDA event used to synchronize the internal CUDA stream with multiple incoming streams. |
| event_created_ | bool | Flag to indicate if the internal CUDA event has been created yet. |
| received_cuda_stream_ids_ | std::unordered_map< std::string, std::vector< std::optional< CudaStreamId > > > | Mapping from input port name to any CUDA stream found in the incoming Message. |
| received_cuda_stream_handles_ | std::unordered_map< std::string, std::vector< std::optional< CudaStreamHandle > > > | |
| allocated_cuda_stream_handles_ | std::unordered_map< std::string, CudaStreamHandle > | Allocated internal CUDA stream handles. Mapping from input port name to an internally allocated CUDA stream. |
| emitted_cuda_stream_cids_ | std::unordered_map< std::string, gxf_uid_t > | Mapping from output port name to the GXF Component Id of any stream to be emitted on that port. |
| stream_to_stream_handle_ | std::unordered_map< cudaStream_t, CudaStreamHandle > | |
| gpu_present_ | bool | Flag to indicate if GPU is present on the system. |