Copyright © 2026, NVIDIA Corporation.


holoscan::GPUResidentExecutor

Beta

#include <holoscan/gpu_resident_executor.hpp>

Inherits from: holoscan::Executor (public)


Constructors

GPUResidentExecutor

From raw pointer
Deleted overload 1
holoscan::GPUResidentExecutor::GPUResidentExecutor(
Fragment *fragment
)

Construct a new GPUResidentExecutor object.

Parameters

fragment
Fragment *

The pointer to the fragment of the executor.

Destructor

~GPUResidentExecutor

holoscan::GPUResidentExecutor::~GPUResidentExecutor()

Methods

run

void holoscan::GPUResidentExecutor::run(
OperatorFlowGraph &graph
) override

Run the graph.

Parameters

graph
OperatorFlowGraph &

The reference to the graph.

run_async

std::future<void> holoscan::GPUResidentExecutor::run_async(
OperatorFlowGraph &graph
) override

Run the graph asynchronously.

Returns: The future object.

Parameters

graph
OperatorFlowGraph &

The reference to the graph.

context

Set the context
Get the context
virtual void holoscan::GPUResidentExecutor::context(
void *context
) override

Set the context.

Parameters

context
void *

The context.

initialize_fragment

bool holoscan::GPUResidentExecutor::initialize_fragment() override

Initialize the fragment_ in this Executor.

This method is called by run() to initialize the fragment and the graph of operators in the fragment before execution.

Returns: true if fragment initialization is successful. Otherwise, false.

initialize_operator

bool holoscan::GPUResidentExecutor::initialize_operator(
Operator *op
) override

Initialize the given operator.

This method is called by Operator::initialize() to initialize the operator.

Depending on the type of the operator, this method may be overridden to initialize the operator. For example, the default executor (GXFExecutor) initializes the operator using the GXF API and sets the operator’s ID to the ID of the GXF codelet.

Returns: true if the operator is initialized successfully. Otherwise, false.

Parameters

op
Operator *

The pointer to the operator.

initialize_scheduler

bool holoscan::GPUResidentExecutor::initialize_scheduler(
Scheduler *sch
) override

Initialize the given scheduler.

This method is called by Scheduler::initialize() to initialize the scheduler.

Depending on the type of the scheduler, this method may be overridden to initialize the scheduler. For example, the default executor (GXFExecutor) initializes the scheduler using the GXF API and sets the scheduler’s ID to the ID of the GXF scheduler.

Returns: true if the scheduler is initialized successfully. Otherwise, false.

Parameters

sch
Scheduler *

The pointer to the scheduler.

initialize_network_context

bool holoscan::GPUResidentExecutor::initialize_network_context(
NetworkContext *network_context
) override

Initialize the given network context.

This method is called by NetworkContext::initialize() to initialize the network context.

Depending on the type of the network context, this method may be overridden to initialize the network context. For example, the default executor (GXFExecutor) initializes the network context using the GXF API and sets the network context’s ID to the ID of the GXF network context.

Returns: true if the network context is initialized successfully. Otherwise, false.

Parameters

network_context
NetworkContext *

The pointer to the network context.

initialize_fragment_services

bool holoscan::GPUResidentExecutor::initialize_fragment_services() override

Initialize the fragment services for the executor.

This method is called during executor initialization to set up any required fragment services.

Depending on the type of executor, this method may be overridden to initialize specific fragment services. For example, the default executor (GXFExecutor) may initialize fragment services using the GXF API.

Returns: true if the fragment services are initialized successfully. Otherwise, false.

prepare_data_flow

void holoscan::GPUResidentExecutor::prepare_data_flow(
std::shared_ptr<OperatorFlowGraph> graph,
const std::vector<std::shared_ptr<Operator>> &topo_ordered_operators
)

Prepare data flow connections for a topologically ordered GPU-resident graph.

This initializes operator specs, assigns per-port unique IDs, and allocates/connects device memory for every supported edge in the graph.

Parameters

graph
std::shared_ptr<OperatorFlowGraph>

The operator graph.

topo_ordered_operators
const std::vector<std::shared_ptr<Operator>> &

Operators flattened in deterministic topological order.

initialize_cuda

void holoscan::GPUResidentExecutor::initialize_cuda()

This function initializes CUDA.

Currently, it sets the device to 0 by default. Setting a different GPU device for GPU-resident graph execution is not yet supported.

device_memory

void * holoscan::GPUResidentExecutor::device_memory(
std::shared_ptr<Operator> op,
const std::string &port_name
)

This function returns the device memory address of an input or output port corresponding to a given port name.

GPU-resident operators use this function to get the device memory address of the input or output port.

Returns: The device memory address of the input or output port

Parameters

op
std::shared_ptr<Operator>

The operator

port_name
const std::string &

The name of the input or output port

verify_graph_topology

virtual bool holoscan::GPUResidentExecutor::verify_graph_topology(
std::shared_ptr<OperatorFlowGraph> graph,
std::vector<std::shared_ptr<Operator>> &topo_ordered_operators
)

Verify the graph topology and flatten it in topological order.

GPU-resident execution currently supports acyclic graphs with exactly one source operator. This method only validates and flattens the operator graph itself.

Returns: True if the graph topology is supported by GPU-resident execution, false otherwise.

Parameters

graph
std::shared_ptr<OperatorFlowGraph>

The operator graph.

topo_ordered_operators
std::vector<std::shared_ptr<Operator>> &

Output vector populated in deterministic topological order.
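
The constraint described above (an acyclic graph with exactly one source, flattened in a deterministic topological order) can be illustrated with a small standalone sketch. This is not the Holoscan implementation: `EdgeList` and `flatten_single_source_dag` are hypothetical names, operators are represented by plain strings, and Kahn's algorithm with a name-ordered queue is only one possible way to obtain a deterministic order.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operator graph: directed edges between operator names.
using EdgeList = std::vector<std::pair<std::string, std::string>>;

// Returns true and fills `order` when the graph is acyclic and has exactly one
// source (a node without incoming edges). Returns false otherwise.
bool flatten_single_source_dag(const std::vector<std::string>& nodes,
                               const EdgeList& edges,
                               std::vector<std::string>& order) {
  order.clear();
  std::map<std::string, std::vector<std::string>> successors;
  std::map<std::string, std::size_t> in_degree;
  for (const auto& name : nodes) {
    successors[name];
    in_degree[name] = 0;
  }
  for (const auto& edge : edges) {
    if (in_degree.count(edge.first) == 0 || in_degree.count(edge.second) == 0) {
      return false;  // the edge refers to an unknown operator
    }
    successors[edge.first].push_back(edge.second);
    ++in_degree[edge.second];
  }

  // A min-heap keyed by name makes the resulting order deterministic.
  std::priority_queue<std::string, std::vector<std::string>, std::greater<std::string>> ready;
  for (const auto& entry : in_degree) {
    if (entry.second == 0) { ready.push(entry.first); }
  }
  if (ready.size() != 1) { return false; }  // zero or several sources

  // Kahn's algorithm: emit a node once all of its predecessors were emitted.
  while (!ready.empty()) {
    const std::string current = ready.top();
    ready.pop();
    order.push_back(current);
    for (const auto& next : successors[current]) {
      if (--in_degree[next] == 0) { ready.push(next); }
    }
  }
  if (order.size() != nodes.size()) {  // leftover nodes are part of a cycle
    order.clear();
    return false;
  }
  return true;
}
```

In this sketch a diamond-shaped graph (one source fanning out to two operators that feed one sink) is accepted, while a graph with two sources or with a cycle is rejected and leaves `order` empty.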

timeout_ms

void holoscan::GPUResidentExecutor::timeout_ms(
unsigned long long timeout_ms
)

tear_down

void holoscan::GPUResidentExecutor::tear_down()

Sends a tear down signal to the GPU-resident CUDA graph.

result_ready

bool holoscan::GPUResidentExecutor::result_ready()

Indicates whether the result of a single iteration of the GPU-resident CUDA graph is ready or not.

Returns: true if the result is ready, false otherwise.

data_ready

void holoscan::GPUResidentExecutor::data_ready()

Informs the GPU-resident CUDA graph that the data is ready for the main workload.

is_launched

bool holoscan::GPUResidentExecutor::is_launched()

Indicates whether the GPU-resident CUDA graph has been launched.

Returns: true if the CUDA graph has been launched, false otherwise.
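
A host-side loop built on `is_launched()`, `data_ready()`, and `result_ready()` might look like the sketch below. The calling order (signal data, then poll for the result) is an assumption drawn from the descriptions above, not a documented protocol. `run_one_iteration` and `FakeExecutor` are hypothetical names; the fake type exists only so the helper can be exercised without a GPU.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-in used only to exercise the helper below: it reports a
// result once result_ready() has been polled `polls_needed` times.
struct FakeExecutor {
  bool launched = true;
  int polls_needed = 3;
  int polls_seen = 0;
  bool signalled = false;

  bool is_launched() { return launched; }
  void data_ready() { signalled = true; polls_seen = 0; }
  bool result_ready() { return signalled && ++polls_seen >= polls_needed; }
};

// Works with any type exposing is_launched(), data_ready() and result_ready().
// Returns true when a result became ready before the timeout expired.
template <typename ExecutorT>
bool run_one_iteration(ExecutorT& executor, std::chrono::milliseconds timeout,
                       std::chrono::microseconds poll_interval = std::chrono::microseconds(100)) {
  if (!executor.is_launched()) { return false; }  // nothing to signal yet

  executor.data_ready();  // tell the graph that input data is in place

  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!executor.result_ready()) {
    if (std::chrono::steady_clock::now() >= deadline) { return false; }
    std::this_thread::sleep_for(poll_interval);  // avoid a hot spin on the host
  }
  return true;
}
```

Bounding the poll with a deadline keeps the host from blocking forever if the graph never reports a result; whether polling or another notification mechanism is appropriate depends on the application.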

execution_context

std::shared_ptr<ExecutionContext> holoscan::GPUResidentExecutor::execution_context()

Get the execution context. This currently has no meaning for GPU-resident graph execution. If execution-context state needs to be stored in the future, a pointer to an ExecutionContext object will be kept in exec_context_.

graph_capture_stream

std::shared_ptr<cudaStream_t> holoscan::GPUResidentExecutor::graph_capture_stream()

data_ready_handler_capture_stream

std::shared_ptr<cudaStream_t> holoscan::GPUResidentExecutor::data_ready_handler_capture_stream()

workload_graph_clone

cudaGraph_t holoscan::GPUResidentExecutor::workload_graph_clone() const

data_ready_device_address

void * holoscan::GPUResidentExecutor::data_ready_device_address()

Get the CUDA device pointer for the data_ready signal.

Returns: Pointer to the device memory location for data_ready signal.

result_ready_device_address

void * holoscan::GPUResidentExecutor::result_ready_device_address()

Get the CUDA device pointer for the result_ready signal.

Returns: Pointer to the device memory location for result_ready signal.

tear_down_device_address

void * holoscan::GPUResidentExecutor::tear_down_device_address()

Get the CUDA device pointer for the tear_down signal.

Returns: Pointer to the device memory location for tear_down signal.

data_ready_handler

void holoscan::GPUResidentExecutor::data_ready_handler(
std::shared_ptr<Fragment> fragment
)

Register a data ready handler fragment.

This function stores a reference to the fragment that will handle data ready events.

Parameters

fragment
std::shared_ptr<Fragment>

The fragment to register as the data ready handler.

data_ready_handler_fragment

std::shared_ptr<Fragment> holoscan::GPUResidentExecutor::data_ready_handler_fragment()

Get the registered data ready handler fragment.

Returns: The data ready handler fragment, or nullptr if none is registered.

data_not_ready_sleep_interval_us

void holoscan::GPUResidentExecutor::data_not_ready_sleep_interval_us(
unsigned int sleep_interval_us = 500
)

Set the sleep interval on device when data is not ready.

Parameters

sleep_interval_us
unsigned int (defaults to 500)

The sleep interval in microseconds. Default is 500 us.

sync_with_host

void holoscan::GPUResidentExecutor::sync_with_host(
bool enable
)

Enable or disable a system-wide fence in the while-end-marker kernel.

Parameters

enable
bool

True to enable, false to disable.

See also: Fragment::GPUResidentAccessor::sync_with_host for the public-facing API and full documentation.

enable_perf_measurement

void holoscan::GPUResidentExecutor::enable_perf_measurement(
unsigned int num_samples = 100
)

Enable execution time measurement.

Execution time is the time between the start of a streaming data iteration and the end of the same iteration. Execution time is not measured when the data is not marked as ready.

Parameters

num_samples
unsigned int (defaults to 100)

The total number of samples to collect. Default is 100.

execution_times_us

std::pair<unsigned int *, unsigned int> holoscan::GPUResidentExecutor::execution_times_us()

Get the host pointer to the execution times in microseconds.

Returns: a pair of the host pointer to the execution times in microseconds and the number of samples collected.
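
The returned pair (host pointer plus sample count) can be reduced to simple statistics on the host. The sketch below is illustrative only: `ExecutionStats` and `summarize_execution_times` are hypothetical names, and the code assumes the pointer refers to `second` contiguous, readable `unsigned int` values.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <utility>

// Hypothetical summary type for this sketch.
struct ExecutionStats {
  unsigned int min_us;
  unsigned int max_us;
  double mean_us;
};

// `samples` mirrors the documented return shape: {host pointer, sample count}.
ExecutionStats summarize_execution_times(std::pair<unsigned int*, unsigned int> samples) {
  ExecutionStats stats{0, 0, 0.0};
  if (samples.first == nullptr || samples.second == 0) { return stats; }

  const unsigned int* begin = samples.first;
  const unsigned int* end = begin + samples.second;

  const auto range = std::minmax_element(begin, end);
  stats.min_us = *range.first;
  stats.max_us = *range.second;

  // Accumulate in 64 bits so that many samples cannot overflow the sum.
  const std::uint64_t sum = std::accumulate(begin, end, std::uint64_t{0});
  stats.mean_us = static_cast<double>(sum) / samples.second;
  return stats;
}
```

A caller would pass the result of `execution_times_us()` straight into such a helper after measurement has been enabled and enough iterations have run.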

interrupt

virtual bool holoscan::GPUResidentExecutor::interrupt()

Interrupt the execution.

Returns: true if the interrupt was successful (graph was running), false if the graph was not running (already stopped or not started).

wait

virtual void holoscan::GPUResidentExecutor::wait()

Wait for the execution to complete.

This method blocks until the graph execution (started by run_async or interrupted by interrupt()) completes. Should be called after interrupt() to ensure the scheduler has fully stopped before performing cleanup operations.

Only call this if interrupt() returned true. Calling wait() when the graph is not running can cause issues with concurrent cleanup.
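
The rule above (call `wait()` only when `interrupt()` returned true) fits in a small guard. `stop_and_wait` and `FakeRunner` are hypothetical names used for this sketch; the fake type only mimics the two documented member functions.

```cpp
// Hypothetical stand-in that mimics interrupt() and wait() for this sketch.
struct FakeRunner {
  bool running = false;
  int wait_calls = 0;

  // Returns true only if execution was actually running.
  bool interrupt() {
    const bool was_running = running;
    running = false;
    return was_running;
  }
  void wait() { ++wait_calls; }
};

// Interrupts the executor and waits for completion only if the interrupt
// reported that execution was running. Returns whether wait() was called.
template <typename ExecutorT>
bool stop_and_wait(ExecutorT& executor) {
  if (!executor.interrupt()) {
    return false;  // not running: skip wait() as the documentation advises
  }
  executor.wait();
  return true;
}
```

Calling the helper twice in a row is safe in this sketch: the second call sees `interrupt()` return false and does not reach `wait()`.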

fragment

Set the pointer to the fragment of the executor

Get a pointer to Fragment object

void holoscan::GPUResidentExecutor::fragment(
Fragment *fragment
)

Set the pointer to the fragment of the executor.

Parameters

fragment
Fragment *

The pointer to the fragment of the executor.

owns_context

bool holoscan::GPUResidentExecutor::owns_context()

Get whether the context is owned by the executor.

Returns: true if the context is owned by the executor. Otherwise, false.

context_uint64

Overload 1
Overload 2
void holoscan::GPUResidentExecutor::context_uint64(
uint64_t context
)

extension_manager

virtual std::shared_ptr<ExtensionManager> holoscan::GPUResidentExecutor::extension_manager()

Get the extension manager.

Returns: The shared pointer of the extension manager.

exception

Set the exception
Get the stored exception
void holoscan::GPUResidentExecutor::exception(
const std::exception_ptr &e
)

Set the exception.

This method is called by the framework to store the exception that occurred during the execution of the fragment. If the exception is set, this exception is rethrown by the framework after the execution of the fragment.

Parameters

e
const std::exception_ptr &

The exception to store.
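
The store-then-rethrow behaviour described above relies on standard `std::exception_ptr` mechanics. The sketch below shows that mechanism in isolation; `run_and_report` is a hypothetical function and does not reflect how the framework itself is structured.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Runs a piece of work, captures any exception instead of letting it escape,
// and rethrows it afterwards. Returns the error message, or "ok" on success.
std::string run_and_report(bool fail) {
  std::exception_ptr stored;  // plays the role of the stored exception

  try {
    if (fail) { throw std::runtime_error("compute failed"); }
  } catch (...) {
    stored = std::current_exception();  // store instead of propagating
  }

  // After execution has finished: rethrow whatever was stored.
  if (stored) {
    try {
      std::rethrow_exception(stored);
    } catch (const std::exception& e) {
      return e.what();
    }
  }
  return "ok";
}
```

Deferring the rethrow lets cleanup finish before the error surfaces to the caller.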

connect_ports

void holoscan::GPUResidentExecutor::connect_ports(
std::shared_ptr<Operator> source_op,
std::shared_ptr<Operator> dest_op,
const std::string &source_port,
const std::string &destination_port
)

Inspect the port specs of a single source_port -> destination_port connection and either allocate a shared device buffer or wire an externally-owned device pointer.

Parameters

source_op
std::shared_ptr<Operator>

The upstream operator (owns the output port).

dest_op
std::shared_ptr<Operator>

The downstream operator (owns the input port).

source_port
const std::string &

Name of the output port on source_op.

destination_port
const std::string &

Name of the input port on dest_op.

allocate_io_device_buffer

void holoscan::GPUResidentExecutor::allocate_io_device_buffer(
std::shared_ptr<Operator> source_op,
std::shared_ptr<Operator> dest_op,
const std::string &source_port,
const std::string &target_port,
size_t memory_block_size
)

connect_io_device_ptr

void holoscan::GPUResidentExecutor::connect_io_device_ptr(
std::shared_ptr<Operator> source_op,
std::shared_ptr<Operator> dest_op,
const std::string &source_port,
const std::string &target_port,
void *device_ptr
)

create_gpu_resident_cuda_graph

void holoscan::GPUResidentExecutor::create_gpu_resident_cuda_graph()

This function creates the full GPU-resident CUDA graph.

It also instantiates the CUDA graph to be ready for launch.

create_cuda_graph_from_operators

void holoscan::GPUResidentExecutor::create_cuda_graph_from_operators(
std::vector<std::shared_ptr<Operator>> &topo_ordered_operators,
cudaGraph_t &graph,
cudaStream_t capture_stream
)

verify_distinct_operator_names

bool holoscan::GPUResidentExecutor::verify_distinct_operator_names()

This function verifies that the operator names are distinct between the main workload fragment and the data ready handler fragment.

Assumes topologically ordered operators are already created before calling this function.

Returns: True if the operator names are distinct, false otherwise.
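
A distinctness check between two sets of operator names can be sketched with a hash set. `names_are_distinct` is a hypothetical helper operating on plain strings; how the executor actually collects the names is not shown here.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns true when no name appears in both lists.
bool names_are_distinct(const std::vector<std::string>& main_names,
                        const std::vector<std::string>& handler_names) {
  const std::unordered_set<std::string> seen(main_names.begin(), main_names.end());
  for (const auto& name : handler_names) {
    if (seen.count(name) != 0) {
      return false;  // the same operator name is used in both fragments
    }
  }
  return true;
}
```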

set_unique_ids

void holoscan::GPUResidentExecutor::set_unique_ids(
std::shared_ptr<Operator> op
)

add_receivers

virtual bool holoscan::GPUResidentExecutor::add_receivers(
const std::shared_ptr<Operator> &op,
const std::string &receivers_name,
std::vector<std::string> &new_input_labels,
std::vector<holoscan::IOSpec *> &iospec_vector
)

Add the receivers as input ports of the given operator.

This method is called by Fragment::add_flow() to support the case where the destination input port label refers to a parameter name of the downstream operator and the parameter type is ‘std::vector<holoscan::IOSpec*>’. It finds a parameter of type ‘std::vector<holoscan::IOSpec*>’ and creates a new input port with a specific label (‘parameter name:index’, e.g. ‘receivers:0’).

Returns: true if the receivers are added successfully. Otherwise, false.

Parameters

op
const std::shared_ptr<Operator> &

The reference to the shared pointer of the operator.

receivers_name
const std::string &

The name of the receivers whose parameter type is ‘std::vector<holoscan::IOSpec*>’.

new_input_labels
std::vector<std::string> &

The reference to the vector of input port labels to which the input port labels are added. In the case of multiple receivers, the input port label is updated to ‘parameter name:index’ (e.g. ‘receivers’ => ‘receivers:0’).

iospec_vector
std::vector<holoscan::IOSpec *> &

The reference to the vector of IOSpec pointers.
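
The ‘parameter name:index’ labelling convention mentioned above can be reproduced with a few lines of string handling. `indexed_port_label` and `indexed_port_labels` are hypothetical helpers for illustration; they are not part of the Holoscan API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds a label of the form "<parameter name>:<index>", e.g. "receivers:0".
std::string indexed_port_label(const std::string& parameter_name, std::size_t index) {
  return parameter_name + ":" + std::to_string(index);
}

// Builds the labels for `count` connections made to the same parameter.
std::vector<std::string> indexed_port_labels(const std::string& parameter_name,
                                             std::size_t count) {
  std::vector<std::string> labels;
  labels.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    labels.push_back(indexed_port_label(parameter_name, i));
  }
  return labels;
}
```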

add_control_flow

virtual bool holoscan::GPUResidentExecutor::add_control_flow(
const std::shared_ptr<Operator> &upstream_op,
const std::shared_ptr<Operator> &downstream_op
)

Add a control flow between two operators.

This method is called by Fragment::add_flow() to add a control flow between two operators.

Returns: true if the control flow is added successfully. Otherwise, false.

Parameters

upstream_op
const std::shared_ptr<Operator> &

The shared pointer to the upstream operator.

downstream_op
const std::shared_ptr<Operator> &

The shared pointer to the downstream operator.


Member variables

Name | Type | Description
fragment_initialized_ | bool |
io_device_buffers_ | std::unordered_map< std::string, std::shared_ptr< holoscan::utils::cuda::DeviceBuffer > > | Map of input/output port name to the device buffers (executor-allocated).
io_device_ptrs_ | std::unordered_map< std::string, void * > | Map of input/output port name to externally-owned device pointers.
topo_ordered_main_operators_ | std::vector< std::shared_ptr< Operator > > | Vector of topologically ordered operators.
topo_ordered_drh_operators_ | std::vector< std::shared_ptr< Operator > > | Topologically ordered operators of the data ready handler fragment.
execution_times_us_dev_ | std::shared_ptr< holoscan::utils::cuda::DeviceBuffer > | Device buffer to store the execution times in microseconds.
start_time_ns_dev_ | std::shared_ptr< holoscan::utils::cuda::DeviceBuffer > |
actual_samples_collected_dev_ | std::shared_ptr< holoscan::utils::cuda::DeviceBuffer > | Device buffer to store the actual number of samples collected.
perf_enabled_ | bool |
num_samples_ | unsigned int |
sync_with_host_ | bool |
exec_context_ | std::shared_ptr< ExecutionContext > |
timeout_ms_ | unsigned long long |
data_not_ready_sleep_interval_us_ | unsigned int |
graph_capture_stream_ | std::shared_ptr< cudaStream_t > |
drh_capture_stream_ | std::shared_ptr< cudaStream_t > |
drh_graph_ | cudaGraph_t | The CUDA graph of the data ready handler.
workload_graph_ | cudaGraph_t | The CUDA graph of the main workload.
gpu_resident_graph_ | cudaGraph_t | The full GPU-resident CUDA graph including control flow nodes.
data_ready_handler_fragment_ | std::shared_ptr< Fragment > |
gpu_resident_deck_ | std::shared_ptr< GPUResidentDeck > |
fragment_ | Fragment * | The fragment of the executor.
context_ | void * | The context.
owns_context_ | bool | Whether the context is owned by the executor.
extension_manager_ | std::shared_ptr< ExtensionManager > | The extension manager.
exception_ | std::exception_ptr | The stored exception.