Date: 2025-09-16

Q1: How do I create a new application with the Holoscan SDK?

A1: To create a new application:

Define the core concepts such as Operators, Fragments, and their interactions.

Use the provided templates and examples as a starting point.

For more details, please refer to the Holoscan by Example section.

Q2: What are some example applications provided in the SDK?

A2: Example applications include:

Hello World: Basic introduction to the SDK.

Ping Simple and Ping Custom Op: Demonstrate simple data processing.

Video Replayer: Shows how to process and display video streams.

Medical imaging examples, such as ultrasound and endoscopy processing.

For a list of example applications, please visit the Holoscan by Example section.

Q3: How can I integrate my own AI models into the Holoscan SDK?

A3: Integrating your own AI models involves:

Converting your model to a compatible format (e.g., TensorRT, ONNX).

Ensuring that your data preprocessing and postprocessing steps align with the model’s requirements.

For more information on how to bring your own model into the Holoscan SDK and build an inference example, please refer to this example.

Q4: How can I update the VideoStreamReplayerOp and the VideoStreamRecorderOp to use a custom file format?

A4: The Holoscan SDK depends on GXF, which follows the Entity-Component-System paradigm. Holoscan uses GXF as its execution engine, and Holoscan’s API abstracts away the Entity-Component-System details by exposing a GXF node as an Operator with input/output ports.

Most messages between Codelets (Operators in Holoscan) are also entity objects. An entity can hold multiple components/types. VideoStreamReplayerOp and VideoStreamRecorderOp send and receive an entity object that holds one or more GXF Tensor objects as components. You can think of an entity as a dictionary of objects, that is, a <key, object> map.

VideoStreamReplayerOp and VideoStreamRecorderOp currently use a custom file type (.gxf_entities and .gxf_index files) to load and store a sequence of entity objects (in this case, an entity has one GXF Tensor object). The .gxf_index file includes file offset and timestamp information for each entity, and the .gxf_entities file contains a series of serialized entity data.

Serializing and deserializing an entity object is done with the nvidia::gxf::EntitySerializer class (with the StdEntitySerializer implementation) and the nvidia::gxf::FileStream endpoint.

The official way to support GPUDirect Storage (GDS) in GXF would be to extend the nvidia::gxf::FileStream class so that it uses cuFile (GDS) internally. However, setting up the development environment would not be straightforward. These are the steps you would need to follow:

Update VideoStreamRecorderOp::compute() to use your own implementation to save an entity (as a single tensor) to the file system. Example:

    auto in_message = op_input.receive<holoscan::TensorMap>("in").value();

Update VideoStreamReplayerOp::compute() to use your own implementation to read the file (with the custom format) and emit it as an entity holding one or more tensors as components (this is called a TensorMap). Example:

    nvidia::gxf::Expected<nvidia::gxf::Entity> out_message = CreateTensorMap(
        context.context(),
        pool.value(),
        {{out_tensor_name_.get(),
          nvidia::gxf::MemoryStorageType::kDevice,
          out_shape,
          out_primitive_type_,
          0,
          nvidia::gxf::ComputeTrivialStrides(out_shape, dst_typesize)}},
        false);

You need to update initialize() and other methods to get rid of nvidia::gxf::FileStream and the nvidia::gxf::FileStream endpoint.

For testing VideoStreamReplayerOp, you can use the VideoReplayerApp example:

    class VideoReplayerApp : public holoscan::Application {
     public:
      void compose() override {
        using namespace holoscan;

        // Sets the data directory to use from the environment variable if it is set
        ArgList args;
        auto data_directory = std::getenv("HOLOSCAN_INPUT_PATH");
        if (data_directory != nullptr && data_directory[0] != '\0') {
          auto video_directory = std::filesystem::path(data_directory);
          video_directory /= "racerx";
          args.add(Arg("directory", video_directory.string()));
        }

        // Define the replayer and holoviz operators and configure using yaml configuration
        auto replayer =
            make_operator<ops::VideoStreamReplayerOp>("replayer", from_config("replayer"), args);
        auto visualizer = make_operator<ops::HolovizOp>("holoviz", from_config("holoviz"));

        // Define the workflow: replayer -> holoviz
        add_flow(replayer, visualizer, {{"output", "receivers"}});
      }
    };

You can develop, test, and create operator (release) binaries by following the guide at https://github.com/nvidia-holoscan/holoscan-sdk/blob/main/DEVELOP.md:

    ./run build
    ./run launch
    # inside the container
    ./examples/video_replayer/cpp/video_replayer

As an alternative, you can create a separate Holoscan Operator and use it with other sample applications (such as the endoscopy tool tracking app) by following HoloHub’s guide (https://github.com/nvidia-holoscan/holohub). You can also use the Holoscan SDK installation binaries with the Holoscan install directory created by ./run build in the Holoscan SDK repo.

Q5: How can I use the Inference Operator with Python tensor?

A5: The Inference Operator accepts a holoscan::Entity or a holoscan::TensorMap (similar to a dictionary of array-like objects in Python) as an input message.

For example, you can define an operator that processes input video (as a tensor). A more detailed version of this type of operator can be found in the tensor interop example.

This operator has:

inputs: "input_tensor"

outputs: "output_tensor"

The data from each input is processed by a CuPy gaussian filter and the result is sent to the output.

```
# Assumes the module-level imports used by the tensor interop example:
#   import cupy as cp
#   import cupyx.scipy.ndimage as ndi
def compute(self, op_input, op_output, context):
    # in_message is a dict
    in_message = op_input.receive("input_tensor")

    # smooth along first two axes, but not the color channels
    sigma = (self.sigma, self.sigma, 0)

    # out_message is a dict
    out_message = dict()

    for key, value in in_message.items():
        print(f"message received (count: {self.count})")
        self.count += 1

        cp_array = cp.asarray(value)

        # process cp_array
        cp_array = ndi.gaussian_filter(cp_array, sigma)

        out_message[key] = cp_array

    op_output.emit(out_message, "output_tensor")
```

Q6: Is there support in the Holoscan SDK, particularly for models written as Triton Python backends like NVIDIA’s FoundationPose?

A6: Triton backends are not currently supported. The Inference Operator supports TensorRT (trt), ONNX Runtime (onnxrt), and Torch backends.

For more information on the Inference Operator, please refer to the section in the User Guide regarding the Inference Operator.

Q7: Can I directly use a .pth (PyTorch) model file with the Holoscan SDK’s inference operator?

A7: No, you cannot use a .pth model file directly with the Holoscan SDK. Here’s why and what you can do instead:

Holoscan SDK’s Torch backend is based on libtorch, which requires models to be in TorchScript format.

Converting a .pth model to TorchScript is a manual process and cannot be done automatically within the SDK.

For the best performance and ease of use, it’s recommended to:

a) Use a TensorRT (TRT) model if available.

b) If you have an ONNX model, you can convert it to TRT automatically within the SDK.

Using a TRT model (or converting from ONNX to TRT) will likely provide the fastest inference and be the easiest to set up with the Holoscan SDK.

In summary, while direct .pth file usage isn’t supported, converting to TensorRT or using ONNX with automatic TRT conversion are the recommended approaches for optimal performance and compatibility with the Holoscan SDK.

Q8: Can I use multiple models with the Inference Operator?

A8: Yes, you can use multiple models by specifying them in the model_path_map parameter. For more information, please refer to the Parameters section of the Inference Operator in the Holoscan User Guide.

Q9: How can I enable parallel inference for multiple models?

A9: Parallel inference is enabled by default. To disable it, set parallel_inference: false in the parameter set. For more information, please refer to the Parameters section of the Inference Operator in the Holoscan User Guide.

Q9: Can I use different backends for different models in the same application?

A9: Yes, you can specify different backends for different models using the backend_map parameter. For more information, please refer to the Parameters section of the Inference Operator in the Holoscan User Guide.

Q10: Can I perform inference on the CPU?

A10: Yes, you can perform inference on the CPU by setting infer_on_cpu: true and using either the ONNX Runtime or PyTorch backend. For more information, please refer to the Parameters section of the Inference Operator in the Holoscan User Guide.

Q11: Can I control where the input and output data is stored (CPU vs GPU memory)?

A11: Yes, use the input_on_cuda, output_on_cuda, and transmit_on_cuda parameters to control data location. For more information, please refer to the Parameters section of the Inference Operator in the Holoscan User Guide.
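The Inference Operator parameters named in the answers above can be pictured together as one parameter set. The following is only a sketch written as a plain Python dictionary, not a call into the SDK: the model names, the file paths, the "torch" backend key, and the check_cpu_backends helper are assumptions made for illustration. It encodes the rule from A10 that CPU inference requires the ONNX Runtime or Torch backend.

```python
# Hypothetical parameter set for the Inference Operator, written as a plain dict.
# Only the parameter names come from the answers above; model names, paths,
# and the "torch" backend key are assumptions for illustration.
inference_params = {
    "model_path_map": {
        "model_a": "/path/to/model_a.onnx",  # hypothetical path
        "model_b": "/path/to/model_b.pt",    # hypothetical path
    },
    "backend_map": {
        "model_a": "onnxrt",
        "model_b": "torch",
    },
    "parallel_inference": True,
    "infer_on_cpu": True,
    "input_on_cuda": False,
    "output_on_cuda": False,
    "transmit_on_cuda": False,
}

# Backends that A10 lists as usable for CPU inference.
CPU_CAPABLE_BACKENDS = {"onnxrt", "torch"}


def check_cpu_backends(params):
    """Return the models whose backend cannot run on the CPU (hypothetical helper)."""
    if not params.get("infer_on_cpu", False):
        return []
    return sorted(
        name
        for name, backend in params.get("backend_map", {}).items()
        if backend not in CPU_CAPABLE_BACKENDS
    )


# An empty list means every model uses a CPU-capable backend.
print(check_cpu_backends(inference_params))
```

Checking the combination up front is a design convenience only; the authoritative list of parameters and defaults is the Parameters section of the Inference Operator in the Holoscan User Guide.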

Q12: How can I use the Optional flag?

A12: In Python, there are two ways to define a parameter:

Using the spec.param() method in the operator’s setup() method, which is usually done when wrapping an existing C++ operator.

Passing parameters directly to the constructor (__init__()). In Python, parameters have no try_get() method. Instead, the default value is set to None, which lets you check whether the user set the parameter by testing whether its value is None.
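The second approach, a constructor argument that defaults to None, can be sketched in plain Python. ThresholdOp below is a hypothetical stand-in and does not derive from the Holoscan Operator class; it only shows the None check that replaces try_get().

```python
class ThresholdOp:
    """Stand-in for a native Python operator (hypothetical; not a Holoscan class)."""

    def __init__(self, threshold=None):
        # None means "the user did not set this parameter".
        self.threshold = threshold

    def compute(self, values):
        if self.threshold is None:
            # Optional parameter not provided: pass the data through unchanged.
            return list(values)
        # Optional parameter provided: keep only values at or above it.
        return [v for v in values if v >= self.threshold]


print(ThresholdOp().compute([1, 5, 9]))
print(ThresholdOp(threshold=5).compute([1, 5, 9]))
```

Comparing with `is None` rather than relying on truthiness keeps a legitimate value such as 0 distinguishable from "not set".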

Q13: How can I define an Operator’s constructor for passing custom arguments?

A13: Custom data has to be passed to the constructor of an Operator in the compose() method. When you use the make_operator<>() template method in C++ or the Python Operator constructor, the setup() method is called internally, which prevents you from passing custom data (such as configuration values) after make_operator<>() is called. In C++, to pass data that is neither a condition nor an argument to the constructor of a C++ Operator class, you need to define an additional constructor that accepts your custom data. For example, you can define a constructor that accepts a std::vector<std::string> argument for the list of output port names as a second parameter.

Q14: How can I stop an application?

A14: There are two approaches to stopping an application:

Using BooleanCondition on the replayer operator:

    std::string op_name = "replayer";
    std::string param_name = "boolean_scheduling_term";

    // get the operator
    holoscan::Operator* op = nullptr;
    auto& app_graph = fragment()->graph();
    if (!app_graph.is_empty()) { op = app_graph.find_node(op_name).get(); }
    if (!op) {
      HOLOSCAN_LOG_ERROR("Operator '{}' is not defined", op_name);
      return;
    }

    // Stop executing compute() for 'replayer' operator
    auto boolean_condition = op->condition<holoscan::BooleanCondition>(param_name);
    boolean_condition->disable_tick();

    // Stop executing compute() for this operator
    boolean_condition = condition<holoscan::BooleanCondition>(param_name);
    boolean_condition->disable_tick();
    return;

To terminate the application smoothly, it is recommended to rely on the stop-on-deadlock feature in the scheduler. By default, the stop_on_deadlock parameter of GreedyScheduler is set to true. In case the VideoReplayer Operator stops, the entire pipeline will stop.

Using interrupt():

    fragment()->executor().interrupt();

Please note that interrupt() forcibly terminates execution and can cause error messages, so the deadlock-based approach is recommended.

As an alternative, you can also use the CountCondition. Please refer to the corresponding section of the documentation. At a high level, this is how attaching a CountCondition to an operator works:

The operator starts in a READY state.

Each time the operator executes, the count decreases by 1.

When the count reaches 0, the operator’s state changes to NEVER.

In the NEVER state, the operator stops executing.

For example, if you want an operator to execute 100 times and then stop:

    auto my_operator = make_operator<MyOperator>("my_operator", make_condition<CountCondition>(100));
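The READY to NEVER countdown described above can be modeled in a few lines of plain Python. This is a toy sketch, not the SDK’s CountCondition: the CountdownCondition class and the run helper are hypothetical names, and the loop merely stands in for the scheduler.

```python
class CountdownCondition:
    """Toy model of the READY -> NEVER transition (not the SDK class)."""

    def __init__(self, count):
        self.count = count

    @property
    def state(self):
        # READY while executions remain, NEVER once the count reaches 0.
        return "READY" if self.count > 0 else "NEVER"

    def on_execute(self):
        # Called once per operator execution.
        self.count -= 1


def run(condition):
    """Tick a dummy operator until its condition reports NEVER; return the tick count."""
    ticks = 0
    while condition.state == "READY":
        ticks += 1  # stands in for one compute() call
        condition.on_execute()
    return ticks


condition = CountdownCondition(100)
print(run(condition), condition.state)
```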

Q15: How can I loop an output.emit() call within the operator?

A15: Each input or output port has its own queue. Internally, the process works as follows:

Before the compute() method of an operator A is triggered, for each input port (usually backed by DoubleBufferReceiver), data (messages) in the backstage of the input port’s queue are moved to the main stage of the queue. This is done using router->syncInbox(entity).

The compute() method of operator A is triggered.

For each output port of operator A, data in the output port’s queue are moved to the queue (backstage) of the downstream operator’s input port using router->syncOutbox(entity).

By default, the queue capacity of the input/output port is set to 1, although this can be configured in the setup() method. This is why we cannot call output.emit() multiple times in a compute() method, as doing so can cause a GXF_EXCEEDING_PREALLOCATED_SIZE error.
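The three steps above, and the reason a second emit() fails with the default capacity of 1, can be illustrated with a toy model. This sketch is plain Python and makes simplifying assumptions: OutputPort, InputPort, sync_outbox, and sync_inbox are hypothetical names that loosely mirror the described behavior, and OverflowError stands in for GXF_EXCEEDING_PREALLOCATED_SIZE.

```python
from collections import deque


class OutputPort:
    """Toy output port with a bounded queue."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.queue = deque()

    def emit(self, item):
        # A full queue is modeled as an error (stand-in for the GXF error code).
        if len(self.queue) >= self.capacity:
            raise OverflowError("output queue is full")
        self.queue.append(item)


class InputPort:
    """Toy input port with a back stage and a main stage."""

    def __init__(self):
        self.back_stage = deque()
        self.main_stage = deque()

    def receive(self):
        return self.main_stage.popleft()


def sync_outbox(out_port, in_port):
    """After compute(): move emitted items to the downstream back stage."""
    while out_port.queue:
        in_port.back_stage.append(out_port.queue.popleft())


def sync_inbox(in_port):
    """Before compute(): move back-stage items to the main stage."""
    while in_port.back_stage:
        in_port.main_stage.append(in_port.back_stage.popleft())


out_port, in_port = OutputPort(capacity=1), InputPort()

out_port.emit("frame 0")  # first emit inside compute() fits
try:
    out_port.emit("frame 1")  # second emit in the same compute() overflows
    overflowed = False
except OverflowError:
    overflowed = True

sync_outbox(out_port, in_port)  # operator A's compute() has returned
sync_inbox(in_port)  # the downstream compute() is about to be triggered
received = in_port.receive()
print(overflowed, received)
```

The point of the model is that the output queue is only drained after compute() returns, so everything emitted within one call must fit in the queue at once.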

With the GreedyScheduler , which is the default scheduler using a single thread to trigger an operator’s compute() method, no other operator can be scheduled until the compute() method of the current operator returns.

To address this challenge, we might consider creating a utility method or class designed to accept a generator or iterator object. This approach would be particularly effective within a compute() method, especially if the operator is a source operator without input ports. It would enable the method to preserve the state of the input and either call output.emit() for each yielded value in a single compute() invocation or return without blocking the thread.
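A minimal sketch of the generator idea follows, in plain Python and under stated assumptions: GeneratorSource is a hypothetical class, its compute() takes an emit callable rather than the SDK’s op_output, and the while loop stands in for the scheduler. It shows the variant that emits one value per compute() call and then returns, so the state of the iteration is preserved between calls without blocking the thread.

```python
class GeneratorSource:
    """Toy source operator that keeps a generator between compute() calls (hypothetical)."""

    def __init__(self, generator):
        self._generator = generator
        self.exhausted = False

    def compute(self, emit):
        # Emit exactly one value per call so a capacity-1 output queue never overflows.
        try:
            emit(next(self._generator))
        except StopIteration:
            # Nothing left to emit; a real operator would stop being scheduled here.
            self.exhausted = True


def frames(n):
    """Example generator producing n labeled items."""
    for i in range(n):
        yield f"frame {i}"


received = []
source = GeneratorSource(frames(3))
while not source.exhausted:  # stands in for the scheduler loop
    source.compute(received.append)
print(received)
```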

To override the connector from the Python API with a queue of capacity 20 and a “reject” policy (discard the incoming item if the queue is full), the code would look something like this:

    from holoscan.core import IOSpec

    # and then within the setup method define the output using the connector method like this
    spec.output("out1").connector(
        IOSpec.ConnectorType.DOUBLE_BUFFER, capacity=20, policy=1
    )

For the policy options:

0 = pop (if the queue is full, remove an item from the queue to make room for the incoming one)

1 = reject (if the queue is full, reject the new item)

2 = fault (terminate the application if the queue is full and a new item is added)
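The effect of the three policy values can be compared with a small simulation. This is a plain-Python sketch rather than SDK code: the push and fill helpers are hypothetical, and it assumes that “pop” removes the oldest queued item, which the description above does not specify.

```python
from collections import deque

# Policy values as listed above.
POP, REJECT, FAULT = 0, 1, 2


def push(queue, item, capacity, policy):
    """Push item onto a bounded queue, applying the policy when the queue is full."""
    if len(queue) >= capacity:
        if policy == POP:
            queue.popleft()  # assumption: the oldest item is the one removed
        elif policy == REJECT:
            return False  # the incoming item is discarded
        elif policy == FAULT:
            raise RuntimeError("queue is full")  # stands in for terminating the app
        else:
            raise ValueError(f"unknown policy: {policy}")
    queue.append(item)
    return True


def fill(policy, capacity=2, items=(1, 2, 3)):
    """Push all items with the given policy and return the final queue contents."""
    queue = deque()
    for item in items:
        push(queue, item, capacity, policy)
    return list(queue)


print(fill(POP))
print(fill(REJECT))
```

With a capacity of 2 and three incoming items, the pop policy keeps the two newest items and the reject policy keeps the two oldest, while the fault policy raises on the third push.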

For completeness, to explicitly specify both the connector and its conditions, the syntax should be:

    # Assumes: from holoscan.core import ConditionType, IOSpec
    # The default setting for an output should be equivalent to explicitly specifying
    spec.output("out1").connector(
        IOSpec.ConnectorType.DOUBLE_BUFFER, capacity=1, policy=2
    ).condition(
        ConditionType.DOWNSTREAM_MESSAGE_AFFORDABLE, min_size=1, front_stage_max_size=1
    )

Q16: How can I add a green border and a small image in a corner with the Holoviz Operator?

A16: You can follow the Holoviz examples here:

Holoviz geometry example (for the border): https://github.com/nvidia-holoscan/holoscan-sdk/blob/main/examples/holoviz/python/holoviz_geometry.py

Holoviz views example (for the view/image): https://github.com/nvidia-holoscan/holoscan-sdk/blob/main/examples/holoviz/python/holoviz_views.py

Q17: What is the difference between setup, initialize, and __init__?

A17: Since the v0.6 release, the Holoscan Operator uses “lazy initialization”: creating an Operator instance (super().__init__(*args, **kwargs)) no longer initializes the corresponding GXF entity (that is, it no longer calls Operator.initialize(self)). Currently, class members in Python are set when the Operator is initialized by the GXF Executor.

The purpose of the setup method is to obtain the operator’s spec: an OperatorSpec object (the spec parameter) is provided to the method. When __init__ is called, it calls the C++ Operator::spec(const std::shared_ptr<OperatorSpec>& spec) method (and also sets the self.spec class member), and then calls the setup method so that the Operator’s spec() method holds the operator’s specification.

Because the setup method can be called multiple times with other OperatorSpec objects (e.g., to enumerate the description of the operator), users should not initialize anything on the Operator object in the setup method. Such initialization needs to be done in the initialize method.

The __init__ method is for creating the Operator object. It can be used to initialize the operator object itself by passing miscellaneous arguments, but it does not initialize the corresponding GXF entity object.

Q18: I’d like to use a CUDA stream allocated by the Holoscan SDK in a non-Holoscan library (OpenCV, CuPy, PyTorch). All these third-party libraries support CUDA streams, allocators, etc., but they use different objects to represent a CUDA stream (such as a cupy.cuda.Stream). I need to get the Holoscan CUDA stream and convert it to a cupy.cuda.Stream, in the same way a Holoscan Tensor is converted to a CuPy array with memory pointers. Please propose a solution.

A18: There is a CudaStreamHandler utility that works via GXF APIs in the C++ layer. We have not yet created a Python API that lets users access it from the compute methods of native Python operators. In general, the underlying GXF library is currently refactoring how CUDA streams are handled, and we plan to improve stream handling in the Holoscan SDK after that. You can use CuPy or other third-party stream APIs within your own native operators and pass the stream objects as Python objects between your own native operators. However, this does not help if you want to reuse a stream allocated by an upstream wrapped C++ operator provided by the SDK; there is currently no proper way to do that from Python.

Q19: What is the purpose of the activation_map parameter in the Holoscan Holoinfer operator?

A19: The activation_map parameter allows users to enable or disable model inferences dynamically at runtime. It can be used to decide on which frames to run inference for each model.

Q20: Is there an existing example or template that demonstrates the simultaneous use of integrated GPU (iGPU) and discrete GPU (dGPU) in a Holoscan application pipeline? Specifically, I am looking for a sample workflow that includes:

Receiving and processing data on the iGPU of an AGX Orin

Transferring the processed data to a dGPU

Running multiple AI models on the dGPU

Displaying results using the dGPU

A20: To leverage both the integrated GPU (iGPU) and discrete GPU (dGPU) on your IGX system with Holoscan, please refer to the IGX user guide. This guide provides detailed instructions on utilizing the iGPU in containers when the IGX developer kit is configured in dGPU mode.

For Holoscan applications, there are two primary approaches to utilize both GPUs:

Concurrent Application Execution: Run separate applications simultaneously, as outlined in the IGX documentation. The iGPU application must be executed within the Holoscan iGPU container, while the dGPU application can be run either natively or within the Holoscan dGPU container.

Distributed Application: Develop a single distributed application that utilizes both GPUs by executing distinct fragments on the iGPU and dGPU respectively.

To illustrate the second approach, consider the following example using the ‘ping’ distributed application. This demonstrates communication between the iGPU and dGPU using Holoscan containers:

    COMMON_DOCKER_FLAGS="--rm -i --init --net=host --runtime=nvidia -e NVIDIA_DRIVER_CAPABILITIES=all --cap-add CAP_SYS_PTRACE --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 "
    HOLOSCAN_VERSION=2.2.0
    HOLOSCAN_IMG="nvcr.io/nvidia/clara-holoscan/holoscan:v$HOLOSCAN_VERSION"
    HOLOSCAN_DGPU_IMG="$HOLOSCAN_IMG-dgpu"
    HOLOSCAN_IGPU_IMG="$HOLOSCAN_IMG-igpu"

    # Pull necessary images
    docker pull $HOLOSCAN_DGPU_IMG
    docker pull $HOLOSCAN_IGPU_IMG

    # Execute ping distributed (Python) in dGPU container
    # Note: This instance serves as the 'driver', but the iGPU could also fulfill this role
    # The '&' allows for non-blocking execution, enabling subsequent iGPU command
    docker run \
      $COMMON_DOCKER_FLAGS \
      $HOLOSCAN_DGPU_IMG \
      bash -c "python3 ./examples/ping_distributed/python/ping_distributed.py --gpu --worker --driver" &

    # Execute ping distributed (C++) in iGPU container
    docker run \
      $COMMON_DOCKER_FLAGS \
      -e NVIDIA_VISIBLE_DEVICES=nvidia.com/igpu=0 \
      $HOLOSCAN_IMG-igpu \
      bash -c "./examples/ping_distributed/cpp/ping_distributed --gpu --worker"

Q21: Is there an efficient method to configure Holoscan to enable concurrent processing pipelines? My objective is to implement a system where frame acquisition and processing can occur simultaneously. Specifically, I aim to initiate the reading of a subsequent frame while the current frame is still undergoing processing through the InferenceOp. To illustrate:

Is it possible to begin reading Frame N+1 while Frame N is still being processed by the InferenceOp?

Or does Holoscan require the completion of all operations on Frame N before initiating any operations on Frame N+1?

If concurrent processing is achievable, what would be the recommended approach to implement such a system within the Holoscan framework?

A21: The NVIDIA GXF framework provides a nvidia::gxf::BroadcastCodelet with a “round robin” mode that offers an alternative to the standard broadcast behavior. This mode sequentially directs input messages to different output ports in rotation. While this functionality was accessible in Holoscan 2.1 through the GXFCodeletOp, we could also develop a native operator that provides equivalent utility.

The GXF source defines the modes as follows:

    enum struct BroadcastMode {
      kBroadcast = 0,   // publishes incoming message to all transmitters
      kRoundRobin = 1,  // publishes incoming message to one transmitter in round-robin fashion
    };

For the gathering operation, GXF implements the nvidia::gxf::Gather codelet. This codelet transfers any messages from the receive ports to the output port. The tick_source_limit parameter can be configured to cease checking for additional received messages on other ports once a specified maximum number of messages to output has been reached.

It’s important to note that the GXF Gather codelet may not inherently preserve the order in which inference operations were called. While messages might naturally be processed in the order they were received if inference operations complete sequentially, this behavior is not guaranteed.

To ensure strict ordering, we could develop a custom native operator. This operator would sequentially check for messages on each port, beginning with port 1, and only move to the next port once a message has been received on the current port. This approach would guarantee that frames are processed and displayed in the correct order.

This custom implementation would provide greater control over the message flow and ensure the integrity of the processing pipeline, particularly in scenarios where maintaining the original frame order is critical.
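The round-robin distribution and the order-preserving gather described above can be sketched in plain Python. None of this is GXF or Holoscan code: round_robin and OrderedGather are hypothetical names, ports are modeled as deques, and ports are indexed from 0 here rather than from 1.

```python
from collections import deque
from itertools import cycle


def round_robin(messages, num_outputs):
    """Distribute messages over num_outputs queues in rotation."""
    outputs = [deque() for _ in range(num_outputs)]
    for message, index in zip(messages, cycle(range(num_outputs))):
        outputs[index].append(message)
    return outputs


class OrderedGather:
    """Forward messages port by port, waiting on the current port until it has one."""

    def __init__(self, inputs):
        self.inputs = inputs
        self.next_port = 0

    def poll(self):
        """Return every message that can be released without breaking the order."""
        released = []
        while self.inputs[self.next_port]:
            released.append(self.inputs[self.next_port].popleft())
            self.next_port = (self.next_port + 1) % len(self.inputs)
        return released


# Four frames are split over two inference branches: f0, f2 and f1, f3.
branches = round_robin(["f0", "f1", "f2", "f3"], 2)

gather_inputs = [deque(), deque()]
gather = OrderedGather(gather_inputs)

gather_inputs[1].append(branches[1].popleft())  # f1 finishes first
early = gather.poll()  # nothing is released: still waiting for f0 on port 0
gather_inputs[0].append(branches[0].popleft())  # f0 finishes
late = gather.poll()  # f0 and then f1 are released, in the original order
print(early, late)
```

The trade-off of this design is latency: a fast branch has to wait for a slower one whenever its result would otherwise overtake an earlier frame.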

Q22: How can I use other libraries in my Holoscan SDK application pipeline?

A22: Refer to the dedicated HoloHub tutorial for an overview of how to use external C++ or Python libraries in your custom Holoscan SDK application.

Q23: How can I ensure proper data flow and handling in a Holoscan pipeline with branching paths, especially when inline updates are performed on shared data?

A23: In a Holoscan pipeline with branching paths, such as:

    A -> B -> C -> D
          \
           -> E

There are several considerations and potential solutions to ensure proper data flow and handling, especially when operators like C or D perform inline updates to the data.

Data Access Guarantee: E is guaranteed to access the data egressing from B. However, in a multithreaded scheduler, careful attention must be paid to potential data race conditions.

Execution Order: The current implementation maintains the order of root nodes, but the creation of GXF connections between B -> C and B -> E is randomly determined due to the use of std::unordered_map as the graph data structure. This randomness can affect which connection is prioritized in GXF.

Potential Solutions:

a. PeriodicCondition: One approach is to use a PeriodicCondition to control the execution timing of operators. Here’s an example:

    from holoscan.conditions import CountCondition, PeriodicCondition
    from holoscan.core import Application
    from holoscan.operators import PingRxOp, PingTxOp


    class MyPingApp(Application):
        def compose(self):
            b = PingTxOp(self, CountCondition(self, 10), name="B")
            c = PingRxOp(self, PeriodicCondition(self, 20_000_000), name="C")
            e = PingRxOp(self, name="E")

            self.add_flow(b, c)
            self.add_flow(b, e)

In this example, the PeriodicCondition is used to ensure that C executes only after a specified period (20 milliseconds in this case) has elapsed. This can help control the timing of data processing between different branches.

b. Custom Operator: Developing a custom native operator could provide more control over the message flow. This operator could sequentially check for messages on each port, ensuring that frames are processed and displayed in the correct order.

c. Data Copying: To avoid issues with inline updates affecting shared data, consider implementing a mechanism to create copies of the data for each branch. This ensures that modifications in one branch don’t unintentionally affect the other.
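The data-copying option can be illustrated with ordinary Python objects. This sketch does not use Holoscan tensors: the message is a hypothetical dictionary holding a list, and process_in_place stands in for an operator such as C that updates its input inline.

```python
import copy


def process_in_place(message):
    """Stand-in for operator C performing an inline update on its input."""
    message["pixels"][0] = 255
    return message


# Shared: both branches hold the same object, so C's update is visible to E.
original = {"pixels": [0, 0, 0]}
shared_for_c, shared_for_e = original, original
process_in_place(shared_for_c)
print(shared_for_e["pixels"])

# Copied: each branch gets its own deep copy, so E keeps the data as emitted by B.
original = {"pixels": [0, 0, 0]}
copy_for_c, copy_for_e = copy.deepcopy(original), copy.deepcopy(original)
process_in_place(copy_for_c)
print(copy_for_e["pixels"])
```

Copying trades extra memory and copy time for isolation between branches, so it is most attractive when the branch data is small or when in-place updates cannot be avoided.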

Limitations and Considerations:

The GXF Gather codelet may not inherently preserve the order in which inference operations were called.

When using mock objects for testing (as mentioned in the background information), modifying the original pipeline structure might be challenging. In such cases, focusing on data copying or careful timing control might be more feasible.

Future Improvements: Updating the graph structure to use std::map instead of std::unordered_map for succ_ and pred_ could potentially provide more predictable behavior in terms of connection creation order.

Q24: I’m inquiring about the availability of a Holoscan example that demonstrates distributed processing across two networked computers. Specifically, I’m interested in a scenario where:

One computer captures a frame, potentially using an AJA capture card.

The captured frame is then transferred over a local network to a second computer.

The second computer receives and displays the frame.

Additionally, I have some questions regarding the networking aspects of such a setup:

Does Holoscan provide flexibility in selecting the transport layer for this inter-computer communication?

Is it possible to utilize WebRTC as the transport protocol in this scenario?

A24: There are two relevant approaches:

WebRTC Implementation: A reference application demonstrating WebRTC-based video streaming is available in the HoloHub repository. You can find this example at: https://github.com/nvidia-holoscan/holohub/tree/main/applications/webrtc_video_server This application showcases how WebRTC can be utilized for inter-computer communication within the Holoscan framework.

Distributed Application Approach: An alternative method involves creating a distributed application with separate fragments running on each node. For more information, please refer to the section in the User Guide on creating distributed applications.

Q25: How can I use run_async() to launch an application in a separate thread and stop the application?

A25: We can set the event state to EVENT_WAITING (request sent to an async service, pending event-done notification) and then to EVENT_DONE (event-done notification received, entity ready to be ticked) to let a specific operator wait and then resume its operation. The example calls AsynchronousCondition::event_state(AsynchronousEventState::EVENT_NEVER) to set the status of the condition to NEVER. (BooleanCondition does the same: it sets the status of the condition to NEVER when BooleanCondition::disable_tick() is called.) This means the operator does not want to be ticked again (end of execution). Once the state of the condition becomes NEVER (internally, SchedulingConditionType::NEVER), it marks the end of execution and cannot be undone.

Q26: Are there any existing applications or examples in the Holoscan ecosystem that demonstrate the ability to utilize multiple GPUs concurrently within a single application?

A26: The multi AI ultrasound application provides multi-GPU settings in a separate YAML file. Multi-GPU use is controlled through the inference parameters.

Q27: What is the role of a scheduler in Holoscan?

A27: The scheduler is responsible for determining when each operator in an application will execute.

Q28: How many types of schedulers are available in the Holoscan SDK?

A28: There are three schedulers available: Greedy Scheduler, Multi-Thread Scheduler, and Event-Based Scheduler.

Q29: Which scheduler is used by default for non-distributed applications?

A29: Non-distributed applications use the Greedy Scheduler by default.

Q30: What is the main characteristic of the Greedy Scheduler?

A30: The Greedy Scheduler has only a single worker thread that executes operators sequentially in a deterministic order.

Q31: How does the Multi-Thread Scheduler work?

A31: It’s a polling-based scheduler with a user-defined number of worker threads and a dedicated thread that polls operators at a user-defined interval.

Q32: What is unique about the Event-Based Scheduler?

A32: The Event-Based Scheduler waits for events indicating changes in operator readiness, rather than constantly polling.

Q33: How can the Event-Based Scheduler reduce CPU overhead?

A33: By eliminating the need for constant polling, it can significantly reduce CPU usage compared to the Multi-Thread Scheduler in certain scenarios.

Q34: In what situations do Multi-Thread and Event-Based Schedulers show benefits?

A34: They show benefits in scenarios with multiple operators that can run simultaneously, potentially providing significant speedup compared to the Greedy Scheduler.

Q35: How do the Multi-Thread and Event-Based Schedulers compare in terms of performance?

A35: They often have similar runtime performance, but the Event-Based Scheduler tends to have lower CPU overhead on average.

Q36: Are there scenarios where using multi-thread schedulers might not be beneficial?

A36: Yes, for linear inference pipelines or applications with minimal computation per operator, multi-thread schedulers might not provide significant benefits and could even introduce overhead.

Q37: How does the number of worker threads affect performance in multi-thread schedulers?

A37: Increasing the number of worker threads can improve performance up to a point, but it also increases CPU usage.

Q38: Is there any memory pool (allocator) that supports both host and device memory?