Holoscan accelerates streaming AI applications by leveraging both hardware and software. The Holoscan SDK relies on multiple core technologies to achieve low latency and high throughput:
The Holoscan Developer Kits can be used with the NVIDIA Rivermax SDK to provide an extremely efficient network connection through the onboard ConnectX network adapter, which is further optimized for GPU workloads using GPUDirect RDMA. This technology avoids unnecessary memory copies and CPU overhead by transferring data directly to or from pinned GPU memory, and it supports both the integrated and the discrete GPU.
NVIDIA is also committed to helping hardware vendors enable RDMA within their own drivers; one example is provided by AJA Video Systems as part of a partnership with NVIDIA on the Holoscan SDK. The AJASource extension is an example of how the SDK can leverage RDMA.
For more information about GPUDirect RDMA, see the following:
Minimal GPUDirect RDMA Demonstration source code, which provides a real hardware example of using RDMA and includes both kernel drivers and userspace applications for the RHS Research PicoEVB and HiTech Global HTG-K800 FPGA boards.
The Graph Execution Framework (GXF) is a core component of the Holoscan SDK that provides features to execute pipelines of various independent tasks with high performance by minimizing or removing the need to copy data across each block of work, and providing ways to optimize memory allocation.
GXF will be mentioned in many places across this user guide, including a dedicated section which provides more details.
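The core idea behind GXF's performance, passing data between independent blocks of work without copying it, can be illustrated with a minimal Python sketch. All names here (`Buffer`, `gain_stage`, `offset_stage`) are hypothetical and for illustration only; GXF's actual C++ API is different.

```python
# Sketch of a zero-copy pipeline: each stage receives a reference to the
# same buffer and transforms it in place, so no data is copied between
# blocks of work. (Illustrative only -- not the GXF API.)

class Buffer:
    """A shared data block passed by reference between pipeline stages."""
    def __init__(self, data):
        self.data = data

def gain_stage(buf, factor=2):
    buf.data = [x * factor for x in buf.data]  # transform in place
    return buf  # same object flows downstream, no copy

def offset_stage(buf, offset=1):
    buf.data = [x + offset for x in buf.data]
    return buf

def run_pipeline(buf, stages):
    for stage in stages:
        buf = stage(buf)  # only a reference moves between stages
    return buf

frame = Buffer([1, 2, 3])
result = run_pipeline(frame, [gain_stage, offset_stage])
print(result is frame, result.data)  # True [3, 5, 7]
```

The key property is that `result is frame`: every stage operated on the same underlying memory, which is the behavior GXF aims for across its entities, combined with pooled memory allocation.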
NVIDIA TensorRT is a deep learning inference framework built on CUDA that provides highly optimized inference on NVIDIA GPUs, including the Holoscan Developer Kits.
GXF comes with a TensorRT base extension which is extended in the Holoscan SDK: the updated TensorRT extension is able to selectively load a cached TensorRT model based on the system GPU specifications.
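Because serialized TensorRT engines are generally not portable across GPU architectures, a cache of prebuilt engines is typically keyed on device properties. The following is a hedged sketch of that selection logic, assuming a hypothetical cache keyed on device name and compute capability; the function and file names are illustrative, not the extension's actual API.

```python
# Hypothetical sketch: pick a cached engine file matching the current GPU,
# or return None to signal that an engine must be (re)built and cached.

def engine_cache_key(device_name, compute_capability):
    """Build a cache key, e.g. 'NVIDIA RTX A6000' + (8, 6) -> 'nvidia_rtx_a6000_sm86'."""
    name = device_name.lower().replace(" ", "_")
    major, minor = compute_capability
    return f"{name}_sm{major}{minor}"

def select_cached_engine(cache, device_name, compute_capability):
    """Return the cached engine path for this GPU, or None to trigger a rebuild."""
    return cache.get(engine_cache_key(device_name, compute_capability))

cache = {
    "nvidia_rtx_a6000_sm86": "model.a6000.engine",
    "orin_sm87": "model.orin.engine",
}
print(select_cached_engine(cache, "NVIDIA RTX A6000", (8, 6)))  # model.a6000.engine
print(select_cached_engine(cache, "Unknown GPU", (7, 5)))       # None
```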
Similarly, the new inference module leverages TensorRT and provides the ability to execute multiple inferences in parallel as demonstrated in the MultiAI sample application.
The TensorRT extension will be deprecated in favor of operators leveraging the new inference module in a future release.
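The parallel-inference pattern described above can be sketched in plain Python: several independent models each process the same input frame concurrently. The models here are stand-in callables and the names are hypothetical; the inference module itself dispatches TensorRT engines.

```python
# Sketch of executing multiple independent inferences in parallel on one
# input frame, as in the MultiAI sample. Stand-in callables replace real
# TensorRT engines for illustration.
from concurrent.futures import ThreadPoolExecutor

def make_model(scale):
    """Create a toy 'model' that scales its input."""
    def infer(x):
        return [v * scale for v in x]
    return infer

models = {"segmentation": make_model(2), "classification": make_model(3)}
frame = [1, 2]

with ThreadPoolExecutor(max_workers=len(models)) as pool:
    futures = {name: pool.submit(m, frame) for name, m in models.items()}
    results = {name: f.result() for name, f in futures.items()}

print(results)  # {'segmentation': [2, 4], 'classification': [3, 6]}
```

In the real module, the parallel units are GPU inference jobs rather than Python threads, but the fan-out/fan-in structure is the same.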
OpenGL and Vulkan are commonly used for real-time visualization and, like CUDA, execute on the GPU. This creates an opportunity for efficient sharing of resources between CUDA and those rendering frameworks.
The OpenGL and Segmentation Visualizer extensions use the OpenGL interoperability functions provided by the CUDA runtime API.
The Holoviz module uses the external resource interoperability functions of the low-level CUDA driver API, along with the Vulkan external memory and external semaphore extensions.
The OpenGL extension will be deprecated in favor of Vulkan/Holoviz in a future release.
Streaming image processing often requires common 2D operations such as resizing, bit-width conversion, and color format conversion. NVIDIA's CUDA-accelerated NVIDIA Performance Primitives (NPP) library can help with many of these common transformations. NPP is extensively showcased in the Format Converter extension of the Holoscan SDK.
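To make two of these transformations concrete, here is a plain-Python CPU illustration of a bit-width conversion (16-bit to 8-bit samples) and an RGB-to-grayscale color conversion using the standard BT.601 luma weights. NPP performs the equivalent operations on GPU device buffers; the function names below are illustrative, not NPP's API.

```python
# CPU illustration of two transforms NPP accelerates on the GPU:
# bit-width conversion (16-bit -> 8-bit) and RGB -> grayscale.

def convert_16u_to_8u(pixels):
    """Scale 16-bit samples down to 8 bits by dropping the low byte."""
    return [p >> 8 for p in pixels]

def rgb_to_gray(pixels):
    """ITU-R BT.601 luma: Y = 0.299 R + 0.587 G + 0.114 B."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

print(convert_16u_to_8u([0, 256, 65535]))  # [0, 1, 255]
print(rgb_to_gray([(255, 0, 0), (0, 255, 0), (255, 255, 255)]))  # [76, 150, 255]
```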