Holoscan Sample Applications

This section provides details on the following Holoscan sample applications:

  • Endoscopy Tool Tracking: based on an LSTM (long short-term memory) stateful model, this application demonstrates the use of custom components for tool tracking, including composition and rendering of text, tool position, and mask (as a heatmap) combined with the original video stream.

  • Hi-Speed Endoscopy: this application showcases how frames from a high-resolution camera can be captured, post-processed on the GPU, and displayed at a high frame rate.

  • Ultrasound Segmentation: this application includes a generic visualization of segmentation results from a spinal scoliosis segmentation model applied to ultrasound videos. The model used is stateless, so this workflow can be adapted to any vanilla DNN model.

  • Multi AI: this application demonstrates how to run multiple inference pipelines in a single application by leveraging the Holoscan Inference Module.

This section also demonstrates how easy it can be to customize some of these applications by bringing your own model.


Endoscopy Tool Tracking

Note

Refer to the instructions in the Holoscan NGC container overview or the GitHub source repository to run the application, depending on how you’ve chosen to install the SDK.

Digital endoscopy is a key technology for medical screenings and minimally invasive surgeries. Using real-time AI workflows to process and analyze the video signal produced by the endoscopic camera, this technology helps medical professionals with anomaly detection and measurements, image enhancements, alerts, and analytics.

app_endoscopy.png

Fig. 18 Endoscopy (laparoscopy) image from a cholecystectomy (gallbladder removal surgery) showing AI-powered frame-by-frame tool identification and tracking. Image courtesy of Research Group Camma, IHU Strasbourg and the University of Strasbourg (NGC Resource).

The Endoscopy tool tracking application provides an example of how an endoscopy data stream can be captured and processed using the C++ or Python APIs on multiple hardware platforms.

workflow_tool_tracking_replayer.png

Fig. 19 Tool tracking application workflow with replay from file

The pipeline uses a recorded endoscopy video file (generated by the convert_video_to_gxf_entities script) for input frames. Each input frame in the file is loaded by the Video Stream Replayer and passed to the following two branches:

  • In the first branch, the input frames are directly passed to Holoviz for rendering in the background.

  • In the second branch, the frames go through the Format Converter to convert the data type of the image from uint8 to float32 before it is fed to the tool tracking model (with Custom TensorRT Inference). The result is then ingested by the Tool Tracking Postprocessor which extracts the masks, points, and text from the inference output, before Holoviz renders them as overlays.

The pipeline graph also defines an optional Video Stream Recorder that can be enabled to record the original video stream to disk (record_type: 'input'), or the final render by Holoviz (record_type: 'visualizer') after going from RGBA8888 to RGB888 using a Format Converter. Recording is disabled by default (record_type: 'none') in order to maximize performance.
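
For illustration, the sketch below shows how such a replayer-driven pipeline with the optional visualizer recording branch can be composed with the Holoscan Python API, using the built-in VideoStreamReplayerOp, HolovizOp, FormatConverterOp, and VideoStreamRecorderOp operators. Paths, sizes, and parameter values are placeholders rather than the application’s actual configuration, and the custom inference branch is omitted.

    from holoscan.core import Application
    from holoscan.operators import (FormatConverterOp, HolovizOp,
                                    VideoStreamRecorderOp, VideoStreamReplayerOp)
    from holoscan.resources import UnboundedAllocator

    class ReplayRenderRecordApp(Application):
        """Illustrative sketch: replay a recorded stream, render it, record the render."""

        def compose(self):
            pool = UnboundedAllocator(self, name="pool")

            # Video Stream Replayer: loads <directory>/<basename>.gxf_entities/.gxf_index
            replayer = VideoStreamReplayerOp(
                self, name="replayer",
                directory="../data/endoscopy/video",  # placeholder path
                basename="surgical_video",            # placeholder basename
            )

            # Holoviz: renders the frames; enabling the render buffer output makes the
            # final RGBA8888 render available to the recording branch
            visualizer = HolovizOp(self, name="holoviz", width=854, height=480,
                                   enable_render_buffer_output=True, allocator=pool)

            # record_type: 'visualizer' -> convert RGBA8888 to RGB888, then write to disk
            recorder_converter = FormatConverterOp(
                self, name="recorder_format_converter", pool=pool,
                in_dtype="rgba8888", out_dtype="rgb888",
            )
            recorder = VideoStreamRecorderOp(self, name="recorder",
                                             directory="/tmp",
                                             basename="tool_tracking_render")

            self.add_flow(replayer, visualizer, {("output", "receivers")})
            self.add_flow(visualizer, recorder_converter,
                          {("render_buffer_output", "source_video")})
            self.add_flow(recorder_converter, recorder)

    if __name__ == "__main__":
        ReplayRenderRecordApp().run()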

workflow_tool_tracking_aja.png

Fig. 20 Tool tracking application workflow with input from AJA video source

The pipeline is similar to the one using the recorded video, with the exceptions below:

  • the input source is replaced with AJA Source (pixel format is RGBA8888 with a resolution of 1920x1080)

  • the Format Converter in the inference pipeline is configured to also resize the image and to convert from RGBA8888 to float32

  • the Format Converter in the recording pipeline is also used for record_type: 'input', since the AJA stream is RGBA8888 and needs to be converted to RGB888 before being recorded

For AJA cards that support Hardware Keying, the pipeline can also be configured to overlay the segmentation results on the input video on the AJA card’s FPGA instead of on the GPU. When is_aja_overlay_enabled is True, the overlay layer is sent from Holoviz back to the AJA Source operator, which handles the alpha blending and outputs the result to a port of the AJA card. The blended image is also sent back to the Holoviz operator (instead of the input video only), which renders that same image buffer.
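
The overlay path effectively adds a feedback connection between Holoviz and the AJA source. The excerpt below is a hedged sketch of what that conditional wiring can look like inside an application’s compose() method with the Python API; the operator parameters and port names shown are assumptions based on the SDK’s built-in AJASourceOp and HolovizOp, not the application’s exact source.

    from holoscan.operators import AJASourceOp, HolovizOp

    # Illustrative excerpt from Application.compose(); parameter and port names are
    # assumptions to be checked against the SDK API for your release.
    is_aja_overlay_enabled = True  # typically read from the application config

    source = AJASourceOp(self, name="aja", width=1920, height=1080, rgba=True,
                         enable_overlay=is_aja_overlay_enabled)
    visualizer = HolovizOp(self, name="holoviz", width=1920, height=1080,
                           enable_render_buffer_input=is_aja_overlay_enabled,
                           enable_render_buffer_output=is_aja_overlay_enabled)

    if is_aja_overlay_enabled:
        # The card outputs the blended frames, which Holoviz displays as-is.
        self.add_flow(source, visualizer, {("video_buffer_output", "receivers")})
        # Holoviz draws the overlay into a buffer provided by the AJA source ...
        self.add_flow(source, visualizer, {("overlay_buffer_output", "render_buffer_input")})
        # ... and returns the rendered overlay for alpha blending on the card's FPGA.
        self.add_flow(visualizer, source, {("render_buffer_output", "overlay_buffer_input")})
    else:
        # Without hardware keying, Holoviz blends the overlay layers on the GPU.
        self.add_flow(source, visualizer, {("video_buffer_output", "receivers")})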

Hi-Speed Endoscopy

Note

The Hi-Speed Endoscopy application is not included in the Holoscan container on NGC. Instead, refer to the run instructions in the GitHub source repository.

The hi-speed endoscopy application showcases how frames from a high-resolution camera can be captured, post-processed on the GPU, and displayed at a high frame rate.

This application requires:

  1. an Emergent Vision Technologies camera (see setup instructions)

  2. an NVIDIA ConnectX SmartNIC with Rivermax SDK and drivers installed

  3. a display with high refresh rate to keep up with the camera’s framerate

  4. additional setup to reduce latency

Tip

Tested on the Holoscan DevKits (ConnectX included) with:

hi_speed_endoscopy_app.png

Fig. 21 Hi-Speed Endoscopy App

Data acquisition happens through emergent-source, which is set to 4200x2160 at 240 Hz by default. The acquired frames are then demosaiced on the GPU with CUDA via bayer-demosaic and displayed through holoviz-viewer.
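
As a rough sketch (not the application’s actual source, which is written in C++), this three-operator chain could be expressed with the Python API as follows, assuming an SDK build with Emergent Vision Technologies support and that the EmergentSourceOp binding is available in that build; the memory pool sizing and other parameter values are illustrative assumptions.

    from holoscan.core import Application
    from holoscan.operators import BayerDemosaicOp, EmergentSourceOp, HolovizOp
    from holoscan.resources import BlockMemoryPool

    class HighSpeedEndoscopyApp(Application):
        """Illustrative sketch: camera capture -> CUDA demosaic -> high-refresh display."""

        def compose(self):
            # emergent-source: frame acquisition from the EVT camera (4200x2160 @ 240 Hz)
            source = EmergentSourceOp(self, name="emergent")

            # bayer-demosaic: CUDA-based demosaicing of the raw Bayer frames on the GPU
            pool = BlockMemoryPool(self, name="pool",
                                   storage_type=1,  # 1 = device (GPU) memory
                                   block_size=4200 * 2160 * 4 * 2,  # illustrative sizing
                                   num_blocks=2)
            demosaic = BayerDemosaicOp(self, name="demosaic", pool=pool)

            # holoviz-viewer: full-screen display to keep up with the camera frame rate
            viewer = HolovizOp(self, name="holoviz", fullscreen=True)

            # Single-port operators can be connected without naming ports explicitly
            self.add_flow(source, demosaic)
            self.add_flow(demosaic, viewer)

    if __name__ == "__main__":
        HighSpeedEndoscopyApp().run()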

The peak performance that can be obtained by running these applications with the recommended hardware, with GSYNC and RDMA enabled in exclusive display mode, is 10 ms on the Clara AGX DevKit and 8 ms on the NVIDIA IGX Orin DevKit ES. This is the photon-to-glass latency of a frame, from scene acquisition to display on the monitor.

Troubleshooting

  1. Problem: The application fails to find the EVT camera.

  2. Problem: The application fails to open the EVT camera.

    • Make sure that the application was run with sudo privileges.

    • Make sure a valid Rivermax license file is located at /opt/mellanox/rivermax/rivermax.lic.

  3. Problem: The application fails to connect to the EVT camera with error message “GVCP ack error”.

    • It could be an issue with the HR12 power connection to the camera. Disconnect the HR12 power connector from the camera and try reconnecting it.


Ultrasound Segmentation

Note

Refer to the instructions in the Holoscan NGC container overview or the GitHub source repository to run the application, depending on how you’ve chosen to install the SDK.

This section describes the details of the ultrasound segmentation sample application, as well as how to load a custom inference model into the application for some limited customization. Out of the box, the ultrasound segmentation application comes in two variants, “video replayer” and “AJA source”: the user can either replay a pre-recorded ultrasound video file included in the Holoscan container, or stream data from an AJA capture device directly through the GPU.

This application performs automatic segmentation of the spine using a trained AI model, for the purpose of scoliosis visualization and measurement.

This application is available in C++ and Python API variants.

app_ultrasound.png

Fig. 22 Spine segmentation of ultrasound data (NGC Resource)

workflow_segmentation_replayer.png

Fig. 23 Segmentation application workflow with replay from file

The pipeline uses a recorded ultrasound video file (generated by the convert_video_to_gxf_entities script) for input frames. Each input frame in the file is loaded by the Video Stream Replayer and passed to the following two branches:

  • In the first branch, the input frames are directly passed to Holoviz for rendering in the background.

  • In the second branch, the frames go through the Format Converter to convert the data type of the image from uint8 to float32 and resize it before it is fed into the segmentation model using TensorRT Inference. The result is then ingested by the Segmentation Postprocessor, which extracts the masks from the inference output, before Holoviz renders them as overlays (a minimal sketch follows this list).
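
A minimal sketch of this two-branch workflow with the SDK’s built-in operators is shown below. Tensor names, the model path, and sizes are illustrative assumptions, and the unified InferenceOp of recent SDK releases is used in place of the dedicated TensorRT inference operator; exact parameter shapes can vary slightly between releases.

    from holoscan.core import Application
    from holoscan.operators import (FormatConverterOp, HolovizOp, InferenceOp,
                                    SegmentationPostprocessorOp, VideoStreamReplayerOp)
    from holoscan.resources import UnboundedAllocator

    class SegmentationReplayerApp(Application):
        """Illustrative sketch of the two-branch segmentation workflow."""

        def compose(self):
            pool = UnboundedAllocator(self, name="pool")

            replayer = VideoStreamReplayerOp(
                self, name="replayer",
                directory="../data/ultrasound/video",  # placeholder path
                basename="ultrasound_256x256",         # placeholder basename
            )

            # Branch 2: uint8 -> float32 conversion and resize to the model input size
            preprocessor = FormatConverterOp(
                self, name="segmentation_preprocessor", pool=pool,
                out_dtype="float32", resize_width=256, resize_height=256,
                out_tensor_name="source_video",
            )
            inference = InferenceOp(
                self, name="segmentation_inference", allocator=pool, backend="trt",
                model_path_map={"ultrasound_seg": "model/us_unet_256x256_nhwc.onnx"},
                pre_processor_map={"ultrasound_seg": ["source_video"]},
                inference_map={"ultrasound_seg": "inference_output_tensor"},
            )
            postprocessor = SegmentationPostprocessorOp(
                self, name="segmentation_postprocessor", allocator=pool,
                in_tensor_name="inference_output_tensor", network_output_type="softmax",
            )

            # A tensors/color_lut configuration can additionally be passed to Holoviz
            # to colorize the segmentation mask overlay (omitted here).
            visualizer = HolovizOp(self, name="holoviz", width=854, height=480)

            # Branch 1: original frames rendered in the background
            self.add_flow(replayer, visualizer)
            # Branch 2: preprocess -> inference -> postprocess -> overlay in Holoviz
            self.add_flow(replayer, preprocessor)
            self.add_flow(preprocessor, inference)
            self.add_flow(inference, postprocessor)
            self.add_flow(postprocessor, visualizer)

    if __name__ == "__main__":
        SegmentationReplayerApp().run()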

workflow_segmentation_aja.png

Fig. 24 Segmentation application workflow with input from AJA video source

The pipeline is similar to the one using the recorded video, with the exceptions below:

  • the input source is replaced with AJA Source (pixel format is RGBA8888 with a resolution of 1920x1080)

  • a Format Converter is added in the inference pipeline to convert from RGBA8888 (note: the configuration of the existing Format Converter could have been updated when using AJA, instead of adding another operator to the pipeline)


Multi AI

Note

Refer to the instructions in the Holoscan NGC container overview or the GitHub source repository to run the application, depending on how you’ve chosen to install the SDK.

To run multiple inference pipelines in a single application, the Multi AI operators (inference and postprocessor) use APIs from the Holoscan Inference module to extract data, initialize and execute the inference workflow, process, and transmit data for visualization.

This application uses models and echocardiogram data from iCardio.ai. The models include:

  • a Plax chamber model, that identifies four critical linear measurements of the heart

  • a B-Mode Perspective Classifier model, that determines the confidence of each frame belonging to one of 28 known cardiac anatomical views, as defined by the guidelines of the American Society of Echocardiography

  • an Aortic Stenosis Classification model, that provides a score indicating the likelihood of the presence of aortic stenosis

multiai_icardio_app.png

Fig. 25 Multi AI sample application workflow

The pipeline uses a recorded ultrasound video file (generated by the convert_video_to_gxf_entities script) for input frames. The data is loaded by the Video Stream Replayer and forwarded to the following operators for pre-processing:

  • B-mode Perspective Preprocessor: uses the Format Converter to convert the data type of the image to float32 and resize the data to 320x240 per frame.

  • Plax Chamber Resized: uses the Format Converter to resize the input image to 320x320x3 with the RGB888 image format for visualization.

  • Plax Chamber Preprocessor: uses the Format Converter to convert the data type of the image to float32 and resize the data to 320x320 per frame.

  • Aortic Stenosis Preprocessor: uses the Format Converter to convert the data type of the image to float32 and resize the data to 300x300 per frame.

Then:

  • the Multi AI Inference operator uses the outputs of the three preprocessors to execute the inference (see the wiring sketch after this list).

  • the Multi AI Postprocessor uses the inference output to process it as per the specifications.

  • the Visualizer iCardio extension is used to generate visualization components for the plax chamber output.

  • the visualization components are finally fed into Holoviz to generate the visualization.
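
For illustration, the fan-in wiring that makes this a multi-AI pipeline can be sketched as below. This is an excerpt that assumes the replayer, the four Format Converter pre-processing operators, and the inference and postprocessor operators have already been created in compose(); variable and port names are assumptions rather than the application’s exact identifiers.

    # Illustrative excerpt from Application.compose(); variable names are placeholders.

    # The replayer output is broadcast to every pre-processing branch.
    self.add_flow(replayer, bmode_preprocessor)
    self.add_flow(replayer, plax_resizer)        # 320x320 RGB888 branch for visualization
    self.add_flow(replayer, plax_preprocessor)
    self.add_flow(replayer, aortic_preprocessor)

    # All preprocessed tensors converge on the single inference operator, which runs
    # the three models and emits their outputs for post-processing and visualization.
    self.add_flow(bmode_preprocessor, multi_ai_inference, {("tensor", "receivers")})
    self.add_flow(plax_preprocessor, multi_ai_inference, {("tensor", "receivers")})
    self.add_flow(aortic_preprocessor, multi_ai_inference, {("tensor", "receivers")})
    self.add_flow(multi_ai_inference, multi_ai_postprocessor)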

The sample application output demonstrates the five keypoints identified by the Plax Chamber model. The keypoints are connected in the output frame as shown in the image below.

multiai_icardio_output.png

Fig. 26 Multi AI sample application. Data courtesy of iCardio.ai (NGC Resource)

The Aortic Stenosis and B-mode Perspective models are classification models. Classification results can be printed by using the print keyword against the output tensors from the classification models in the Multi AI Postprocessor settings. Printing the results is optional and can be skipped by removing the relevant entries from the postprocessor settings.

Note

The Holoscan SDK provides the capability to process all models in ONNX, TensorRT FP32, and TensorRT FP16 formats. The classification models (Aortic Stenosis and B-mode Perspective) do not support TensorRT FP16 conversion. The Plax Chamber model is supported in all available formats (ONNX, TensorRT FP32, and TensorRT FP16).


Bring Your Own Model

This section shows how you can easily swap in your own model by modifying one of the ultrasound segmentation applications above, even for a different modality.

Prerequisites

Input video

The sample applications and the examples below demonstrate how to load input data from pre-recorded videos. At this time, video files need to be converted into a GXF replayable tensor format to be used as stream inputs. This step has already been done for the sample applications. To do so for your own video data, we provide a utility script on GitHub named convert_video_to_gxf_entities.py. This script should yield two files in the .gxf_index and .gxf_entities formats, which can be used as inputs with Holoscan, as shown below.
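
Once converted, the two generated files are referenced by the Video Stream Replayer through the directory that contains them and their common basename. A minimal sketch, with paths and names as placeholders:

    from holoscan.operators import VideoStreamReplayerOp

    # Inside Application.compose(): point the replayer at the directory holding
    # my_video.gxf_entities and my_video.gxf_index (placeholder names).
    replayer = VideoStreamReplayerOp(
        self, name="replayer",
        directory="/path/to/converted/data",
        basename="my_video",
        frame_rate=0,   # 0 replays using the timestamps recorded in the stream
        repeat=True,    # loop when the end of the stream is reached
    )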

Note

We plan to add native support for generic video codecs in future releases.

Input model

Currently, the inference operators in Holoscan are able to load ONNX models, or TensorRT engine files built for the GPU architecture on which you will be running the model. TensorRT engines are automatically generated from ONNX by the operators when the applications run.

If you are converting your model from PyTorch to ONNX, chances are your input is in NCHW format and will need to be converted to NHWC. We provide an example transformation script on GitHub named graph_surgeon.py. You may need to adjust the dimensions in the script as needed for your model.

Tip

To get a better understanding of your model, and to determine whether this step is necessary, websites such as netron.app can be used.
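
As a programmatic alternative, a few lines with the onnx Python package can print the input and output tensor names and shapes, which is enough to tell whether the input is NCHW or NHWC (the model path below is a placeholder):

    import onnx

    model = onnx.load("my_model.onnx")  # placeholder path

    # Print each graph input/output with its shape, e.g. [1, 3, 256, 256] (NCHW)
    # versus [1, 256, 256, 3] (NHWC).
    for tensor in list(model.graph.input) + list(model.graph.output):
        dims = [d.dim_value or d.dim_param for d in tensor.type.tensor_type.shape.dim]
        print(tensor.name, dims)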

Customizing the Ultrasound Segmentation Application For Your Model

In this use case we will use the ultrasound segmentation application to implement a polyp segmentation model to run on a colonoscopy sample video. The colonoscopy model and data are available from the Colonoscopy Sample Application Data NGC Resource, and are included in the NGC container or when running from source.

Tip

The full patch from ultrasound to colonoscopy segmentation, along with instructions to apply it, is available on GitHub under examples/bring_your_own_model.

We now need to modify both the Python and YAML files to change our segmentation application from ultrasound to colonoscopy.

  1. First, let’s change the following lines in apps/ultrasound_segmentation/python/ultrasound_segmentation.py to point to the colonoscopy files (models and video):

    - model_path = os.path.join(sample_data_path, "ultrasound", "model")
    + model_path = os.path.join(sample_data_path, "colonoscopy", "model")

    - model_file_path = os.path.join(model_path, "us_unet_256x256_nhwc.onnx")
    - engine_cache_dir = os.path.join(model_path, "us_unet_256x256_nhwc_engines")
    + model_file_path = os.path.join(model_path, "colon.onnx")
    + engine_cache_dir = os.path.join(model_path, "engine")

    - video_dir = os.path.join(sample_data_path, "ultrasound", "video")
    + video_dir = os.path.join(sample_data_path, "colonoscopy", "video")

    This change is to allocate enough memory to preprocess our input video data which has a different size:

    - width_preprocessor = 1264
    - height_preprocessor = 1080
    + width_preprocessor = 1350
    + height_preprocessor = 1072

    This change is to allocate enough memory for our inference operator, given the difference in model input size (see the next diff in the YAML file):

    - width_inference = 256
    - height_inference = 256
    + width_inference = 512
    + height_inference = 512

  2. Similarly, we will edit apps/ultrasound_segmentation/python/ultrasound_segmentation.yaml to update some other parameters. Video file name:

    - # directory: "../data/ultrasound/video"
    - basename: "ultrasound_256x256"
    + # directory: "../data/colonoscopy/video"
    + basename: "colonoscopy"

    Resizing is necessary to adjust to the model’s expected input size:

    - resize_width: 256
    - resize_height: 256
    + resize_width: 512
    + resize_height: 512

    Name of the output tensor layer in this new model:

      output_binding_names:
    -   - OUTPUT__0
    +   - output_old

    Activation function at the end of the model:

    - network_output_type: softmax
    + network_output_type: sigmoid

  3. Now we may run the application with the new model and data.

    python apps/ultrasound_segmentation/python/ultrasound_segmentation.py

colonoscopy_sample.jpg

Fig. 27 Colonoscopy polyps segmentation (NGC Resource)

Customizing the Multi AI Application For Your Model

This section shows how to create a multi AI application using the Holoscan Inference Module. It only lists the steps to create the inference component of the pipeline; pre-processing and post-processing steps are not covered here.

  1. Use the multiai_inference dictionary at apps/multiai/cpp/app_config.yaml as a template.

  2. Populate the elements in the dictionary based on your own set of models (a populated sketch follows this list):

    • Update the model_path_map dictionary keys with unique strings that refer to each model, and set the values to the paths of the model files on disk. All models must be either in ONNX or in TensorRT engine file format. The Holoscan Inference Module will perform the ONNX to TensorRT model conversion if the TensorRT engine files do not exist.

    • The pre_processor_map dictionary should contain the same keys as model_path_map, mapping each to a list containing the input tensor for that model (only one input tensor per model is supported). Multiple models can use the same tensor name as their input.

    • The inference_map dictionary should contain the same keys as model_path_map, mapping to the output tensor name for each model.

    • The in_tensor_names list should contain all the input tensor names.

    • The out_tensor_names list should contain all the output tensor names.

    • Update the backend as desired: if the input models are in TensorRT engine file format, select trt as the backend; if the input models are in ONNX format, select either trt or onnx as the backend.

  3. Other parameters and the limitations are described in the Parameters and related features section of the Holoscan Inference Module.

  4. Once the inference component is created or updated, you can execute the Multi AI application as described above.
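
For reference, the same keys can also be populated programmatically in a Python application. The sketch below passes them to the inference operator (shown here with the unified InferenceOp of recent SDK releases), with model names, tensor names, and paths as placeholders for your own models; exact parameter names and value shapes may vary slightly between SDK releases.

    from holoscan.operators import InferenceOp
    from holoscan.resources import UnboundedAllocator

    # Inside Application.compose(); placeholders mirror the keys described above.
    inference = InferenceOp(
        self, name="multi_ai_inference",
        allocator=UnboundedAllocator(self, name="pool"),
        backend="trt",  # "trt" for ONNX or TensorRT engine inputs, "onnx" for ONNX inputs
        model_path_map={
            "model_a": "/path/to/model_a.onnx",
            "model_b": "/path/to/model_b.onnx",
        },
        pre_processor_map={        # one input tensor per model
            "model_a": ["model_a_preprocessed"],
            "model_b": ["model_b_preprocessed"],
        },
        inference_map={            # output tensor produced for each model
            "model_a": "model_a_inferred",
            "model_b": "model_b_inferred",
        },
        in_tensor_names=["model_a_preprocessed", "model_b_preprocessed"],
        out_tensor_names=["model_a_inferred", "model_b_inferred"],
    )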

Common Pitfalls Deploying New Models

Color Channel Order

It is important to know what channel order your model expects. This may be indicated by the training data, pre-training transformations performed at training, or the expected inference format used in your application.

For example, if your inference data is RGB but your model expects BGR, you will need to add the following to your segmentation_preprocessor in the YAML file: out_channel_order: [2,1,0].

Normalizing Your Data

Similarly, the default scaling for streaming data is [0,1], but depending on how your model was trained, it may expect data in the [0,255] range.

For the above case, you would add the following to your segmentation_preprocessor in the YAML file:

scale_min: 0.0
scale_max: 255.0

Network Output Type

Models often have different output types, such as sigmoid, softmax, or something else, and you may need to examine the last few layers of your model to determine which applies to your case.

As in the colonoscopy example above, we added the following to our YAML file: network_output_type: sigmoid
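
In a Python application, the same settings can also be passed directly as operator parameters. The excerpt below collects the three pitfalls above into one place, with values taken from the examples in this section rather than defaults; it assumes the built-in FormatConverterOp and SegmentationPostprocessorOp operators are used for pre- and post-processing.

    from holoscan.operators import FormatConverterOp, SegmentationPostprocessorOp
    from holoscan.resources import UnboundedAllocator

    # Illustrative excerpt from Application.compose().
    pool = UnboundedAllocator(self, name="pool")

    preprocessor = FormatConverterOp(
        self, name="segmentation_preprocessor", pool=pool,
        out_dtype="float32",
        out_channel_order=[2, 1, 0],     # reorder RGB input to the BGR the model expects
        scale_min=0.0, scale_max=255.0,  # keep a [0, 255] range instead of the default [0, 1]
    )

    postprocessor = SegmentationPostprocessorOp(
        self, name="segmentation_postprocessor", allocator=pool,
        network_output_type="sigmoid",   # match the activation at the end of the model
    )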

© Copyright 2022, NVIDIA. Last updated on Mar 20, 2023.