9.4. Clara Deploy Base Inference Operator V2

This asset requires the Clara Deploy SDK. Follow the instructions on the Clara Bootstrap page to install the Clara Deploy SDK.

9.4.1. Overview

The NVIDIA Clara Train SDK and MMAR provide pre-trained models unique to medical imaging. They also offer additional capabilities, such as integration with the AI-assisted Annotation SDK, which speeds up the annotation of medical images. This provides access to AI-assisted labeling [Reference].

To accelerate the deployment of Clara Train pre-trained models using Clara Deploy SDK, this containerized AI inference application was developed as a base container, which can be customized for deploying a specific pre-trained model. The customized container can then be used as the AI inference operator in Clara Deploy pipelines.

Customizing this base container requires the inference or validation configuration file used during model training with Clara Train. In addition, the trained model must have been exported in a format compatible with Triton Inference Server (formerly TRTIS, the TensorRT Inference Server). The following sections describe how to create model-specific containers.

This base inference application uses the same set of transform functions and the same scanning window inference logic as Clara Train SDK 3.0. The output writer, however, is specific to Clara Deploy due to the need to support registration of Clara Deploy pipeline results.

9.4.1.1. Version information

This base inference application is targeted to run in the following environment:

Ubuntu 18.04
Python 3.6
NVIDIA Triton Inference Server (formerly TensorRT Inference Server) Release 1.15.0, supporting the V1 interface, container version 20.07-v1-py3

9.4.2. Inputs

This application runs as a Docker container. It expects an input folder (/input by default), which can be mapped to a folder on the host volume when the Docker container is started. This folder should contain a volume image file in NIfTI or MetaImage format. The volume image should be constructed from a single series of a DICOM study, typically the axial series with the image type ORIGINAL PRIMARY.
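Before starting the container, you may want to confirm that the input folder holds what the application expects. The following minimal sketch is not part of the base container. The helper name check_input_dir is hypothetical. The sketch assumes that exactly one volume file with a .nii, .nii.gz, or .mhd extension should be present. A .mhd header normally sits next to its .raw data file, and the check ignores the .raw file.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of the base container): verify that
# an input folder holds exactly one volume image in NIfTI or MetaImage format.
check_input_dir() {
    local dir="${1:-./input}"
    local files=()
    local f
    # nullglob makes unmatched patterns expand to nothing instead of themselves.
    # Note: this turns nullglob off again afterwards, even if the caller had it on.
    shopt -s nullglob
    for f in "$dir"/*.nii "$dir"/*.nii.gz "$dir"/*.mhd; do
        files+=("$f")
    done
    shopt -u nullglob
    if [ "${#files[@]}" -ne 1 ]; then
        echo "Expected exactly one .nii, .nii.gz or .mhd file in $dir, found ${#files[@]}" >&2
        return 1
    fi
    # Print just the file name of the volume that will be processed.
    basename "${files[0]}"
}
```

For example, `check_input_dir ./input` prints `image.mhd` when the folder contains only image.mhd and image.raw.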

9.4.3. Outputs

This application saves the segmentation results to an output folder (/output by default), which can also be mapped to a folder on the host volume. After the application completes successfully, a segmentation volume image in MetaImage format is saved in the output folder.

The name of the output file is the same as that of the input file, due to certain limitations of the downstream operator in Clara Deploy SDK.

This container also publishes data for the Clara Deploy Render Server in the /publish folder by default. The original volume image and the segmented volume image are saved in this folder, along with configuration files for the Render Server.

9.4.4. AI Model

For testing, this base application uses a model trained using the NVIDIA Clara Train SDK V3.0 for lung segmentation, namely segmentation_ct_lung_v1. It is converted from a TensorFlow Checkpoint model to tensorflow_graphdef using the Clara Train SDK model export tool. The input tensor is of the shape 320 x 320 x 64 with a single channel, and the output is of the same shape with two channels.

The key model attributes (e.g. the model name) must be present in the config_inference.json file. This application reads them at runtime.

9.4.4.1. NVIDIA Triton Inference Server

This application performs inference on the NVIDIA Triton Inference Server (formerly TRTIS), which provides an inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. More information is available in the Triton Inference Server documentation.

9.4.5. Directory Structure

The application source code files are in the directory structure shown below.

```text
/
└── app_base_inference_v2
    ├── ai4med
    ├── config
    │   ├── config_render.json
    │   ├── config_inference.json
    │   └── __init__.py
    ├── dlmed
    ├── inferers
    ├── model_loaders
    ├── ngc
    ├── public
    ├── utils
    ├── writers
    ├── app.py
    ├── Dockerfile
    ├── executor.py
    ├── logging_config.json
    ├── main.py
    └── requirements.txt
```

The following describes the directory contents:

The ai4med and dlmed directories contain the library modules shared with Clara Train SDK, mainly for its transform functions and base inference client classes.
The config directory contains model-specific configuration files, which are needed when building a customized container for a specific model.
- The config_inference.json file contains the configuration sections for pre- and post-transforms, as well as the model loader, inferer, and writer.
- The config_render.json file contains the configuration for the Clara Deploy Render Server.
The inferers directory contains the implementation of the simple and scanning window inference clients using the Triton API client library.
The model_loaders directory contains the implementation of the model loader that gets model details from Triton Inference Server.
The ngc and public directories contain the user documentation.
The utils directory contains utilities for loading modules and creating application objects.
The writers directory contains the specialized output writer required by Clara Deploy SDK, which saves the segmentation result to a volume image file in MetaImage format.

The model name must be correctly specified in the inferer property in the config_inference.json file, as shown in the following example:

```json
"inferer":
{
    "name": "TRTISScanWindowInferer",
    "args": {
        "model_name": "segmentation_ct_lung_v1",
        "ip": "localhost",
        "port": 8000,
        "protocol": "HTTP",
        "output_type": "RAW"
    }
}
```
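The application reads the model name from config_inference.json at runtime. A quick check that the file names the expected model can therefore catch mismatches early. This is a minimal sketch that assumes the "model_name" key and its value sit on one line, as in the example above. The helper get_model_name is hypothetical, and a JSON-aware tool such as jq would be more robust if it is installed.

```bash
#!/bin/bash
# Hypothetical helper: print the "model_name" value from config_inference.json.
# Uses only grep/sed; assumes key and string value share one line and
# returns the first match if the key appears more than once.
get_model_name() {
    local config="${1:-./app_base_inference_v2/config/config_inference.json}"
    grep -o '"model_name"[[:space:]]*:[[:space:]]*"[^"]*"' "$config" \
        | head -n 1 \
        | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}
```

For example, comparing `get_model_name my_config.json` against the name of the model folder served by Triton shows whether the two agree before the container is built.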

9.4.6. Executing Operator as Docker Container

9.4.6.1. Prerequisites

Use the docker images command to check that the Triton Docker image has been imported into the local Docker repository. Look for the image name tritonserver with the correct tag for the release, e.g. 20.07-v1-py3. If the image is not present locally, it can be pulled from NGC (nvcr.io/nvidia/tritonserver:20.07-v1-py3).
Ensure that the model folder, including the config.pbtxt, is present on the Clara Deploy host. Verify it using the following steps:
- Log on to the Clara Deploy host.
- Check for the folder segmentation_ct_lung_v1 under the directory /clara/common/models or /clara/repository/models.
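The folder check above can be scripted. In the sketch below, find_model_dir is a hypothetical helper and not part of the SDK. It prints the first of the given model roots that contains <model>/config.pbtxt. If no roots are given, it defaults to the two host directories listed above.

```bash
#!/bin/bash
# Hypothetical check: find a Triton model folder (with config.pbtxt) under
# one of the Clara Deploy host model directories.
# Usage: find_model_dir <model_name> [root ...]
find_model_dir() {
    local model="$1"
    shift
    local roots=("$@")
    # Fall back to the locations named in the prerequisites.
    if [ "${#roots[@]}" -eq 0 ]; then
        roots=(/clara/common/models /clara/repository/models)
    fi
    local root
    for root in "${roots[@]}"; do
        if [ -f "$root/$model/config.pbtxt" ]; then
            echo "$root/$model"
            return 0
        fi
    done
    echo "Model folder $model with config.pbtxt not found" >&2
    return 1
}
```

On the Clara Deploy host, `find_model_dir segmentation_ct_lung_v1` prints the model folder path if either default location holds it.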

9.4.6.2. Step 1

Change to your working directory (e.g. test).

9.4.6.3. Step 2

Create, if they do not exist, the following directories under your working directory:

input containing the input image file.
output for the AI inference output.
publish for publishing data for the Render Server.
logs for the log files.
models for the models; copy the segmentation_ct_lung_v1 folder into it.
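The step above can be sketched as follows. This assumes that the model folder has already been located on the host, for example under /clara/common/models. The helper setup_workdir and its arguments are hypothetical, and you still need to copy the input image into input separately.

```bash
#!/bin/bash
# Hypothetical sketch of Step 2: create the working folders and copy the model.
# Usage: setup_workdir <working_dir> <path_to_model_folder>
setup_workdir() {
    local workdir="$1"
    local model_src="$2"   # e.g. /clara/common/models/segmentation_ct_lung_v1
    # Brace expansion creates all five folders (and workdir itself) at once.
    mkdir -p "$workdir"/{input,output,publish,logs,models} || return 1
    # Copies the whole model folder, giving models/<model_name>/config.pbtxt etc.
    cp -r "$model_src" "$workdir/models/" || return 1
}
```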

9.4.6.4. Step 3

Note: If this base inference application container has already been pulled from NGC, tag the container:

```bash
docker tag <pulled base container> app_base_inference_v2:latest
```

In your working directory, create a shell script (e.g. run_base_docker.sh) with the content below.

```bash
#!/bin/bash

# Copyright (c) 2019, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.

# Clara Platform server would launch the container with the following environment variables internally,
# to provide runtime information.
export NVIDIA_CLARA_TRTISURI="localhost:8000"

# Container name; add the version tag as needed if not retagged.
APP_NAME="app_base_inference_v2"

# Name of the model used in this app.
MODEL_NAME="segmentation_ct_lung_v1"

# Specific version of the Triton Inference Server image used in testing
TRITON_IMAGE="nvcr.io/nvidia/tritonserver:20.07-v1-py3"

# Docker network used by the app and Triton Docker container.
NETWORK_NAME="container-demo"

# Create network
docker network create ${NETWORK_NAME}

# Run Triton (name: triton), mapping ./models/${MODEL_NAME} to /models/${MODEL_NAME}
# (localhost:8000 will be used)
RUN_TRITON="nvidia-docker run --name triton --network ${NETWORK_NAME} -d --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-p 8000:8000 \
-v $(pwd)/models/${MODEL_NAME}:/models/${MODEL_NAME} ${TRITON_IMAGE} \
tritonserver --model-repository=/models"

# Display the command
echo ${RUN_TRITON}
# Run the command to start the inference server Docker
eval ${RUN_TRITON}

# Wait until Triton is ready
triton_local_uri=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' triton)
echo -n "Wait until Triton ${triton_local_uri} is ready..."
while [ $(curl -s ${triton_local_uri}:8000/api/status | grep -c SERVER_READY) -eq 0 ]; do
    sleep 1
    echo -n "."
done
echo "done"

export NVIDIA_CLARA_TRTISURI="${triton_local_uri}:8000"

# Run ${APP_NAME} container.
# Like below, Clara Core would launch the app container with the following environment variables internally,
# to provide input/output path information.
# (They are subject to change. Do not use the environment variables directly in your application!)
docker run --name ${APP_NAME} --network ${NETWORK_NAME} -t --rm \
    -v $(pwd)/input:/input \
    -v $(pwd)/output:/output \
    -v $(pwd)/logs:/logs \
    -v $(pwd)/publish:/publish \
    -e NVIDIA_CLARA_TRTISURI \
    -e DEBUG_VSCODE \
    -e DEBUG_VSCODE_PORT \
    -e NVIDIA_CLARA_NOSYNCLOCK=TRUE \
    ${APP_NAME}

echo "${APP_NAME} has finished."

# Stop Triton container
echo "Stopping Triton"
docker stop triton > /dev/null

# Remove network
docker network remove ${NETWORK_NAME} > /dev/null
```
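The readiness loop in the script above polls indefinitely if Triton never becomes ready, for example if the model folder is missing. A bounded variant is sketched below. The function wait_for_triton and its timeout argument are hypothetical. The /api/status endpoint and the SERVER_READY marker are the ones the script already uses.

```bash
#!/bin/bash
# Hypothetical bounded version of the script's readiness loop.
# Usage: wait_for_triton <host:port> [timeout_seconds]
wait_for_triton() {
    local uri="$1"            # e.g. "${triton_local_uri}:8000"
    local timeout="${2:-60}"  # give up after this many seconds
    local waited=0
    # Same V1 status endpoint and readiness marker as in run_base_docker.sh.
    until curl -s "http://${uri}/api/status" | grep -q SERVER_READY; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Triton at ${uri} not ready after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script, you could replace the while loop with a call such as `wait_for_triton "${triton_local_uri}:8000" 120`. If the call fails, stop the triton container and remove the network before exiting.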

9.4.6.5. Step 4

Make the script executable, then run it:

```bash
chmod +x run_base_docker.sh
./run_base_docker.sh
```

Wait for the application container to finish.

9.4.6.6. Step 5

Check for the following output files.

In the output folder (whose contents will be consumed by DICOM object writers in a Clara Deploy pipeline):

The segmentation image file in MetaImage format, consisting of a .mhd header file and a .raw data file. The file name may appear to be the same as that of the input.

In the publish folder (whose contents will be registered for Render Server in a Clara Deploy pipeline):

The original volume image, in either MHD or NIfTI format (e.g. image.mhd and image.raw)
The segmentation volume image (e.g. image.out.mhd and image.out.raw)
The rendering configuration file (config_render.json)
A metadata file describing the other files (config.meta)
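You can script these checks. The sketch below is hypothetical and assumes the file names from the examples above. An input named image.mhd yields output/image.mhd and output/image.raw, plus publish/image.out.mhd and publish/image.out.raw. The original image in publish may be either MHD or NIfTI, so the sketch does not check it.

```bash
#!/bin/bash
# Hypothetical post-run check for Step 5.
# Usage: check_results <working_dir> <base_name>   (e.g. base_name=image)
check_results() {
    local workdir="$1"
    local base="$2"
    local missing=0
    local f
    for f in "output/${base}.mhd" "output/${base}.raw" \
             "publish/${base}.out.mhd" "publish/${base}.out.raw" \
             "publish/config_render.json" "publish/config.meta"; do
        if [ ! -f "$workdir/$f" ]; then
            echo "Missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeeds only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```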

9.4.6.7. Step 6

To visualize the segmentation results, use any tool that supports MetaImage (MHD) or NIfTI, e.g. 3D Slicer.

9.4.7. Executing Operator Docker Container Interactively

Follow these steps to see the internals of the container and to run the application manually. The Triton server with the required model must be accessible from within this container; otherwise, the application fails.

Run the container with the required environment variables and volume mappings, as described in the previous section. To start the container, replace the docker run command with the following, and keep the other options and the image name from the script:

```bash
docker run -it --rm --entrypoint /bin/bash
```

Once in the Docker terminal, ensure that the current directory is /.
Execute the following command:

```bash
python3 ./app_base_inference_v2/main.py
```

Once finished, type exit.

9.4.8. Creating Model Specific Application

This section describes how to use the base application container to build a model-specific container to deploy Clara pre-trained models.

9.4.8.1. Prerequisites

First, prepare data files using Clara Train SDK:

With the export tool, export the trained model to a platform compatible with Triton (formerly TRTIS), e.g. tensorflow_graphdef. The server-side configuration file, config.pbtxt, must also be generated. For details, refer to the Triton and Clara Train SDK documentation.
The validation or inference configuration file used during training must be available.
A test volume image in NIfTI or MetaImage format should be available for testing the container directly.
A test dataset of DICOM studies should be available for testing the Clara Deploy pipeline that uses the customized application as its inference operator.

9.4.8.2. Steps

9.4.8.2.1. Step 1

Pull the base application container into the local Docker registry, if not already present.

9.4.8.2.2. Step 2

Create a folder, e.g. my_custom_app, with the structure shown below:

```text
my_custom_app
├── config
│   ├── config_inference.json
│   └── config_render.json
└── Dockerfile
```

Here, config_render.json contains the transfer functions for rendering. You can copy config_inference.json from the validation configuration file used during training; you will modify it in the next step.

9.4.8.2.3. Step 3

For a model-specific inference operator, the following are the necessary top-level configuration properties in the JSON file config_inference.json:

batch_size: always 1.
pre_transforms: corresponds to the pre_transforms in the validation configuration used during training with Clara Train SDK. If the transforms include loading an image file, ensure that a ScaleBySpacing or ScaleByResolution transform function is present. It scales the input image pixel spacings to those required by the model.
post_transforms: corresponds to the post_transforms in the validation configuration used during training with Clara Train SDK. If the pre_transforms load the image and scale the pixel spacing, ensure that the RestoreOriginalShape and CopyProperties transform functions are present.
writers: only the writer property for the model is needed; currently, only the output data type is used.
inferer: specifies Triton-specific properties; see below for details.
model_loader: included for future use; its properties are currently not used.
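You can write a rough pre-build check for these properties with grep. This is only a sketch. The helper check_config_keys is hypothetical, and it cannot tell top-level keys from nested ones. It also assumes that batch_size is written as a plain number.

```bash
#!/bin/bash
# Hypothetical pre-build check for config_inference.json using grep only.
# It catches properties that are missing altogether; it does not validate
# JSON structure or distinguish top-level keys from nested keys.
check_config_keys() {
    local config="$1"
    local missing=0
    local key
    for key in batch_size pre_transforms post_transforms writers inferer model_loader; do
        if ! grep -q "\"${key}\"[[:space:]]*:" "$config"; then
            echo "Missing property: ${key}" >&2
            missing=$((missing + 1))
        fi
    done
    # batch_size is always 1 for this operator (assumes a plain numeric value).
    if ! grep -Eq '"batch_size"[[:space:]]*:[[:space:]]*1([^0-9]|$)' "$config"; then
        echo "batch_size should be 1" >&2
        missing=$((missing + 1))
    fi
    [ "$missing" -eq 0 ]
}
```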

Open the config_inference.json file. Remove properties that are not needed, and modify the pre_transforms and post_transforms sections as needed. Change the inferer section as shown below, adding or changing the required properties. Change model_name to the name of the model used for inference:

```json
"inferer":
{
    "name": "TRTISScanWindowInferer",
    "args": {
        "model_name": "segmentation_ct_lung_v1",
        "ip": "localhost",
        "port": 8000,
        "protocol": "HTTP"
    }
},
```

9.4.8.2.4. Step 4

Open the Dockerfile and update it with the content shown below.

Note: Update the actual app_base_inference_v2 container name and tag if they are different in your environment.

```dockerfile
# Build upon the named base container; version tag can be used if known.
FROM app_base_inference_v2:latest

# This is a well known folder in the base container. Please do not change it.
ENV BASE_NAME="app_base_inference_v2"

# This is the name of the folder containing the config files; same as the app name.
ENV MY_APP_NAME="my_custom_app"

# Copy configuration files to overwrite base defaults
COPY ./$MY_APP_NAME/config/* ./$BASE_NAME/config/
```
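You can also script Steps 2 and 4. The sketch below is a convenience based on assumptions, not an SDK tool. The helper scaffold_custom_app is hypothetical. It copies existing config files into place and writes the same Dockerfile as above, without the comments. The app name must be a plain folder name, because it is used both as MY_APP_NAME and in the COPY path relative to the build context.

```bash
#!/bin/bash
# Hypothetical helper: create <app>/config and <app>/Dockerfile (Steps 2 and 4).
# Usage: scaffold_custom_app <app_name> <config_inference.json> <config_render.json>
scaffold_custom_app() {
    local app="$1"
    local infer_cfg="$2"
    local render_cfg="$3"
    mkdir -p "$app/config" || return 1
    cp "$infer_cfg" "$app/config/config_inference.json" || return 1
    cp "$render_cfg" "$app/config/config_render.json" || return 1
    # Unquoted heredoc: ${app} is expanded now, while \$ keeps $MY_APP_NAME and
    # $BASE_NAME literal so Docker substitutes them at build time.
    cat > "$app/Dockerfile" <<EOF
FROM app_base_inference_v2:latest
ENV BASE_NAME="app_base_inference_v2"
ENV MY_APP_NAME="${app}"
COPY ./\$MY_APP_NAME/config/* ./\$BASE_NAME/config/
EOF
}
```

Run it from the directory that will serve as the Docker build context, e.g. `scaffold_custom_app my_custom_app ./config_inference.json ./config_render.json`.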

9.4.8.2.5. Step 5

From the parent directory of my_custom_app, build the customized container named by APP_NAME with the following commands, or put them in a shell script and run it.

```bash
APP_NAME="my_custom_app"
docker build -t ${APP_NAME} -f ${APP_NAME}/Dockerfile .
```

9.4.9. License

An End User License Agreement is included with the product. By pulling and using the Clara Deploy asset on NGC, you accept the terms and conditions of these licenses.

9.4.10. Suggested Reading

Release Notes, the Getting Started Guide, and the SDK itself are available on the NVIDIA Developer site: (https://developer.nvidia.com/clara).

For answers to any questions you may have about this release, visit the NVIDIA Devtalk forum: (https://devtalk.nvidia.com/default/board/362/clara-sdk/).