10.1. Clara Deploy Operator Development Guide
This section describes how to develop an application to run in a Docker container on the NVIDIA Clara Deploy SDK, with emphasis on AI inference applications. It also describes the steps to configure AI models and make them available to the NVIDIA Triton Inference Server.
The NVIDIA Clara Deploy SDK provides an open, scalable computing platform that enables development of
medical imaging applications for hybrid (embedded, on-premise, or cloud) computing environments to
create intelligent instruments and automated healthcare pipelines. Applications are containerized
and are used as operators in pipelines. A containerized application can be used in more than one
operator in a pipeline, and can also be used in different pipelines. Please refer to the
Introduction and Core Concepts sections for additional details.
An application Docker image needs to be built from the Clara Deploy Python base image in order to be used
as an operator and deployed in Clara Deploy pipelines.
This differs from previous versions, in which Argo orchestration mode was still supported on the Clara Deploy platform.
Please see the section on Pipeline for details.
The primary I/O model for an operator, and its constituent application container, is through
persistent volumes mounted by the Clara Deploy Platform server. An operator can define any number of inputs,
each with a path that is a volume or a rooted directory. Each input also names the operator whose
output supplies its data; if no operator is named, the initial payload
to the pipeline becomes the input. This is how nodes in a directed acyclic graph (DAG) are
connected, and in Clara Deploy the pipeline definition is a DAG.
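The wiring described above can be illustrated without the platform. The following is a minimal sketch in which plain shell functions stand in for operator containers: the first operator reads the pipeline payload, and the second reads the first operator's output directory. The function names and directory layout are hypothetical and are not part of the Clara Deploy SDK.

```shell
#!/bin/bash
# Sketch only: shell functions stand in for operator containers.
# All names and paths here are hypothetical.

# First operator: names no upstream operator, so it reads the payload.
op_reader() {
    local in_dir="$1" out_dir="$2"
    mkdir -p "${out_dir}"
    cp -r "${in_dir}/." "${out_dir}/"
}

# Second operator: its input is the first operator's output.
op_counter() {
    local in_dir="$1" out_dir="$2"
    mkdir -p "${out_dir}"
    # Record how many files were received from the upstream operator.
    find "${in_dir}" -type f | wc -l | tr -d ' ' > "${out_dir}/count.txt"
}

# Connect the two nodes: payload -> reader -> counter.
run_pipeline() {
    local payload="$1" work="$2"
    op_reader  "${payload}"            "${work}/reader/output" &&
    op_counter "${work}/reader/output" "${work}/counter/output"
}
```

Calling `run_pipeline ./payload ./work` would leave the file count in `./work/counter/output/count.txt`. On the platform, the same connection is expressed in the pipeline definition rather than in a script.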
Operators can also define environment variables, which the Clara Deploy Platform server sets at
runtime, as well as dependent services. Examples are given in the documentation for specific operators, e.g. the AI
operator.
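Because these variables are only present at runtime, an application can check for them at startup and fail early with a clear message. The following is a minimal sketch; the helper name `require_env` is hypothetical, and `NVIDIA_CLARA_TRTISURI` is the variable used by the sample script later in this section. Which variables a given operator actually needs is an assumption to adapt.

```shell
#!/bin/bash
# Sketch: fail fast when expected runtime variables are missing or empty.
# require_env is a hypothetical helper, not part of the Clara Deploy SDK.

require_env() {
    local missing=0 name
    for name in "$@"; do
        # printenv only sees exported variables, which matches what a
        # container process receives from "docker run -e".
        if [ -z "$(printenv "${name}")" ]; then
            echo "Missing required environment variable: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

A container entrypoint could call `require_env NVIDIA_CLARA_TRTISURI || exit 1` before starting the application.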
Applications used in Clara Deploy operators must run in a Docker container, which should be tested in a development environment before being deployed. The expected volumes and environment variables need to be set to simulate what the Clara Deploy platform will set. Dependent services also need to be started, for example through scripting.
The following is a sample script that builds the container image, starts the dependent service, mounts the volumes according to operator definition, sets the environment variables, and runs the container on a named network.
            
#!/bin/bash
# Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA CORPORATION and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA CORPORATION is strictly prohibited.
#
#     ---- EXAMPLE ONLY ----
#
# Define the name of the app (aka operator), assumed the same as the project folder name
APP_NAME="app_livertumor"
# Build Dockerfile
docker build -t ${APP_NAME} -f ${APP_NAME}/Dockerfile .
# Specific version of the Triton Inference Server image used in testing
TRITON_IMAGE="nvcr.io/nvidia/tritonserver:20.07-v1-py3"
# Clara Core would launch the container with the following environment variables internally,
# to provide runtime information.
export NVIDIA_CLARA_TRTISURI="localhost:8000"
# Define the model name for use when launching Triton with only the specific model
MODEL_NAME="segmentation_liver_v1"
# Create a Docker network so that containers can communicate on this network
NETWORK_NAME="container-demo"
# Create network
docker network create ${NETWORK_NAME}
# Run Triton (name: triton), mapping ./sampleData/models/${MODEL_NAME} to /models/${MODEL_NAME}
# (localhost:8000 will be used)
cp -f $(pwd)/${APP_NAME}/config/config.pbtxt $(pwd)/sampleData/models/${MODEL_NAME}/
nvidia-docker run --name triton --network ${NETWORK_NAME} -d --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8000:8000 \
    -v $(pwd)/sampleData/models/${MODEL_NAME}:/models/${MODEL_NAME} ${TRITON_IMAGE} \
    tritonserver --model-repository=/models
# Wait until Triton is ready
triton_local_uri=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' triton)
echo -n "Wait until Triton (${triton_local_uri}) is ready..."
while [ $(curl -s ${triton_local_uri}:8000/api/status | grep -c SERVER_READY) -eq 0 ]; do
    sleep 1
    echo -n "."
done
echo "done"
export NVIDIA_CLARA_TRTISURI="${triton_local_uri}:8000"
# Run ${APP_NAME} container.
# Like below, Clara Platform Server would launch the app container with the following environment variables internally,
# to provide input/output path information.
docker run --name ${APP_NAME} --network ${NETWORK_NAME} -it --rm \
    -v $(pwd)/input:/input \
    -v $(pwd)/output:/output \
    -v $(pwd)/logs:/logs \
    -v $(pwd)/publish:/publish \
    -e NVIDIA_CLARA_TRTISURI \
    -e NVIDIA_CLARA_NOSYNCLOCK=TRUE \
    ${APP_NAME}
echo "${APP_NAME} is done."
# Stop Triton container
echo "Stopping Triton"
docker stop triton > /dev/null
# Remove network
docker network remove ${NETWORK_NAME} > /dev/null
    
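The readiness loop in the sample script waits indefinitely if Triton never starts. The following is a minimal sketch of the same polling idea with a timeout. The helper names `wait_until` and `triton_ready` are hypothetical; the probe reuses the `/api/status` check from the script above, which applies to the v1 image used there.

```shell
#!/bin/bash
# Sketch: poll a readiness command until it succeeds or a timeout expires.
# wait_until and triton_ready are hypothetical helper names.

# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
    local timeout_s="$1"
    shift
    local waited=0
    until "$@" > /dev/null 2>&1; do
        if [ "${waited}" -ge "${timeout_s}" ]; then
            return 1    # gave up: the command never succeeded in time
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Readiness probe mirroring the check in the sample script.
triton_ready() {
    curl -s "$1/api/status" | grep -q SERVER_READY
}
```

In the sample script, the loop could then be replaced with something like `wait_until 60 triton_ready "${triton_local_uri}:8000" || { echo "Triton did not become ready" >&2; exit 1; }`.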
The Clara Deploy SDK currently does not provide an API to upload an application Docker image into its own registry. An application Docker image needs to be copied to the Clara Deploy host server and then loaded into the host's local Docker image store.
Please see the Docker documentation on how to save and load a Docker image.
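As a rough illustration of that workflow, the sketch below saves an image to a tar archive with `docker save`, copies it with `scp`, and loads it on the host with `docker load`. The helper names, the remote host, and the `/tmp` staging path are assumptions for illustration only. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/bash
# Sketch: move an image to the Clara Deploy host via docker save / docker load.
# Helper names, host name, and paths are hypothetical placeholders.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: transfer_image <image[:tag]> <user@host>
transfer_image() {
    local image="$1" remote="$2"
    local archive
    # Derive a file name from the image name ("/" and ":" become "_").
    archive="$(printf '%s' "${image}" | tr '/:' '__').tar"
    run docker save -o "${archive}" "${image}" &&
    run scp "${archive}" "${remote}:/tmp/${archive}" &&
    run ssh "${remote}" docker load -i "/tmp/${archive}"
}
```

For example, `DRY_RUN=1 transfer_image app_livertumor:latest user@clara-host` shows the three commands that would be executed.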
For AI inference applications, trained models need to be made available to the NVIDIA Triton Inference Server. For details on which models are supported, how to configure a model, and the Triton model repository, please refer to the official NVIDIA documentation at https://docs.nvidia.com/deeplearning/triton-inference-server/master-user-guide/
After the model and configuration files are ready, they must be copied onto the Clara Deploy host server
and saved in the shared storage folder, namely /clara/common/models, which will be mapped to the
models folder of the Triton Inference Server (formerly TRTIS) container.
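The copy step can be guarded with a basic layout check. The sketch below assumes the usual Triton model repository layout: a directory named after the model, containing `config.pbtxt` and at least one numeric version subdirectory such as `1/`. The helper name `install_model` is hypothetical, and the destination defaults to the shared folder named above.

```shell
#!/bin/bash
# Sketch: validate a model directory, then copy it to the shared models folder.
# install_model is a hypothetical helper; adapt the checks to your model.

# Usage: install_model <model_dir> [models_root]
install_model() {
    local model_dir="${1%/}"
    local models_root="${2:-/clara/common/models}"
    local name has_version=0 d
    name="$(basename "${model_dir}")"

    if [ ! -f "${model_dir}/config.pbtxt" ]; then
        echo "Missing ${model_dir}/config.pbtxt" >&2
        return 1
    fi
    # Look for at least one subdirectory whose name is all digits.
    for d in "${model_dir}"/*/; do
        case "$(basename "${d}")" in
            ''|*[!0-9]*) ;;            # not a numeric version directory
            *) has_version=1 ;;
        esac
    done
    if [ "${has_version}" -ne 1 ]; then
        echo "No numeric version directory found in ${model_dir}" >&2
        return 1
    fi
    mkdir -p "${models_root}/${name}" &&
    cp -r "${model_dir}/." "${models_root}/${name}/"
}
```

On the Clara Deploy host, `install_model ./segmentation_liver_v1` would copy the model into `/clara/common/models/segmentation_liver_v1`, provided the user has write access to that folder.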
Once the application Docker image is available on the Clara Deploy host, a pipeline can be created to use it. Please refer to the section on pipeline creation for details.