NVIDIA Clara Deploy SDK User Guide
1.0

11.1. Developing Clara Pipeline Operators Using the Operator Development Kit


The Operator Development Kit (ODK) contains the minimal set of assets for a user to:

  • develop a Clara pipeline operator from scratch, or
  • migrate an existing container image into a Clara pipeline operator.

As provided, the ODK is a functioning inference operator using a COVID-19 Classification model.

Download the ODK from NGC and unzip the package locally. Note: You can download the file with a Guest account or using your NGC access.

Once you have downloaded the ODK asset, unzip the file into a directory.


unzip app_operator.zip -d app_operator

The operator development package should contain the following.


└─ app_operator
   ├─ Dockerfile
   ├─ main.py
   ├─ build.sh
   ├─ run-local.sh
   ├─ run-triton.sh
   ├─ requirements.txt
   ├─ def
   │  └─ one-operator-pipeline.yml
   └─ local
      ├─ input
      │  ├─ volume-covid19-A-0000.nii.gz
      │  ├─ volume-covid19-A-0001.nii.gz
      │  └─ volume-covid19-A-0010.nii.gz
      ├─ output
      └─ models
         └─ covid
            ├─ 1
            │  └─ model.pt
            └─ config.pbtxt

  • Dockerfile builds the operator container. This build script can be used to build the operator container, or to package an existing custom inference container image as a Clara pipeline operator (please see Packaging existing container to run in Clara below).
  • main.py is the Python script that is executed when the operator is instantiated. The script reads NIfTI files from an input folder, applies simple transformations to each file to prepare it for inference, uses the Triton client library to send an inference request to the COVID-19 classification model deployed in Triton, and writes the result of each request to the output folder.
  • build.sh is a Bash script that initiates packaging the application into a container.
  • run-local.sh is a Bash script that starts a local (stand-alone) containerized operator. For run-local.sh to run successfully, the user must first build the operator's container image using build.sh and then run run-triton.sh to make the model available to the operator for inference.
  • run-triton.sh is a Bash script that starts a local containerized Triton Inference Server.
  • requirements.txt lists all the libraries needed for main.py to run.
  • pipelines contains the operator and pipeline (one-operator-pipeline.yml) definitions necessary to deploy the pipeline and operator in Clara Deploy.
    Note that Clara Deploy only requires the deployment of pipelines, and will pull any necessary container images at runtime. Therefore, ensure that any required containers have been pulled locally or are available for Clara Deploy to pull at runtime. When pulling a container requires authentication (a login), it is recommended that the container be pulled before attempting to create a Clara pipeline job that requires it.

  • local is a directory containing the model artifacts necessary to run main.py locally in the state it is distributed.
  • run-triton.sh will mount the local/models folder to the Triton Inference Server container, making the models available for inference.
  • run-local.sh will mount the local/input folder into the operator container so that the operator can read the *.nii.gz files and send them for inference.

Before going into the development details and learning how to modify the ODK, let us first try to run the example COVID-19 inference out of the box.

To run the inference model locally, make sure to install Docker CE and NVIDIA Docker 2 so that the model can be served on a GPU. You may follow the steps here.

To run the packaged COVID-19 inference model:

  1. Build the Clara example inference operator using

    ./build.sh

  2. Start the Triton inference server using

    ./run-triton.sh

  3. Run the Clara operator locally using

    ./run-local.sh

  4. Once the inference completes, check the output of the inference in the local/output directory.

    output
    ├── volume-covid19-A-0000.nii_result.txt
    ├── volume-covid19-A-0001.nii_result.txt
    └── volume-covid19-A-0010.nii_result.txt

    Each file contains either a positive or a negative result for COVID-19.

    volume-covid19-A-0010.nii_result.txt --> COVID Negative
    volume-covid19-A-0000.nii_result.txt --> COVID Positive
    volume-covid19-A-0001.nii_result.txt --> COVID Negative
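Checking the result files by hand becomes tedious with more than a few inputs. The following is a minimal sketch for summarizing them, not part of the ODK. The function names are hypothetical, and it assumes each result file holds a single line of text such as "COVID Positive" or "COVID Negative", as in the sample output above.

```python
from pathlib import Path


def summarize_results(output_dir):
    """Map each *_result.txt file name to the text it contains."""
    summary = {}
    for result_file in sorted(Path(output_dir).glob("*_result.txt")):
        summary[result_file.name] = result_file.read_text().strip()
    return summary


def count_positive(summary):
    """Count results that mention "positive" (case-insensitive)."""
    # Assumption: a positive result contains the word "Positive",
    # as in the sample output; adjust if main.py writes something else.
    return sum("positive" in text.lower() for text in summary.values())
```

For the ODK example, `summarize_results("local/output")` would be called from the top-level ODK folder after run-local.sh has finished.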

11.1.3. Customizing the Operator Development Kit Example Code

The ODK can be used as a starting point for the development of custom Clara operators or the migration of existing container images into Clara operators. This section explains the focal components of the ODK when building a custom Clara operator (with inference).

11.1.3.1. Clara Operator-specific Variables in Python

main.py is the functional part of the code that the developer may customize for their own purposes. As provided, main.py is an example of how to structure the code around the necessary components. Because Clara expects certain components in the pipeline definition, be sure to take care of these key components during development:

  • Environment variables
  • Inputs
  • Pre-transformations
  • Inference
  • Post-transformations
  • Outputs

The following describes the components used in the operator development kit sample application.

  1. The input component reads the input data from the input payload, which in this case is assumed to be an input directory path defined by OPERATOR_INPUT_PATH containing one or more files.

    input_path = os.getenv('OPERATOR_INPUT_PATH', '/input')

  2. The pre-transformation component performs pre-inference transformations on the input data. This component is optional and model dependent.

    pre_transforms = Compose([
        LoadImage(reader="NibabelReader", image_only=True, dtype=np.float32),
        AddChannel(),
        ScaleIntensityRange(a_min=-1000, a_max=500, b_min=0.0, b_max=1.0, clip=True),
        CropForeground(margin=5),
        Resize([192, 192, 64], mode="area"),
    ])

  3. The inference component uses the Triton Python client to perform inference via the model deployed in Triton Inference Server (see Section "Deploy the Model" below).

    inference_ctx = initialize_inference_server()
    inference_response = inference_ctx.run(
        {model_input_label: (inference_image,)},
        {model_output_label: InferContext.ResultFormat.RAW},
        batch_size=1)

  4. The post-transformation component performs post-inference transformations on the inference result.

    post_transforms = Compose([
        ToTensor(),
        Activations(sigmoid=True),
        AsDiscrete(threshold_values=True, logit_thresh=0.5),
        ToNumpy(),
    ])

  5. The output component finally writes the output data to the path defined by OPERATOR_OUTPUT_PATH.

    output_path = os.getenv('OPERATOR_OUTPUT_PATH', '/output')

    The full code can be found in main.py.
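To make the flow of the five components concrete, here is a dependency-free sketch of the same structure. It is not the ODK's main.py. The `infer` callable is a hypothetical stand-in for the Triton request, the scalar arithmetic only illustrates what intensity scaling and sigmoid-plus-threshold compute, and the mapping of label 1 to "COVID Positive" is an assumption that should be confirmed against main.py.

```python
import math
import os
from pathlib import Path


def scale_intensity_range(value, a_min=-1000.0, a_max=500.0, b_min=0.0, b_max=1.0):
    """Linearly map [a_min, a_max] onto [b_min, b_max] and clip the result."""
    scaled = (value - a_min) / (a_max - a_min) * (b_max - b_min) + b_min
    return min(max(scaled, b_min), b_max)


def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def classify(logit, threshold=0.5):
    """Sigmoid activation followed by thresholding at `threshold`."""
    # Assumption: label 1 corresponds to "COVID Positive".
    return "COVID Positive" if sigmoid(logit) >= threshold else "COVID Negative"


def run_operator(infer, input_path=None, output_path=None):
    """Read each input file, call `infer`, and write one result file per input."""
    input_dir = Path(input_path or os.getenv("OPERATOR_INPUT_PATH", "/input"))
    output_dir = Path(output_path or os.getenv("OPERATOR_OUTPUT_PATH", "/output"))
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_file in sorted(input_dir.glob("*.nii.gz")):
        logit = infer(image_file)  # hypothetical stand-in for the Triton call
        # "volume.nii.gz" becomes "volume.nii_result.txt"
        result_file = output_dir / f"{image_file.stem}_result.txt"
        result_file.write_text(classify(logit) + "\n")
        written.append(result_file.name)
    return written
```

Keeping the inference call behind a plain callable makes the input and output handling testable without a running Triton server. In a real operator, `infer` would wrap the pre-transforms and the Triton client request.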

This section assumes that you have installed a Clara cluster which has access to the container image built using build.sh (by default the image is tagged clara/simple-operator:0.1.0). To install or upgrade Clara Deploy SDK, follow the steps outlined on the Installation page of the Clara Deploy User Guide.

11.1.4.1. Deploy the Model

Before deploying the operator, one must ensure that the inference model is deployed in Clara (if an inference model is used). In the case of the example COVID-19 classification model, the user must copy the contents under local/models to the Clara model repository directory, by default /clara/common/models, resulting in a directory structure such as


clara
└─ common
   └─ models
      └─ covid
         ├─ 1
         │  └─ model.pt
         └─ config.pbtxt

The ODK contains a PyTorch model; however, the user may deploy ONNX, TensorFlow GraphDef, TensorFlow SavedModel, TensorRT, and other formats supported by the Triton Inference Server, each with an accompanying config.pbtxt for the model.
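The copy step can also be scripted. The sketch below is one possible approach, assuming Python 3.8 or later and write access to the model repository. The function name `deploy_model` is hypothetical, and the default paths are taken from the text above. It checks for a config.pbtxt and at least one numeric version directory before copying.

```python
import shutil
from pathlib import Path


def deploy_model(model_name, source_root="local/models",
                 repo_root="/clara/common/models"):
    """Copy one model directory into the Clara model repository."""
    source = Path(source_root) / model_name
    if not (source / "config.pbtxt").is_file():
        raise FileNotFoundError(f"{source} has no config.pbtxt")
    # Triton expects numbered version directories such as "1".
    versions = [p for p in source.iterdir() if p.is_dir() and p.name.isdigit()]
    if not versions:
        raise FileNotFoundError(f"{source} has no numeric version directory")
    target = Path(repo_root) / model_name
    # dirs_exist_ok lets a re-run overwrite an earlier copy (Python 3.8+).
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For the ODK example this would be called as `deploy_model("covid")` from the top-level ODK folder.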

11.1.4.2. Deploy the Operator

To deploy the provided single-operator example pipeline, start in the top-level ODK folder.


cd pipelines
clara create pipelines -p one-operator-pipeline.yml

clara create will output a pipeline id, <pipeline-id>, if deployed successfully.

Before trying to run a job from the newly deployed pipeline, the user must ensure that the container image built using build.sh is reachable from the Kubernetes cluster on which Clara is running. For instance, if the container image is my_inference_container:0.1, ensure that this image is either

  • present in the container repository local to Clara (i.e. on the same machine), or
  • available in a public container repository reachable by the Kubernetes cluster where Clara is deployed (e.g. DockerHub).

To run the pipeline with the given data, first create a job:


cd ..
clara create job -p <pipeline-id> -n covid-class -f local/input

The output will be a job id <job-id>.

It is important to provide the input option with -f here as this takes the local input data and moves (uploads) it to the expected Clara operator location for input data.


JOB_ID: <job-id>
PAYLOAD_ID: <payload-id>
Payload uploaded successfully.
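When the job is created from a script, the two identifiers have to be extracted from the command output. The following sketch shows one way to do that, assuming the output contains `JOB_ID:` and `PAYLOAD_ID:` labels followed by the values, as in the sample above. The function names are hypothetical, and the command list only restates the `clara start job` invocation used in this guide.

```python
import re


def parse_create_job_output(text):
    """Extract the job and payload identifiers from `clara create job` output."""
    ids = {}
    for key in ("JOB_ID", "PAYLOAD_ID"):
        match = re.search(rf"\b{key}:\s*(\S+)", text)
        if match is None:
            raise ValueError(f"{key} not found in output")
        ids[key] = match.group(1)
    return ids


def start_job_command(job_id):
    """Build the argument list for starting a job, e.g. for subprocess.run."""
    return ["clara", "start", "job", "-j", job_id]
```

The returned list can be passed to `subprocess.run` on a machine where the Clara CLI is installed.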

To run the created job use


clara start job -j <job-id>

To check for a successful job use


clara list jobs

Look for the <job-id> and corresponding <payload-id> for the job.

Inspect the outputs in the folder named after the corresponding <payload-id>, which Clara stores by default in the configured payloads folder.


cd /clara/payloads/<payload-id>

The classification results are stored as text files corresponding to each of the inputs, as shown below.


├── input
│   ├── volume-covid19-A-0000.nii.gz
│   ├── volume-covid19-A-0001.nii.gz
│   └── volume-covid19-A-0010.nii.gz
├── NVIDIA
│   └── Clara
└── operators
    └── simple-operator
        └── classification-output
            ├── volume-covid19-A-0000.nii_result.txt
            ├── volume-covid19-A-0001.nii_result.txt
            └── volume-covid19-A-0010.nii_result.txt
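Because results are nested under the operator name and its output name, a small helper can gather them per operator. This is a sketch only. It assumes the `operators/<operator>/<output>/` layout shown above and the default payloads folder /clara/payloads, and the function name is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def collect_operator_results(payload_id, payloads_root="/clara/payloads"):
    """Group result texts by operator name for one payload."""
    operators_dir = Path(payloads_root) / payload_id / "operators"
    results = defaultdict(dict)
    # Layout assumed: operators/<operator>/<output>/<name>_result.txt
    for result_file in sorted(operators_dir.glob("*/*/*_result.txt")):
        operator_name = result_file.relative_to(operators_dir).parts[0]
        results[operator_name][result_file.name] = result_file.read_text().strip()
    return dict(results)
```

A payload without an operators folder simply yields an empty dictionary.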


To package a custom inference container as a Clara operator you may update the Dockerfile and build.sh script. Packaging your own code to run in a Clara operator requires you to define and install environment requirements, an ENTRYPOINT, and any optional dependencies for your inference application.

For example, the Dockerfile to package an existing container may look something like


FROM my_inference_container:0.1.0
...
# install Clara-compatible Triton client
ARG TRITON_CLIENTS_URL=https://github.com/triton-inference-server/server/releases/download/v2.11.0/v2.11.0_ubuntu2004.clients.tar.gz
RUN mkdir -p /opt/nvidia/triton-clients \
    && curl -L ${TRITON_CLIENTS_URL} | tar xvz -C /opt/nvidia/triton-clients
RUN pip install --no-cache-dir future==0.18.2 grpcio==1.27.2 protobuf==3.11.3 /opt/nvidia/triton-clients/python/*.whl
# install operator requirements
RUN pip install --upgrade setuptools && \
    pip install -r requirements.txt
ENTRYPOINT ["bash", "-c", "python -u my_script.py"]

Note that you must include the Triton client installation steps in the Dockerfile if you want to use Triton for inference. This step can be omitted if the inference model is packaged in your container.

Similarly, use the build.sh script to package the container. Simply pass the fully qualified name and tag of the desired container to the script.

./build.sh custom_inference_container:1.0.0

Note: By default the script will create an operator named clara/simple-operator:1.0.0

Once the container is successfully created, go back to Deploy the Operator and follow the same steps using your custom container.

© Copyright 2018-2020, NVIDIA Corporation. All rights reserved. Last updated on Jun 28, 2023.