10.2. Developing Clara Operators Using the Operator Development Kit

10.2.1. Overview of the Operator Development Kit

The operator development kit (ODK) contains the minimal set of assets a user needs to:

  • develop a Clara operator from scratch, or

  • migrate an existing container image into a Clara operator.

As provided, the ODK is a functioning inference operator that uses the liver segmentation model from the Clara reference applications. Instead of following the Clara Train development model, it uses a simpler development approach suited to users who do not train their models with Clara Train.

10.2.2. Package Contents

Download the ODK from NGC and unzip the package locally using

unzip app_operator.zip -d app_operator

The operator development package should contain the following.

└─ app_operator
   ├─ Dockerfile
   ├─ main.py
   ├─ build.sh
   ├─ run-local.sh
   ├─ run-triton.sh
   ├─ requirements.txt
   ├─ env.list
   ├─ def
   |  └─ one-operator-pipeline.yml
   ├─ lib
   |  └─ clara-0.1-py3-none-any.whl
   └─ local
      ├─ input
      |  └─ liver_14.nii.gz
      ├─ output
      └─ models
         └─ segmentation_liver_v1
            ├─ 1
            │  └─ model.graphdef
            └─ config.pbtxt

  • Dockerfile builds the operator container. This build script can be used to build the operator container, or to package an existing custom inference container image as a Clara operator (see Section 10.2.5, Deploy an Existing Container as a Clara Operator, below).

  • main.py is the Python script that is executed when the operator is instantiated. The script reads NIFTI files from an input folder, applies simple transformations to each file to prepare it for inference, and uses the Triton client library to send inference requests to the liver segmentation model deployed in Triton. The result of each inference is written to the output folder.

  • build.sh is a Bash script that initiates the docker build.

  • run-local.sh is a Bash script that initiates a local (stand-alone) containerized operator. For run-local.sh to run successfully, the user must first build the operator container image using build.sh, and then run run-triton.sh to make the model available to the operator for inference.

  • run-triton.sh is a Bash script that initiates a local containerized Triton server.

  • requirements.txt lists all the libraries needed for main.py to run.

  • env.list lists all the environment variable names required to run the Clara operator locally (bare-metal or containerized) for development and debugging purposes.

  • def contains the operator and pipeline definitions (one-operator-pipeline.yml) necessary to deploy the operator in Clara. Note that Clara accepts only the deployment of pipelines, not of operators alone. Therefore, ensure that the operator image is available in a container repository reachable by Clara at runtime.

  • local is a directory containing the input data and model artifacts necessary to run main.py locally in the state in which it is distributed.

    • run-triton.sh will load the local/models folder into the Triton container to make it available for inference.

    • run-local.sh will load the local/input folder into the operator container to allow the operator container to read liver_14.nii.gz and send it for inference.

  • lib contains the Clara operator Python client library. This library must always be installed for a Clara operator to function correctly.
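
After unzipping, a quick check can confirm that the package is complete. The following is a minimal sketch: the helper name missing_odk_files is hypothetical, and the file list is copied from the package listing above.

```python
from pathlib import Path

# Relative paths copied from the package listing above.
EXPECTED_FILES = [
    "Dockerfile",
    "main.py",
    "build.sh",
    "run-local.sh",
    "run-triton.sh",
    "requirements.txt",
    "env.list",
    "def/one-operator-pipeline.yml",
    "lib/clara-0.1-py3-none-any.whl",
    "local/input/liver_14.nii.gz",
    "local/models/segmentation_liver_v1/1/model.graphdef",
    "local/models/segmentation_liver_v1/config.pbtxt",
]


def missing_odk_files(root):
    """Return the expected ODK files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Call missing_odk_files with the directory that contains the Dockerfile; an empty list means every listed file was found.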

10.2.3. Run the Example Liver Segmentation Model Locally

Before going into the development details and learning how to modify the ODK, let us first try to run the example liver inference out of the box.

To run the inference model locally, install Docker CE and NVIDIA Docker 2 so that the model can be served on a GPU. You may follow the steps here.

To run the packaged liver inference model:

  1. Build the Clara example inference operator using

    ./build.sh
    
  2. Start the Triton inference server using

    ./run-triton.sh
    
  3. Run the Clara operator locally using

    ./run-local.sh
    
  4. Once the inference completes, check the inference results in the local/output directory.

10.2.4. Customizing the Operator Development Kit Example Code

The ODK can be used as a starting point for developing custom Clara operators or for migrating existing container images into Clara operators. This section explains the components of the ODK that matter most when building a custom Clara operator (with inference).

10.2.4.1. Clara Operator-specific Variables in Python

main.py is the functional part of the code, which the developer may customize for their own purposes. As provided, main.py

  1. reads the input data from the input payload which in this case is assumed to be an input directory path with one or more files,

  2. performs pre-inference transformations to the input data,

  3. uses the Triton Python client to perform inference via the model deployed in Triton Inference Server (see Section “Deploy the Model” below),

  4. performs post-inference transformations to the inference result,

  5. and finally writes the output data to the output payload.
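
The five steps above can be sketched as a skeleton. This is not the code shipped in main.py: the function names are hypothetical, the transformations are placeholders, and the infer callable stands in for the Triton client request so that the sketch stays self-contained.

```python
from pathlib import Path


def preprocess(raw):
    # Placeholder for the pre-inference transformations (step 2).
    return raw


def postprocess(result):
    # Placeholder for the post-inference transformations (step 4).
    return result


def run_operator(input_dir, output_dir, infer):
    """Apply `infer` to every file in `input_dir`; write results to `output_dir`.

    `infer` is a hypothetical callable (bytes -> bytes) standing in for the
    Triton client request made by the real main.py.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(p for p in input_dir.iterdir() if p.is_file()):
        raw = path.read_bytes()                    # step 1: read the input
        result = infer(preprocess(raw))            # steps 2 and 3
        out_path = output_dir / path.name
        out_path.write_bytes(postprocess(result))  # steps 4 and 5
        written.append(out_path)
    return written
```

Passing the inference step in as a callable keeps the file handling testable without a running Triton server; a real operator would replace it with a function that wraps the Triton Python client.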

Some of the above steps make use of Clara-specific objects, namely clara_payload, a global variable introduced by the Clara Pipeline Driver Python library at runtime. The operator developer need not use this variable, but may choose to do so when the operator is expected to be reused several times in a pipeline, with each instance executing a different branch of the code.

Specifically, clara_payload has two properties

  • input_entries which lists all the inputs to this operator,

  • output_entries which lists all the output from this operator.

One may iterate over clara_payload.input_entries (or conversely clara_payload.output_entries) and check the name and path properties of the entry object. In the case of an input entry

  • name will be of the form upstream-operatorK-name/outputN-name

    • which implies that this input is coming from an operator named upstream-operatorK-name

    • from that operator’s output named outputN-name,

  • path will contain the input path of the entry upstream-operatorK-name/outputN-name.

Note: If the name of an input entry is payload, the entry is the input to the pipeline itself rather than the output of an upstream operator.

Similarly, for clara_payload.output_entries, one may iterate over them and check the

  • name which is of the form my-operator-name/my-outputM-name,

  • path which is the output folder of entry my-operator-name/my-outputM-name.

Let’s look at the example below to find out how the name and path properties in the clara_payload entries object relate to the pipeline definition. Consider the pipeline definition:

api-version: 0.4.0
name: example-pipeline
operators:
  - name: operator1
    container:
      image: clara/simple-operator
      tag: 0.1.0
    input:
    - path: /pipeline_input
    output:
    - name: segmentation-output
      path: /output
  - name: operator2
    container:
      image: clara/another-simple-operator
      tag: 0.1.0
    input:
    - path: /input
      from: operator1
      name: segmentation-output
    output:
    - name: final-output
      path: /output

Inside the code of operator1 we would find:

  • clara_payload.input_entries has only one entry where

    • name is payload implying that this operator accepts the input to the pipeline as input, and

    • path is /pipeline_input as specified in the pipeline definition;

  • clara_payload.output_entries has only one entry where

    • name is operator1/segmentation-output and

    • path is /output.

Inside the code of operator2 we would find:

  • clara_payload.input_entries has only one entry where

    • name is operator1/segmentation-output implying that this operator takes as input segmentation-output of operator1, and

    • path is /input;

  • clara_payload.output_entries has only one entry where

    • name is operator2/final-output and

    • path is /output.
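
The naming convention can be handled with a small helper. In the sketch below, Entry is a hypothetical stand-in for the entry objects (only their name and path properties are modelled), and split_entry_name is an illustrative helper, not part of the Clara library. Inside a deployed operator one would iterate over clara_payload.input_entries instead of the hand-built list.

```python
from collections import namedtuple

# Hypothetical stand-in for a clara_payload entry; only `name` and `path`
# are modelled, as described in the text.
Entry = namedtuple("Entry", ["name", "path"])


def split_entry_name(name):
    """Split an entry name into (operator, output).

    The pipeline input is named "payload" and has no upstream operator,
    so it maps to (None, "payload").
    """
    if name == "payload":
        return None, "payload"
    operator, _, output = name.partition("/")
    return operator, output


# Entries as operator2 would see them in the example pipeline above.
input_entries = [Entry("operator1/segmentation-output", "/input")]
output_entries = [Entry("operator2/final-output", "/output")]

for entry in input_entries:
    operator, output = split_entry_name(entry.name)
    print(f"input {output!r} from {operator!r} is available at {entry.path}")
```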

10.2.4.2. Configuring Local Environment to Reproduce Operator Dependencies Expected in Deployment

The developer may update env.list to develop, test, and debug the operator in their local environment (that is, without a Clara deployment or pipeline). Assuming the developer is given the expected inputs and outputs of the operator when deployed in the pipeline, the developer may update env.list to reflect the expected inputs and outputs. Specifically, the developer may set:

  • NVIDIA_CLARA_INPUTPATHS to reflect the input paths to this operator,

  • NVIDIA_CLARA_OUTPUTPATHS to reflect the output paths from this operator.

As an example, if I expect two inputs

  • one is the input to the pipeline,

  • one is an input from the output upstream-output of an upstream operator named upstream-op,

then I set

NVIDIA_CLARA_INPUTPATHS=payload:/input;upstream-op/upstream-output:/input1

Similarly, if I want my operator to write to two different outputs, say output-segmentation with path /output1 and output-original with path /output2, then I set

NVIDIA_CLARA_OUTPUTPATHS=my-operator/output-segmentation:/output1;my-operator/output-original:/output2

Note that here, while your operator’s pipeline-specific name my-operator must be included, you may choose to ignore it programmatically in main.py (e.g. by stripping the characters my-operator/) if you expect this name to change in the pipeline(s) where this operator will be deployed.
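
For local debugging it can help to see how these values decompose. The sketch below parses the format used above (entries separated by ";", name and path separated by ":"). The helper names are hypothetical, and the parsing performed by the Clara library itself may differ.

```python
def parse_clara_paths(value):
    """Parse 'name:/path;name:/path' into a {name: path} dictionary."""
    entries = {}
    for item in value.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        name, sep, path = item.partition(":")
        if not sep or not name or not path:
            raise ValueError(f"malformed entry: {item!r}")
        entries[name] = path
    return entries


def strip_operator_name(name):
    """Drop the leading 'my-operator/' part so code does not depend on it."""
    return name.split("/", 1)[-1]


inputs = parse_clara_paths("payload:/input;upstream-op/upstream-output:/input1")
outputs = parse_clara_paths(
    "my-operator/output-segmentation:/output1;my-operator/output-original:/output2"
)
# Key outputs by their short name, independent of the operator's pipeline name.
outputs_by_short_name = {strip_operator_name(n): p for n, p in outputs.items()}
```

In an operator, the strings would come from os.environ["NVIDIA_CLARA_INPUTPATHS"] and os.environ["NVIDIA_CLARA_OUTPUTPATHS"] rather than from literals.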

Additionally, the developer may update

  • TRITON_MODEL_NAME

  • TRITON_MODEL_VERSION

  • TRITON_MODEL_INPUT

  • TRITON_MODEL_OUTPUT

if they need the operator to perform inference via Triton on a locally available model. The variables set in env.list by default make use of the provided liver segmentation model.
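
A script can read these four variables at start-up and fail early when one is missing. The following is a minimal sketch: load_triton_settings is a hypothetical helper, and the tensor names in the sample values are made up for illustration (they are not the defaults shipped in env.list).

```python
import os

TRITON_VARIABLES = (
    "TRITON_MODEL_NAME",
    "TRITON_MODEL_VERSION",
    "TRITON_MODEL_INPUT",
    "TRITON_MODEL_OUTPUT",
)


def load_triton_settings(env=None):
    """Return the Triton variables as a dict, raising if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in TRITON_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in TRITON_VARIABLES}


# Illustrative values only; the real defaults live in env.list.
example_env = {
    "TRITON_MODEL_NAME": "segmentation_liver_v1",
    "TRITON_MODEL_VERSION": "1",
    "TRITON_MODEL_INPUT": "example_input_tensor",
    "TRITON_MODEL_OUTPUT": "example_output_tensor",
}
settings = load_triton_settings(example_env)
```

Calling load_triton_settings() without arguments reads from the process environment, which is what a containerized operator started with env.list would see.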

The developer may then run the operator locally by following the steps in Section 10.2.3.

10.2.5. Deploy an Existing Container as a Clara Operator

Some developers may have already developed their code and packaged it in containers, while others may only be provided with an inference container image. In such cases, the ODK allows the developer to extend the existing container image into a pipeline-deployable Clara operator by updating the Dockerfile. For instance, the Dockerfile required to build a Clara operator from an existing container may look something like

FROM my_inference_container:0.1.0

...

# install Clara-compatible Triton client
ARG TRITON_CLIENTS_URL=https://github.com/NVIDIA/tensorrt-inference-server/releases/download/v1.5.0/v1.5.0_ubuntu1804.clients.tar.gz
RUN mkdir -p /opt/nvidia/triton-clients \
    && curl -L ${TRITON_CLIENTS_URL} | tar xvz -C /opt/nvidia/triton-clients
RUN pip install --no-cache-dir future==0.18.2 grpcio==1.27.2 protobuf==3.11.3 /opt/nvidia/triton-clients/python/*.whl

# install operator requirements
RUN pip install --upgrade setuptools &&\
    pip install -r requirements.txt

ENTRYPOINT ["bash", "-c", "python -u -m clara.app my_script.py"]

Note that:

  1. You must include the Triton client installation steps in the Dockerfile if you want to use Triton for inference. These steps can be omitted if the inference model is packaged in your container.

  2. You must always install the whl file in the lib/ directory as it contains the wrapper code necessary for the Clara operator to function correctly. In the above example requirements.txt includes the Clara pipeline driver (CPDriver) Python library.

  3. The ENTRYPOINT to your container must use the clara.app Python module for the operator to function correctly. Specifically, if your main script is my_script.py then your container entry-point should be

    python -u -m clara.app my_script.py
    

10.2.6. Deploying a Single-Operator Pipeline to Existing Clara Cluster

This section assumes that you have installed a Clara cluster which has access to the container image built using build.sh (by default the image is tagged clara/simple-operator:0.1.0). To install or upgrade Clara Deploy SDK, follow the steps outlined on the Installation page of the Clara Deploy User Guide.

10.2.6.1. Deploy the Model

Before deploying the operator, one must ensure that the inference model is deployed in Clara (if an inference model is used). In the case of the example liver segmentation model, the user must copy the contents under local/models to the Clara model repository directory, by default /clara/common/models, resulting in a directory structure such as

clara
  └─ common
        └─ models
           └─ segmentation_liver_v1
              ├─ 1
              │  └─ model.graphdef
              └─ config.pbtxt

The ODK contains a TensorFlow GraphDef model. However, the user may also deploy ONNX, PyTorch, and other formats supported by the Triton Inference Server, each with an accompanying config.pbtxt that specifies the model’s deployment settings, inputs, and outputs.
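
Copying the model and checking the resulting layout can be scripted. The sketch below is an illustration only: deploy_model and check_model_dir are hypothetical helpers, and the check covers just the layout shown above (a config.pbtxt plus at least one numeric version directory), not everything Triton validates.

```python
import shutil
from pathlib import Path


def check_model_dir(model_dir):
    """Return True if `model_dir` has config.pbtxt and a numeric version folder."""
    model_dir = Path(model_dir)
    if not (model_dir / "config.pbtxt").is_file():
        return False
    return any(c.is_dir() and c.name.isdigit() for c in model_dir.iterdir())


def deploy_model(source_models, repository):
    """Copy every model folder under `source_models` into `repository`."""
    repository = Path(repository)
    repository.mkdir(parents=True, exist_ok=True)
    deployed = []
    for model in sorted(Path(source_models).iterdir()):
        if model.is_dir():
            target = repository / model.name
            # dirs_exist_ok requires Python 3.8 or later.
            shutil.copytree(model, target, dirs_exist_ok=True)
            deployed.append(target)
    return deployed
```

With the default locations from the text, the call would be deploy_model("local/models", "/clara/common/models"), followed by check_model_dir on each returned path.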

10.2.6.2. Deploy the Operator

To deploy the provided single-operator example pipeline, run

cd def
clara create pipelines -p one-operator-pipeline.yml

clara create will output a pipeline id, PIPELINE_ID, if deployed successfully.

Before trying to run a job from the newly deployed pipeline, the user must ensure that the container image built using build.sh is reachable from the Kubernetes cluster on which Clara is running. For instance, if the container image is my_inference_container:0.1, ensure that this image is either

  • present in the container repository local to Clara (i.e. on the same machine), or

  • available in a public container repository reachable by the Kubernetes cluster where Clara is deployed (e.g. DockerHub).

To run the pipeline with the given data, we must first create a job:

cd ..
clara create job -p PIPELINE_ID -n liver-job -f local/input

The command will output a job ID, JOB_ID.

To run the created job use

clara start job -j JOB_ID