11.2. Developing Clara Operators Using the Operator Development Kit
The operator development kit (ODK) contains the minimal set of assets for a user to:
- develop a Clara operator from scratch, or
- migrate an existing container image into a Clara operator.
As provided, the operator development kit is a functioning inference operator that uses the liver segmentation model from the Clara reference applications. Instead of following the Clara Train development model, however, it uses a simpler development approach that is applicable to users who do not use Clara Train to train their models.
Download the ODK from NGC and unzip the package locally using
unzip app_operator.zip -d app_operator
The operator development package should contain the following.
└─ app_operator
├─ Dockerfile
├─ main.py
├─ build.sh
├─ run-local.sh
├─ run-triton.sh
├─ requirements.txt
├─ env.list
├─ def
| └─ one-operator-pipeline.yml
└─ local
├─ input
| └─ liver_14.nii.gz
├─ output
└─ models
└─ segmentation_liver_v1
├─ 1
│ └─ model.graphdef
└─ config.pbtxt
- Dockerfile builds the operator container. It can be used to build the operator container, or to package an existing custom inference container image as a Clara operator (please see Packaging an Existing Container below).
- main.py is the Python script that is executed when the operator is instantiated. The script reads NIfTI files from an input folder, applies simple transformations to each file to prepare it for inference, uses the Triton client library to send the inference request to the liver segmentation model deployed in Triton, and writes the result of each Triton inference to the output folder.
- build.sh is a Bash script that initiates the Docker build.
- run-local.sh is a Bash script that runs a local (stand-alone) containerized operator. For run-local.sh to run successfully, the user must first build the operator container image using build.sh and run run-triton.sh to make the model available to the operator for inference.
- run-triton.sh is a Bash script that starts a local containerized Triton server.
- requirements.txt lists all the libraries needed for main.py to run.
- env.list lists all the environment variable names required to run the Clara operator locally (bare-metal or containerized) for development and debugging purposes.
- def contains the operator and pipeline (one-operator-pipeline.yml) definitions necessary to deploy the operator in Clara. Note that Clara accepts only the deployment of pipelines, not operators alone; therefore, ensure that the operator image is available in a container repository reachable by Clara at runtime.
- local is a directory containing the model artifacts necessary to run main.py locally as distributed. run-triton.sh loads the local/models folder into the Triton container to make the model available for inference, and run-local.sh loads the local/input folder into the operator container so that the operator can read liver_14.nii.gz and send it for inference.
Before going into the development details and learning how to modify the ODK, let us first try to run the example liver inference out of the box.
To run the inference model locally, make sure to install Docker CE and NVIDIA Docker 2 (following their official installation instructions) so that the model can be served on a GPU.
To run the packaged liver inference model:
Build the Clara example inference operator:
./build.sh
Start the Triton Inference Server:
./run-triton.sh
Run the Clara operator locally:
./run-local.sh
Once the inference completes, check the output of the inference in the local/output directory.
The ODK can be used as a starting point for the development of custom Clara operators or the migration of existing container images into Clara operators. This section explains the key components of the ODK when building a custom Clara operator (with inference).
11.2.4.1. Clara Operator-specific Variables in Python
main.py is the functional part of the code that the developer may customize for their own purposes. As provided, main.py does the following (a minimal sketch follows the list):
- reads the input data from the input payload, which in this case is assumed to be an input directory path containing one or more files,
- performs pre-inference transformations to the input data,
- uses the Triton Python client to perform inference via the model deployed in Triton Inference Server (see Section “Deploy the Model” below),
- performs post-inference transformations to the inference result,
- and finally writes the output data to the output payload.
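A minimal sketch of this flow is shown below. The helper functions and the LOCAL_INPUT_DIR and LOCAL_OUTPUT_DIR variables are hypothetical placeholders introduced only for illustration; the actual main.py shipped with the ODK implements these steps with NIfTI transformations and the Triton client library, and resolves its paths as described in the remainder of this section.

import os
from pathlib import Path

# Illustrative defaults only; the real operator resolves its paths from the
# Clara payload (see below) or from the env.list variables used for local runs.
INPUT_DIR = Path(os.environ.get("LOCAL_INPUT_DIR", "local/input"))
OUTPUT_DIR = Path(os.environ.get("LOCAL_OUTPUT_DIR", "local/output"))

def preprocess(volume_bytes):
    # pre-inference transformations (in the ODK: simple NIfTI transforms)
    return volume_bytes

def run_inference(tensor_bytes):
    # in the real operator this step uses the Triton client library to query
    # the liver segmentation model deployed in Triton Inference Server
    return tensor_bytes

def postprocess(result_bytes):
    # post-inference transformations applied to the inference result
    return result_bytes

def main():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for nifti_file in sorted(INPUT_DIR.glob("*.nii.gz")):
        volume = nifti_file.read_bytes()
        result = postprocess(run_inference(preprocess(volume)))
        (OUTPUT_DIR / nifti_file.name).write_bytes(result)

if __name__ == "__main__":
    main()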
Some of the above steps make use of Clara-specific objects, namely clara_payload, which is a global variable introduced by the Clara Pipeline Driver Python library at runtime. While the operator developer need not use this variable, they may choose to use it when they expect the operator to be reused multiple times in a pipeline while executing different branches of the code.
Specifically, clara_payload has two properties:
- input_entries, which lists all the inputs to this operator,
- output_entries, which lists all the outputs from this operator.
One may iterate over clara_payload.input_entries (or similarly over clara_payload.output_entries) and check the name and path properties of each entry object. In the case of an input entry:
- name will be of the form upstream-operatorK-name/outputN-name, which implies that this input comes from an operator named upstream-operatorK-name, specifically from that operator's output named outputN-name,
- path will contain the input path of the entry upstream-operatorK-name/outputN-name.
Note: If the name of an input entry is payload, the entry is the input to the pipeline itself (see the example below).
Similarly, one may iterate over clara_payload.output_entries and check the:
- name, which is of the form my-operator-name/my-outputM-name,
- path, which is the output folder of the entry my-operator-name/my-outputM-name.
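For example, inside an operator script launched by the Clara Pipeline Driver (where clara_payload is injected at runtime), the entries can be inspected as sketched below; only the documented name and path properties are used.

# clara_payload is provided by the CPDriver at runtime; this snippet simply
# lists the inputs and outputs visible to the operator.
for entry in clara_payload.input_entries:
    print("input:  name={} path={}".format(entry.name, entry.path))
for entry in clara_payload.output_entries:
    print("output: name={} path={}".format(entry.name, entry.path))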
Let’s look at the example below to find out how the name and path properties in the clara_payload entries object relate to the pipeline definition. Consider the pipeline definition:
api-version: 0.4.0
name: example-pipeline
operators:
- name: operator1
  container:
    image: clara/simple-operator
    tag: 0.1.0
  input:
  - path: /pipeline_input
  output:
  - name: segmentation-output
    path: /output
- name: operator2
  container:
    image: clara/another-simple-operator
    tag: 0.1.0
  input:
  - path: /input
    from: operator1
    name: segmentation-output
  output:
  - name: final-output
    path: /output
Inside the code of operator1 we would find:
- clara_payload.input_entries has only one entry, where name is payload, implying that this operator accepts the input to the pipeline as its input, and path is /pipeline_input as specified in the pipeline definition;
- clara_payload.output_entries has only one entry, where name is operator1/segmentation-output and path is /output.
In operator2 we would find:
- clara_payload.input_entries has only one entry, where name is operator1/segmentation-output, implying that this operator takes as input the segmentation-output of operator1, and path is /input;
- clara_payload.output_entries has only one entry, where name is operator2/final-output and path is /output.
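Assuming the pipeline definition above, the script of operator2 might select its directories by entry name as sketched below; the exact matching logic is a developer choice, and only the documented name and path properties are assumed.

# Runs inside operator2, where clara_payload is injected by the CPDriver.
input_dir = None
for entry in clara_payload.input_entries:
    if entry.name == "operator1/segmentation-output":
        input_dir = entry.path    # /input, as declared in the pipeline definition

output_dir = None
for entry in clara_payload.output_entries:
    if entry.name == "operator2/final-output":
        output_dir = entry.path   # /output, as declared in the pipeline definition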
11.2.4.2. Configuring Local Environment to Reproduce Operator Dependencies Expected in Deployment
The developer may update env.list to develop, test, and debug the operator in their local environment (that is, without a Clara deployment or pipeline). Given the inputs and outputs the operator is expected to have when deployed in a pipeline, the developer may update env.list to reflect them. Specifically, the developer may set:
- NVIDIA_CLARA_INPUTPATHS to reflect the input paths to this operator,
- NVIDIA_CLARA_OUTPUTPATHS to reflect the output paths from this operator.
As an example, if I expect two inputs:
- one is the input to the pipeline,
- one is an input from the output upstream-output of an upstream operator named upstream-op,
then I set
then I set
NVIDIA_CLARA_INPUTPATHS=payload:/input;upstream-op/upstream-output:/input1
Similarly, if I want my operator to output to two different outputs, say output-segmentation with path /output1 and output-original with path /output2, then I set
NVIDIA_CLARA_OUTPUTPATHS=my-operator/output-segmentation:/output1;my-operator/output-original:/output2
Note that here, while your operator’s pipeline-specific name my-operator must be included, you may choose to ignore it programmatically in main.py (e.g. by stripping the characters my-operator/) if you expect this name to change in the pipeline(s) where this operator will be deployed.
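If the developer wants to inspect or debug these variables directly (for example when running main.py bare-metal, outside the CPDriver), one way to parse them is sketched below; the name:path;name:path format follows the examples above, and the parse_entries helper is purely illustrative.

import os

def parse_entries(value):
    # Parse a "name:path;name:path" string into a {name: path} dictionary.
    entries = {}
    for item in filter(None, value.split(";")):
        name, _, path = item.partition(":")
        entries[name] = path
    return entries

inputs = parse_entries(os.environ.get("NVIDIA_CLARA_INPUTPATHS", ""))
outputs = parse_entries(os.environ.get("NVIDIA_CLARA_OUTPUTPATHS", ""))

# Optionally ignore the pipeline-specific operator name (e.g. "my-operator/")
# so the code keeps working if the operator is renamed in another pipeline.
outputs_by_short_name = {name.split("/")[-1]: path for name, path in outputs.items()}

print(inputs)                  # e.g. {'payload': '/input', 'upstream-op/upstream-output': '/input1'}
print(outputs_by_short_name)   # e.g. {'output-segmentation': '/output1', 'output-original': '/output2'}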
Additionally, the developer may update
- TRITON_MODEL_NAME
- TRITON_MODEL_VERSION
- TRITON_MODEL_INPUT
- TRITON_MODEL_OUTPUT
if they need the operator to perform inference via Triton on a locally available model. The variables set in env.list by default make use of the provided liver segmentation model.
The developer may then run the operator locally by following the same steps as above.
Some developers may have already developed their code and packaged it in containers; alternatively, some developers may only be provided with an inference container image. In such cases, the ODK allows the developer to extend the existing container image into a pipeline-deployable Clara operator by updating the Dockerfile. For instance, the Dockerfile required to build a Clara operator from an existing container may look something like
FROM my_inference_container:0.1.0
...
# install Clara-compatible Triton client
ARG TRITON_CLIENTS_URL=https://github.com/NVIDIA/tensorrt-inference-server/releases/download/v1.5.0/v1.5.0_ubuntu1804.clients.tar.gz
RUN mkdir -p /opt/nvidia/triton-clients \
&& curl -L ${TRITON_CLIENTS_URL} | tar xvz -C /opt/nvidia/triton-clients
RUN pip install --no-cache-dir future==0.18.2 grpcio==1.27.2 protobuf==3.11.3 /opt/nvidia/triton-clients/python/*.whl
# install operator requirements
RUN pip install --upgrade setuptools &&\
pip install -r requirements.txt
ENTRYPOINT ["bash", "-c", "python -u -m nvidia_clara_pipeline_driver.app my_script.py"]
Note that:
- You must include the Triton client installation steps in the Dockerfile if you want to use Triton for inference. This step can be omitted if the inference model is packaged in your container.
- You must always install the whl file in the lib/ directory, as it contains the wrapper code necessary for the Clara operator to function correctly. In the above example, requirements.txt includes the Clara Pipeline Driver (CPDriver) Python library.
- The ENTRYPOINT to your container must use the nvidia_clara_pipeline_driver.app Python module for the operator to function correctly. Specifically, if your main script is my_script.py then your container entry point should be
python -u -m nvidia_clara_pipeline_driver.app my_script.py
This section assumes that you have installed a Clara cluster which has access to the container image built using build.sh (by default the image is tagged clara/simple-operator:0.1.0). To install or upgrade Clara Deploy SDK, follow the steps outlined on the Installation page of the Clara Deploy User Guide.
11.2.6.1. Deploy the Model
Before deploying the operator, one must ensure that the inference model is deployed in Clara (if an inference model is used). In the case of the example liver segmentation model, the user must copy the contents under local/models to the Clara model repository directory, by default /clara/common/models, resulting in a directory structure such as
clara
└─ common
└─ models
└─ segmentation_liver_v1
├─ 1
│ └─ model.graphdef
└─ config.pbtxt
The ODK contains a TensorFlow GraphDef model; however, the user may deploy ONNX, PyTorch, and other formats supported by the Triton Inference Server, with an accompanying config.pbtxt specifying the model's deployment settings, inputs, and outputs.
11.2.6.2. Deploy the Operator
To deploy the provided single-operator example pipeline:
cd def
clara create pipelines -p one-operator-pipeline.yml
clara create will output a pipeline id, PIPELINE_ID, if deployed successfully.
Before trying to run a job from the newly deployed pipeline, the user must ensure that the container image built using build.sh is reachable from the Kubernetes cluster on which Clara is running. For instance, if the container image is my_inference_container:0.1, ensure either
- that this image is present in the container repository local to Clara (i.e. on the same machine), or
- that it is available in a public container repository reachable by the Kubernetes cluster where Clara is deployed (e.g. DockerHub).
To run the pipeline with the given data we must first create a job
cd ..
clara create job -p PIPELINE_ID -n liver-job -f local/input
The command will output a job id, JOB_ID.
To run the created job use
clara start job -j JOB_ID