11.2. Developing Clara Operators Using the Operator Development Kit
The operator development kit (ODK) contains the minimal set of assets for a user to:
- be able to develop a Clara operator from scratch, or
- to migrate an existing container image into a Clara operator.
As provided, the operator development kit is a functioning inference operator using the liver segmentation model from the Clara reference applications, but instead of following the Clara Train development model it uses a more simplistic development approach that is applicable to users that do not use Clara Train to train their models.
Download the ODK from NGC and unzip the package locally using
unzip app_operator.zip -d app_operator
The operator development package should contain the following:
└─ app_operator
   ├─ Dockerfile
   ├─ main.py
   ├─ build.sh
   ├─ run-local.sh
   ├─ run-triton.sh
   ├─ requirements.txt
   ├─ env.list
   ├─ def
   │  └─ one-operator-pipeline.yml
   └─ local
      ├─ input
      │  └─ liver_14.nii.gz
      ├─ output
      └─ models
         └─ segmentation_liver_v1
            ├─ 1
            │  └─ model.graphdef
            └─ config.pbtxt
- Dockerfile: builds the operator container. It can be used to build the operator container from scratch, or to package an existing custom inference container image as a Clara operator (please see Packaging Existing Container below).
- main.py: the Python script that is executed when the operator is instantiated. The script reads NIFTI files from an input folder, applies simple transformations to each file to prepare it for inference, uses the Triton client library to send an inference request to the liver segmentation model deployed in Triton, and writes the result of each inference to the output folder.
- build.sh: a Bash script that initiates the docker build.
- run-local.sh: a Bash script that runs the operator locally as a stand-alone container. For run-local.sh to run successfully, the user must first build the operator container image using build.sh and run run-triton.sh to make the model available to the operator for inference.
- run-triton.sh: a Bash script that starts a local containerized Triton server.
- requirements.txt: lists all the libraries needed for main.py to run.
- env.list: lists all the environment variable names required to run the Clara operator locally (bare-metal or containerized) for development and debugging purposes.
- def: contains the operator and pipeline (one-operator-pipeline.yml) definitions necessary to deploy the operator in Clara. Note that Clara accepts only the deployment of pipelines, not operators alone; therefore ensure that the operator image is available in a container repository reachable by Clara at runtime.
- local: a directory containing the model artifacts necessary to run main.py locally as distributed. run-triton.sh loads the local/models folder into the Triton container to make the model available for inference. run-local.sh loads the local/input folder into the operator container so that the operator can read liver_14.nii.gz and send it for inference.
Before going into the development details and learning how to modify the ODK, let us first try to run the example liver inference out of the box.
To run the inference model locally, make sure to install Docker CE and NVIDIA Docker 2 so that the model can be served on a GPU. You may follow the steps here.
To run the packaged liver inference model:
1. Build the Clara example inference operator using
   ./build.sh
2. Start the Triton inference server using
   ./run-triton.sh
3. Run the Clara operator locally using
   ./run-local.sh
Once the inference completes, check the output of the inference in the local/output directory.
The ODK can be used as a starting point for the development of custom Clara operators or the migration of existing container images into Clara operators. This section explains the key components of the ODK used to build a custom Clara operator (with inference).
11.2.4.1. Clara Operator-specific Variables in Python
main.py is the functional part of the code that the developer may customize for their own purposes. As provided, main.py:
- reads the input data from the input payload, which in this case is assumed to be an input directory path containing one or more files,
- performs pre-inference transformations to the input data,
- uses the Triton Python client to perform inference via the model deployed in Triton Inference Server (see Section “Deploy the Model” below),
- performs post-inference transformations to the inference result,
- and finally writes the output data to the output payload.
Some of the above steps make use of Clara-specific objects, namely clara_payload, a global variable introduced by the Clara Pipeline Driver Python library at runtime. The operator developer need not use this variable, but they may choose to use it when they expect the operator to be reused multiple times in a pipeline while executing different branches of the code.
Specifically, clara_payload has two properties:
- input_entries, which lists all the inputs to this operator,
- output_entries, which lists all the outputs from this operator.
One may iterate over clara_payload.input_entries (or conversely clara_payload.output_entries) and check the name and path properties of each entry object. In the case of an input entry:
- name will be of the form upstream-operatorK-name/outputN-name, which implies that this input is coming from the output named outputN-name of an operator named upstream-operatorK-name,
- path will contain the input path of the entry upstream-operatorK-name/outputN-name.
Note: if the name of an input entry is payload, the entry corresponds to the input of the pipeline itself rather than to the output of an upstream operator.
Similarly, one may iterate over clara_payload.output_entries and check:
- name, which is of the form my-operator-name/my-outputM-name,
- path, which is the output folder of the entry my-operator-name/my-outputM-name.
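As an illustration, a minimal sketch of such an iteration could look like the following (clara_payload is the global described above; the print statements are only for demonstration):

for entry in clara_payload.input_entries:
    # e.g. name is "payload" or "upstream-operatorK-name/outputN-name"
    print("input name: %s, path: %s" % (entry.name, entry.path))
for entry in clara_payload.output_entries:
    # e.g. name is "my-operator-name/my-outputM-name"
    print("output name: %s, path: %s" % (entry.name, entry.path))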
Let’s look at the example below to find out how the name and path properties of the clara_payload entries relate to the pipeline definition. Consider the pipeline definition:
api-version: 0.4.0
name: example-pipeline
operators:
- name: operator1
  container:
    image: clara/simple-operator
    tag: 0.1.0
  input:
  - path: /pipeline_input
  output:
  - name: segmentation-output
    path: /output
- name: operator2
  container:
    image: clara/another-simple-operator
    tag: 0.1.0
  input:
  - path: /input
    from: operator1
    name: segmentation-output
  output:
  - name: final-output
    path: /output
Inside the code of operator1 we would find:
- clara_payload.input_entries has only one entry, where name is payload, implying that this operator accepts the input to the pipeline as its input, and path is /pipeline_input as specified in the pipeline definition;
- clara_payload.output_entries has only one entry, where name is operator1/segmentation-output and path is /output.
In operator2 we would find:
- clara_payload.input_entries has only one entry, where name is operator1/segmentation-output, implying that this operator takes as input the segmentation-output of operator1, and path is /input;
- clara_payload.output_entries has only one entry, where name is operator2/final-output and path is /output.
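For example, inside operator2 the developer could select the input and output directories by entry name with a sketch along these lines (the variable names are illustrative):

input_dir = None
output_dir = None
for entry in clara_payload.input_entries:
    if entry.name == "operator1/segmentation-output":
        input_dir = entry.path   # "/input" in the pipeline above
for entry in clara_payload.output_entries:
    if entry.name == "operator2/final-output":
        output_dir = entry.path  # "/output" in the pipeline above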
11.2.4.2. Configuring Local Environment to Reproduce Operator Dependencies Expected in Deployment
The developer may update env.list to develop, test, and debug the operator in their local environment (that is, without a Clara deployment or pipeline). Assuming the developer is given the expected inputs and outputs of the operator when deployed in the pipeline, the developer may update env.list to reflect the expected inputs and outputs. Specifically, the developer may set:
- NVIDIA_CLARA_INPUTPATHS to reflect the input paths to this operator,
- NVIDIA_CLARA_OUTPUTPATHS to reflect the output paths from this operator.
As an example, if I expect two inputs,
- one being the input to the pipeline, and
- one being an input from the output upstream-output of an upstream operator named upstream-op,
then I set
NVIDIA_CLARA_INPUTPATHS=payload:/input;upstream-op/upstream-output:/input1
Similarly, if I want my operator to write to two different outputs, say output-segmentation with path /output1 and output-original with path /output2, then I set
NVIDIA_CLARA_OUTPUTPATHS=my-operator/output-segmentation:/output1;my-operator/output-original:/output2
Note that here, while your operator’s pipeline-specific name my-operator must be included, you may choose to ignore it programmatically in main.py (e.g. by stripping the characters my-operator/) if you expect this name to change in the pipeline(s) where this operator will be deployed.
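One minimal way to ignore the pipeline-specific operator name programmatically is to keep only the part of the entry name after the first slash, for example (illustrative sketch only):

for entry in clara_payload.output_entries:
    # "my-operator/output-segmentation" becomes "output-segmentation"
    output_name = entry.name.split("/", 1)[-1]
    if output_name == "output-segmentation":
        segmentation_dir = entry.path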
Additionally, the developer may update
- TRITON_MODEL_NAME
- TRITON_MODEL_VERSION
- TRITON_MODEL_INPUT
- TRITON_MODEL_OUTPUT
if they need the operator to perform inference via Triton on a locally available model. By default, the variables set in env.list make use of the provided liver segmentation model.
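For instance, an env.list tailored to the packaged liver segmentation model might contain Triton entries along the following lines (the input and output tensor names below are placeholders and must match those declared in the model’s config.pbtxt):

# hypothetical values for illustration; check config.pbtxt for the real tensor names
TRITON_MODEL_NAME=segmentation_liver_v1
TRITON_MODEL_VERSION=1
TRITON_MODEL_INPUT=input_tensor_name
TRITON_MODEL_OUTPUT=output_tensor_name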
The developer may then run the operator locally by following the same steps as above.
11.2.5. Packaging Existing Container
Some developers may have already developed their code and packaged it in a container, or alternatively may only be provided with an inference container image. In such cases, the ODK allows the developer to extend the existing container image into a pipeline-deployable Clara operator by updating the Dockerfile. For instance, the Dockerfile required to build a Clara operator from an existing container may look something like:
FROM my_inference_container:0.1.0
...
# install Clara-compatible Triton client
ARG TRITON_CLIENTS_URL=https://github.com/NVIDIA/tensorrt-inference-server/releases/download/v1.5.0/v1.5.0_ubuntu1804.clients.tar.gz
RUN mkdir -p /opt/nvidia/triton-clients \
 && curl -L ${TRITON_CLIENTS_URL} | tar xvz -C /opt/nvidia/triton-clients
RUN pip install --no-cache-dir future==0.18.2 grpcio==1.27.2 protobuf==3.11.3 /opt/nvidia/triton-clients/python/*.whl
# install operator requirements
RUN pip install --upgrade setuptools && \
    pip install -r requirements.txt
ENTRYPOINT ["bash", "-c", "python -u -m nvidia_clara_pipeline_driver.app my_script.py"]
Note that:
- You must include the Triton client installation steps in the script if you want to use Triton for inference. This step can be omitted if the inference model is packaged in your container.
- You must always install the whl file in the lib/ directory, as it contains the wrapper code necessary for the Clara operator to function correctly. In the above example, requirements.txt includes the Clara pipeline driver (CPDriver) Python library.
- The ENTRYPOINT of your container must use the nvidia_clara_pipeline_driver.app Python module for the operator to function correctly. Specifically, if your main script is my_script.py then your container entry-point should be
python -u -m nvidia_clara_pipeline_driver.app my_script.py
This section assumes that you have installed a Clara cluster which has access to the container image built using build.sh (by default the image is tagged clara/simple-operator:0.1.0). To install or upgrade the Clara Deploy SDK, follow the steps outlined on the Installation page of the Clara Deploy User Guide.
11.2.6.1. Deploy the Model
Before deploying the operator, one must ensure that the inference model is deployed in Clara (if an inference model is used). In the case of the example liver segmentation model, the user must copy the contents under local/models to the Clara model repository directory, by default /clara/common/models, resulting in a directory structure such as:
clara
└─ common
   └─ models
      └─ segmentation_liver_v1
         ├─ 1
         │  └─ model.graphdef
         └─ config.pbtxt
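For example, assuming the default model repository location and running from the ODK root directory, the copy can be done with (sudo may be required depending on the permissions of /clara/common/models):
cp -r local/models/segmentation_liver_v1 /clara/common/models/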
The ODK contains a TensorFlow GraphDef model; however, the user may deploy ONNX, PyTorch, and other formats supported by the Triton Inference Server, with an accompanying config.pbtxt specifying the model’s deployment settings, inputs, and outputs.
11.2.6.2. Deploy the Operator
To deploy the provided single-operator example pipeline:
cd def
clara create pipelines -p one-operator-pipeline.yml
If the pipeline is deployed successfully, clara create will output a pipeline id, PIPELINE_ID.
Before trying to run a job from the newly deployed pipeline, the user must ensure that the container image built using build.sh is reachable from the Kubernetes cluster on which Clara is running. For instance, if the container image is my_inference_container:0.1, ensure either
- that this image is present in the container repository local to Clara (i.e. on the same machine), or
- that it is available in a public container repository reachable by the Kubernetes cluster where Clara is deployed (e.g. DockerHub).
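As an illustration, pushing the image to a public registry might look like the following (the registry account and repository names are placeholders):
docker tag my_inference_container:0.1 my-dockerhub-user/my_inference_container:0.1
docker push my-dockerhub-user/my_inference_container:0.1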
To run the pipeline with the given data, we must first create a job:
cd ..
clara create job -p PIPELINE_ID -n liver-job -f local/input
The command will output a job id, JOB_ID.
To run the created job use
clara start job -j JOB_ID