11.1. Developing Clara Pipeline Operators Using the Operator Development Kit
The Operator Development Kit (ODK) contains the minimal set of assets a user needs to:
- develop a Clara pipeline operator from scratch, or
- migrate an existing container image into a Clara pipeline operator.
As provided, the ODK is a functioning inference operator using a COVID-19 Classification model.
Download the ODK from NGC and unzip the package locally. Note: You can download the file with a guest account or using your NGC account.
Once you have downloaded the ODK asset, unzip the file into a directory:
unzip app_operator.zip -d app_operator
The operator development package should contain the following.
└─ app_operator
├─ Dockerfile
├─ main.py
├─ build.sh
├─ run-local.sh
├─ run-triton.sh
├─ requirements.txt
├─ pipelines
| └─ one-operator-pipeline.yml
└─ local
├─ input
| ├─ volume-covid19-A-0000.nii.gz
| ├─ volume-covid19-A-0001.nii.gz
| └─ volume-covid19-A-0010.nii.gz
├─ output
└─ models
└─ covid
├─ 1
│ └─ model.pt
└─ config.pbtxt
- Dockerfile builds the operator container. It can be used to build the operator container from scratch, or to package an existing custom inference container image as a Clara pipeline operator (see Packaging an existing container to run in Clara below).
- main.py is the Python script that is executed when the operator is instantiated. The script reads NIfTI files from an input folder, applies simple transformations to each file to prepare it for inference, uses the Triton client library to send inference requests to the COVID-19 classification model deployed in Triton, and writes the result of each inference request to the output folder.
- build.sh is a Bash script that packages the application into a container image.
- run-local.sh is a Bash script that runs the containerized operator locally (stand-alone). For run-local.sh to run successfully, the user must first build the operator's container image using build.sh and run run-triton.sh to make the model available to the operator for inference.
- run-triton.sh is a Bash script that starts a local containerized Triton Inference Server (an illustrative sketch of such a script is shown after this list).
- requirements.txt lists all the libraries needed for main.py to run.
- pipelines contains the operator and pipeline definitions (one-operator-pipeline.yml) necessary to deploy the pipeline and operator in Clara Deploy. Note that Clara Deploy only requires the deployment of pipelines and will pull any necessary container images at runtime. Therefore, ensure any required container images have been pulled locally or are available for Clara Deploy to pull at runtime. When pulling a container image requires authentication (i.e., a logon), it is recommended that those images be pulled before attempting to create a Clara pipeline job that requires them.
- local is a directory containing the data and model artifacts necessary to run main.py locally in the state it is distributed. run-triton.sh mounts the local/models folder into the Triton Inference Server container, making the models available for inference. run-local.sh mounts the local/input folder into the operator container so that the operator can read the *.nii.gz files and send them for inference.
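A minimal sketch of what a script like run-triton.sh typically does is shown below; the Triton image tag and port mappings are assumptions, and the actual script shipped with the ODK may differ:

# Start a local Triton Inference Server with the ODK models mounted
# as the model repository (image tag is illustrative).
docker run --rm -d --gpus=1 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/local/models:/models \
  nvcr.io/nvidia/tritonserver:21.06-py3 \
  tritonserver --model-repository=/models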
Before going into the development details and learning how to modify the ODK, let us first try to run the example COVID-19 inference out of the box.
To run the inference model locally, make sure Docker CE and NVIDIA Docker 2 are installed so that the model can be served on a GPU. You may follow the installation steps in the Docker and NVIDIA documentation.
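To quickly verify that Docker can access the GPU, you can run nvidia-smi inside a CUDA base container; the image tag below is only illustrative:

docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi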
To run the packaged COVID-19 inference model:
Build the Clara example inference operator:
./build.sh
Start the Triton Inference Server:
./run-triton.sh
Run the Clara operator locally:
./run-local.sh
Once the inference completes, check the output of the inference in the local/output directory.
output
├── volume-covid19-A-0000.nii_result.txt
├── volume-covid19-A-0001.nii_result.txt
└── volume-covid19-A-0010.nii_result.txt
Each file contains either a positive or negative result for COVID-19.
volume-covid19-A-0010.nii_result.txt --> COVID Negative
volume-covid19-A-0000.nii_result.txt --> COVID Positive
volume-covid19-A-0001.nii_result.txt --> COVID Negative
11.1.3. Customizing the Operator Development Kit Example Code
The ODK can be used as a starting point for the development of custom Clara operators or the migration of existing container images into Clara operators. This section explains the focal components of the ODK when building a custom Clara operator (with inference).
11.1.3.1. Clara Operator-specific Variables in Python
main.py is the functional part of the code that the developer may customize for their own purposes. As provided, main.py is an example of how to structure the code with the necessary components. Because Clara expects certain components in the pipeline definition, be sure to address these key components during development:
- Environment variables
- Inputs
- Pre-transformations
- Inference
- Post-transformations
- Outputs
The following describes the components used in the operator development kit sample application.
The input component reads the input data from the input payload, which in this case is assumed to be an input directory path, defined by OPERATOR_INPUT_PATH, containing one or more files:
input_path = os.getenv('OPERATOR_INPUT_PATH', '/input')
The pre-transformation component performs pre-inference transformations on the input data. This component is optional and model dependent.
pre_transforms = Compose([
    LoadImage(reader="NibabelReader", image_only=True, dtype=np.float32),
    AddChannel(),
    ScaleIntensityRange(a_min=-1000, a_max=500, b_min=0.0, b_max=1.0, clip=True),
    CropForeground(margin=5),
    Resize([192, 192, 64], mode="area"),
])
The inference component uses the Triton Python client to perform inference via the model deployed in the Triton Inference Server (see Section "Deploy the Model" below):
inference_ctx = initialize_inference_server()
inference_response = inference_ctx.run(
    {model_input_label: (inference_image,)},
    {model_output_label: InferContext.ResultFormat.RAW},
    batch_size=1)
The post-transformation component performs post-inference transformations on the inference result:
post_transforms = Compose([
    ToTensor(),
    Activations(sigmoid=True),
    AsDiscrete(threshold_values=True, logit_thresh=0.5),
    ToNumpy(),
])
The output component finally writes the output data to the path defined by OPERATOR_OUTPUT_PATH:
output_path = os.getenv('OPERATOR_OUTPUT_PATH', '/output')
The full code can be found in main.py.
This section assumes that you have installed a Clara cluster with access to the container image built using build.sh (by default the image is tagged clara/simple-operator:0.1.0). To install or upgrade the Clara Deploy SDK, follow the steps outlined on the Installation page of the Clara Deploy User Guide.
11.1.4.1. Deploy the Model
Before deploying the operator, ensure that the inference model is deployed in Clara (if an inference model is used). In the case of the example COVID-19 classification model, the user must copy the contents under local/models to the Clara model repository directory, by default /clara/common/models, resulting in a directory structure such as the following (an example copy command is shown after the listing):
clara
└─ common
└─ models
└─ covid
├─ 1
│ └─ model.pt
└─ config.pbtxt
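For example, assuming the default model repository location and sufficient permissions (sudo may not be needed on your system), the copy could be done as follows:

sudo mkdir -p /clara/common/models
sudo cp -r local/models/covid /clara/common/models/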
The ODK contains a PyTorch model; however, the user may deploy ONNX, TensorFlow GraphDef, TensorFlow SavedModel, TensorRT, and other formats supported by the Triton Inference Server, with an accompanying config.pbtxt for the model.
11.1.4.2. Deploy the Operator
To deploy the provided single-operator example pipeline, start in the top-level ODK folder.
cd pipelines
clara create pipelines -p one-operator-pipeline.yml
clara create will output a pipeline ID, <pipeline-id>, if the pipeline is deployed successfully.
Before trying to run a job from the newly deployed pipeline, the user must ensure that the container image built using build.sh is reachable from the Kubernetes cluster on which Clara is running. For instance, if the container image is my_inference_container:0.1, ensure either:
- that the image is present in the container repository local to Clara (i.e., on the same machine), or
- that the image is available in a public container repository reachable by the Kubernetes cluster where Clara is deployed (e.g., Docker Hub); see the example push commands after this list.
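For the second option, a minimal sketch of publishing the image to a registry is shown below; the registry host my-registry.example.com is hypothetical and should be replaced with a registry that your cluster can reach:

docker tag clara/simple-operator:0.1.0 my-registry.example.com/clara/simple-operator:0.1.0
docker push my-registry.example.com/clara/simple-operator:0.1.0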
To run the pipeline with the given data, we must first create a job:
cd ..
clara create job -p <pipeline-id> -n covid-class -f local/input
The output will be a job ID, <job-id>. It is important to provide the input option -f here, as it takes the local input data and uploads it to the location where the Clara operator expects its input data.
JOB_ID: <job-id>
PAYLOAD_ID: <payload-id>
Payload uploaded successfully.
To run the created job, use:
clara start job -j <job-id>
To check for a successful job, use:
clara list jobs
Look for the <job-id> and the corresponding <payload-id> for the job.
Inspect the outputs by changing into the corresponding <payload-id> folder, which by default Clara stores under the defined payloads directory:
cd /clara/payloads/<payload-id>
The classification results are stored as text files corresponding to each of the inputs, as shown below.
├── input
│ ├── volume-covid19-A-0000.nii.gz
│ ├── volume-covid19-A-0001.nii.gz
│ └── volume-covid19-A-0010.nii.gz
├── NVIDIA
│ └── Clara
└── operators
└── simple-operator
└── classification-output
├── volume-covid19-A-0000.nii_result.txt
├── volume-covid19-A-0001.nii_result.txt
└── volume-covid19-A-0010.nii_result.txt
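To view the results from within the payload folder, you can print the contents of the output files (the path below follows the structure shown above):

cat operators/simple-operator/classification-output/*_result.txt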
To package a custom inference container as a Clara operator, you may update the Dockerfile and the build.sh script. Packaging your own code to run in a Clara operator requires you to define and install environment requirements, an ENTRYPOINT, and any optional dependencies for your inference application.
For example, the Dockerfile to package an existing container may look something like:
FROM my_inference_container:0.1.0
...
# install Clara-compatible Triton client
ARG TRITON_CLIENTS_URL=https://github.com/triton-inference-server/server/releases/download/v2.11.0/v2.11.0_ubuntu2004.clients.tar.gz
RUN mkdir -p /opt/nvidia/triton-clients \
&& curl -L ${TRITON_CLIENTS_URL} | tar xvz -C /opt/nvidia/triton-clients
RUN pip install --no-cache-dir future==0.18.2 grpcio==1.27.2 protobuf==3.11.3 /opt/nvidia/triton-clients/python/*.whl
# install operator requirements
RUN pip install --upgrade setuptools &&\
pip install -r requirements.txt
ENTRYPOINT ["bash", "-c", "python -u my_script.py"]
Note that you must include the Triton client installation steps in the Dockerfile if you want to use Triton for inference. This step can be omitted if the inference model is packaged inside your container.
Similarly, use the build.sh script to package the container. Simply pass the fully qualified name and tag of the desired container to the script.
./build.sh custom_inference_container:1.0.0
Note: By default, the script will create an operator named clara/simple-operator:1.0.0.
Once the container is successfully created, go back to Deploy the Operator and follow the same steps using your custom container.