Step #2: Start the Triton Inference Server

As noted previously, the Triton Inference Server Kubernetes deployment object has already been created on the cluster as part of the Helm chart installation. Now that we have saved the model in Step #1 of the lab (the Jupyter notebook), let's start the Triton Inference Server pod to deploy the model. Also, since we only have a single GPU to work with in this environment, we first need to scale down the training Jupyter notebook pod.

Using the System Console link on the left-hand navigation pane, open the System Console. You will use it to start the Triton Inference Server pod.

[Figure: image-classification-nav.png — System Console link in the left-hand navigation pane]

Using the commands below, scale down the Jupyter pod.

oc scale deployments image-classification --replicas=0 -n classification-namespace
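
Before continuing, you can optionally confirm that the notebook pod has terminated by listing the pods in the namespace; this is just a sanity check and not required by the lab.

oc get pods -n classification-namespace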

Wait a few seconds, then scale up the Triton Inference Server pod using the following command.

oc scale deployments image-classification-tritonserver-image-classification --replicas=1 -n classification-namespace

Keep checking the status of the Triton Inference Server pod using the command below. Only proceed to the next step once the pod is in a Running state. It might take a few minutes to pull the Triton Inference Server container from NGC.

oc get pods -A | grep triton
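
Alternatively, instead of re-running the command manually, you can watch the pods in the namespace until the Triton Inference Server pod reaches the Running state (press Ctrl+C to stop watching).

oc get pods -n classification-namespace -w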

Once the pod is in a Running state, you can check the logs by running the command below.

oc logs <name_of_the_triton_pod_from_previous_command> -n classification-namespace

Within the console output, notice that the Triton model repository contains the mobilenet_classifier model saved in Step #1 (the Jupyter notebook) and that its status is Ready.
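
If the log output is long, one way to locate the relevant line is to filter it for the model name, for example:

oc logs <name_of_the_triton_pod_from_previous_command> -n classification-namespace | grep mobilenet_classifier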
