Step #2: Starting the Triton Inference Server
The Triton Inference Server Kubernetes Deployment object has already been created in the cluster, but it does not yet have access to the GPU. Now that the model has been saved in Step #1 of the lab (the Jupyter notebook), let's start the Triton Inference Server pod to deploy that model. Training required the GPU, but since training is complete and the model is saved, the GPU can be reallocated to Triton. The first step is to scale down the training Jupyter notebook pod.
Using the System Console link in the left navigation pane, open the System Console. You will use it to start the Triton Inference Server pod.
Using the command below, scale down the Jupyter pod.
kubectl scale deployments fraud-jupyter-notebooks -n fraud-detection --replicas=0
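To confirm the notebook pod has terminated before continuing, you can list the pods in the namespace. The grep pattern below assumes the pod name contains "jupyter", which matches the deployment name above; the pod should disappear from the output (it may briefly show a Terminating status).
kubectl get pods -n fraud-detection | grep jupyter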
Wait a few seconds, then scale up the Triton Inference Server pod using the following command.
kubectl scale deployments triton-server -n fraud-detection --replicas=1
Keep checking the status of the Triton Inference Server pod using the command below. Only proceed to the next step once the pod is in a Running state. It might take a few minutes to pull the Triton Inference Server container from NGC.
kubectl get pods -n fraud-detection | grep triton
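If you prefer to block until the pod is ready rather than polling by hand, kubectl wait can do this. The command below is a sketch: it assumes the Triton pods carry an app=triton-server label; adjust the selector to match whatever labels your deployment actually applies.
kubectl wait --for=condition=Ready pod -l app=triton-server -n fraud-detection --timeout=600s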
Once the pod is in a Running state, you can check its logs by running the command below.
kubectl logs -n fraud-detection name_of_the_triton_pod_from_previous_command
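The startup log is fairly long. To jump straight to the model-status lines, you can filter the output, substituting the actual pod name from the previous command:
kubectl logs -n fraud-detection name_of_the_triton_pod_from_previous_command | grep -i mobilenet_classifier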
Within the console output, notice that the Triton model repository contains the mobilenet_classifier model saved from the Jupyter notebook in Step #1, and that its status is Ready.
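As an optional extra check, Triton also exposes HTTP readiness endpoints (GET /v2/health/ready for the server and GET /v2/models/<model_name>/ready for a specific model). The commands below are a sketch that assumes Triton is listening on its default HTTP port 8000 and that port-forwarding to the triton-server deployment is allowed in the lab environment.
kubectl port-forward -n fraud-detection deployment/triton-server 8000:8000 &
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/mobilenet_classifier/ready
Both requests return HTTP 200 once the server and the model are ready. Stop the background port-forward (for example, with kill %1) when you are done.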