AI Practitioner

Natural Language Processing with Triton Inference Server (0.1.0)

With AI in the Data Center, IT Administrators typically create VMs that are accessed by either AI Practitioners (for model training and creation) or DevOps Engineers (for model deployment). Please refer to the User Persona section for further information. If your IT Administrator would like details on how to create VM templates, refer to the IT Administrator section of this guide.

Once the IT Administrator has successfully created the VM, the AI Practitioner can start using it immediately, since the software stack is pre-installed and the Jupyter notebook server is already running.

The following steps are executed inside the AI Training VM:

  • Train the BERT Question-Answering (QA) model.

  • Export model to Triton Inference Server format.

  • Optional - Convert the model to TensorRT.

In your web browser, open a tab with the following URL to access the Jupyter server instance. The IT Administrator will provide the VM IP address to the AI Practitioner.

For example: http://<VM_IP>:8888

Since the VM is based on a template containing the BERT container from NVIDIA NGC, a sample Jupyter Notebook is also provided within the VM, which can be quickly leveraged to perform BERT QA model training. Within this Notebook, the AI Practitioner will take the pre-trained model and fine-tune it on a much smaller dataset to reach the accuracy needed for the customer-specific use case.

  • Open and run through the BERT fine-tuning Jupyter Notebook located in the $HOME/notebooks directory to fine-tune a BERT model on the SQuAD dataset.

  • Open a terminal from the Jupyter Notebook interface and download the SQuAD dataset with the command below:


    python3 /workspace/bert/data/bertPrep.py --action download --dataset squad


Linux bash commands can be run inside a Jupyter Notebook by placing an exclamation mark (!) before the command in the Notebook cell. This is shown in the notebook linked above and is used to download the pre-trained BERT model for fine-tuning.
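As a sketch of what the ! prefix does, the same shell invocation can be reproduced in plain Python with the standard subprocess module; the echo command here is only a stand-in for illustration:

```python
import subprocess

# In a Jupyter cell, "!<command>" runs <command> in a shell.
# Outside a notebook, the equivalent is subprocess.run:
result = subprocess.run(
    ["echo", "hello from the shell"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # prints the command's stdout
```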

Now we will export the trained model to a format Triton uses. Triton Inference Server can deploy models trained in TensorFlow, PyTorch, ONNX, and TensorRT. For this guide, we will first save the TensorFlow model and then, in the upcoming step, convert it to TensorRT for the best performance.


This step can be skipped if you already have a trained model in Triton format.

For TensorFlow SavedModels, Triton requires the model to be exported to a directory with the following layout:


    <model-repository-path>/
      <model-name>/
        config.pbtxt
        1/
          model.savedmodel/
            <saved-model files>
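As a hedged sketch, the layout above can be created with a few lines of Python; the repository path and model name used here (/tmp/triton_repo, bert) are placeholders for illustration, not values mandated by this guide:

```python
from pathlib import Path

# Hypothetical repository path and model name, for illustration only.
repo = Path("/tmp/triton_repo")
model = repo / "bert"

# Version subdirectory "1" holds the TensorFlow SavedModel directory.
saved_model_dir = model / "1" / "model.savedmodel"
saved_model_dir.mkdir(parents=True, exist_ok=True)

# Empty placeholder; Triton reads model settings from config.pbtxt.
(model / "config.pbtxt").touch()

# Show the resulting layout relative to the repository root.
print(sorted(p.relative_to(repo).as_posix() for p in model.rglob("*")))
```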

Refer to the Triton Inference Server Model Repository documentation for more information on the model repository format for ONNX, TensorRT, and PyTorch.
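For reference, a minimal config.pbtxt for a TensorFlow SavedModel might look like the sketch below; the model name, tensor names, and dimensions are illustrative assumptions, not values taken from this guide:

```
name: "bert"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {
    name: "input_ids"        # illustrative tensor name
    data_type: TYPE_INT32
    dims: [ 384 ]
  }
]
output [
  {
    name: "logits"           # illustrative tensor name
    data_type: TYPE_FP32
    dims: [ 384, 2 ]
  }
]
```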

The steps below show the process of exporting a TensorFlow checkpoint to the directory format shown above. If you already have your model saved in this format from the NVIDIA NGC Catalog, you can skip the section below.

  1. Create a bert_dllog.json file.


    mkdir /results
    touch /results/bert_dllog.json

  2. Export the Triton model.


    python run_squad.py \
        --vocab_file=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/vocab.txt \
        --bert_config_file=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/bert_config.json \
        --init_checkpoint=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/model.ckpt \
        --max_seq_length=384 \
        --doc_stride=128 \
        --predict_batch_size=8 \
        --output_dir=/triton \
        --export_triton=True \
        --amp \
        --use_xla

  3. After running the export Python script, verify that the model repository directory structure shown above was created inside the VM.



    As part of the AI Practitioner workflow, ensure the files are inside the VM in the $HOME/triton directory. The Triton Inference Server VM needs to be restarted for the new model to load.
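Before restarting the server, a quick sanity check of the repository layout can catch a missing config.pbtxt or version directory. This sketch assumes the directory layout described above; it is not part of the official Triton tooling:

```python
import os

def validate_model_repo(repo_path):
    """Check that every model directory has a config.pbtxt and at
    least one numeric version directory (e.g. 1/model.savedmodel)."""
    problems = []
    for model in sorted(os.listdir(repo_path)):
        model_dir = os.path.join(repo_path, model)
        if not os.path.isdir(model_dir):
            continue
        if not os.path.isfile(os.path.join(model_dir, "config.pbtxt")):
            problems.append(f"{model}: missing config.pbtxt")
        versions = [d for d in os.listdir(model_dir)
                    if d.isdigit() and os.path.isdir(os.path.join(model_dir, d))]
        if not versions:
            problems.append(f"{model}: no numeric version directory")
    return problems

# Example with a hypothetical repository path:
# print(validate_model_repo(os.path.expanduser("~/triton")))
```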

Depending on its size, an Enterprise may have both IT Administrators and DevOps Engineers, or only one of these roles. For Enterprises without DevOps Engineers, the AI Practitioner may need to proceed with the following DevOps section in order to deploy the model to Triton Inference Server. In that case, please ensure your IT Administrator has created the Triton Inference Server VM using the instructions within the IT Administrator section.


For large scale production inference deployments, please refer to the Appendix – Scaling Triton Inference Server.

© Copyright 2024, NVIDIA. Last updated on Apr 2, 2024.