AI Practitioner - NVIDIA Docs

With AI in the Data Center, IT Administrators typically create VMs that are accessed by either AI Practitioners (for model training and creation) or DevOps Engineers (for model deployment). Refer to the User Persona section for further information. If your IT Administrator would like details on how to create VM templates, refer to the IT Administrator section of this guide.

Once the IT Administrator has created a VM, the AI Practitioner can start using it immediately, since the software stack is pre-installed and the Jupyter notebook server is already running.

The following steps are executed inside the AI Training VM:

Train the BERT Question-Answering (QA) model.
Export model to Triton Inference Server format.
Optional - Convert the model to TensorRT.

Train the BERT QA Model

In your web browser, open a new tab and go to the following URL to reach the Jupyter server instance. The IT Administrator provides the VM IP address to the AI Practitioner.

For example: http://<VM_IP>:8888
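Before opening the browser tab, it can help to confirm that the Jupyter port answers at all. The sketch below is only an illustration: the helper names `jupyter_url` and `port_is_open` are hypothetical, the address used in the demo is a placeholder, and port 8888 is taken from the example URL above.

```python
import socket


def jupyter_url(vm_ip: str, port: int = 8888) -> str:
    """Build the Jupyter URL for the VM (hypothetical helper)."""
    return f"http://{vm_ip}:{port}"


def port_is_open(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    vm_ip = "127.0.0.1"  # placeholder; use the IP provided by your IT Administrator
    print(jupyter_url(vm_ip))
    print("port reachable:", port_is_open(vm_ip))
```

An open port only shows that something is listening; it does not confirm that the Jupyter server itself is healthy.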

Because the VM is based on a template containing the BERT container from NVIDIA NGC, a sample Jupyter Notebook for BERT QA model training is also provided within the VM. Within this notebook, the AI Practitioner takes a pre-trained model and fine-tunes it on a much smaller dataset to reach the accuracy needed for the customer-specific use case.

Open and run through the BERT fine-tuning Jupyter Notebook, located in the $HOME/notebooks directory of your Jupyter notebook container, to fine-tune a BERT model on the SQuAD dataset.

Open a terminal in Jupyter Notebook and download the SQuAD dataset with the command below:

    python3 /workspace/bert/data/bertPrep.py --action download --dataset squad

Note

Linux bash commands can be run inside a Jupyter Notebook cell by placing an exclamation mark (!) before the command. This is shown in the notebook referenced above, where it is used to download the pre-trained BERT model for fine-tuning.
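As an illustration of the note above, the sketch below shows the notebook form in a comment and a plain-Python equivalent built on the standard `subprocess` module. The helper name `run_shell` and the `echo` command are hypothetical examples, not part of the provided notebook.

```python
import subprocess

# In a Jupyter Notebook cell, the shell form would be written as:
#   !echo "hello from the shell"
# The helper below runs the same kind of command from ordinary Python code.


def run_shell(command: str) -> str:
    """Run a shell command and return its standard output as text."""
    completed = subprocess.run(
        command,
        shell=True,           # let the shell parse the command string
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
    )
    return completed.stdout


if __name__ == "__main__":
    print(run_shell('echo "hello from the shell"'))
```

The `!` form is more convenient for interactive work; the `subprocess` form is mainly useful when the output needs to be captured or checked in code.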

Export Model to Triton Inference Server Format

Next, export the trained model to a format that Triton uses. Triton Inference Server can deploy models from TensorFlow, PyTorch, ONNX, and TensorRT. In this guide, we first save the TensorFlow model and then, in the upcoming step, convert the model to TensorRT for the best performance.

Note

This step can be skipped if you already have a trained model in Triton format.

For TensorFlow saved models, Triton requires the model to be exported to a directory in the following format:

    <model-repository-path>/
      <model-name>/
        config.pbtxt
        1/
          model.savedmodel/
            <saved-model files>
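The layout above can be checked with a few lines of Python before handing the model to Triton. This is a minimal sketch, not an official validator: the function name `find_layout_problems` is hypothetical, and it assumes the TensorFlow SavedModel directory is named `model.savedmodel`, which is the default name Triton documents for that backend.

```python
import tempfile
from pathlib import Path
from typing import List


def find_layout_problems(repo: Path, model_name: str, version: str = "1") -> List[str]:
    """Return a list of layout problems; an empty list means the layout looks right."""
    problems = []
    model_dir = repo / model_name
    if not (model_dir / "config.pbtxt").is_file():
        problems.append("missing config.pbtxt")
    saved_model_dir = model_dir / version / "model.savedmodel"
    if not saved_model_dir.is_dir():
        problems.append(f"missing {version}/model.savedmodel/")
    elif not any(saved_model_dir.iterdir()):
        problems.append(f"{version}/model.savedmodel/ is empty")
    return problems


if __name__ == "__main__":
    # Demo against a throwaway directory; "bert" is a placeholder model name.
    with tempfile.TemporaryDirectory() as tmp:
        print(find_layout_problems(Path(tmp), "bert"))
```

The check only covers file placement. It does not verify that `config.pbtxt` is valid or that the saved model can be loaded.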

Refer to the Triton Inference Server Model Repository documentation for more information on the model repository format for ONNX, TensorRT, and PyTorch.

The steps below show the process of exporting a TensorFlow checkpoint to the directory format shown above. If you already have your model saved in that format from the NVIDIA NGC Catalog, you can skip the section below.

Create a bert_dllog.json file.

    mkdir /results
    touch /results/bert_dllog.json

Export the Triton model.

    python run_squad.py \
      --vocab_file=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/vocab.txt \
      --bert_config_file=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/bert_config.json \
      --init_checkpoint=/workspace/bert/data/download/finetuned_large_model_SQUAD1.1/model.ckpt \
      --max_seq_length=384 \
      --doc_stride=128 \
      --predict_batch_size=8 \
      --output_dir=/triton \
      --export_triton=True \
      --amp \
      --use_xla
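The export command repeats the same model directory three times, so it can be convenient to assemble it programmatically. The sketch below is one possible way to do that: the function name `build_export_command` is hypothetical, and it only reuses the flags and paths that appear in the command above.

```python
import shlex
from typing import List


def build_export_command(model_dir: str, output_dir: str) -> List[str]:
    """Assemble the run_squad.py export command as an argument list."""
    return [
        "python",
        "run_squad.py",
        f"--vocab_file={model_dir}/vocab.txt",
        f"--bert_config_file={model_dir}/bert_config.json",
        f"--init_checkpoint={model_dir}/model.ckpt",
        "--max_seq_length=384",
        "--doc_stride=128",
        "--predict_batch_size=8",
        f"--output_dir={output_dir}",
        "--export_triton=True",
        "--amp",
        "--use_xla",
    ]


if __name__ == "__main__":
    command = build_export_command(
        "/workspace/bert/data/download/finetuned_large_model_SQUAD1.1",
        "/triton",
    )
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Keeping the command as a list avoids quoting problems if a path ever contains spaces.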

After running the export Python script, the exported model inside the VM should follow the model repository directory structure shown above.

Note

As part of the AI Practitioner workflow, ensure the files are inside the VM in the $HOME/triton directory. The server VM must be restarted for the model to load and for the changes to take effect.

Some enterprises have both IT Administrators and DevOps Engineers, while others do not; this depends on the size of the enterprise. In enterprises without DevOps Engineers, the AI Practitioner may need to follow the DevOps section to deploy the model to Triton Inference Server. In that case, ensure your IT Administrator has created the Triton Inference Server VM using the instructions in the IT Administrator section.

Note

For large-scale production inference deployments, refer to the Appendix – Scaling Triton Inference Server.