IT Administrator

When an AI Practitioner or a Software Engineer requests a VM for experimentation or deployment, the IT Administrator creates a new VM from a VM template and serves it to the AI Practitioner. A VM template is a master copy image of a virtual machine that includes VM disks, virtual devices, and software settings. Templates save time and avoid errors when configuring settings for AI workflows. They also ensure that VMs are consistent and standardized when they are created and deployed across the Enterprise.

This section covers creating VM templates from scratch with the required NVIDIA AI Enterprise components for performing AI training and deploying inference using Triton Inference Server. The following graphic illustrates the workflow performed by the IT Administrator.

[Figure: IT Administrator workflow (cb-ti-04.png)]

IT Administrators can follow this four-step process to serve AI-ready VMs. Detailed steps for each stage of the workflow are provided within this guide.

Before continuing with this guide, ensure the following server requirements are met:

Minimum Server Requirements

Recommended VM configuration is as follows for both AI Training and Inference use cases:

Virtual Machine Configuration

  • Boot: Configured for EFI

  • OS: Ubuntu Server 20.04 HWE 64-bit

  • CPU: 16 vCPU on a single socket

  • RAM: 64 GB

  • Storage: 150 GB thin-provisioned disk

  • Network: VMXNET3 NIC

  • GPU: A100-40C (as an example)

To proceed with this guide, create a VM with the above hardware configuration.

Please refer to the Creating Your First NVIDIA AI Enterprise VM section of the NVIDIA AI Enterprise for VMware vSphere Deployment Guide, which provides detailed steps for meeting these requirements.

GPU partitions can be a valid option for executing Deep Learning workloads on Ampere-based GPUs. An example is Deep Learning training workflows that use smaller sequence lengths, smaller models, or smaller batch sizes. Inferencing workloads typically don’t require as much GPU memory as training workflows, and the model is generally quantized to run at a lower memory footprint (INT8 or FP16). vGPU with MIG partitioning allows a single GPU to be sliced into up to seven accelerators. These partitions can then be leveraged by up to seven different VMs, bringing optimal GPU utilization and VM density. To turn MIG on or off on the server, please refer to the Advanced GPU Configuration section of the NVIDIA AI Enterprise for VMware vSphere Deployment Guide.

Using MIG partitions for Triton Inference Server deployments provides a better ROI for many organizations. Therefore, when using this guide, the VM can be assigned a fractional MIG profile such as A100-3-20C. Additional information on MIG is available in the NVIDIA Multi-Instance GPU User Guide.

After the VM is created and the above prerequisites are met, the VM needs to be further configured to execute AI training and deploy Triton Inference Server. The following sections describe the additional application-specific configuration that is necessary, as well as the required Docker container pulls. The next steps are outlined below and will be executed inside the VM:

  • Create a directory to hold the dataset.

  • Pull the appropriate docker container from NVIDIA NGC Enterprise Catalog.

  • Auto-start application-specific services.

Configuring the VM for BERT Model Training and Inference

Since AI Practitioners will leverage this VM for AI training, TensorFlow and Triton Inference Server containers are pulled from the NVIDIA NGC Enterprise Catalog. This section contains detailed steps for pulling a BERT container built on top of the TensorFlow container. We will also create a dataset folder inside the home directory of the VM and set up a systemd service that restarts the Jupyter notebook server whenever the VM is cloned or rebooted. This ensures the AI Practitioner can leverage the VM quickly, since the Jupyter notebook server will already be up and running.

Execute the following workflow steps within the VM in order to pull the containers.

  1. Generate or use an existing API key.

  2. Access the NVIDIA NGC Enterprise Catalog.

  3. Create a triton directory inside the VM for the AI Practitioner to host the model.


    mkdir ~/triton
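Triton ultimately serves models from a model repository with a fixed directory layout, so the triton directory can be pre-seeded with that structure. A minimal sketch, using a hypothetical model named bert with version 1 (the AI Practitioner will populate the real model later):

```shell
# Sketch only: pre-seed the triton directory with the model-repository layout
# Triton expects (<model-name>/<version>/). "bert" and "1" are hypothetical
# placeholders, not part of the guide's required steps.
mkdir -p ~/triton/bert/1
ls -R ~/triton
```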


  4. Pull the appropriate NVIDIA AI Enterprise containers.

    Important

    You will need access to NVIDIA NGC in order to pull the containers called out below.


    sudo docker pull nvcr.io/nvaie/tensorflow-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>


    sudo docker pull nvcr.io/nvaie/tritonserver-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>

    Note

    For most AI training use cases, the TensorFlow base container is sufficient. However, since we are going to use an NVIDIA pre-trained model to create a custom Conversational AI model that will be further trained on your data, we need additional libraries. We will therefore build a container with extra libraries on top of the NVIDIA AI Enterprise container.
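The <NVAIE-MAJOR-VERSION> and <NVAIE-CONTAINER-TAG> placeholders in the pull commands above expand into a full image reference. A minimal sketch of the expansion, using hypothetical version values (check the NGC Enterprise Catalog for the actual tags for your release):

```shell
# The version and tag below are hypothetical examples, not authoritative values.
NVAIE_MAJOR_VERSION=2
NVAIE_CONTAINER_TAG=22.04-tf2-py3

TF_IMAGE="nvcr.io/nvaie/tensorflow-${NVAIE_MAJOR_VERSION}:${NVAIE_CONTAINER_TAG}"
TRITON_IMAGE="nvcr.io/nvaie/tritonserver-${NVAIE_MAJOR_VERSION}:${NVAIE_CONTAINER_TAG}"
echo "$TF_IMAGE"
echo "$TRITON_IMAGE"

# sudo docker pull "$TF_IMAGE"      # uncomment after logging in to nvcr.io
# sudo docker pull "$TRITON_IMAGE"
```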


  5. Clone the directory below.


    git clone https://github.com/NVIDIA/DeepLearningExamples.git


  6. Change to the directory.


    cd DeepLearningExamples/TensorFlow/LanguageModeling/BERT


  7. Finally, build the custom Docker container.


    docker build -t bert_container .
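The build consumes the Dockerfile that ships in the BERT directory of the DeepLearningExamples repository. Conceptually, it layers the extra libraries the BERT workflows need on top of the NVIDIA AI Enterprise TensorFlow base image; a simplified, hypothetical sketch (not the repository's actual Dockerfile):

```dockerfile
# Hypothetical sketch only -- the real Dockerfile is in
# DeepLearningExamples/TensorFlow/LanguageModeling/BERT.
FROM nvcr.io/nvaie/tensorflow-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>

# Layer the additional libraries the BERT workflows need on top of the base.
RUN pip install --no-cache-dir <extra-bert-dependencies>

WORKDIR /workspace/bert
COPY . .
```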


  8. Create a startup script that will run the TensorFlow and Triton Inference Server containers automatically when a template clone or the VM reboots.


    touch ~/triton-startup.sh


  9. Open the script for editing.


    vim ~/triton-startup.sh


  10. Add the following contents to the file.


    #!/bin/bash
    # Adjust /home/temp/triton to the triton directory in your home path.
    docker run -d --gpus=all -v /home/temp/triton:/triton --net=host bert_container jupyter-notebook --ip='0.0.0.0' --NotebookApp.token='' --NotebookApp.base_url='/notebook/'


  11. Make the script executable.


    chmod +x ~/triton-startup.sh


  12. Create a systemd service for auto-startup.


    sudo vim /etc/systemd/system/jupyter.service


  13. Add the following content to the service file.


    [Unit]
    Description=Starts Jupyter server

    [Service]
    # Use the path to the script in your home directory.
    ExecStart=/home/nvidia/triton-startup.sh

    [Install]
    WantedBy=multi-user.target


  14. Start and enable the service on reboot.


    sudo systemctl daemon-reload
    sudo systemctl start jupyter.service
    sudo systemctl enable jupyter.service


Now that the VM has been appropriately configured for AI training and deploying inference, the final workflow for the IT Administrator is to create a VM template that can be used to rapidly deploy VMs in the future. The IT Administrator creates a template from the VM and then clones the template to serve multiple AI Practitioners/Engineers. For this guide, we will create a template from the VM, but organizations may also choose to create templates using an OVF file.

Create a Guest Customization Specification

Guest customization specifications can be created in vCenter; these specifications are essentially XML files that contain guest operating system settings for virtual machines. Applying a specification to the guest operating system during virtual machine cloning or deployment prevents conflicts that might result from deploying virtual machines with identical settings, such as duplicate DNS computer names.

Follow the VMware documentation to create a customization specification for Linux.

Create the Virtual Machine Template

  • In vCenter, right-click the newly created VM -> select Clone -> select “Clone to template”.

  • Add a name and select a folder -> select the compute resource -> select storage -> select the guest customization spec that you created -> click Finish.

Some Enterprises have both IT Admins and DevOps Engineers, while others do not; this depends on the size of the Enterprise. For Enterprises without DevOps Engineers, the IT Administrator or AI Practitioner may need to proceed with the following DevOps section to deploy the model to Triton Inference Server.

Note

For large-scale production inference deployments, please refer to the Appendix – Scaling Triton Inference Server. IT Administrators can either use a traditional approach with a load balancer or use Kubernetes to deploy and autoscale Triton.

© Copyright 2022-2023, NVIDIA. Last updated on Jan 23, 2023.