Optimize VMware vSphere for AI and Data Science Workloads

Step #4: Installing AI and Data Science Applications and Frameworks

Once the above VM prerequisites are met, the VM must be further configured for running AI training and deploying Triton Inference Server. The following sections describe the additional application-specific configuration that is required, as well as the Docker container pulls for the VM. The next steps, outlined below, are executed inside the VM:

  • Create a directory to hold the dataset.

  • Pull the appropriate docker container from NVIDIA NGC Enterprise Catalog.

  • Auto-start application-specific services.

Since AI Practitioners will leverage this VM for AI training, a TensorFlow container is pulled from the NVIDIA NGC Enterprise Catalog. This section contains detailed steps for pulling a BERT container built on top of the TensorFlow container. We will also create a dataset folder inside the home directory of the VM and set up a systemd service that restarts the Jupyter notebook server when the VM is cloned or rebooted. This ensures the AI Practitioner can start using the VM quickly, since the Jupyter notebook server will already be up and running.

Execute the following workflow steps within the VM to pull the containers and configure the startup services.

  1. Generate or use an existing API key. If you have followed the steps within LaunchPad, this was done previously when installing the NVIDIA AI Enterprise guest driver. If you have not done so already, confirm your access to the NVIDIA NGC Enterprise Catalog.

    Note

    You received an email from NVIDIA NGC when you were approved for NVIDIA LaunchPad. If you have not done so already, click the link within the email to activate the NVIDIA AI Enterprise NGC catalog.


  2. Log in to the NGC container registry.


    docker login nvcr.io


  3. When prompted for your user name, enter the following text:


    $oauthtoken

    The $oauthtoken user name is a special user name that indicates that you will authenticate with an API key and not a user name and password.

  4. When prompted for your password, enter your NGC API key as shown in the following example.


    Username: $oauthtoken
    Password: my-api-key

    Note

    When you get your API key as explained in Generating Your NGC API Key, copy it to the clipboard so that you can paste the API key into the command shell when you are prompted for your password.
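
    If you prefer not to paste the key interactively, Docker's --password-stdin option can read it from standard input instead. A minimal sketch, assuming the key is stored in an environment variable named NGC_API_KEY (our name for illustration, not one NGC sets):

    # Single quotes keep the shell from expanding $oauthtoken as a variable
    echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin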


  5. Create a triton directory inside the VM for the AI Practitioner to host the model.


    mkdir ~/triton


  6. Pull the appropriate NVIDIA AI Enterprise containers.

    Important

    You will need access to NVIDIA NGC in order to pull the containers called out below.


    sudo docker pull nvcr.io/nvaie/tensorflow:21.07-tf1-py3

    Note

    For most AI training use cases, the TensorFlow base container is sufficient. However, since we are going to use an NVIDIA pre-trained model to create a custom conversational AI model that is further trained on your data, additional libraries are needed. We will therefore build a container with these extra libraries on top of the NVIDIA AI Enterprise container.
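
    To confirm the pull succeeded, you can list the image locally; this is an optional check, not part of the original workflow:

    # Shows the nvcr.io/nvaie/tensorflow images present on the VM
    sudo docker images nvcr.io/nvaie/tensorflow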


  7. Clone the repository below.


    git clone https://github.com/NVIDIA/DeepLearningExamples.git


  8. Change to the directory.


    cd DeepLearningExamples/TensorFlow/LanguageModeling/BERT


  9. Finally, build the custom Docker container.


    docker build -t bert_container .
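
    If you want to sanity-check the build before continuing, one option is to confirm the image exists and that TensorFlow imports inside it. An optional, hedged check (the BERT image is built on the TensorFlow base, so the import should succeed):

    docker images bert_container
    docker run --rm --gpus=all bert_container python -c 'import tensorflow as tf; print(tf.__version__)'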


  10. Create a script to start the Jupyter notebook container automatically on template clone or VM reboot.


    touch ~/jupyter-startup.sh


  11. Add the following contents to the file.


    nano ~/jupyter-startup.sh


    #!/bin/bash
    docker run --gpus=all -v /home/temp/triton:/triton --net=host bert_container jupyter-notebook --ip='0.0.0.0' --NotebookApp.token='' --NotebookApp.base_url='/notebook/'
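
    The -v mount path /home/temp/triton in this script assumes the VM user's home directory is /home/temp; adjust it to the triton directory created in step 5 (for example, /home/nvidia/triton). You can also test the script by hand before wiring it into systemd; this optional check assumes Jupyter's default port 8888:

    bash ~/jupyter-startup.sh &
    # Give the container a few seconds to start, then:
    curl -sI http://localhost:8888/notebook/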


  12. Make the script executable.


    chmod +x ~/jupyter-startup.sh


  13. Create a systemd service for automatic startup.


    sudo vim /etc/systemd/system/jupyter.service


  14. Add the following content to the service file.


    [Unit]
    Description=Starts Jupyter server

    [Service]
    # Use your home path
    ExecStart=/home/nvidia/jupyter-startup.sh

    [Install]
    WantedBy=multi-user.target
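
    Note that systemd does not support inline comments after a value, so the path note is kept on its own line. After creating or editing a unit file, it is good practice to reload systemd so it picks up the change (standard systemd tooling, not part of the original steps):

    sudo systemctl daemon-reload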


  15. Start the service and enable it to run on reboot.


    sudo systemctl start jupyter.service
    sudo systemctl enable jupyter.service
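
    To confirm the service came up, optional checks using standard systemd tooling:

    sudo systemctl status jupyter.service
    journalctl -u jupyter.service -n 50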


  16. Create a script to start Triton Inference Server, which will allow the AI Practitioner to start the server later.


    touch ~/triton-startup.sh


  17. Add the following contents to the file.


    vim ~/triton-startup.sh


    #!/bin/bash
    docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 --name triton_server_cont -v /home/temp/triton_models:/models nvcr.io/nvaie/tritonserver:21.07-py3 tritonserver --model-store=/models --strict-model-config=false --log-verbose=1
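
    As with the Jupyter script, adjust the /home/temp/triton_models mount path to your own home directory. The original steps mark jupyter-startup.sh executable but not this script, so you will likely want to do the same here. Once the server is running, Triton exposes an HTTP readiness endpoint on the port mapped to 8000; the curl check below assumes the default ports from the script above:

    chmod +x ~/triton-startup.sh
    # After the AI Practitioner starts the server:
    curl -v localhost:8000/v2/health/ready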


Create a template from the VM

Now that the VM has been appropriately configured for AI training and deploying inference, the final workflow for the IT Administrator is to create a VM template that can be used to rapidly deploy VMs in the future. The IT Administrator creates a template from the VM and then clones the template to serve multiple AI Practitioners/Engineers. For this guide, we will create a template from the VM, but organizations may also choose to create templates using an OVF file.

Guest customization specifications can be created in vCenter. These specifications are essentially XML files that contain guest operating system settings for virtual machines. Applying a specification to the guest operating system during virtual machine cloning or deployment prevents conflicts that might result from deploying virtual machines with identical settings, such as duplicate DNS computer names.

Follow the VMware documentation to create a customization specification for Linux.

  • Shutdown the VM.

  • In vCenter, right-click the newly created VM -> select Clone -> select “Clone to Template”.

  • Add a name and folder -> select the compute resource -> select storage -> select the guest customization spec that you created -> click Finish.

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.