Step #4: Installing AI and Data Science Applications and Frameworks
Once the above VM prerequisites are met, the VM must be further configured to run AI training and to deploy Triton Inference Server. The following sections describe the additional application-specific configuration that is necessary, as well as the Docker container pulls required for the VM. The next steps are outlined below and are executed inside the VM:
Create a directory to hold the dataset.
Pull the appropriate docker container from NVIDIA NGC Enterprise Catalog.
AutoStart application-specific services.
Since AI Practitioners will leverage this VM for AI training, a TensorFlow container is pulled from the NVIDIA NGC Enterprise Catalog. This section contains detailed steps for pulling a BERT container built on top of the TensorFlow container. We will also create a dataset folder inside the home directory of the VM and set up a systemd service that restarts the Jupyter notebook server whenever the VM is cloned or rebooted. This ensures the AI Practitioner can start working immediately, because the Jupyter notebook server will already be up and running.
Execute the following workflow steps within the VM in order to pull the containers.
Generate or use an existing API key. If you have followed the steps within LaunchPad, this was done previously when installing the NVIDIA AI Enterprise guest driver. If you have not done so already, confirm your access to the NVIDIA NGC Enterprise Catalog.
Note: You received an email from NVIDIA NGC when you were approved for NVIDIA LaunchPad. If you have not done so already, click the link within the email to activate the NVIDIA AI Enterprise NGC Catalog.
Log in to the NGC container registry.
docker login nvcr.io
When prompted for your user name, enter the following text:
$oauthtoken
The $oauthtoken user name is a special user name that indicates that you will authenticate with an API key rather than a user name and password.
When prompted for your password, enter your NGC API key as shown in the following example.
Username: $oauthtoken
Password: my-api-key
Note: When you get your API key as explained in Generating Your NGC API Key, copy it to the clipboard so that you can paste it into the command shell when you are prompted for your password.
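As a non-interactive alternative, the same login can be scripted. This is a sketch only; the `NGC_API_KEY` variable name is chosen here for illustration and the placeholder value must be replaced with your real key:

```shell
# Hypothetical variable name; replace the placeholder with your real NGC API key.
export NGC_API_KEY='my-api-key'

# '$oauthtoken' is the literal user name; single quotes stop the shell
# from expanding it. --password-stdin reads the key from standard input
# instead of prompting interactively.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```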
Create a triton directory inside the VM for the AI Practitioner to host the model.
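For example, assuming the VM user's home directory is /home/temp (matching the volume mounts used later in this section), the directories can be created as follows:

```shell
# Paths are assumptions based on the bind mounts used elsewhere in this
# section: /home/temp/triton for the Jupyter workspace and
# /home/temp/triton_models for the Triton model repository.
mkdir -p /home/temp/triton
mkdir -p /home/temp/triton_models
```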
Pull the appropriate NVIDIA AI Enterprise containers.
Important: You will need access to NVIDIA NGC in order to pull the Docker images called out below.
sudo docker pull nvcr.io/nvaie/tensorflow:21.07-tf1-py3
Note: For most AI training use cases, the TensorFlow base container is sufficient. However, because we are going to use an NVIDIA pre-trained model to create a custom Conversational AI model that will be further trained on your data, additional libraries are needed, so we will build a container with these extra libraries on top of the NVIDIA AI Enterprise container.
Clone the repository below.
git clone https://github.com/NVIDIA/DeepLearningExamples.git
Change into the cloned directory.
Finally, build the custom Docker container.
docker build -t bert_container .
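The actual Dockerfile ships with the cloned DeepLearningExamples repository. Purely for illustration, a container of this shape starts from the NVIDIA AI Enterprise TensorFlow image and layers extra libraries on top:

```dockerfile
# Illustrative sketch only -- use the Dockerfile from the cloned repository.
FROM nvcr.io/nvaie/tensorflow:21.07-tf1-py3

# Hypothetical example of layering an additional library onto the base image.
RUN pip install --no-cache-dir jupyter
```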
Create a script, ~/jupyter-startup.sh, to run the TensorFlow container automatically when the template is cloned or the VM reboots.
Add the following contents to the file.
#!/bin/bash
docker run --gpus=all -v /home/temp/triton:/triton --net=host bert_container \
    jupyter-notebook --ip='0.0.0.0' --NotebookApp.token='' --NotebookApp.base_url='/notebook/'
Make the script executable.
chmod +x ~/jupyter-startup.sh
Create a systemd service for automatic startup.
sudo vim /etc/systemd/system/jupyter.service
Add the following content to the service file.
[Unit]
Description=Starts Jupyter server

[Service]
# Use your own home path here
ExecStart=/home/nvidia/jupyter-startup.sh

[Install]
WantedBy=multi-user.target
Start and enable the service on reboot.
sudo systemctl start jupyter.service
sudo systemctl enable jupyter.service
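Optionally, the service and the notebook endpoint can be verified. This sketch assumes Jupyter's default port 8888 and the /notebook/ base URL configured in the startup script:

```shell
# Assumed endpoint: Jupyter's default port 8888 plus the base_url set in
# the startup script.
JUPYTER_URL="http://localhost:8888/notebook/"

# Confirm the unit is running, then probe the notebook server; the
# fallback messages keep the check non-fatal if either step fails.
systemctl is-active jupyter.service || echo "jupyter.service is not active"
curl -sf "$JUPYTER_URL" >/dev/null && echo "Jupyter is up" || echo "Jupyter not reachable"
```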
Create a script to start Triton Inference Server that will allow the AI Practitioner to start the server later.
Add the following contents to the file.
#!/bin/bash
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p8000:8000 -p8001:8001 -p8002:8002 --name triton_server_cont \
    -v /home/temp/triton_models:/models nvcr.io/nvaie/tritonserver:21.07-py3 \
    tritonserver --model-store=/models --strict-model-config=false --log-verbose=1
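Once the server has been started, its readiness can be checked over Triton's HTTP endpoint; this assumes the default port mapping (8000) from the script above:

```shell
# Triton exposes a health endpoint on its HTTP port (mapped to 8000 above).
TRITON_URL="http://localhost:8000/v2/health/ready"

# Returns HTTP 200 once the server is up and its models are loaded; the
# fallback message keeps the check non-fatal while the server starts.
curl -sf "$TRITON_URL" && echo "Triton is ready" || echo "Triton not ready yet"
```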
Create a template from the VM
Now that the VM has been appropriately configured for AI training and inference deployment, the final workflow for the IT Administrator is to create a VM template that can be used to rapidly deploy VMs in the future. The IT Administrator creates a template from the VM and then clones the template to serve multiple AI Practitioners/Engineers. For this guide, we will create a template from the VM, but organizations may also choose to create templates using an OVF file.
Guest customization specifications can be created in vCenter; these specifications for system settings are essentially XML files that contain guest operating system settings for virtual machines. When you apply a specification to the guest operating system during virtual machine cloning or deployment, you prevent conflicts that might result in deploying virtual machines with identical settings, such as duplicate DNS computer names.
Follow the VMware documentation to create a customization spec for Linux.
Shut down the VM.
In vCenter, right-click the newly created VM -> select Clone -> select "Clone to Template".
Add a name and select a folder -> select the compute resource -> select storage -> select the guest customization spec that you created -> click Finish.