Step #4: Installing AI and Data Science Applications and Frameworks
Once the above VM prerequisites are met, the VM must be further configured to run AI training and to deploy Triton Inference Server. The following sections describe the additional application-specific configuration that is necessary, as well as the Docker container pulls required for the VM. The next steps are outlined below and are executed inside the VM:
Create a directory to hold the dataset.
Pull the appropriate docker container from NVIDIA NGC Enterprise Catalog.
AutoStart application-specific services.
Since AI Practitioners will leverage this VM for AI training, a TensorFlow container is pulled from the NVIDIA NGC Enterprise Catalog. This section contains detailed steps for pulling a BERT container built on top of the TensorFlow container. We will also create a dataset folder inside the home directory of the VM and set up a systemd service that restarts the Jupyter notebook server whenever the VM is cloned or rebooted. This ensures the AI Practitioner can start working immediately, because the Jupyter notebook server will already be up and running.
Execute the following workflow steps within the VM in order to pull the containers.
Generate or use an existing API key. If you have followed the steps within LaunchPad, this was done previously when installing the NVIDIA AI Enterprise guest driver. If you have not done so already, confirm your access to the NVIDIA NGC Enterprise Catalog.
Note: You received an email from NVIDIA NGC when you were approved for NVIDIA LaunchPad. If you have not done so already, click the link within the email to activate the NVIDIA AI Enterprise NGC Catalog.
Log in to the NGC container registry.
docker login nvcr.io
When prompted for your user name, enter the following text:
$oauthtoken
The $oauthtoken user name is a special user name that indicates that you will authenticate with an API key rather than a user name and password.
When prompted for your password, enter your NGC API key as shown in the following example.
Username: $oauthtoken
Password: my-api-key
Note: When you get your API key as explained in Generating Your NGC API Key, copy it to the clipboard so that you can paste it into the command shell when you are prompted for your password.
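As a non-interactive alternative, the same login can be scripted. This is a sketch only; the `NGC_API_KEY` variable name is chosen here for illustration and the placeholder value must be replaced with your real key:

```shell
# Hypothetical variable name; replace the placeholder with your real NGC API key.
export NGC_API_KEY='my-api-key'

# '$oauthtoken' is the literal user name; single quotes stop the shell
# from expanding it. --password-stdin reads the key from standard input
# instead of prompting interactively.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```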
Create a triton directory inside the VM for the AI Practitioner to host the model.
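For example, assuming the VM user's home directory is /home/temp (matching the volume mounts used later in this section), the directories can be created as follows:

```shell
# Paths are assumptions based on the bind mounts used elsewhere in this
# section: /home/temp/triton for the Jupyter workspace and
# /home/temp/triton_models for the Triton model repository.
mkdir -p /home/temp/triton
mkdir -p /home/temp/triton_models
```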
Pull the appropriate NVIDIA AI Enterprise containers.
Important: You will need access to NVIDIA NGC in order to pull the Docker images called out below.
sudo docker pull nvcr.io/nvaie/tensorflow:21.07-tf1-py3
Note: For most AI training use cases, the TensorFlow base container is sufficient. However, because we are going to use an NVIDIA pre-trained model to create a custom Conversational AI model that will be further trained on your data, additional libraries are needed, so we will build a container with these extra libraries on top of the NVIDIA AI Enterprise container.
Clone the repository below.
git clone https://github.com/NVIDIA/DeepLearningExamples.git
Change into the cloned directory.
Finally, build the custom Docker container.
docker build -t bert_container .
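The actual Dockerfile ships with the cloned DeepLearningExamples repository. Purely for illustration, a container of this shape starts from the NVIDIA AI Enterprise TensorFlow image and layers extra libraries on top:

```dockerfile
# Illustrative sketch only -- use the Dockerfile from the cloned repository.
FROM nvcr.io/nvaie/tensorflow:21.07-tf1-py3

# Hypothetical example of layering an additional library onto the base image.
RUN pip install --no-cache-dir jupyter
```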
Create a script, ~/jupyter-startup.sh, to run the TensorFlow container automatically when the template is cloned or the VM reboots.
Add the following contents to the file.
#!/bin/bash
docker run --gpus=all -v /home/temp/triton:/triton --net=host bert_container \
    jupyter-notebook --ip='0.0.0.0' --NotebookApp.token='' --NotebookApp.base_url='/notebook/'
Make the script executable.
chmod +x ~/jupyter-startup.sh
Create a systemd service for automatic startup.
sudo vim /etc/systemd/system/jupyter.service
Add the following content to the service file.
[Unit]
Description=Starts Jupyter server

[Service]
# Use your own home path here
ExecStart=/home/nvidia/jupyter-startup.sh

[Install]
WantedBy=multi-user.target
Start and enable the service on reboot.
sudo systemctl start jupyter.service
sudo systemctl enable jupyter.service
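Optionally, the service and the notebook endpoint can be verified. This sketch assumes Jupyter's default port 8888 and the /notebook/ base URL configured in the startup script:

```shell
# Assumed endpoint: Jupyter's default port 8888 plus the base_url set in
# the startup script.
JUPYTER_URL="http://localhost:8888/notebook/"

# Confirm the unit is running, then probe the notebook server; the
# fallback messages keep the check non-fatal if either step fails.
systemctl is-active jupyter.service || echo "jupyter.service is not active"
curl -sf "$JUPYTER_URL" >/dev/null && echo "Jupyter is up" || echo "Jupyter not reachable"
```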
Create a script to start Triton Inference Server that will allow the AI Practitioner to start the server later.
Add the following contents to the file.
#!/bin/bash
docker run --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    -p8000:8000 -p8001:8001 -p8002:8002 --name triton_server_cont \
    -v /home/temp/triton_models:/models nvcr.io/nvaie/tritonserver:21.07-py3 \
    tritonserver --model-store=/models --strict-model-config=false --log-verbose=1
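Once the server has been started, its readiness can be checked over Triton's HTTP endpoint; this assumes the default port mapping (8000) from the script above:

```shell
# Triton exposes a health endpoint on its HTTP port (mapped to 8000 above).
TRITON_URL="http://localhost:8000/v2/health/ready"

# Returns HTTP 200 once the server is up and its models are loaded; the
# fallback message keeps the check non-fatal while the server starts.
curl -sf "$TRITON_URL" && echo "Triton is ready" || echo "Triton not ready yet"
```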
Create a template from the VM
Now that the VM has been appropriately configured for AI training and inference deployment, the final workflow for the IT Administrator is to create a VM template that can be used to rapidly deploy VMs in the future. The IT Administrator creates a template from the VM and then clones the template to serve multiple AI Practitioners/Engineers. For this guide, we will create a template from the VM, but organizations may also choose to create templates using an OVF file.
Guest customization specifications can be created in vCenter; these specifications for system settings are essentially XML files that contain guest operating system settings for virtual machines. When you apply a specification to the guest operating system during virtual machine cloning or deployment, you prevent conflicts that might result in deploying virtual machines with identical settings, such as duplicate DNS computer names.
Follow the VMware documentation to create a customization spec for Linux.
Shut down the VM.
In vCenter, right-click the newly created VM -> select Clone -> select "Clone to Template".
Add a name and select a folder -> select the compute resource -> select storage -> select the guest customization spec that you created -> click Finish.