Installing AI and Data Science Applications and Frameworks

Accessing AI and Data Science Tools and Frameworks

The AI and data science applications and frameworks are distributed as NGC container images through the NVIDIA NGC Catalog. Each container image contains the entire user-space software stack that is required to run the application or framework; namely, the CUDA libraries, cuDNN, any required Magnum IO components, TensorRT, and the framework.

Perform the following steps in the environment into which you want to pull the AI and data science containers.

First, sign in to NGC with your NVIDIA account and password.

Navigate to Setup.

Select “Get API Key”.

Generate your API key.

When prompted, confirm Generate a New API Key.

Copy your API key to the clipboard.

Note

Selecting Confirm generates a new API key, and your previous API key, if you had one, becomes invalid.

When you interact with the registry from the command line, you must use an API key to pull locked container images or to push images to the registry. The API key is unique to you and tied to your account.

Important

Keep your API key secret and in a safe place. Do not share it or store it in a place where others can see or copy it.

Return to your SSH session or terminal in that environment, log in to the NGC container registry with Podman, and begin pulling containers from the NVIDIA NGC Catalog.

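Log in to the NGC container registry. Run podman login with the same privileges you will use to pull images: Podman keeps separate credentials for root and for each rootless user, and the pull commands in this section use sudo, so log in with sudo as well.

            sudo podman login nvcr.io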

When prompted for your username, enter the following text:

            $oauthtoken

Note

The $oauthtoken username is a special username that indicates that you will authenticate with an API key and not a username and password.

When prompted for your password, paste your NGC API key as shown in the following example.

            Username: $oauthtoken
            Password: my-api-key

Note

When you get your API key as explained in Generating Your NGC API Key, copy it to the clipboard so that you can paste the API key into the command shell when you are prompted for your password.
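Typing the key at the prompt works for interactive sessions. For scripted setups, a minimal sketch like the one below passes the key to podman login on standard input, so the key never appears on the podman command line. It assumes the key is exported in an environment variable named NGC_API_KEY (a hypothetical name chosen for this example) and that you pull with sudo, as the commands in this section do; the function name ngc_login is also hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: log in to nvcr.io without an interactive prompt.
# Assumption: NGC_API_KEY is a hypothetical environment variable that holds
# the API key generated in NGC.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Single quotes stop the shell from expanding $oauthtoken, which must reach
  # podman literally. --password-stdin reads the API key from the pipe.
  printf '%s' "$NGC_API_KEY" |
    sudo podman login nvcr.io --username '$oauthtoken' --password-stdin
}
```

After exporting the variable in your session, call ngc_login once; later sudo podman pull commands reuse the stored credentials.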

Production Branches are designed to provide stability and security for your applications built on NVIDIA AI, offering 9 months of support, API stability, and monthly fixes for high and critical software vulnerabilities. The production branches provide a stable and secure environment to maintain the uptime of mission-critical AI applications.

Feature Branches and Models offer the latest AI frameworks, libraries, workflows, models, and tools for developing and deploying performance-optimized AI software.

The Infra Release of NVIDIA AI Enterprise offers infrastructure optimization and management software for IT professionals to manage and scale AI workloads efficiently.


NVIDIA AI Enterprise Search Filters

Using the NVIDIA AI Enterprise search filters, users can access GPU-optimized software for deep learning, machine learning, and HPC through the NGC Catalog, which provides containers, models, model scripts, and industry solutions.

The steps below walk through an example container pull of the NVIDIA RAPIDS Production Branch using the NVIDIA AI Enterprise search filters. We will use the “Pull Tag” function to copy the container pull command and paste it into the desired environment.

Navigate to the search filters and select NVIDIA AI Enterprise Support, NVIDIA AI Enterprise Essentials, Container, and Rapids, as shown in the image below.

Select Get Container and copy the pull command to your clipboard.

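Paste the command into your SSH session. The copied command uses docker; Podman accepts the same pull syntax, so replace docker with podman when you work in a Podman environment.

            sudo podman pull nvcr.io/nvaie/rapids-pb23h2:23.06.04-runtime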

Pull each AI or data science application that you are interested in the same way, using the Pull Tag function.

For reference, the Podman pull commands for each application or framework are listed below. Replace <NVAIE-MAJOR-VERSION> and <NVAIE-CONTAINER-TAG> with the values for the release you want to pull, as shown by the Pull Tag function.
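Every reference in the list below follows the same nvcr.io/nvaie/<name>-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG> pattern, so a small helper can assemble the reference, pull it, and confirm that Podman now has the image in local storage. This is a sketch under stated assumptions: the function names nvaie_image and nvaie_pull are hypothetical, and the version and tag values must come from the Pull Tag output in NGC, since tags may differ from one container to the next.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the nvcr.io/nvaie/<name>-<major>:<tag> pattern.

# Print the full image reference for one container.
nvaie_image() {
  local name="$1" major="$2" tag="$3"
  printf 'nvcr.io/nvaie/%s-%s:%s\n' "$name" "$major" "$tag"
}

# Pull one container, then check that it is present in local storage.
nvaie_pull() {
  local ref
  ref="$(nvaie_image "$1" "$2" "$3")"
  sudo podman pull "$ref" || return 1
  # "podman image exists" exits 0 only when the image is stored locally.
  sudo podman image exists "$ref"
}
```

For example, nvaie_pull pytorch <NVAIE-MAJOR-VERSION> <NVAIE-CONTAINER-TAG>, with both placeholders replaced, runs the same pull as the PyTorch command below and then verifies the result.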

NVIDIA TensorRT

NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network and produces a highly optimized runtime engine that performs inference for that network.

            sudo podman pull nvcr.io/nvaie/tensorrt-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>
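As a rough illustration of the step where TensorRT turns a trained network into an optimized runtime engine, the sketch below runs trtexec inside the pulled container to build a serialized engine from an ONNX model. Several details are assumptions rather than facts from this guide: that trtexec is on the container's PATH, that the host exposes GPUs to Podman through the CDI device name nvidia.com/gpu=all (which requires the NVIDIA Container Toolkit), and that the model is saved as model.onnx in the directory you mount. The function name build_trt_engine is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a TensorRT engine from an ONNX model.
# Usage: build_trt_engine IMAGE MODEL_DIR
#   IMAGE      full TensorRT image reference pulled above
#   MODEL_DIR  host directory containing model.onnx (assumed file name)
build_trt_engine() {
  local image="$1" model_dir="$2"
  # Mount the model directory at /models and expose all GPUs through CDI.
  # trtexec parses the ONNX model and writes the optimized engine
  # (model.plan) back into the same mounted directory.
  sudo podman run --rm \
    --device nvidia.com/gpu=all \
    -v "$model_dir":/models \
    "$image" \
    trtexec --onnx=/models/model.onnx --saveEngine=/models/model.plan
}
```

On SELinux-enforcing hosts, Podman may also need the :Z option on the volume so the container can read the mounted directory.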

NVIDIA Triton Inference Server

Triton Inference Server is open-source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, the data center, or embedded devices.

The xx.yy-py3 image contains the Triton Inference Server with support for TensorFlow, PyTorch, TensorRT, ONNX, and OpenVINO models.

            sudo podman pull nvcr.io/nvaie/tritonserver-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>
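To see what deploying a model with the pulled image looks like, the sketch below starts Triton with a host model repository mounted and then polls its HTTP readiness endpoint. It assumes Triton's default ports (8000 for HTTP, 8001 for gRPC, 8002 for metrics), GPU access through the CDI device name nvidia.com/gpu=all (which requires the NVIDIA Container Toolkit), and that curl is installed on the host. The function names, the container name triton, and the model repository path you pass in are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick Triton smoke test.

# Start Triton in the background with a host model repository at /models.
# Ports: 8000 = HTTP, 8001 = gRPC, 8002 = Prometheus metrics.
start_triton() {
  local image="$1" model_repo="$2"
  sudo podman run -d --rm --name triton \
    --device nvidia.com/gpu=all \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v "$model_repo":/models \
    "$image" \
    tritonserver --model-repository=/models
}

# Poll the readiness endpoint until it answers with HTTP 200 or the
# attempts run out (2 seconds between attempts).
wait_for_triton() {
  local attempts="${1:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS localhost:8000/v2/health/ready >/dev/null; then
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  return 1
}
```

Polling matters because Triton accepts connections before every model in the repository has finished loading; the ready endpoint reports success only once the server is prepared to serve inference requests.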

NVIDIA RAPIDS

The NVIDIA RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science, machine learning and analytics pipelines entirely on GPUs.

            sudo podman pull nvcr.io/nvaie/nvidia-rapids-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>

PyTorch

PyTorch is a GPU-accelerated tensor computation framework. Its functionality can be extended with common Python libraries such as NumPy and SciPy. Automatic differentiation is done with a tape-based system at both the functional and neural network layer levels.

            sudo podman pull nvcr.io/nvaie/pytorch-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>
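A quick way to confirm that the pulled PyTorch container can use the host GPUs is to ask PyTorch directly. The sketch assumes GPUs are exposed to Podman through the CDI device name nvidia.com/gpu=all (which requires the NVIDIA Container Toolkit) and that python is on the container's PATH; the function name check_pytorch_gpu is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whether PyTorch in IMAGE can use a CUDA GPU.
check_pytorch_gpu() {
  local image="$1"
  # torch.cuda.is_available() returns True only when the CUDA libraries in
  # the container can reach a GPU through the host driver.
  sudo podman run --rm --device nvidia.com/gpu=all "$image" \
    python -c 'import torch; print(torch.cuda.is_available())'
}
```

Pass the full PyTorch image reference from the pull command above. An output of True indicates GPU access; False suggests checking the host driver and the GPU device configuration before debugging the container itself.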

TensorFlow

TensorFlow is an open source platform for machine learning. It provides comprehensive tools and libraries in a flexible architecture allowing easy deployment across a variety of platforms and devices.

            sudo podman pull nvcr.io/nvaie/tensorflow-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>

TAO Toolkit

Requires NVIDIA AI Enterprise 2.0 or later.

Train Adapt Optimize (TAO) Toolkit is a Python-based AI toolkit for taking purpose-built, pretrained AI models and customizing them with your own data. TAO adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, and export highly optimized and accurate AI models for deployment.

            sudo podman pull nvcr.io/nvaie/tao-toolkit-lm-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>
            sudo podman pull nvcr.io/nvaie/tao-toolkit-pyt-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>
            sudo podman pull nvcr.io/nvaie/tao-toolkit-tf-<NVAIE-MAJOR-VERSION>:<NVAIE-CONTAINER-TAG>