Select a documentation center to browse.

Optimized Frameworks
NVIDIA Optimized Frameworks such as Kaldi, MXNet, NVCaffe, PyTorch, and TensorFlow offer the flexibility to design and train custom deep neural networks (DNNs) for machine learning and AI applications.
Browse >
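
As a rough illustration of what these frameworks provide, the sketch below defines and trains a small custom DNN with PyTorch; the layer sizes, batch size, and random data are placeholders, not taken from the documentation above.

    import torch
    import torch.nn as nn

    # Minimal sketch: a small custom DNN and one training step in PyTorch.
    # Layer sizes, batch size, and the random data are illustrative placeholders.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(32, 128)                 # fake input batch
    y = torch.randint(0, 10, (32,))          # fake labels

    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
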
TensorRT
The core of NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network.
Browse >
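
As a rough sketch of that build flow, the snippet below uses the TensorRT Python API to turn a trained network exported to ONNX into a serialized engine; the file names and build settings are illustrative assumptions.

    import tensorrt as trt

    # Sketch only: build an optimized engine from a trained network exported to ONNX.
    # "model.onnx" and "model.engine" are placeholder file names.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:       # network definition + trained parameters
        parser.parse(f.read())

    config = builder.create_builder_config()
    engine = builder.build_serialized_network(network, config)  # optimized runtime engine
    with open("model.engine", "wb") as f:
        f.write(engine)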


Triton Inference Server
NVIDIA Triton Inference Server (formerly TensorRT Inference Server) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server.
Browse >
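
For example, a remote client can request inference over the HTTP endpoint with the tritonclient Python package, roughly as sketched below; the model name, tensor names, and shapes are hypothetical and would come from the model's configuration.

    import numpy as np
    import tritonclient.http as httpclient

    # Sketch only: connect to a Triton server on its default HTTP port and run inference.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    data = np.random.rand(1, 3, 224, 224).astype(np.float32)      # placeholder input
    inputs = [httpclient.InferInput("INPUT__0", data.shape, "FP32")]
    inputs[0].set_data_from_numpy(data)

    result = client.infer(model_name="my_model", inputs=inputs)   # placeholder model name
    print(result.as_numpy("OUTPUT__0"))
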
NCCL
The NVIDIA Collective Communications Library (NCCL) is a library of multi-GPU collective communication primitives that are topology-aware and can be easily integrated into applications. Collective communication algorithms employ many processors working in concert to aggregate data. NCCL is not a full-blown parallel programming framework; rather, it is a library focused on accelerating collective communication primitives.
Browse >
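
NCCL itself exposes a C API, but as a rough Python illustration of a collective primitive, the sketch below performs an NCCL-backed all-reduce through PyTorch's distributed module, assuming the script is launched with one process per GPU (for example with torchrun).

    import os
    import torch
    import torch.distributed as dist

    # Illustration only: PyTorch uses NCCL as the backend for its multi-GPU collectives.
    # Assumes launch via torchrun, which sets RANK, WORLD_SIZE, and LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    x = torch.ones(4, device="cuda") * dist.get_rank()
    dist.all_reduce(x, op=dist.ReduceOp.SUM)   # NCCL all-reduce across all GPUs
    print(f"rank {dist.get_rank()}: {x}")
    dist.destroy_process_group()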



DALI
The NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine that accelerate the pre-processing of input data for deep learning applications. DALI provides both the performance and the flexibility to accelerate different data pipelines as a single library, which can then be easily integrated into different deep learning training and inference applications.
Browse >
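
As a rough sketch of those building blocks, the pipeline below reads, decodes, and resizes images with DALI's Python API; the dataset path, batch size, and image sizes are placeholders.

    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types

    # Sketch only: a simple DALI image pipeline. "/data/train" is a placeholder path.
    @pipeline_def(batch_size=32, num_threads=4, device_id=0)
    def image_pipeline():
        jpegs, labels = fn.readers.file(file_root="/data/train")
        images = fn.decoders.image(jpegs, device="mixed")          # decode on the GPU
        images = fn.resize(images, resize_x=224, resize_y=224)
        images = fn.crop_mirror_normalize(images, dtype=types.FLOAT,
                                          output_layout="CHW")
        return images, labels

    pipe = image_pipeline()
    pipe.build()
    images, labels = pipe.run()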



Deep Learning Performance
GPUs accelerate machine learning operations by performing calculations in parallel. Many operations, especially those representable as matrix multiplies, will see good acceleration right out of the box. Even better performance can be achieved by tuning operation parameters to use GPU resources efficiently. The performance documentation presents the tips that we think are most widely useful.
Browse >
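
One example of such a tip, sketched below under the assumption of a PyTorch workload: choose matrix dimensions that are multiples of 8 and run the multiply in FP16 so it maps cleanly onto Tensor Cores. The specific sizes are arbitrary.

    import torch

    # Illustration only: Tensor-Core-friendly sizing. All matrix dimensions here
    # (256, 1024, 4096) are multiples of 8; the exact values are arbitrary.
    layer = torch.nn.Linear(in_features=1024, out_features=4096).cuda()
    x = torch.randn(256, 1024, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y = layer(x)   # FP16 matrix multiply that can use Tensor Cores
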
DIGITS
The NVIDIA Deep Learning GPU Training System (DIGITS) can be used to rapidly train highly accurate DNNs for image classification, segmentation and object detection tasks.
Browse >
NVIDIA NeMo
NVIDIA NeMo is a flexible Python toolkit that enables data scientists and researchers to build state-of-the-art speech and language deep learning models for conversational AI applications, composed of reusable building blocks that can be safely connected together.
Browse >
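
As a rough example of those reusable building blocks, the snippet below loads a pretrained NeMo speech recognition model and transcribes an audio file; the pretrained model name and file name are illustrative.

    import nemo.collections.asr as nemo_asr

    # Sketch only: load a pretrained speech recognition model and transcribe a file.
    # The pretrained model name and "sample.wav" are illustrative placeholders.
    asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(
        model_name="QuartzNet15x5Base-En")
    print(asr_model.transcribe(["sample.wav"]))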



Deep Learning SDK
The NVIDIA Deep Learning SDK offers powerful libraries such as cuDNN and NCCL for training, TensorRT and Triton Inference Server for inference, and DALI for data loading. Together with mixed precision and Tensor Cores, these tools and libraries can be used to design and deploy GPU-accelerated deep learning applications.
Browse >
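
As a rough sketch of the mixed-precision piece, the loop below uses PyTorch's automatic mixed precision (autocast plus a gradient scaler) for one training step; the model, sizes, and random data are placeholders.

    import torch
    import torch.nn as nn

    # Sketch only: one mixed-precision training step with PyTorch AMP.
    # The model, sizes, and random data are illustrative placeholders.
    model = nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), target)

    optimizer.zero_grad()
    scaler.scale(loss).backward()   # scale the loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()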



NVIDIA GPU Cloud
NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing. NGC empowers AI researchers with fast and easy access to performance-engineered deep learning framework containers, pre-integrated and optimized by NVIDIA.
Browse >
DGX Systems
NVIDIA DGX Systems provide integrated hardware, software, and tools for running GPU-accelerated HPC applications such as deep learning, AI analytics, and interactive visualization.
Browse >