Library Documentation
NeMo, developed by NVIDIA, is a generative AI framework for researchers and developers who work in PyTorch. It provides a robust, scalable foundation for designing and implementing generative AI models, and it simplifies access to existing code and pretrained models, helping users from both industry and academia accelerate their development. The developer guide covers NeMo’s design, implementation, and optimizations in technical detail.
NeMo AutoModel is a library and recipe collection for training models end to end. It provides Day-0 GPU-accelerated PyTorch training for Hugging Face models: users can start training and fine-tuning instantly with no conversion delays, and scale effortlessly using PyTorch-native parallelisms, optimized custom kernels, and memory-efficient recipes, all while preserving the original checkpoint format for smooth integration across the Hugging Face ecosystem.
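As a rough illustration of the workflow NeMo AutoModel accelerates, the sketch below fine-tunes a Hub checkpoint with plain Hugging Face and PyTorch APIs and saves it back in its original format. This is not AutoModel’s own API; the model name is a small placeholder chosen so the snippet runs anywhere.

```python
# Illustrative only: plain transformers + PyTorch, showing the Day-0 idea
# that a Hugging Face checkpoint is trained directly and saved back in
# its original format, with no conversion step in either direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM on the Hub works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tokenizer("Fine-tune a Hub checkpoint directly.", return_tensors="pt")

model.train()
loss = model(**batch, labels=batch["input_ids"]).loss  # causal-LM loss
loss.backward()
optimizer.step()

# The checkpoint stays in standard Hugging Face format.
model.save_pretrained("finetuned-model")
tokenizer.save_pretrained("finetuned-model")
```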
NeMo Curator is a Python library of scalable data-mining modules designed for curating natural language processing (NLP) data to train large language models (LLMs). It enables NLP researchers to efficiently extract high-quality text from vast, uncurated web corpora, supporting the development of more accurate and powerful language models.
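A minimal curation sketch, modeled on the filter interface shown in NeMo Curator’s examples; module paths, class names, and parameters here (such as ScoreFilter, WordCountFilter, and the text field name) are assumptions that may vary across releases:

```python
# Sketch of a curation pipeline: read JSONL documents, keep only those
# above a word-count threshold, and write the curated subset back out.
# Names follow NeMo Curator's published examples but may differ by release.
from nemo_curator import ScoreFilter, Sequential
from nemo_curator.datasets import DocumentDataset
from nemo_curator.filters import WordCountFilter

dataset = DocumentDataset.read_json("raw_data/*.jsonl")

pipeline = Sequential([
    ScoreFilter(WordCountFilter(min_words=80), text_field="text"),
])

curated = pipeline(dataset)
curated.to_json("curated_data/")
```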
NeMo Eval is a comprehensive evaluation module within the NeMo Framework for large language models. It deploys models trained with the NeMo Framework and evaluates them through state-of-the-art evaluation harnesses.
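One such harness is EleutherAI’s lm-evaluation-harness. The sketch below calls that harness directly on a Hugging Face checkpoint to show the kind of evaluation NeMo Eval drives; it is not NeMo Eval’s own API, and the model and task are placeholders:

```python
# Illustrative only: EleutherAI's lm-evaluation-harness used directly,
# standing in for the harnesses NeMo Eval orchestrates.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # placeholder checkpoint
    tasks=["hellaswag"],           # placeholder task
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```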
NeMo Export and Deploy provides tools and APIs for exporting NeMo and Hugging Face models and deploying them to production environments. It supports multiple deployment paths, including TensorRT, TensorRT-LLM, and vLLM, served through NVIDIA Triton Inference Server.
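A deployment sketch adapted from the export-then-serve pattern in NeMo’s documentation; the checkpoint path is a placeholder, and module paths and argument names have varied across NeMo releases, so treat this as an outline rather than a drop-in script:

```python
# Sketch: export a NeMo checkpoint to a TensorRT-LLM engine, then serve it
# through Triton. Paths are placeholders; argument names vary by release.
from nemo.export.tensorrt_llm import TensorRTLLM
from nemo.deploy import DeployPyTriton

exporter = TensorRTLLM(model_dir="/tmp/trt_engine_dir")
exporter.export(
    nemo_checkpoint_path="/models/llama.nemo",  # placeholder checkpoint
    model_type="llama",
)

server = DeployPyTriton(model=exporter, triton_model_name="llama")
server.deploy()
server.serve()
```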
NeMo RL is a scalable and efficient post-training library that runs on anything from a single GPU to thousands, and supports models from tiny to over 100 billion parameters.
What you can expect:
- Seamless integration with Hugging Face for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
- High-performance implementation with Megatron Core, supporting various parallelism techniques for large models (>100B parameters) and long context lengths.
- Efficient resource management using Ray, enabling scalable and flexible deployment across different hardware configurations (see the sketch after this list).
- Flexibility through a modular design that allows easy integration and customization.
- Comprehensive documentation that is both detailed and user-friendly, with practical examples.
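To make the Ray point above concrete, here is a minimal sketch of the actor pattern such a library builds on; it is generic Ray code, not NeMo RL’s API, and the worker class is hypothetical:

```python
# Generic Ray actor pattern (not NeMo RL's API): remote workers claim
# resources from a shared pool, so the same script scales from one
# machine to a multi-node cluster without code changes.
import ray

ray.init()  # connect to a running cluster, or start a local one

@ray.remote  # add num_gpus=1 per actor on a GPU cluster
class RolloutWorker:  # hypothetical worker, for illustration
    def __init__(self, worker_id: int):
        self.worker_id = worker_id

    def generate(self, prompt: str) -> str:
        # Placeholder for policy inference in a real post-training loop.
        return f"worker {self.worker_id} handled: {prompt}"

workers = [RolloutWorker.remote(i) for i in range(4)]
futures = [w.generate.remote("2 + 2 = ?") for w in workers]
print(ray.get(futures))
```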