Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Playbooks
Overview
The NeMo Framework playbooks demonstrate how to use the NeMo Framework container to pre-train, fine-tune, and deploy Large Language Models (LLMs) with different datasets. The playbooks show how to:
- Set up your infrastructure to use the playbooks with DGX Cloud and Kubernetes.
- Pre-process, train, validate, test, and run fine-tuning scripts on state-of-the-art LLMs such as Llama-2/3, Mixtral 8x7B, and Mistral 7B.
- Apply Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT) techniques to the databricks-dolly-15k and PubMedQA datasets.
- Set up and launch foundation model pre-training on your infrastructure.
- Deploy LLMs trained with NeMo Framework using NVIDIA NIM, or export them for deployment with TensorRT-LLM.
Infrastructure Setup
The Run NeMo Framework on DGX Cloud playbook focuses on preparing a dataset and pre-training a foundation model with NeMo Framework on DGX Cloud. The playbook covers essential aspects of DGX Cloud, such as uploading containers, creating workspaces, mounting workspaces, launching jobs, and pre-training a model.
The Run NeMo Framework on Kubernetes playbook demonstrates deploying and managing NeMo using Kubernetes. The playbook covers setting up a cluster, installing NeMo Framework, preparing data, and training the model.
Model Alignment and Customization
The NeMo Framework SFT with Llama-2 playbook shows how to fine-tune Llama-2 models of various sizes using SFT against the databricks-dolly-15k dataset. It demonstrates data preprocessing, training, validation, testing, and running the fine-tuning scripts included in NeMo Framework. It also shows how to run inference against the fine-tuned model.
The NeMo Framework SFT with Mistral 7B playbook shows how to fine-tune the Mistral 7B model using SFT against the databricks-dolly-15k dataset. It demonstrates data preprocessing, training, validation, testing, and running the fine-tuning scripts included in NeMo Framework.
The NeMo Framework SFT playbook shows how to fine-tune Mixtral 8x7B and Nemotron-4 340B using SFT against the databricks-dolly-15k dataset. It demonstrates data preprocessing, training, validation, testing, and running the fine-tuning scripts included in NeMo Framework. It also shows how to run inference against the fine-tuned model.
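All three SFT playbooks above fine-tune against databricks-dolly-15k. As a rough illustration of the data preprocessing step, the sketch below converts the dataset into prompt/completion JSONL records; the "input"/"output" field names, prompt layout, and Hugging Face dataset ID are assumptions for this illustration, so follow the playbook for the exact format your NeMo Framework version expects.

    # Sketch: convert databricks-dolly-15k into JSONL with "input"/"output" fields
    # for SFT. Field names and prompt layout are assumptions; follow the playbook
    # for the exact format your NeMo Framework version expects.
    import json
    from datasets import load_dataset  # pip install datasets

    dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

    with open("dolly_sft_train.jsonl", "w") as f:
        for rec in dolly:
            prompt = rec["instruction"]
            if rec["context"]:  # some records carry supporting context
                prompt += "\n\nContext:\n" + rec["context"]
            f.write(json.dumps({"input": prompt, "output": rec["response"]}) + "\n")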
The NeMo Framework PEFT with Mistral 7B playbook shows how to fine-tune the Mistral 7B model using PEFT against the PubMedQA dataset. It demonstrates data preprocessing, training, validation, testing, and running the fine-tuning scripts included in NeMo Framework. It also shows how to run inference against the fine-tuned model.
The NeMo Framework PEFT playbook shows how to fine-tune Mixtral 8x7B, Llama-2, and Nemotron-4 340B models of various sizes using PEFT against the PubMedQA dataset. It demonstrates data preprocessing, training, validation, testing, and running the fine-tuning scripts included in NeMo Framework.
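Both PEFT playbooks above fine-tune against PubMedQA, where each record pairs a biomedical question and supporting abstracts with a yes/no/maybe label. A minimal preprocessing sketch is shown below; the Hugging Face dataset name, field names, and prompt layout are assumptions, and the playbooks build their own splits from the PubMedQA GitHub repository.

    # Sketch: turn PubMedQA (pqa_labeled) records into "input"/"output" JSONL pairs
    # for PEFT. Dataset ID, field names, and prompt layout are assumptions; the
    # playbooks build their own train/validation/test splits.
    import json
    from datasets import load_dataset

    pubmedqa = load_dataset("pubmed_qa", "pqa_labeled", split="train")

    with open("pubmedqa_peft_train.jsonl", "w") as f:
        for rec in pubmedqa:
            context = " ".join(rec["context"]["contexts"])
            prompt = (f"QUESTION: {rec['question']}\n"
                      f"CONTEXT: {context}\n"
                      "ANSWER (yes/no/maybe): ")
            f.write(json.dumps({"input": prompt, "output": rec["final_decision"]}) + "\n")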
The NeMo Framework LoRA Fine-tuning and NVIDIA NIM Deployment with Llama-3 tutorial shows an end-to-end PEFT pipeline, including LoRA-tuning a Llama-3 8B model against the PubMedQA dataset and deploying multiple LoRA adapters with NVIDIA NIM.
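Once the adapters are deployed, NIM serves an OpenAI-compatible HTTP API, and a LoRA adapter is selected through the model field of the request. A minimal client sketch follows; the host, port, and adapter name are placeholders for this illustration.

    # Sketch: query a running NIM endpoint through its OpenAI-compatible API and
    # select a deployed LoRA adapter by name. Host, port, and the adapter name
    # "llama3-8b-pubmedqa-lora" are placeholders.
    import requests

    resp = requests.post(
        "http://localhost:8000/v1/completions",
        json={
            "model": "llama3-8b-pubmedqa-lora",  # LoRA adapter served by NIM
            "prompt": "QUESTION: Is aspirin effective for migraine?\nANSWER (yes/no/maybe): ",
            "max_tokens": 8,
            "temperature": 0.0,
        },
    )
    print(resp.json()["choices"][0]["text"])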
The NeMo Framework Law-Domain LoRA Fine-tuning on Synthetic Data and NVIDIA NIM Deployment with Llama-3.1 tutorial shows how to perform LoRA PEFT on the Llama-3.1 8B Instruct model using a synthetically augmented version of the Law StackExchange dataset. It uses the data curated in the NeMo Curator tutorial, which demonstrates various filtering and processing operations that improve data quality for PEFT of LLMs.
Pre-training
The NeMo Framework Foundation Model Pre-training playbook focuses on launching a foundation model pre-training job on your infrastructure and obtaining the required training artifacts from successful runs. It demonstrates the workflow of pre-training foundation models using NeMo Framework and the Pile dataset, producing checkpoints, logs, and event files.
The NeMo Framework AutoConfigurator playbook demonstrates how to use NeMo Framework AutoConfigurator to determine the optimal model size for a given compute and training budget, and then to produce pre-training and inference configurations that achieve the highest-throughput runs. It focuses on automating the configuration process for NeMo, including parameter tuning and optimization, to streamline setup.
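For intuition about how a compute budget constrains model size, the back-of-the-envelope sketch below uses the common FLOPs ≈ 6·N·D approximation with a Chinchilla-style tokens-per-parameter ratio. It is an illustrative heuristic only, not the AutoConfigurator algorithm, and every constant in it is an assumption.

    # Back-of-the-envelope sizing sketch (NOT the AutoConfigurator algorithm):
    # training FLOPs ~= 6 * N * D, with a Chinchilla-style D ~= 20 * N token ratio.
    # The per-GPU throughput and utilization figures are illustrative assumptions.
    def estimate_model_size(gpu_hours, flops_per_gpu_second=1e15, utilization=0.4,
                            tokens_per_param=20):
        total_flops = gpu_hours * 3600 * flops_per_gpu_second * utilization
        # 6 * N * D = total_flops and D = tokens_per_param * N
        # => N = sqrt(total_flops / (6 * tokens_per_param))
        n_params = (total_flops / (6 * tokens_per_param)) ** 0.5
        return n_params, tokens_per_param * n_params

    params, tokens = estimate_model_size(gpu_hours=50_000)
    print(f"~{params / 1e9:.1f}B parameters trained on ~{tokens / 1e12:.2f}T tokens")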
The NeMo Framework Single Node Pre-training playbook shows how to pre-train a simple GPT-style model using consumer hardware.
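To make "GPT-style" concrete, the sketch below builds a minimal decoder-only model in PyTorch: token and position embeddings, causally masked self-attention blocks, and a language-modeling head. It illustrates the architecture class only and is not the model configuration used in the playbook.

    # Minimal decoder-only ("GPT-style") model sketch in PyTorch. Illustrates the
    # architecture class only; not the configuration used in the playbook.
    import torch
    import torch.nn as nn

    class TinyGPT(nn.Module):
        def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=4, max_len=512):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)
            self.pos_emb = nn.Embedding(max_len, d_model)
            block = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                               batch_first=True, norm_first=True)
            self.blocks = nn.TransformerEncoder(block, n_layers)
            self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

        def forward(self, tokens):  # tokens: (batch, seq)
            seq_len = tokens.size(1)
            pos = torch.arange(seq_len, device=tokens.device)
            x = self.tok_emb(tokens) + self.pos_emb(pos)
            # Upper-triangular -inf mask makes self-attention causal (decoder-style).
            causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                           device=tokens.device), diagonal=1)
            x = self.blocks(x, mask=causal)
            return self.lm_head(x)  # next-token logits: (batch, seq, vocab)

    logits = TinyGPT()(torch.randint(0, 32000, (2, 128)))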
Data Curation
The Distributed Data Classification notebook showcases how to use NeMo Curator with two distinct classifiers: one for evaluating data quality and another for identifying data domains. Integrating these classifiers streamlines the annotation process and makes it easier to combine the diverse datasets needed to train foundation models.
The PEFT Curation tutorial demonstrates how to use the NeMo Curator Python API to curate a dataset for PEFT. Specifically, it uses the Enron dataset, which contains emails along with classification labels. Each email entry includes a subject, body, and category (class label). Throughout the tutorial, different filtering and processing operations are demonstrated, which can be applied to each record.
The Single Node notebook provides a typical data curation pipeline using NeMo Curator, with the Thai Wikipedia dataset as an example. It includes demonstrations of how to download Wikipedia data using NeMo Curator, perform language separation using FastText, apply GPU-based exact deduplication and fuzzy deduplication, and utilize CPU-based heuristic filtering.
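The language-separation step is built on fastText language identification; a standalone sketch of that idea is shown below. The pretrained lid.176.bin model file and the confidence threshold are assumptions, and NeMo Curator wraps this step in its own filter utilities.

    # Sketch of fastText language identification, the idea behind the language-
    # separation step. Requires fastText's pretrained lid.176.bin model; NeMo
    # Curator wraps this step in its own filter utilities.
    import fasttext

    lid = fasttext.load_model("lid.176.bin")

    def detect_language(text, threshold=0.5):
        labels, probs = lid.predict(text.replace("\n", " "), k=1)
        lang = labels[0].replace("__label__", "")  # e.g. "th" for Thai
        return lang if probs[0] >= threshold else "unknown"

    print(detect_language("สวัสดีครับ นี่คือตัวอย่างข้อความภาษาไทย"))  # -> "th"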
The Tinystories tutorial shows how to use the NeMo Curator Python API to curate the TinyStories dataset. TinyStories is a dataset of short stories generated by GPT-3.5 and GPT-4, featuring words that are understood by 3- to 4-year-olds. The small size of this dataset makes it ideal for creating and validating data curation pipelines on a local machine.
The Curating Datasets for Parameter Efficient Fine-tuning with Synthetic Data Generation tutorial demonstrates how to use NeMo Curator’s Python API for data curation, synthetic data generation, and qualitative score assignment to prepare a dataset for PEFT of LLMs.
Deployment
The Post-Training Quantization (PTQ) with Nemotron-4 and Llama-3 playbook shows how to apply PTQ to the Nemotron-4 340B and Llama-3 70B LLMs in FP8 precision, enabling export to NVIDIA TensorRT-LLM and deployment via PyTriton.
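For intuition about what FP8 PTQ does numerically, the toy sketch below scales a weight tensor into the FP8 E4M3 representable range (max ≈ 448), casts it, and dequantizes it. It illustrates the per-tensor scaling idea only and is not the NeMo or TensorRT-LLM quantization API; the float8 dtype requires a recent PyTorch release.

    # Toy per-tensor FP8-style quantization: scale weights into the E4M3 range
    # (max ~448), cast, then dequantize. Intuition only; not the NeMo or
    # TensorRT-LLM PTQ API. torch.float8_e4m3fn needs a recent PyTorch (>= 2.1).
    import torch

    E4M3_MAX = 448.0

    def fake_fp8_quantize(weights):
        scale = weights.abs().max() / E4M3_MAX          # per-tensor scaling factor
        return (weights / scale).to(torch.float8_e4m3fn), scale

    def dequantize(q, scale):
        return q.to(torch.float32) * scale

    w = torch.randn(4096, 4096)
    q, scale = fake_fp8_quantize(w)
    print("max abs error:", (dequantize(q, scale) - w).abs().max().item())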
The Quantization Aware Training (QAT) for Llama-2 SFT Model playbook demonstrates how to perform QAT on an SFT model. The playbook is an extension of the Llama-2 SFT and Post-Training Quantization playbooks. It shows how to quantize an SFT model using PTQ, run SFT on the quantized model again (QAT), and deploy it with TensorRT-LLM.
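The core idea behind QAT is "fake quantization": rounding weights and activations to the quantized grid in the forward pass while letting gradients flow through unchanged (a straight-through estimator). The toy sketch below shows that mechanism; it is an illustration only, not the NeMo quantization API, and the 8-bit grid is an arbitrary choice for the example.

    # Toy "fake quantization" op of the kind inserted during QAT: quantize in the
    # forward pass, pass gradients through unchanged (straight-through estimator).
    # Illustration only; not the NeMo quantization API.
    import torch

    class FakeQuant(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, num_bits=8):
            qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
            scale = x.abs().max() / qmax
            return torch.round(x / scale).clamp(-qmax, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None                    # straight-through estimator

    x = torch.randn(8, requires_grad=True)
    FakeQuant.apply(x).sum().backward()                 # gradients flow as if identity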
Playbooks
- Run NeMo Framework on DGX Cloud
- Run NeMo Framework on Kubernetes
- NeMo Framework SFT with Llama-2
- NeMo Framework SFT with Mistral 7B
- NeMo Framework SFT with Mixtral 8x7B and Nemotron-4 340B
- NeMo Framework PEFT with Mistral 7B
- NeMo Framework PEFT with Llama-2, Mixtral 8x7B, and Nemotron-4 340B
- NeMo Framework Foundation Model Pre-training
- NeMo Framework AutoConfigurator
- NeMo Framework Single Node Pre-training
- NeMo Framework Post-Training Quantization (PTQ) with Nemotron-4 and Llama-3
- NeMo Framework Quantization Aware Training (QAT) for Llama-2 SFT Model