Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Nemotron
Nemotron is a Large Language Model (LLM) that can be integrated into a synthetic data generation pipeline to produce training data, assisting researchers and developers in building their own LLMs.
The following examples use the NeMo Framework Launcher, which provides a user-friendly interface for building end-to-end model development workflows. To get started, follow the Installation Steps and start the NeMo Framework container, making sure that the launcher and any data folders are mounted.
All the config scripts used in these examples are located in NeMo-Framework-Launcher/launcher_scripts.
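Once the container is running, jobs are launched through the launcher's Hydra-based entry point in launcher_scripts. The sketch below is illustrative only: the training config name `nemotron/nemotron_8b` and the mount path are assumptions, since exact config names vary by release. List the files under launcher_scripts/conf/training to see which configs your container ships.

```bash
# Sketch only, run from inside the container.
# Assumptions: the launcher repo is mounted at /opt/NeMo-Framework-Launcher,
# and a training config named nemotron/nemotron_8b exists in
# launcher_scripts/conf/training -- check your release for the actual names.
cd /opt/NeMo-Framework-Launcher/launcher_scripts
python3 main.py \
    training=nemotron/nemotron_8b \
    stages=[training] \
    launcher_scripts_path=/opt/NeMo-Framework-Launcher/launcher_scripts
```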
We provide playbooks to showcase NeMo features, including PEFT, SFT, and deployment with Post-Training Quantization (PTQ):
NeMo Framework PEFT with Llama2, Mixtral-8x7B and Nemotron 4 340B
NeMo Framework Post-Training Quantization (PTQ) with Nemotron4 and Llama3
Note
If you are using NeMo Framework container version <=24.05, make sure to mount the latest NeMo-Framework-Launcher so that your workflow picks up the correct Nemotron configs. See the instructions below:
Clone the latest NeMo-Framework-Launcher:
git clone git@github.com:NVIDIA/NeMo-Framework-Launcher.git
Launch the docker container mounted with the above repository:
docker run --gpus all -it --rm --shm-size=4g -p 8000:8000 -v ${PWD}/NeMo-Framework-Launcher:/opt/NeMo-Framework-Launcher nvcr.io/nvidia/nemo:version
| Feature | Status |
|---|---|
| Data parallelism | ✓ |
| Tensor parallelism | ✓ |
| Pipeline parallelism | ✓ |
| Interleaved Pipeline Parallelism Schedule | N/A |
| Sequence parallelism | ✓ |
| Selective activation checkpointing | ✓ |
| Gradient checkpointing | ✓ |
| Partial gradient checkpointing | ✓ |
| FP32/TF32 | ✓ |
| AMP/FP16 | ✗ |
| BF16 | ✓ |
| TransformerEngine/FP8 | ✗ |
| Multi-GPU | ✓ |
| Multi-Node | ✓ |
| Inference | ✓ |
| Slurm | ✓ |
| Base Command Manager | ✓ |
| Base Command Platform | ✓ |
| Distributed data preprocessing | ✓ |
| NVfuser | ✗ |
| P-Tuning and Prompt Tuning | ✓ |
| IA3 and Adapter learning | ✓ |
| Distributed Optimizer | ✓ |
| Distributed Checkpoint | ✓ |
| Fully Sharded Data Parallel | N/A |