Nemotron-H
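NVIDIA Nemotron-H is a hybrid Mamba-2/Transformer architecture that interleaves Mamba-2 selective state space layers with standard self-attention layers. Mamba-2 layers carry a fixed-size recurrent state instead of a key-value cache that grows with sequence length, so the hybrid design improves throughput and memory efficiency on long sequences.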
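To make the interleaving concrete, the sketch below expands a per-layer pattern string into layer types. It assumes the symbol convention of the `hybrid_override_pattern` field found in Hugging Face Nemotron-H configs (`M` for Mamba-2, `*` for attention, `-` for MLP, `E` for MoE). Treat both that mapping and the example pattern as illustrative assumptions rather than the configuration of any released checkpoint.

```python
from collections import Counter

# Assumed symbol convention, modeled on the hybrid_override_pattern field of
# Hugging Face Nemotron-H configs; check it against the checkpoint you use.
LAYER_TYPES = {
    "M": "mamba2",     # Mamba-2 selective state space layer
    "*": "attention",  # standard self-attention layer
    "-": "mlp",        # dense feed-forward layer
    "E": "moe",        # sparse mixture-of-experts layer
}


def expand_pattern(pattern: str) -> list[str]:
    """Map each character of a hybrid layer pattern to a layer type name."""
    unknown = set(pattern) - set(LAYER_TYPES)
    if unknown:
        raise ValueError(f"unknown layer symbols: {sorted(unknown)}")
    return [LAYER_TYPES[symbol] for symbol in pattern]


if __name__ == "__main__":
    # Hypothetical 9-layer stack: four Mamba-2, four MLP, one attention layer.
    layers = expand_pattern("M-M-M*-M-")
    for index, kind in enumerate(layers):
        print(index, kind)
    print(Counter(layers))
```

Counting layer types this way shows at a glance how many attention layers (and therefore how many growing KV caches) a given pattern implies.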
Available Models
- NVIDIA-Nemotron-Nano-9B-v2: 9B-parameter hybrid model
- NVIDIA-Nemotron-Nano-12B-v2: 12B-parameter hybrid model
- NVIDIA-Nemotron-3-Nano-30B-A3B-BF16: 30B total parameters, 3B activated per token (sparse MoE + Mamba-2)
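The "3B activated" figure comes from sparse MoE routing: a router scores every expert for each token, but only the top-k experts actually run. The sketch below shows generic softmax top-k gating in plain Python. The function name, the renormalization step, and the example logits are assumptions for illustration; this is not the router used by Nemotron 3 Nano.

```python
import math


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Generic softmax top-k gating for illustration only (hypothetical helper).
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Softmax over all experts; subtracting the max keeps exp() stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts; the others are never computed.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


if __name__ == "__main__":
    # Four hypothetical experts, two active per token.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With the example logits, experts 1 and 3 are selected and experts 0 and 2 are skipped. At model scale, 3B active out of 30B total means each token touches roughly one tenth of the weights.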
Architecture
NemotronHForCausalLM
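Hugging Face checkpoints record their model class in the `architectures` list of `config.json`, so one quick way to tell whether a checkpoint uses this architecture is to look for `NemotronHForCausalLM` there. The helpers below (`uses_nemotron_h`, `check_checkpoint`) are hypothetical and not part of NeMo AutoModel or Transformers. They use only the Python standard library.

```python
import json
from pathlib import Path

TARGET_ARCH = "NemotronHForCausalLM"


def uses_nemotron_h(config: dict) -> bool:
    """Return True if a parsed config.json lists the Nemotron-H causal-LM class."""
    # Some configs store null here, so fall back to an empty list.
    return TARGET_ARCH in (config.get("architectures") or [])


def check_checkpoint(checkpoint_dir: str) -> bool:
    """Hypothetical helper: read <checkpoint_dir>/config.json and check it."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return uses_nemotron_h(json.load(f))


if __name__ == "__main__":
    # Minimal inline config; real checkpoints contain many more fields.
    sample = json.loads('{"architectures": ["NemotronHForCausalLM"]}')
    print(uses_nemotron_h(sample))
```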
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install NeMo AutoModel (full instructions are in the Installation Guide):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Change to the AutoModel directory, which contains the recipes:
3. Run the recipe:
See the Installation Guide and the LLM Fine-Tuning Guide for more details.
Fine-Tuning
See the LLM Fine-Tuning Guide.