Copyright © 2026, NVIDIA Corporation.

NVIDIA AI Ecosystem: Related Tools

After preparing your data with NeMo Curator, you’ll likely want to use it to train models. NVIDIA provides an integrated ecosystem of AI tools that work seamlessly with data prepared by NeMo Curator. This guide outlines the related tools for your next steps.

NeMo Framework

NVIDIA NeMo is an end-to-end framework for building, training, and fine-tuning GPU-accelerated language models. It provides:

  • Pretrained model checkpoints
  • Training and inference scripts
  • Optimization techniques for large-scale deployments

Tokenization

Tokenizers transform text into tokens that language models can interpret. NeMo Curator provides MegatronTokenizerWriter for tokenizing curated datasets and exporting them in the binary format required by Megatron-LM for pretraining. See the Save and Export documentation for details.
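
To make the idea concrete, here is a minimal sketch of what "tokenize and export to a binary format" means in principle. It assumes a toy whitespace tokenizer and a flat buffer of token IDs with per-document offsets. It does not use NeMo Curator's API and does not reproduce the actual Megatron-LM file layout; the names `build_vocab`, `encode`, and `pack` are hypothetical.

```python
from array import array


def build_vocab(docs):
    """Assign an integer ID to every distinct whitespace-separated word."""
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(doc, vocab):
    """Turn one document into a list of token IDs."""
    return [vocab[word] for word in doc.split()]


def pack(docs, vocab):
    """Concatenate all token IDs into one flat binary buffer.

    Returns the raw bytes plus a list of offsets, where document i
    occupies tokens offsets[i] up to (not including) offsets[i + 1].
    This is a simplified stand-in, not the real Megatron-LM format.
    """
    tokens = array("I")  # unsigned integers
    offsets = [0]
    for doc in docs:
        tokens.extend(encode(doc, vocab))
        offsets.append(len(tokens))
    return tokens.tobytes(), offsets


docs = ["the cat sat", "the dog sat down"]
vocab = build_vocab(docs)
payload, offsets = pack(docs, vocab)
print(offsets)
```

Real tokenizers use subword vocabularies rather than whitespace splitting, but the overall shape (text in, integer IDs out, IDs stored compactly with an index of document boundaries) is the same idea.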

For training custom tokenizers (such as SentencePiece models), use NeMo Framework. Learn how to train a tokenizer using NeMo in the tokenizer training documentation.

Training Large Language Models

Pretraining a large language model involves running next-token prediction on large curated datasets, exactly the type that NeMo Curator helps you prepare. NeMo Framework provides the training scripts and distributed training support for pretraining large language models on your curated data.
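
As an illustration of next-token prediction, the sketch below shows how a tokenized sequence is typically split into model inputs and training targets: each target is the token that follows the corresponding input. The function name `next_token_pairs` and the token IDs are hypothetical and are not part of NeMo.

```python
def next_token_pairs(token_ids):
    """Return (inputs, targets) where targets[i] is the token after inputs[i]."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a training pair")
    return token_ids[:-1], token_ids[1:]


# Hypothetical token IDs for one short document.
inputs, targets = next_token_pairs([5, 9, 2, 7])
print(inputs, targets)
```

During training, the model sees `inputs` and is penalized when its predicted distribution at each position puts low probability on the matching entry in `targets`.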

The large language model section of the NeMo user guide covers:

  • Pretraining methodologies
  • Model evaluation
  • Parameter-efficient fine-tuning (PEFT)
  • Distributed training

NeMo Aligner

NVIDIA NeMo Aligner is a framework designed for aligning language models with human preferences.

After pretraining a large language model, aligning it allows you to interact with it in a chat-like setting. NeMo Aligner lets you take curated alignment data and use it to align a pretrained language model.

The NeMo Aligner documentation describes its capabilities, including:

  • Reinforcement Learning from Human Feedback (RLHF)
  • Direct Preference Optimization (DPO)
  • Proximal Policy Optimization (PPO)
  • Constitutional AI (CAI)

NVIDIA AI Enterprise

For organizations looking to deploy trained models to production, NVIDIA AI Enterprise provides a software platform with enterprise support that covers:

  • The complete NeMo framework
  • Pretrained foundation models
  • Deployment and inference tools
  • Enterprise-grade security and support

Complete Workflow

A typical end-to-end workflow with NVIDIA’s AI tools includes:

  1. Data Preparation: Use NeMo Curator to clean, filter, and prepare your dataset
  2. Tokenization: Use NeMo Curator’s MegatronTokenizerWriter to tokenize curated data for Megatron-LM, or train a custom tokenizer with NeMo
  3. Model Training: Pretrain or fine-tune models with NeMo
  4. Alignment: Align models with human preferences using NeMo Aligner
  5. Deployment: Deploy models using NVIDIA AI Enterprise or Triton Inference Server
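
The steps above can be summarized as an ordered list of stages. The sketch below is only a way to keep the sequence and the tool for each stage in one place; the `Stage` class, the `describe` helper, and the `produces` labels are illustrative assumptions, not part of any NVIDIA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # workflow step, as listed above
    tool: str      # tool named for that step
    produces: str  # illustrative label for the step's output (an assumption)


WORKFLOW = [
    Stage("Data Preparation", "NeMo Curator", "curated dataset"),
    Stage("Tokenization", "NeMo Curator (MegatronTokenizerWriter) or NeMo", "tokenized dataset"),
    Stage("Model Training", "NeMo", "pretrained or fine-tuned model"),
    Stage("Alignment", "NeMo Aligner", "aligned model"),
    Stage("Deployment", "NVIDIA AI Enterprise or Triton Inference Server", "deployed model"),
]


def describe(workflow):
    """Render each stage as a numbered one-line summary."""
    return [
        f"{i}. {stage.name}: {stage.tool} -> {stage.produces}"
        for i, stage in enumerate(workflow, start=1)
    ]


for line in describe(WORKFLOW):
    print(line)
```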

This integrated ecosystem allows you to move from raw data to deployed, production-ready models with consistent tooling and optimized performance.