nemo_automodel.components.models.nemotron_omni

Submodules

  • nemo_automodel.components.models.nemotron_omni.model
  • nemo_automodel.components.models.nemotron_omni.state_dict_adapter

Copyright © 2026, NVIDIA Corporation.