
nemo_automodel.components.models.baichuan


Submodules

  • nemo_automodel.components.models.baichuan.configuration
  • nemo_automodel.components.models.baichuan.model


Copyright © 2026, NVIDIA Corporation.