# Configuration Overview
The `ConfigContainer` is the central configuration object in Megatron Bridge that holds all settings for training. It acts as a single source of truth that brings together model architecture, training parameters, data loading, optimization, checkpointing, logging, and distributed training settings.
## What is ConfigContainer
`ConfigContainer` is a dataclass that holds all the configuration objects needed for training:
```python
from megatron.bridge.training.config import ConfigContainer

# ConfigContainer brings together all training configurations
config = ConfigContainer(
    model=model_provider,              # Model architecture and parallelism
    train=training_config,             # Training loop parameters
    optimizer=optimizer_config,        # Megatron Core optimizer settings
    scheduler=scheduler_config,        # Learning rate scheduling
    dataset=dataset_config,            # Data loading configuration
    logger=logger_config,              # Logging and monitoring
    tokenizer=tokenizer_config,        # Tokenization settings
    checkpoint=checkpoint_config,      # Checkpointing and resuming
    dist=distributed_config,           # Distributed training setup
    ddp=ddp_config,                    # Megatron Core distributed data parallel settings
    # Optional configurations
    peft=peft_config,                  # Parameter-efficient fine-tuning
    profiling=profiling_config,        # Performance profiling
    mixed_precision=mp_config,         # Mixed precision training
    comm_overlap=comm_overlap_config,  # Communication overlap settings
    # ... and more
)
```
## Configuration Components
| Component | Purpose | Required | Default |
|---|---|---|---|
| `model` | Model architecture and parallelism strategy (GPT, T5, Mamba) | ✓ | - |
| `train` | Training loop parameters (batch sizes, iterations, validation) | ✓ | - |
| `optimizer` | Optimizer type and hyperparameters (from Megatron Core) | ✓ | - |
| `scheduler` | Learning rate and weight decay scheduling | ✓ | - |
| `dataset` | Data loading and preprocessing configuration | ✓ | - |
| `logger` | Logging, TensorBoard, and WandB configuration | ✓ | - |
| `tokenizer` | Tokenizer settings and vocabulary | ✓ | - |
| `checkpoint` | Checkpointing, saving, and loading | ✓ | - |
| `dist` | Distributed training initialization | | default instance |
| `ddp` | Data parallel configuration (from Megatron Core) | | default instance |
| `rng` | Random number generation settings | | default instance |
| `rerun_state_machine` | Result validation and error injection | | default instance |
| `mixed_precision` | Mixed precision training settings | | `None` |
| `comm_overlap` | Communication overlap optimizations | | `None` |
| `peft` | Parameter-efficient fine-tuning (LoRA, DoRA, etc.) | | `None` |
| `profiling` | Performance profiling with nsys or PyTorch profiler | | `None` |
| `ft` | Fault tolerance and automatic recovery | | `None` |
| `straggler` | GPU straggler detection | | `None` |
| `nvrx_straggler` | NVIDIA Resiliency Extension straggler detection | | `None` |
| `inprocess_restart` | In-process restart for fault tolerance | | `None` |
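Putting the required rows together, a minimal container looks roughly like the sketch below. The import paths and constructor arguments here are assumptions that may vary by version; optional components are omitted and fall back to their defaults:

```python
# A sketch of a minimal ConfigContainer with only the required components.
# Import paths and argument values are illustrative assumptions.
from megatron.bridge.models import GPTModelProvider
from megatron.bridge.training.config import (
    CheckpointConfig,
    ConfigContainer,
    GPTDatasetConfig,
    LoggerConfig,
    SchedulerConfig,
    TokenizerConfig,
    TrainingConfig,
)
from megatron.core.optimizer import OptimizerConfig

config = ConfigContainer(
    model=GPTModelProvider(num_layers=24, hidden_size=2048, seq_length=2048),
    train=TrainingConfig(global_batch_size=256, train_iters=10000),
    optimizer=OptimizerConfig(optimizer="adam", lr=3e-4),
    scheduler=SchedulerConfig(),
    dataset=GPTDatasetConfig(sequence_length=2048),
    logger=LoggerConfig(),
    tokenizer=TokenizerConfig(),
    checkpoint=CheckpointConfig(),
    # Optional components (dist, ddp, rng, ...) fall back to their defaults.
)
```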
## Design Philosophy
### Interoperability with External Config Systems
Megatron Bridge's Python configurations are designed to interoperate with whatever configuration system you already use, such as:

- **Programmatic configuration**: direct Python object manipulation
- **argparse**: command-line arguments map cleanly onto dataclass fields
- **File-based overrides**: JSON, YAML, or other config files can override Python configs
All of these approaches can be translated into Python dataclass instances. The framework ships convenience utilities for YAML-based overrides via OmegaConf, but it is not tied to any particular configuration system:
```python
# All of these approaches work seamlessly:

# 1. Direct Python configuration
config = ConfigContainer(
    model=GPTModelProvider(num_layers=24, hidden_size=2048),
    train=TrainingConfig(global_batch_size=256, train_iters=10000),
    # ... other configs
)

# 2. YAML-based serialization and deserialization (round-trip)
config.to_yaml("my_config.yaml")
config = ConfigContainer.from_yaml("my_config.yaml")  # Load the previously saved config

# 3. Programmatic override after creation
config.train.global_batch_size = 512  # Override after instantiation
config.model.num_layers = 48          # Modify the model architecture
```
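For instance, argparse flags map one-to-one onto dataclass fields. A minimal sketch (the flags are illustrative; this is not a built-in Megatron Bridge CLI):

```python
import argparse

from megatron.bridge.training.config import TrainingConfig

# Illustrative flags only; not a built-in Megatron Bridge CLI.
parser = argparse.ArgumentParser()
parser.add_argument("--train-iters", type=int, default=10000)
parser.add_argument("--global-batch-size", type=int, default=256)
args = parser.parse_args()

# Each parsed argument maps directly onto a dataclass field.
train_config = TrainingConfig(
    train_iters=args.train_iters,
    global_batch_size=args.global_batch_size,
)
```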
### Centralized Configuration
Megatron provides extensive flexibility through a rich set of configuration options, and `ConfigContainer` brings all of these settings together in a single, organized object. This centralization makes configuration discoverable and maintainable: you have one place to understand and control every aspect of a training run.

Unlike pure YAML-based configuration systems, `ConfigContainer` provides this centralization with the full power of Python, combining the organizational benefits of a single configuration file with programmatic flexibility.
The configuration system is built from nested dataclasses, providing:

- **Modularity**: each config component is independently defined and testable
- **Type safety**: full static type checking
- **IDE support**: autocomplete and type hints in development environments
- **Serialization**: easy conversion to/from YAML, JSON, or other formats
- **Validation**: built-in field validation
```python
@dataclass
class ConfigContainer:
    model: GPTModelProvider     # Dataclass for model architecture
    train: TrainingConfig       # Dataclass for training parameters
    optimizer: OptimizerConfig  # Dataclass for optimization settings
    # ... nested dataclasses for each concern
```
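As a self-contained illustration of what nesting buys (plain stdlib Python, not Megatron Bridge code), nested dataclasses collapse to plain dictionaries, which is what makes YAML/JSON round-trips straightforward:

```python
import dataclasses
import json

@dataclasses.dataclass
class TrainSettings:
    global_batch_size: int = 256
    train_iters: int = 10000

@dataclasses.dataclass
class Settings:
    train: TrainSettings = dataclasses.field(default_factory=TrainSettings)

# dataclasses.asdict recurses through nested dataclasses, producing plain
# dicts that serialize directly to JSON (or YAML).
print(json.dumps(dataclasses.asdict(Settings()), indent=2))
```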
### Lazy Configuration and Deferred Validation
For training workloads, configurations are lazy to support flexible user workflows.

**Problem with eager validation:**
```python
# This would be problematic with eager validation:
config = TrainingConfig(train_iters=1000)
# __post_init__ calculates dependent values immediately
config.train_iters = 5000  # User override
# Dependent values are now stale and incorrect!
```
**Solution with lazy finalization:**
```python
# Megatron Bridge approach: deferred validation
config = TrainingConfig(train_iters=1000)
config.train_iters = 5000  # User can safely override
# Validation happens automatically when training starts
pretrain(config, forward_step_func)  # All dependent values calculated correctly
```
**Benefits:**

- Users can instantiate configs and then safely override fields
- Dependent values are calculated correctly after all user modifications are applied
- Validation happens at the right time, just before training begins
- Flexible configuration workflows are supported
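The pattern behind this is simple: dependent values are computed in an explicit finalization step rather than in `__post_init__`. A minimal, self-contained sketch of the idea (plain Python, not Megatron Bridge code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LazySchedule:
    train_iters: int = 1000
    decay_iters: Optional[int] = None  # dependent value, deliberately left unset

    def finalize(self) -> None:
        # Computed once, after all user overrides have been applied.
        if self.decay_iters is None:
            self.decay_iters = self.train_iters

sched = LazySchedule(train_iters=1000)
sched.train_iters = 5000  # safe to override before finalization
sched.finalize()          # dependent value derived from the final setting
assert sched.decay_iters == 5000
```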
### Model Independence

Model configurations are designed to be independently usable outside the full training loop provided by the framework:
```python
from megatron.bridge.models import GPTModelProvider

# Models can be used standalone
model_provider = GPTModelProvider(
    num_layers=24,
    hidden_size=2048,
    vocab_size=50000,  # Must be explicitly set
    seq_length=2048,   # Must be explicitly set
)

# This works independently of other configs
model_provider.finalize()
model = model_provider.provide()
```
**Trade-off:** The price of this flexibility is that values like `seq_length` must be set explicitly in multiple places during training. These settings are checked for consistency at the start of training.
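Conceptually, that startup check amounts to assertions like the following (an illustrative sketch, not the framework's actual validation code):

```python
# Illustrative sketch of a cross-config consistency check performed at startup.
assert config.model.seq_length == config.dataset.sequence_length, (
    f"model.seq_length ({config.model.seq_length}) must match "
    f"dataset.sequence_length ({config.dataset.sequence_length})"
)
```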
## Usage
```python
# Create and configure
config = ConfigContainer(
    model=GPTModelProvider(num_layers=24, seq_length=2048),
    train=TrainingConfig(train_iters=1000),
    dataset=GPTDatasetConfig(sequence_length=2048),  # Must match model seq_length
    # ... other required configs
)

# Modify as needed
config.train.train_iters = 5000
config.model.hidden_size = 4096

# Start training - validation happens automatically
pretrain(config, forward_step_func)
```
## Configuration Export and Import
### Export to YAML
```python
# Print the YAML configuration to the console
config.print_yaml()

# Save to a file
config.to_yaml("config.yaml")
```
### Load from YAML
```python
# Load a configuration from a YAML file
config = ConfigContainer.from_yaml("config.yaml")
```
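For file-based overrides on top of a saved config, the OmegaConf utilities mentioned earlier can be combined with this YAML round-trip. A sketch of one possible flow (the dotlist keys are illustrative, and this is not a built-in entry point):

```python
from omegaconf import OmegaConf

from megatron.bridge.training.config import ConfigContainer

# Load the saved YAML, merge command-line overrides such as
# `train.global_batch_size=512`, and rebuild the container.
base = OmegaConf.load("config.yaml")
overrides = OmegaConf.from_cli()
merged = OmegaConf.merge(base, overrides)

OmegaConf.save(merged, "config_merged.yaml")
config = ConfigContainer.from_yaml("config_merged.yaml")
```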