bridge.recipes.utils.optimizer_utils
Module Contents
Functions
| Function | Description |
|---|---|
| distributed_fused_adam_with_cosine_annealing | Creates a distributed fused Adam optimizer configuration with a cosine annealing scheduler. |
| distributed_fused_adam_with_cosine_annealing_samples | Creates a distributed fused Adam optimizer configuration with a cosine annealing scheduler for sample-based training. |
API
- bridge.recipes.utils.optimizer_utils.distributed_fused_adam_with_cosine_annealing(
- precision: str = 'bf16-mixed',
- lr_warmup_iters: int = 2000,
- lr_decay_iters: Optional[int] = None,
- adam_beta1: float = 0.9,
- adam_beta2: float = 0.95,
- adam_eps: float = 1e-05,
- weight_decay: float = 0.1,
- max_lr: float = 0.0001,
- min_lr: Optional[float] = None,
- clip_grad: float = 1.0,
- )
Creates a distributed fused Adam optimizer configuration with a cosine annealing scheduler.
- Parameters:
  - precision – Mixed precision type (“bf16-mixed”, “16-mixed”, etc.)
  - lr_warmup_iters – Number of iterations for learning rate warmup
  - lr_decay_iters – Number of iterations for learning rate decay. If None, defaults to train_iters during training.
  - adam_beta1 – Adam optimizer beta1 parameter
  - adam_beta2 – Adam optimizer beta2 parameter
  - adam_eps – Adam optimizer epsilon parameter
  - weight_decay – Weight decay coefficient
  - max_lr – Maximum learning rate
  - min_lr – Minimum learning rate (defaults to 0.1 * max_lr)
  - clip_grad – Gradient clipping value
- Returns:
Tuple of (OptimizerConfig, SchedulerConfig)
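A minimal usage sketch follows. It assumes the function is importable under the module path shown at the top of this page and only overrides a few of the documented defaults; how the returned OptimizerConfig and SchedulerConfig are wired into a training run depends on the recipe that consumes them.

```python
# Minimal sketch, assuming the import path matches the module path on this page.
from bridge.recipes.utils.optimizer_utils import (
    distributed_fused_adam_with_cosine_annealing,
)

optimizer_cfg, scheduler_cfg = distributed_fused_adam_with_cosine_annealing(
    precision="bf16-mixed",
    lr_warmup_iters=2000,
    lr_decay_iters=None,   # None: falls back to train_iters during training
    max_lr=3e-4,
    min_lr=None,           # None: defaults to 0.1 * max_lr
    clip_grad=1.0,
)

# The two config objects are passed on to whatever training recipe or config
# container drives the run; here we only inspect their types.
print(type(optimizer_cfg).__name__, type(scheduler_cfg).__name__)
```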
- bridge.recipes.utils.optimizer_utils.distributed_fused_adam_with_cosine_annealing_samples(
- precision: str = 'bf16-mixed',
- lr_warmup_samples: Optional[int] = None,
- lr_decay_samples: Optional[int] = None,
- adam_beta1: float = 0.9,
- adam_beta2: float = 0.95,
- adam_eps: float = 1e-05,
- weight_decay: float = 0.1,
- max_lr: float = 0.0001,
- min_lr: Optional[float] = None,
- clip_grad: float = 1.0,
- )
Creates a distributed fused Adam optimizer configuration with a cosine annealing scheduler for sample-based training.
This is the sample-based equivalent of distributed_fused_adam_with_cosine_annealing().
- Parameters:
  - precision – Mixed precision mode (“bf16-mixed”, “16-mixed”, etc.)
  - lr_warmup_samples – Number of samples for learning rate warmup (None = auto from train_samples)
  - lr_decay_samples – Number of samples for learning rate decay (None = auto from train_samples)
  - adam_beta1 – Adam optimizer beta1 parameter
  - adam_beta2 – Adam optimizer beta2 parameter
  - adam_eps – Adam optimizer epsilon parameter
  - weight_decay – Weight decay coefficient
  - max_lr – Maximum learning rate
  - min_lr – Minimum learning rate (defaults to 0.1 * max_lr)
  - clip_grad – Gradient clipping value
- Returns:
A tuple of (OptimizerConfig, SchedulerConfig) configured for sample-based training
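As a companion to the iteration-based example above, a minimal sketch of the sample-based variant, again assuming the import path shown on this page. Leaving the warmup and decay lengths as None lets them be derived from train_samples.

```python
# Minimal sketch of the sample-based variant, assuming the same import path.
from bridge.recipes.utils.optimizer_utils import (
    distributed_fused_adam_with_cosine_annealing_samples,
)

optimizer_cfg, scheduler_cfg = distributed_fused_adam_with_cosine_annealing_samples(
    precision="bf16-mixed",
    lr_warmup_samples=None,   # None: derived from train_samples
    lr_decay_samples=None,    # None: derived from train_samples
    max_lr=1e-4,
    weight_decay=0.1,
)
```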