bridge.training.optim

Module Contents

Functions

  • setup_optimizer – Set up the optimizer and scheduler.

  • _get_scheduler – Get the optimizer parameter scheduler.

API

bridge.training.optim.setup_optimizer(
    optimizer_config: megatron.core.optimizer.OptimizerConfig,
    scheduler_config: megatron.bridge.training.config.SchedulerConfig,
    model: Union[megatron.core.transformer.module.MegatronModule, list[megatron.core.transformer.module.MegatronModule]],
    use_gloo_process_groups: bool = False,
    no_weight_decay_cond: Optional[Callable[[str, torch.nn.Parameter], bool]] = None,
    scale_lr_cond: Optional[Callable[[str, torch.nn.Parameter], bool]] = None,
    lr_mult: float = 1.0,
) → tuple[megatron.core.optimizer.MegatronOptimizer, megatron.core.optimizer_param_scheduler.OptimizerParamScheduler]

Set up the optimizer and scheduler.

Parameters:
  • optimizer_config – Configuration for the optimizer

  • scheduler_config – Configuration for the scheduler

  • model – The model to optimize

  • use_gloo_process_groups – Whether to use Gloo process groups

  • no_weight_decay_cond – Callable of (param_name, param) that returns True for parameters to exclude from weight decay

  • scale_lr_cond – Callable of (param_name, param) that returns True for parameters whose learning rate should be scaled by lr_mult

  • lr_mult – Learning rate multiplier applied to the parameters selected by scale_lr_cond

Returns:

Tuple containing the optimizer and the parameter scheduler
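
A minimal usage sketch. The OptimizerConfig field values are illustrative, SchedulerConfig is shown with defaults, and model is assumed to be an already-constructed Megatron model (or list of module chunks) with distributed state initialized:

```python
import torch
from megatron.core.optimizer import OptimizerConfig
from megatron.bridge.training.config import SchedulerConfig
from megatron.bridge.training.optim import setup_optimizer

# Illustrative config values; real runs tune these.
opt_cfg = OptimizerConfig(optimizer="adam", lr=3e-4, weight_decay=0.1)
sched_cfg = SchedulerConfig()  # assumption: default-constructible; runs typically set warmup/decay fields

def no_wd(name: str, param: torch.nn.Parameter) -> bool:
    # Exclude biases and 1-D (norm) parameters from weight decay.
    return name.endswith(".bias") or param.ndim == 1

# `model` is a placeholder for an already-built MegatronModule (or list of chunks).
optimizer, scheduler = setup_optimizer(
    optimizer_config=opt_cfg,
    scheduler_config=sched_cfg,
    model=model,
    no_weight_decay_cond=no_wd,
)
```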

bridge.training.optim._get_scheduler(
    optimizer_config: megatron.core.optimizer.OptimizerConfig,
    scheduler_config: megatron.bridge.training.config.SchedulerConfig,
    optimizer: megatron.core.optimizer.MegatronOptimizer,
) → megatron.core.optimizer_param_scheduler.OptimizerParamScheduler

Get the optimizer parameter scheduler.

Parameters:
  • optimizer_config – Configuration for the optimizer

  • scheduler_config – Configuration for the scheduler

  • optimizer – The optimizer to schedule

Returns:

The optimizer parameter scheduler
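
_get_scheduler is a private helper that setup_optimizer calls after constructing the optimizer, so it is rarely invoked directly. A rough sketch of that internal flow; the get_megatron_optimizer call is an assumption based on Megatron Core's public factory, not necessarily the exact path this module uses:

```python
from megatron.core.optimizer import get_megatron_optimizer
from megatron.bridge.training.optim import _get_scheduler

# Assumption: the optimizer is built first via Megatron Core's factory.
optimizer = get_megatron_optimizer(opt_cfg, model_chunks)

# The parameter scheduler (learning-rate and weight-decay schedules) is then
# derived from both configs and driven against that optimizer's param groups.
scheduler = _get_scheduler(opt_cfg, sched_cfg, optimizer)
```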