core.optimizer#
Subpackages#
Submodules#
Package Contents#
Functions#
- _matches: Returns True if passed-in parameter (with name) matches param_key.
- _get_param_groups: Create parameter groups for optimizer.
- _get_param_groups_and_buffers: Returns parameter groups and buffer for optimizer.
- _get_megatron_optimizer_based_on_param_groups: Get Megatron optimizer based on parameter groups.
- get_megatron_optimizer: Retrieve the Megatron optimizer for model chunks.
Data#
API#
- core.optimizer.logger#
‘getLogger(…)’
- core.optimizer._matches(
- param: torch.nn.Parameter,
- param_name: str,
- param_key: core.optimizer.optimizer_config.ParamKey,
- )
Returns True if passed-in parameter (with name) matches param_key.
- Parameters:
param (torch.nn.Parameter) – Handle to parameter object.
param_name (str) – Name of parameter in underlying PyTorch module.
param_key (ParamKey) – ParamKey object.
- Returns:
True if parameter matches passed-in param_key.
- Return type:
bool
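To illustrate the kind of predicate `_matches` provides, here is a minimal, hypothetical sketch of matching a parameter name against a key. The `ParamKey` below is a stand-in dataclass with a single glob pattern, not the real `core.optimizer.optimizer_config.ParamKey`:

```python
# Hypothetical sketch of a name-based parameter matcher. ParamKey here
# is a stand-in with a glob-style name pattern, not the real Megatron
# ParamKey, which may match on other criteria as well.
from dataclasses import dataclass
import fnmatch


@dataclass(frozen=True)
class ParamKey:
    name_pattern: str  # glob-style pattern over parameter names


def matches(param_name: str, param_key: ParamKey) -> bool:
    """Return True if the parameter name matches the key's pattern."""
    return fnmatch.fnmatch(param_name, param_key.name_pattern)


print(matches("decoder.layers.0.mlp.weight", ParamKey("*.mlp.*")))       # True
print(matches("embedding.word_embeddings.weight", ParamKey("*.mlp.*")))  # False
```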
- core.optimizer._get_param_groups(
- model_chunks: List[core.transformer.module.MegatronModule],
- config: core.optimizer.optimizer_config.OptimizerConfig,
- config_overrides: Optional[Dict[core.optimizer.optimizer_config.ParamKey, core.optimizer.optimizer_config.OptimizerConfig]],
- )
Create parameter groups for optimizer.
Creates parameter groups from provided optimizer config object.
- Parameters:
model_chunks (List[MegatronModule]) – model chunks to create parameter groups for.
config (OptimizerConfig) – optimizer configuration object.
config_overrides (Optional[Dict[ParamKey, OptimizerConfig]]) – optimizer overrides, specified on a per-layer basis.
- Returns:
List of parameter groups.
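The grouping idea can be sketched in plain Python: parameters that end up with the same (possibly overridden) config land in the same group. The dict-based `config` and substring-based override matching below are illustrative stand-ins for `OptimizerConfig` and `ParamKey` matching:

```python
# Minimal sketch of building optimizer parameter groups with per-group
# overrides, illustrating the idea behind _get_param_groups. The config
# dicts and substring matching are stand-ins, not the real Megatron API.
def get_param_groups(named_params, base_config, overrides=None):
    """Bucket parameters by the (possibly overridden) config they use."""
    overrides = overrides or {}  # {name_substring: config_dict}
    groups = {}
    for name, param in named_params:
        cfg = dict(base_config)
        for key, override in overrides.items():
            if key in name:  # crude stand-in for ParamKey matching
                cfg.update(override)
        group_id = tuple(sorted(cfg.items()))  # one group per distinct config
        groups.setdefault(group_id, {"params": [], **cfg})
        groups[group_id]["params"].append(param)
    return list(groups.values())


params = [("layer.weight", "w"), ("layer.bias", "b")]
groups = get_param_groups(params, {"lr": 1e-3, "weight_decay": 0.01},
                          overrides={"bias": {"weight_decay": 0.0}})
print(len(groups))  # 2: the bias gets its own group with weight_decay=0.0
```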
- core.optimizer._get_param_groups_and_buffers(
- model_chunks: List[core.transformer.module.MegatronModule],
- model_chunk_offset: int,
- config: core.optimizer.optimizer_config.OptimizerConfig,
- config_overrides: Optional[Dict[core.optimizer.optimizer_config.ParamKey, core.optimizer.optimizer_config.OptimizerConfig]],
- filter_fn: Callable,
- buffer_name: str,
- )
Returns parameter groups and buffer for optimizer.
- Parameters:
model_chunks (List[MegatronModule]) – model chunks to create parameter groups for.
model_chunk_offset (int) – offset of model_chunks in global model_chunks list.
config (OptimizerConfig) – optimizer configuration object.
config_overrides (Optional[Dict[ParamKey, OptimizerConfig]]) – optimizer overrides, specified on a per-layer basis.
filter_fn (callable) – filtering function for param_groups.
buffer_name (str) – name of buffer.
- Returns:
List of parameter groups and dictionary of model chunk IDs to buffers.
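A rough sketch of the shape of this function's work, using plain dicts in place of `MegatronModule` and `_ParamAndGradBuffer` (all names here are illustrative, not the real API): a filter function selects which parameters belong to this optimizer, and buffers are keyed by each chunk's position in the global chunk list via `model_chunk_offset`:

```python
# Sketch of splitting parameters with a filter function and tracking a
# per-model-chunk buffer mapping, loosely mirroring
# _get_param_groups_and_buffers. Chunks are plain dicts, not MegatronModules.
def split_params_and_buffers(model_chunks, model_chunk_offset,
                             filter_fn, buffer_name):
    param_groups, buffers = [], {}
    for local_idx, chunk in enumerate(model_chunks):
        selected = [p for name, p in chunk["params"] if filter_fn(name)]
        if selected:
            param_groups.append({"params": selected})
        # Key buffers by the chunk's position in the global chunk list.
        buffers[model_chunk_offset + local_idx] = chunk["buffers"][buffer_name]
    return param_groups, buffers


chunks = [{"params": [("dense.weight", "w"), ("expert.weight", "e")],
           "buffers": {"grad_buffer": "buf0"}}]
groups, bufs = split_params_and_buffers(
    chunks, model_chunk_offset=2,
    filter_fn=lambda n: "expert" not in n,  # e.g. keep non-expert params
    buffer_name="grad_buffer")
print(groups, bufs)  # [{'params': ['w']}] {2: 'buf0'}
```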
- core.optimizer._get_megatron_optimizer_based_on_param_groups(
- config: core.optimizer.optimizer_config.OptimizerConfig,
- model_chunks: List[core.transformer.module.MegatronModule],
- param_groups: List,
- per_model_buffers: Optional[Dict[int, List[core.distributed.param_and_grad_buffer._ParamAndGradBuffer]]] = None,
- model_parallel_group: Optional[torch.distributed.ProcessGroup] = None,
- data_parallel_group: Optional[torch.distributed.ProcessGroup] = None,
- data_parallel_group_gloo: Optional[torch.distributed.ProcessGroup] = None,
- data_parallel_group_idx: Optional[int] = None,
- intra_dist_opt_group: Optional[torch.distributed.ProcessGroup] = None,
- distributed_optimizer_instance_id: Optional[int] = 0,
- )
Get Megatron optimizer based on parameter groups.
- Parameters:
config (OptimizerConfig) – optimizer configuration object.
model_chunks (list) – list of model chunks.
param_groups (list) – list of parameter groups.
per_model_buffers (dict, optional) – buffers for distributed optimizer. Defaults to None.
model_parallel_group (torch.distributed.ProcessGroup, optional) – model-parallel group for distributed optimizer. Defaults to None.
data_parallel_group (torch.distributed.ProcessGroup, optional) – data-parallel group for distributed optimizer. Defaults to None.
data_parallel_group_gloo (torch.distributed.ProcessGroup, optional) – gloo data-parallel group for distributed optimizer. Defaults to None.
data_parallel_group_idx (int, optional) – data-parallel group index for distributed optimizer. Defaults to None.
intra_dist_opt_group (torch.distributed.ProcessGroup, optional) – intra-instance group for distributed optimizer. Defaults to None.
distributed_optimizer_instance_id (int, optional) – distributed optimizer instance ID. Defaults to 0.
- Returns:
Instance of MegatronOptimizer.
- core.optimizer.get_megatron_optimizer(
- config: core.optimizer.optimizer_config.OptimizerConfig,
- model_chunks: List[core.transformer.module.MegatronModule],
- config_overrides: Optional[Dict[core.optimizer.optimizer_config.ParamKey, core.optimizer.optimizer_config.OptimizerConfig]] = None,
- use_gloo_process_groups: bool = True,
- pg_collection: Optional[megatron.core.process_groups_config.ProcessGroupCollection] = None,
- dump_param_to_param_group_map: Optional[str] = None,
- )
Retrieve the Megatron optimizer for model chunks.
We use separate optimizers for expert parameters and non-expert parameters.
- Parameters:
config (OptimizerConfig) – optimizer configuration object.
model_chunks (List[MegatronModule]) – model chunks to get optimizer for.
config_overrides (Optional[Dict[ParamKey, OptimizerConfig]]) – optional dictionary of optimizer configuration objects to override default optimizer behavior for different subsets of parameters (identified by ParamKey).
use_gloo_process_groups (bool) – if False, disables use of Gloo process groups in underlying Megatron optimizers.
pg_collection (Optional[ProcessGroupCollection]) – optional unified collection of process groups for distributed training.
dump_param_to_param_group_map (Optional[str]) – path to dump parameter to param group map.
- Returns:
Instance of MegatronOptimizer.
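The expert / non-expert split mentioned above can be sketched as follows: one optimizer handles expert (MoE) parameters, another handles the rest, and a simple wrapper steps both. All class and function names here are illustrative stand-ins, not the real Megatron classes:

```python
# Illustration of using separate optimizers for expert and non-expert
# parameters, chained behind a single step() interface. DummyOptimizer
# and ChainedOptimizer are stand-ins for the real Megatron optimizers.
class DummyOptimizer:
    def __init__(self, params):
        self.params, self.steps = params, 0

    def step(self):
        self.steps += 1  # a real optimizer would update self.params here


class ChainedOptimizer:
    """Steps a list of underlying optimizers in sequence."""
    def __init__(self, optimizers):
        self.optimizers = optimizers

    def step(self):
        for opt in self.optimizers:
            opt.step()


def get_optimizer(named_params):
    # Crude name-based split; Megatron identifies expert params differently.
    expert = [p for n, p in named_params if "expert" in n]
    dense = [p for n, p in named_params if "expert" not in n]
    opts = [DummyOptimizer(dense)]
    if expert:
        opts.append(DummyOptimizer(expert))
    return ChainedOptimizer(opts)


opt = get_optimizer([("mlp.weight", 1), ("experts.0.weight", 2)])
opt.step()
print(len(opt.optimizers))  # 2: one dense, one expert optimizer
```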