bridge.recipes.utils.determinism_utils
Config-level overrides for deterministic training.
Module Contents
Functions
apply_determinism_overrides | Apply determinism config overrides to an existing ConfigContainer in-place.
API
- bridge.recipes.utils.determinism_utils.apply_determinism_overrides(
- cfg: megatron.bridge.training.config.ConfigContainer,
Apply determinism config overrides to an existing ConfigContainer in-place.
Sets the model-level flags required for bit-exact reproducibility and disables TP comm overlap (which uses non-deterministic NCCL collectives). Attention backend selection is a separate concern and is not touched here.
The matching validator that enforces these flags at training time is megatron.bridge.training.config.ConfigContainer._validate_and_apply_deterministic_mode.
This function is idempotent and is safe to call on configs with comm_overlap = None.
Note
Bit-exact reproducibility additionally requires runtime env vars (NCCL_ALGO=Ring, NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, CUBLAS_WORKSPACE_CONFIG=:4096:8). The performance launcher sets these via PerfEnvPlugin(deterministic=True); callers outside that launcher must set them themselves.
Parameters:
cfg – Recipe config to modify.
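For callers outside the performance launcher, the runtime environment variables named in the note could be set from Python, as in the minimal sketch below. The variable names and values are copied from the note. The helper name set_determinism_env and the DETERMINISM_ENV mapping are hypothetical and are not part of this module.

```python
import os

# Names and values taken from the note in this docstring.
# DETERMINISM_ENV and set_determinism_env are hypothetical illustration names.
DETERMINISM_ENV = {
    "NCCL_ALGO": "Ring",
    "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",
}


def set_determinism_env(env=os.environ):
    """Write the three variables into env and return the names that changed."""
    changed = []
    for name, value in DETERMINISM_ENV.items():
        if env.get(name) != value:
            env[name] = value
            changed.append(name)
    return changed
```

Libraries generally read variables like these when they initialize, so a helper of this kind would most likely need to run before the training process imports or initializes them. Like the function documented here, the sketch is idempotent: a second call changes nothing and returns an empty list.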