bridge.recipes.utils.determinism_utils

Config-level overrides for deterministic training.

Module Contents

Functions

apply_determinism_overrides

Apply determinism config overrides to an existing ConfigContainer in-place.

API

bridge.recipes.utils.determinism_utils.apply_determinism_overrides(cfg: megatron.bridge.training.config.ConfigContainer) -> None

Apply determinism config overrides to an existing ConfigContainer in-place.

Sets the model-level flags required for bit-exact reproducibility and disables tensor-parallel (TP) communication overlap, which uses non-deterministic NCCL collectives. Attention backend selection is a separate concern and is not touched here.

The matching validator that enforces these flags at training time is megatron.bridge.training.config.ConfigContainer._validate_and_apply_deterministic_mode.

This function is idempotent and is safe to call on configs with comm_overlap = None.
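
The in-place, idempotent, None-safe behavior described above can be illustrated with a minimal sketch. The classes and field names below (FakeConfigContainer, deterministic_mode, tp_comm_overlap) are hypothetical stand-ins chosen for illustration; they are not the real ConfigContainer layout, and the actual set of flags the function touches may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


# Stand-in types for illustration only. These are NOT the real
# megatron.bridge classes, and the field names are hypothetical.
@dataclass
class FakeModelConfig:
    deterministic_mode: bool = False


@dataclass
class FakeCommOverlapConfig:
    tp_comm_overlap: bool = True


@dataclass
class FakeConfigContainer:
    model: FakeModelConfig = field(default_factory=FakeModelConfig)
    comm_overlap: Optional[FakeCommOverlapConfig] = None


def sketch_apply_overrides(cfg: FakeConfigContainer) -> None:
    """Mutate cfg in place and return None, like the documented function."""
    # Assigning fixed values (rather than toggling) keeps repeated calls idempotent.
    cfg.model.deterministic_mode = True
    # Guard so that configs with comm_overlap = None are left untouched.
    if cfg.comm_overlap is not None:
        cfg.comm_overlap.tp_comm_overlap = False


cfg_without_overlap = FakeConfigContainer()  # comm_overlap is None
sketch_apply_overrides(cfg_without_overlap)
sketch_apply_overrides(cfg_without_overlap)  # second call changes nothing

cfg_with_overlap = FakeConfigContainer(comm_overlap=FakeCommOverlapConfig())
sketch_apply_overrides(cfg_with_overlap)
```

Because the sketch assigns constants and checks for None before touching comm_overlap, calling it twice leaves the config in the same state as calling it once.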

.. note::

   Bit-exact reproducibility additionally requires runtime env vars (NCCL_ALGO=Ring, NVTE_ALLOW_NONDETERMINISTIC_ALGO=0, CUBLAS_WORKSPACE_CONFIG=:4096:8). The performance launcher sets these via PerfEnvPlugin(deterministic=True); callers outside that launcher must set them themselves.
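
For callers outside the performance launcher, one way to set the variables listed in the note is sketched below. The helper name set_determinism_env is hypothetical and not part of this module; the variable names and values are copied from the note. Variables like these are typically read when the CUDA and NCCL libraries initialize, so the sketch assumes it runs before any such library is imported or initialized in the process.

```python
import os

# Names and values copied from the note above.
DETERMINISM_ENV = {
    "NCCL_ALGO": "Ring",
    "NVTE_ALLOW_NONDETERMINISTIC_ALGO": "0",
    "CUBLAS_WORKSPACE_CONFIG": ":4096:8",
}


def set_determinism_env(env=os.environ):
    """Hypothetical helper: write the variables into env.

    Returns a dict of the entries that were actually changed, so a
    caller can log or assert on them. Calling it again is a no-op.
    """
    changed = {}
    for name, value in DETERMINISM_ENV.items():
        if env.get(name) != value:
            env[name] = value
            changed[name] = value
    return changed


# Demonstrate on a plain dict so the real process environment is untouched.
demo_env = {"NCCL_ALGO": "Tree"}
first_call = set_determinism_env(demo_env)
second_call = set_determinism_env(demo_env)
```

In a real training script the helper would be called with no argument, at the top of the entry point, so that the values land in os.environ before the training libraries start up.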

Parameters:

cfg – Recipe config to modify.