bridge.perf_recipes._common

Shared helpers for flat performance benchmark recipes.

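_benchmark_common applies throughput-oriented defaults to a config. _enable_overlap_param_gather_with_optimizer_step turns on overlap of parameter gather with the optimizer step. _perf_precision returns a mixed-precision config for a given compute dtype.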

Module Contents

Functions

_benchmark_common

Apply benchmark-mode defaults that prioritize throughput measurement over convergence.

_enable_overlap_param_gather_with_optimizer_step

Enable optimizer-step parameter gather overlap on optimizer and comm-overlap configs.

_perf_precision

Return mixed-precision config tuned for perf benchmarks.

API

bridge.perf_recipes._common._benchmark_common(
cfg: megatron.bridge.training.config.ConfigContainer,
cross_entropy_impl: str = 'te',
) -> None

Apply benchmark-mode defaults that prioritize throughput measurement over convergence.

Intended for performance benchmark recipes only. Shortens training runs, disables checkpointing and evaluation, tunes the scheduler, and enables performance-oriented kernels.

Must stay in sync with _set_common_perf_overrides in scripts/performance/utils/overrides.py.

Individual recipes may override any of these after calling this function (e.g. Kimi K2 sets grad_reduce_in_fp32 = True).
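The call-then-override pattern described above can be sketched with plain dataclasses. This is a minimal, self-contained illustration, not the library's code: `RecipeConfig`, its sub-configs, every field name, and the default values (for example `train_iters = 50`) are hypothetical stand-ins for `ConfigContainer`, chosen only to mirror the categories listed above. Scheduler and kernel tuning are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins for ConfigContainer sub-configs; names and values are illustrative only.
@dataclass
class TrainStub:
    train_iters: int = 100_000
    eval_iters: int = 32


@dataclass
class CheckpointStub:
    save_interval: Optional[int] = 1000


@dataclass
class MixedPrecisionStub:
    grad_reduce_in_fp32: bool = True


@dataclass
class ModelStub:
    cross_entropy_impl: str = "native"


@dataclass
class RecipeConfig:
    train: TrainStub = field(default_factory=TrainStub)
    checkpoint: CheckpointStub = field(default_factory=CheckpointStub)
    mixed_precision: MixedPrecisionStub = field(default_factory=MixedPrecisionStub)
    model: ModelStub = field(default_factory=ModelStub)


def benchmark_common_sketch(cfg: RecipeConfig, cross_entropy_impl: str = "te") -> None:
    """Mutate cfg in place with throughput-oriented defaults (hypothetical values)."""
    cfg.train.train_iters = 50           # short run: enough steps to time, not to converge
    cfg.train.eval_iters = 0             # skip evaluation
    cfg.checkpoint.save_interval = None  # no checkpoint writes during the benchmark
    cfg.mixed_precision.grad_reduce_in_fp32 = False
    cfg.model.cross_entropy_impl = cross_entropy_impl


def kimi_like_recipe() -> RecipeConfig:
    """A recipe applies the shared defaults first, then its own overrides."""
    cfg = RecipeConfig()
    benchmark_common_sketch(cfg)
    # Recipe-specific override applied after the shared defaults (as described for Kimi K2).
    cfg.mixed_precision.grad_reduce_in_fp32 = True
    return cfg


cfg = kimi_like_recipe()
print(cfg.train.train_iters, cfg.mixed_precision.grad_reduce_in_fp32)  # 50 True
```

Because the shared function mutates the config in place and returns nothing, ordering is what lets a recipe win: any assignment made after the call replaces the benchmark default.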

bridge.perf_recipes._common._enable_overlap_param_gather_with_optimizer_step(
cfg: megatron.bridge.training.config.ConfigContainer,
) -> None

Enable optimizer-step parameter gather overlap on optimizer and comm-overlap configs.
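A minimal sketch of what setting the overlap on both configs could look like, assuming each sub-config exposes a boolean flag. The stub classes and the attribute name `overlap_param_gather_with_optimizer_step` on them are hypothetical stand-ins; the real `ConfigContainer` fields may differ.

```python
from dataclasses import dataclass, field


# Hypothetical stubs: one flag on the optimizer config, one on the comm-overlap config.
@dataclass
class OptimizerStub:
    overlap_param_gather_with_optimizer_step: bool = False


@dataclass
class CommOverlapStub:
    overlap_param_gather_with_optimizer_step: bool = False


@dataclass
class RecipeConfig:
    optimizer: OptimizerStub = field(default_factory=OptimizerStub)
    comm_overlap: CommOverlapStub = field(default_factory=CommOverlapStub)


def enable_overlap_sketch(cfg: RecipeConfig) -> None:
    """Turn the overlap on in both places so the two sub-configs cannot disagree."""
    cfg.optimizer.overlap_param_gather_with_optimizer_step = True
    cfg.comm_overlap.overlap_param_gather_with_optimizer_step = True


cfg = RecipeConfig()
enable_overlap_sketch(cfg)
```

Routing both assignments through one helper means a recipe enables the feature with a single call instead of remembering to flip two related settings by hand.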

bridge.perf_recipes._common._perf_precision(compute_dtype: str)

Return mixed-precision config tuned for perf benchmarks.

Identical to scripts/performance/utils/precision.get_precision_config but importable from the library side. Always sets grad_reduce_in_fp32=False, so a caller that replaces cfg.mixed_precision after calling _benchmark_common() still gets the benchmark-mode default.
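Why the returned config must carry grad_reduce_in_fp32=False itself can be shown with a small sketch. Here `MixedPrecisionStub`, `RecipeConfig`, `perf_precision_sketch`, and the accepted dtype strings `"bf16"` and `"fp8"` are all hypothetical; the point is only that replacing a whole sub-config discards any earlier in-place mutation of it.

```python
from dataclasses import dataclass, field

# Hypothetical set of accepted dtype strings; the real function may accept others.
SUPPORTED_DTYPES = {"bf16", "fp8"}


@dataclass
class MixedPrecisionStub:
    compute_dtype: str = "fp32"
    grad_reduce_in_fp32: bool = True  # training-oriented default


@dataclass
class RecipeConfig:
    mixed_precision: MixedPrecisionStub = field(default_factory=MixedPrecisionStub)


def perf_precision_sketch(compute_dtype: str) -> MixedPrecisionStub:
    """Return a fresh precision config with the benchmark default baked in."""
    dtype = compute_dtype.lower()
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported compute_dtype: {compute_dtype!r}")
    return MixedPrecisionStub(compute_dtype=dtype, grad_reduce_in_fp32=False)


cfg = RecipeConfig()
cfg.mixed_precision.grad_reduce_in_fp32 = False  # what the shared benchmark defaults set
# Replacing the whole sub-config afterwards discards that mutation...
cfg.mixed_precision = MixedPrecisionStub(compute_dtype="bf16")
print(cfg.mixed_precision.grad_reduce_in_fp32)  # True: benchmark default lost
# ...unless the replacement carries the default itself.
cfg.mixed_precision = perf_precision_sketch("bf16")
print(cfg.mixed_precision.grad_reduce_in_fp32)  # False
```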