core.optimizer.qk_clip

Module Contents

Functions

clip_qk

Clips QK attention logits to prevent numerical instability.

API

core.optimizer.qk_clip.clip_qk(model, log_max_only=False) → float

Clips QK attention logits to prevent numerical instability.

Parameters:
  • model (List[MegatronModule]) – Model chunks containing attention layers.

  • log_max_only (bool) – If True, only computes the maximum logit and leaves the model unchanged (no clipping is applied).

Returns:

The maximum QK logit value across all chunks.

Return type:

float
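
The call contract described above can be illustrated with a small, self-contained sketch. This is not the Megatron implementation: `ToyAttention`, `max_qk_logit`, `toy_clip_qk`, and the `threshold` argument are all hypothetical names introduced for illustration. The rescaling rule (shrinking queries and keys by the square root of `threshold / max_logit`) and returning the pre-clipping maximum are also assumptions, since this page does not state either. The real function operates on `MegatronModule` chunks.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAttention:
    """Hypothetical stand-in for a model chunk; not a Megatron class."""
    q: List[List[float]]  # query vectors
    k: List[List[float]]  # key vectors


def max_qk_logit(layer: ToyAttention) -> float:
    """Largest dot product between any query and any key in the layer."""
    return max(
        sum(a * b for a, b in zip(qi, kj))
        for qi in layer.q
        for kj in layer.k
    )


def toy_clip_qk(
    chunks: List[ToyAttention],
    threshold: float = 100.0,  # hypothetical; not part of the documented signature
    log_max_only: bool = False,
) -> float:
    """Return the maximum QK logit across chunks, optionally rescaling q and k.

    Assumed rule: when a chunk's max logit exceeds `threshold`, scale both
    q and k by sqrt(threshold / max_logit) so the new max equals `threshold`.
    """
    overall_max = -math.inf
    for layer in chunks:
        layer_max = max_qk_logit(layer)
        overall_max = max(overall_max, layer_max)
        if not log_max_only and layer_max > threshold:
            scale = math.sqrt(threshold / layer_max)
            layer.q = [[scale * x for x in row] for row in layer.q]
            layer.k = [[scale * x for x in row] for row in layer.k]
    # The value returned is the maximum observed before any rescaling.
    return overall_max
```

In this sketch, calling `toy_clip_qk(chunks, log_max_only=True)` reports the maximum logit without modifying the chunks, which mirrors the documented role of `log_max_only` as a monitoring-only mode.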