core.optimizer.qk_clip
Module Contents
Functions#
clip_qk | Clips QK attention logits to prevent numerical instability.
API
- core.optimizer.qk_clip.clip_qk(model, log_max_only=False) → float
Clips QK attention logits to prevent numerical instability.
- Parameters:
model (List[MegatronModule]) – Model chunks containing attention layers.
log_max_only (bool) – If True, only computes the maximum logit and skips clipping.
- Returns:
The maximum QK logit value across all model chunks.
- Return type:
float
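To make the clip-versus-log behavior concrete, here is a toy sketch in pure Python, not Megatron code. It assumes the clipping follows the balanced (α = 0.5) QK-Clip rule described for the MuonClip optimizer. Under that rule, when a head's maximum pre-softmax logit S_max exceeds a threshold τ, its query and key projection weights are each rescaled by sqrt(τ / S_max), so that head's maximum logit drops to exactly τ. This page does not say whether `clip_qk` uses this rule, per-head granularity, or any particular threshold. `ToyHead`, `matvec`, `max_logit`, `toy_clip_qk`, and `tau` are hypothetical names, and the real function operates on model chunks of `MegatronModule`, not on nested lists.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class ToyHead:
    """Hypothetical stand-in for one attention head's projection weights."""
    w_q: Matrix  # d_model x d_head query projection
    w_k: Matrix  # d_model x d_head key projection


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Project a d_model vector through a d_model x d_head matrix."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def max_logit(head: ToyHead, xs: List[List[float]]) -> float:
    """Largest scaled dot-product logit q.k / sqrt(d_head) over all token pairs."""
    d_head = len(head.w_q[0])
    qs = [matvec(x, head.w_q) for x in xs]
    ks = [matvec(x, head.w_k) for x in xs]
    scale = 1.0 / math.sqrt(d_head)
    return max(scale * sum(a * b for a, b in zip(q, k)) for q in qs for k in ks)


def toy_clip_qk(chunks: List[List[ToyHead]], xs: List[List[float]],
                tau: float = 100.0, log_max_only: bool = False) -> float:
    """Return the largest observed logit; rescale heads above tau unless log_max_only.

    Assumption: balanced QK-Clip, so W_q and W_k each get sqrt(tau / s_max),
    which scales every logit of that head by tau / s_max.
    """
    global_max = -math.inf
    for heads in chunks:  # one list of heads per (hypothetical) model chunk
        for head in heads:
            s_max = max_logit(head, xs)
            global_max = max(global_max, s_max)
            if not log_max_only and s_max > tau:
                factor = math.sqrt(tau / s_max)
                head.w_q = [[v * factor for v in row] for row in head.w_q]
                head.w_k = [[v * factor for v in row] for row in head.w_k]
    return global_max  # value observed before any clipping in this call
```

In this sketch the returned value is the pre-clip maximum, so calling it with `log_max_only=True` is a side-effect-free way to monitor logit growth. A clipping call returns the same number while leaving each clipped head's maximum logit at `tau`.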