core.optimizer.qk_clip
Module Contents
Functions
clip_qk | Clip the QK attention logits to the threshold; recommended for the Muon optimizer.
API
- core.optimizer.qk_clip.clip_qk(model, log_max_only=False) -> float
Clip the QK attention logits to the threshold. Recommended when training with the Muon optimizer.
- Parameters:
model – The model whose QK attention logits are clipped, given as a list of model chunks.
log_max_only – If True, only log the maximum attention logit and leave the weights unchanged.
- Returns:
The maximum attention logit, a float.
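This page does not describe how the clipping is carried out. As a rough illustration only, the sketch below assumes one common formulation of QK clipping: when the largest observed logit exceeds the threshold, the query and key projection weights are each scaled by the square root of `threshold / max_logit`, so that their product shrinks back to the threshold. The helper names (`qk_clip_factor`, `matvec`, `scale`, `logit`), the toy weights, and the threshold value are all hypothetical and are not part of `core.optimizer.qk_clip`.

```python
import math


def qk_clip_factor(max_logit: float, threshold: float) -> float:
    """Hypothetical helper: scale to apply to BOTH W_q and W_k."""
    if max_logit <= threshold:
        return 1.0  # logits already within bounds; leave weights untouched
    gamma = threshold / max_logit
    # Split the correction evenly so that the q.k product shrinks by gamma.
    return math.sqrt(gamma)


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def scale(w, s):
    """Return a copy of matrix w with every entry multiplied by s."""
    return [[w_ij * s for w_ij in row] for row in w]


def logit(w_q, w_k, x):
    """Unnormalized attention logit of a token attending to itself."""
    q, k = matvec(w_q, x), matvec(w_k, x)
    return sum(a * b for a, b in zip(q, k))


# Toy projection weights and input (assumed values, for illustration only).
w_q = [[4.0, 0.0], [0.0, 4.0]]
w_k = [[3.0, 0.0], [0.0, 3.0]]
x = [1.0, 2.0]
threshold = 30.0

before = logit(w_q, w_k, x)
s = qk_clip_factor(before, threshold)
after = logit(scale(w_q, s), scale(w_k, s), x)
```

In this toy setup the logit before clipping is above the assumed threshold, and after rescaling both matrices it sits at the threshold. A logging-only mode, analogous to `log_max_only=True`, would compute `before` and return it without applying `scale`.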