core.optimizer.qk_clip

Module Contents

Functions

clip_qk

Clip the QK attention logits to the threshold. Recommended when training with the Muon optimizer.

API

core.optimizer.qk_clip.clip_qk(model, log_max_only=False) → float

Clip the QK attention logits to the threshold. Recommended when training with the Muon optimizer.

Parameters:
  • model – The model whose QK attention logits are clipped, passed as a list of model chunks.

  • log_max_only – If True, only log the maximum attention logit and leave the weights unchanged. Defaults to False.

Returns:

The maximum attention logit, a float.
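
The idea behind clipping QK logits can be sketched in a few lines of plain Python. This is not the library's implementation: the helper names `attention_max_logit` and `qk_clip_sketch`, the explicit `threshold` argument, and the rule of scaling both projections by the square root of `threshold / max_logit` are assumptions, taken from one common formulation of QK-clip. The sketch also omits the usual 1/sqrt(d) logit scaling and any per-head handling.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention_max_logit(x, w_q, w_k):
    """Hypothetical helper: largest q_i . k_j over all token pairs."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return max(sum(qv * kv for qv, kv in zip(qr, kr)) for qr in q for kr in k)


def qk_clip_sketch(w_q, w_k, max_logit, threshold, log_max_only=False):
    """Illustrative only: rescale w_q and w_k in place when max_logit > threshold.

    The `threshold` argument is an assumption of this sketch.
    Returns the observed max logit unchanged, mirroring the float return value.
    """
    if not log_max_only and max_logit > threshold:
        # Each projection gets sqrt(gamma), so every logit shrinks by gamma.
        scale = math.sqrt(threshold / max_logit)
        for w in (w_q, w_k):
            for row in w:
                for j in range(len(row)):
                    row[j] *= scale
    return max_logit


# Toy inputs: two tokens, two features.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[2.0, 0.0], [0.0, 2.0]]
w_k = [[3.0, 0.0], [0.0, 3.0]]

observed = attention_max_logit(x, w_q, w_k)
returned = qk_clip_sketch(w_q, w_k, observed, threshold=100.0)
after = attention_max_logit(x, w_q, w_k)
```

Because the logits are bilinear in the query and key weights, scaling each projection by the same factor brings the largest logit down to the threshold while leaving the relative ordering of the logits unchanged. With `log_max_only=True` the sketch returns the same value and leaves both weight matrices untouched.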