`core.optimizer.qk_clip`

Module Contents

Clip the QK attention logits to the threshold. Recommended for use with the Muon optimizer.

core.optimizer.qk_clip.clip_qk(model, log_max_only=False) → float

Clip the QK attention logits to the threshold. Recommended for use with the Muon optimizer.

Parameters:

model – The model whose QK attention logits are clipped, given as a list of model chunks.
log_max_only – If True, only log the maximum attention logit and leave the weights unchanged.

Returns:

The maximum attention logit, a float.
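
The following is a minimal, self-contained sketch of the idea behind this kind of clipping. It is not the library's implementation. It assumes the commonly described QK-clip rule: when the observed maximum logit exceeds a threshold, the query and key projection weights are each rescaled by sqrt(threshold / max_logit). The names `clip_qk_sketch`, `max_attention_logit`, `matvec`, and the `threshold` argument are hypothetical, and the toy logits are plain, unscaled dot products on nested lists rather than real model chunks.

```python
import math


def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n x m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def max_attention_logit(xs, w_q, w_k):
    """Largest q . k dot product over all pairs of tokens in xs."""
    qs = [matvec(x, w_q) for x in xs]
    ks = [matvec(x, w_k) for x in xs]
    return max(sum(a * b for a, b in zip(q, k)) for q in qs for k in ks)


def clip_qk_sketch(xs, w_q, w_k, threshold, log_max_only=False):
    """Hypothetical stand-in for the clipping step (assumed rule, see lead-in).

    Returns the maximum logit observed before any rescaling. With
    log_max_only=True the weights are left untouched.
    """
    max_logit = max_attention_logit(xs, w_q, w_k)
    if not log_max_only and max_logit > threshold:
        # Logits are bilinear in (w_q, w_k), so scaling each matrix by
        # sqrt(threshold / max_logit) scales every logit by threshold / max_logit.
        scale = math.sqrt(threshold / max_logit)
        for w in (w_q, w_k):
            for row in w:
                for j in range(len(row)):
                    row[j] *= scale
    return max_logit


# Toy data: two tokens, 2-dimensional embeddings, diagonal projections.
xs = [[1.0, 2.0], [3.0, -1.0]]
w_q = [[2.0, 0.0], [0.0, 2.0]]
w_k = [[3.0, 0.0], [0.0, 3.0]]

observed = clip_qk_sketch(xs, w_q, w_k, threshold=15.0)
after = max_attention_logit(xs, w_q, w_k)
print(observed, after)
```

In this toy setup the returned value is the pre-clip maximum, and recomputing the maximum afterwards gives the threshold. Passing `log_max_only=True` returns the same maximum without modifying `w_q` or `w_k`.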