emerging_optimizers.mixin

class emerging_optimizers.mixin.WeightDecayMixin

Mixin for weight decay

Supports different types of weight decay:

  • “decoupled”: weight decay is applied directly to the parameters, scaled by the learning rate, without changing the gradients

  • “independent”: like decoupled weight decay, but the decay is not tied to the learning rate

  • “l2”: classic L2 regularization, where the decay term is added to the gradient

_apply_weight_decay_inplace(p, grad, lr, weight_decay)

Depending on the weight decay option, either p or grad is updated in place.

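Parameters:

  • p – the parameter to decay
  • grad – the gradient of p
  • lr – the learning rate
  • weight_decay – the weight decay coefficient

Return type:

None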
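
The three options can be illustrated with a minimal sketch. This is not the library's implementation: the function name `apply_weight_decay_sketch` and the `method` argument are hypothetical, and plain Python lists stand in for tensors so the example runs without any dependencies. It assumes the conventional update rules for each option.

```python
def apply_weight_decay_sketch(p, grad, lr, weight_decay, method="l2"):
    """Hypothetical stand-in for the mixin's in-place weight decay.

    p and grad are lists of floats here (assumption: the real method
    operates on tensors). Exactly one of them is modified in place.
    """
    if method == "decoupled":
        # Shrink the parameters directly; the step is scaled by lr.
        for i, w in enumerate(p):
            p[i] = w - lr * weight_decay * w
    elif method == "independent":
        # Shrink the parameters directly; lr plays no role.
        for i, w in enumerate(p):
            p[i] = w - weight_decay * w
    elif method == "l2":
        # Classic L2 regularization: fold the decay term into the gradient.
        for i, w in enumerate(p):
            grad[i] += weight_decay * w
    else:
        raise ValueError(f"unknown weight decay method: {method!r}")
    return None


# Usage: with "decoupled", only p changes; grad is left untouched.
p, grad = [1.0, -2.0], [0.5, 0.5]
apply_weight_decay_sketch(p, grad, lr=0.1, weight_decay=0.01, method="decoupled")
```

In this sketch, "decoupled" and "independent" differ only in whether the shrinkage is multiplied by `lr`, so a learning-rate schedule changes the effective decay in the first case but not in the second. With "l2" the parameters are unchanged at this point; the decay takes effect later, when the optimizer consumes the modified gradient.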