emerging_optimizers.mixin

class emerging_optimizers.mixin.WeightDecayMixin

Mixin for weight decay

Supports different types of weight decay:

“decoupled”: weight decay is applied directly to params without changing gradients
“independent”: similar to decoupled weight decay, but the decay is not tied to the learning rate
“l2”: classic L2 regularization

_apply_weight_decay_inplace(p, grad, lr, weight_decay)

Depending on the weight decay option, either p or grad is updated in place.

Parameters:

Return type:

None
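
The three options can be illustrated with a minimal sketch. This is not the library's implementation: the function name apply_weight_decay_sketch and the method argument are hypothetical, plain Python lists stand in for tensors, and the update formulas are the conventional ones for each scheme (decay scaled by lr for “decoupled”, unscaled for “independent”, and added to the gradient for “l2”), which are assumed here rather than taken from the source.

```python
def apply_weight_decay_sketch(p, grad, lr, weight_decay, method="decoupled"):
    """Hypothetical stand-in for a weight decay step on lists of floats.

    Mutates `p` or `grad` in place and returns None.
    """
    if method == "decoupled":
        # Assumed form: shrink the parameters directly, scaled by the learning rate.
        for i in range(len(p)):
            p[i] -= lr * weight_decay * p[i]
    elif method == "independent":
        # Assumed form: shrink the parameters directly, ignoring the learning rate.
        for i in range(len(p)):
            p[i] -= weight_decay * p[i]
    elif method == "l2":
        # Classic L2 regularization: add the decay term to the gradient.
        for i in range(len(p)):
            grad[i] += weight_decay * p[i]
    else:
        raise ValueError(f"unknown weight decay method: {method!r}")


# Usage: the same inputs under each option.
for name in ("decoupled", "independent", "l2"):
    params, grads = [1.0, -2.0], [0.5, 0.5]
    apply_weight_decay_sketch(params, grads, lr=0.1, weight_decay=0.01, method=name)
    print(name, params, grads)
```

In this sketch only the “l2” branch touches grad; the other two branches modify p and leave the gradient untouched, which matches the in-place behavior described above.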