NVIDIA Clara Train 3.1

ai4med.libs.optimizers package

class NovoGrad(learning_rate=1.0, beta1=0.95, beta2=0.98, epsilon=1e-08, weight_decay=0.0, grad_averaging=False, use_locking=False, name='NovoGrad')

Bases: tensorflow.python.training.momentum.MomentumOptimizer

Optimizer that implements SGD with layer-wise normalized gradients, where normalization is done by sqrt(ema(sqr(grads))), similar to Adam.

Second moment = ema of the layer-wise squared gradient norm:

v_t <- beta2*v_{t-1} + (1-beta2)*(g_t)^2

First moment has two modes:

  1. momentum on grads normalized by sqrt(v_t+epsilon):

     m_t <- beta1*m_{t-1} + lr_t * [g_t/sqrt(v_t+epsilon)]

  2. similar to Adam, ema of the normalized grads:

     m_t <- beta1*m_{t-1} + lr_t * [(1-beta1)*(g_t/sqrt(v_t+epsilon))]

If weight decay is used, a wd term is added after the grads are rescaled by 1/sqrt(v_t+epsilon):

m_t <- beta1*m_{t-1} + lr_t * [g_t/sqrt(v_t+epsilon) + wd*w_{t-1}]

Weight update (lr_t is already folded into m_t above):

w_t <- w_{t-1} - m_t
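
As a reading aid, here is a minimal NumPy sketch of one such update for a single layer, assuming the formulas above; the function and variable names are illustrative and the moment initialization is simplified, so this is not the ai4med implementation itself:

    import numpy as np

    def novograd_step(w, g, m, v, lr=1.0, beta1=0.95, beta2=0.98,
                      epsilon=1e-8, weight_decay=0.0, grad_averaging=False):
        """Illustrative single NovoGrad step for one layer (w: weights, g: gradient)."""
        # Second moment: ema of this layer's squared gradient norm (one scalar per layer)
        v = beta2 * v + (1.0 - beta2) * np.sum(g ** 2)
        # Rescale the gradient by 1/sqrt(v_t+epsilon), then add the weight-decay term
        g_hat = g / np.sqrt(v + epsilon) + weight_decay * w
        if grad_averaging:
            # assumed to correspond to the Adam-like mode (2) above
            g_hat = (1.0 - beta1) * g_hat
        # First moment: momentum on the rescaled gradient, with lr_t folded in
        m = beta1 * m + lr * g_hat
        # Weight update; lr_t is already inside m_t per the formulas above
        w = w - m
        return w, m, v

Note that v is a single scalar per layer (the ema of that layer's squared gradient norm), which is the layer-wise normalization the class description refers to.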

Parameters
  • learning_rate – A Tensor or a floating point value. The learning rate.

  • beta1 – A Tensor or a float, used in ema for momentum. Default = 0.95.

  • beta2 – A Tensor or a float, used in ema for grad norms. Default = 0.98.

  • epsilon – a float. Default = 1e-8.

  • weight_decay – A Tensor or a float. Default = 0.0.

  • grad_averaging – switch between Momentum and SAG. Default = False.

  • use_locking – If True use locks for update operations.

  • name – Optional, name prefix for the ops created when applying gradients. Defaults to “NovoGrad”.

  • use_nesterov – If True use Nesterov Momentum.
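
A hedged usage sketch in TF1-style graph mode follows; the exact import path for NovoGrad within ai4med.libs.optimizers is an assumption, and the toy variable and loss exist only to make the snippet self-contained:

    import tensorflow as tf

    # Assumed import path; adjust to how NovoGrad is exposed in your installation.
    from ai4med.libs.optimizers import NovoGrad

    tf.compat.v1.disable_eager_execution()

    # Toy variable and quadratic loss, purely for illustration.
    w = tf.compat.v1.get_variable("w", shape=[10],
                                  initializer=tf.compat.v1.zeros_initializer())
    loss = tf.reduce_sum(tf.square(w - 1.0))

    global_step = tf.compat.v1.train.get_or_create_global_step()
    optimizer = NovoGrad(learning_rate=0.01, beta1=0.95, beta2=0.98,
                         weight_decay=0.001, grad_averaging=False)

    # minimize() chains compute_gradients() and apply_gradients().
    train_op = optimizer.minimize(loss, global_step=global_step)

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op)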


apply_gradients(grads_and_vars, global_step=None, name=None)

Apply gradients to variables.

This is the second part of minimize(). It returns an Operation that applies gradients.

Parameters
  • grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().

  • global_step – Optional Variable to increment by one after the variables have been updated.

  • name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

Returns

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises
  • TypeError – If grads_and_vars is malformed.

  • ValueError – If none of the variables have gradients.

  • RuntimeError – If you should use _distributed_apply() instead.
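
For reference, a short sketch (same assumptions as the earlier snippet: TF1-style graph mode, assumed import path, toy loss) of the two-step pattern that apply_gradients() completes:

    import tensorflow as tf
    from ai4med.libs.optimizers import NovoGrad  # assumed import path

    tf.compat.v1.disable_eager_execution()

    w = tf.compat.v1.get_variable("w", shape=[4],
                                  initializer=tf.compat.v1.ones_initializer())
    loss = tf.reduce_mean(tf.square(w))

    global_step = tf.compat.v1.train.get_or_create_global_step()
    optimizer = NovoGrad(learning_rate=0.01)

    # First part of minimize(): build the list of (gradient, variable) pairs.
    grads_and_vars = optimizer.compute_gradients(loss)
    # Gradients could be inspected or clipped here before being applied.

    # Second part: apply the gradients and increment global_step.
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

Splitting the call this way is useful when the gradients need to be processed (for example, clipped) between the two steps; otherwise minimize() covers both.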
