NVIDIA Clara Train 3.1

# ai4med.libs.optimizers package

## Submodules

class NovoGrad(`learning_rate=1.0, beta1=0.95, beta2=0.98, epsilon=1e-08, weight_decay=0.0, grad_averaging=False, use_locking=False, name='NovoGrad'`)

Bases: tensorflow.python.training.momentum.MomentumOptimizer

Optimizer that implements SGD with layer-wise normalized gradients, where each layer's gradient is normalized by sqrt(ema(sqr(grads))), similar to Adam but with one second-moment scalar per layer.

Second moment = ema of the layer-wise squared gradient:

v_t <- beta2*v_{t-1} + (1-beta2)*(g_t)^2

First moment has two modes:

1. Momentum of gradients normalized by sqrt(v_t + epsilon):

m_t <- beta1*m_{t-1} + lr_t * [g_t/sqrt(v_t+epsilon)]

2. Adam-like ema of normalized gradients (grad_averaging=True):

m_t <- beta1*m_{t-1} + lr_t * [(1-beta1)*(g_t/sqrt(v_t+epsilon))]

If weight decay is enabled, the wd term is added after the gradient is rescaled by 1/sqrt(v_t+epsilon):

m_t <- beta1*m_{t-1} + lr_t * [g_t/sqrt(v_t+epsilon) + wd*w_{t-1}]

Weight update:

w_t <- w_{t-1} - m_t
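
The update rule above can be sketched in NumPy. This is an illustrative single-layer implementation of the listed equations, not the Clara Train code; for simplicity, `v` is initialized to zero here and bias correction is omitted:

```python
import numpy as np

def novograd_step(w, g, m, v, lr=1.0, beta1=0.95, beta2=0.98,
                  eps=1e-8, weight_decay=0.0, grad_averaging=False):
    """One NovoGrad update for a single layer (NumPy sketch)."""
    # Second moment: ema of the layer-wise squared gradient norm.
    v = beta2 * v + (1 - beta2) * np.sum(g * g)
    # Normalize the gradient, then add the (decoupled) weight-decay term.
    d = g / np.sqrt(v + eps) + weight_decay * w
    # grad_averaging=True gives the Adam-like ema of normalized gradients.
    if grad_averaging:
        d = (1 - beta1) * d
    # First moment, with the learning rate folded in.
    m = beta1 * m + lr * d
    # Weight update.
    w = w - m
    return w, m, v
```

Because `lr` is folded into `m`, the weight update is simply `w - m`, matching the equations above.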

Parameters
• learning_rate – A Tensor or a floating point value. The learning rate.

• beta1 – A Tensor or a float, used in the ema for momentum. Default = 0.95.

• beta2 – A Tensor or a float, used in the ema for gradient norms. Default = 0.98.

• epsilon – A float. Default = 1e-8.

• weight_decay – A Tensor or a float. Default = 0.0.

• grad_averaging – Switch between Momentum and SAG-style averaging. Default = False.

• use_locking – If True, use locks for update operations.

• name – Optional name prefix for the ops created when applying gradients. Defaults to "NovoGrad".

• use_nesterov – If True, use Nesterov Momentum.


apply_gradients(`grads_and_vars, global_step=None, name=None`)

This is the second part of minimize(). It returns an Operation that applies gradients.

Parameters

• grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().

• global_step – Optional Variable to increment by one after the variables have been updated.

• name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.

Returns

An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.

Raises
• TypeError – If grads_and_vars is malformed.

• ValueError – If none of the variables have gradients.

• RuntimeError – If you should use _distributed_apply() instead.
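
The contract above can be illustrated with a toy pure-Python stand-in. This is not the TensorFlow implementation; the plain-SGD update and the mutable-list `global_step` here are illustrative only:

```python
def apply_gradients(grads_and_vars, variables, global_step=None, lr=0.1):
    """Toy stand-in for the apply_gradients contract: apply each
    (gradient, variable) pair, then increment global_step if given."""
    # TypeError if grads_and_vars is malformed.
    for pair in grads_and_vars:
        if not (isinstance(pair, tuple) and len(pair) == 2):
            raise TypeError("grads_and_vars must be (gradient, variable) pairs")
    # ValueError if none of the variables have gradients.
    if all(g is None for g, _ in grads_and_vars):
        raise ValueError("No gradients provided for any variable")
    # Apply the update (the real optimizer applies NovoGrad's rule).
    for g, name in grads_and_vars:
        if g is not None:
            variables[name] -= lr * g
    # global_step is incremented after the variables have been updated.
    if global_step is not None:
        global_step[0] += 1
    return variables, global_step
```

For example, applying `[(0.5, "w")]` to `{"w": 1.0}` with the default `lr=0.1` updates `w` and bumps the step counter by one.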