Determinism
Determinism means exact reproducibility from run to run: training converges to the same weights and inference produces the same results. This is helpful for purposes ranging from traceability and auditing to experimentation and debugging.
For additional details, see: https://github.com/NVIDIA/tensorflow-determinism
All that is needed in Clara is to set the seeds (numbers used to initialize pseudorandom number generators) in the determinism section of config.json:
{
    "epochs": 1240,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "determinism": {
        "python_seed": "20191015",
        "random_seed": 123456,
        "numpy_seed": 654321,
        "tf_seed": 11111
    },
    ...
The values used in the example above are merely for demonstration purposes. You can set your own seeds:
python_seed is set as a string containing a numeric value. All other seeds are set as positive integers, as most random functions expect.
python_seed sets the environment variable PYTHONHASHSEED, which is used as a fixed seed for generating the hash() of the types covered by hash randomization.
random_seed seeds Python's built-in random library.
numpy_seed and tf_seed are used exactly as their names suggest, for NumPy and TensorFlow.
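As a rough illustration of what these four seeds control, the sketch below validates a determinism section and applies it by hand. It is not Clara's internal implementation: the helper name apply_seeds is hypothetical, and the TensorFlow step is left as a comment so the example runs without TensorFlow installed. Note that changing PYTHONHASHSEED inside a running interpreter only affects Python processes started afterwards, so the sketch assumes it runs before any worker processes are launched.

```python
import os
import random


def apply_seeds(determinism):
    """Validate and apply a `determinism` section (hypothetical helper).

    Returns the names of the libraries that were seeded.
    """
    python_seed = determinism["python_seed"]
    if not (isinstance(python_seed, str) and python_seed.isdigit()):
        raise ValueError("python_seed must be a string containing a numeric value")

    for key in ("random_seed", "numpy_seed", "tf_seed"):
        value = determinism[key]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer")

    seeded = []

    # Fixes hash() randomization for interpreters started after this point
    os.environ["PYTHONHASHSEED"] = python_seed
    seeded.append("PYTHONHASHSEED")

    # Python's built-in random library
    random.seed(determinism["random_seed"])
    seeded.append("random")

    # NumPy's global generator, if NumPy is installed
    try:
        import numpy as np
    except ImportError:
        pass
    else:
        np.random.seed(determinism["numpy_seed"])
        seeded.append("numpy")

    # tf_seed would be passed to TensorFlow's global seed function here:
    # tf.set_random_seed in TensorFlow 1.x, tf.random.set_seed in 2.x.
    return seeded


config = {
    "python_seed": "20191015",
    "random_seed": 123456,
    "numpy_seed": 654321,
    "tf_seed": 11111,
}
apply_seeds(config)
```

Calling apply_seeds twice with the same values makes subsequent random.random() calls repeat, which is a quick way to check that a seed took effect.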
There are two further sources of non-determinism: multiple workers in the tf.data.Dataset pipeline and Horovod Tensor Fusion. Both are handled internally when determinism is used. Specifically, the number of workers is set to 1 for deterministic training, and the environment variable HOROVOD_FUSION_THRESHOLD is set to '0' when determinism is enabled in multi-GPU training.
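The two internal adjustments can be pictured with the following minimal sketch. The function name, the num_workers settings key, and the env parameter are all assumptions made for illustration; Clara applies the equivalent changes itself, so none of this needs to be written by hand. A fusion threshold of zero is Horovod's documented way of disabling Tensor Fusion, and the sketch assumes the variable is set before Horovod is initialized.

```python
import os


def apply_determinism_overrides(settings, num_gpus, env=os.environ):
    """Return a copy of `settings` adjusted for deterministic training.

    Hypothetical helper: `settings` and its "num_workers" key are
    illustrative names, not part of Clara's configuration schema.
    """
    adjusted = dict(settings)

    # A single worker keeps the order of the input pipeline fixed
    adjusted["num_workers"] = 1

    if num_gpus > 1:
        # A threshold of zero disables Horovod Tensor Fusion
        env["HOROVOD_FUSION_THRESHOLD"] = "0"

    return adjusted


# Use a plain dict as the environment so the real one is left untouched
fake_env = {}
print(apply_determinism_overrides({"num_workers": 8}, num_gpus=4, env=fake_env))
print(fake_env)
```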
For deterministic training to work, it is important to eliminate all sources of randomness. It is recommended to keep the number of GPUs, the GPU architecture, the driver versions, all framework versions, and the setup the same across runs.
Known issue: Determinism may not work under either of the following conditions:
Both NovoGrad and AMP are enabled
XLA is enabled (see note 5 on this page: https://github.com/NVIDIA/framework-determinism#confirmed-current-gpu-specific-sources-of-non-determinism-with-solutions)