Determinism

3.1

Determinism means exact reproducibility from run to run: training arrives at the same weights and inference produces the same results. This is useful for purposes ranging from traceability and auditing to experimentation and debugging.

For additional details, see: https://github.com/NVIDIA/tensorflow-determinism

In Clara, all that is needed is to set the seeds (numbers used to initialize pseudorandom number generators) in the determinism section of config.json:


{
    "epochs": 1240,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "determinism": {
        "python_seed": "20191015",
        "random_seed": 123456,
        "numpy_seed": 654321,
        "tf_seed": 11111
    },
    ...

The values used in the example above are for demonstration purposes only; you can set your own seeds.

The python_seed is set with a string containing a numeric value. All others are set with positive integers, as most random functions expect.
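The type rules above can be checked before a run starts. The following is a minimal sketch, not Clara's own validation: the function name validate_determinism is hypothetical, and it assumes all four keys are required, which the documentation does not state.

```python
import json


def validate_determinism(section):
    """Check the seed types described above; raise ValueError on a mismatch.

    Hypothetical helper for illustration only. It assumes that all four
    keys must be present, which is an assumption of this sketch.
    """
    python_seed = section.get("python_seed")
    # python_seed must be a string made of digits, e.g. "20191015".
    if not (isinstance(python_seed, str)
            and python_seed.isascii()
            and python_seed.isdigit()):
        raise ValueError("python_seed must be a string of digits")

    for key in ("random_seed", "numpy_seed", "tf_seed"):
        value = section.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(key + " must be a positive integer")
    return section


# Usage with the determinism section from the example configuration.
config = json.loads("""
{
    "determinism": {
        "python_seed": "20191015",
        "random_seed": 123456,
        "numpy_seed": 654321,
        "tf_seed": 11111
    }
}
""")
seeds = validate_determinism(config["determinism"])
```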

python_seed sets the environment variable PYTHONHASHSEED, which is used as a fixed seed for generating the hash() of the types covered by hash randomization. random_seed seeds Python’s built-in random library. numpy_seed and tf_seed are used as their names suggest, for NumPy and TensorFlow.
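To make the mapping from each seed to its target concrete, here is a rough sketch of how the four values could be applied by hand. It is not Clara's internal code: the function name apply_seeds is hypothetical, and NumPy and TensorFlow are treated as optional so the sketch runs without them.

```python
import os
import random


def apply_seeds(determinism):
    """Hypothetical helper: apply the four seeds described above."""
    # PYTHONHASHSEED is read when the interpreter starts, so setting it here
    # affects only processes launched afterwards, not the running one.
    os.environ["PYTHONHASHSEED"] = determinism["python_seed"]

    # Seed Python's built-in random module.
    random.seed(determinism["random_seed"])

    try:
        import numpy as np
        np.random.seed(determinism["numpy_seed"])
    except ImportError:
        pass  # NumPy not installed; nothing to seed.

    try:
        import tensorflow as tf
    except ImportError:
        tf = None  # TensorFlow not installed; nothing to seed.
    if tf is not None:
        tf_random = getattr(tf, "random", None)
        if hasattr(tf_random, "set_seed"):
            tf_random.set_seed(determinism["tf_seed"])  # TensorFlow 2.x
        else:
            tf.set_random_seed(determinism["tf_seed"])  # TensorFlow 1.x


determinism = {
    "python_seed": "20191015",
    "random_seed": 123456,
    "numpy_seed": 654321,
    "tf_seed": 11111,
}

# Re-applying the same seeds reproduces the same pseudorandom draw.
apply_seeds(determinism)
first_draw = random.random()
apply_seeds(determinism)
second_draw = random.random()
```

Because hash randomization is fixed at interpreter startup, PYTHONHASHSEED generally has to be in the environment before the Python process that needs it is launched.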

Two other sources of non-determinism remain: multiple workers in the tf.data.Dataset pipeline and Horovod Tensor Fusion. Both are handled internally when determinism is used. Specifically, the number of workers is set to 1 for deterministic training, and the environment variable HOROVOD_FUSION_THRESHOLD is set to '0' when determinism is enabled in multi-GPU training.
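The two internal adjustments can be pictured with the sketch below. It only illustrates the behavior described above and is not Clara's implementation: the function name, its parameters, and the env argument are all hypothetical.

```python
import os


def deterministic_runtime_settings(requested_workers, num_gpus, env=None):
    """Hypothetical helper mirroring the two adjustments described above.

    Returns the number of data pipeline workers to use and, for multi-GPU
    runs, sets HOROVOD_FUSION_THRESHOLD in the given environment mapping.
    """
    env = os.environ if env is None else env

    # A fusion threshold of 0 bytes disables Horovod Tensor Fusion.
    if num_gpus > 1:
        env["HOROVOD_FUSION_THRESHOLD"] = "0"

    # The requested worker count is ignored: deterministic training uses 1.
    return 1


# Usage: a plain dict stands in for os.environ so nothing global changes.
multi_gpu_env = {}
multi_gpu_workers = deterministic_runtime_settings(8, num_gpus=2, env=multi_gpu_env)

single_gpu_env = {}
single_gpu_workers = deterministic_runtime_settings(8, num_gpus=1, env=single_gpu_env)
```

Passing a dictionary for env keeps the sketch free of side effects; a real launcher would need the variable in the process environment before Horovod initializes.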

Note

For deterministic training to work, it is important to eliminate all sources of randomness. It is recommended to keep the number of GPUs, the GPU architecture, the driver versions, all framework versions, and the setup the same from run to run.

Note

Known issue: Determinism may not work under any of the following conditions.

© Copyright 2020, NVIDIA. Last updated on Feb 2, 2023.