# Deterministic Generation Mode in NVIDIA NIM for LLMs
NVIDIA NIM for LLMs supports a deterministic generation mode, which ensures consistent text generation across multiple inference runs and requests. This feature is particularly valuable for applications that require reproducible outputs, such as testing and validation.
## Enabling Deterministic Mode
Deterministic generation is currently supported only for `tensorrt_llm` buildable profiles. To enable it, set the following environment variable:

`NIM_FORCE_DETERMINISTIC=1`
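For example, when launching the NIM container with Docker, the variable can be passed with `-e`. This is a sketch: the image name, tag, and port below are illustrative placeholders, not values from this page.

```shell
# Launch a NIM container with deterministic generation enabled.
# Image name/tag and port mapping are placeholders; substitute your own.
docker run --gpus all \
  -e NGC_API_KEY \
  -e NIM_FORCE_DETERMINISTIC=1 \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-70b-instruct:latest
```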
## Hardware Requirements
- **FP8 profiles:** Require H100 GPUs with NVIDIA NVLink. A100 and H100-NVL are not supported.
- **FP16 profiles:** Run on any supported GPU.
## Consequences of Using Deterministic Mode
When deterministic mode is enabled:
- The generated text remains consistent across multiple inference runs, which is particularly important for batched requests.
- Latency and throughput may degrade slightly because of the additional constraints required to ensure determinism.
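A common source of run-to-run variation in batched inference is floating-point non-associativity: reductions evaluated in a different order (for example, under a different batch composition or kernel schedule) can round differently. This page does not detail NIM's internal constraints; the snippet below is only a minimal, NIM-independent illustration of the underlying numerical effect.

```python
# Floating-point addition is not associative: the same operands summed
# in a different order can produce a different result. Fixing the
# reduction order is the kind of constraint that makes generation
# deterministic, at some cost to parallel efficiency.
a = (0.1 + 0.2) + 0.3   # one reduction order
b = 0.1 + (0.2 + 0.3)   # another order, same operands
print(a == b)           # False: the two orders round differently
```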
The following graph shows experimental results comparing Llama 3 70B FP8 performance on 8xH100 with and without deterministic mode enabled: