Known Issues

This page lists known issues and limitations in the current release.

25.11

  • DeepSeek V3 on H100 fails when DeepEP is used, raising RuntimeError: DeepEP error: timeout (dispatch CPU).

  • MODEL_TFLOP/s/GPU is printed to stdout as 0 for all hybrid models, such as Nemotron-H 56B.

25.09

  • Pretraining DeepSeek with subchannel FP8 precision does not work. Pretraining with current-scaling FP8 is a workaround, but the MTP loss does not converge.