Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Access Long Context Recipe
NeMo 2.0 provides tested recipes for training long-context models. The recipes are available in the NeMo LLM recipes directory.
The following tables show the sequence lengths supported by the Llama 3, Mixtral, and Nemotron 4 models at various sizes.
Llama 3
| Sequence Length | 8B | 70B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Mixtral
| Sequence Length | 8x3B | 8x7B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Nemotron 4
| Sequence Length | 15B | 22B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
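The support matrix above can also be checked programmatically before selecting a recipe. The sketch below is illustrative only: the `SUPPORT_MATRIX` data is taken from the tables in this section, while the helper function and its naming are assumptions, not part of the NeMo API.

```python
# Long-context support matrix, transcribed from the tables above.
# Keys are (model, size); values are the supported sequence lengths.
# NOTE: is_supported() is a hypothetical helper for illustration,
# not a function provided by NeMo.
SUPPORT_MATRIX = {
    ("llama3", "8b"): {"16k", "64k"},
    ("llama3", "70b"): {"16k", "64k"},
    ("mixtral", "8x3b"): {"16k", "64k"},
    ("mixtral", "8x7b"): {"16k", "64k"},
    ("nemotron4", "15b"): {"16k", "64k"},
    ("nemotron4", "22b"): {"16k", "64k"},
}


def is_supported(model: str, size: str, seq_len: str) -> bool:
    """Return True if a long-context recipe exists for this combination."""
    return seq_len in SUPPORT_MATRIX.get((model.lower(), size.lower()), set())
```

For example, `is_supported("llama3", "8b", "16k")` returns `True`, while an unlisted combination such as a 128k sequence length returns `False`.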