Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Access Long Context Recipe
NeMo 2.0 provides tested recipes for training long-context models. The recipes are available in the NeMo LLM recipes directory.
The following tables show the sequence lengths supported by the Llama 3, Mixtral, and Nemotron 4 models at various sizes.
Llama 3
| Sequence Length | 8B | 70B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Mixtral
| Sequence Length | 8x3B | 8x7B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Nemotron 4
| Sequence Length | 15B | 22B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
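The support matrix above can also be checked programmatically before selecting a recipe. The sketch below is illustrative only: the `SUPPORT_MATRIX` data is taken from the tables in this section, while the helper function and its naming are assumptions, not part of the NeMo API.

```python
# Long-context support matrix, transcribed from the tables above.
# Keys are (model, size); values are the supported sequence lengths.
# NOTE: is_supported() is a hypothetical helper for illustration,
# not a function provided by NeMo.
SUPPORT_MATRIX = {
    ("llama3", "8b"): {"16k", "64k"},
    ("llama3", "70b"): {"16k", "64k"},
    ("mixtral", "8x3b"): {"16k", "64k"},
    ("mixtral", "8x7b"): {"16k", "64k"},
    ("nemotron4", "15b"): {"16k", "64k"},
    ("nemotron4", "22b"): {"16k", "64k"},
}


def is_supported(model: str, size: str, seq_len: str) -> bool:
    """Return True if a long-context recipe exists for this combination."""
    return seq_len in SUPPORT_MATRIX.get((model.lower(), size.lower()), set())
```

For example, `is_supported("llama3", "8b", "16k")` returns `True`, while an unlisted combination such as a 128k sequence length returns `False`.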