Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Access Long Context Recipe#
NeMo 2.0 provides tested recipes for training long-context models. The recipes are available in the NeMo LLM recipes directory.
The following tables show the sequence lengths supported by the Llama 3, Mixtral, and Nemotron models at each model size.
Llama 3#
| Sequence Length | 8B | 70B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Mixtral#
| Sequence Length | 8x3B | 8x7B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
Nemotron 4#
| Sequence Length | 15B | 22B |
|---|---|---|
| 16k | Yes | Yes |
| 64k | Yes | Yes |
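The support matrices above can be expressed programmatically when selecting a recipe. The sketch below encodes the tables and builds a recipe identifier; the `<family>_<size>_<seqlen>` naming is an assumption modeled on the NeMo LLM recipe module names, so check the recipes directory for the exact identifiers.

```python
# Supported long-context configurations from the tables above
# (model family -> model size -> tested sequence lengths).
SUPPORTED = {
    "llama3": {"8b": ["16k", "64k"], "70b": ["16k", "64k"]},
    "mixtral": {"8x3b": ["16k", "64k"], "8x7b": ["16k", "64k"]},
    "nemotron4": {"15b": ["16k", "64k"], "22b": ["16k", "64k"]},
}


def recipe_name(family: str, size: str, seq_len: str) -> str:
    """Build a recipe identifier such as 'llama3_8b_16k'.

    The naming scheme here is illustrative; consult the NeMo LLM
    recipes directory for the actual recipe names.
    """
    if seq_len not in SUPPORTED.get(family, {}).get(size, []):
        raise ValueError(f"{family} {size} has no tested {seq_len} recipe")
    return f"{family}_{size}_{seq_len}"


print(recipe_name("llama3", "8b", "16k"))  # llama3_8b_16k
```

Validating the combination up front surfaces an unsupported model/sequence-length pairing before a training job is launched.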