Qwen3#
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
We provide recipes for pretraining and fine-tuning Qwen3 models of the following sizes: 0.6B, 1.7B, 4B, 8B, 14B, 32B, 30B-A3B, and 235B-A22B using NeMo 2.0 and NeMo-Run.
These recipes configure a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in qwen3_600m, qwen3_1p7b, qwen3_4b, qwen3_8b, qwen3_14b, qwen3_32b, qwen3_30b_a3b, and qwen3_235b_a22b.
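As a quick illustration of this layout, the sketch below builds the default pretraining recipe for the 0.6B model; the run name and checkpoint directory are placeholders:

from nemo.collections import llm

# Each recipe module exposes pretrain_recipe()/finetune_recipe() factories.
# The returned object is a run.Partial for an llm API function
# (llm.pretrain or llm.finetune); it is only configured here, not executed.
pretrain = llm.qwen3_600m.pretrain_recipe(
    name="qwen3_600m_pretraining",  # placeholder run name
    dir="/path/to/checkpoints",     # placeholder checkpoint directory
)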
NeMo 2.0 Pretraining Recipes#
Note
The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your own custom dataset.
The following example shows how to invoke the default recipe and override the data argument:
from nemo.collections import llm
pretrain = llm.qwen3_8b.pretrain_recipe(
    name="qwen3_8b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=2,
    num_gpus_per_node=8,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     global_batch_size=global_batch_size,
#     micro_batch_size=micro_batch_size,
#     seq_length=pretrain.model.config.seq_length,
# )
# pretrain.data = dataloader
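If you do not yet have such a helper, one possible shape is sketched below using NeMo's PreTrainingDataModule; the corpus path is a placeholder and the exact arguments depend on how your data is tokenized:

import nemo_run as run
from nemo.collections import llm

# Hypothetical helper: wraps PreTrainingDataModule in a run.Config so it can
# be assigned to the recipe's data field. The corpus path is a placeholder.
def configure_custom_dataset(global_batch_size, micro_batch_size, seq_length):
    return run.Config(
        llm.PreTrainingDataModule,
        paths=["/path/to/tokenized/corpus_text_document"],
        seq_length=seq_length,
        global_batch_size=global_batch_size,
        micro_batch_size=micro_batch_size,
    )

pretrain.data = configure_custom_dataset(
    global_batch_size=512,  # illustrative value
    micro_batch_size=1,     # illustrative value
    seq_length=pretrain.model.config.seq_length,
)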
NeMo 2.0 Fine-tuning Recipes#
Note
The fine-tuning recipes use the SquadDataModule for the data argument. You can replace the SquadDataModule with your custom dataset.
Warning
When using import_ckpt in NeMo 2.0, ensure your script includes an if __name__ == "__main__": guard. Without it, Python’s multiprocessing won’t initialize threads properly, causing a “Failure to acquire lock” error.
To import the Hugging Face (HF) model and convert it to NeMo 2.0 format, run the following (this only needs to be done once):
from nemo.collections import llm
llm.import_ckpt(model=llm.Qwen3Model(llm.Qwen3Config8B()), source='hf://Qwen/Qwen3-8B')
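In a standalone conversion script, the call is typically wrapped in the guard mentioned in the warning above, for example:

from nemo.collections import llm

if __name__ == "__main__":
    # Download the Hugging Face checkpoint and convert it to NeMo 2.0 format.
    llm.import_ckpt(
        model=llm.Qwen3Model(llm.Qwen3Config8B()),
        source='hf://Qwen/Qwen3-8B',
    )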
The following example shows how to invoke the default recipe and override the data argument:
from nemo.collections import llm
recipe = llm.qwen3_8b.finetune_recipe(
    name="qwen3_8b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme='lora',  # 'lora', 'none'
    packed_sequence=False,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader
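As with pretraining, a hedged sketch of such a helper is shown below, here built on NeMo's FineTuningDataModule; the dataset_root path is a placeholder for your prepared fine-tuning data:

import nemo_run as run
from nemo.collections import llm

# Hypothetical helper: wraps FineTuningDataModule in a run.Config.
# dataset_root is a placeholder pointing at your prepared SFT files.
def configure_custom_sft_dataset(gbs, mbs, seq_length):
    return run.Config(
        llm.FineTuningDataModule,
        dataset_root="/path/to/sft/dataset",
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )

recipe.data = configure_custom_sft_dataset(
    gbs=128,  # illustrative value
    mbs=1,    # illustrative value
    seq_length=recipe.model.config.seq_length,
)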
By default, the fine-tuning recipe runs LoRA fine-tuning with LoRA applied to all linear layers in the language model. To fine-tune the entire model without LoRA, set peft_scheme='none' in the recipe. To fine-tune with sequence packing for higher throughput, set packed_sequence=True. Note that you may need to tune the global batch size to achieve similar convergence.
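Combining the two options, a full-parameter fine-tuning run with sequence packing could be configured as follows (the run name and values below are illustrative):

full_sft_recipe = llm.qwen3_8b.finetune_recipe(
    name="qwen3_8b_full_sft_packed",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme='none',    # full-parameter fine-tuning, no LoRA adapters
    packed_sequence=True,  # pack multiple short samples into each sequence
)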
Note
The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
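Because each recipe is a run.Partial, its nested fields can be adjusted attribute-style before execution; the attribute paths below are illustrative and may differ between recipes and NeMo versions:

# Illustrative overrides on the configured Partial; adjust to your recipe.
pretrain.trainer.max_steps = 100
pretrain.trainer.val_check_interval = 50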
Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which simply runs the configuration locally in a separate process. You can use it as follows:
import nemo_run as run
run.run(pretrain, executor=run.LocalExecutor())
Alternatively, you can run it directly in the same Python process as follows:
run.run(pretrain, direct=True)
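For multi-GPU runs on a single machine, the local executor is commonly paired with the torchrun launcher; a minimal sketch, assuming eight local GPUs:

import nemo_run as run

# Launch one task per GPU on the local machine via torchrun.
executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
run.run(pretrain, executor=executor)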