Qwen3-VL#
Qwen3-VL is the latest generation of vision-language models from Alibaba Cloud, supporting multimodal understanding across text, images, and videos. Qwen3-VL includes both dense models and Mixture-of-Experts (MoE) variants for improved efficiency.
NeMo Megatron Bridge supports finetuning Qwen3-VL models (8B dense and 30B MoE variants).
Tip
We use the following environment variables throughout this page:

- HF_MODEL_PATH=Qwen/Qwen3-VL-8B-Instruct (or Qwen/Qwen3-VL-30B-A3B-Instruct for MoE)
- MEGATRON_MODEL_PATH=/models/Qwen3-VL-8B-Instruct (feel free to set your own path)

Unless explicitly stated, any Megatron model path in the commands below should NOT contain the iteration number iter_xxxxxx. For more details on checkpointing, please see here.
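For example, you can set them in your shell before running any of the commands below (the Megatron path is illustrative; choose your own):

# Use Qwen/Qwen3-VL-30B-A3B-Instruct instead for the MoE variant
export HF_MODEL_PATH=Qwen/Qwen3-VL-8B-Instruct
export MEGATRON_MODEL_PATH=/models/Qwen3-VL-8B-Instruct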
Conversion with 🤗 Hugging Face#
Import HF → Megatron#
To import the HF model to your desired $MEGATRON_MODEL_PATH, run the following command.
python examples/conversion/convert_checkpoints.py import \
--hf-model $HF_MODEL_PATH \
--megatron-path $MEGATRON_MODEL_PATH
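The import writes the Megatron checkpoint under $MEGATRON_MODEL_PATH; the weights typically land in an iteration subdirectory such as iter_0000000 (the exact name may vary). As the tip above notes, later commands take the parent directory, not the iteration subdirectory. A quick way to confirm the layout:

# Expect an iteration subdirectory (e.g. iter_0000000) inside the parent path
ls $MEGATRON_MODEL_PATH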
Export Megatron → HF#
You can export a trained model with the following command.
python examples/conversion/convert_checkpoints.py export \
--hf-model $HF_MODEL_PATH \
--megatron-path <trained megatron model path> \
--hf-path <output hf model path>
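For example, to export a finetuned run saved via the recipes below (both paths are illustrative):

python examples/conversion/convert_checkpoints.py export \
--hf-model $HF_MODEL_PATH \
--megatron-path $SAVE_DIR/<experiment name> \
--hf-path /models/Qwen3-VL-8B-Instruct-finetuned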
Run In-Framework Inference on Converted Checkpoint#
You can run a quick sanity check on the converted checkpoint with the following command.
python examples/conversion/hf_to_megatron_generate_vlm.py \
--hf_model_path $HF_MODEL_PATH \
--megatron_model_path $MEGATRON_MODEL_PATH \
--image_path <example image path> \
--prompt "Describe this image." \
--max_new_tokens 100
Finetuning Recipes#
Before training, ensure the following environment variables are set:
- SAVE_DIR: to specify a checkpoint and log saving directory
- HF_TOKEN: to download models from HF Hub (if required)
- HF_HOME: (optional) to avoid re-downloading models and datasets
- WANDB_API_KEY: (optional) to enable WandB logging
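For example (the values shown are illustrative; the tokens are your own credentials):

export SAVE_DIR=/results/qwen3_vl          # checkpoints and logs go here
export HF_TOKEN=<your HF token>            # only needed for gated/private models
export HF_HOME=/cache/huggingface          # optional: reuse a shared cache
export WANDB_API_KEY=<your WandB API key>  # optional: enables WandB logging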
Full Finetuning#
Example usage for full parameter finetuning:
torchrun --nproc-per-node=8 examples/recipes/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_8b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=<batch size> \
train.train_iters=<number of iterations> \
logger.wandb_project=<optional wandb project name> \
logger.wandb_save_dir=$SAVE_DIR \
checkpoint.save=$SAVE_DIR/<experiment name>
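A concrete invocation might look like this (the batch size, iteration count, and names are illustrative, not tuned recommendations):

torchrun --nproc-per-node=8 examples/recipes/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_8b_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=32 \
train.train_iters=1000 \
logger.wandb_project=qwen3-vl-finetune \
logger.wandb_save_dir=$SAVE_DIR \
checkpoint.save=$SAVE_DIR/qwen3-vl-8b-cord-v2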
For MoE models with expert parallelism:
torchrun --nproc-per-node=8 examples/recipes/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_3b_active_30b_moe_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
train.global_batch_size=<batch size> \
train.train_iters=<number of iterations> \
checkpoint.save=$SAVE_DIR/<experiment name>
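The MoE recipe sets its parallelism internally. If you want to adjust the expert-parallel degree yourself, the recipe config should expose an override for it; the dotted key below (model.expert_model_parallel_size) is an assumption based on Megatron-Core naming, so verify the exact name in the recipe file:

torchrun --nproc-per-node=8 examples/recipes/qwen_vl/finetune_qwen_vl.py \
--pretrained-checkpoint $MEGATRON_MODEL_PATH \
--recipe qwen3_vl_3b_active_30b_moe_finetune_config \
--dataset-type hf \
dataset.maker_name=make_cord_v2_dataset \
model.expert_model_parallel_size=4 \
checkpoint.save=$SAVE_DIR/<experiment name>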
Note:

The --recipe parameter selects the model configuration:

- qwen3_vl_8b_finetune_config - for the 8B dense model
- qwen3_vl_3b_active_30b_moe_finetune_config - for the 30B MoE model
For dataset formats and additional information, refer to the Qwen2.5-VL documentation.

See the full script with examples at examples/recipes/qwen_vl/finetune_qwen_vl.py.
Hugging Face Model Cards#
- Qwen3-VL-8B: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
- Qwen3-VL-30B-A3B (MoE): https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct