Qwen 3.5#
Qwen3.5 is a family of vision-language models supporting multimodal understanding across text, images, and videos. Qwen3.5-VL includes both dense models and Mixture-of-Experts (MoE) variants for improved efficiency at scale.
Qwen 3.5 models feature a hybrid architecture combining GDN (Gated DeltaNet) layers with standard attention layers, SwiGLU activations, and RMSNorm. MoE variants use top-k routing with shared experts for better quality.
Qwen 3.5 models are supported via Megatron Bridge with auto-detected configuration and weight mapping.
Important
Please upgrade to transformers >= 5.2.0 in order to use the Qwen 3.5 models.
Available Models#
Dense Models#
Qwen3.5 0.8B (
Qwen/Qwen3.5-0.8B): 0.8B parameter vision-language modelRecommended: 1 node, 8 GPUs
Qwen3.5 2B (
Qwen/Qwen3.5-2B): 2B parameter vision-language modelRecommended: 1 node, 8 GPUs
Qwen3.5 4B (
Qwen/Qwen3.5-4B): 4B parameter vision-language modelRecommended: 1 node, 8 GPUs
Qwen3.5 9B (
Qwen/Qwen3.5-9B): 9B parameter vision-language modelRecommended: 1 node, 8 GPUs
Qwen3.5 27B (
Qwen/Qwen3.5-27B): 27B parameter vision-language modelRecommended: 2 nodes, 16 GPUs
Mixture-of-Experts (MoE) Models#
Qwen3.5 35B-A3B (
Qwen/Qwen3.5-35B-A3B): 35B total parameters, 3B activated per tokenRecommended: 2 nodes, 16 GPUs
Qwen3.5 122B-A10B (
Qwen/Qwen3.5-122B-A10B): 122B total parameters, 10B activated per tokenRecommended: 4 nodes, 32 GPUs
Qwen3.5 397B-A17B (
Qwen/Qwen3.5-397B-A17B): 397B total parameters, 17B activated per token512 experts with top-10 routing and shared experts
Recommended: 16 nodes, 128 GPUs
Examples#
For checkpoint conversion, inference, finetuning recipes, and step-by-step training guides, see the Qwen 3.5 Examples.
Hugging Face Model Cards#
Qwen3.5 0.8B: https://huggingface.co/Qwen/Qwen3.5-0.8B
Qwen3.5 2B: https://huggingface.co/Qwen/Qwen3.5-2B
Qwen3.5 4B: https://huggingface.co/Qwen/Qwen3.5-4B
Qwen3.5 9B: https://huggingface.co/Qwen/Qwen3.5-9B
Qwen3.5 27B: https://huggingface.co/Qwen/Qwen3.5-27B
Qwen3.5 35B-A3B (MoE): https://huggingface.co/Qwen/Qwen3.5-35B-A3B
Qwen3.5 122B-A10B (MoE): https://huggingface.co/Qwen/Qwen3.5-122B-A10B
Qwen3.5 397B-A17B (MoE): https://huggingface.co/Qwen/Qwen3.5-397B-A17B