Qwen3-Omni
Qwen3-Omni is Alibaba Cloud’s omnimodal model. It accepts text, image, audio, and video inputs in a single unified architecture built on a Mixture-of-Experts (MoE) language backbone.
Available Models
- Qwen3-Omni-30B-A3B: 30B total parameters, 3B activated per token (MoE)
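To make the "30B total, 3B activated" figures concrete, the sketch below works out what share of the parameters is active for each token and gives a rough weight-only memory estimate. It is illustrative arithmetic based on the rounded counts in the model name. The helper names and the 2-bytes-per-parameter (bf16) assumption are hypothetical and are not part of NeMo AutoModel.

```python
# Illustrative arithmetic only. The counts are the rounded figures from the
# model name (30B total, 3B activated), not exact checkpoint sizes.
TOTAL_PARAMS_B = 30.0   # total parameters, in billions
ACTIVE_PARAMS_B = 3.0   # parameters activated per token, in billions


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters used for each token."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected total_b > 0 and 0 <= active_b <= total_b")
    return active_b / total_b


def weight_memory_gib(params_b: float, bytes_per_param: int = 2) -> float:
    """Rough weight-only memory in GiB; bytes_per_param=2 assumes bf16."""
    return params_b * 1e9 * bytes_per_param / 2**30


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"active fraction: {fraction:.0%}")  # prints "active fraction: 10%"
print(f"approx. bf16 weight memory: {weight_memory_gib(TOTAL_PARAMS_B):.1f} GiB")
```

Only about a tenth of the parameters take part in computing each token, but every expert still has to be resident in memory. The memory estimate therefore uses the total count, not the activated count, and excludes activations, optimizer state, and any non-language components.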
Architecture
Qwen3OmniForConditionalGeneration
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and Omni Fine-Tuning Guide.
Fine-Tuning
See the VLM / Omni Fine-Tuning Guide.