Qwen2.5-VL
Qwen2.5-VL
Qwen2.5-VL is Alibaba Cloud’s vision language model series supporting image and video understanding. It features dynamic resolution processing and integrates with the Qwen2.5 language backbone.
Available Models
- Qwen2.5-VL-72B-Instruct
- Qwen2.5-VL-32B-Instruct
- Qwen2.5-VL-7B-Instruct
- Qwen2.5-VL-3B-Instruct
- Qwen2-VL-7B-Instruct, Qwen2-VL-2B-Instruct (Qwen2 VL)
Architectures
Qwen2_5VLForConditionalGeneration— Qwen2.5-VLQwen2VLForConditionalGeneration— Qwen2-VL
Example HF Models
Example Recipes
Try with NeMo AutoModel
1. Install (full instructions):
2. Clone the repo to get the example recipes:
3. Run the recipe from inside the repo:
Run with Docker
1. Pull the container and mount a checkpoint directory:
2. Navigate to the AutoModel directory (where the recipes are):
3. Run the recipe:
See the Installation Guide and VLM Fine-Tuning Guide.
Fine-Tuning
See the VLM Fine-Tuning Guide.