Multimodal Language Models


Extending Large Language Models (LLMs) into multimodal domains by integrating additional components, such as visual encoders, has become a focal point of recent research, largely because it is significantly cheaper than training universal multimodal models from scratch. Please refer to the NeMo Framework User Guide for Multimodal Models for detailed support information.
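To make the idea concrete, the sketch below shows one common way a visual encoder is attached to an existing LLM: patch features from a vision backbone are projected into the LLM's token-embedding space and concatenated with the text embeddings. This is a generic, illustrative example; the class and parameter names (`VisionLanguageAdapter`, `build_multimodal_inputs`, the hidden sizes) are assumptions for illustration and are not part of the NeMo API.

```python
# Minimal conceptual sketch: project visual features into the LLM embedding
# space and prepend them to the text sequence. Names and sizes are illustrative.
import torch
import torch.nn as nn


class VisionLanguageAdapter(nn.Module):
    """Projects visual features into the LLM's token-embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.proj(image_features)  # (batch, num_patches, llm_dim)


def build_multimodal_inputs(image_features, text_token_ids, adapter, llm_embeddings):
    """Concatenate projected image tokens with text token embeddings.

    The combined sequence is then fed to the (often frozen) LLM decoder, so
    only the adapter and/or the visual encoder need to be trained.
    """
    image_embeds = adapter(image_features)            # (B, P, llm_dim)
    text_embeds = llm_embeddings(text_token_ids)      # (B, T, llm_dim)
    return torch.cat([image_embeds, text_embeds], dim=1)  # (B, P + T, llm_dim)


if __name__ == "__main__":
    vision_dim, llm_dim, vocab_size = 1024, 4096, 32000
    adapter = VisionLanguageAdapter(vision_dim, llm_dim)
    llm_embeddings = nn.Embedding(vocab_size, llm_dim)

    image_features = torch.randn(2, 256, vision_dim)        # e.g. ViT patch features
    text_token_ids = torch.randint(0, vocab_size, (2, 16))  # tokenized prompt
    inputs = build_multimodal_inputs(image_features, text_token_ids, adapter, llm_embeddings)
    print(inputs.shape)  # torch.Size([2, 272, 4096])
```

Because only the small adapter (and optionally the encoder) is trained while the LLM weights stay fixed, this style of integration is what keeps the cost far below pretraining a multimodal model end to end.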
