Vision Language Models

NeMo 2.0 supports training Large Vision Language Models (VLMs) and uses NeMo-Run to scale training to thousands of GPUs. The following VLMs are currently supported in NeMo 2.0:

Default configurations are provided for each model and are outlined in the model-specific documentation linked above. Each configuration can be modified to train on new datasets or to test new model hyperparameters.
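
As a rough illustration of modifying a default configuration, the sketch below starts from a set of defaults and overrides only the fields that change. It uses plain Python dataclasses rather than the NeMo API: `VLMTrainConfig`, `default_config`, and every field name and value are hypothetical placeholders, so refer to the model-specific documentation for the actual configuration fields.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for a model's default configuration. The field names
# and values are illustrative assumptions, not NeMo's actual recipe schema.
@dataclass(frozen=True)
class VLMTrainConfig:
    dataset_path: str = "/data/default_dataset"
    learning_rate: float = 2e-5
    global_batch_size: int = 128
    num_nodes: int = 1
    gpus_per_node: int = 8


def default_config() -> VLMTrainConfig:
    """Return an unmodified copy of the (hypothetical) default configuration."""
    return VLMTrainConfig()


# Start from the defaults, then override only the fields that change:
# a new dataset location and a different learning rate.
custom = replace(
    default_config(),
    dataset_path="/data/my_new_dataset",
    learning_rate=1e-5,
)

print(custom)
```

Keeping the defaults immutable and deriving a modified copy makes it easy to see which settings differ from the provided configuration when comparing experiments.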