NeMo multimodal provides implementations of multiple text-to-image models, including Stable Diffusion, Imagen, DreamBooth, ControlNet, and InstructPix2Pix. Please refer to the NeMo Framework User Guide for Multimodal Models for detailed support information.
Text to Image Models