Multimodal API

class nemo.collections.nlp.models.language_modeling.megatron_base_model.MegatronBaseModel(*args: Any, **kwargs: Any)

Bases: nemo.collections.nlp.models.nlp_model.NLPModel

Megatron base class. All NeMo Megatron models inherit from this class.

  • Initialize the model parallel world for NeMo.

  • Turn on all of the NVIDIA optimizations.

  • If cfg.tokenizer is available, it loads the tokenizer and pads the vocab to the correct size for tensor model parallelism.

  • If using the distributed optimizer, configure it to be compatible with O2-level optimizations and/or model parallelism.

  • Perform gradient clipping: grad_clip_pl_default triggers the PyTorch Lightning default implementation, with_distributed_adam triggers the distributed optimizer’s implementation, megatron_amp_O2 triggers gradient clipping on the main grads, and otherwise gradient clipping is performed on the model grads.

__init__(cfg: omegaconf.dictconfig.DictConfig, trainer: pytorch_lightning.trainer.trainer.Trainer, no_lm_init=True)

Base class from which all NeMo models should inherit

Parameters
  • cfg (DictConfig) –

    configuration object. The cfg object should have (optionally) the following sub-configs:

    • train_ds - to instantiate training dataset

    • validation_ds - to instantiate validation dataset

    • test_ds - to instantiate testing dataset

    • optim - to instantiate optimizer with learning rate scheduler

  • trainer (Optional) – PyTorch Lightning Trainer instance
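
For illustration, a cfg of this shape can be built with OmegaConf. This is a layout sketch only: the keys inside each sub-config below are placeholder assumptions, since the exact schema depends on the concrete model and dataset.

    from omegaconf import OmegaConf

    # Layout sketch only; sub-config contents are illustrative placeholders.
    cfg = OmegaConf.create({
        "train_ds": {"batch_size": 4},        # hypothetical dataset options
        "validation_ds": {"batch_size": 4},
        "test_ds": {"batch_size": 4},
        "optim": {
            "name": "fused_adam",             # assumed optimizer name
            "lr": 1e-4,
            "sched": {"name": "CosineAnnealing"},  # assumed scheduler block
        },
    })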

class nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm.MegatronLatentDiffusion(*args: Any, **kwargs: Any)

Bases: nemo.collections.nlp.parts.mixins.nlp_adapter_mixins.NLPAdapterModelMixin, nemo.collections.nlp.models.language_modeling.megatron_base_model.MegatronBaseModel

Megatron LatentDiffusion Model.

__init__(cfg: omegaconf.DictConfig, trainer: pytorch_lightning.Trainer)

Base class from which all NeMo models should inherit

Parameters
  • cfg (DictConfig) –

    configuration object. The cfg object should have (optionally) the following sub-configs:

    • train_ds - to instantiate training dataset

    • validation_ds - to instantiate validation dataset

    • test_ds - to instantiate testing dataset

    • optim - to instantiate optimizer with learning rate scheduler

  • trainer (Optional) – PyTorch Lightning Trainer instance
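
A minimal training sketch, assuming a YAML config in the usual NeMo layout; the file name sd_train.yaml, its model: section, and the plain Lightning Trainer are placeholders (real recipes construct the trainer with NeMo's Megatron-aware strategy).

    import pytorch_lightning as pl
    from omegaconf import OmegaConf
    from nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.ddpm import (
        MegatronLatentDiffusion,
    )

    cfg = OmegaConf.load("sd_train.yaml")        # placeholder config file
    trainer = pl.Trainer(devices=1, accelerator="gpu", precision=16)  # simplified; see note above

    model = MegatronLatentDiffusion(cfg.model, trainer)  # assumes a model: section in the YAML
    trainer.fit(model)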

setup(stage=None)

PTL hook that is executed after DDP spawns.

We set up datasets here, as Megatron datasets require DDP to instantiate. See https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#setup for more information.

Parameters

stage (str, optional) – Can be ‘fit’, ‘validate’, ‘test’ or ‘predict’. Defaults to None.

training_step(batch)

Notice: training_step used to have the following signature to support pipeline parallelism:

def training_step(self, dataloader_iter, batch_idx):


However, the full-iteration CUDA Graph callback is not currently compatible with this signature, because we need to wrap the dataloader to generate static tensors outside the CUDA Graph. That signature moves next(dataloader) into the CUDA Graph capturing region, so it has been disabled.

Our dataloaders produce a micro-batch, and we then fetch a number of microbatches from the dataloader, depending on the global batch size and model parallel size, to produce a list of microbatches. The batch should be a list of microbatches, and those microbatches should be on CPU. Microbatches are then moved to GPU during the pipeline. The list of microbatches is then piped through the pipeline using the Apex fwd/bwd functions.
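
The number of microbatches drawn per step follows the usual Megatron bookkeeping; a small arithmetic sketch (all sizes below are illustrative):

    # Illustrative values only.
    global_batch_size = 64
    micro_batch_size = 4
    data_parallel_size = 2    # world_size // (tensor_parallel_size * pipeline_parallel_size)

    # Microbatches fetched from the dataloader for one training_step:
    num_microbatches = global_batch_size // (micro_batch_size * data_parallel_size)
    print(num_microbatches)   # 8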

class nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.openaimodel.UNetModel(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

The full UNet model with attention and timestep embedding.

Parameters
  • in_channels (int) – The number of channels in the input Tensor.

  • model_channels (int) – The base channel count for the model.

  • out_channels (int) – The number of channels in the output Tensor.

  • num_res_blocks (int) – The number of residual blocks per downsample.

  • attention_resolutions (set/list/tuple) – The downsampling rates at which attention is applied. For example, if this includes 4, attention is used at 4x downsampling.

  • dropout (float) – The dropout probability.

  • channel_mult (list/tuple) – A channel multiplier for each level of the UNet.

  • conv_resample (bool) – If True, use learned convolutions for upsampling and downsampling.

  • dims (int) – Determines if the signal is 1D, 2D, or 3D.

  • num_classes (int, optional) – If specified, the model becomes class-conditional with the given number of classes.

  • use_checkpoint (bool) – If True, use gradient checkpointing to reduce memory usage.

  • num_heads (int) – The number of attention heads in each attention layer.

  • num_heads_channels (int, optional) – If specified, overrides num_heads and uses a fixed channel width per attention head.

  • num_heads_upsample (int, optional) – Sets a different number of heads for upsampling. Deprecated.

  • use_scale_shift_norm (bool) – If True, use a FiLM-like conditioning mechanism.

  • resblock_updown (bool) – If True, use residual blocks for up/downsampling.

  • use_new_attention_order (bool) – If True, use a different attention pattern for potentially increased efficiency.
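
A construction sketch with Stable Diffusion-like hyperparameters; the values are illustrative, and the real constructor may accept additional conditioning-related keyword arguments not documented here.

    from nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.openaimodel import UNetModel

    # Illustrative values; assumes only the documented parameters are required.
    unet = UNetModel(
        in_channels=4,                    # latent channels from the VAE
        model_channels=320,
        out_channels=4,
        num_res_blocks=2,
        attention_resolutions=(4, 2, 1),
        dropout=0.0,
        channel_mult=(1, 2, 4, 4),
        num_heads=8,
        use_checkpoint=False,
    )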

class nemo.collections.multimodal.modules.imagen.diffusionmodules.nets.UNetModel(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

The full UNet model with attention and timestep embedding used for Imagen Base and SR model.

Parameters
  • embed_dim – Dimension of embeddings. Also used to calculate the number of channels in ResBlock.

  • image_size – Input image size. Used to calculate where to inject attention layers in UNet.

  • channels – Input channel number, defaults to 3.

  • text_embed_dim – Dimension of conditioned text embedding. Different text encoders and different model versions have different values; defaults to 512.

  • num_res_blocks – Number of ResBlock in each level of UNet, defaults to 3.

  • channel_mult – Used with embed_dim to calculate the number of channels for each level of UNet, defaults to [1, 2, 3, 4]

  • num_attn_heads – The number of heads in the attention layer, defaults to 4.

  • per_head_channels – The number of channels per attention head, defaults to 64.

  • cond_dim – Dimension of Conditioning projections, defaults to 512.

  • attention_type – Type of attention layer, defaults to ‘fused’.

  • feature_pooling_type – Type of pooling, defaults to ‘attention’.

  • learned_sinu_pos_emb_dim – Dimension of learned time positional embedding. 0 for unlearned timestep embeddings. Defaults to 16

  • attention_resolutions – List of resolutions to inject attention layers. Defaults to [8, 16, 32]

  • dropout – The rate of dropout, defaults to 0.

  • use_null_token – Whether to create a learned null token for attention, defaults to False.

  • init_conv_kernel_size – Initial Conv kernel size, defaults to 3.

  • gradient_checkpointing – Whether to use gradient checkpointing, defaults to False.

  • scale_shift_norm – Whether to use scale shift norm, defaults to False.

  • stable_attention – Whether to use numerically-stable attention calculation, defaults to True.

  • flash_attention – Whether to use flash attention calculation, defaults to False.

  • resblock_updown – Whether to use ResBlock or Downsample/Upsample, defaults to False.

  • resample_with_conv – When resblock_updown=False, whether to use conv in addition to Pooling&ConvTranspose. Defaults to True.

  • low_res_cond – Whether conditioned on low-resolution input, used for SR model. Defaults to False.

  • noise_cond_aug – Whether to add noise conditioned augmentation with low-resolution input. Defaults to False.
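
A construction sketch for an Imagen base-resolution UNet, restricted to the documented parameters with their stated defaults; the embed_dim and image_size values are illustrative assumptions.

    from nemo.collections.multimodal.modules.imagen.diffusionmodules.nets import UNetModel

    # Illustrative configuration; assumes the documented keywords are accepted as-is.
    base_unet = UNetModel(
        embed_dim=256,
        image_size=64,
        channels=3,
        text_embed_dim=512,
        num_res_blocks=3,
        channel_mult=[1, 2, 3, 4],
        num_attn_heads=4,
        attention_resolutions=[8, 16, 32],
        flash_attention=False,
    )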

class nemo.collections.multimodal.modules.imagen.diffusionmodules.nets.EfficientUNetModel(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

The full Efficient UNet model with attention and timestep embedding used for Imagen SR model.

Parameters
  • embed_dim – Dimension of embeddings. Also used to calculate the number of channels in ResBlock.

  • image_size – Input image size. Used to calculate where to inject attention layers in UNet.

  • channels – Input channel number, defaults to 3.

  • text_embed_dim – Dimension of conditioned text embedding. Different text encoders and different model versions have different values; defaults to 512.

  • channel_mult – Used with embed_dim to calculate the number of channels for each level of UNet, defaults to [1, 1, 2, 4, 8].

  • num_attn_heads – The number of heads in the attention layer, defaults to 8.

  • per_head_channels – The number of channels per attention head, defaults to 64.

  • attention_type – Type of attention layer, defaults to ‘fused’.

  • atnn_enabled_at – Whether to enable attention at each level, defaults to [0, 0, 0, 0, 1].

  • feature_pooling_type – Type of pooling, defaults to ‘attention’.

  • stride – Stride in ResBlock, defaults to 2.

  • num_resblocks – The number of residual blocks at each level of the Efficient-UNet. Defaults to [1, 2, 4, 8, 8].

  • learned_sinu_pos_emb_dim – Dimension of learned time positional embedding. 0 for unlearned timestep embeddings. Defaults to 16

  • use_null_token – Whether to create a learned null token for attention, defaults to False.

  • init_conv_kernel_size – Initial Conv kernel size, defaults to 3.

  • gradient_checkpointing – Whether to use gradient checkpointing, defaults to False.

  • scale_shift_norm – Whether to use scale shift norm, defaults to False.

  • stable_attention – Whether to use numerically-stable attention calculation, defaults to True.

  • flash_attention – Whether to use flash attention calculation, defaults to False.

  • skip_connection_scaling – Whether to use 1/sqrt(2) scaling for ResBlock skip connection, defaults to False.

  • noise_cond_aug – Whether to add noise conditioned augmentation with low-resolution input. Defaults to False.
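
Similarly, a sketch for the SR-stage Efficient UNet, again using only the documented parameters; the embed_dim and image_size values are illustrative assumptions.

    from nemo.collections.multimodal.modules.imagen.diffusionmodules.nets import EfficientUNetModel

    # Illustrative configuration; assumes the documented keywords are accepted as-is.
    sr_unet = EfficientUNetModel(
        embed_dim=128,
        image_size=256,
        channel_mult=[1, 1, 2, 4, 8],
        num_resblocks=[1, 2, 4, 8, 8],
        atnn_enabled_at=[0, 0, 0, 0, 1],
        noise_cond_aug=True,
    )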

class nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKL(*args: Any, **kwargs: Any)

Bases: pytorch_lightning.LightningModule

__init__(ddconfig, embed_dim, lossconfig=None, ckpt_path=None, ignore_keys=[], image_key='image', colorize_nlabels=None, monitor=None, from_pretrained: Optional[str] = None)

decode(z)

Decode latent representation back to pixel space.

encode(x)

Encode input image in pixel space to latent representation.
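
A hypothetical round trip through the autoencoder; the ddconfig layout, the batch shape, and the .sample() call on the returned posterior are assumptions based on the usual latent-diffusion VAE interface, not a documented recipe.

    import torch
    from nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder import AutoencoderKL

    # ddconfig follows the usual latent-diffusion VAE layout; all values are illustrative.
    ddconfig = {
        "double_z": True, "z_channels": 4, "resolution": 256,
        "in_channels": 3, "out_ch": 3, "ch": 128, "ch_mult": [1, 2, 4, 4],
        "num_res_blocks": 2, "attn_resolutions": [], "dropout": 0.0,
    }
    vae = AutoencoderKL(ddconfig=ddconfig, embed_dim=4)

    images = torch.randn(2, 3, 256, 256)   # pixel-space batch (illustrative shape)
    posterior = vae.encode(images)          # latent posterior (assumed interface)
    latents = posterior.sample()            # draw a latent sample
    recon = vae.decode(latents)             # back to pixel space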

class nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.FrozenMegatronCLIPEmbedder(*args: Any, **kwargs: Any)

Bases: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.AbstractEmbModel

__init__(restore_from_path, device='cuda', layer='last', freeze=True, cfg=None, always_return_pooled=False, enable_lora_finetune=False)

forward(text)

Get embeddings from input text
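
A usage sketch; the checkpoint path is a placeholder, and the forward call is assumed to accept a list of prompt strings.

    from nemo.collections.multimodal.modules.stable_diffusion.encoders.modules import FrozenMegatronCLIPEmbedder

    # "megatron_clip.nemo" is a placeholder for a real CLIP checkpoint path.
    embedder = FrozenMegatronCLIPEmbedder(
        restore_from_path="megatron_clip.nemo",
        device="cuda",
        layer="last",
        freeze=True,
    )
    text_emb = embedder(["a photo of an astronaut riding a horse"])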

class nemo.collections.multimodal.modules.imagen.encoder.t5encoder.T5Encoder(*args: Any, **kwargs: Any)

Bases: torch.nn.Module

__init__(max_seq_len=512, encoder_path=None)

Initialize the T5 Encoder.

Parameters
  • max_seq_len – Maximum token length, defaults to 512

  • encoder_path – Optional path to a T5 encoder already stored on disk; defaults to None.

encode(text_batch, device='cuda')

Encode a batch of text to T5 embeddings.
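
A usage sketch; leaving encoder_path as None assumes the T5 weights are fetched rather than loaded from disk, and the exact structure of the returned value is not documented here.

    from nemo.collections.multimodal.modules.imagen.encoder.t5encoder import T5Encoder

    t5 = T5Encoder(max_seq_len=512)                                      # weights fetched, not loaded from disk
    encoded = t5.encode(["a corgi wearing sunglasses"], device="cuda")   # T5 embeddings for the batch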

class nemo.collections.multimodal.data.common.webdataset.WebDatasetCommon(*args: Any, **kwargs: Any)

Bases: nemo.core.classes.dataset.IterableDataset

A common dataset object shared by most NeMo multimodal models.

class nemo.collections.multimodal.data.dreambooth.dreambooth_dataset.DreamBoothDataset(*args: Any, **kwargs: Any)

Bases: torch.utils.data.Dataset

A dataset to prepare the instance and class images with the prompts for fine-tuning the model. It pre-processes the images and tokenizes the prompts.

Parameters
  • instance_data_root – required; a directory with image files of the object

  • instance_prompt – captions with special token associated with instance images

  • with_prior_preservation – whether to regularize the model finetuning with the original inference output from the backbone

  • reg_data_root – a directory to save inference images from the backbone

  • reg_prompt – prompt used to generate regularization images

  • size – the size to which images are resized in the training data pipeline

  • center_crop – whether to perform center cropping on input images

  • load_cache_latents – when set to True, images will be converted to cached latents which will be directly loaded for training

  • vae – VAE instance to encode images from pixel space to latent space
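
An instantiation sketch restricted to the documented parameters; the directory paths and prompt wording are placeholders.

    from nemo.collections.multimodal.data.dreambooth.dreambooth_dataset import DreamBoothDataset

    dataset = DreamBoothDataset(
        instance_data_root="/data/instance_images",    # placeholder path
        instance_prompt="a photo of sks dog",          # placeholder prompt with special token
        with_prior_preservation=True,
        reg_data_root="/data/regularization_images",   # placeholder path
        reg_prompt="a photo of a dog",
        size=512,
        center_crop=True,
        load_cache_latents=False,
        vae=None,   # a VAE instance is needed when caching latents
    )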
