NeMo NLP collection API

Model Classes

Modules

class nemo.collections.nlp.modules.BertModule(*args: Any, **kwargs: Any)[source]

Bases: nemo.core.classes.module.NeuralModule, nemo.core.classes.exportable.Exportable

input_example()[source]

Generates input examples for tracing etc.

Returns

A tuple of input examples.

property input_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable input neural type checks

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks

restore_weights(restore_path: str)[source]

Restores module/model’s weights

class nemo.collections.nlp.modules.common.megatron.MegatronBertEncoder(*args: Any, **kwargs: Any)[source]

Bases: nemo.collections.nlp.modules.common.bert_module.BertModule

MegatronBERT wraps around the Megatron Language model from https://github.com/NVIDIA/Megatron-LM

Parameters
  • config_file (str) – path to model configuration file.

  • vocab_file (str) – path to vocabulary file.

  • tokenizer_type (str) – tokenizer type; currently only ‘BertWordPieceLowerCase’ is supported.

forward(input_ids, attention_mask, token_type_ids=None)[source]
property hidden_size

Property returning hidden size.

Returns

Hidden size.

restore_weights(restore_path: str)[source]
Restores module/model’s weights.

For model parallel checkpoints the directory structure should be restore_path/mp_rank_0X/model_optim_rng.pt

Parameters

restore_path (str) – restore_path should be a path to a file, or a directory if using model parallel

property vocab_size

Property returning vocab size.

Returns

vocab size.
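
A hedged sketch of restoring Megatron-LM weights into an encoder obtained through get_lm_model() (documented below). The model name, vocabulary file, and checkpoint path are placeholders, not values guaranteed to exist on your system:

    from nemo.collections.nlp.modules import get_lm_model

    # "megatron-bert-cased" and the file paths below are illustrative placeholders.
    megatron_bert = get_lm_model(
        pretrained_model_name="megatron-bert-cased",
        vocab_file="vocab.txt",
    )

    # For model parallel checkpoints, pass a directory laid out as
    # <restore_path>/mp_rank_0X/model_optim_rng.pt; otherwise pass a checkpoint file.
    megatron_bert.restore_weights("megatron_checkpoints")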

class nemo.collections.nlp.modules.AlbertEncoder(*args: Any, **kwargs: Any)[source]

Bases: transformers.AlbertModel, nemo.collections.nlp.modules.common.bert_module.BertModule

Wraps around the HuggingFace transformers implementation for easy use within NeMo.

forward(input_ids, attention_mask, token_type_ids)[source]
class nemo.collections.nlp.modules.BertEncoder(*args: Any, **kwargs: Any)[source]

Bases: transformers.BertModel, nemo.collections.nlp.modules.common.bert_module.BertModule

Wraps around the HuggingFace transformers implementation for easy use within NeMo.

forward(input_ids, attention_mask=None, token_type_ids=None)[source]
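
As an illustration, a hedged sketch of a forward pass through a BertEncoder obtained from get_lm_model() (documented below). Tensor shapes follow the [Batch, Time] convention used elsewhere on this page; the token id values are arbitrary and the exact output shape is an assumption:

    import torch

    from nemo.collections.nlp.modules import get_lm_model

    encoder = get_lm_model(pretrained_model_name="bert-base-uncased")

    # Dummy input of shape [Batch, Time]; ids are arbitrary and only for shape checking.
    input_ids = torch.randint(0, 100, (2, 16))
    attention_mask = torch.ones_like(input_ids)
    token_type_ids = torch.zeros_like(input_ids)

    hidden_states = encoder(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
    )
    print(hidden_states.shape)  # expected [2, 16, hidden_size]
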
class nemo.collections.nlp.modules.DistilBertEncoder(*args: Any, **kwargs: Any)[source]

Bases: transformers.DistilBertModel, nemo.collections.nlp.modules.common.bert_module.BertModule

Wraps around the HuggingFace transformers implementation for easy use within NeMo.

forward(input_ids, attention_mask, token_type_ids=None)[source]
class nemo.collections.nlp.modules.RobertaEncoder(*args: Any, **kwargs: Any)[source]

Bases: transformers.RobertaModel, nemo.collections.nlp.modules.common.bert_module.BertModule

Wraps around the HuggingFace transformers implementation for easy use within NeMo.

forward(input_ids, token_type_ids, attention_mask)[source]
class nemo.collections.nlp.modules.SequenceClassifier(*args: Any, **kwargs: Any)[source]

Bases: nemo.collections.nlp.modules.common.classifier.Classifier

forward(hidden_states)[source]
property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks
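
A hedged usage sketch for SequenceClassifier. Only forward(hidden_states) is documented on this page, so the constructor arguments hidden_size and num_classes below are assumptions based on common usage:

    import torch

    from nemo.collections.nlp.modules import SequenceClassifier

    # hidden_size and num_classes are assumed constructor arguments (not documented above).
    classifier = SequenceClassifier(hidden_size=768, num_classes=3)

    hidden_states = torch.randn(4, 16, 768)  # [Batch, Time, hidden_size], e.g. BERT output
    logits = classifier(hidden_states=hidden_states)
    print(logits.shape)  # expected [4, 3], one logit vector per sequence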

class nemo.collections.nlp.modules.SequenceRegression(*args: Any, **kwargs: Any)[source]

Bases: nemo.collections.nlp.modules.common.classifier.Classifier

Parameters
  • hidden_size – the hidden size of the mlp head on the top of the encoder

  • num_layers – number of the linear layers of the mlp head on the top of the encoder

  • activation – type of activations between layers of the mlp head

  • dropout – the dropout used for the mlp head

  • use_transformer_init – initializes the weights with the same approach used in Transformer

  • idx_conditioned_on – index of the token to use as the sequence representation for the classification task, default is the first token

forward(hidden_states: torch.Tensor) torch.Tensor[source]

Forward pass through the module.

Parameters

hidden_states – hidden states for each token in a sequence, for example, BERT module output

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks
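
A hedged sketch of using SequenceRegression with the parameters documented above; the concrete values passed below (and the exact output shape) are assumptions:

    import torch

    from nemo.collections.nlp.modules import SequenceRegression

    # hidden_size must match the encoder; the other arguments mirror the documented parameters.
    regressor = SequenceRegression(
        hidden_size=768,
        num_layers=2,
        activation="relu",
        dropout=0.1,
        idx_conditioned_on=0,  # condition on the first token ([CLS] for BERT)
    )

    hidden_states = torch.randn(4, 16, 768)  # [Batch, Time, hidden_size], e.g. BERT output
    preds = regressor(hidden_states=hidden_states)
    print(preds.shape)  # expected [4], one regression value per sequence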

class nemo.collections.nlp.modules.SequenceTokenClassifier(*args: Any, **kwargs: Any)[source]

Bases: nemo.collections.nlp.modules.common.classifier.Classifier

forward(hidden_states)[source]
property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks

nemo.collections.nlp.modules.get_lm_model(pretrained_model_name: str, config_dict: Optional[dict] = None, config_file: Optional[str] = None, checkpoint_file: Optional[str] = None, vocab_file: Optional[str] = None) nemo.collections.nlp.modules.common.bert_module.BertModule[source]

Helper function to instantiate a language model encoder, either from scratch or from a pretrained checkpoint. If only pretrained_model_name is passed, a pretrained model is returned. If a configuration is passed, whether as a file or a dictionary, the model is initialized with random weights.

Parameters
  • pretrained_model_name – pretrained model name, for example, bert-base-uncased or megatron-bert-cased. See get_pretrained_lm_models_list() for full list.

  • config_dict – the model configuration as a dictionary

  • config_file – path to the model configuration file

  • checkpoint_file – path to the pretrained model checkpoint

  • vocab_file – path to the vocabulary file to be used with Megatron-LM

Returns

Pretrained BertModule
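
A minimal usage sketch combining get_lm_model() with get_pretrained_lm_models_list(); the model name is one of the examples mentioned above:

    from nemo.collections.nlp.modules import get_lm_model, get_pretrained_lm_models_list

    # Inspect supported pretrained model names (pass include_external=True to list all
    # HuggingFace names, not only the language models supported in NeMo).
    print(get_pretrained_lm_models_list()[:5])

    # Instantiate a pretrained encoder; the returned object is a BertModule subclass.
    bert = get_lm_model(pretrained_model_name="bert-base-uncased")
    print(type(bert).__name__)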

nemo.collections.nlp.modules.get_pretrained_lm_models_list(include_external: bool = False) List[str][source]

Returns the list of supported pretrained model names

Parameters
  • include_external (bool) – if True, returns all HuggingFace model names, not only the language models supported in NeMo.

nemo.collections.nlp.modules.common.megatron.get_megatron_lm_models_list() List[str][source]

Returns the list of supported Megatron-LM models

Datasets

class nemo.collections.nlp.data.token_classification.punctuation_capitalization_dataset.BertPunctuationCapitalizationDataset(*args: Any, **kwargs: Any)[source]

Bases: nemo.core.classes.dataset.Dataset

A dataset to use during training for punctuation and capitalization tasks. For inference, you will need BertPunctuationCapitalizationInferDataset. For huge datasets which cannot be loaded into memory all at once, use BertPunctuationCapitalizationTarredDataset. A construction sketch follows the parameter list below.

Parameters
  • text_file (Union[str, os.PathLike]) – a path to a file with sequences, each line should contain a text without punctuation and capitalization

  • labels_file (Union[str, os.PathLike]) – a path to a file with labels, each line corresponds to word labels for a sentence in the text_file. Labels have to follow format described in this section of documentation NeMo Data Format.

  • max_seq_length (int) – max number of tokens in a source sequence. max_seq_length includes the [CLS] and [SEP] tokens. Sequences which are too long will be clipped by removing tokens from the end of the sequence.

  • tokenizer (TokenizerSpec) – a tokenizer instance which has properties unk_id, sep_id, bos_id, eos_id.

  • num_samples (int, optional, defaults to -1) – the number of samples to use from the dataset. If -1, the whole dataset is used. Useful for testing.

  • tokens_in_batch (int, optional, defaults to 5000) – number of tokens in a batch including paddings and special tokens ([CLS], [SEP], [UNK]). This class's __getitem__() method returns ready batches rather than individual samples. The number of samples in a batch is adjusted to the lengths of the input sequences: if input sequences are short, a batch will contain more samples. Before packing into batches, samples are sorted by the number of tokens they contain. Sorting makes it possible to significantly reduce the number of pad tokens in a batch. Regular PyTorch data loader shuffling will only permute batches without changing their content. Proper shuffling is achieved by calling the method repack_batches_with_shuffle() every epoch.

  • pad_label (str, optional, defaults to 'O') – pad value to use for labels. It’s also the neutral label both for punctuation and capitalization.

  • punct_label_ids (Dict[str, int], optional) – dict to map punctuation labels to label ids. For dev set, use label ids generated during training to support cases when not all labels are present in the dev set. For training, it is recommended to set punct_label_ids to None or load from cache.

  • capit_label_ids (Dict[str, int], optional) – same as punct_label_ids, but for capitalization labels.

  • ignore_extra_tokens (bool, optional, defaults to False) – whether to compute loss on tokens which are not first tokens in a word. For example, assume that word 'tokenization' is tokenized into ['token', 'ization']. If ignore_extra_tokens=True, loss mask for the word is [True, False], and if ignore_extra_tokens=False, then loss mask is [True, True].

  • ignore_start_end (bool, optional, defaults to True) – whether to ignore [CLS] and [SEP] tokens in the loss_mask.

  • use_cache (bool, optional, defaults to True) –

    whether to use pickled features or not. If pickled features do not exist and use_cache=True, then pickled features will be created. Pickled features are looked for and stored in cache_dir. Pickled features include input ids, subtokens mask (mask of first tokens in words), encoded punctuation and capitalization labels, and label ids. Feature creation consumes considerable time, so use_cache=True significantly speeds up training startup.

    Warning

    If you spawned more than 1 process BEFORE dataset creation, then the use_cache parameter has to be True. In PyTorch Lightning, spawning is performed when Trainer.fit() or Trainer.test() is called.

  • cache_dir (Union[str, os.PathLike], optional) – a path to a directory where the cache (pickled features) is stored. By default, the text_file parent directory is used. This parameter is useful if the dataset directory is read-only and you wish to pickle features. In such a case, pass a writable directory in the cache_dir parameter.

  • get_label_frequencies (bool, optional, defaults to False) – whether to print and save label frequencies. Frequencies are shown if the verbose parameter is True. If get_label_frequencies=True, then frequencies are saved into the label_info_save_dir directory.

  • label_info_save_dir (Union[str, os.PathLike], optional) – a path to a directory where label frequencies are saved. By default the text_file parent directory is used. When the method save_labels_and_get_file_paths() is called, label ids are saved into the label_info_save_dir directory. This parameter, like cache_dir, is useful if the directory containing text_file is read-only.

  • punct_label_vocab_file (Union[str, os.PathLike], optional) – a path to a .csv file containing the punctuation label vocabulary. Each line in such a vocabulary file contains exactly one label. The first line has to contain pad_label, otherwise an error will be raised.

  • capit_label_vocab_file (Union[str, os.PathLike], optional) – same as punct_label_vocab_file for capitalization labels.

  • add_masks_and_segment_ids_to_batch (bool, optional, defaults to True) – whether to add 'loss_mask', 'input_mask', 'segment_ids' items to a batch. Useful for creation of tarred dataset and can NOT be used during model training and inference.

  • verbose (bool, optional, defaults to True) – whether to show data examples, label stats and other useful information.

  • n_jobs (int, optional, defaults to 0) –

    number of workers used for tokenization, encoding labels, creating “first token in word” mask, and clipping. If n_jobs <= 0 data preparation is performed without multiprocessing. By default n_jobs is equal to the number of CPUs.

    Warning

    There can be deadlocking problems with some tokenizers (e.g. SentencePiece, HuggingFace AlBERT) if n_jobs > 0.

  • tokenization_progress_queue (multiprocessing.Queue, optional) – a queue for reporting tokenization progress. Useful for creation of tarred dataset

  • batch_mark_up_progress_queue (multiprocessing.Queue, optional) – a queue for reporting progress in deciding which samples batches will contain. Useful for creation of tarred dataset

  • batch_building_progress_queue (multiprocessing.Queue, optional) – a queue for reporting progress in batch creation (stacking and padding). Useful for creation of tarred dataset
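
A hedged construction sketch for this dataset. The file paths are placeholders, and the tokenizer helper used below (nemo.collections.common.tokenizers.AutoTokenizer) is an assumption about the wider NeMo API rather than something documented on this page; any TokenizerSpec with the required properties would do. Note the batch_size=1 requirement explained under collate_fn() below:

    import torch

    from nemo.collections.common.tokenizers import AutoTokenizer  # assumed tokenizer helper
    from nemo.collections.nlp.data.token_classification.punctuation_capitalization_dataset import (
        BertPunctuationCapitalizationDataset,
    )

    tokenizer = AutoTokenizer("bert-base-uncased")

    dataset = BertPunctuationCapitalizationDataset(
        text_file="train_text.txt",      # one unpunctuated, lower-cased sentence per line
        labels_file="train_labels.txt",  # word-level labels in NeMo Data Format
        max_seq_length=128,
        tokenizer=tokenizer,
        tokens_in_batch=5000,
    )

    # __getitem__() already returns packed batches, so the loader batch size must be 1.
    loader = torch.utils.data.DataLoader(dataset, batch_size=1, collate_fn=dataset.collate_fn)

    for epoch in range(3):
        for batch in loader:
            pass  # training step goes here
        dataset.repack_batches_with_shuffle()  # proper shuffling between epochs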

__getitem__(idx: int) Dict[str, ArrayLike][source]

Return a batch with index idx. The values of a batch dictionary are numpy arrays of identical shapes [Batch, Time]. Labels are identical for all tokens in a word. For example, if

  • word 'Tokenization' is tokenized into tokens ['token', 'ization'],

  • it is followed by a comma,

then punctuation labels are [',', ','] and capitalization labels are ['U', 'U'] ('U' is the label for words which start with an upper case character).

Parameters

idx – an index of returned batch

Returns

a dictionary with items:

  • 'input_ids' (numpy.ndarray): numpy.int32 array containing encoded tokens,

  • 'subtokens_mask' (numpy.ndarray): bool array whose elements are True if they correspond to the first token in a word,

  • 'punct_labels' (numpy.ndarray): numpy.int32 array containing encoded punctuation labels,

  • 'capit_labels' (numpy.ndarray): numpy.int32 array containing encoded capitalization labels.

  • 'segment_ids' (numpy.ndarray): numpy.int8 array filled with zeros (BERT token types in HuggingFace terminology) (if self.add_masks_and_segment_ids_to_batch is False, then this item is missing),

  • 'input_mask' (numpy.ndarray): bool array whose elements are True if the corresponding token is not a padding token (if self.add_masks_and_segment_ids_to_batch is False, then this item is missing),

  • 'loss_mask' (numpy.ndarray): bool array whose elements are True if loss is computed for the corresponding token. See more in the description of the constructor parameters ignore_start_end and ignore_extra_tokens (if self.add_masks_and_segment_ids_to_batch is False, then this item is missing).

Return type

Dict[str, ArrayLike]

collate_fn(batches: List[Dict[str, ArrayLike]]) Dict[str, torch.Tensor][source]

Returns the zeroth batch from the batches list passed for collating and casts 'segment_ids', 'punct_labels', 'capit_labels' to types supported by PunctuationCapitalizationModel. All output tensors have shape [Batch, Time].

Warning

The batch_size parameter of the PyTorch data loader and sampler has to be 1.

Parameters

batches (List[Dict[str, ArrayLike]]) – a list containing 1 batch passed for collating

Returns

a batch dictionary with the following items (for a detailed description of batch items, see the __getitem__() method).

Return type

Dict[str, torch.Tensor]

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Returns definitions of module output ports.

repack_batches_with_shuffle() None[source]

A function for proper shuffling of a dataset. PyTorch data loader shuffling will only permute batches.

save_labels_and_get_file_paths(punct_labels_file_name: str, capit_labels_file_name: str) Tuple[pathlib.Path, pathlib.Path][source]

Saves label ids into files located in self.label_info_save_dir. Saved label ids are usually used for .nemo checkpoint creation.

The signature of this method must be identical to the signature of the corresponding save_labels_and_get_file_paths() method of BertPunctuationCapitalizationTarredDataset.

Parameters
  • punct_labels_file_name (str) – a name of a punctuation labels file

  • capit_labels_file_name (str) – a name of a capitalization labels file

Returns

a tuple containing:

  • pathlib.Path: a path to the saved punctuation labels file

  • pathlib.Path: a path to the saved capitalization labels file

Return type

Tuple[pathlib.Path, pathlib.Path]

class nemo.collections.nlp.data.token_classification.punctuation_capitalization_infer_dataset.BertPunctuationCapitalizationInferDataset(*args: Any, **kwargs: Any)[source]

Bases: nemo.core.classes.dataset.Dataset

Creates a dataset to use during inference for punctuation and capitalization tasks with a pretrained model. For datasets to use during training with labels, see BertPunctuationCapitalizationDataset and BertPunctuationCapitalizationTarredDataset.

The parameters max_seq_length, step, and margin control the way queries are split into segments which are then processed by the model. Parameter max_seq_length is the length of a segment after tokenization, including the special tokens [CLS] at the beginning and [SEP] at the end of a segment. Parameter step is the shift between consecutive segments. Parameter margin is used to exclude the negative effect of subtokens near segment borders, which have context on one side only. A construction sketch follows the parameter list below.

Parameters
  • queries (List[str]) – list of sequences.

  • tokenizer (TokenizerSpec) – a tokenizer which was used for model training. It should have properties cls_id, sep_id, unk_id, pad_id.

  • max_seq_length (int, optional, defaults to 128) – max sequence length which includes [CLS] and [SEP] tokens

  • step (int, optional, defaults to 8) – relative shift of consecutive segments into which long queries are split. Long queries are split into segments which can overlap. Parameter step controls such overlapping. Imagine that queries are tokenized into characters, max_seq_length=5, and step=2. In such a case, the query “hello” is tokenized into segments [['[CLS]', 'h', 'e', 'l', '[SEP]'], ['[CLS]', 'l', 'l', 'o', '[SEP]']].

  • margin (int, optional, defaults to 16) – number of subtokens at the beginning and the end of segments which are not used for prediction computation. The first segment does not have a left margin and the last segment does not have a right margin. For example, if the input sequence is tokenized into characters, max_seq_length=5, step=1, and margin=1, then the query “hello” will be tokenized into segments [['[CLS]', 'h', 'e', 'l', '[SEP]'], ['[CLS]', 'e', 'l', 'l', '[SEP]'], ['[CLS]', 'l', 'l', 'o', '[SEP]']]. These segments are passed to the model. Before the final predictions are computed, margins are removed. In the next list, subtokens whose logits are not used for final prediction computation are marked with an asterisk: [['[CLS]'*, 'h', 'e', 'l'*, '[SEP]'*], ['[CLS]'*, 'e'*, 'l', 'l'*, '[SEP]'*], ['[CLS]'*, 'l'*, 'l', 'o', '[SEP]'*]].
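
A hedged construction sketch for the inference dataset. The queries are placeholders, and the tokenizer helper (nemo.collections.common.tokenizers.AutoTokenizer) is an assumption about the wider NeMo API; in practice, the same tokenizer used for model training should be passed:

    import torch

    from nemo.collections.common.tokenizers import AutoTokenizer  # assumed tokenizer helper
    from nemo.collections.nlp.data.token_classification.punctuation_capitalization_infer_dataset import (
        BertPunctuationCapitalizationInferDataset,
    )

    queries = ["how are you", "nemo supports punctuation and capitalization restoration"]
    tokenizer = AutoTokenizer("bert-base-uncased")

    infer_dataset = BertPunctuationCapitalizationInferDataset(
        queries=queries,
        tokenizer=tokenizer,
        max_seq_length=128,
        step=8,
        margin=16,
    )

    # A regular PyTorch DataLoader can be used with this dataset's collate_fn (see below).
    loader = torch.utils.data.DataLoader(
        infer_dataset, batch_size=32, collate_fn=infer_dataset.collate_fn
    )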

__getitem__(idx: int) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, int, int, bool, bool][source]

Returns batch used for punctuation and capitalization inference.

Parameters

idx (int) – a batch index

Returns

a tuple containing:

  • input_ids (np.ndarray): an integer numpy array of shape [Time]. Ids of word subtokens encoded using tokenizer passed in constructor tokenizer parameter.

  • segment_ids (np.ndarray): an integer zeros numpy array of shape [Time]. Indices of segments for BERT model (token types in HuggingFace terminology).

  • input_mask (np.ndarray): a boolean numpy array of shape [Time]. An element of this array is True if the corresponding token is not a padding token.

  • subtokens_mask (np.ndarray): a boolean numpy array of shape [Time]. An element equals True if the corresponding token is the first token in a word and False otherwise. For example, if the input query "language processing" is tokenized into ["[CLS]", "language", "process", "ing", "[SEP]"], then subtokens_mask will be [False, True, True, False, False].

  • quantities_of_preceding_words (int): the number of words preceding the current segment in the query to which the segment belongs. This parameter is used for uniting predictions from adjacent segments.

  • query_ids (int): an index of the query to which the segment belongs

  • is_first (bool): whether a segment is the first segment in a query. The left margin of the first segment in a query is not removed.

  • is_last (bool): whether a segment is the last segment in a query. The right margin of the last segment in a query is not removed.

Return type

Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, int, int, bool, bool]

collate_fn(batch: List[Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, int, int, bool, bool]]) Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, Tuple[int, ...], Tuple[int, ...], Tuple[bool, ...], Tuple[bool, ...]][source]

Collates samples into batches.

Parameters

batch (List[tuple]) – a list of samples returned by __getitem__() method.

Returns

a tuple containing 8 elements:

  • input_ids (torch.Tensor): an integer tensor of shape [Batch, Time] containing encoded input text.

  • segment_ids (torch.Tensor): an integer tensor of shape [Batch, Time] filled with zeros.

  • input_mask (torch.Tensor): a boolean tensor of shape [Batch, Time] whose elements are True if the corresponding token is not a padding token.

  • subtokens_mask (torch.Tensor): a boolean tensor of shape [Batch, Time] whose elements are True if the corresponding token is the first token in a word.

  • quantities_of_preceding_words (Tuple[int, ...]): a tuple containing, for each segment, the number of words preceding it in its query.

  • query_ids (Tuple[int, ...]): a tuple containing indices of the queries to which segments belong.

  • is_first (Tuple[bool, ...]): a tuple of booleans whose elements are True if the corresponding segment is the first segment in a query.

  • is_last (Tuple[bool, ...]): a tuple of booleans whose elements are True if the corresponding segment is the last segment in a query.

Return type

Tuple[torch.Tensor (x4), Tuple[int, ...] (x2), Tuple[bool, ...] (x2)]

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Returns neural types of collate_fn() output.