Common Collection

The common collection contains components that can be used across all collections.


Wrapper of HuggingFace AutoTokenizer


Args:

  • model_path – path to the SentencePiece tokenizer model. To create the model, use create_spt_model().

  • special_tokens – either a list of special tokens or a dictionary mapping token names to token values.

  • legacy – when set to True, the previous behavior of the SentencePiece wrapper is restored, including the possibility to add special tokens inside the wrapper.

Inherit this class to implement a new tokenizer.

nemo.collections.common.tokenizers.TokenizerSpec.__init__(self, /, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
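As a rough illustration of the inheritance pattern, the sketch below defines a stand-in abstract base class and a toy whitespace tokenizer that subclasses it. TokenizerBase, WhitespaceTokenizer, and the method names text_to_tokens, tokens_to_ids, ids_to_tokens, text_to_ids, and ids_to_text are assumptions made for illustration only; check the actual TokenizerSpec interface before relying on them.

```python
from abc import ABC, abstractmethod


class TokenizerBase(ABC):
    """Hypothetical stand-in for a tokenizer base class (not the NeMo class)."""

    @abstractmethod
    def text_to_tokens(self, text):
        ...

    @abstractmethod
    def tokens_to_ids(self, tokens):
        ...

    @abstractmethod
    def ids_to_tokens(self, ids):
        ...

    def text_to_ids(self, text):
        # Convenience method built from the abstract primitives.
        return self.tokens_to_ids(self.text_to_tokens(text))

    def ids_to_text(self, ids):
        return " ".join(self.ids_to_tokens(ids))


class WhitespaceTokenizer(TokenizerBase):
    """Toy tokenizer: splits on whitespace and maps words to integer ids."""

    def __init__(self, vocab, unk_token="<unk>"):
        # Reserve id 0 for the unknown token, then number the vocabulary.
        self.id_to_token = [unk_token] + [t for t in vocab if t != unk_token]
        self.token_to_id = {t: i for i, t in enumerate(self.id_to_token)}

    def text_to_tokens(self, text):
        return text.split()

    def tokens_to_ids(self, tokens):
        # Words outside the vocabulary fall back to the unknown id 0.
        return [self.token_to_id.get(t, 0) for t in tokens]

    def ids_to_tokens(self, ids):
        return [self.id_to_token[i] for i in ids]
```

A subclass only has to supply the abstract primitives; the composed methods such as text_to_ids() come from the base class in this sketch.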


Sums several losses into one.

Parameters:

  • num_inputs – number of input losses

  • weights – a list of coefficients for merging losses
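The aggregation described here amounts to an optionally weighted sum. The function below is a minimal plain-Python sketch of that arithmetic, not the NeMo implementation; the name aggregate_losses and its signature are hypothetical.

```python
def aggregate_losses(losses, weights=None):
    """Return the (optionally weighted) sum of several scalar losses."""
    if weights is None:
        # Without coefficients, every loss contributes equally.
        return sum(losses)
    if len(weights) != len(losses):
        raise ValueError("weights must have one coefficient per loss")
    return sum(w * loss for w, loss in zip(weights, losses))
```

For example, aggregate_losses([l1, l2], weights=[0.5, 0.25]) evaluates 0.5 * l1 + 0.25 * l2.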



Calculates Cross-entropy loss with label smoothing for a batch of sequences.

SmoothedCrossEntropyLoss:

  1) excludes padding tokens from the loss calculation;

  2) allows the use of label smoothing regularization;

  3) allows the loss to be calculated for only a desired number of last tokens;

  4) per_token_reduction: if False, disables reduction per token.

Parameters:

  • label_smoothing – label smoothing regularization coefficient

  • predict_last_k – the number of last tokens to calculate the loss for. For example:

      0: (default) calculate the loss on the entire sequence (e.g., NMT);

      1: calculate the loss on the last token only (e.g., LM evaluation).

    Intermediate values control the trade-off between evaluation time (proportional to the number of batches) and evaluation performance (proportional to the number of context tokens).

  • pad_id – padding id

  • eps – a small number used to avoid division by zero
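To make the interaction of these parameters concrete, here is a plain-Python sketch of a label-smoothed cross-entropy that skips padding tokens, optionally keeps only the last k positions, and divides by the number of counted tokens plus eps. The smoothing formula (mixing the target log-probability with the mean log-probability over the vocabulary) is one common formulation and is an assumption here; the function names and nested-list inputs are hypothetical, and the actual NeMo class may differ in detail.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(logits, labels, pad_id, label_smoothing=0.0,
                           predict_last_k=0, eps=1e-6):
    """Label-smoothed cross-entropy averaged over non-padding tokens.

    logits: [batch][seq_len][vocab] nested lists of floats
    labels: [batch][seq_len] nested lists of ints
    """
    total, count = 0.0, 0.0
    for seq_logits, seq_labels in zip(logits, labels):
        # predict_last_k == 0 keeps the whole sequence, since x[-0:] == x[0:].
        seq_logits = seq_logits[-predict_last_k:]
        seq_labels = seq_labels[-predict_last_k:]
        for token_logits, label in zip(seq_logits, seq_labels):
            if label == pad_id:
                continue  # padding tokens contribute nothing to the loss
            log_probs = log_softmax(token_logits)
            mean_log_prob = sum(log_probs) / len(log_probs)
            # Assumed smoothing: mix target log-prob with the mean log-prob.
            total -= ((1.0 - label_smoothing) * log_probs[label]
                      + label_smoothing * mean_log_prob)
            count += 1.0
    # eps keeps the division finite when every token is padding.
    return total / (count + eps)
```

With predict_last_k=1 only the final position of each sequence is scored, which mirrors the LM-evaluation use mentioned above.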


Implements the start and end loss of a span, e.g. for Question Answering.
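A span loss of this kind is typically built from two cross-entropy terms, one over candidate start positions and one over candidate end positions. The plain-Python sketch below averages the two; the averaging, the function names, and the list-based inputs are assumptions for illustration and are not taken from the NeMo source.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def span_loss(start_logits, end_logits, start_pos, end_pos):
    """Average of the start-position and end-position cross-entropies."""
    start = cross_entropy(start_logits, start_pos)
    end = cross_entropy(end_logits, end_pos)
    return (start + end) / 2.0
```

Each logits list holds one score per token position in the passage, so the loss falls as the scores concentrate on the true start and end indices.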


class nemo.collections.common.metrics.Perplexity(*args: Any, **kwargs: Any)

Bases: torchmetrics.Metric

This class computes the mean perplexity of distributions in the last dimension of inputs. It is a wrapper around the torch.distributions.Categorical.perplexity method. You have to provide either probs or logits to the update() method. The class computes perplexities for the distributions passed to update() in the probs or logits arguments and averages them. Results are reduced across all workers via SUM operations. See PyTorch Lightning Metrics for the metric usage instructions.

Parameters:

  • compute_on_step – Forward only calls update() and returns None if this is set to False. default: True

  • dist_sync_on_step – Synchronize metric state across processes at each forward() before returning the value at the step.

  • process_group – Specify the process group on which synchronization is called. default: None (which selects the entire


  • validate_args – If True, the values of the update() method parameters are checked: logits must not contain NaNs, and the last dimension of probs must be a valid probability distribution.


Returns the perplexity across all workers and resets perplexities_sum and num_distributions to 0.

update(probs=None, logits=None)

Updates perplexities_sum and num_distributions.

Parameters:

  • probs – A torch.Tensor whose innermost dimension is a valid probability distribution.

  • logits – A torch.Tensor without NaNs.
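The accumulate-then-average behavior can be sketched in plain Python as follows. The perplexity of a categorical distribution is taken as the exponential of its entropy. PerplexitySketch is a hypothetical single-process stand-in that works on lists of rows rather than tensors and omits the cross-worker SUM reduction; it is not the NeMo class.

```python
import math


class PerplexitySketch:
    """Averages exp(entropy) over categorical distributions (plain Python)."""

    def __init__(self):
        self.perplexities_sum = 0.0
        self.num_distributions = 0

    @staticmethod
    def _softmax(row):
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        return [e / z for e in exps]

    def update(self, probs=None, logits=None):
        # Exactly one of probs / logits must be given; each is a list of rows.
        if (probs is None) == (logits is None):
            raise ValueError("provide exactly one of probs or logits")
        if probs is None:
            probs = [self._softmax(row) for row in logits]
        for row in probs:
            # Zero-probability entries contribute nothing to the entropy.
            entropy = -sum(p * math.log(p) for p in row if p > 0.0)
            self.perplexities_sum += math.exp(entropy)
            self.num_distributions += 1

    def compute(self):
        """Return the mean perplexity and reset the accumulated state."""
        if self.num_distributions == 0:
            raise ValueError("update() must be called before compute()")
        mean = self.perplexities_sum / self.num_distributions
        self.perplexities_sum = 0.0
        self.num_distributions = 0
        return mean
```

A uniform distribution over n outcomes has perplexity n, and a one-hot distribution has perplexity 1, which gives a quick sanity check for any implementation.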