NeMo Speech Intent Classification and Slot Filling collection API

Model Classes

class nemo.collections.asr.models.SLUIntentSlotBPEModel(*args: Any, **kwargs: Any)

Bases: nemo.collections.asr.models.asr_model.ASRModel, nemo.collections.asr.models.asr_model.ExportableEncDecModel, nemo.collections.asr.parts.mixins.mixins.ASRModuleMixin, nemo.collections.asr.parts.mixins.mixins.ASRBPEMixin, nemo.collections.asr.parts.mixins.transcription.ASRTranscriptionMixin

Model for end-to-end speech intent classification and slot filling, formulated as a speech-to-sequence task.

forward(input_signal=None, input_signal_length=None, target_semantics=None, target_semantics_length=None, processed_signal=None, processed_signal_length=None)

Forward pass of the model.

Parameters

input_signal: Tensor that represents a batch of raw audio signals, of shape [B, T]. T here represents timesteps, with 1 second of audio represented as self.sample_rate number of floating point values.

input_signal_length: Vector of length B, that contains the individual lengths of the audio sequences.

target_semantics: Tensor that represents a batch of semantic tokens, of shape [B, L].

target_semantics_length: Vector of length B, that contains the individual lengths of the semantic sequences.

processed_signal: Tensor that represents a batch of processed audio signals, of shape [B, D, T], that has already been processed by a DALI preprocessor.

processed_signal_length: Vector of length B, that contains the individual lengths of the processed audio sequences.

Returns

A tuple of 3 elements: 1) the log probabilities tensor, of shape [B, T, D]; 2) the lengths of the output sequences after the decoder, of shape [B]; 3) the token predictions of the model, of shape [B, T].
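The shape conventions for input_signal and input_signal_length can be illustrated without the model itself. The following is a minimal sketch, not NeMo code: pad_audio_batch is a hypothetical helper that zero-pads variable-length waveforms to a common length T and records the original lengths.

```python
from typing import List, Sequence, Tuple


def pad_audio_batch(
    waveforms: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Zero-pad waveforms to a common length T.

    Returns (padded, lengths), where padded has shape [B, T] and
    lengths has shape [B] and holds the unpadded sample counts.
    """
    if not waveforms:
        raise ValueError("expected at least one waveform")
    lengths = [len(w) for w in waveforms]
    max_len = max(lengths)
    padded = [list(w) + [0.0] * (max_len - len(w)) for w in waveforms]
    return padded, lengths


# Two clips of 3 and 5 samples form a batch with B = 2 and T = 5.
batch, batch_lengths = pad_audio_batch(
    [[0.1, 0.2, 0.3], [0.5, 0.4, 0.3, 0.2, 0.1]]
)
```

In practice these lists would be converted to tensors (for example with torch.tensor) before being passed as input_signal and input_signal_length.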

property input_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable input neural type checks

classmethod list_available_models() → Optional[nemo.core.classes.common.PretrainedModelInfo]

This method returns a list of pre-trained models that can be instantiated directly from NVIDIA’s NGC cloud.

Returns

List of available pre-trained models.
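A minimal sketch of how the returned list might be consumed. available_model_names is a hypothetical helper, and it assumes that each returned entry exposes a pretrained_model_name attribute. A stand-in object replaces the real model class so that the sketch runs without NeMo installed.

```python
from types import SimpleNamespace
from typing import Any, List


def available_model_names(model_cls: Any) -> List[str]:
    """Return the names of the pre-trained models that model_cls advertises."""
    infos = model_cls.list_available_models()
    # The annotated return type is Optional, so treat None as "no models".
    if infos is None:
        return []
    # Assumption: each entry exposes a pretrained_model_name attribute.
    return [info.pretrained_model_name for info in infos]


# Stand-in for a model class; the names here are placeholders, not real models.
fake_cls = SimpleNamespace(
    list_available_models=lambda: [
        SimpleNamespace(pretrained_model_name="placeholder_model_a"),
        SimpleNamespace(pretrained_model_name="placeholder_model_b"),
    ]
)
names = available_model_names(fake_cls)
```

With NeMo installed, the same helper could be called as available_model_names(SLUIntentSlotBPEModel) after importing the class from nemo.collections.asr.models.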

property output_types: Optional[Dict[str, nemo.core.neural_types.neural_type.NeuralType]]

Define these to enable output neural type checks

setup_test_data(test_data_config: Optional[Union[omegaconf.DictConfig, Dict]])

Sets up the test data loader via a Dict-like object.

Parameters

test_data_config – A config that contains the information regarding construction of an ASR test dataset.

Supported Datasets:

setup_training_data(train_data_config: Optional[Union[omegaconf.DictConfig, Dict]])

Sets up the training data loader via a Dict-like object.

Parameters

train_data_config – A config that contains the information regarding construction of an ASR training dataset.
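A minimal sketch of a Dict-like config of the kind these setup methods accept. make_train_data_config is a hypothetical helper, the manifest path is a placeholder, and the keys are assumptions based on common NeMo ASR dataset configs. The keys this model actually requires should be checked against its own configuration file.

```python
from typing import Any, Dict


def make_train_data_config(
    manifest_filepath: str,
    sample_rate: int = 16000,
    batch_size: int = 4,
) -> Dict[str, Any]:
    """Build a plain dict describing a training dataset.

    All keys are assumed, not taken from this API reference.
    """
    return {
        "manifest_filepath": manifest_filepath,  # manifest listing the audio files
        "sample_rate": sample_rate,              # samples per second of audio
        "batch_size": batch_size,
        "shuffle": True,                         # shuffle training samples
        "num_workers": 0,                        # DataLoader worker processes
    }


train_cfg = make_train_data_config("train_manifest.json")  # placeholder path
```

Given a loaded model, the config would then be passed as model.setup_training_data(train_data_config=train_cfg).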

Supported Datasets:

setup_validation_data(val_data_config: Optional[Union[omegaconf.DictConfig, Dict]])

Sets up the validation data loader via a Dict-like object.

Parameters

val_data_config – A config that contains the information regarding construction of an ASR validation dataset.

Supported Datasets:

transcribe(audio: Union[List[str], torch.utils.data.DataLoader], batch_size: int = 4, return_hypotheses: bool = False, num_workers: int = 0, verbose: bool = True) → Union[List[str], List[Hypothesis], Tuple[List[str]], Tuple[List[Hypothesis]]]

Uses greedy decoding to transcribe audio files into SLU semantics. Use this method for debugging and prototyping.

Parameters
  • audio – A single path or a list of paths to audio files, or an np.ndarray audio array. Can also be a dataloader object that provides values that can be consumed by the model. The recommended length per file is between 5 and 25 seconds, but a file a few hours long can be passed if enough GPU memory is available.

  • batch_size – (int) batch size to use during inference. A larger batch size improves throughput but uses more memory.

  • return_hypotheses – (bool) whether to return hypotheses instead of text. Hypotheses allow postprocessing such as extracting timestamps or rescoring.

  • num_workers – (int) number of workers for DataLoader

  • verbose – (bool) whether to display tqdm progress bar

Returns

A list of transcriptions (or hypotheses if return_hypotheses is True), in the same order as the inputs given in audio.
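A minimal sketch of calling transcribe with the keyword arguments documented above. transcribe_files is a hypothetical wrapper and the file names are placeholders. EchoModel is a stand-in that mimics only the call signature so that the sketch runs without NeMo or audio files. Its output strings are not real SLU semantics.

```python
from typing import Any, List, Sequence


def transcribe_files(model: Any, audio_paths: Sequence[str], batch_size: int = 4) -> Any:
    """Call model.transcribe with the documented keyword arguments."""
    return model.transcribe(
        audio=list(audio_paths),
        batch_size=batch_size,
        return_hypotheses=False,  # return text rather than hypotheses
        num_workers=0,            # no extra DataLoader workers
        verbose=False,            # suppress the tqdm progress bar
    )


class EchoModel:
    """Stand-in for a loaded model; it only echoes the input paths."""

    def transcribe(self, audio, batch_size=4, return_hypotheses=False,
                   num_workers=0, verbose=True) -> List[str]:
        return [f"semantics for {path}" for path in audio]


outputs = transcribe_files(EchoModel(), ["a.wav", "b.wav"])
```

With a real SLUIntentSlotBPEModel instance in place of EchoModel, the same wrapper would return the model's predicted semantics for each file.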

Mixins