`stages.audio.inference.asr_nemo`#

Module Contents#

Classes#

InferenceAsrNemoStage

Stage that performs speech recognition inference using a NeMo model.

API#

class stages.audio.inference.asr_nemo.InferenceAsrNemoStage#

Bases: nemo_curator.stages.base.ProcessingStage[nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch, nemo_curator.tasks.AudioBatch]

Stage that performs speech recognition inference using a NeMo model.

Args:

- model_name (str): Name of the NeMo speech recognition model. See the full list at https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html
- asr_model (Any): ASR model object. Defaults to None.
- filepath_key (str): Key of the data object that holds the path to the audio file. Defaults to “audio_filepath”.
- pred_text_key (str): Key of the field that holds the predicted transcription for an audio sample. Defaults to “pred_text”.
- name (str): Stage name. Defaults to “ASR_inference”.
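As an illustration of the arguments above, the following minimal sketch collects them as keyword arguments and builds the stage lazily. The import path `nemo_curator.stages.audio.inference.asr_nemo` is inferred from the module name on this page, the checkpoint name is only an example, and `build_stage` is a hypothetical helper; treat all three as assumptions rather than confirmed API details.

```python
# Keyword arguments mirroring the documented fields and their defaults.
stage_kwargs = {
    # Example checkpoint name (assumption); choose one from the NeMo checkpoint list.
    "model_name": "nvidia/stt_en_fastconformer_hybrid_large_pc",
    "filepath_key": "audio_filepath",  # documented default
    "pred_text_key": "pred_text",      # documented default
    "batch_size": 16,                  # documented default
}


def build_stage(kwargs: dict):
    """Hypothetical helper: construct the stage from keyword arguments.

    The import happens inside the function so that this sketch can be
    loaded on a machine where NeMo Curator is not installed.
    """
    # Assumed import path, inferred from the module name on this page.
    from nemo_curator.stages.audio.inference.asr_nemo import InferenceAsrNemoStage

    return InferenceAsrNemoStage(**kwargs)
```

Calling `build_stage(stage_kwargs)` requires NeMo Curator and its audio dependencies to be installed.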

asr_model: Any | None#: None

batch_size: int#: 16

check_cuda() → torch.device#

filepath_key: str#: ‘audio_filepath’

inputs() → tuple[list[str], list[str]]#

Define the input attributes required by this stage.

Returns: Tuple of (top_level_attrs, data_attrs) where:

- top_level_attrs: [“data”] - requires FileGroupTask.data to be populated

model_name: str#: None

name: str#: ‘ASR_inference’

outputs() → tuple[list[str], list[str]]#

Define the output attributes produced by this stage.

Returns: Tuple of (top_level_attrs, data_attrs) where:

- top_level_attrs: [“data”] - populates FileGroupTask.data
- data_attrs: [self.filepath_key, self.pred_text_key] - audio file path and predicted text
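The return shape described above can be sketched with a stand-in function. `describe_outputs` and `missing_keys` are hypothetical names used only for illustration; they mimic the documented tuple and are not part of the library.

```python
def describe_outputs(
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
) -> tuple[list[str], list[str]]:
    """Stand-in that mirrors the documented return shape of outputs()."""
    top_level_attrs = ["data"]
    data_attrs = [filepath_key, pred_text_key]
    return top_level_attrs, data_attrs


def missing_keys(entry: dict, data_attrs: list[str]) -> list[str]:
    """Return the declared data attributes that an entry does not contain."""
    return [key for key in data_attrs if key not in entry]


top_level_attrs, data_attrs = describe_outputs()

# An entry that has a path but no transcription yet.
entry = {"audio_filepath": "sample.wav"}
print(missing_keys(entry, data_attrs))
```

A check like `missing_keys` is one way a downstream consumer might verify that an entry carries both the path and the predicted text.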

pred_text_key: str#: ‘pred_text’

process( task: nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch, ) → nemo_curator.tasks.AudioBatch#

Process an audio task by reading the audio file and running ASR inference.

Args: tasks: List of FileGroupTask containing a path to the audio file for inference.

Returns: List of SpeechObject with self.filepath_key. If errors occur, the task is returned with error information stored.
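To make the data flow concrete, here is a minimal sketch of how the path stored under `filepath_key` might be turned into a transcription stored under `pred_text_key`. It uses plain dictionaries instead of the real task classes, and `add_predictions` and `fake_transcribe` are hypothetical names; the actual `process()` implementation may differ.

```python
from typing import Callable


def add_predictions(
    entries: list[dict],
    transcribe: Callable[[list[str]], list[str]],
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
) -> list[dict]:
    """Return copies of the entries with a predicted transcription added."""
    files = [entry[filepath_key] for entry in entries]
    texts = transcribe(files)
    if len(texts) != len(files):
        raise ValueError("transcribe() must return one text per input file")
    # Build new dictionaries so the input entries are left unchanged.
    return [{**entry, pred_text_key: text} for entry, text in zip(entries, texts)]


def fake_transcribe(files: list[str]) -> list[str]:
    """Hypothetical stand-in for an ASR model; upper-cases each path."""
    return [path.upper() for path in files]


entries = [{"audio_filepath": "a.wav"}, {"audio_filepath": "b.wav"}]
result = add_predictions(entries, fake_transcribe)
print(result)
```

Passing the transcription function as an argument keeps the sketch runnable without a GPU or a downloaded model.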

resources: nemo_curator.stages.resources.Resources#: ‘field(…)’

setup( _worker_metadata: nemo_curator.backends.base.WorkerMetadata = None, ) → None#

Initialise the heavy object self.asr_model: nemo_asr.models.ASRModel

setup_on_node( _node_info: nemo_curator.backends.base.NodeInfo | None = None, _worker_metadata: nemo_curator.backends.base.WorkerMetadata = None, ) → None#

Setup method called once per node in distributed settings. Override this method to perform node-level initialization.

Args:

- node_info (NodeInfo, optional): Information about the node (provided by some backends)
- worker_metadata (WorkerMetadata, optional): Information about the worker (provided by some backends)

transcribe(files: list[str]) → list[str]#

Run inference for the speech recognition model.

Args: files: list of audio file paths.

Returns: list of predicted texts.
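This page does not say how `batch_size` is applied during transcription. One plausible reading is that the file list is split into chunks of at most `batch_size` paths, as in the sketch below. `batched`, `transcribe_in_batches`, and the `transcribe_batch` callable are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator


def batched(files: list[str], batch_size: int = 16) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size file paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]


def transcribe_in_batches(
    files: list[str],
    transcribe_batch: Callable[[list[str]], list[str]],
    batch_size: int = 16,
) -> list[str]:
    """Transcribe files chunk by chunk, preserving the input order."""
    texts: list[str] = []
    for batch in batched(files, batch_size):
        texts.extend(transcribe_batch(batch))
    return texts


files = [f"clip_{i}.wav" for i in range(35)]

# Sizes of the chunks produced with the documented default of 16.
batch_sizes = [len(batch) for batch in batched(files)]

# Hypothetical model stand-in: echoes each path as its "transcription".
texts = transcribe_in_batches(files, lambda batch: list(batch))
```

The output list has one text per input path and keeps the input order, which matches the documented `list[str]` to `list[str]` signature.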
