stages.audio.inference.asr_nemo#

Module Contents#

Classes#

InferenceAsrNemoStage

Stage that performs speech recognition inference using a NeMo model.

API#

class stages.audio.inference.asr_nemo.InferenceAsrNemoStage#

Bases: nemo_curator.stages.base.ProcessingStage[nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch, nemo_curator.tasks.AudioBatch]

Stage that performs speech recognition inference using a NeMo model.

Args:
- model_name (str): Name of the NeMo speech recognition model. See the full list at https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/all_chkpt.html
- asr_model (Any): ASR model object. Defaults to None.
- filepath_key (str): Key of the data object that holds the path to the audio file. Defaults to “audio_filepath”.
- pred_text_key (str): Key of the field that stores the predicted transcription for an audio sample. Defaults to “pred_text”.
- name (str): Stage name. Defaults to “ASR_inference”.
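The arguments and defaults listed above can be summarised in a minimal sketch. `AsrStageArgs` is a hypothetical stand-in dataclass, not the library class, so the snippet runs without NeMo Curator installed; the model name is only an illustrative choice.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class AsrStageArgs:
    """Hypothetical stand-in mirroring the documented arguments and defaults."""

    model_name: str
    asr_model: Optional[Any] = None
    filepath_key: str = "audio_filepath"
    pred_text_key: str = "pred_text"
    name: str = "ASR_inference"


# Illustrative model name; any entry from the NeMo checkpoint list could be used.
args = AsrStageArgs(model_name="stt_en_fastconformer_hybrid_large_pc")
kwargs = asdict(args)
print(kwargs["filepath_key"], kwargs["pred_text_key"])

# With NeMo Curator installed, the same keywords would presumably be passed as
# InferenceAsrNemoStage(**kwargs); that call is an assumption and is not run here.
```

Only `model_name` has no usable default, so it is the one argument a caller always needs to supply.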

asr_model: Any | None#

None

check_cuda() torch.device#

filepath_key: str#

‘audio_filepath’

inputs() tuple[list[str], list[str]]#

Define the input attributes required by this stage.

Returns: Tuple of (top_level_attrs, data_attrs) where:
- top_level_attrs: [“data”] - requires FileGroupTask.data to be populated

model_name: str#

None

outputs() tuple[list[str], list[str]]#

Define the output attributes produced by this stage.

Returns: Tuple of (top_level_attrs, data_attrs) where:
- top_level_attrs: [“data”] - populates FileGroupTask.data
- data_attrs: [self.filepath_key, self.pred_text_key] - audio file path and predicted text
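As a sketch of how the declared attributes relate to the data, the snippet below rebuilds the documented `outputs()` tuple and checks records against it. `declared_outputs` and `missing_attrs` are hypothetical helpers written for illustration, not part of NeMo Curator, and the file names are made up.

```python
def declared_outputs(filepath_key: str, pred_text_key: str) -> tuple[list[str], list[str]]:
    """Return (top_level_attrs, data_attrs) in the documented shape."""
    return ["data"], [filepath_key, pred_text_key]


def missing_attrs(record: dict, data_attrs: list[str]) -> list[str]:
    """List the declared data attributes that a record does not carry."""
    return [attr for attr in data_attrs if attr not in record]


top_level_attrs, data_attrs = declared_outputs("audio_filepath", "pred_text")

complete = {"audio_filepath": "clip_0001.wav", "pred_text": "hello world"}
incomplete = {"audio_filepath": "clip_0002.wav"}

print(missing_attrs(complete, data_attrs))
print(missing_attrs(incomplete, data_attrs))
```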

pred_text_key: str#

‘pred_text’

process(
task: nemo_curator.tasks.FileGroupTask | nemo_curator.tasks.DocumentBatch | nemo_curator.tasks.AudioBatch,
) nemo_curator.tasks.AudioBatch#

Process an audio task by reading the audio file and running ASR inference.

Args: task: Task containing the path to an audio file for inference.

Returns: AudioBatch whose entries include self.filepath_key. If errors occur, the task is returned with error information stored.
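The flow described above (collect file paths, transcribe them, attach predictions) might look roughly like the following. This is a simplified sketch under stated assumptions: plain dicts stand in for the task entries, `fake_transcribe` replaces the model, and `process_paths` is a hypothetical helper rather than the stage's actual `process` method.

```python
from typing import Callable


def process_paths(
    paths: list[str],
    transcribe: Callable[[list[str]], list[str]],
    filepath_key: str = "audio_filepath",
    pred_text_key: str = "pred_text",
) -> list[dict[str, str]]:
    """Transcribe every path and pair each one with its predicted text."""
    texts = transcribe(paths)
    if len(texts) != len(paths):
        raise ValueError("transcribe must return one text per input file")
    return [{filepath_key: p, pred_text_key: t} for p, t in zip(paths, texts)]


def fake_transcribe(files: list[str]) -> list[str]:
    """Stand-in for the ASR model: derives a dummy transcript from the name."""
    return [f"transcript of {f}" for f in files]


records = process_paths(["a.wav", "b.wav"], fake_transcribe)
print(records)
```

Passing the transcription function as a parameter keeps the pairing logic testable without loading a model.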

setup(
_worker_metadata: nemo_curator.backends.base.WorkerMetadata = None,
) None#

Initialise the heavy object self.asr_model (a nemo_asr.models.ASRModel).
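Loading the model in `setup` rather than at construction time keeps the stage object lightweight until a worker needs it. A minimal sketch of that pattern follows; `LazyAsrHolder` and `load_with_nemo` are hypothetical names, and both the use of `ASRModel.from_pretrained` and the choice to keep a caller-supplied model are assumptions about the behaviour, not something this page confirms.

```python
from typing import Any, Callable, Optional


def load_with_nemo(model_name: str) -> Any:
    """Assumed loader: imports NeMo only when a model is actually needed."""
    import nemo.collections.asr as nemo_asr  # heavy import, deferred on purpose

    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)


class LazyAsrHolder:
    """Hypothetical holder that defers model creation until setup()."""

    def __init__(self, model_name: str, asr_model: Optional[Any] = None) -> None:
        self.model_name = model_name
        self.asr_model = asr_model

    def setup(self, loader: Callable[[str], Any] = load_with_nemo) -> None:
        # Keep a model supplied by the caller; otherwise load one by name.
        if self.asr_model is None:
            self.asr_model = loader(self.model_name)


# A fake loader keeps the example runnable without NeMo or a GPU.
holder = LazyAsrHolder("illustrative-model-name")
holder.setup(loader=lambda name: f"<model {name}>")
print(holder.asr_model)
```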

setup_on_node(
_node_info: nemo_curator.backends.base.NodeInfo | None = None,
_worker_metadata: nemo_curator.backends.base.WorkerMetadata = None,
) None#

Setup method called once per node in distributed settings. Override this method to perform node-level initialization.

Args:
- node_info (NodeInfo, optional): Information about the node (provided by some backends)
- worker_metadata (WorkerMetadata, optional): Information about the worker (provided by some backends)

transcribe(files: list[str]) list[str]#

Run inference with the speech recognition model.

Args: files: List of audio file paths.

Returns: list of predicted texts.
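A hedged sketch of a `transcribe`-style wrapper: it calls the model's `transcribe` method on the file list and normalises the results to strings. Whether a given NeMo version returns plain strings or hypothesis objects with a `text` attribute is an assumption here; `to_texts`, `FakeHypothesis`, and `FakeModel` are hypothetical and exist only to make the example runnable.

```python
from typing import Any


def to_texts(model: Any, files: list[str]) -> list[str]:
    """Run model.transcribe(files) and return one string per file."""
    outputs = model.transcribe(files)
    # Assumption: each item is either a str or an object exposing `.text`.
    return [o if isinstance(o, str) else o.text for o in outputs]


class FakeHypothesis:
    """Stand-in for a hypothesis object that carries the text in `.text`."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakeModel:
    """Stand-in for an ASR model; mixes both assumed output styles."""

    def transcribe(self, files: list[str]) -> list[Any]:
        return [
            FakeHypothesis(f"text for {f}") if i % 2 else f"text for {f}"
            for i, f in enumerate(files)
        ]


print(to_texts(FakeModel(), ["a.wav", "b.wav"]))
```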