Convert processed audio data from AudioTask to DocumentBatch format using the built-in AudioToDocumentStage. This enables you to export audio processing results or integrate with custom text processing workflows.
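AudioToDocumentStage provides a straightforward format conversion from NeMo Curator's audio data structure to its text data structure: it converts AudioTask objects to DocumentBatch format.

Common use cases include exporting audio processing results (for example, to JSONL) and integrating with custom text processing workflows.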
Add AudioToDocumentStage to your audio pipeline to convert audio processing results to document format.
Parameters:
AudioToDocumentStage() takes no configuration parameters; it performs a direct format conversion.

Returns:
DocumentBatch objects, each containing a pandas DataFrame with all of the original audio fields.

The conversion preserves all fields from your audio processing pipeline.
Field names and values are preserved exactly as they appear in the AudioTask. No data transformation or cleaning is performed during conversion.
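To make "direct format conversion" concrete, here is a minimal, dependency-free sketch. The classes `AudioTaskSketch` and `DocumentBatchSketch` and the function `audio_to_document` are hypothetical stand-ins, not NeMo Curator's API. The real DocumentBatch holds a pandas DataFrame, which this sketch replaces with a list of row dictionaries, and the field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for illustration only; these are NOT NeMo Curator's
# real AudioTask / DocumentBatch classes.
@dataclass
class AudioTaskSketch:
    data: dict[str, Any]  # fields of one processed audio sample


@dataclass
class DocumentBatchSketch:
    # The real DocumentBatch wraps a pandas DataFrame; a list of row dicts
    # keeps this sketch free of third-party dependencies.
    rows: list[dict[str, Any]] = field(default_factory=list)

    @property
    def columns(self) -> list[str]:
        # Collect field names in first-seen order, like DataFrame columns.
        cols: list[str] = []
        for row in self.rows:
            for key in row:
                if key not in cols:
                    cols.append(key)
        return cols


def audio_to_document(tasks: list[AudioTaskSketch]) -> DocumentBatchSketch:
    """Direct conversion: no configuration options, no cleaning, no renaming."""
    # Copy each sample's fields so names and values pass through unchanged.
    return DocumentBatchSketch(rows=[dict(task.data) for task in tasks])


tasks = [
    AudioTaskSketch({"audio_filepath": "a.wav", "text": "hello world", "duration": 1.5}),
    AudioTaskSketch({"audio_filepath": "b.wav", "text": "good morning", "duration": 2.0}),
]
batch = audio_to_document(tasks)
print(batch.columns)  # ['audio_filepath', 'text', 'duration']
```

Because each sample's dictionary is copied as-is, every field name and value in the output matches the input, which mirrors the no-transformation behavior described above.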
The most common use case is adding AudioToDocumentStage at the end of your audio pipeline, followed by a JsonlWriter stage, so that the results can be exported.
Output format: JsonlWriter writes JSONL output in which each line is one JSON object holding a single audio sample and all of its fields.
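The one-object-per-line layout can be illustrated with the standard library alone. `write_jsonl` below is a hypothetical helper that mimics the JSONL format; it is not JsonlWriter and makes no claim about JsonlWriter's options, file naming, or partitioning. The sample fields and values are illustrative.

```python
import io
import json
from collections.abc import Iterable
from typing import Any, TextIO


def write_jsonl(rows: Iterable[dict[str, Any]], out: TextIO) -> int:
    """Write one JSON object per line and return the number of lines written.

    A stdlib sketch of the JSONL layout, not NeMo Curator's JsonlWriter.
    """
    count = 0
    for row in rows:
        # ensure_ascii=False keeps non-ASCII transcriptions human-readable.
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


rows = [
    {"audio_filepath": "a.wav", "text": "hello world", "duration": 1.5},
    {"audio_filepath": "b.wav", "text": "good morning", "duration": 2.0},
]
buf = io.StringIO()
n = write_jsonl(rows, buf)
print(buf.getvalue(), end="")
# {"audio_filepath": "a.wav", "text": "hello world", "duration": 1.5}
# {"audio_filepath": "b.wav", "text": "good morning", "duration": 2.0}
```

Since each line is a complete JSON object, the file can be read back line by line with `json.loads`, and every field comes back intact.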
While AudioToDocumentStage converts audio data to DocumentBatch format, NeMo Curator’s built-in text processing stages (filters, classifiers, and so on) are designed for text documents, not audio transcriptions. For audio-specific text processing, implement custom stages that operate on the converted DocumentBatch data.
After conversion, your data is in DocumentBatch format, backed by a pandas DataFrame in which each row is one audio sample and each column is one of its fields.
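A custom step for audio-derived transcriptions could look roughly like the following sketch. `TranscriptNormalizer`, its `process` method, and the `text` and `normalized_text` keys are assumptions for illustration. A real stage would implement NeMo Curator's stage interface and operate on the DocumentBatch DataFrame, neither of which is reproduced here. The normalization rule (lowercase, strip punctuation, collapse whitespace) is just one example of transcription-specific processing.

```python
import re
from typing import Any


class TranscriptNormalizer:
    """Hypothetical custom step for audio-derived transcriptions.

    Reads `text_key`, writes a normalized copy to `output_key`, and leaves
    all other fields untouched. This is not a NeMo Curator stage class.
    """

    def __init__(self, text_key: str = "text", output_key: str = "normalized_text") -> None:
        self.text_key = text_key
        self.output_key = output_key

    def process(self, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
        result = []
        for row in rows:
            new_row = dict(row)  # copy so the input rows are not mutated
            text = str(row.get(self.text_key, ""))
            # Lowercase, drop punctuation except apostrophes, collapse whitespace.
            cleaned = re.sub(r"[^\w\s']", "", text.lower())
            new_row[self.output_key] = " ".join(cleaned.split())
            result.append(new_row)
        return result


rows = [{"audio_filepath": "a.wav", "text": "Hello,  World!"}]
out = TranscriptNormalizer().process(rows)
print(out[0]["normalized_text"])  # hello world
```

Writing the result to a new key instead of overwriting the original field keeps the raw transcription available to later stages.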
Text Processing Integration: NeMo Curator's text processing stages expect DocumentBatch inputs built from text documents such as articles and web pages; they are not designed for audio-derived transcriptions. For audio-specific workflows, implement custom processing stages.
Reasons for incompatibility:
Recommendation: Use PreserveByValueStage for audio quality filtering, or create custom stages for transcription-specific processing.
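The idea behind value-based quality filtering, keeping only the samples whose metric satisfies a comparison, can be sketched as follows. `preserve_by_value`, the operator names, and the `wer` field are hypothetical and chosen for illustration; they are not PreserveByValueStage's actual parameters, which are documented in its reference.

```python
import operator
from typing import Any, Callable

# Hypothetical comparison names mapped to Python's operator functions.
# This is NOT PreserveByValueStage's API; it only illustrates the idea.
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def preserve_by_value(
    rows: list[dict[str, Any]], key: str, target: Any, op: str = "le"
) -> list[dict[str, Any]]:
    """Keep rows for which `row[key] <op> target` is true.

    Raises KeyError if `op` is unknown or a row lacks `key`.
    """
    compare = _OPS[op]
    return [row for row in rows if compare(row[key], target)]


samples = [
    {"audio_filepath": "a.wav", "wer": 5.0},
    {"audio_filepath": "b.wav", "wer": 42.0},
    {"audio_filepath": "c.wav", "wer": 12.5},
]
kept = preserve_by_value(samples, key="wer", target=20.0, op="le")
print([s["audio_filepath"] for s in kept])  # ['a.wav', 'c.wav']
```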