nemo_curator.stages.text.experimental.translation.stages.skipped_rows


Stages for skipping and later restoring already-translated rows.

Module Contents

Classes

Name | Description
RestoreSkippedRowsStage | Re-merge previously skipped rows back into the translated batch.
SkipExistingTranslationsStage | Split a batch into already-translated and needs-translation rows.

Data

_SKIPPED_ROWS_METADATA_KEY

__all__

API

class nemo_curator.stages.text.experimental.translation.stages.skipped_rows.RestoreSkippedRowsStage(
name: str = 'RestoreSkippedRowsStage'
)
Dataclass

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Re-merge previously skipped rows back into the translated batch.

_COLUMN_DEFAULTS: dict[str, object]
name: str = 'RestoreSkippedRowsStage'
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.RestoreSkippedRowsStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.RestoreSkippedRowsStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.RestoreSkippedRowsStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch

Merge stashed rows back and restore original order.
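
The merge-and-reorder behavior described above can be pictured with a small plain-Python sketch. This is not the stage's actual implementation: the helper `restore_rows`, the list-of-dicts row format, and the decision to drop the order column after sorting are all assumptions made for illustration. Only the default column name `_skip_original_idx` comes from this page.

```python
def restore_rows(translated, stashed, original_order_col="_skip_original_idx"):
    """Hypothetical helper: merge stashed rows back and restore the original order."""
    # Combine both groups, then sort by the position recorded before the split.
    merged = sorted(translated + stashed, key=lambda row: row[original_order_col])
    # Assumption: the helper column is dropped once the order is restored.
    return [
        {key: value for key, value in row.items() if key != original_order_col}
        for row in merged
    ]


# Rows that went through translation (originally at position 1).
translated = [
    {"text": "bonjour", "translated_text": "hello", "_skip_original_idx": 1},
]
# Rows that were skipped because they already had a translation.
stashed = [
    {"text": "hola", "translated_text": "hi", "_skip_original_idx": 0},
    {"text": "ciao", "translated_text": "bye", "_skip_original_idx": 2},
]

restored = restore_rows(translated, stashed)
print([row["text"] for row in restored])  # ['hola', 'bonjour', 'ciao']
```

Sorting on a recorded index, rather than relying on concatenation order, keeps the result independent of how many rows were skipped or where they sat in the batch.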

class nemo_curator.stages.text.experimental.translation.stages.skipped_rows.SkipExistingTranslationsStage(
name: str = 'SkipExistingTranslationsSt...,
translation_column: str = 'translated_text',
original_order_col: str = '_skip_original_idx'
)
Dataclass

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Split a batch into already-translated and needs-translation rows.

name: str = 'SkipExistingTranslationsStage'
original_order_col: str = '_skip_original_idx'
translation_column: str = 'translated_text'
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.SkipExistingTranslationsStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.SkipExistingTranslationsStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.SkipExistingTranslationsStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch

Remove already-translated rows and stash them for later merge.
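
As a rough illustration of the split described above, the following plain-Python sketch tags each row with its original position and separates rows that already carry a translation. It is not the stage's actual code: the helper `split_translated`, the list-of-dicts row format, and the rule that a non-empty string counts as "already translated" are assumptions. The default column names match the constructor defaults listed on this page. The module also exposes `_SKIPPED_ROWS_METADATA_KEY`, which suggests the real stage carries the stash in batch metadata; the sketch simply returns it.

```python
def split_translated(
    rows,
    translation_column="translated_text",
    original_order_col="_skip_original_idx",
):
    """Hypothetical helper: split rows into (needs translation, already translated)."""
    pending, stashed = [], []
    for idx, row in enumerate(rows):
        # Record the original position so the order can be restored later.
        tagged = {**row, original_order_col: idx}
        value = row.get(translation_column)
        # Assumption: a non-empty string means the row is already translated.
        if isinstance(value, str) and value.strip():
            stashed.append(tagged)
        else:
            pending.append(tagged)
    return pending, stashed


rows = [
    {"text": "hola", "translated_text": "hi"},
    {"text": "bonjour", "translated_text": None},
    {"text": "ciao", "translated_text": ""},
]

pending, stashed = split_translated(rows)
print([row["_skip_original_idx"] for row in pending])  # [1, 2]
print([row["_skip_original_idx"] for row in stashed])  # [0]
```

Only the `pending` rows would need to reach the translation step, which is where the saving comes from when a dataset is partially translated.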

nemo_curator.stages.text.experimental.translation.stages.skipped_rows._SKIPPED_ROWS_METADATA_KEY = '_skipped_rows_state'
nemo_curator.stages.text.experimental.translation.stages.skipped_rows.__all__ = ['_SKIPPED_ROWS_METADATA_KEY', 'RestoreSkippedRowsStage', 'SkipExistingTranslati...