nemo_curator.stages.text.experimental.translation.stages.translate

View as Markdown

Translate segmented text with an LLM or external backend.

Module Contents

Classes

NameDescription
SegmentTranslationStageTranslate segments emitted by :class:SegmentationStage.

API

class nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage(
name: str = 'SegmentTranslationStage',
source_lang: str,
target_lang: str,
client: nemo_curator.models.client.llm_client.AsyncLLMClient | None = None,
model_name: str = '',
backend_type: str = 'llm',
backend_config: dict = dict(),
generation_config: nemo_curator.models.client.llm_client.GenerationConfig | None = None,
max_concurrent_requests: int = 64,
health_check: bool = True,
dry_run: bool = False,
dry_run_log_count: int = 5
)
Dataclass

Bases: ProcessingStage[DocumentBatch, DocumentBatch]

Translate segments emitted by :class:SegmentationStage.

Reads _seg_segments and writes _translated.

_backend
object = field(init=False, repr=False, default=None)
_initialized
bool = field(init=False, repr=False, default=False)
_system_prompt
str = field(init=False, repr=False, default='')
_user_template
str = field(init=False, repr=False, default='')
backend_config
dict = field(default_factory=dict)
backend_type
str = 'llm'
client
AsyncLLMClient | None = None
dry_run
bool = False

If True, skip actual translation and return empty strings.

dry_run_log_count
int = 5

Number of example prompts to log when dry_run is enabled.

generation_config
GenerationConfig | None = None
health_check
bool = True

If True, verify the translation backend is reachable during setup().

max_concurrent_requests
int = 64
model_name
str = ''
name
str = 'SegmentTranslationStage'
source_lang
str
target_lang
str
nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage.__post_init__() -> None
nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._backend_failure_exceptions() -> tuple[type[BaseException], ...]

Return exception types handled at the backend boundary.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._build_messages(
segment: str
) -> list[dict]

Build the prompt for one segment.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._call_backend_batch(
segments: list[str]
) -> list[str]

Invoke the configured non-LLM backend for one batch of segments.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._collect_backend_segments(
segments: list[str],
translated: list[str]
) -> tuple[list[int], list[str]]
staticmethod

Collect translatable segments and preserve passthrough segments.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._query_llm_health_check() -> str
async

Run the lightweight LLM health-check request.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._run_health_check() -> None

Verify the translation backend is reachable.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._translate_all_async(
segments: list[str]
) -> tuple[list[str], list[float], list[str]]
async

Translate all segments concurrently.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._translate_backend(
segments: list[str]
) -> tuple[list[str], list[float], list[str]]

Delegate translation to a non-LLM backend.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._translate_backend_one_by_one(
segments: list[str],
translated: list[str],
timings: list[float],
errors: list[str]
) -> None

Fallback path that retries backend translation one segment at a time.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._translate_llm_async(
segments: list[str]
) -> tuple[list[str], list[float], list[str]]

Translate segments with the async LLM client.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._unwrap_translation(
text: str
) -> str
staticmethod

Extract translated text from the expected 〘...〙 wrapper.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._validate_backend_result_count(
result: list[str],
translate_segments: list[str]
) -> None
staticmethod

Raise if the backend returned a different number of translations.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage._write_bulk_backend_results(
result: list[str],
elapsed: float,
translate_indices: list[int],
translated: list[str],
timings: list[float]
) -> None
staticmethod

Write successful bulk backend outputs into result arrays.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage.process(
batch: nemo_curator.tasks.DocumentBatch
) -> nemo_curator.tasks.DocumentBatch

Translate every segment in the batch.

nemo_curator.stages.text.experimental.translation.stages.translate.SegmentTranslationStage.setup(
worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None

Initialize the client or backend on the worker.