nemo_curator.stages.text.experimental.translation.backends.base


Base classes for non-LLM translation backends.

Module Contents

Classes

| Name | Description |
| --- | --- |
| ExecutorTranslationBackend | Common base for backends with a synchronous single-text SDK call. |
| TranslationBackend | Backend ABC for non-LLM translation (Google, AWS, NMT). |

API

class nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend()

Bases: TranslationBackend

Common base for backends with a synchronous single-text SDK call.

AWS Translate and Google Cloud Translate both expose synchronous client methods. This class centralizes the common async bridge, retry wrapper, and lightweight health check so those backends only define setup and the actual single-text translation call.

backend_name
str = 'backend'
health_check_source_lang
str = 'en'
health_check_target_lang
str = 'es'
health_check_text
str = 'Hello'
nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend._health_check_exceptions() -> tuple[type[BaseException], ...]

Return provider exception types treated as health-check failures.

nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend._non_retryable_exceptions() -> tuple[type[BaseException], ...]

Return exception types that should bypass retry/backoff.
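The retry/backoff behavior described here can be pictured with a small standalone sketch. It is not the library's retry wrapper: `call_with_retry`, `max_attempts`, and `base_delay` are hypothetical names chosen for illustration, and the exponential schedule is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    non_retryable: tuple[type[BaseException], ...] = (),
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Call fn, retrying with exponential backoff unless the error is non-retryable.

    Hypothetical helper: the names and the backoff schedule are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except non_retryable:
            # Errors such as bad credentials cannot be fixed by waiting,
            # so they skip the backoff loop and propagate immediately.
            raise
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles after each failed attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

A subclass would presumably return its provider's permanent error types (for example, authentication failures) from `_non_retryable_exceptions()` so that such calls fail fast.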

nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend._translate_single_async(
text: str,
source_lang: str,
target_lang: str
) -> str
async

Translate a single text using an executor-backed sync SDK call.
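As a rough illustration of an executor-backed bridge, the sketch below runs a blocking function on the event loop's default thread pool. `translate_sync` is a hypothetical stand-in for a provider SDK call, and the use of the default executor is an assumption; the library may configure its own.

```python
import asyncio
import functools


def translate_sync(text: str, source_lang: str, target_lang: str) -> str:
    """Hypothetical stand-in for a blocking provider SDK call."""
    return f"[{source_lang}->{target_lang}] {text}"


async def translate_async(text: str, source_lang: str, target_lang: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    call = functools.partial(translate_sync, text, source_lang, target_lang)
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, call)
```

Calling `asyncio.run(translate_async("Hello", "en", "es"))` drives the coroutine from synchronous code.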

nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend._translate_single_sync(
text: str,
source_lang: str,
target_lang: str
) -> str
abstract

Translate one text synchronously.

nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend.check_server() -> bool

Check backend reachability with a tiny translation request.
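A tiny-request health check along these lines might look like the following sketch. The defaults mirror the documented `health_check_*` attributes; `check_reachable` and its parameters are hypothetical, and treating any non-raising call as healthy is an assumption.

```python
from typing import Callable


def check_reachable(
    translate: Callable[[str, str, str], str],
    failure_types: tuple[type[BaseException], ...] = (Exception,),
    text: str = "Hello",
    source_lang: str = "en",
    target_lang: str = "es",
) -> bool:
    """Return True if a small translation request completes without a listed error."""
    try:
        translate(text, source_lang, target_lang)
    except failure_types:
        # Only the listed provider errors count as "unreachable";
        # anything else propagates to the caller.
        return False
    return True
```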

nemo_curator.stages.text.experimental.translation.backends.base.ExecutorTranslationBackend.translate_batch_async(
texts: list[str],
source_lang: str,
target_lang: str
) -> list[str]
async

Translate texts concurrently via the sync single-text SDK call.
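One way to picture concurrent batch translation over a sync single-text call is a semaphore-bounded `asyncio.gather`. This is a minimal sketch, not the library's implementation: `translate_batch_concurrently`, `translate_one`, and `max_concurrent` are hypothetical names.

```python
import asyncio
import functools
from typing import Callable


async def translate_batch_concurrently(
    texts: list[str],
    source_lang: str,
    target_lang: str,
    translate_one: Callable[[str, str, str], str],
    max_concurrent: int = 32,
) -> list[str]:
    """Translate each text in a worker thread, at most max_concurrent at a time."""
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(text: str) -> str:
        async with semaphore:
            call = functools.partial(translate_one, text, source_lang, target_lang)
            return await loop.run_in_executor(None, call)

    # gather returns results in the order the awaitables were passed,
    # regardless of which call finishes first.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))
```

Because `asyncio.gather` preserves argument order, the output lines up with the input even when individual calls finish out of order.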

class nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend(
max_concurrent_requests: int = 32
)
Abstract

Backend ABC for non-LLM translation (Google, AWS, NMT).

This interface operates on in-memory text lists and returns translated text lists. It does not manage file I/O.

_semaphore
Semaphore | None = None
nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend._get_semaphore() -> asyncio.Semaphore

Return the per-backend semaphore, creating it lazily per event loop.
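The idea of creating a semaphore lazily per event loop can be sketched as follows. `LoopBoundLimiter` is a hypothetical class for illustration only; how the library tracks the owning loop is not shown in this reference, so the `_loop` bookkeeping is an assumption.

```python
import asyncio
from typing import Optional


class LoopBoundLimiter:
    """Hypothetical holder that rebuilds its semaphore when the event loop changes."""

    def __init__(self, max_concurrent_requests: int = 32) -> None:
        self.max_concurrent_requests = max_concurrent_requests
        self._semaphore: Optional[asyncio.Semaphore] = None
        self._loop: Optional[asyncio.AbstractEventLoop] = None

    def get_semaphore(self) -> asyncio.Semaphore:
        # Must be called from inside a running event loop.
        loop = asyncio.get_running_loop()
        if self._semaphore is None or self._loop is not loop:
            # First use, or a new loop (e.g. a second asyncio.run call):
            # create a fresh semaphore tied to the current loop.
            self._semaphore = asyncio.Semaphore(self.max_concurrent_requests)
            self._loop = loop
        return self._semaphore
```

Deferring creation this way avoids sharing one semaphore across separate `asyncio.run()` calls, each of which starts a new loop.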

nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend.check_server() -> bool
abstract

Check if the translation server/service is available.

Each backend implements its own health check logic:

  • Google: translate “Hello” as a test request
  • AWS: translate “Hello” as a test request
  • NMT: send a GET request to the /health endpoint

Returns: bool

True if backend is reachable/healthy, False otherwise.

nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend.close() -> None

Cleanup resources (e.g., close HTTP sessions, API clients).

Override in subclasses that hold open connections.
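A caller might pair `setup()` and `close()` with `try`/`finally` so that resources are released even when translation fails. The sketch below uses a hypothetical `EchoBackend` stand-in that only borrows the documented method names; the call order shown is an assumption about typical usage, not a documented requirement.

```python
class EchoBackend:
    """Hypothetical stand-in exposing the documented method names."""

    def __init__(self) -> None:
        self.client = None
        self.closed = False

    def setup(self) -> None:
        self.client = object()  # placeholder for a real SDK client

    def check_server(self) -> bool:
        return self.client is not None

    def translate_batch(self, texts, source_lang, target_lang):
        return [f"{target_lang}:{t}" for t in texts]

    def close(self) -> None:
        self.client = None
        self.closed = True


def run_translation(backend, texts, source_lang, target_lang):
    """Set up, health-check, translate, and always close the backend."""
    backend.setup()
    try:
        if not backend.check_server():
            raise RuntimeError("translation backend is not reachable")
        return backend.translate_batch(texts, source_lang, target_lang)
    finally:
        backend.close()
```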

nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend.setup() -> None
abstract

Initialize client connections.

Subclasses should call super().setup() for any future base-class initialization. The concurrency semaphore is created lazily inside translate_batch_async() so that it always belongs to the correct event loop.

nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend.translate_batch(
texts: list[str],
source_lang: str,
target_lang: str
) -> list[str]

Translate a batch of texts synchronously.

Parameters:

texts
list[str]

Source texts to translate.

source_lang
str

ISO 639-1 source language code.

target_lang
str

ISO 639-1 target language code.

Returns: list[str]

Translated texts in the same order as input.

nemo_curator.stages.text.experimental.translation.backends.base.TranslationBackend.translate_batch_async(
texts: list[str],
source_lang: str,
target_lang: str
) -> list[str]
async abstract

Translate a batch of texts asynchronously.

Parameters:

texts
list[str]

Source texts to translate.

source_lang
str

ISO 639-1 source language code.

target_lang
str

ISO 639-1 target language code.

Returns: list[str]

Translated texts in the same order as input.
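To show the shape of a concrete backend, the sketch below defines a local stand-in ABC with the three abstract methods documented on this page and a toy implementation. `BackendInterface` and `ReverseBackend` are hypothetical and are not library classes; a real backend would subclass `TranslationBackend` instead.

```python
import asyncio
from abc import ABC, abstractmethod


class BackendInterface(ABC):
    """Local stand-in mirroring the documented abstract methods (not the library class)."""

    @abstractmethod
    def setup(self) -> None: ...

    @abstractmethod
    def check_server(self) -> bool: ...

    @abstractmethod
    async def translate_batch_async(
        self, texts: list[str], source_lang: str, target_lang: str
    ) -> list[str]: ...


class ReverseBackend(BackendInterface):
    """Toy backend that "translates" by reversing each string."""

    def setup(self) -> None:
        self.ready = True

    def check_server(self) -> bool:
        return getattr(self, "ready", False)

    async def translate_batch_async(self, texts, source_lang, target_lang):
        # Output order matches input order, as the interface requires.
        return [t[::-1] for t in texts]
```

As with any ABC, instantiating the base directly raises `TypeError` until every abstract method is implemented.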