nemo_curator.stages.text.experimental.translation.backends.base
Base classes for non-LLM translation backends.
Module Contents
Classes
API
Bases: TranslationBackend
Common base for backends with a synchronous single-text SDK call.
AWS Translate and Google Cloud Translate both expose synchronous client methods. This class centralizes the common async bridge, retry wrapper, and lightweight health check so those backends only define setup and the actual single-text translation call.
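The async bridge described above can be pictured as handing a blocking call to a thread-pool executor. This is a minimal sketch under assumptions, not the library's implementation: `SketchSyncBackend`, `translate_one_sync`, and `translate_one_async` are hypothetical names, and the fake translation only tags the text.

```python
import asyncio


class SketchSyncBackend:
    """Hypothetical stand-in for a backend built on a blocking SDK client."""

    def translate_one_sync(self, text: str, source_lang: str, target_lang: str) -> str:
        # Placeholder for a blocking provider call (name and signature are assumptions).
        return f"[{source_lang}->{target_lang}] {text}"

    async def translate_one_async(self, text: str, source_lang: str, target_lang: str) -> str:
        loop = asyncio.get_running_loop()
        # Passing None selects the loop's default thread-pool executor,
        # so the event loop stays free while the blocking call runs.
        return await loop.run_in_executor(
            None, self.translate_one_sync, text, source_lang, target_lang
        )


def demo_bridge() -> str:
    return asyncio.run(SketchSyncBackend().translate_one_async("Hello", "en", "de"))
```

With this shape, a concrete backend would only need to supply its own setup and the blocking single-text call.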
Return provider exception types treated as health-check failures.
Return exception types that should bypass retry/backoff.
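One way to read "bypass retry/backoff" is a retry loop that re-raises certain exception types immediately. The sketch below assumes exponential backoff and uses hypothetical names (`call_with_retry`, `FlakyCall`); the real wrapper's parameters and policy are not shown in this reference.

```python
import time


def call_with_retry(fn, *, non_retryable=(), max_attempts=3, base_delay=0.0):
    """Call fn(), retrying with exponential backoff unless the error is non-retryable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except non_retryable:
            # Errors such as bad credentials or invalid input will not heal on retry.
            raise
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles on each failed attempt: base, 2*base, 4*base, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


class FlakyCall:
    """Hypothetical callable that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int, exc: Exception) -> None:
        self.failures = failures
        self.exc = exc
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise self.exc
        return "ok"
```

A transient error such as `TimeoutError` is retried until the attempt budget runs out, while any type listed in `non_retryable` surfaces on the first failure.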
Translate a single text using an executor-backed sync SDK call.
Translate one text synchronously.
Check backend reachability with a tiny translation request.
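A health check of this kind can be sketched as a tiny translation wrapped in a `try`/`except` that maps the provider's failure types to `False`. The function name `check_health`, the `failure_types` parameter, and the English-to-Spanish language pair are assumptions for illustration.

```python
def check_health(translate_one, failure_types=(Exception,)) -> bool:
    """Return True if a tiny translation request succeeds, False on a listed failure."""
    try:
        # The probe text is deliberately small to keep the check cheap.
        translate_one("Hello", "en", "es")
    except failure_types:
        return False
    return True


def working_backend(text: str, source_lang: str, target_lang: str) -> str:
    # Hypothetical backend call that always succeeds.
    return "Hola"


def unreachable_backend(text: str, source_lang: str, target_lang: str) -> str:
    # Hypothetical backend call that simulates a network failure.
    raise ConnectionError("service unreachable")
```

Exceptions outside `failure_types` still propagate in this sketch, which keeps programming errors visible instead of reporting them as an unhealthy backend.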
Translate texts concurrently via the sync single-text SDK call.
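Concurrent translation over a blocking single-text call might look like the following sketch, which assumes a semaphore-bounded fan-out with `asyncio.gather`. `fake_translate` and `translate_many` are hypothetical, and the concurrency limit of 4 is arbitrary.

```python
import asyncio


def fake_translate(text: str) -> str:
    # Stand-in for a blocking single-text SDK call.
    return text.upper()


async def translate_many(texts, max_concurrent: int = 4):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def translate_one(text: str) -> str:
        # At most max_concurrent calls are in flight at any time.
        async with semaphore:
            return await asyncio.to_thread(fake_translate, text)

    # gather returns results in the order the awaitables were passed,
    # regardless of which call finishes first.
    return await asyncio.gather(*(translate_one(t) for t in texts))


def demo_concurrent():
    return asyncio.run(translate_many(["a", "b", "c"]))
```

Because `asyncio.gather` preserves argument order, the output lines up with the input list even when individual calls complete out of order.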
Backend ABC for non-LLM translation (Google, AWS, NMT).
This interface operates on in-memory text lists and returns translated text lists. It does not manage file I/O.
Return the per-backend semaphore, creating it lazily per event loop.
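Lazy per-loop creation can be sketched by remembering which event loop the semaphore was created for and rebuilding it when the running loop changes. `LazySemaphoreOwner` and its attributes are hypothetical names; the actual storage strategy in the library may differ.

```python
import asyncio


class LazySemaphoreOwner:
    """Hypothetical holder that creates its semaphore on first use per event loop."""

    def __init__(self, max_concurrent: int = 4) -> None:
        self._max_concurrent = max_concurrent
        self._semaphore = None
        self._semaphore_loop = None

    def get_semaphore(self) -> asyncio.Semaphore:
        loop = asyncio.get_running_loop()
        # Rebuild when nothing exists yet or when the running loop has changed.
        if self._semaphore is None or self._semaphore_loop is not loop:
            self._semaphore = asyncio.Semaphore(self._max_concurrent)
            self._semaphore_loop = loop
        return self._semaphore


async def _grab_twice(owner: LazySemaphoreOwner):
    return owner.get_semaphore(), owner.get_semaphore()


def demo_lazy():
    owner = LazySemaphoreOwner()
    first, second = asyncio.run(_grab_twice(owner))
    third, _ = asyncio.run(_grab_twice(owner))  # a fresh event loop
    # Same loop reuses the semaphore; a new loop gets a new one.
    return first is second, first is third
```

Creating the semaphore eagerly in a constructor risks tying it to a loop that is no longer running by the time the backend is used, which is the problem lazy creation avoids.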
Check if the translation server/service is available.
Each backend implements its own health check logic:
- Google: test translate “Hello”
- AWS: test translate “Hello”
- NMT: GET to the /health endpoint
Returns: bool
True if backend is reachable/healthy, False otherwise.
Clean up resources (e.g., close HTTP sessions, API clients).
Override in subclasses that hold open connections.
Initialize client connections.
Subclasses should call super().setup() for any future base-class
initialization. The concurrency semaphore is created lazily inside
translate_batch_async() so that it always belongs to the correct
event loop.
Translate a batch of texts synchronously.
Parameters:
Source texts to translate.
ISO 639-1 source language code.
ISO 639-1 target language code.
Returns: list[str]
Translated texts in the same order as input.
Translate a batch of texts asynchronously.
Parameters:
Source texts to translate.
ISO 639-1 source language code.
ISO 639-1 target language code.
Returns: list[str]
Translated texts in the same order as input.
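The overall lifecycle (set up, translate in-memory lists, clean up) can be illustrated with a toy backend. Everything here is a hypothetical mirror of the interface, not the real class: `SketchBackend`, `ReverseBackend`, the `cleanup` and `translate_batch` names, and the parameter names are assumptions, as is the choice to have the synchronous method drive the asynchronous one.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchBackend(ABC):
    """Hypothetical mirror of the interface described above; not the real class."""

    def setup(self) -> None:
        """Base hook; subclasses call super().setup() before creating clients."""

    def cleanup(self) -> None:
        """Base hook; override when holding open connections."""

    @abstractmethod
    async def translate_batch_async(self, texts, source_lang, target_lang):
        """Return translated texts in the same order as the input."""

    def translate_batch(self, texts, source_lang, target_lang):
        # Assumption: the sync entry point simply drives the async one.
        return asyncio.run(self.translate_batch_async(texts, source_lang, target_lang))


class ReverseBackend(SketchBackend):
    """Toy backend that 'translates' by reversing each string."""

    def setup(self) -> None:
        super().setup()
        self.client = "fake-client"  # stand-in for a real SDK client

    def cleanup(self) -> None:
        self.client = None

    async def translate_batch_async(self, texts, source_lang, target_lang):
        return [text[::-1] for text in texts]


def demo_lifecycle():
    backend = ReverseBackend()
    backend.setup()
    try:
        return backend.translate_batch(["abc", "xyz"], "en", "fr")
    finally:
        backend.cleanup()
```

The `try`/`finally` pairing ensures resources are released even if a translation call raises.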