nemo_curator.stages.text.experimental.translation.backends.nmt
NMT (Neural Machine Translation) backend for NeMo Curator.
Communicates with an NMT server (e.g., IndicTrans2) over HTTP. Unlike the Google and AWS backends that translate one text per API call, the NMT backend sends batches of texts in a single HTTP POST for higher throughput.
Module Contents
Classes
API
Bases: TranslationBackend
NMT server backend with batched translation.
Parameters:
Base URL of the NMT server (e.g., "http://localhost:8000").
Number of texts per HTTP request. Default 32.
HTTP request timeout in seconds. Default 120.
Semaphore size, which caps how many sub-batch requests are in flight concurrently.
Lazily create or return the aiohttp session.
Translate a single sub-batch with semaphore gating and retries.
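The gating-plus-retry pattern described above can be sketched as follows. This is a minimal illustration, not the backend's actual code: the function name `translate_sub_batch`, the `send` callable, and the `max_retries` and `base_delay` parameters are all assumptions introduced here, and the exponential backoff policy is one plausible choice rather than something the source specifies.

```python
import asyncio


async def translate_sub_batch(texts, send, semaphore, max_retries=3, base_delay=0.0):
    """Send one sub-batch while holding a semaphore slot, retrying on failure.

    `send` is a hypothetical coroutine function that takes a list of texts
    and returns a list of translations (for example, one HTTP POST).
    """
    async with semaphore:  # wait for a free concurrency slot
        for attempt in range(1, max_retries + 1):
            try:
                return await send(texts)
            except Exception:
                if attempt == max_retries:
                    raise  # out of attempts: surface the last error
                # Assumed policy: exponential backoff between attempts.
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Note that this sketch keeps the semaphore slot while it backs off, so a failing sub-batch still counts against the concurrency limit until it succeeds or gives up.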
Check if the NMT server is reachable via its /health endpoint.
Falls back to a plain GET to the server root URL if /health is
not available. Uses synchronous requests for simplicity.
Returns: bool
True if the server is reachable, False otherwise.
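A health check with a root-URL fallback might look like the sketch below. It uses the standard library's `urllib` so the example stays self-contained; the real backend may use a different HTTP client. The names `is_reachable` and `_http_get_status`, the 5-second timeout, and the rule that any HTTP response from the root URL counts as "reachable" are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def _http_get_status(url, timeout):
    """Return the HTTP status code for a GET, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # connection refused, DNS failure, timeout, bad URL


def is_reachable(server_url, get_status=_http_get_status, timeout=5.0):
    """Try /health first, then fall back to a plain GET on the root URL."""
    base = server_url.rstrip("/")
    status = get_status(base + "/health", timeout)
    if status is not None and status < 400:
        return True
    # Assumption: any HTTP response from the root means the server is up.
    return get_status(base + "/", timeout) is not None
```

Passing `get_status` as a parameter is a design choice for this sketch only: it lets the fallback logic be exercised without a live server.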
Close the aiohttp session if open.
Handles both cases:
- Inside a running event loop: schedule close via loop.create_task so it is awaited by the running loop.
- Outside an event loop: use asyncio.run to close synchronously.
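The two cases above can be sketched as follows. `FakeSession` is a stand-in for an aiohttp session (it only mimics the `closed` attribute and the awaitable `close()` method), and `close_session` is a hypothetical helper name; neither is taken from the NeMo Curator source.

```python
import asyncio


class FakeSession:
    """Stand-in for an aiohttp session: has `closed` and an async `close()`."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


def close_session(session):
    """Close `session` whether or not an event loop is currently running."""
    if session is None or session.closed:
        return None
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: run the close to completion now.
        asyncio.run(session.close())
        return None
    # Inside a running loop: schedule the close and let that loop await it.
    return loop.create_task(session.close())
```

`asyncio.get_running_loop()` raises `RuntimeError` when no loop is running, which is what lets a plain synchronous function choose between the two paths.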
Validate the server URL and optionally perform a health check.
Raises:
ImportError: If aiohttp is not installed.
Translate a batch of texts asynchronously.
Splits texts into sub-batches and sends them concurrently, gated by the semaphore.
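As a rough illustration of the split-and-gather flow, the sketch below chunks the input, sends every chunk concurrently under a semaphore, and flattens the results back into input order. The names `chunk`, `translate_all`, and `send`, and the `max_concurrent=4` default, are assumptions; only the batch size default of 32 comes from the parameter list above.

```python
import asyncio


def chunk(texts, batch_size):
    """Split `texts` into consecutive sub-batches of at most `batch_size`."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


async def translate_all(texts, send, batch_size=32, max_concurrent=4):
    """Translate `texts` by sending sub-batches concurrently.

    `send` is a hypothetical coroutine function mapping a list of texts
    to a list of translations of the same length.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def gated(sub_batch):
        async with semaphore:  # at most `max_concurrent` requests in flight
            return await send(sub_batch)

    # gather() returns results in the order the awaitables were passed,
    # so translations line up with the original texts.
    results = await asyncio.gather(*(gated(b) for b in chunk(texts, batch_size)))
    return [translation for sub in results for translation in sub]
```

Because `asyncio.gather` preserves argument order, no extra bookkeeping is needed to match translations to inputs even when sub-batches finish out of order.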