nemo_curator.stages.text.experimental.translation.backends.nmt

NMT (Neural Machine Translation) backend for NeMo Curator.

Communicates with an NMT server (e.g., IndicTrans2) over HTTP. Unlike the Google and AWS backends, which translate one text per API call, the NMT backend sends a batch of texts in a single HTTP POST request for higher throughput.
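For a rough sense of the difference, the sketch below counts HTTP round trips for 1,000 texts under each strategy. The helper `requests_needed` is hypothetical and not part of NeMo Curator; it only performs the ceiling division implied by sending `batch_size` texts per POST.

```python
import math


def requests_needed(num_texts: int, batch_size: int) -> int:
    """Hypothetical helper: HTTP requests needed when sending `batch_size` texts per request."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_texts / batch_size)


# One text per API call (Google/AWS style) vs. batched POSTs (NMT style).
per_text_calls = requests_needed(1000, batch_size=1)  # 1000 requests
batched_calls = requests_needed(1000, batch_size=32)  # 32 requests
print(per_text_calls, batched_calls)  # 1000 32
```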

Module Contents

Classes

Name | Description
NMTTranslationBackend | NMT server backend with batched translation.

API

class nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend(
server_url: str,
batch_size: int = 32,
timeout: int = 120,
max_concurrent_requests: int = 32
)

Bases: TranslationBackend

NMT server backend with batched translation.

Parameters:

server_url
str

Base URL of the NMT server (e.g., "http://localhost:8000").

batch_size
int, defaults to 32

Maximum number of texts sent in a single HTTP request.

timeout
int, defaults to 120

HTTP request timeout, in seconds.

max_concurrent_requests
int, defaults to 32

Maximum number of HTTP requests in flight at once (the size of the semaphore that gates concurrent async sub-batch requests).

_server_url = server_url.rstrip('/')
_session_close_task: Task | None = None
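Normalizing with `rstrip('/')` means endpoint paths such as `/health` can be appended without producing a double slash. A minimal sketch of that idea; `normalize_base_url` and `endpoint` are hypothetical helpers, not NeMo Curator functions:

```python
def normalize_base_url(server_url: str) -> str:
    # Mirrors `server_url.rstrip('/')`: removes every trailing slash.
    return server_url.rstrip("/")


def endpoint(base_url: str, path: str) -> str:
    # Hypothetical helper: join a normalized base URL and a path with exactly one slash.
    return f"{base_url}/{path.lstrip('/')}"


base = normalize_base_url("http://localhost:8000/")
print(endpoint(base, "/health"))  # http://localhost:8000/health
```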
nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend._get_session() -> aiohttp.ClientSession
async

Lazily create or return the aiohttp session.
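The lazy-creation pattern can be sketched without a network dependency. This is an illustration under stated assumptions: `FakeSession` stands in for `aiohttp.ClientSession`, and recreating the session after it has been closed is an assumed behavior that the reference above does not confirm. Making the getter `async` keeps session creation inside a coroutine, which is how aiohttp recommends creating `ClientSession` objects.

```python
from __future__ import annotations

import asyncio


class FakeSession:
    """Stand-in for aiohttp.ClientSession; only the `closed` flag is modeled."""

    def __init__(self) -> None:
        self.closed = False


class LazySessionOwner:
    def __init__(self) -> None:
        self._session: FakeSession | None = None
        self.sessions_created = 0

    async def _get_session(self) -> FakeSession:
        # Create on first use; (assumption) also recreate if the old one was closed.
        if self._session is None or self._session.closed:
            self._session = FakeSession()
            self.sessions_created += 1
        return self._session


async def demo() -> LazySessionOwner:
    owner = LazySessionOwner()
    first = await owner._get_session()
    second = await owner._get_session()
    assert first is second  # reused, not recreated
    first.closed = True
    await owner._get_session()  # the closed session is replaced
    return owner


owner = asyncio.run(demo())
print(owner.sessions_created)  # 2
```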

nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend._translate_sub_batch(
texts: list[str],
source_lang: str,
target_lang: str
) -> list[str]
async

Translate a single sub-batch with semaphore gating and retries.
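One way to combine semaphore gating with retries is sketched below. The retry count, the exponential backoff, the exception types treated as transient, and the `send` callable (standing in for the HTTP POST) are all assumptions for illustration; the actual retry policy of `_translate_sub_batch` is not documented here.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def translate_sub_batch(
    texts: list[str],
    send: Callable[[list[str]], Awaitable[list[str]]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> list[str]:
    """Send one sub-batch while holding a semaphore slot, retrying transient errors."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    async with semaphore:  # at most N sub-batches in flight at once
        for attempt in range(max_retries + 1):
            try:
                return await send(texts)
            except (ConnectionError, asyncio.TimeoutError):
                if attempt == max_retries:
                    raise  # retries exhausted: surface the last error
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("unreachable")


calls = 0


async def flaky_send(batch: list[str]) -> list[str]:
    """Fake server: fails twice with ConnectionError, then 'translates' by uppercasing."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("simulated dropped connection")
    return [t.upper() for t in batch]


async def main() -> list[str]:
    # Create the semaphore inside the running event loop.
    return await translate_sub_batch(["hello", "world"], flaky_send, asyncio.Semaphore(2))


result = asyncio.run(main())
print(result, calls)  # ['HELLO', 'WORLD'] 3
```

Holding the semaphore slot during backoff is a design choice: it keeps a struggling server from receiving new sub-batches while retries are pending, at the cost of some idle concurrency.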

nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend.check_server() -> bool

Check if the NMT server is reachable via its /health endpoint.

Falls back to a plain GET to the server root URL if /health is not available. Uses synchronous requests for simplicity.

Returns: bool

True if the server is reachable, False otherwise.
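The fallback order described above can be sketched as follows. The real method uses synchronous requests; this version uses the standard library's `urllib` to stay dependency-free. The injectable `get_status` parameter and the rule that only HTTP 200 counts as reachable are assumptions.

```python
from collections.abc import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """GET `url` and return its status code; raises OSError if unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx/5xx: the server answered, so report its status instead of raising.
        return err.code


def check_server(base_url: str, get_status: Callable[[str], int] = fetch_status) -> bool:
    """Try <base>/health first, then fall back to the root URL."""
    base_url = base_url.rstrip("/")
    for url in (f"{base_url}/health", base_url):
        try:
            if get_status(url) == 200:
                return True
        except OSError:
            # Connection refused, DNS failure, timeout: try the next URL.
            continue
    return False


def no_health_route(url: str) -> int:
    """Fake server without /health: /health -> 404, root -> 200."""
    return 404 if url.endswith("/health") else 200


print(check_server("http://localhost:8000/", no_health_route))  # True
```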

nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend.close() -> None

Close the aiohttp session if open.

Handles both cases:

  • Inside a running event loop: schedule the close coroutine with loop.create_task so that the running loop executes it.
  • Outside an event loop: use asyncio.run to close synchronously.
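The two-branch close logic can be illustrated with a stand-in object. `SessionOwner` and `_aclose` are hypothetical; `_aclose` plays the role of awaiting the aiohttp session's `close()`. Storing the scheduled task (the documented `_session_close_task` attribute suggests the backend does something similar) keeps a reference so the task is not garbage-collected before it runs, as the asyncio documentation recommends.

```python
from __future__ import annotations

import asyncio


class SessionOwner:
    """Hypothetical stand-in; `_aclose` plays the role of `await session.close()`."""

    def __init__(self) -> None:
        self.closed = False
        self._session_close_task: asyncio.Task | None = None

    async def _aclose(self) -> None:
        await asyncio.sleep(0)  # placeholder for real async cleanup
        self.closed = True

    def close(self) -> None:
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:  # no event loop running in this thread
            loop = None
        if loop is not None:
            # Inside a running loop asyncio.run() would fail, so schedule instead,
            # keeping a reference so the task is not garbage-collected early.
            self._session_close_task = loop.create_task(self._aclose())
        else:
            # Outside any loop: run the coroutine to completion right now.
            asyncio.run(self._aclose())


sync_owner = SessionOwner()
sync_owner.close()
print(sync_owner.closed)  # True


async def close_inside_loop() -> bool:
    owner = SessionOwner()
    owner.close()  # only schedules the task
    scheduled_only = not owner.closed  # the task has not run yet
    await owner._session_close_task  # let the running loop finish it
    return scheduled_only and owner.closed


print(asyncio.run(close_inside_loop()))  # True
```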
nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend.setup() -> None

Validate the server URL and optionally perform a health check.

Raises:

  • ImportError: If aiohttp is not installed.
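A possible shape for these checks, as a sketch only: `require_module` and `validate_server_url` are hypothetical helpers, and rejecting non-http(s) URLs with `ValueError` is an assumed rule; the reference above documents only the `ImportError` for a missing aiohttp.

```python
import importlib.util
from urllib.parse import urlparse


def require_module(name: str) -> None:
    """Raise ImportError if `name` (e.g. "aiohttp") cannot be imported."""
    if importlib.util.find_spec(name) is None:
        raise ImportError(f"{name} is required: pip install {name}")


def validate_server_url(server_url: str) -> str:
    """Return the URL without trailing slashes; raise ValueError unless it is http(s)."""
    parsed = urlparse(server_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid NMT server URL: {server_url!r}")
    return server_url.rstrip("/")


print(validate_server_url("http://localhost:8000/"))  # http://localhost:8000
```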
nemo_curator.stages.text.experimental.translation.backends.nmt.NMTTranslationBackend.translate_batch_async(
texts: list[str],
source_lang: str,
target_lang: str
) -> list[str]
async

Translate a batch of texts asynchronously.

Splits texts into sub-batches of at most batch_size texts and sends them concurrently; the semaphore limits the number of requests in flight to max_concurrent_requests.
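The split-then-gather flow can be sketched with a fake translator in place of the HTTP call. `translate_batch` and `translate_chunk` are hypothetical names; in the real backend each sub-batch goes through `_translate_sub_batch`, which also retries. Because `asyncio.gather` returns results in argument order, this sketch returns translations in the same order as the input texts.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def translate_batch(
    texts: list[str],
    translate_chunk: Callable[[list[str]], Awaitable[list[str]]],
    batch_size: int = 32,
    max_concurrent_requests: int = 32,
) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def gated(chunk: list[str]) -> list[str]:
        async with semaphore:  # cap the number of requests in flight
            return await translate_chunk(chunk)

    chunks = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
    # gather() preserves argument order, so flattening restores the input order.
    results = await asyncio.gather(*(gated(c) for c in chunks))
    return [t for chunk_result in results for t in chunk_result]


in_flight = 0
peak = 0


async def fake_translate(chunk: list[str]) -> list[str]:
    """Fake server call that records how many requests overlap."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate network latency
    in_flight -= 1
    return [f"<{t}>" for t in chunk]


texts = [f"t{i}" for i in range(10)]
out = asyncio.run(translate_batch(texts, fake_translate, batch_size=3, max_concurrent_requests=2))
print(out[:2], peak)  # ['<t0>', '<t1>'] 2
```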