Run LLM Translation#

Use this guide when backend must stay llm and you need to point nemotron steps run translate/nemo_curator at an OpenAI-compatible chat-completions endpoint and model.

Prerequisites#

Set NVIDIA_API_KEY in your shell when relying on the default server.api_key_env, for example export NVIDIA_API_KEY="<api-key>".
Confirm server.url matches your deployment. The default.yaml file targets the NVIDIA integrate API.

Procedure#

Start from default.yaml with -c default.
Override model and languages:

uv run nemotron steps run translate/nemo_curator -c default \
  backend=llm \
  input_path=/path/to/chat.jsonl \
  output_dir=/path/to/out \
  source_language=en \
  target_language=de \
  server.model=YOUR_LLM_MODEL_ID

Adjust max_concurrent_requests upward only after verifying the endpoint tolerates parallel completions.

Hosted Model Hygiene#

Hosted catalogs retire models frequently. Pin to identifiers your tenant currently exposes before large batch jobs.