Translation With Nemotron#

The nemotron steps run translate/nemo_curator command translates selected fields in JSONL or Apache Parquet files. You can use a large language model (LLM) with an OpenAI-compatible endpoint, a neural machine translation (NMT) HTTP server, Google Cloud Translation, or Amazon Translate. Optionally, you can also run FAITH evaluation with an LLM after translation to score translation quality.

Tip

New here? Read Tips for Translation With Agents if you plan to drive the work from a coding agent, then start with Getting Started With Translation and use this page as a map to the deeper topics.

When to Use#

Use nemotron steps run translate/nemo_curator when you need:

Localized training or synthetic corpora, produced by translating natural-language fields while preserving structured payloads such as chat turns, tool payloads, and fenced code blocks. Field paths, output_mode, and segmentation all affect what is translated and what is preserved; see Configure Fields and Output, and Segmentation.
Optional FAITH evaluation with configurable thresholds and filtering, without a separate evaluation CLI.
Repeatable configuration by using the checked-in default.yaml plus CLI overrides.
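
The "default.yaml plus CLI overrides" pattern can be sketched as a small Python helper that assembles the command line. This is a minimal sketch under stated assumptions, not part of the tool: `build_command` is a hypothetical helper, the file paths are placeholders, and `backend`, `input_path`, and `output_dir` are assumed here to be top-level keys in default.yaml. Placing global flags directly after the step name is also an assumption; check CLI Reference for Translation for the actual syntax.

```python
import shlex

BASE_COMMAND = ["nemotron", "steps", "run", "translate/nemo_curator"]


def build_command(overrides, global_flags=()):
    """Return argv: base command, then global flags, then key=value overrides.

    Hypothetical helper for illustration; the flag position is an assumption.
    """
    argv = list(BASE_COMMAND)
    argv.extend(global_flags)
    # Sort keys so the same overrides always produce the same command line.
    for key in sorted(overrides):
        argv.append(f"{key}={overrides[key]}")
    return argv


command = build_command(
    {
        "backend": "llm",
        "input_path": "data/chat.jsonl",  # placeholder path
        "output_dir": "out/translated",  # placeholder path
    }
)
print(shlex.join(command))
```

Keeping overrides in a dictionary makes a run easy to record and replay next to the checked-in default.yaml. Passing `global_flags=["--run", "lepton_translate"]` would place the profile flag before the overrides.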

Pipeline Summary#

flowchart LR
    A[Input JSONL or Parquet] --> B[Curator reader]
    B --> C[TranslationStage]
    C --> D[Curator writer]
    D --> E[Output shards under output_dir]
    C --> F{FAITH enabled?}
    F -->|yes| G[LLM scores segments]
    F -->|no| E
    G --> E
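
The control flow in the diagram can be sketched in plain Python. This is a toy illustration only: `Record`, `run_pipeline`, `fake_translate`, and `fake_score` are hypothetical stand-ins, not the Curator API, and the score value is an arbitrary placeholder rather than a real FAITH result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    text: str
    translation: Optional[str] = None
    faith_score: Optional[float] = None


def run_pipeline(records, translate, score=None):
    """Translate each record, optionally score it, and yield it for writing."""
    for record in records:
        record.translation = translate(record.text)
        if score is not None:  # corresponds to the "FAITH enabled?" branch
            record.faith_score = score(record.text, record.translation)
        yield record


def fake_translate(text):
    # Stand-in for a real translation backend call.
    return f"[translated] {text}"


def fake_score(source, translation):
    # Stand-in for an LLM scorer; the constant is a placeholder.
    return 1.0


scored = list(run_pipeline([Record("Hello")], fake_translate, fake_score))
unscored = list(run_pipeline([Record("Hello")], fake_translate))
```

Passing `score=None` models a run with FAITH disabled: records go straight from translation to the writer without a score.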

Documentation Series#

Section	What it covers	Tag	Start here
Tutorial	Run nemotron steps run translate/nemo_curator end-to-end using default.yaml and a sample chat JSONL file.	hands-on	Getting Started With Translation
Use translation with an agent	Copy-paste session prompts for supervised fine-tuning (SFT) data or exploratory FAITH scoring, plus habits for a short chat.	newcomer	Tips for Translation With Agents
How-to guides	Backends, fields and outputs, segmentation, FAITH tuning.	task-based	How-To Guides
Concepts	Pipeline architecture, segmentation, FAITH behavior.	learn	Concepts
Reference	YAML parameters and nemotron steps run translate/nemo_curator CLI.	lookup	Reference for Translation

All Documentation#

Tutorial

Guide	What you do
Tips for Translation With Agents	Paste starter prompts for an agent and keep a translation session on the rails
Getting Started With Translation	Run translation and FAITH using `default.yaml` and sample JSONL

How-to guides

Guide	Focus
Run LLM Translation	`backend: llm`
Run NMT Translation	`backend: nmt`
Run Google or AWS Translation	`backend: google` or `backend: aws`
Configure Fields and Output	Field paths and `output_mode`
Use Fine Segmentation	`segmentation_mode`
Run FAITH Evaluation	`faith_eval` block

Concepts

Guide	Topic
Pipeline Overview	End-to-end flow
Segmentation	Coarse versus fine
FAITH Evaluation Inside Translation	FAITH semantics

Reference

Guide	Content
Translation YAML Reference	`default.yaml` field reference
CLI Reference for Translation	`nemotron steps run translate/nemo_curator` syntax
Input and Output Format	Input and output shapes

Limitations and Considerations#

Cost and rate limits: Hosted LLM and cloud translation backends incur usage charges; throttle requests with max_concurrent_requests and follow your provider’s rate-limit guidance.
Remote execution: Use --run <profile> or --batch <profile> with an environment profile such as lepton_translate.
Overrides: Use key=value dotlist syntax after global flags, not passthrough script arguments.
Mixed folders: Do not point input_path at a directory that contains both .jsonl and .parquet shards; split the formats into separate directories first.
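
The mixed-folder rule can be checked before launching a run. The sketch below is a hypothetical pre-flight helper, not part of the tool; it assumes shards sit directly in the input directory (it does not recurse) and that formats are identified by the `.jsonl` and `.parquet` file suffixes.

```python
from pathlib import Path

SHARD_SUFFIXES = {".jsonl", ".parquet"}


def shard_formats(directory):
    """Return the shard suffixes found among files directly inside directory."""
    found = {
        path.suffix.lower()
        for path in Path(directory).iterdir()
        if path.is_file()
    }
    return found & SHARD_SUFFIXES


def check_single_format(directory):
    """Raise ValueError if directory mixes .jsonl and .parquet shards."""
    formats = shard_formats(directory)
    if len(formats) > 1:
        raise ValueError(
            f"{directory} mixes shard formats {sorted(formats)}; "
            "split them into separate directories first"
        )
    return formats
```

Calling `check_single_format` on the directory you intend to pass as input_path either returns the single format found (or an empty set) or fails early with a clear message.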

Quick Paths#

Agent-first prompts: Tips for Translation With Agents
First run: Getting Started With Translation
Swap backend: How-To Guides
Lookup flags: CLI Reference for Translation