Translation With Nemotron#

The nemotron steps run translate/nemo_curator command translates selected fields in JSONL or Apache Parquet files. You can use a large language model (LLM) with an OpenAI-compatible endpoint, a neural machine translation (NMT) HTTP server, Google Cloud Translation, or Amazon Translate. Optionally, you can run FAITH evaluation with an LLM after translation to score translation quality.

Tip

New here? Read Tips for Translation With Agents if you plan to drive the work from a coding agent, then start with Getting Started With Translation and use this page as a map to the deeper topics.

When to Use#

Use nemotron steps run translate/nemo_curator when you need:

  • Localized training or synthetic corpora, produced by translating natural-language fields while preserving structured payloads such as chat turns, tool payloads, and fenced code blocks. Field paths, output_mode, and segmentation all affect that behavior; see Configure Fields and Output and Segmentation.

  • Optional FAITH evaluation with configurable thresholds and filtering, without a separate evaluation CLI.

  • Repeatable configuration by using the checked-in default.yaml plus CLI overrides.
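
The idea of translating natural-language text while leaving fenced code blocks untouched can be illustrated with a minimal sketch. This is not the pipeline's implementation: the function names, the regular expression, and the stand-in translator are all hypothetical and only show the general technique of splitting text on code fences and translating the prose segments.

```python
import re
from typing import Callable

FENCE = "`" * 3  # a Markdown code fence marker

# One capturing group, so re.split keeps each fenced block in the result.
FENCE_RE = re.compile("(" + FENCE + ".*?" + FENCE + ")", re.DOTALL)


def translate_preserving_fences(text: str, translate: Callable[[str], str]) -> str:
    """Translate prose segments and pass fenced code blocks through unchanged."""
    parts = FENCE_RE.split(text)
    out = []
    for index, part in enumerate(parts):
        # With a single capturing group, matched fences sit at odd indices.
        if index % 2 == 1 or not part.strip():
            out.append(part)
        else:
            out.append(translate(part))
    return "".join(out)


def fake_translate(segment: str) -> str:
    # Hypothetical placeholder for a real backend call; it only upper-cases.
    return segment.upper()


sample = (
    "Print a greeting:\n"
    + FENCE + "python\nprint('hello')\n" + FENCE
    + "\nThen exit."
)
result = translate_preserving_fences(sample, fake_translate)
print(result)
```

In this sketch the prose on either side of the fence is passed to the translator, while the fenced block is copied through byte for byte. How the real step detects and protects code depends on its segmentation settings; see Segmentation.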

Pipeline Summary#

flowchart LR
    A[Input JSONL or Parquet] --> B[Curator reader]
    B --> C[TranslationStage]
    C --> D[Curator writer]
    D --> E[Output shards under output_dir]
    C --> F{FAITH enabled?}
    F -->|yes| G[LLM scores segments]
    F -->|no| E
    G --> E
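
The branch in the flowchart can be sketched in a few lines of Python. This is a simplified illustration, not the TranslationStage API: run_flow, the stub translator, the stub scorer, the "score" key, and the rule that records below min_score are dropped are all assumptions made for the example. It also scores one field per record, whereas the real step scores segments.

```python
from typing import Callable, Iterable, Optional


def run_flow(
    records: Iterable[dict],
    translate: Callable[[str], str],
    field: str,
    score: Optional[Callable[[str, str], float]] = None,
    min_score: float = 0.0,
) -> list:
    """Translate one field per record; if a scorer is given, score and filter."""
    output = []
    for record in records:
        source = record[field]
        translated = dict(record)  # copy so the input record is not mutated
        translated[field] = translate(source)
        if score is not None:  # corresponds to the "FAITH enabled?" branch
            value = score(source, translated[field])
            translated["score"] = value  # hypothetical output key
            if value < min_score:
                continue  # assumed filtering rule: drop low-scoring records
        output.append(translated)
    return output


def stub_translate(text: str) -> str:
    # Hypothetical stand-in for a translation backend.
    return text.upper()


def stub_score(source: str, target: str) -> float:
    # Hypothetical stand-in for an LLM-based quality score.
    return 1.0 if target else 0.0


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": ""}]
plain = run_flow(rows, stub_translate, "text")
scored = run_flow(rows, stub_translate, "text", score=stub_score, min_score=0.5)
print(len(plain), len(scored))
```

Without a scorer, every translated record reaches the output. With a scorer, each record carries its score and the threshold decides which records are kept.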
    

Documentation Series#

  • Tutorial: Run nemotron steps run translate/nemo_curator end-to-end using default.yaml and a sample chat JSONL file. See Getting Started With Translation.

  • Use translation with an agent: Copy-paste session prompts for supervised fine-tuning (SFT) data or exploratory FAITH scoring, plus habits for a short chat. See Tips for Translation With Agents.

  • How-to guides: Backends, fields and outputs, segmentation, and FAITH tuning. See How-To Guides.

  • Concepts: Pipeline architecture, segmentation, and FAITH behavior. See Concepts.

  • Reference: YAML parameters and the nemotron steps run translate/nemo_curator CLI. See Reference for Translation.

All Documentation#

| Guide | What you do |
| --- | --- |
| Tips for Translation With Agents | Paste starter prompts for an agent and keep a translation session on the rails |
| Getting Started With Translation | Run translation and FAITH using default.yaml and sample JSONL |

| Guide | Focus |
| --- | --- |
| Run LLM Translation | backend: llm |
| Run NMT Translation | backend: nmt |
| Run Google or AWS Translation | backend: google / aws |
| Configure Fields and Output | Field paths and output_mode |
| Use Fine Segmentation | segmentation_mode |
| Run FAITH Evaluation | faith_eval block |

| Guide | Topic |
| --- | --- |
| Pipeline Overview | End-to-end flow |
| Segmentation | Coarse versus fine |
| FAITH Evaluation Inside Translation | FAITH semantics |

| Guide | Content |
| --- | --- |
| Translation YAML Reference | default.yaml field reference |
| CLI Reference for Translation | nemotron steps run translate/nemo_curator syntax |
| Input and Output Format | Input and output shapes |

Limitations and Considerations#

  • Cost and rate limits: Hosted and cloud LLM backends incur usage charges; throttle requests with max_concurrent_requests and follow your provider’s guidance.

  • Remote execution: Use --run <profile> or --batch <profile> with an environment profile such as lepton_translate.

  • Overrides: Use key=value dotlist syntax after global flags, not passthrough script arguments.

  • Mixed folders: Do not point input_path at a single directory that contains both .jsonl and .parquet shards; split the formats into separate directories first.
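
The mixed-folder note above can be checked before a run with a small pre-flight helper. This is a sketch under stated assumptions: shard_formats and is_mixed are hypothetical names, and the check inspects only the top level of the directory, which may not match how the step discovers shards.

```python
import tempfile
from pathlib import Path

SHARD_SUFFIXES = {".jsonl", ".parquet"}


def shard_formats(directory: Path) -> set:
    """Return the shard suffixes found directly inside `directory`."""
    return {
        path.suffix.lower()
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in SHARD_SUFFIXES
    }


def is_mixed(directory: Path) -> bool:
    """True when the directory holds both .jsonl and .parquet shards."""
    return len(shard_formats(directory)) > 1


# Demonstration on a throwaway directory with placeholder shard files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "part-0.jsonl").write_text("{}\n")
    single_format = is_mixed(root)   # only .jsonl present so far
    (root / "part-1.parquet").write_bytes(b"")
    mixed_format = is_mixed(root)    # both formats present now

print(single_format, mixed_format)
```

If is_mixed returns True for the directory you plan to use as input_path, move one format into its own directory before starting the run.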

Quick Paths#

  1. Agent-first prompts: Tips for Translation With Agents

  2. First run: Getting Started With Translation

  3. Swap backend: How-To Guides

  4. Lookup flags: CLI Reference for Translation