Translation With Nemotron#
The `nemotron steps run translate/nemo_curator` command translates selected fields in JSONL or Apache Parquet files.
You can use a large language model (LLM) with an OpenAI-compatible endpoint, a neural machine translation (NMT) HTTP server, Google Cloud Translation, or Amazon Translate.
You can also run FAITH evaluation with an LLM after translation to score translation quality.
Tip
New here? If you plan to drive the work from a coding agent, read Tips for Translation With Agents first. Then start with Getting Started With Translation and use this page as the map to deeper topics.
When to Use#
Use `nemotron steps run translate/nemo_curator` when you need:
- Localized training or synthetic corpora from translating natural-language fields while preserving structured payloads such as chat turns, tool payloads, and fenced code blocks. Field paths, `output_mode`, and segmentation interact with that behavior; see Configure Fields and Output and Segmentation.
- Optional FAITH evaluation with configurable thresholds and filtering, without a separate evaluation CLI.
- Repeatable configuration by using the checked-in `default.yaml` plus CLI overrides.
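To make the idea of defaults plus overrides concrete, here is a minimal pure-Python sketch of how `key=value` dotlist items can be layered over a nested configuration. It does not show how the CLI parses overrides internally, which this page does not state. The `faith.enabled` and `faith.threshold` keys and all default values are hypothetical placeholders, not the real schema of `default.yaml`.

```python
import copy
import json


def apply_dotlist(config: dict, overrides: list) -> dict:
    """Return a copy of config with key=value dotlist overrides applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            # Parses numbers, true/false, null, and quoted strings.
            value = json.loads(raw)
        except json.JSONDecodeError:
            # Bare words stay plain strings.
            value = raw
        node = merged
        *parents, leaf = key.split(".")
        for part in parents:
            # Walk (or create) the nested mapping for each dotted segment.
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged


# Hypothetical defaults standing in for values loaded from default.yaml.
defaults = {
    "max_concurrent_requests": 4,
    "faith": {"enabled": False, "threshold": 0.5},
}
result = apply_dotlist(
    defaults, ["max_concurrent_requests=8", "faith.enabled=true"]
)
print(result)
```

Because the defaults are copied before merging, the checked-in values stay untouched and each run differs only by the overrides passed on the command line.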
Pipeline Summary#
flowchart LR
A[Input JSONL or Parquet] --> B[Curator reader]
B --> C[TranslationStage]
C --> D[Curator writer]
D --> E[Output shards under output_dir]
C --> F{FAITH enabled?}
F -->|yes| G[LLM scores segments]
F -->|no| E
G --> E
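The control flow in the diagram can be approximated in a few lines of Python. This is an illustrative sketch only, not the Curator implementation: `run_pipeline`, the `_translated` field suffix, the `faith_score` key, and the rule that drops records scoring below a threshold are all assumptions made for the example, and the translator and scorer are stand-in callables rather than real backends.

```python
from typing import Callable, Iterable, Iterator, Optional


def run_pipeline(
    records: Iterable[dict],
    field: str,
    translate: Callable[[str], str],
    score: Optional[Callable[[str, str], float]] = None,
    threshold: float = 0.0,
) -> Iterator[dict]:
    """Translate one field per record and optionally score and filter it."""
    for record in records:
        source = record[field]
        out = dict(record)  # keep every other field of the record as-is
        out[f"{field}_translated"] = translate(source)
        if score is not None:  # the "FAITH enabled?" branch
            out["faith_score"] = score(source, out[f"{field}_translated"])
            if out["faith_score"] < threshold:
                continue  # assumed filtering rule: drop low-scoring records
        yield out


# Stand-in callables; a real run would call a translation backend and an LLM.
def fake_translate(text: str) -> str:
    return text.upper()


def fake_score(source: str, target: str) -> float:
    return 1.0 if target else 0.0


records = [{"id": 1, "text": "hello"}, {"id": 2, "text": ""}]
unscored = list(run_pipeline(records, "text", fake_translate))
kept = list(run_pipeline(records, "text", fake_translate, fake_score, 0.5))
print(len(unscored), len(kept))
```

With scoring disabled every record reaches the output, which mirrors the "no" branch of the diagram.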
Documentation Series#
- Run `nemotron steps run translate/nemo_curator` end-to-end using `default.yaml` and a sample chat JSONL file.
- Copy-paste session prompts for supervised fine-tuning (SFT) data or exploratory FAITH scoring, plus habits for a short chat.
- Backends, fields and outputs, segmentation, FAITH tuning.
- Pipeline architecture, segmentation, FAITH behavior.
- YAML parameters and `nemotron steps run translate/nemo_curator` CLI.
All Documentation#
| Guide | What you do |
|---|---|
| Tips for Translation With Agents | Paste starter prompts for an agent and keep a translation session on the rails |
| Getting Started With Translation | Run translation and FAITH using |
| Guide | Focus |
|---|---|
| | Field paths and |
| Guide | Topic |
|---|---|
| | End-to-end flow |
| | Coarse versus fine |
| | FAITH semantics |
| Guide | Content |
|---|---|
| | Input and output shapes |
Limitations and Considerations#
- Cost and rate limits: Hosted and cloud LLM backends incur usage charges; throttle with `max_concurrent_requests` and follow your provider's guidance.
- Remote execution: Use `--run <profile>` or `--batch <profile>` with an environment profile such as `lepton_translate`.
- Overrides: Use `key=value` dotlist syntax after global flags, not passthrough script arguments.
- Mixed folders: Do not point `input_path` at one directory that contains both `.jsonl` and `.parquet` shards unless you split formats first.
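One way to handle the mixed-folder caveat before a run is to check the input directory and split shards by format. The sketch below is a hypothetical helper, not part of the tool: the function names and the `jsonl/` and `parquet/` subdirectory layout are assumptions, and it only inspects files directly under the directory, not nested folders.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

SHARD_SUFFIXES = {".jsonl", ".parquet"}


def group_shards_by_format(input_dir):
    """Map each shard suffix found directly under input_dir to its files."""
    groups = defaultdict(list)
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix in SHARD_SUFFIXES:
            groups[path.suffix].append(path)
    return dict(groups)


def split_by_format(input_dir):
    """Move shards into per-format subdirectories when formats are mixed."""
    groups = group_shards_by_format(input_dir)
    if len(groups) < 2:
        return groups  # zero or one format: nothing to split
    moved = {}
    for suffix, paths in groups.items():
        target = Path(input_dir) / suffix.lstrip(".")
        target.mkdir(exist_ok=True)
        # Path.rename returns the new path on Python 3.8 and later.
        moved[suffix] = [p.rename(target / p.name) for p in paths]
    return moved


# Demo on a throwaway directory with two JSONL shards and one Parquet shard.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jsonl", "b.jsonl", "c.parquet"):
        (Path(tmp) / name).touch()
    before = sorted(group_shards_by_format(tmp))
    split_by_format(tmp)
    after = sorted(group_shards_by_format(tmp))
    jsonl_names = sorted(p.name for p in (Path(tmp) / "jsonl").iterdir())

print(before, after, jsonl_names)
```

After the split, each run can point `input_path` at a single-format subdirectory. The helper moves files in place, so try it on a copy of the data first.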
Quick Paths#
Agent-first prompts: Tips for Translation With Agents
First run: Getting Started With Translation
Swap backend: How-To Guides
Lookup flags: CLI Reference for Translation