CLI Reference

Command-line reference for nemotron steps run sdg/data_designer. For a pipeline overview, see About Synthetic Data Generation.

Syntax

$ nemotron steps run sdg/data_designer \
    [-c CONFIG] \
    [--run PROFILE | --batch PROFILE] \
    [--dry-run] \
    [KEY=VALUE ...]

Flags

-c, --config CONFIG

Either a config name, resolved from the step’s config/ directory, or an absolute or relative path to a YAML file.

Bundled names: default, customer_support_tools, rl_pref, tiny.

Default: default
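A wrapper script can catch a mistyped -c value before any job starts. The sketch below accepts either one of the bundled names listed above or a path to a readable file. The helper name is_valid_config is hypothetical, and the name list is copied from this page, so the step’s config/ directory remains the authoritative source for your installed version.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the -c value. The bundled names are
# copied from the CLI reference and may drift from the installed version.
BUNDLED_CONFIGS=(default customer_support_tools rl_pref tiny)

is_valid_config() {
  local value="$1" name
  # Accept any bundled config name verbatim.
  for name in "${BUNDLED_CONFIGS[@]}"; do
    if [[ "$value" == "$name" ]]; then
      return 0
    fi
  done
  # Otherwise require an existing, readable file (absolute or relative path).
  [[ -f "$value" && -r "$value" ]]
}
```

For example, is_valid_config tiny succeeds, while is_valid_config tinyy fails unless a file with that name exists in the current directory.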

-r, --run PROFILE

Run attached using the env.toml profile named PROFILE. Job output streams to the terminal. Use for short interactive runs.

-b, --batch PROFILE

Run detached using the env.toml profile named PROFILE. Job is submitted and the command returns immediately. Use for long cluster jobs.

-d, --dry-run

Compile the config and print the resolved job spec without executing it. Useful for verifying Hydra overrides before submission.
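Because --dry-run only prints the resolved job spec, a script can use it as a gate in front of a long --batch submission. The sketch below, built around a hypothetical function submit_checked, runs the dry run first and submits the same command only if the dry run exits successfully. It assumes the CLI returns a non-zero exit status when the config fails to compile, which this page does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dry-run a detached (--batch) submission, then submit
# it for real only if the dry run succeeded.
# Usage: submit_checked CONFIG PROFILE [KEY=VALUE ...]
submit_checked() {
  local config="$1" profile="$2"
  shift 2
  local -a cmd=(nemotron steps run sdg/data_designer -c "$config" --batch "$profile")

  # Compile the config and print the resolved job spec without executing it.
  if ! "${cmd[@]}" --dry-run "$@"; then
    echo "dry run failed; not submitting" >&2
    return 1
  fi

  # Same command and overrides, minus --dry-run: submit and return immediately.
  "${cmd[@]}" "$@"
}
```

For example, submit_checked default my-profile num_records=500 dry-runs and then submits a 500-record batch job on the my-profile profile.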

Hydra Overrides

Any KEY=VALUE argument after the flags is passed as a Hydra dotlist override and merged into the resolved config. Overrides take precedence over values in the YAML file.

Override                                     Example                                        Effect
-------------------------------------------  ---------------------------------------------  --------------------------------------
num_records=N                                num_records=50                                 Generate N records
preview=true                                 preview=true                                   Run in preview mode
output_path=PATH                             output_path=/data/out.jsonl                    Write output to PATH
seed_dataset.path=PATH                       seed_dataset.path=/data/seeds.jsonl            Override the seed file
models.0.inference_parameters.temperature=T  models.0.inference_parameters.temperature=0.5  Override the first model’s temperature

The dotlist path follows the YAML structure: nested keys are separated by ., and list items are addressed by a zero-based index written as .N (for example, models.0 is the first entry in models).
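To see how a dotlist key lines up with the nested config, the hypothetical helper below splits a KEY=VALUE override on . and prints the path with numeric segments shown as zero-based list indices. It only parses the string; it does not read a config or check that the key exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show how a KEY=VALUE dotlist override maps onto the
# nested YAML structure. Pure string parsing; no config is read.
describe_override() {
  local override="$1"
  local key="${override%%=*}"   # everything before the first '='
  local value="${override#*=}"  # everything after the first '='
  local path="" segment
  local -a segments

  # Split the key on '.' into its path segments.
  IFS='.' read -r -a segments <<< "$key"

  for segment in "${segments[@]}"; do
    if [[ "$segment" =~ ^[0-9]+$ ]]; then
      path+="[${segment}]"           # numeric segment: zero-based list index
    else
      path+="${path:+.}${segment}"   # named segment: mapping key
    fi
  done
  printf '%s = %s\n' "$path" "$value"
}
```

For example, describe_override models.0.inference_parameters.temperature=0.5 prints models[0].inference_parameters.temperature = 0.5, that is, the temperature key under inference_parameters in the first entry of the models list.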

Examples

Preview the default config with two records:

$ nemotron steps run sdg/data_designer -c default preview=true num_records=2

Generate 100 SFT records with a custom output path:

$ nemotron steps run sdg/data_designer -c default \
    num_records=100 \
    output_path=/data/my-project/sft.jsonl

Dry-run a cluster submission to check the resolved config:

$ nemotron steps run sdg/data_designer -c default --run my-profile --dry-run

Run attached on a Lepton profile with 500 records:

$ nemotron steps run sdg/data_designer -c default --run lepton_sdg_data_designer num_records=500

Use a config at an arbitrary path:

$ nemotron steps run sdg/data_designer -c /path/to/my-config.yaml preview=true num_records=2
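Overrides also compose in a loop. The sketch below, built on a hypothetical function sweep_temperatures, runs the default config once per temperature and writes each run to its own file through output_path. The directory layout and file naming are illustrative choices; add --run or --batch with a profile to move the runs off the local machine.

```bash
#!/usr/bin/env bash
# Hypothetical sweep: one local run of the default config per temperature,
# each writing to its own JSONL file.
# Usage: sweep_temperatures OUT_DIR NUM_RECORDS TEMP [TEMP ...]
sweep_temperatures() {
  local out_dir="$1" num_records="$2" temp
  shift 2
  # Create the output directory in case the step does not.
  mkdir -p "$out_dir" || return
  for temp in "$@"; do
    # Stop at the first failed run and propagate its exit status.
    nemotron steps run sdg/data_designer -c default \
      "num_records=${num_records}" \
      "models.0.inference_parameters.temperature=${temp}" \
      "output_path=${out_dir}/sft_temp_${temp}.jsonl" || return
  done
}
```

For example, sweep_temperatures /data/my-project/sweep 100 0.2 0.5 0.8 produces three 100-record files, one per temperature.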