Troubleshooting#

This page lists common symptoms when you run nemotron steps run byob/mcq with the bring your own benchmark (BYOB) multiple choice question (MCQ) family and when you tune BYOB translation settings. Each table row pairs a symptom with the files, fields, or flags you should inspect first. For stage flow and design rationale, see the explanation pages linked from Concepts.

Configuration Rejected Before the Pipeline Runs#

Symptom

What to do

Assertion that tags should not be specified when metadata_file is absent

Remove tags from every target_source_mapping entry, or set top-level metadata_file to the comma-separated values (CSV) file that supplies tag metadata. See Prepare Your Own Domain Data.

Assertion that BYOB translation does not support FAITH evaluation

Under translation_model_config.stage, set enable_faith_eval to false or omit the field. BYOB translation expects backtranslation_quality_metrics instead of FAITH. See Translation Configuration Reference and Translation.

Message that a prompt override key is missing or is not a string, or validation fails on prompt_config

Open your prompt_config YAML and match the structure in Prompt Tuning for Benchmarks. Easiness and hallucination overrides must include the expected blocks.

Schema validation reports that distractor_validity_model_config is missing

The current generation schema requires this block even when distractor expansion is off. Add the block as described in Configure Model Endpoints for BYOB.

Skipped Stages and Missing Parquet Inputs#

Symptom

What to do

A stage fails because an expected Parquet file is missing under output_dir/<expt_name>/stage_cache/

When you use skip_until, every stage before the resume point must have written its output file to disk. Rerun from an earlier stage without skipping, or copy valid caches from a prior run. See Skip Stages When Iterating.

Generation Ends With No Final Benchmark Rows#

Symptom

What to do

Log line No questions left after filtering, or benchmark.parquet is missing after a run that exited early

Open stage_cache/filtered_questions.parquet and inspect is_easy, is_hallucination, and score columns. Loosen easiness_threshold or hallucination_threshold, set remove_easy and remove_hallucinated to false for a diagnostic run, or adjust upstream judgement and deduplication settings. See Easiness and Hallucination Filtering.

Translation Export Drops Every Row#

Symptom

What to do

benchmark.parquet exists but has zero rows after translation

When remove_low_quality is true, only rows with is_quality_metric_passed are exported. Inspect stage_cache/quality_metrics.parquet, relax metric thresholds in backtranslation_quality_metrics, or set remove_low_quality to false while you tune. See Translation.

Few-Shot Sampling Finds No Rows#

Symptom

What to do

Runtime error stating there are no samples for a given source subject and tag combination

Tag filters must match comma-separated tag strings that appear in the metadata CSV for that Hugging Face subject. Widen tags, correct the CSV, or confirm source_subjects and target_source_mapping names align with Prepare Your Own Domain Data and Getting the Right Questions From the Source Benchmark.

Hosted Models Throttle or Stall#

Symptom

What to do

Timeouts, HTTP responses in the 429 range, or bursty failures when calling remote endpoints

Reduce max_parallel_requests and related batch settings in your YAML, rerun on a smaller slice of data, and confirm API keys and quotas. See Question Generation and Configure Model Endpoints for BYOB.