Output Files#
All paths below are relative to the run directory formed from output_dir in your YAML and the expt_name string.
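As a rough sketch of how these relative paths might be resolved in a script: the helper below joins the two values with `pathlib`. The function name `resolve_run_paths` is hypothetical, and the layout `<output_dir>/<expt_name>` is an assumption about how the two values combine, so check it against an actual run before relying on it.

```python
from pathlib import Path


def resolve_run_paths(output_dir: str, expt_name: str) -> dict[str, Path]:
    """Build the locations referred to on this page for one experiment.

    ASSUMPTION: outputs live under <output_dir>/<expt_name>, with
    benchmark.parquet and stage_cache directly inside that directory.
    """
    run_dir = Path(output_dir) / expt_name
    return {
        "run_dir": run_dir,
        "benchmark": run_dir / "benchmark.parquet",
        "stage_cache": run_dir / "stage_cache",
    }


# Hypothetical values; use the ones from your own YAML.
paths = resolve_run_paths("outputs", "demo_expt")
print(paths["benchmark"].as_posix())
```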
Prepare#
| File | Description |
|---|---|
| | Few-shot rows plus domain chunks. |
Generate#
The generate stage creates the following final output files:
| File | Description |
|---|---|
| | Snapshot immediately before column renaming for the final schema. |
| | Final MCQ schema; column meanings and a sample row are in the next section. |
The following intermediate files are created in the stage_cache directory:
| File | Stage |
|---|---|
| | GENERATION |
| | JUDGEMENT |
| | SEMANTIC_DEDUPLICATION |
| | DISTRACTOR_EXPANSION (only when |
| | COVERAGE_CHECK (only when |
| | DISTRACTOR_VALIDITY_CHECK |
| | SEMANTIC_OUTLIER_DETECTION |
| | HALLUCINATION_EASINESS_DETECTION |
Final MCQ Columns#
Generation and translation both export the same eight columns in the final benchmark.parquet file.
| Column | Meaning |
|---|---|
| question_id | Stable identifier for the row, taken from the internal |
| question | Stem text for the multiple-choice item. After translation this is the target-language text. |
| options | Ordered list of choice strings. The list order matches the letter labels implied by |
| answer_index | Zero-based index into |
| answer | Letter label for the correct choice, derived from |
| cot_content | Reserved for chain-of-thought text. The current pipeline sets this column to the literal |
| src | Reserved for a source document marker. The current pipeline sets this column to the literal |
| category | Target key from |
Sample Row#
Values below are illustrative; your identifiers and wording will differ.
{
  "question_id": "mcq-00042",
  "question": "If x^2 = 9, which value can x take?",
  "options": ["-3 only", "-3 or 3", "3 only", "9"],
  "answer_index": 1,
  "answer": "B",
  "cot_content": "-",
  "src": "-",
  "category": "maths"
}
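To make the relationship between the columns concrete, here is a minimal sketch of a consistency check for one exported row. The function `check_row` is hypothetical and not part of the pipeline. It assumes letter labels map as A=0, B=1, and so on, which is consistent with the sample row (index 1, letter "B") but is not spelled out on this page.

```python
FINAL_COLUMNS = (
    "question_id", "question", "options", "answer_index",
    "answer", "cot_content", "src", "category",
)


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in one exported row (empty if none)."""
    missing = [name for name in FINAL_COLUMNS if name not in row]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    options = row["options"]
    index = row["answer_index"]
    if not isinstance(index, int) or not 0 <= index < len(options):
        problems.append(f"answer_index {index!r} is out of range for {len(options)} options")
        return problems

    # ASSUMPTION: letters are assigned in list order, starting at "A" for index 0.
    expected_letter = chr(ord("A") + index)
    if row["answer"] != expected_letter:
        problems.append(f"answer {row['answer']!r} does not match index {index} ({expected_letter})")
    return problems


sample = {
    "question_id": "mcq-00042",
    "question": "If x^2 = 9, which value can x take?",
    "options": ["-3 only", "-3 or 3", "3 only", "9"],
    "answer_index": 1,
    "answer": "B",
    "cot_content": "-",
    "src": "-",
    "category": "maths",
}
print(check_row(sample))
```

An empty list means the row passed every check; each string in a non-empty list describes one mismatch.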
Translate#
The translate stage creates the following final output files:
| File | Description |
|---|---|
| | Intermediate snapshot prior to optional quality filtering. |
| | Final translated MCQ after fields are renamed back to |
The following intermediate files are created in the stage_cache directory:
| File | Stage |
|---|---|
| | TRANSLATION |
| | BACKTRANSLATION |
| | QUALITY_METRICS |
Intermediate translation Parquet files can include additional columns such as question_translated, options_translated, backtranslation fields, and metric scores.
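When inspecting an intermediate file, it can help to separate the final-schema columns from the extras. The sketch below does this on a plain list of column names. The helper `split_columns` is hypothetical, and the example column list is illustrative: it assumes the intermediate file carries the final column names alongside the extras, which may not hold for every stage.

```python
FINAL_COLUMNS = [
    "question_id", "question", "options", "answer_index",
    "answer", "cot_content", "src", "category",
]


def split_columns(columns: list[str]) -> tuple[list[str], list[str]]:
    """Partition column names into final-schema columns and everything else."""
    final = [name for name in FINAL_COLUMNS if name in columns]
    extra = [name for name in columns if name not in FINAL_COLUMNS]
    return final, extra


# Illustrative column list; real intermediate files may differ.
columns = [
    "question_id", "question", "question_translated",
    "options", "options_translated", "answer_index",
    "answer", "cot_content", "src", "category",
]
final, extra = split_columns(columns)
print(extra)  # ['question_translated', 'options_translated']
```

If you load the file with a dataframe library, passing its list of column names to the same helper should work unchanged.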