Quality Validation#
This page summarizes the optional and mandatory checks that run between generation and the final Parquet export.
Judgement#
judge_questions scores each candidate against the judge_model_config prompt template, producing judged_questions.parquet.
Semantic deduplication#
When semantic_deduplication_config.enabled is true, TextSemanticDeduplicationMCQ runs inside runtime/benchmark_families/mcq/deduplication.py.
If the flag is false, the stage copies the input and marks is_duplicate as false for every row.
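The disabled path can be pictured as a plain passthrough. The following is a minimal sketch, not the pipeline's actual code: it assumes rows are held as a list of dictionaries, and the function names `passthrough_deduplication` and `run_deduplication_stage` are hypothetical. Only the `enabled` flag and the `is_duplicate` column come from the description above.

```python
from copy import deepcopy


def passthrough_deduplication(rows):
    """Copy the input rows and mark every row as not a duplicate."""
    out = deepcopy(rows)  # leave the caller's rows untouched
    for row in out:
        row["is_duplicate"] = False
    return out


def run_deduplication_stage(rows, config):
    """Hypothetical dispatcher: only the disabled branch is sketched here."""
    if config.get("enabled", False):
        # The real stage would invoke TextSemanticDeduplicationMCQ here.
        raise NotImplementedError("semantic deduplication is not sketched")
    return passthrough_deduplication(rows)


rows = [
    {"question": "What is 2 + 2?"},
    {"question": "What is two plus two?"},
]
result = run_deduplication_stage(rows, {"enabled": False})
```

Because the passthrough still adds the `is_duplicate` column, later stages can read it without checking whether deduplication actually ran.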
Distractor expansion and validity#
If do_distractor_expansion is true, the pipeline expands four-choice rows toward ten choices, then runs check_distractor_validity with distractor_validity_model_config.
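To illustrate what expanding "toward ten choices" could mean, here is a minimal sketch under stated assumptions: the row layout (`choices`, `answer`), the `expand_distractors` helper, and the case-insensitive duplicate filter are all hypothetical, and the model-based `check_distractor_validity` step is not reproduced. A row ends up with fewer than ten choices when too few usable candidates are supplied.

```python
TARGET_CHOICES = 10  # from the text: four-choice rows grow toward ten


def expand_distractors(row, candidates, target=TARGET_CHOICES):
    """Return a copy of row with extra distractors appended, up to target."""
    choices = list(row["choices"])
    seen = {c.strip().lower() for c in choices}
    for cand in candidates:
        if len(choices) >= target:
            break
        key = cand.strip().lower()
        # Skip empty strings and case-insensitive repeats of existing choices.
        if key and key not in seen:
            choices.append(cand)
            seen.add(key)
    return {**row, "choices": choices}


row = {
    "question": "Which planet is closest to the Sun?",
    "choices": ["Mercury", "Venus", "Earth", "Mars"],
    "answer": "Mercury",
}
candidates = ["Jupiter", "venus", "Saturn", "", "Uranus",
              "Neptune", "Pluto", "Ceres", "Eris"]
expanded = expand_distractors(row, candidates)
short = expand_distractors(row, ["Jupiter"])
```

The original four choices keep their positions, so an answer stored by index or by text stays valid after expansion.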
Coverage#
When do_coverage_check is true, TextCoverageMCQ analyzes whether generated text still reflects the source chunk windows.
Semantic outliers#
semantic_outlier_detection_config.enabled toggles TextSemanticOutlierDetectionMCQ.
When disabled, the stage writes is_outlier = False and null neighbour metadata while still emitting the Parquet file expected by later stages.
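The point of writing placeholder columns is that downstream code sees the same schema either way. A minimal sketch of that idea follows, assuming list-of-dictionary rows; the neighbour column names in `NEIGHBOUR_COLUMNS` and both helper functions are hypothetical, since the source names only `is_outlier`.

```python
# Hypothetical names for the neighbour metadata columns.
NEIGHBOUR_COLUMNS = ("nearest_neighbour_id", "nearest_neighbour_distance")


def disabled_outlier_stage(rows):
    """Add is_outlier = False and null neighbour metadata to every row."""
    nulls = {column: None for column in NEIGHBOUR_COLUMNS}
    return [{**row, "is_outlier": False, **nulls} for row in rows]


def keep_inliers(rows):
    """Example downstream consumer that relies on the is_outlier column."""
    return [row for row in rows if not row["is_outlier"]]


rows = [{"question_id": 1}, {"question_id": 2}]
staged = disabled_outlier_stage(rows)
kept = keep_inliers(staged)
```

With detection disabled, `keep_inliers` returns every row, and the consumer needs no special case for the missing analysis.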
Each stage writes its own Parquet file under stage_cache/; see Output Files.