`nemo_rl.environments.dapo_math_verifier`

Module Contents

Functions

`last_boxed_only_string`	Extract the last LaTeX boxed expression from a string.
`remove_boxed`	Remove the LaTeX boxed command from a string.
`normalize_final_answer`	Normalize a final answer to a quantitative reasoning question.
`is_correct_minerva`	Check if the solution is correct according to Minerva criteria.
`is_correct_strict_box`	Check if the prediction is correct using strict boxed answer criteria.
`verify`	Verify if the solution is correct.
`compute_score`	Compute the reward score for a solution.

Data

`SUBSTITUTIONS`
`REMOVED_EXPRESSIONS`

API

nemo_rl.environments.dapo_math_verifier.last_boxed_only_string(string: str) → Optional[str]

Extract the last LaTeX boxed expression from a string.

Parameters: string – Input string containing LaTeX code
Returns: The last boxed expression, or None if not found
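
As an illustration of what "last boxed expression" means, the following is a minimal sketch, not the module's actual implementation. The helper name `last_boxed_sketch` is hypothetical, and the brace-matching strategy is an assumption about how such an extraction could work.

```python
from typing import Optional


def last_boxed_sketch(text: str) -> Optional[str]:
    """Hypothetical stand-in: return the last \\boxed{...} span, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None  # no boxed command present

    depth = 0
    # Begin scanning at the opening brace that follows "\boxed".
    for i in range(start + len("\\boxed"), len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return the whole command.
                return text[start : i + 1]
    return None  # braces never balanced


print(last_boxed_sketch(r"x = \boxed{1}, so \boxed{\frac{1}{2}}"))
```

Counting brace depth, rather than stopping at the first `}`, keeps nested LaTeX such as `\frac{1}{2}` inside the extracted span.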

nemo_rl.environments.dapo_math_verifier.remove_boxed(s: str) → str

Remove the LaTeX boxed command from a string.

Parameters: s – String with format “\boxed{content}”
Returns: The content inside the boxed command
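
The expected input and output can be sketched as follows. The helper `remove_boxed_sketch` is hypothetical, and raising `ValueError` on malformed input is an assumption; the documentation above does not say how the real function handles input that lacks the `\boxed{...}` wrapper.

```python
def remove_boxed_sketch(s: str) -> str:
    """Hypothetical stand-in: strip a leading \\boxed{ and the trailing }."""
    prefix = "\\boxed{"
    if not (s.startswith(prefix) and s.endswith("}")):
        # Assumption: malformed input is rejected rather than passed through.
        raise ValueError(f"expected a \\boxed{{...}} string, got: {s!r}")
    return s[len(prefix) : -1]


print(remove_boxed_sketch(r"\boxed{42}"))
```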

nemo_rl.environments.dapo_math_verifier.SUBSTITUTIONS: [('an ', ''), ('a ', ''), ('.$', '$'), ('\$', ''), ('\ ', ''), (' ', ''), ('mbox', 'text'), (',\t…

nemo_rl.environments.dapo_math_verifier.REMOVED_EXPRESSIONS: ['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'hours', 'km', 'units', '\ldots', 'sue',…

nemo_rl.environments.dapo_math_verifier.normalize_final_answer(final_answer: str) → str

Normalize a final answer to a quantitative reasoning question.

Parameters: final_answer – The answer string to normalize
Returns: Normalized answer string
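
A rough sketch of how the two data tables could feed into normalization is shown below. It is an assumption, not the module's code: it uses only the entries of `SUBSTITUTIONS` and `REMOVED_EXPRESSIONS` that are visible (untruncated) in this page, the names `normalize_sketch`, `SUBSTITUTIONS_EXCERPT`, and `REMOVED_EXCERPT` are hypothetical, and the real function may apply further steps.

```python
# Excerpts limited to the entries shown in full on this page; the real
# lists are longer (they are truncated in the rendered documentation).
SUBSTITUTIONS_EXCERPT = [
    ("an ", ""), ("a ", ""), (".$", "$"), ("\\$", ""),
    ("\\ ", ""), (" ", ""), ("mbox", "text"),
]
REMOVED_EXCERPT = [
    "square", "ways", "integers", "dollars", "mph", "inches",
    "hours", "km", "units", "\\ldots", "sue",
]


def normalize_sketch(final_answer: str) -> str:
    """Hypothetical stand-in: apply substitutions, then drop listed words."""
    for before, after in SUBSTITUTIONS_EXCERPT:
        final_answer = final_answer.replace(before, after)
    for expr in REMOVED_EXCERPT:
        final_answer = final_answer.replace(expr, "")
    return final_answer.strip()


print(normalize_sketch("5 dollars"))
print(normalize_sketch("12 inches"))
```

The point of this kind of normalization is that answers differing only in units or filler words compare as equal strings.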

nemo_rl.environments.dapo_math_verifier.is_correct_minerva(solution_str: str, gt: str, gt_need_extract: bool = False, answer_pattern: str = '(?i)Answer\\s*:\\s*([^\\n]+)') → tuple[bool, str]

Check if the solution is correct according to Minerva criteria.

Parameters:

solution_str – The solution string to check
gt – The ground truth answer
gt_need_extract – Whether the ground truth needs extraction
answer_pattern – Regex pattern to extract the answer

Returns:

Tuple of (is_correct, normalized_prediction)
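
To show what the default `answer_pattern` matches, here is a small sketch using Python's `re` module. The helper `extract_answer_sketch` is hypothetical, and taking the last match is an assumption; the documentation above does not state which match the real function uses.

```python
import re
from typing import Optional

# The default pattern from the signature above: case-insensitive "Answer",
# optional whitespace around the colon, then the rest of the line.
ANSWER_PATTERN = "(?i)Answer\\s*:\\s*([^\\n]+)"


def extract_answer_sketch(
    solution_str: str, pattern: str = ANSWER_PATTERN
) -> Optional[str]:
    """Hypothetical stand-in: return the last captured answer, or None."""
    matches = re.findall(pattern, solution_str)
    # Assumption: when several "Answer:" lines exist, the last one wins.
    return matches[-1] if matches else None


print(extract_answer_sketch("First try.\nanswer: 3\nRecheck.\nAnswer: 42"))
```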

nemo_rl.environments.dapo_math_verifier.is_correct_strict_box(pred: str, gt: str, pause_tokens_index: Optional[list[int]] = None) → tuple[int, Optional[str]]

Check if the prediction is correct using strict boxed answer criteria.

Parameters:

pred – The prediction string
gt – The ground truth answer
pause_tokens_index – Indices of pause tokens

Returns:

Tuple of (score, extracted_prediction)

nemo_rl.environments.dapo_math_verifier.verify(solution_str: str, answer: str, strict_box_verify: bool = False, pause_tokens_index: Optional[list[int]] = None) → bool

Verify if the solution is correct.

Parameters:

solution_str – The solution string to verify
answer – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens

Returns:

True if the solution is correct, False otherwise

nemo_rl.environments.dapo_math_verifier.compute_score(solution_str: str, ground_truth: str, strict_box_verify: bool = False, pause_tokens_index: Optional[list[int]] = None) → float

Compute the reward score for a solution.

Parameters:

solution_str – The solution string
ground_truth – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens

Returns:

Reward score (1.0 for correct, 0.0 for incorrect)
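
The mapping from a boolean verdict to the documented reward values can be sketched as below. This is an illustration only: `compute_score_sketch` and `toy_verify` are hypothetical names, and `toy_verify` is a deliberately simplistic stand-in for `verify`, not its actual logic.

```python
from typing import Callable


def compute_score_sketch(
    solution_str: str,
    ground_truth: str,
    verify_fn: Callable[[str, str], bool],
) -> float:
    """Hypothetical stand-in: 1.0 if verify_fn accepts the solution, else 0.0."""
    return 1.0 if verify_fn(solution_str, ground_truth) else 0.0


def toy_verify(solution_str: str, answer: str) -> bool:
    # Assumption for the demo only: the solution is "correct" when it ends
    # with the ground-truth string. The real verifier is more involved.
    return solution_str.strip().endswith(answer)


print(compute_score_sketch("Answer: 42", "42", toy_verify))
print(compute_score_sketch("Answer: 41", "42", toy_verify))
```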
