nemo_rl.environments.dapo_math_verifier
Module Contents
Functions
| Function | Description |
|---|---|
| last_boxed_only_string | Extract the last LaTeX boxed expression from a string. |
| remove_boxed | Remove the LaTeX boxed command from a string. |
| normalize_final_answer | Normalize a final answer to a quantitative reasoning question. |
| is_correct_minerva | Check if the solution is correct according to Minerva criteria. |
| is_correct_strict_box | Check if the prediction is correct using strict boxed answer criteria. |
| verify | Verify if the solution is correct. |
| compute_score | Compute the reward score for a solution. |
Data
- SUBSTITUTIONS
- REMOVED_EXPRESSIONS
API
- nemo_rl.environments.dapo_math_verifier.last_boxed_only_string(string: str) → Optional[str]
Extract the last LaTeX boxed expression from a string.
- Parameters:
string – Input string containing LaTeX code
- Returns:
The last boxed expression or None if not found
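The extraction described above can be sketched with a brace counter. This is a minimal illustration, not the module's implementation: the name `last_boxed_sketch` is hypothetical, and the handling of unbalanced braces (returning `None`) is an assumption.

```python
from typing import Optional


def last_boxed_sketch(text: str) -> Optional[str]:
    """Return the last boxed span in text (command and braces included), or None."""
    # Hypothetical helper: find where the last boxed command starts.
    start = text.rfind("\\boxed{")
    if start < 0:
        return None

    # Walk forward, tracking brace depth so nested groups stay intact.
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]

    # Assumption: an unbalanced expression counts as "not found".
    return None


print(last_boxed_sketch(r"First \boxed{1}, then \boxed{\frac{1}{2}}."))
```

Counting braces rather than using a regular expression keeps nested groups such as `\frac{1}{2}` inside the returned span.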
- nemo_rl.environments.dapo_math_verifier.remove_boxed(s: str) → str
Remove the LaTeX boxed command from a string.
- Parameters:
s – String with format “\boxed{content}”
- Returns:
The content inside the boxed command
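As a rough sketch of the documented contract (input of the form “\boxed{content}”, output `content`), the helper below strips the wrapper. The name `remove_boxed_sketch` and the choice to raise `ValueError` on malformed input are assumptions; the real function may signal errors differently.

```python
def remove_boxed_sketch(s: str) -> str:
    """Strip a leading boxed command and its closing brace from s."""
    prefix = "\\boxed{"
    # Assumption: reject input that is not exactly one boxed expression.
    if not (s.startswith(prefix) and s.endswith("}")):
        raise ValueError(f"expected a boxed expression, got: {s!r}")
    return s[len(prefix) : -1]


print(remove_boxed_sketch(r"\boxed{42}"))
```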
- nemo_rl.environments.dapo_math_verifier.SUBSTITUTIONS
[('an ', ''), ('a ', ''), ('.$', '$'), ('\$', ''), ('\ ', ''), (' ', ''), ('mbox', 'text'), (',\t…
- nemo_rl.environments.dapo_math_verifier.REMOVED_EXPRESSIONS
['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'hours', 'km', 'units', '\ldots', 'sue',…
- nemo_rl.environments.dapo_math_verifier.normalize_final_answer(final_answer: str) → str
Normalize a final answer to a quantitative reasoning question.
- Parameters:
final_answer – The answer string to normalize
- Returns:
Normalized answer string
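To illustrate how substitution and removal tables like `SUBSTITUTIONS` and `REMOVED_EXPRESSIONS` could drive normalization, here is a small sketch. It uses only a subset of the entries visible above, and the function name, the order of the two passes, and the final `strip()` are assumptions rather than the module's actual logic.

```python
# Illustrative subsets only; the real lists are longer.
SUBSTITUTIONS_SUBSET = [("an ", ""), ("a ", ""), (" ", ""), ("mbox", "text")]
REMOVED_SUBSET = ["square", "ways", "integers", "dollars", "mph",
                  "inches", "hours", "km", "units"]


def normalize_sketch(final_answer: str) -> str:
    """Apply replacements, then drop unit-like words (assumed order)."""
    for before, after in SUBSTITUTIONS_SUBSET:
        final_answer = final_answer.replace(before, after)
    for expr in REMOVED_SUBSET:
        final_answer = final_answer.replace(expr, "")
    return final_answer.strip()


print(normalize_sketch("12 inches"))
print(normalize_sketch("60 mph"))
```

With these subsets, answers such as "12 inches" and "12" normalize to the same string, which is what makes a plain string comparison against the ground truth workable.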
- nemo_rl.environments.dapo_math_verifier.is_correct_minerva(
    solution_str: str,
    gt: str,
    gt_need_extract: bool = False,
    answer_pattern: str = '(?i)Answer\\s*:\\s*([^\\n]+)',
  )
Check if the solution is correct according to Minerva criteria.
- Parameters:
solution_str – The solution string to check
gt – The ground truth answer
gt_need_extract – Whether the ground truth needs extraction
answer_pattern – Regex pattern to extract the answer
- Returns:
Tuple of (is_correct, normalized_prediction)
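The default `answer_pattern` shown in the signature is a case-insensitive regex that captures the rest of the line after "Answer:". The sketch below shows what that pattern matches; the helper name `extract_answer_sketch` and the choice to keep the last match are assumptions, not confirmed behavior of `is_correct_minerva`.

```python
import re
from typing import Optional

# The documented default, written as a raw string.
ANSWER_PATTERN = r"(?i)Answer\s*:\s*([^\n]+)"


def extract_answer_sketch(solution_str: str,
                          pattern: str = ANSWER_PATTERN) -> Optional[str]:
    """Return the text captured by the final 'Answer:' line, or None."""
    matches = re.findall(pattern, solution_str)
    # Assumption: when several lines match, the last one wins.
    return matches[-1].strip() if matches else None


print(extract_answer_sketch("Some reasoning...\nAnswer: 42\n"))
print(extract_answer_sketch("no final line"))
```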
- nemo_rl.environments.dapo_math_verifier.is_correct_strict_box(
    pred: str,
    gt: str,
    pause_tokens_index: Optional[list[int]] = None,
  )
Check if the prediction is correct using strict boxed answer criteria.
- Parameters:
pred – The prediction string
gt – The ground truth answer
pause_tokens_index – Indices of pause tokens
- Returns:
Tuple of (score, extracted_prediction)
- nemo_rl.environments.dapo_math_verifier.verify(
    solution_str: str,
    answer: str,
    strict_box_verify: bool = False,
    pause_tokens_index: Optional[list[int]] = None,
  )
Verify if the solution is correct.
- Parameters:
solution_str – The solution string to verify
answer – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens
- Returns:
True if the solution is correct, False otherwise
- nemo_rl.environments.dapo_math_verifier.compute_score(
    solution_str: str,
    ground_truth: str,
    strict_box_verify: bool = False,
    pause_tokens_index: Optional[list[int]] = None,
  )
Compute the reward score for a solution.
- Parameters:
solution_str – The solution string
ground_truth – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens
- Returns:
Reward score (1.0 for correct, 0.0 for incorrect)
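The mapping from a boolean verdict to the documented reward values can be sketched as follows. Both function names are hypothetical, and the exact-match verifier is a toy stand-in for `verify`, used only so the example runs without the library installed.

```python
from typing import Callable


def exact_match_verifier(solution_str: str, ground_truth: str) -> bool:
    """Toy stand-in for a verifier: compares the stripped strings."""
    return solution_str.strip() == ground_truth.strip()


def compute_score_sketch(
    solution_str: str,
    ground_truth: str,
    verify_fn: Callable[[str, str], bool] = exact_match_verifier,
) -> float:
    """Map a verifier's verdict to 1.0 (correct) or 0.0 (incorrect)."""
    return 1.0 if verify_fn(solution_str, ground_truth) else 0.0


print(compute_score_sketch(" 42 ", "42"))
print(compute_score_sketch("41", "42"))
```

Passing the verifier as a parameter keeps the scoring rule separate from the matching rule, so a stricter check can be swapped in without touching the reward mapping.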