nemo_rl.environments.dapo_math_verifier
Module Contents
Functions
last_boxed_only_string | Extract the last LaTeX boxed expression from a string.
remove_boxed | Remove the LaTeX boxed command from a string.
normalize_final_answer | Normalize a final answer to a quantitative reasoning question.
is_correct_minerva | Check if the solution is correct according to Minerva criteria.
is_correct_strict_box | Check if the prediction is correct using strict boxed answer criteria.
verify | Verify if the solution is correct.
compute_score | Compute the reward score for a solution.
Data
API
- nemo_rl.environments.dapo_math_verifier.last_boxed_only_string(string: str) -> Optional[str]
Extract the last LaTeX boxed expression from a string.
- Parameters:
string – Input string containing LaTeX code
- Returns:
The last boxed expression or None if not found
- nemo_rl.environments.dapo_math_verifier.remove_boxed(s: str) -> str
Remove the LaTeX boxed command from a string.
- Parameters:
s – String with format "\boxed{content}"
- Returns:
The content inside the boxed command
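The two helpers above are typically used together: first isolate the last boxed span, then strip the wrapper. The following is a minimal sketch of that idea using brace matching. It is not the module's implementation; `sketch_last_boxed` and `sketch_remove_boxed` are hypothetical names, and the real functions may handle additional forms and edge cases.

```python
from typing import Optional

BOXED_PREFIX = "\\boxed{"


def sketch_last_boxed(text: str) -> Optional[str]:
    """Return the last boxed span in text, or None if there is none."""
    start = text.rfind(BOXED_PREFIX)
    if start < 0:
        return None
    depth = 0
    # Start scanning at the opening brace that follows the command name.
    for i in range(start + len(BOXED_PREFIX) - 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # braces never balanced


def sketch_remove_boxed(boxed: str) -> str:
    """Strip the boxed wrapper and return the inner content."""
    if not (boxed.startswith(BOXED_PREFIX) and boxed.endswith("}")):
        raise ValueError("expected a string of the form \\boxed{content}")
    return boxed[len(BOXED_PREFIX) : -1]


if __name__ == "__main__":
    span = sketch_last_boxed("first \\boxed{1}, then \\boxed{\\frac{1}{2}}")
    print(span)
    print(sketch_remove_boxed(span))
```

Counting brace depth, rather than searching for the next closing brace, keeps nested LaTeX such as `\frac{1}{2}` inside the extracted span.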
- nemo_rl.environments.dapo_math_verifier.SUBSTITUTIONS
[('an ', ''), ('a ', ''), ('.$', '$'), ('\$', ''), ('\ ', ''), (' ', ''), ('mbox', 'text'), (',\t…
- nemo_rl.environments.dapo_math_verifier.REMOVED_EXPRESSIONS
['square', 'ways', 'integers', 'dollars', 'mph', 'inches', 'hours', 'km', 'units', '\ldots', 'sue',…
- nemo_rl.environments.dapo_math_verifier.normalize_final_answer(final_answer: str) -> str
Normalize a final answer to a quantitative reasoning question.
- Parameters:
final_answer – The answer string to normalize
- Returns:
Normalized answer string
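To illustrate how substitution pairs and removed expressions like the ones listed above can drive normalization, here is a minimal sketch. It uses only a subset of the entries visible in this page (both lists are truncated here), and `sketch_normalize` is a hypothetical helper, not the module's `normalize_final_answer`, which may apply further steps.

```python
# Subset of the (before, after) pairs shown above; order matters.
SKETCH_SUBSTITUTIONS = [
    ("an ", ""),
    ("a ", ""),
    (" ", ""),
    ("mbox", "text"),
]

# Subset of the unit and filler words shown above.
SKETCH_REMOVED = [
    "square", "ways", "integers", "dollars", "mph",
    "inches", "hours", "km", "units",
]


def sketch_normalize(answer: str) -> str:
    """Apply each substitution in order, then drop the removed expressions."""
    for before, after in SKETCH_SUBSTITUTIONS:
        answer = answer.replace(before, after)
    for expr in SKETCH_REMOVED:
        answer = answer.replace(expr, "")
    return answer.strip()


if __name__ == "__main__":
    print(sketch_normalize("12 inches"))
    print(sketch_normalize("an 8 units"))
```

Because the replacements are plain substring matches, they can also fire inside longer words; this sketch does not try to guard against that.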
- nemo_rl.environments.dapo_math_verifier.is_correct_minerva(solution_str: str, gt: str, gt_need_extract: bool = False, answer_pattern: str = '(?i)Answer\\s*:\\s*([^\\n]+)')
Check if the solution is correct according to Minerva criteria.
- Parameters:
solution_str – The solution string to check
gt – The ground truth answer
gt_need_extract – Whether the ground truth needs extraction
answer_pattern – Regex pattern to extract the answer
- Returns:
Tuple of (is_correct, normalized_prediction)
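The default `answer_pattern` captures everything after an "Answer:" label up to the end of that line, case-insensitively. The sketch below shows how such a pattern can be used to pull a prediction out of a solution and compare it with the ground truth. It is an assumption-laden illustration: `sketch_is_correct_minerva` is a hypothetical name, taking the last match and returning an empty string when nothing matches are choices made for this sketch, and the answer normalization and `gt_need_extract` handling are omitted.

```python
import re
from typing import Tuple

# Same regex as the documented default, written as a raw string.
ANSWER_PATTERN = r"(?i)Answer\s*:\s*([^\n]+)"


def sketch_is_correct_minerva(
    solution_str: str,
    gt: str,
    answer_pattern: str = ANSWER_PATTERN,
) -> Tuple[bool, str]:
    """Return (is_correct, prediction) based on the last labeled answer line."""
    matches = re.findall(answer_pattern, solution_str)
    # Assumption for this sketch: use the final match, or "" if none exist.
    pred = matches[-1].strip() if matches else ""
    return pred == gt.strip(), pred


if __name__ == "__main__":
    text = "We add the terms.\nAnswer: 42"
    print(sketch_is_correct_minerva(text, "42"))
```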
- nemo_rl.environments.dapo_math_verifier.is_correct_strict_box(pred: str, gt: str, pause_tokens_index: Optional[list[int]] = None)
Check if the prediction is correct using strict boxed answer criteria.
- Parameters:
pred – The prediction string
gt – The ground truth answer
pause_tokens_index – Indices of pause tokens
- Returns:
Tuple of (score, extracted_prediction)
- nemo_rl.environments.dapo_math_verifier.verify(solution_str: str, answer: str, strict_box_verify: bool = False, pause_tokens_index: Optional[list[int]] = None)
Verify if the solution is correct.
- Parameters:
solution_str – The solution string to verify
answer – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens
- Returns:
True if the solution is correct, False otherwise
- nemo_rl.environments.dapo_math_verifier.compute_score(solution_str: str, ground_truth: str, strict_box_verify: bool = False, pause_tokens_index: Optional[list[int]] = None)
Compute the reward score for a solution.
- Parameters:
solution_str – The solution string
ground_truth – The ground truth answer
strict_box_verify – Whether to use strict box verification
pause_tokens_index – Indices of pause tokens
- Returns:
Reward score (1.0 for correct, 0.0 for incorrect)
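As a rough picture of how `strict_box_verify` could select between the two checking modes before the result is mapped to a 1.0 or 0.0 reward, consider the sketch below. Everything in it is hypothetical: the helper names, the simplified checks, and the dispatch itself are assumptions made for illustration, and `pause_tokens_index` is left out because this page does not describe how it is used.

```python
import re


def _sketch_answer_line_check(solution: str, truth: str) -> bool:
    """Simplified lenient check: compare the last 'Answer:' line to the truth."""
    matches = re.findall(r"(?i)Answer\s*:\s*([^\n]+)", solution)
    return bool(matches) and matches[-1].strip() == truth.strip()


def _sketch_boxed_check(solution: str, truth: str) -> bool:
    """Simplified strict check: the truth must appear verbatim inside a box."""
    return "\\boxed{" + truth.strip() + "}" in solution


def sketch_compute_score(
    solution_str: str,
    ground_truth: str,
    strict_box_verify: bool = False,
) -> float:
    """Map a boolean verification result to a 1.0 / 0.0 reward."""
    check = _sketch_boxed_check if strict_box_verify else _sketch_answer_line_check
    return 1.0 if check(solution_str, ground_truth) else 0.0


if __name__ == "__main__":
    print(sketch_compute_score("Answer: 5", "5"))
    print(sketch_compute_score("so \\boxed{5}", "5", strict_box_verify=True))
```

Keeping the reward a plain float derived from a boolean check makes it easy to swap in a different verifier without touching the code that consumes the score.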