Reasoning#

The reasoning interceptor processes chain-of-thought reasoning from model responses by removing reasoning tokens from content and tracking reasoning statistics.

Overview#

The ResponseReasoningInterceptor handles models that emit explicit reasoning steps, typically enclosed in special tokens such as `<think>` and `</think>`. It strips that reasoning from the final response content and records reasoning metrics for later analysis.

Configuration#

Python Configuration#

from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig

adapter_config = AdapterConfig(
    interceptors=[
        InterceptorConfig(
            name="reasoning",
            config={
                "start_reasoning_token": "<think>",
                "end_reasoning_token": "</think>",
                "add_reasoning": True,
                "enable_reasoning_tracking": True
            }
        )
    ]
)

CLI Configuration#

--overrides 'target.api_endpoint.adapter_config.interceptors=[{"name":"reasoning","config":{"start_reasoning_token":"<think>","end_reasoning_token":"</think>"}}]'
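The override value is a JSON list embedded in a shell argument, so quoting mistakes are easy to make when writing it by hand. As a minimal sketch, assuming the command line is assembled from Python, the value can be serialized with the standard `json` module and shell-quoted with `shlex.quote`. The variable names are illustrative; only the override key and the config fields come from the example above.

```python
import json
import shlex

# Interceptor list, using the same fields as the CLI example above.
interceptors = [
    {
        "name": "reasoning",
        "config": {
            "start_reasoning_token": "<think>",
            "end_reasoning_token": "</think>",
        },
    }
]

# Compact separators keep the serialized value free of spaces.
value = json.dumps(interceptors, separators=(",", ":"))
override = f"target.api_endpoint.adapter_config.interceptors={value}"

# shlex.quote wraps the string in single quotes for POSIX shells.
print("--overrides", shlex.quote(override))
```

The printed argument should match the hand-written override shown above.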

YAML Configuration#

target:
  api_endpoint:
    adapter_config:
      interceptors:
        - name: reasoning
          config:
            start_reasoning_token: "<think>"
            end_reasoning_token: "</think>"
            add_reasoning: true
            enable_reasoning_tracking: true

Configuration Options#

| Parameter | Description | Default | Type |
| --- | --- | --- | --- |
| `start_reasoning_token` | Token that marks the start of reasoning section | `"<think>"` | str \| None |
| `end_reasoning_token` | Token that marks the end of reasoning section | `"</think>"` | str |
| `add_reasoning` | Whether to add reasoning information | `True` | bool |
| `migrate_reasoning_content` | Migrate reasoning_content to content field with tokens | `False` | bool |
| `enable_reasoning_tracking` | Enable reasoning tracking and logging | `True` | bool |
| `include_if_not_finished` | Include reasoning content if reasoning is not finished (end token not found) | `True` | bool |
| `stats_file_saving_interval` | How often (every N responses) to save stats to file | `None` | int \| None |
| `enable_caching` | Whether to enable caching of reasoning statistics | `True` | bool |
| `cache_dir` | Custom cache directory for reasoning stats | `"/tmp/reasoning_interceptor"` | str |
| `logging_aggregated_stats_interval` | How often (every N responses) to log aggregated reasoning statistics | `100` | int |

Processing Examples#

Basic Reasoning Stripping#

# Original response from model
original_content = "<think>Let me solve this step by step. 2+2 is basic addition. 2 plus 2 equals 4.</think>The answer is 4."

# After reasoning interceptor processing
# The content field has reasoning removed
processed_content = "The answer is 4."

Multi-Step Reasoning#

# Original response with multi-line reasoning
original_content = """<think>
This is a word problem. Let me break it down:
1. John has 5 apples
2. He gives away 2 apples
3. So he has 5 - 2 = 3 apples left
</think>John has 3 apples remaining."""

# After processing: reasoning tokens and content are removed
processed_content = "John has 3 apples remaining."
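The two examples above can be reproduced with a few lines of string handling. The following is a minimal sketch, not the interceptor's actual implementation: `split_reasoning` and `SplitResult` are hypothetical names, and leaving the content untouched when the end token is missing is an assumption made here (in the real interceptor that case is governed by `include_if_not_finished`).

```python
from typing import NamedTuple, Optional


class SplitResult(NamedTuple):
    reasoning: str  # text between the tokens ("" if none was found)
    content: str    # text left once the reasoning is removed
    finished: bool  # True when the end token was found


def split_reasoning(
    text: str,
    start_token: Optional[str] = "<think>",
    end_token: str = "</think>",
) -> SplitResult:
    """Hypothetical helper: separate reasoning from the final answer."""
    head, sep, tail = text.partition(end_token)
    if not sep:
        # Assumption: with no end token, return the content unchanged.
        return SplitResult("", text, False)
    prefix, reasoning = "", head
    if start_token and start_token in head:
        # Text before the start token stays part of the content.
        prefix, _, reasoning = head.partition(start_token)
    return SplitResult(reasoning.strip(), (prefix + tail).strip(), True)


result = split_reasoning(
    "<think>2 plus 2 equals 4.</think>The answer is 4."
)
print(result.content)
```

Passing `start_token=None` treats everything before the end token as reasoning, which is one way to read the `str | None` type listed for `start_reasoning_token`.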

Tracked Metrics#

The interceptor automatically tracks the following statistics:

| Metric | Description |
| --- | --- |
| `total_responses` | Total number of responses processed |
| `responses_with_reasoning` | Number of responses containing reasoning content |
| `reasoning_finished_count` | Number of responses where reasoning completed (end token found) |
| `reasoning_started_count` | Number of responses where reasoning started |
| `avg_reasoning_words` | Average word count in reasoning content |
| `avg_reasoning_tokens` | Average token count in reasoning content |
| `avg_original_content_words` | Average word count in original content (before processing) |
| `avg_updated_content_words` | Average word count in updated content (after processing) |
| `avg_updated_content_tokens` | Average token count in updated content |
| `max_reasoning_words` | Maximum word count in reasoning content |
| `max_reasoning_tokens` | Maximum token count in reasoning content |
| `max_updated_content_tokens` | Maximum token count in updated content |
| `total_reasoning_words` | Total word count across all reasoning content |
| `total_reasoning_tokens` | Total token count across all reasoning content |
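To make the relationship between the total, maximum, and average metrics concrete, here is a rough sketch of how the word-based counters could be accumulated one response at a time. The `ReasoningStats` class is hypothetical and is not the interceptor's code. Two assumptions are made: words are counted by whitespace splitting, and the average is taken over all processed responses. The actual interceptor may define either differently.

```python
from dataclasses import dataclass


@dataclass
class ReasoningStats:
    """Hypothetical accumulator for a subset of the metrics above."""

    total_responses: int = 0
    responses_with_reasoning: int = 0
    total_reasoning_words: int = 0
    max_reasoning_words: int = 0

    def update(self, reasoning: str) -> None:
        # Assumption: a "word" is any whitespace-separated chunk.
        words = len(reasoning.split())
        self.total_responses += 1
        if words:
            self.responses_with_reasoning += 1
        self.total_reasoning_words += words
        self.max_reasoning_words = max(self.max_reasoning_words, words)

    @property
    def avg_reasoning_words(self) -> float:
        # Assumption: averaged over every processed response.
        if self.total_responses == 0:
            return 0.0
        return self.total_reasoning_words / self.total_responses


stats = ReasoningStats()
for reasoning in ["2 plus 2 equals 4.", "", "5 - 2 = 3"]:
    stats.update(reasoning)
print(stats.total_reasoning_words, stats.max_reasoning_words)
```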

These statistics are saved to `eval_factory_metrics.json` under the `reasoning` key after evaluation completes.
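Once an evaluation has finished, the saved statistics can be inspected with the standard `json` module. This is a minimal sketch: `load_reasoning_stats` is a hypothetical helper, and it assumes that `eval_factory_metrics.json` sits directly in the directory you pass in and that the `reasoning` key holds a mapping of metric names to values.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_reasoning_stats(output_dir: str) -> Dict[str, Any]:
    """Hypothetical helper: return the "reasoning" section of the metrics file."""
    # Assumption: the metrics file is located directly in output_dir.
    metrics_path = Path(output_dir) / "eval_factory_metrics.json"
    with metrics_path.open(encoding="utf-8") as f:
        metrics = json.load(f)
    # An empty dict is returned when no reasoning stats were recorded.
    return metrics.get("reasoning", {})
```

For example, `load_reasoning_stats("results").get("avg_reasoning_words")` would return the average reasoning word count if that metric is present, where `results` is a placeholder directory name.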

Example: Custom Reasoning Tokens#

from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig

# For models using different reasoning tokens
adapter_config = AdapterConfig(
    interceptors=[
        InterceptorConfig(
            name="reasoning",
            config={
                "start_reasoning_token": "[REASONING]",
                "end_reasoning_token": "[/REASONING]"
            }
        )
    ]
)

Example: Combined with Other Interceptors#

from nemo_evaluator.adapters.adapter_config import AdapterConfig, InterceptorConfig

adapter_config = AdapterConfig(
    interceptors=[
        InterceptorConfig(name="request_logging", config={"max_requests": 50}),
        InterceptorConfig(name="response_logging", config={"max_responses": 50}),
        InterceptorConfig(
            name="reasoning",
            config={
                "start_reasoning_token": "<think>",
                "end_reasoning_token": "</think>",
                "enable_reasoning_tracking": True
            }
        )
    ]
)