Test Time Compute With NVIDIA NeMo Agent Toolkit#
Test time compute reallocates compute after a model has been trained, trading extra inference cycles for better reasoning, factuality, and robustness, often without any additional training data. The new `nat.experimental.test_time_compute` package codifies this idea as four strategy types (Search ▶ Editing ▶ Scoring ▶ Selection) that operate on a lightweight `TTCItem` record. Developers can compose these strategies manually or use several pre-built TTC functions that wire everything up automatically. To add your own strategy, follow these steps:

1. Write a config subclass.
2. Implement a `StrategyBase` child.
3. Register it with the `@register_ttc_strategy` decorator.

The remainder of this document explains each step in detail.
Core Design#
Strategy pipeline#
| Stage | Purpose |
|---|---|
| Search | Generate many alternative plans, prompts, or tool invocations |
| Editing | Refine or transform the candidates |
| Scoring | Assign a numeric quality score |
| Selection | Down-select or merge |
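The four-stage flow above can be sketched as plain functions over a list of candidate records. Everything here (the dict item shape, the length-based scoring rule) is illustrative only, not the toolkit's API; the real pipeline passes `TTCItem` objects between `StrategyBase` strategies:

```python
# Illustrative toy pipeline: Search -> Editing -> Scoring -> Selection.

def search(prompt: str, n: int = 3) -> list[dict]:
    """Search: generate n candidate plans for the prompt."""
    return [{"plan": f"{prompt} (variant {i})", "score": None} for i in range(n)]

def edit(items: list[dict]) -> list[dict]:
    """Editing: refine each candidate."""
    return [{**it, "plan": it["plan"] + " [refined]"} for it in items]

def score(items: list[dict]) -> list[dict]:
    """Scoring: assign a numeric quality score (toy rule: longer is better)."""
    return [{**it, "score": float(len(it["plan"]))} for it in items]

def select(items: list[dict]) -> list[dict]:
    """Selection: keep only the highest-scoring candidate."""
    return [max(items, key=lambda it: it["score"])]

candidates = select(score(edit(search("summarize the report"))))
assert len(candidates) == 1 and candidates[0]["score"] is not None
```

Each stage takes and returns a list of items, which is what makes the stages freely composable.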
A pipeline type tells a strategy where it is used; a stage type tells it what it does:

```
PipelineTypeEnum = { PLANNING, TOOL_USE, AGENT_EXECUTION, CUSTOM }
StageTypeEnum    = { SEARCH, EDITING, SCORING, SELECTION }
```
Each strategy exposes the following methods, which let the Builder resolve dependencies and ensure type safety:

```python
supported_pipeline_types() -> list[PipelineTypeEnum]
stage_type() -> StageTypeEnum
```

When a TTC strategy is requested, the Builder verifies that the requested stage and pipeline types match the implementation's supported types.
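As a sketch of how a strategy declares its types, the following uses illustrative stand-in enums (mirroring the values listed above, not imported from the toolkit) and a hypothetical scoring strategy that is valid only in planning pipelines:

```python
from enum import Enum

# Stand-ins mirroring the enum values above; not the toolkit's modules.
class PipelineTypeEnum(Enum):
    PLANNING = "planning"
    TOOL_USE = "tool_use"
    AGENT_EXECUTION = "agent_execution"
    CUSTOM = "custom"

class StageTypeEnum(Enum):
    SEARCH = "search"
    EDITING = "editing"
    SCORING = "scoring"
    SELECTION = "selection"

class PlanScoringStrategy:
    """Hypothetical strategy: scores plans, usable only in planning pipelines."""

    def supported_pipeline_types(self) -> list[PipelineTypeEnum]:
        return [PipelineTypeEnum.PLANNING]

    def stage_type(self) -> StageTypeEnum:
        return StageTypeEnum.SCORING
```

A Builder-style check then reduces to membership tests on these two return values.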
StrategyBase#
Every concrete strategy extends `StrategyBase`:

```python
class MyStrategy(StrategyBase):
    async def build_components(self, builder): ...

    async def ainvoke(
        self,
        items: list[TTCItem],
        original_prompt: str | None = None,
        agent_context: str | None = None,
    ) -> list[TTCItem]:
        ...
```
Implementation hint: Use the Builder helpers (`get_llm`, `get_function`, …) during `build_components` to resolve references once and cache them.
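A minimal sketch of that resolve-once-and-cache pattern, with a stub Builder standing in for the real one (the `get_llm` name and awaitable signature here are assumptions for illustration; check the toolkit's Builder API for the actual interface):

```python
import asyncio

class StubBuilder:
    """Stand-in for the real Builder; resolves an LLM reference by name."""
    async def get_llm(self, name: str):
        return f"<llm:{name}>"  # a real Builder would return a client object

class MyStrategy:
    def __init__(self, llm_name: str):
        self.llm_name = llm_name
        self.llm = None  # resolved once in build_components, then cached

    async def build_components(self, builder) -> None:
        # Resolve the reference at build time instead of on every ainvoke call.
        self.llm = await builder.get_llm(self.llm_name)

    async def ainvoke(self, items: list) -> list:
        assert self.llm is not None, "call build_components first"
        return items  # a real strategy would use self.llm here

strat = MyStrategy("nim_llm")
asyncio.run(strat.build_components(StubBuilder()))
```

Caching the resolved client avoids repeated lookups on the hot `ainvoke` path.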
TTCItem#
A single, interoperable record passed between stages. Its fields carry:

- the raw user task / tool input
- the generated answer / tool result
- the execution plan (planning pipelines)
- review comments from editing stages
- a numeric quality metric
- arbitrary auxiliary data
- a tool name or other identifier

Because it is a `pydantic.BaseModel`, you get `.model_dump()` and validation for free.
Built‑in Strategies#
Below is a non-exhaustive catalog you can use immediately; refer to the inline docstrings for strategy names and full parameter lists.

**Search**
- Few-shot prompt that emits *n* candidate plans at different temperatures.
- Query multiple LLMs in parallel, then concatenate plans.
- Reformulate a retrieval query from diverse perspectives.

**Editing**
- Loop: plan → critique → edit.
- "Feedback LLM + editing LLM" cooperative refinement.
- Grounded summary that respects the user's "motivation".

**Scoring**
- Judge execution plans on a 1–10 scale.
- Judge final agent answers.
- Score with respect to task + motivation context.

**Selection**
- Keep the highest-scoring item.
- Filter by score ≥ τ.
- Let an LLM choose or merge.
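The two score-based selection behaviors listed above (keep the highest-scoring item, filter by a threshold τ) reduce to a few lines. This is a plain-Python sketch of the behavior, not the built-in implementations:

```python
def best_of_n(items: list[dict]) -> list[dict]:
    """Keep only the highest-scoring item."""
    return [max(items, key=lambda it: it["score"])] if items else []

def threshold_select(items: list[dict], tau: float) -> list[dict]:
    """Keep every item whose score is >= tau."""
    return [it for it in items if it["score"] >= tau]

scored = [
    {"plan": "a", "score": 3.0},
    {"plan": "b", "score": 8.5},
    {"plan": "c", "score": 6.0},
]
assert best_of_n(scored)[0]["plan"] == "b"
assert [it["plan"] for it in threshold_select(scored, 6.0)] == ["b", "c"]
```

Note that both selectors return a list, keeping them composable with the other stages.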
Pre‑Built TTC Functions#
NeMo Agent toolkit ships higher‑level wrappers that hide all orchestration.
- Turn an arbitrary function into a tool; the wrapper asks an LLM to translate free text into structured arguments.
- Accept a list of tool invocations, optionally run search/edit/score/select, then execute each tool concurrently.
- `execute_score_select_function`: run a function *k* times, score each output, pick the best.
- `plan_select_execute_function`: end-to-end: plan → optionally edit/score → select plan → feed the downstream agent.

These are declared in `nat.experimental.test_time_compute.functions.*` and can be referenced in your config just like any other function.
Creating and Registering a New Strategy#
Follow the steps below to create and register a new strategy.
1. Define a config model:

   ```python
   class MyStrategyConfig(TTCStrategyBaseConfig, name="my_strategy"):
       my_param: float = 0.5
   ```

2. Implement the strategy:

   ```python
   from nat.experimental.test_time_compute.models.strategy_base import StrategyBase

   class MyStrategy(StrategyBase):
       ...
   ```

3. Register the strategy:

   ```python
   from nat.cli.register_workflow import register_ttc_strategy

   @register_ttc_strategy(config_type=MyStrategyConfig)
   async def register_my_strategy(cfg: MyStrategyConfig, builder: Builder):
       strat = MyStrategy(cfg)
       await strat.build_components(builder)
       yield strat
   ```

Your strategy is now discoverable by `TypeRegistry` and can be referenced in config fields.
Composing Strategies in a Config#
TTC strategies can be part of workflow configurations, just like other components such as LLMs. For example, the following excerpt shows how a TTC strategy can be configured in a `config.yml` file and used in a workflow function:

```yaml
ttc_strategies:
  selection_strategy:
    _type: llm_based_agent_output_merging
    selection_llm: nim_llm

workflow:
  _type: execute_score_select_function
  selector: selection_strategy
  augmented_fn: react_agent_executor
  num_executions: 3
```
Extending Tools and Pipelines#
- **Multiple stages:** Nothing stops you from chaining search → edit → search again, as long as each stage returns `list[TTCItem]`.
- **Streaming:** Strategies themselves are non-streaming, but you can wrap a streaming LLM in a TTC pipeline by choosing an appropriate pre-built function such as `plan_select_execute_function`, which keeps streaming support if the downstream agent streams.
- **Debugging:** Log levels are respected through the standard `logging` module; export `NAT_LOG_LEVEL=DEBUG` for verbose traces, including every intermediate `TTCItem`.
Testing your strategy#
Write isolated unit tests by instantiating your config and strategy directly, then call `ainvoke` with hand-crafted `TTCItem` lists. Refer to the companion `tests/` directory for reference tests on `ThresholdSelector` and `BestOfNSelector`.
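For example, a self-contained sketch of that test shape, with a toy strategy and plain dicts standing in for your real `StrategyBase` subclass and `TTCItem` lists (a real test would import your config and strategy classes instead):

```python
import asyncio

class EchoStrategy:
    """Toy stand-in for a real StrategyBase subclass."""
    def __init__(self, suffix: str):
        self.suffix = suffix

    async def ainvoke(self, items, original_prompt=None, agent_context=None):
        # Produce an "output" for each item; a real strategy would call an LLM.
        return [{**it, "output": it["input"] + self.suffix} for it in items]

def test_echo_strategy():
    strat = EchoStrategy(suffix="!")
    items = [{"input": "hello"}]  # hand-crafted stand-in for a TTCItem list
    result = asyncio.run(strat.ainvoke(items, original_prompt="hello"))
    assert result[0]["output"] == "hello!"

test_echo_strategy()
```

Because `ainvoke` is a coroutine, drive it with `asyncio.run` (or `pytest-asyncio`) in your tests.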
Happy scaling!