Test Time Compute With NVIDIA NeMo Agent Toolkit#
Test time compute reallocates compute after a model has been trained, trading extra inference cycles for better reasoning, factuality, and robustness, often without any additional training data. The new `nat.experimental.test_time_compute` package codifies this idea as four strategy types (Search → Editing → Scoring → Selection) that operate on a lightweight `TTCItem` record. Developers can compose these strategies manually or use several pre-built TTC functions that wire everything up automatically. To add your own strategy, follow these steps:

1. Write a config subclass.
2. Implement a `StrategyBase` child.
3. Register it with the `@register_ttc_strategy` decorator.

The remainder of this document explains each step in detail.
Core Design#
Strategy pipeline#
| Stage | Purpose |
|---|---|
| Search | Generate many alternative plans, prompts, or tool invocations |
| Editing | Refine or transform the candidates |
| Scoring | Assign a numeric quality score to each candidate |
| Selection | Down-select or merge candidates |
A pipeline type tells a strategy where it is used, and a stage type tells it which role it plays:

```
PipelineTypeEnum = { PLANNING, TOOL_USE, AGENT_EXECUTION, CUSTOM }
StageTypeEnum    = { SEARCH, EDITING, SCORING, SELECTION }
```
Each strategy exposes the following methods, which the `Builder` uses to resolve dependencies and ensure type safety:

```
supported_pipeline_types() -> list[PipelineTypeEnum]
stage_type() -> StageTypeEnum
```

When a TTC strategy is requested, the `Builder` verifies that the requested stage and pipeline types match the implementation's supported types.
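To make that contract concrete, here is a minimal sketch using stdlib `enum` stand-ins for the two enums; the `is_compatible` helper and the `PlanSearch` class are hypothetical illustrations of the check the `Builder` performs, not toolkit APIs.

```python
from enum import Enum

# Illustrative stand-ins for the toolkit's enums (members match the doc above).
class PipelineTypeEnum(Enum):
    PLANNING = "planning"
    TOOL_USE = "tool_use"
    AGENT_EXECUTION = "agent_execution"
    CUSTOM = "custom"

class StageTypeEnum(Enum):
    SEARCH = "search"
    EDITING = "editing"
    SCORING = "scoring"
    SELECTION = "selection"

def is_compatible(strategy, pipeline: PipelineTypeEnum, stage: StageTypeEnum) -> bool:
    """Mimic the Builder's check: the requested stage and pipeline must
    match what the strategy implementation declares it supports."""
    return (stage == strategy.stage_type()
            and pipeline in strategy.supported_pipeline_types())

class PlanSearch:
    """Toy strategy declaring where it may be placed."""
    def supported_pipeline_types(self):
        return [PipelineTypeEnum.PLANNING]
    def stage_type(self):
        return StageTypeEnum.SEARCH
```

Under this sketch, requesting `PlanSearch` as a SEARCH stage in a PLANNING pipeline succeeds, while requesting it in a TOOL_USE pipeline is rejected.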
StrategyBase#
Every concrete strategy extends `StrategyBase`.
```python
class MyStrategy(StrategyBase):

    async def build_components(self, builder): ...

    async def ainvoke(
        self,
        items: list[TTCItem],
        original_prompt: str | None = None,
        agent_context: str | None = None,
    ) -> list[TTCItem]:
        ...
```
Implementation hint: use the `Builder` helpers (`get_llm`, `get_function`, …) during `build_components` to resolve references once and cache them.
TTCItem#
A single, interoperable record passed between stages.
| Field | Meaning |
|---|---|
| `input` | Raw user task / tool input |
| `output` | Generated answer / tool result |
| `plan` | Execution plan (planning pipelines) |
| `feedback` | Review comments from editing stages |
| `score` | Numeric quality metric |
| `metadata` | Arbitrary auxiliary data |
| `name` | Tool name or other identifier |
Because it is a `pydantic.BaseModel`, you get `.model_dump()` and validation for free.
Built‑in Strategies#
Below is a non‑exhaustive catalog you can use immediately; refer to the inline doc‑strings for full parameter lists.
| Category | One-liner |
|---|---|
| Search | Few-shot prompt that emits *n* candidate plans at different temperatures. |
| Search | Query multiple LLMs in parallel, then concatenate plans. |
| Search | Reformulate a retrieval query from diverse perspectives. |
| Editing | Loop: plan → critique → edit. |
| Editing | "Feedback LLM + editing LLM" cooperative refinement. |
| Editing | Grounded summary that respects the user's "motivation". |
| Scoring | Judge execution plans on a 1–10 scale. |
| Scoring | Judge final agent answers. |
| Scoring | Score with respect to the task + motivation context. |
| Selection | Keep the highest-scoring item. |
| Selection | Filter by score ≥ τ. |
| Selection | Let an LLM choose or merge. |
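The selection behaviors are easy to state in plain Python. The following hypothetical sketch shows what best-of-n and threshold selection do conceptually, using a minimal stand-in item with just the fields needed here:

```python
from dataclasses import dataclass

@dataclass
class TTCItem:  # minimal stand-in carrying only what selection needs
    output: str
    score: float

def best_of_n(items: list[TTCItem]) -> list[TTCItem]:
    """Conceptual best-of-n selection: keep only the top scorer."""
    return [max(items, key=lambda it: it.score)]

def threshold(items: list[TTCItem], tau: float) -> list[TTCItem]:
    """Conceptual threshold selection: keep every item with score >= tau."""
    return [it for it in items if it.score >= tau]

items = [TTCItem("a", 4.0), TTCItem("b", 9.0), TTCItem("c", 6.5)]
```

The real strategies wrap this kind of logic in the `StrategyBase` contract (async, operating on full `TTCItem` records) and add configuration such as the threshold value.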
Pre‑Built TTC Functions#
NeMo Agent toolkit ships higher‑level wrappers that hide all orchestration.
| Function | Use case |
|---|---|
|  | Turn an arbitrary function into a tool; the wrapper asks an LLM to translate free text into structured arguments. |
|  | Accepts a list of tool invocations, optionally runs search/edit/score/select, then executes each tool concurrently. |
| `execute_score_select_function` | Run a function *k* times, score each output, pick the best. |
| `plan_select_execute_function` | End-to-end: plan → optionally edit/score → select plan → feed the downstream agent. |
These are declared in `nat.experimental.test_time_compute.functions.*` and can be referenced in your `Config` just like any other function.
Creating and Registering a New Strategy#
Follow the steps below to create and register a new strategy.

1. Define a config model.

   ```python
   class MyStrategyConfig(TTCStrategyBaseConfig, name="my_strategy"):
       my_param: float = 0.5
   ```

2. Implement the strategy.

   ```python
   from nat.experimental.test_time_compute.models.strategy_base import StrategyBase

   class MyStrategy(StrategyBase):
       ...
   ```

3. Register the strategy.

   ```python
   from nat.cli.register_workflow import register_ttc_strategy

   @register_ttc_strategy(config_type=MyStrategyConfig)
   async def register_my_strategy(cfg: MyStrategyConfig, builder: Builder):
       strat = MyStrategy(cfg)
       await strat.build_components(builder)
       yield strat
   ```

Your strategy is now discoverable by the `TypeRegistry` and can be referenced in `Config` fields.
Composing Strategies in a Config#
TTC strategies can be part of workflow configurations, just like other components such as LLMs. For example, the following excerpt shows how a TTC strategy can be configured in a `config.yml` file and used in a workflow function:

```yaml
ttc_strategies:
  selection_strategy:
    _type: llm_based_agent_output_merging
    selection_llm: nim_llm

workflow:
  _type: execute_score_select_function
  selector: selection_strategy
  augmented_fn: react_agent_executor
  num_executions: 3
```
Extending Tools and Pipelines#
- **Multiple stages**: Nothing stops you from chaining search → edit → search again, as long as each stage returns `list[TTCItem]`.
- **Streaming**: Strategies themselves are non-streaming, but you can wrap a streaming LLM in a TTC pipeline by choosing an appropriate pre-built function such as `plan_select_execute_function`, which keeps streaming support if the downstream agent streams.
- **Debugging**: Log levels are respected through the standard `logging` module; export `NAT_LOG_LEVEL=DEBUG` for verbose traces, including every intermediate `TTCItem`.
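The multiple-stages point can be illustrated with a toy pipeline of stand-in stages (none of these classes come from the toolkit): because every stage consumes and returns a list, chaining is just sequential awaits.

```python
import asyncio

class Expand:
    """Toy search-style stage: each item fans out into two variants."""
    async def ainvoke(self, items):
        return [f"{it}/v{i}" for it in items for i in (1, 2)]

class KeepFirst:
    """Toy selection-style stage: down-select to a single item."""
    async def ainvoke(self, items):
        return items[:1]

async def run_pipeline(stages, items):
    # Each stage maps list -> list, so search -> edit -> search again
    # (or any other ordering) is just a loop of awaits.
    for stage in stages:
        items = await stage.ainvoke(items)
    return items

result = asyncio.run(run_pipeline([Expand(), Expand(), KeepFirst()], ["seed"]))
```

With real strategies the lists hold `TTCItem` records instead of strings, but the composition rule is the same.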
Testing your strategy#
Write isolated unit tests by instantiating your config and strategy directly, then calling `ainvoke` with hand-crafted `TTCItem` lists. Refer to the companion `tests/` directory for reference tests on `ThresholdSelector` and `BestOfNSelector`.
Happy scaling!