Test Time Compute With NVIDIA NeMo Agent Toolkit#

Test time compute reallocates compute after a model has been trained, trading extra inference cycles for much better reasoning, factuality, and robustness, often without any additional training data. The new nat.experimental.test_time_compute package codifies this idea as four strategy types (Search ▶ Editing ▶ Scoring ▶ Selection) that operate on a lightweight TTCItem record. Developers can compose these strategies manually or use several pre‑built TTC functions that wire everything up automatically. To add your own strategy, you can simply follow these steps:

Write a config subclass.
Implement a StrategyBase child.
Register it with the @register_ttc_strategy decorator. The remainder of this document explains each step in detail.

Core Design#

Strategy pipeline#

Stage	Purpose	Examples
Search	Generate many alternative plans, prompts, or tool invocations	`single_shot_multi_plan`, `multi_llm_plan`, `multi_query_retrieval_search`
Editing	Refine or transform the candidates	`iterative_plan_refinement`, `llm_as_a_judge_editor`, `motivation_aware_summarization`
Scoring	Assign a numeric quality score	`llm_based_plan_scorer`, `llm_based_agent_scorer`, `motivation_aware_scorer`
Selection	Down‑select or merge	`best_of_n_selector`, `threshold_selector`, `llm_based_plan_selector`, `llm_based_output_merging_selector`, `llm_based_agent_output_selector`

A pipeline type tells a strategy where it is used.

PipelineTypeEnum = { PLANNING, TOOL_USE, AGENT_EXECUTION, CUSTOM }
StageTypeEnum    = { SEARCH, EDITING, SCORING, SELECTION }

Each strategy exposes the following methods to the Builder to allow the Builder to resolve dependencies and ensure type safety:

supported_pipeline_types() -> list[PipelineTypeEnum]
stage_type()                -> StageTypeEnum

The Builder will ensure that when an TTC Strategy is requested, that the stage and pipeline types match the implementation’s supported types.

`StrategyBase`#

Every concrete strategy extends StrategyBase.

class MyStrategy(StrategyBase):
    async def build_components(self, builder): ...
    async def ainvoke(
            self,
            items: list[TTCItem],
            original_prompt: str | None = None,
            agent_context:  str | None = None,
    ) -> list[TTCItem]:
        ...

Implementation hint: Use the Builder helpers (get_llm, get_function, …) during build_components to resolve references once and cache them.

`TTCItem`#

A single, interoperable record passed between stages.

Field	Meaning
`input`	Raw user task / tool `args`
`output`	Generated answer / tool result
`plan`	Execution plan (planning pipelines)
`feedback`	Review comments from editing stages
`score`	Numeric quality metric
`metadata`	Arbitrary auxiliary data
`name`	Tool name or other identifier

Because it is a pydantic.BaseModel, you get .model_dump() and validation for free.

Built‑in Strategies#

Below is a non‑exhaustive catalog you can use immediately; refer to the inline doc‑strings for full parameter lists.

Category	`Config` class	One‑liner
Search	`SingleShotMultiPlanConfig`	Few‑shot prompt that emits n candidate plans at different temperatures.
	`MultiLLMPlanConfig`	Query multiple LLMs in parallel, then concatenate plans.
	`MultiQueryRetrievalSearchConfig`	Reformulate a retrieval query from diverse perspectives.
Editing	`IterativePlanRefinementConfig`	Loop: plan → critique → edit.
	`LLMAsAJudgeEditorConfig`	“Feedback LLM + editing LLM” cooperative refinement.
	`MotivationAwareSummarizationConfig`	Grounded summary that respects user’s “motivation”.
Scoring	`LLMBasedPlanScoringConfig`	Judge execution plans on a 1‑10 scale.
	`LLMBasedAgentScoringConfig`	Judge final agent answers.
	`MotivationAwareScoringConfig`	Score w.r.t. task + motivation context.
Selection	`BestOfNSelectionConfig`	Keep the highest‑scoring item.
	`ThresholdSelectionConfig`	Filter by score ≥ τ.
	`LLMBasedPlanSelectionConfig` / …AgentOutput… / …OutputMerging…	Let an LLM choose or merge.

Pre‑Built TTC Functions#

NeMo Agent toolkit ships higher‑level wrappers that hide all orchestration.

Function	Use‑case
`ttc_tool_wrapper_function`	Turn an arbitrary function into a tool; the wrapper asks an LLM to translate free‑text into structured arguments.
`ttc_tool_orchestration_function`	Accepts a list of tool invocations, optionally runs search/edit/score/select, then executes each tool concurrently.
`execute_score_select_function`	Run a function k times, score each output, pick the best.
`plan_select_execute_function`	End‑to‑end: plan → optionally edit/score → select plan → feed downstream agent.

These are declared in nat.experimental.test_time_compute.functions.* and can be referenced in your Config just like any other function.

Creating and Registering a New Strategy#

Follow the steps below to create and register a new strategy.

Define a config model.

class MyStrategyConfig(TTCStrategyBaseConfig, name="my_strategy"):
    my_param: float = 0.5

Implement the strategy

from nat.experimental.test_time_compute.models.strategy_base import StrategyBase
class MyStrategy(StrategyBase):
    ...

from nat.cli.register_workflow import register_ttc_strategy

@register_ttc_strategy(config_type=MyStrategyConfig)
async def register_my_strategy(cfg: MyStrategyConfig, builder: Builder):
    strat = MyStrategy(cfg)
    await strat.build_components(builder)
    yield strat

Your strategy is now discoverable by TypeRegistry and can be referenced in Config fields.

Composing Strategies in a `Config`#

TTC Strategies can be part of workflow configurations, just like other components such as LLMs. For example, the following configuration excerpt shows how an TTC strategy can be configured in a config.yml file and used in a workflow function:

ttc_strategies:
  selection_strategy:
    _type: llm_based_agent_output_merging
    selection_llm: nim_llm

workflow:
  _type: execute_score_select_function
  selector: selection_strategy
  augmented_fn: react_agent_executor
  num_executions: 3

Extending Tools and Pipelines#

Multiple stages: Nothing stops you from chaining search → edit → search again, as long as each stage returns List[TTCItem].
Streaming: Strategies themselves are non‑streaming, but you can wrap a streaming LLM in an TTC pipeline by choosing an appropriate pre‑built function such as plan_select_execute_function, which keeps streaming support if the downstream agent streams.
Debugging: Log levels are respected through the standard logging module; export NAT_LOG_LEVEL=DEBUG for verbose traces, including every intermediate TTCItem.

Testing your strategy#

Write isolated unit tests by instantiating your config and strategy directly, then call ainvoke with hand‑crafted TTCItem lists. Refer to the companion tests/ directory for reference tests on ThresholdSelector and BestOfNSelector.

Happy scaling!