
Copyright © 2026, NVIDIA Corporation.


Adaptive Cache Governor (ACG)

Use the Adaptive Cache Governor (ACG) when repeated LLM requests contain stable prompt sections that can benefit from provider prompt caching.

ACG decomposes LLM requests into Prompt IR, scores block stability across observed runs, and plans provider-specific prompt-cache breakpoints. The acg section is optional. Omit it to keep cache planning disabled.

plugins.toml Example

version = 1

[[components]]
kind = "adaptive"
enabled = true

[components.config]
version = 1
agent_id = "planner"

[components.config.state.backend]
kind = "in_memory"

[components.config.telemetry]
subscriber_name = "adaptive.telemetry"
learners = ["acg"]

[components.config.acg]
provider = "anthropic"
observation_window = 100
priority = 50

[components.config.acg.stability_thresholds]
stable_threshold = 0.95
semi_stable_threshold = 0.50
min_observations_for_full_confidence = 20

This configuration enables adaptive telemetry and configures ACG to plan cache breakpoints for Anthropic-style request surfaces after it has enough observed prompt samples.

Plugin Configuration

Use plugin configuration when the application should let NeMo Relay own the Adaptive Cache Governor (ACG) runtime lifecycle.

Python

import nemo_relay

adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
    agent_id="planner",
    state=nemo_relay.adaptive.StateConfig(
        backend=nemo_relay.adaptive.BackendSpec.in_memory(),
    ),
    telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
    acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
)

plugin_config = nemo_relay.plugin.PluginConfig(
    components=[nemo_relay.adaptive.ComponentSpec(adaptive_config)]
)

report = nemo_relay.plugin.validate(plugin_config)
if any(diagnostic["level"] == "error" for diagnostic in report["diagnostics"]):
    raise RuntimeError(report["diagnostics"])

await nemo_relay.plugin.initialize(plugin_config)
try:
    # Run instrumented application work here.
    pass
finally:
    nemo_relay.plugin.clear()

Manual API

Use the manual runtime API when an integration needs to own the adaptive lifecycle directly instead of activating the top-level plugin component.

Python

import nemo_relay

adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
    agent_id="planner",
    state=nemo_relay.adaptive.StateConfig(
        backend=nemo_relay.adaptive.BackendSpec.in_memory(),
    ),
    telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
    acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
)

runtime = nemo_relay.adaptive.AdaptiveRuntime(adaptive_config.to_dict())
await runtime.register()
try:
    # Run instrumented application work here.
    runtime.wait_for_idle()
finally:
    await runtime.shutdown()

Fields

| Field | Default | Notes |
| --- | --- | --- |
| provider | passthrough | passthrough, anthropic, or openai. |
| observation_window | 100 | Rolling Prompt IR sample window for stability analysis. |
| priority | 50 | LLM execution intercept priority. Lower values run earlier. |
| stability_thresholds.stable_threshold | 0.95 | Minimum effective score classified as stable. |
| stability_thresholds.semi_stable_threshold | 0.50 | Minimum effective score classified as semi-stable. |
| stability_thresholds.min_observations_for_full_confidence | 20 | Observation count required for full confidence. |

Use passthrough when you want ACG observations without provider-specific hint translation. Set provider to the backend API surface the agent actually calls when you are ready to apply cache planning.
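To build intuition for how the three stability thresholds interact, the following is a minimal sketch and not the NeMo Relay implementation. It assumes that confidence ramps linearly until min_observations_for_full_confidence is reached and that the effective score is the raw score multiplied by that confidence. The StabilityThresholds class, the effective_score and classify helpers, and the "unstable" label are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityThresholds:
    """Hypothetical container that mirrors the documented default values."""

    stable_threshold: float = 0.95
    semi_stable_threshold: float = 0.50
    min_observations_for_full_confidence: int = 20


def effective_score(raw_score: float, observations: int, t: StabilityThresholds) -> float:
    """ASSUMPTION: confidence grows linearly and is capped at 1.0."""
    confidence = min(1.0, observations / t.min_observations_for_full_confidence)
    return raw_score * confidence


def classify(score: float, t: StabilityThresholds) -> str:
    """Map an effective score to a class; 'unstable' is an illustrative label."""
    if score >= t.stable_threshold:
        return "stable"
    if score >= t.semi_stable_threshold:
        return "semi_stable"
    return "unstable"


thresholds = StabilityThresholds()
# A block that never changes (raw score 1.0), seen 5, 12, and 20 times.
for raw, seen in [(1.0, 5), (1.0, 12), (1.0, 20)]:
    score = effective_score(raw, seen, thresholds)
    print(seen, round(score, 2), classify(score, thresholds))
```

Under these assumptions, a perfectly repeated block is only treated as stable once the observation count reaches the full-confidence value, which is one plausible reason to lower min_observations_for_full_confidence for short-lived agents.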

Expected Output

When ACG is active, instrumented LLM calls still return the same application result. ACG records observations and, when enough stable prompt structure is available, emits adaptive diagnostics and cache-planning decisions through the adaptive runtime.

Provider-specific cache hints are useful only when the request surface supports them. Validate against representative LLM traffic before enabling ACG in production.

Common Validation Failures

  • provider is not one of passthrough, anthropic, or openai.
  • Stability thresholds are outside the supported numeric range.
  • ACG is enabled before the application emits managed LLM events.
  • The configured provider does not match the real model API surface.
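The first two failures can be caught before the configuration ever reaches NeMo Relay. The sketch below is a hypothetical pre-flight check over a plain dictionary shaped like the acg section. It is not part of the nemo_relay API, and it assumes the supported threshold range is 0.0 to 1.0, which this page does not state. The check_acg_section function and ALLOWED_PROVIDERS constant are illustrative names.

```python
# Provider values listed in the Fields table.
ALLOWED_PROVIDERS = {"passthrough", "anthropic", "openai"}


def check_acg_section(acg: dict) -> list:
    """Return a list of human-readable problems found in an acg section."""
    problems = []

    # The documented default provider is passthrough.
    provider = acg.get("provider", "passthrough")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(f"provider {provider!r} is not one of {sorted(ALLOWED_PROVIDERS)}")

    # ASSUMPTION: thresholds are fractions in the closed range [0.0, 1.0].
    thresholds = acg.get("stability_thresholds", {})
    for key in ("stable_threshold", "semi_stable_threshold"):
        value = thresholds.get(key)
        if value is not None and not (0.0 <= value <= 1.0):
            problems.append(f"{key}={value} is outside the assumed range 0.0 to 1.0")

    return problems


good = {"provider": "anthropic", "stability_thresholds": {"stable_threshold": 0.95}}
bad = {"provider": "gemini", "stability_thresholds": {"stable_threshold": 1.5}}

print(check_acg_section(good))
for problem in check_acg_section(bad):
    print(problem)
```

The last two failures depend on runtime behavior (whether managed LLM events are emitted and which API surface the agent calls), so a static check like this cannot detect them.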