> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/relay/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/relay/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/relay/_mcp/server.

# Adaptive Cache Governor (ACG)

Use the Adaptive Cache Governor (ACG) when repeated LLM requests contain stable
prompt sections that can benefit from provider prompt caching.

ACG decomposes LLM requests into Prompt IR, scores block stability across
observed runs, and plans provider-specific prompt-cache breakpoints. The `acg`
section is optional. Omit it to keep cache planning disabled.

## `plugins.toml` Example

```toml
version = 1

[[components]]
kind = "adaptive"
enabled = true

[components.config]
version = 1
agent_id = "planner"

[components.config.state.backend]
kind = "in_memory"

[components.config.telemetry]
subscriber_name = "adaptive.telemetry"
learners = ["acg"]

[components.config.acg]
provider = "anthropic"
observation_window = 100
priority = 50

[components.config.acg.stability_thresholds]
stable_threshold = 0.95
semi_stable_threshold = 0.50
min_observations_for_full_confidence = 20
```

This configuration enables adaptive telemetry and configures ACG to plan cache
breakpoints for Anthropic-style request surfaces after it has enough observed
prompt samples.

## Plugin Configuration

Use plugin configuration when the application should let NeMo Relay own the
Adaptive Cache Governor (ACG) runtime lifecycle.

```python
import nemo_relay

adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
    agent_id="planner",
    state=nemo_relay.adaptive.StateConfig(
        backend=nemo_relay.adaptive.BackendSpec.in_memory(),
    ),
    telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
    acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
)

plugin_config = nemo_relay.plugin.PluginConfig(
    components=[nemo_relay.adaptive.ComponentSpec(adaptive_config)]
)

report = nemo_relay.plugin.validate(plugin_config)
if any(diagnostic["level"] == "error" for diagnostic in report["diagnostics"]):
    raise RuntimeError(report["diagnostics"])

await nemo_relay.plugin.initialize(plugin_config)
try:
    # Run instrumented application work here.
    pass
finally:
    nemo_relay.plugin.clear()
```

```js
const adaptive = require("nemo-relay-node/adaptive");
const plugin = require("nemo-relay-node/plugin");

const adaptiveConfig = adaptive.defaultConfig();
adaptiveConfig.agent_id = "planner";
adaptiveConfig.state = { backend: adaptive.inMemoryBackend() };
adaptiveConfig.telemetry = adaptive.telemetryConfig({ learners: ["acg"] });
adaptiveConfig.acg = adaptive.acgConfig({ provider: "anthropic" });

const pluginConfig = plugin.defaultConfig();
pluginConfig.components = [adaptive.ComponentSpec(adaptiveConfig)];

const report = plugin.validate(pluginConfig);
if (report.diagnostics.some((diagnostic) => diagnostic.level === "error")) {
  throw new Error(JSON.stringify(report.diagnostics));
}

await plugin.initialize(pluginConfig);
try {
  // Run instrumented application work here.
} finally {
  plugin.clear();
}
```

```rust
use nemo_relay::plugin::{initialize_plugins, validate_plugin_config, PluginConfig};
use nemo_relay_adaptive::plugin_component::ComponentSpec;
use nemo_relay_adaptive::{
    AcgComponentConfig, AdaptiveConfig, BackendSpec, StateConfig, TelemetryComponentConfig,
};

let mut adaptive = AdaptiveConfig::default();
adaptive.agent_id = Some("planner".into());
adaptive.state = Some(StateConfig {
    backend: BackendSpec::in_memory(),
});
adaptive.telemetry = Some(TelemetryComponentConfig {
    learners: vec!["acg".into()],
    ..TelemetryComponentConfig::default()
});
adaptive.acg = Some(AcgComponentConfig {
    provider: "anthropic".into(),
    ..AcgComponentConfig::default()
});

let mut plugin_config = PluginConfig::default();
plugin_config.components.push(ComponentSpec::new(adaptive).into());

let report = validate_plugin_config(&plugin_config);
assert!(!report.has_errors());

let active = initialize_plugins(plugin_config).await?;
```

## Manual API

Use the manual runtime API when an integration needs to own adaptive lifecycle
directly instead of activating the top-level plugin component.

```python
import nemo_relay

adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
    agent_id="planner",
    state=nemo_relay.adaptive.StateConfig(
        backend=nemo_relay.adaptive.BackendSpec.in_memory(),
    ),
    telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
    acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
)

runtime = nemo_relay.adaptive.AdaptiveRuntime(adaptive_config.to_dict())
await runtime.register()
try:
    # Run instrumented application work here.
    runtime.wait_for_idle()
finally:
    await runtime.shutdown()
```

The Node.js binding exposes ACG through the adaptive plugin component helpers.
Use the Plugin Configuration example above when activating ACG from Node.js.

```rust
use nemo_relay_adaptive::{
    AcgComponentConfig, AdaptiveConfig, AdaptiveRuntime, BackendSpec, StateConfig,
    TelemetryComponentConfig,
};

let mut adaptive = AdaptiveConfig::default();
adaptive.agent_id = Some("planner".into());
adaptive.state = Some(StateConfig {
    backend: BackendSpec::in_memory(),
});
adaptive.telemetry = Some(TelemetryComponentConfig {
    learners: vec!["acg".into()],
    ..TelemetryComponentConfig::default()
});
adaptive.acg = Some(AcgComponentConfig {
    provider: "anthropic".into(),
    ..AcgComponentConfig::default()
});

let mut runtime = AdaptiveRuntime::new(adaptive).await?;
runtime.register().await?;

// Run instrumented application work here.

runtime.wait_for_idle();
runtime.shutdown().await?;
```

## Fields

| Field                                                       | Default       | Notes                                                       |
| ----------------------------------------------------------- | ------------- | ----------------------------------------------------------- |
| `provider`                                                  | `passthrough` | `passthrough`, `anthropic`, or `openai`.                    |
| `observation_window`                                        | `100`         | Rolling Prompt IR sample window for stability analysis.     |
| `priority`                                                  | `50`          | LLM execution intercept priority. Lower values run earlier. |
| `stability_thresholds.stable_threshold`                     | `0.95`        | Minimum effective score classified as stable.               |
| `stability_thresholds.semi_stable_threshold`                | `0.50`        | Minimum effective score classified as semi-stable.          |
| `stability_thresholds.min_observations_for_full_confidence` | `20`          | Observation count required for full confidence.             |

Use `passthrough` when you want ACG observations without provider-specific hint
translation. Set `provider` to the backend API surface the agent actually calls
when you are ready to apply cache planning.

## Expected Output

When ACG is active, instrumented LLM calls still return the same application
result. ACG records observations and, when enough stable prompt structure is
available, emits adaptive diagnostics and cache-planning decisions through the
adaptive runtime.

Provider-specific cache hints are useful only when the request surface supports
them. Validate against representative LLM traffic before enabling ACG in
production.

## Common Validation Failures

* `provider` is not one of `passthrough`, `anthropic`, or `openai`.
* Stability thresholds are outside the supported numeric range.
* ACG is enabled before the application emits managed LLM events.
* The configured provider does not match the real model API surface.