Adaptive Cache Governor (ACG)

Use the Adaptive Cache Governor (ACG) when repeated LLM requests contain stable prompt sections that can benefit from provider prompt caching.

ACG decomposes LLM requests into Prompt IR, scores block stability across observed runs, and plans provider-specific prompt-cache breakpoints. The `acg` section is optional; omit it to keep cache planning disabled.
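
To make the scoring step concrete, here is a minimal sketch of tracking block stability over a rolling window. It is not ACG's implementation: the `StabilityTracker` class, the hash-based fingerprint, and the score (the fraction of observed requests whose block matches the most recent one) are all assumptions chosen for illustration.

```python
import hashlib
from collections import deque


def block_fingerprint(text: str) -> str:
    """Return a short content hash for one prompt block."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


class StabilityTracker:
    """Hypothetical tracker: how often does each named block keep the same content?"""

    def __init__(self, observation_window: int = 100) -> None:
        # Rolling window of observations; the oldest entry drops out when full.
        self._window = deque(maxlen=observation_window)

    def observe(self, blocks: dict) -> None:
        """Record one request as a mapping of block name to content fingerprint."""
        self._window.append(
            {name: block_fingerprint(text) for name, text in blocks.items()}
        )

    def stability(self, name: str) -> float:
        """Fraction of observed requests whose block matches the latest one."""
        if not self._window:
            return 0.0
        latest = self._window[-1].get(name)
        if latest is None:
            return 0.0
        matches = sum(1 for obs in self._window if obs.get(name) == latest)
        return matches / len(self._window)


tracker = StabilityTracker(observation_window=100)
questions = ["What is 2 + 2?", "Summarize this file.", "List open tasks.", "Plan the sprint."]
for question in questions:
    tracker.observe({"system": "You are a planning agent.", "user": question})

print(tracker.stability("system"))  # 1.0: identical in all four requests
print(tracker.stability("user"))    # 0.25: only the latest request matches
```

In this sketch the system block is a natural cache candidate, while the user block changes on every request and would sit after any breakpoint.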

`plugins.toml` Example

    version = 1

    [[components]]
    kind = "adaptive"
    enabled = true

    [components.config]
    version = 1
    agent_id = "planner"

    [components.config.state.backend]
    kind = "in_memory"

    [components.config.telemetry]
    subscriber_name = "adaptive.telemetry"
    learners = ["acg"]

    [components.config.acg]
    provider = "anthropic"
    observation_window = 100
    priority = 50

    [components.config.acg.stability_thresholds]
    stable_threshold = 0.95
    semi_stable_threshold = 0.50
    min_observations_for_full_confidence = 20

This configuration enables adaptive telemetry and configures ACG to plan cache breakpoints for Anthropic-style request surfaces after it has enough observed prompt samples.

Plugin Configuration

Use plugin configuration when the application should let NeMo Relay own the Adaptive Cache Governor (ACG) runtime lifecycle.

Python

    import nemo_relay

    # Describe the adaptive component: in-memory state, ACG as the only learner.
    adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
        agent_id="planner",
        state=nemo_relay.adaptive.StateConfig(
            backend=nemo_relay.adaptive.BackendSpec.in_memory(),
        ),
        telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
        acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
    )

    plugin_config = nemo_relay.plugin.PluginConfig(
        components=[nemo_relay.adaptive.ComponentSpec(adaptive_config)]
    )

    # Validate first and fail fast on any error-level diagnostic.
    report = nemo_relay.plugin.validate(plugin_config)
    if any(diagnostic["level"] == "error" for diagnostic in report["diagnostics"]):
        raise RuntimeError(report["diagnostics"])

    # The await below must run inside a coroutine.
    await nemo_relay.plugin.initialize(plugin_config)
    try:
        # Run instrumented application work here.
        pass
    finally:
        nemo_relay.plugin.clear()

Manual API

Use the manual runtime API when an integration needs to own the adaptive lifecycle directly instead of activating the top-level plugin component.

Python

    import nemo_relay

    # Describe the adaptive component: in-memory state, ACG as the only learner.
    adaptive_config = nemo_relay.adaptive.AdaptiveConfig(
        agent_id="planner",
        state=nemo_relay.adaptive.StateConfig(
            backend=nemo_relay.adaptive.BackendSpec.in_memory(),
        ),
        telemetry=nemo_relay.adaptive.TelemetryConfig(learners=["acg"]),
        acg=nemo_relay.adaptive.AcgConfig(provider="anthropic"),
    )

    # Build the runtime directly from the config dictionary and register it.
    # The awaits below must run inside a coroutine.
    runtime = nemo_relay.adaptive.AdaptiveRuntime(adaptive_config.to_dict())
    await runtime.register()
    try:
        # Run instrumented application work here.
        runtime.wait_for_idle()
    finally:
        await runtime.shutdown()

Fields

| Field | Default | Notes |
| --- | --- | --- |
| `provider` | `passthrough` | `passthrough`, `anthropic`, or `openai`. |
| `observation_window` | `100` | Rolling Prompt IR sample window for stability analysis. |
| `priority` | `50` | LLM execution intercept priority. Lower values run earlier. |
| `stability_thresholds.stable_threshold` | `0.95` | Minimum effective score classified as stable. |
| `stability_thresholds.semi_stable_threshold` | `0.50` | Minimum effective score classified as semi-stable. |
| `stability_thresholds.min_observations_for_full_confidence` | `20` | Observation count required for full confidence. |

Use `passthrough` when you want ACG observations without provider-specific hint translation. When you are ready to apply cache planning, set `provider` to the backend API surface the agent actually calls.
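
The threshold fields can be read as a simple classifier over an effective score. The sketch below assumes one plausible reading, which this page does not confirm: the effective score is the raw stability score scaled linearly by a confidence factor that reaches 1.0 at `min_observations_for_full_confidence`. The `StabilityThresholds` dataclass, the `classify` helper, and the `"volatile"` label for everything below the semi-stable threshold are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityThresholds:
    # Defaults copied from the field table.
    stable_threshold: float = 0.95
    semi_stable_threshold: float = 0.50
    min_observations_for_full_confidence: int = 20


def effective_score(raw_score: float, observations: int, t: StabilityThresholds) -> float:
    """Assumed formula: scale the raw score by a linear confidence ramp."""
    confidence = min(1.0, observations / t.min_observations_for_full_confidence)
    return raw_score * confidence


def classify(raw_score: float, observations: int,
             t: StabilityThresholds = StabilityThresholds()) -> str:
    """Map an effective score onto the two documented classes plus a fallback."""
    score = effective_score(raw_score, observations, t)
    if score >= t.stable_threshold:
        return "stable"
    if score >= t.semi_stable_threshold:
        return "semi_stable"
    return "volatile"  # hypothetical label; the page names only the other two


print(classify(1.0, 5))   # volatile: 1.0 * (5 / 20) = 0.25
print(classify(1.0, 12))  # semi_stable: 1.0 * (12 / 20) = 0.6
print(classify(1.0, 20))  # stable: full confidence reached
print(classify(0.7, 40))  # semi_stable: confidence is capped at 1.0
```

Under this reading, a block that never changes is still not treated as stable until enough requests have been observed, which matches the page's note that planning starts only after enough prompt samples exist.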

Expected Output

When ACG is active, instrumented LLM calls still return the same application result. ACG records observations and, when enough stable prompt structure is available, emits adaptive diagnostics and cache-planning decisions through the adaptive runtime.

Provider-specific cache hints are useful only when the request surface supports them. Validate against representative LLM traffic before enabling ACG in production.
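
For orientation, this is roughly what a cache breakpoint looks like on an Anthropic-style request surface: a `cache_control` entry of type `ephemeral` on the last content block of the prefix to be cached. The `apply_cache_breakpoint` helper and its `stable_prefix_len` parameter are hypothetical and only illustrate the shape of the hint; how ACG chooses and emits breakpoints internally is not described on this page.

```python
import copy


def apply_cache_breakpoint(system_blocks: list, stable_prefix_len: int) -> list:
    """Return a copy of the blocks with a breakpoint closing the stable prefix.

    Hypothetical helper: marks the last block of the first
    `stable_prefix_len` blocks and leaves the input untouched.
    """
    blocks = copy.deepcopy(system_blocks)
    if 0 < stable_prefix_len <= len(blocks):
        blocks[stable_prefix_len - 1]["cache_control"] = {"type": "ephemeral"}
    return blocks


system_blocks = [
    {"type": "text", "text": "You are a planning agent."},
    {"type": "text", "text": "Tool catalogue: search, calendar."},
    {"type": "text", "text": "Session notes for this request."},
]

# Assume the first two blocks were observed to be stable.
planned = apply_cache_breakpoint(system_blocks, stable_prefix_len=2)
for block in planned:
    print(block)
```

Only the second block carries the marker, so everything up to and including it forms the cacheable prefix while the per-request notes stay outside it.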

Common Validation Failures

- `provider` is not one of `passthrough`, `anthropic`, or `openai`.
- Stability thresholds are outside the supported numeric range.
- ACG is enabled before the application emits managed LLM events.
- The configured provider does not match the real model API surface.
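
The first two failures can be caught before the configuration reaches the runtime. The following pre-flight check is a sketch, not part of NeMo Relay: `check_acg_section` is a hypothetical helper, the `[0, 1]` range for thresholds is an assumption (this page does not state the supported range), and so is the rule that the semi-stable threshold should not exceed the stable one. The last two failures depend on runtime behavior and cannot be checked statically this way.

```python
ALLOWED_PROVIDERS = {"passthrough", "anthropic", "openai"}


def check_acg_section(acg: dict) -> list:
    """Return human-readable problems found in an `acg` config mapping."""
    problems = []

    provider = acg.get("provider", "passthrough")
    if provider not in ALLOWED_PROVIDERS:
        problems.append(f"provider={provider!r} is not one of {sorted(ALLOWED_PROVIDERS)}")

    thresholds = acg.get("stability_thresholds", {})
    stable = thresholds.get("stable_threshold", 0.95)
    semi = thresholds.get("semi_stable_threshold", 0.50)
    for name, value in (("stable_threshold", stable), ("semi_stable_threshold", semi)):
        # Assumed range: scores are treated as fractions between 0 and 1.
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside the assumed range [0, 1]")
    if semi > stable:
        # Assumed ordering rule, not stated on this page.
        problems.append("semi_stable_threshold is greater than stable_threshold")

    return problems


good = check_acg_section({"provider": "anthropic"})
bad = check_acg_section(
    {"provider": "claude", "stability_thresholds": {"stable_threshold": 1.5}}
)
print(good)  # []
for problem in bad:
    print(problem)
```

The built-in validation report remains the authoritative check; a helper like this is only useful for failing earlier, for example in a configuration linter or CI step.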
