Adaptive Cache Governor (ACG)
Use the Adaptive Cache Governor (ACG) when repeated LLM requests contain stable prompt sections that can benefit from provider prompt caching.
ACG decomposes LLM requests into Prompt IR, scores block stability across
observed runs, and plans provider-specific prompt-cache breakpoints. The acg
section is optional. Omit it to keep cache planning disabled.
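The idea of scoring block stability and then choosing a breakpoint can be illustrated with a small sketch. This is not ACG's implementation or API: the function names (`stability_scores`, `plan_breakpoint`), the hashing approach, the `threshold` and `min_samples` parameters, and the representation of a prompt as a list of text blocks are all assumptions made for illustration.

```python
from collections import Counter
from hashlib import sha256


def block_hash(text: str) -> str:
    """Fingerprint one prompt block so identical content compares equal."""
    return sha256(text.encode("utf-8")).hexdigest()


def stability_scores(runs: list[list[str]]) -> list[float]:
    """Score each block position by how often its most common content recurs.

    `runs` is a hypothetical stand-in for observed requests, each already
    split into ordered blocks (system prompt, tool list, user turn, ...).
    """
    if not runs:
        return []
    width = max(len(run) for run in runs)
    scores = []
    for pos in range(width):
        hashes = [block_hash(run[pos]) for run in runs if pos < len(run)]
        most_common_count = Counter(hashes).most_common(1)[0][1]
        # Divide by all runs so positions missing from some runs score lower.
        scores.append(most_common_count / len(runs))
    return scores


def plan_breakpoint(runs, threshold=0.9, min_samples=3):
    """Return how many leading blocks are stable enough to cache, or None.

    Prompt caches match on a prefix, so the scan stops at the first
    position whose score falls below the threshold.
    """
    if len(runs) < min_samples:
        return None  # not enough observations yet
    prefix = 0
    for score in stability_scores(runs):
        if score < threshold:
            break
        prefix += 1
    return prefix or None


observed = [
    ["system prompt", "tool definitions", "question one"],
    ["system prompt", "tool definitions", "question two"],
    ["system prompt", "tool definitions", "question three"],
]
print(plan_breakpoint(observed))
```

In this sketch the first two blocks are identical in every run and the third always differs, so the planned breakpoint falls after the second block. With fewer than `min_samples` runs the function returns `None`, mirroring the idea that planning waits for enough observed samples.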
plugins.toml Example
This configuration enables adaptive telemetry and configures ACG to plan cache breakpoints for Anthropic-style request surfaces after it has enough observed prompt samples.
Use plugin configuration when the application should let NeMo Relay own the Adaptive Cache Governor (ACG) runtime lifecycle.
Use the manual runtime API when an integration needs to own adaptive lifecycle directly instead of activating the top-level plugin component.
Use passthrough when you want ACG observations without provider-specific hint
translation. Set provider to the backend API surface the agent actually calls
when you are ready to apply cache planning.
When ACG is active, instrumented LLM calls still return the same application result. ACG records observations and, when enough stable prompt structure is available, emits adaptive diagnostics and cache-planning decisions through the adaptive runtime.
Provider-specific cache hints are useful only when the request surface supports them. Validate against representative LLM traffic before enabling ACG in production.
A provider value is invalid if it is not one of passthrough, anthropic, or openai.
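The allowed provider values can be checked before a configuration is handed to the runtime. The following is a minimal sketch, assuming the provider has already been read into a string; `validate_provider` and `ALLOWED_PROVIDERS` are hypothetical names for illustration and are not part of NeMo Relay.

```python
# Hypothetical pre-flight check; the three values come from the text above.
ALLOWED_PROVIDERS = ("passthrough", "anthropic", "openai")


def validate_provider(provider: str) -> str:
    """Return the provider unchanged, or raise if it is not an allowed value."""
    if provider not in ALLOWED_PROVIDERS:
        allowed = ", ".join(ALLOWED_PROVIDERS)
        raise ValueError(
            f"unsupported provider {provider!r}; expected one of: {allowed}"
        )
    return provider


validate_provider("passthrough")  # accepted: observations only, no hint translation
```

The comparison is deliberately exact. Whether the real configuration loader accepts other casings or aliases is not stated here, so the sketch does not normalize the input.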