For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
  • About NVIDIA NeMo Relay
    • Overview
    • Architecture
    • Ecosystem
    • Concepts
    • Release Notes
  • Getting Started
    • Agent Runtime Primer
    • Prerequisites
    • Installation
    • Configuration / Setup
    • Quick Start
  • NVIDIA NeMo Relay CLI
    • About
    • Basic Usage
    • Claude Code
    • Codex
    • Cursor
    • Hermes Agent
  • Supported Integrations
    • About
    • OpenClaw Plugin Guide
    • LangChain Integration Guide
    • LangGraph Integration Guide
    • Deep Agents Integration Guide
  • Instrument Applications
    • About
    • Adding Scopes and Marks
    • Instrument a Tool Call
    • Instrument an LLM Call
    • Add Middleware
    • Code Examples
  • Observability Plugin
    • About
    • Configuration
    • Agent Trajectory Interchange Format (ATIF)
    • Agent Trajectory Observability Format (ATOF)
    • OpenTelemetry
    • OpenInference
  • Adaptive Plugin
    • About
    • Configuration
    • Adaptive Cache Governor (ACG)
    • Adaptive Hints
  • NeMo Guardrails Plugin
    • About
    • Configuration
  • Integrate into Frameworks
    • About
    • Adding Scopes
    • Wrap Tool Calls
    • Wrap LLM Calls
    • Handle Non-Serializable Data
    • Using Codecs
    • Provider Codecs
    • Provider Response Codecs
    • Code Examples
  • Build Plugins
    • About
    • Define a Plugin
    • Validate Plugin Configuration
    • Plugin Configuration Files
    • Register Plugin Behavior
    • Design Plugin Configuration
    • NeMo Guardrails Example Plugin
    • Code Examples
  • Contribute
    • About
    • Development Setup
    • Workflow and Reviews
    • Testing and Documentation
  • Reference
    • APIs
      • Python Library Reference
      • Node.js Library Reference
      • Rust Library Reference
        • nemo-relay
          • api
          • codec
            • anthropic
            • openai_chat
            • openai_responses
            • pricing
              • CacheReadAccounting
              • ModelPricing
              • PricingCatalog
              • PricingCatalogError
              • PricingConfig
              • PricingResolver
              • TokenRateTier
              • PricingSource
              • PricingSourceConfig
              • PricingUnit
              • PromptCachePricing
              • TokenPricingRates
              • RateScheduleApplication
              • active_pricing_resolver
              • TokenRateSchedule
              • attach_estimated_cost
              • attach_estimated_cost_for_provider
              • estimate_cost
              • estimate_cost_for_provider
              • estimate_cost_with_catalog
              • estimate_cost_with_provider
              • infer_model_provider
              • pricing_for_model
              • pricing_for_provider
              • reset_active_pricing_resolver
              • set_active_pricing_resolver
            • request
            • response
            • streaming
            • traits
          • config_editor
          • error
          • json
          • observability
          • plugin
          • plugins
          • stream
          • editor_config
        • nemo-relay-adaptive
        • nemo-relay-ffi
    • Performance
  • Resources
    • Support and FAQs
    • Glossary
    • Troubleshooting Guide
    • Community
    • Legal
NVIDIANVIDIA
Developer-friendly docs for your API
Privacy Policy | Your Privacy Choices | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2026, NVIDIA Corporation.

LogoLogo
On this page
  • Structs
  • Enums
  • Traits
  • Functions
ReferenceAPIsRust Library Referencenemo-relaycodec

Module pricing

||View as Markdown|
Previous

Struct OpenAI Responses Streaming Codec

Next

Enum Cache Read Accounting

Generated from cargo doc --no-deps -p nemo-relay -p nemo-relay-adaptive -p nemo-relay-ffi.

Data-driven LLM model pricing used to layer cost estimates onto usage.

Pricing is deliberately separate from response normalization so adding providers, aliases, or cache-accounting rules does not require editing AnnotatedLlmResponse.

Structs

  • ModelPricing: Per-token pricing for a model, expressed in USD per one million tokens.
  • PricingCatalog: Collection of model pricing entries.
  • PricingConfig: Runtime pricing resolver configuration.
  • PricingResolver: Ordered pricing lookup chain.
  • PromptCachePricing: Prompt-cache accounting rules for a model pricing entry.
  • TokenPricingRates: Token rates expressed as USD per one million tokens.
  • TokenRateTier: A token pricing tier selected by prompt/input token count.

Enums

  • CacheReadAccounting: How cache-read tokens relate to prompt token counts in provider usage.
  • PricingCatalogError: Errors produced while parsing or validating a pricing catalog.
  • PricingSourceConfig: Declarative pricing source supported by Relay configuration.
  • PricingUnit: Billing unit represented by a pricing entry.
  • RateScheduleApplication: How a selected rate-schedule tier applies to billable usage.
  • TokenRateSchedule: Data-driven token rate schedule for provider pricing with request thresholds.

Traits

  • PricingSource: Pluggable pricing source interface.

Functions

  • active_pricing_resolver: Returns the active process-wide pricing resolver.
  • attach_estimated_cost: Adds a model-pricing estimate to a normalized response when cost is missing.
  • attach_estimated_cost_for_provider: Adds a provider-aware model-pricing estimate to a normalized response when cost is missing.
  • estimate_cost: Estimates USD cost for a model/usage pair when pricing is known.
  • estimate_cost_for_provider: Estimates USD cost for a provider/model pair when pricing is known.
  • estimate_cost_with_catalog: Estimates USD cost using the provided catalog.
  • estimate_cost_with_provider: Estimates USD cost using the provided catalog and provider/model pair.
  • infer_model_provider: Infers a provider/route value for a decoded model.
  • pricing_for_model: Returns known pricing for a model ID.
  • pricing_for_provider: Returns known pricing for a provider/model pair.
  • reset_active_pricing_resolver: Restores the active process-wide pricing resolver to an empty resolver.
  • set_active_pricing_resolver: Replaces the active process-wide pricing resolver.