NVIDIA Cloud Functions

Vault (implemented as OpenBao) Metrics

Metric name | Metric type | Source | Description | Unit (where applicable) | Interesting Labels | Required Filters (where applicable)

For the full list of telemetry metrics that OpenBao exposes, see https://openbao.org/docs/internals/telemetry/metrics/.
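To fill in the Metric name and Metric type columns from a running server, one option is to read the metric types straight from a scrape. The snippet below is a minimal sketch that assumes the OpenBao metrics are available in the Prometheus text exposition format (upstream OpenBao documents a `/v1/sys/metrics?format=prometheus` endpoint for this, which we assume is enabled and reachable in your deployment). It only parses `# TYPE` lines. The function name `metric_types` and the sample metric names are hypothetical and are not part of NVCF or OpenBao.

```python
import re

# Matches Prometheus text-exposition TYPE comments, e.g. "# TYPE some_metric gauge".
TYPE_LINE = re.compile(r"^#\s*TYPE\s+(\S+)\s+(\S+)\s*$")


def metric_types(exposition: str) -> dict[str, str]:
    """Return a mapping of metric name -> metric type.

    Assumes `exposition` is Prometheus text-format output, for example the body
    returned by an OpenBao metrics scrape (hypothetical helper, not an NVCF API).
    """
    types: dict[str, str] = {}
    for line in exposition.splitlines():
        match = TYPE_LINE.match(line.strip())
        if match:
            name, kind = match.groups()
            types[name] = kind
    return types


if __name__ == "__main__":
    # Illustrative sample only: these metric names are hypothetical placeholders,
    # not metrics documented for OpenBao.
    sample = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{method="GET"} 12
# TYPE example_unsealed gauge
example_unsealed 1
"""
    for name, kind in sorted(metric_types(sample).items()):
        print(f"{name}\t{kind}")
```

Parsing only the `# TYPE` comments keeps the sketch independent of label sets and sample values, so the same helper works whether the scrape comes from `curl`, a Prometheus server, or a saved file.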


Copyright © 2024-2026, NVIDIA Corporation.

Last updated on Apr 23, 2026.