UCC: Configure#

This guide provides detailed information on the USD Content Cache (UCC) component and its use in a self-hosted NVCF cluster.

UCC provides a content store running as an intermediary between object storage and Cloud Function clients. It caches USD (Universal Scene Description) content to accelerate scene loading and reduce bandwidth usage. UCC acts as a reverse proxy cache, intercepting requests for USD content from object storage sources like S3, Azure Blob Storage, or NVIDIA Omniverse Nucleus, and serving cached content when available.

Cached content reduces egress costs from cloud storage providers and improves scene load times. When content is requested, UCC first checks its local cache. On a cache hit, content is served directly from persistent storage. On a cache miss, UCC fetches the content from the upstream source, caches it locally, and serves it to the requesting client.

Base Configuration#

UCC requires a small amount of configuration before it can be installed. Create a file on your local machine called values.yaml. A base configuration is provided below.

values.yaml#
replicaCount: 3

image:
  pullSecrets:
    - name: ngc-container-pull

persistence:
  storageClassName: "gp3"
  volumes:
    - name: az
      path: /proxy_cache_az
      sizeGi: 256
      minFreeSizePercentage: 7
    - name: s3
      path: /proxy_cache_s3
      sizeGi: 256
      minFreeSizePercentage: 7
    - name: nucleus
      path: /proxy_cache_nucleus
      sizeGi: 256
      minFreeSizePercentage: 7

nginx:
  proxyCache:
    validity:
      "200": "1d"
      "206": "1d"
  backends:
    azure:
      include: true
      cacheTtl: 30
    s3:
      include: true
      cacheTtl: 30

metrics:
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: false

tls:
  enabled: false
Complete Configuration Reference#

The base configuration above covers the essential settings for most deployments. For advanced configuration options or to explore all available settings, refer to the complete values file below. This reference includes all configuration options available in the UCC Helm chart, including advanced settings for TLS, OpenTelemetry, and resource management.

Complete UCC values.yaml reference#
replicaCount: 3

image:
  registry: nvcr.io
  repository: nvidia/omniverse/usd-content-cache
  pullPolicy: IfNotPresent
  # Default tag is latest, often overridden by CI
  tag: "0.1.0"
  pullSecrets:
    # Name of the Kubernetes secret needed to pull from the GitLab registry.
    - name: gitlab-registry-secret

# Service configuration for accessing the cache.
# Port 14128 avoids issues with privileged port 443 (HTTPS), which non-root containers
# can't bind to, preventing permission errors.
service:
  ucc:
    type: ClusterIP
    port: 14128
    containerPort: 14128
    annotations: {}
    # Optionally specify a specific IP for the LoadBalancer.
    loadBalancerIP: null
  nucleus:
    # Nucleus service configuration for Large File Transfer (LFT) operations.
    type: ClusterIP
    port: 14129
    containerPort: 14129
    annotations: {}
    # Optionally specify a specific IP for the LoadBalancer.
    loadBalancerIP: null

# TLS configuration for HTTPS.
tls:
  # Enables HTTPS for the service.
  enabled: true
  # Name of the Kubernetes Secret containing tls.crt and tls.key.
  # For local Minikube testing, you'll need to create this secret:
  #   - openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=ucc.local.test/O=MyOrg"
  #   - kubectl create secret tls usd-content-cache-tls --key /tmp/tls.key --cert /tmp/tls.crt
  #   - rm /tmp/tls.key /tmp/tls.crt
  secretName: "usd-content-cache-tls"

# Persistent Volume Claim (PVC) configuration.
persistence:
  storageClassName: "managed-csi-premium"
  # Defines claims for the volumes declared in the "/docker/openresty.Dockerfile" file.
  # `name` is used as the logical key to link with the corresponding "nginx.proxyCache.paths" entry.
  volumes:
    - name: az
      path: /proxy_cache_az
      sizeGi: 50
      minFreeSizePercentage: 7
      # Override storage class for this volume.
      # storageClassName: null
    - name: s3
      path: /proxy_cache_s3
      sizeGi: 50
      minFreeSizePercentage: 7
    - name: nucleus
      path: /proxy_cache_nucleus
      sizeGi: 50
      minFreeSizePercentage: 7

# NGINX/OpenResty cache behavior settings.
nginx:
  workerConnections: 1024
  proxyCache:
    # List the HTTP codes and how long they should be cached.
    # https://nginx.org/en/docs/syntax.html
    validity:
      "200": "1d"
      "206": "1d"
    # List defining NGINX "proxy_cache_path" directives.
    # `name` is the NGINX keys_zone name and must match a `name` in "persistence.volumes".
    # `path` specifies the NGINX cache directory on disk and must reside within a "persistence.volumes" path.
    paths:
      - name: az
        path: /proxy_cache_az/ucc_data
        # maxSizeGi: 90
        maxIdleTime: 1d
        metadataMemorySize: 10m
      - name: s3
        path: /proxy_cache_s3/ucc_data
        maxIdleTime: 1d
        metadataMemorySize: 10m
      - name: nucleus
        path: /proxy_cache_nucleus/ucc_data
        maxIdleTime: 1d
        metadataMemorySize: 10m
  logging:
    buffer: 1m
    flushInterval: 10s
  sharedMemory:
    limits:
      # Shared memory limit for the presigned-URL cache.
      presignedUrlCache: "1024m"
      # Shared memory limit for the per-request-URI lock table.
      lockByRequestUriTable: "1024m"
  resolver:
    config: "local=on ipv6=off"
    timeout: 8s
  # List of backends supported.
  # The `proxyCacheName` must match a `name` in "nginx.proxyCache.paths.name".
  backends:
    azure:
      include: true
      allowCacheReset: false
      serverName: ~^(?<container_name>[^\.]+)\.blob\.core\.windows\.net$
      proxyCacheName: az
      proxyPass: $scheme://$host
      proxyAuthPass: $scheme://$host
      cacheTtl: 30
    s3:
      include: true
      allowCacheReset: false
      serverName: ~^[^.]+\.s3\.[^.]+\.amazonaws\.com$
      proxyCacheName: s3
      proxyPass: $scheme://$host
      proxyAuthPass: $scheme://$host
      cacheTtl: 30
    nucleus:
      allowCacheReset: true
      # Default server - catches any hostname not matched by other servers.
      serverName: _
      proxyCacheName: nucleus
      proxyPass: $scheme://$host
      proxyAuthPass: null
      # DNS resolver for Nucleus Bridge connectivity.
      # Each Nucleus Connectivity configuration is set up in DNS; reuse it to access the bridge.
      # Uses the global resolver if not specified.
      bridgeResolver: null
  lua:
    # When true, Lua scripts can be injected into the running containers using a configmap.
    #
    # To simplify the creation of the configmap, see the script at `scripts/debug/lua_configmap.sh`.
    # For more information, see: `./scripts/debug/lua_configmap.sh -h`
    # Example: `./scripts/debug/lua_configmap.sh -x -n namespace lua-access.lua`
    #    Note: include the '-x' option to execute the statement.
    debug: false
    # When true, Lua scripts are reloaded for each execution. This setting only takes
    # effect when 'debug' mode is also enabled.
    noCache: false
    # The name of the configmap to look for. Will be prefixed with the full name of the deployment.
    configMap: debug-lua
    # Maps keys in the configmap to file names that will be placed in `/etc/nginx/conf.d/lua/`.
    files: [lua-access.lua, lua-metrics.lua, lua-generation.lua]

# Resource requests and limits (define later for specific environments)
resources: {}
  # limits:
  #   cpu: 2000m
  #   memory: 4Gi
  # requests:
  #   cpu: 1000m
  #   memory: 2Gi

# Node selection, tolerations, affinity (define later if needed)
nodeSelector: {}
tolerations: []
affinity: {}

# Configuration options related to how Container Cache Metrics are configured.
metrics:
  # Creates a configmap entry that will contain the JSON for the UCC Grafana dashboard.
  # Useful in cases where `kube-prometheus-stack` is used for monitoring.
  dashboard:
    # When true, the configmap value will be created.
    include: false
    # Allows the configmap to be placed in a different namespace.
    namespace:
      override: false
      value: "default"
    # Additional labels for the configmap for the operator to pick up.
    labels:
      label: false
  # Size for storing cache metrics in memory.
  cacheMetricsStorageSize: 16m
  # Configure the histogram buckets used for metrics.
  # Must be a comma-separated list of positive numbers.
  buckets:
    byte: 1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000, 256000, 1048576, 10485760, 104857600, 1000000000, 10000000000, 100000000000
    time: 0.001, 0.002, 0.003, 0.005, 0.01, 0.02, 0.03, 0.05, 0.075, 0.1, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 10

  prometheus:
    # Enables a ServiceMonitor and Service to export Prometheus metrics.
    enabled: true
    port: 9145
    serviceMonitor:
      enabled: false
      interval: 5s
      path: /cache_metrics
      port: cache-metrics
      scheme: http
      scrapeTimeout: 5s

otel:
  # Enable or disable import of the otel module and collection of otel events.
  enabled: false
  # Additional headers to include in otel export.
  headers:
    #- header: X-API-Token
    #  value: "my-token-value"
  # Interval (in seconds) at which to report buffered batches of OTEL events.
  interval: 5
  # The number of items in a batch.
  batchSize: 512
  # The number of batches to collect. If the buffer runs out of space, new events
  # are discarded.
  batchCount: 4
  # The collector to send the otel events to.
  endpoint: localhost:4317
  # See `otel_trace_context` in https://nginx.org/en/docs/ngx_otel_module.html
  context: propagate
  # Settings for collection of a ratio of all events.
  ratio:
    # When true, a portion of all events are collected.
    enabled: false
    # The index on which to ratio.
    source: otel_trace_id
    # The percent of total requests to report.
    percent: 10
  # Controls an OpenTelemetry collector/operator deployment that can be used to
  # deploy sidecars in each pod. The sidecar collects metrics from the running service
  # and sends them to another OTEL collector.
  #
  # If enabled, this option will disable the ServiceMonitor.
  collector:
    # When enabled, `endpoint` must be `localhost:4317`.
    enabled: false
    batch:
    export:
      otlp:
        # Set this to the correct location of the collector in your cluster.
        endpoint: otel-collector.svc.svc.cluster.local:4317
        tls:
          insecure: true

1. Provision and Scale#

UCC replicas operate independently of each other and do not share cached data. Each pod maintains its own cache; the Kubernetes Service distributes client requests across the pods. Multiple replicas provide high availability and can handle higher request volumes.

Important

UCC is not designed to scale up and down dynamically. Set the replica count based on your expected workload and maintain it consistently.

The number of UCC pods is controlled through the replicaCount value:

values.yaml#
replicaCount: 3

Recommended Replica Counts:

  • Small deployments: 1-2 replicas

  • Typical deployments: 3 replicas (recommended baseline)

  • Large deployments: 3-5 replicas
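For a larger deployment, the replica count is typically set alongside explicit per-pod resources (the resources block is empty in the chart defaults; the figures below are illustrative examples, not tuned recommendations):

```yaml
# Illustrative values.yaml fragment for a large deployment.
# Resource figures are examples only; size them for your workload.
replicaCount: 5

resources:
  requests:
    cpu: 1000m
    memory: 2Gi
  limits:
    cpu: 2000m
    memory: 4Gi
```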

2. Storage Configuration#

UCC uses persistent volumes to cache USD content from upstream sources. Each configured backend (Azure Blob, S3, Nucleus) has its own persistent volume. Storage performance directly impacts cache hit performance and the speed at which cold assets can be served.

values.yaml#
persistence:
  storageClassName: "gp3"
  volumes:
    - name: az
      path: /proxy_cache_az
      sizeGi: 256
      minFreeSizePercentage: 7
    - name: s3
      path: /proxy_cache_s3
      sizeGi: 256
      minFreeSizePercentage: 7
    - name: nucleus
      path: /proxy_cache_nucleus
      sizeGi: 256
      minFreeSizePercentage: 7

storageClassName determines the performance characteristics of persistent volumes. Select a storage class that provides high IOPS and throughput suitable for cache workloads.

Storage Class Recommendations:

  • AWS (EKS): gp3 (recommended) or io1/io2 for higher performance

  • Azure (AKS): managed-csi-premium

  • On-Premises: Select a storage class backed by high-performance hardware

sizeGi determines the volume size for each cache backend. Plan cache size based on your workload:

  • Small workloads: 50-100GB per provider

  • Medium workloads: 100-500GB per provider

  • Large workloads: 500GB+ per provider

minFreeSizePercentage restricts cached content size by maintaining a minimum number of free bytes. The default value of 7 means the service will restrict cached content to 93% of the available capacity, leaving 7% free space for filesystem operations and preventing disk exhaustion.
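Putting these settings together, a medium workload might be sized as follows. The per-volume storageClassName override shown for the S3 volume follows the commented-out option in the complete reference; the class name used here is hypothetical:

```yaml
# Example sizing for a medium workload: 256 GiB per backend.
# With minFreeSizePercentage: 7, cached content on each volume is
# capped at 93% of its capacity.
persistence:
  storageClassName: "gp3"
  volumes:
    - name: az
      path: /proxy_cache_az
      sizeGi: 256
      minFreeSizePercentage: 7
    - name: s3
      path: /proxy_cache_s3
      sizeGi: 256
      minFreeSizePercentage: 7
      storageClassName: "io2-high-iops"  # per-volume override; class name is hypothetical
    - name: nucleus
      path: /proxy_cache_nucleus
      sizeGi: 256
      minFreeSizePercentage: 7
```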

Example Kubernetes StorageClass for AWS

If a StorageClass named gp3 does not already exist in your cluster, one can be created using the following configuration:

storageclass-gp3.yaml#
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
# gp3 volumes require the AWS EBS CSI driver; the legacy in-tree
# provisioner (kubernetes.io/aws-ebs) does not support the gp3 type.
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  iops: "5000"
  throughput: "1000"

Apply this StorageClass to your cluster:

kubectl apply -f storageclass-gp3.yaml

3. Cache Configuration#

UCC cache behavior is controlled through NGINX proxy cache settings. These settings determine how long content is cached, how cache metadata is stored, and when cached content expires.

values.yaml#
nginx:
  proxyCache:
    validity:
      "200": "1d"
      "206": "1d"
  backends:
    azure:
      include: true
      cacheTtl: 30
    s3:
      include: true
      cacheTtl: 30
Cache Validity (validity):

Controls how long content is cached based on HTTP status codes. When the upstream source returns the specified HTTP status code, the result is kept for the given time period. Default values cache successful responses (200, 206) for 1 day.
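For example, assuming the chart renders each validity entry into an NGINX proxy_cache_valid directive (durations use NGINX time syntax), longer retention for successful responses might look like this; the 404 entry is an illustrative assumption and should be verified against the chart before relying on it:

```yaml
# Example: keep successful responses for a week; optionally cache 404s
# briefly so a missing asset does not hit the upstream on every request.
nginx:
  proxyCache:
    validity:
      "200": "7d"
      "206": "7d"
      "404": "1m"  # assumed to be honored; verify with your chart version
```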

Backend Configuration:

UCC supports multiple upstream backends. Each backend can be enabled or disabled, and has specific routing rules:

  • Azure Blob (azure): Matches Azure Blob Storage URLs using regex pattern

  • S3 (s3): Matches Amazon S3 URLs using regex pattern

  • Nucleus (nucleus): Default backend that catches any hostname not matched by other servers

Each backend configuration includes:

  • include: Enable or disable the backend

  • cacheTtl: Cache time-to-live in seconds
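As a sketch, a deployment that only serves S3-backed content could disable the Azure backend and adjust the S3 TTL (the 60-second value is an example, not a recommendation):

```yaml
# Example: S3-only deployment. The nucleus backend remains the default
# catch-all server for unmatched hostnames.
nginx:
  backends:
    azure:
      include: false
    s3:
      include: true
      cacheTtl: 60  # cache time-to-live in seconds (example value)
```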

4. Telemetry#

UCC exports Prometheus metrics for monitoring cache performance, hit rates, and storage utilization. Collection of these metrics is important for diagnosing potential problems with UCC performance and optimizing cache configuration.

values.yaml#
metrics:
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: false

tls:
  enabled: false

Metrics Configuration:

  • prometheus.enabled: When true, creates a Kubernetes service that exports metrics in Prometheus format at /cache_metrics

  • prometheus.serviceMonitor.enabled: When true, creates a ServiceMonitor resource for Prometheus Operator integration

Important

The ServiceMonitor CRD must be installed in the cluster for ServiceMonitor resources to work.
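Once the CRD is available (typically installed by kube-prometheus-stack), the ServiceMonitor can be enabled with the keys shown in the complete reference above; the scrape cadence below is an example, not a required value:

```yaml
# Enable Prometheus Operator scraping of UCC cache metrics.
metrics:
  prometheus:
    enabled: true
    port: 9145
    serviceMonitor:
      enabled: true
      interval: 30s        # example scrape interval
      path: /cache_metrics
      port: cache-metrics
      scheme: http
      scrapeTimeout: 10s   # example timeout; must not exceed the interval
```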

Summary#

This guide covered the configuration options for UCC, including replica scaling, storage sizing, cache behavior, and monitoring setup. Proper configuration of these settings is essential for optimal UCC performance in your self-hosted NVCF cluster.

Once you have prepared your values.yaml file with the appropriate configuration, proceed to the UCC: Deployment guide to deploy UCC using Helm.

For TLS configuration instructions, refer to the UCC: TLS Configuration guide.