Distributed Shader Cache (GXCache)
This guide provides detailed information on the Distributed Shader Cache (GXCache) component and its use in NVCF GPU clusters.
GXCache is a cloud-native, distributed shader cache for NVIDIA GPU workloads. It consists of the GXCache server and a mutating webhook that together optimize shader compilation and caching across your cluster.
GXCache improves rendering performance by caching compiled shaders, reducing compilation time for frequently used shaders and enabling faster scene loading and rendering.
Installation
Prerequisites
- A running Kubernetes cluster with kubectl access
- Helm >= 3.12
- Credentials for the registry where your NVCF charts and images are stored
Step 1. Authenticate Helm to your chart registry
Authenticate Helm to your OCI registry where the NVCF charts are stored:
Step 2. Create the namespace and image pull secret
Create an image pull secret so that pods can pull container images from your registry.
BYOC / NGC
Amazon ECR
Other Registry
The secret name nvcr-creds is referenced in the values file under
webhook.deployment.secret and service.deployment.imageSecret.
If you use a different secret name, update the values file to match.
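A mismatch between the secret you created and the names in the values file is easy to miss, so a small check can help. The following is a minimal sketch, assuming the values file has already been parsed into a nested dictionary. Only the two key paths come from this guide; the helper names and the example values are hypothetical.

```python
from typing import Any

# Dotted key paths named in this guide; the dict layout is otherwise assumed.
SECRET_KEY_PATHS = ("webhook.deployment.secret", "service.deployment.imageSecret")


def lookup(values: dict[str, Any], dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node: Any = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def mismatched_secret_refs(values: dict[str, Any], secret_name: str) -> list[str]:
    """Return the key paths whose value differs from the pull secret name."""
    return [path for path in SECRET_KEY_PATHS if lookup(values, path) != secret_name]


# Hypothetical values, as they might look after parsing values.yaml.
example_values = {
    "webhook": {"deployment": {"secret": "nvcr-creds"}},
    "service": {"deployment": {"imageSecret": "my-registry-creds"}},
}
print(mismatched_secret_refs(example_values, "nvcr-creds"))
# prints ['service.deployment.imageSecret']
```

An empty result means both references agree with the secret name you created.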
Step 3. Create a values file
Create a values.yaml using the complete example in Base Configuration below.
- BYOC users pulling directly from NGC can use the nvcr.io/nvidia/nvcf-byoc/ image paths without mirroring.
- Self-hosted users should replace the <your-registry>/<your-repo> placeholders with their mirrored registry path (see the Artifact Manifest in the self-hosted installation guide).
At minimum, configure the storage class for your environment (e.g., gp3 for AWS EKS)
and the node selector for your GPU nodes.
Step 4. Install the chart
BYOC (NGC Helm Repo)
Self-Hosted (OCI)
Step 5. Verify the installation
Base Configuration
The following is a complete example values.yaml for deploying GXCache. Copy this file and
adjust values for your environment. Each section is explained in detail below.
- BYOC users can use the nvcr.io/nvidia/nvcf-byoc/ paths directly.
- Self-hosted users should replace <your-registry>/<your-repo> with their mirrored registry path (see the Artifact Manifest in the self-hosted installation guide for source image paths).
Replace <your-registry>/<your-repo> with your registry path. BYOC users pulling
directly from NGC can use nvcr.io/nvidia/nvcf-byoc as the registry/repo path.
If your cluster does not use node labels, set nodeSelector to null.
Configuration Sections
Webhook Configuration
The GXCache webhook is responsible for intercepting and modifying pod specifications to enable shader caching:
Service Configuration
The GXCache service provides the shader cache functionality:
Node Selection
GXCache components are scheduled on nodes with appropriate labels to ensure they run on GPU compute nodes. Adjust the node selector based on your cluster’s node labeling scheme.
Key Management
GXCache uses NVIDIA’s Key Notary Service (KNS) for secure key management. For self-hosted deployments, use the in-cluster notary service:
Storage Configuration
GXCache requires persistent storage for caching shader data:
Metrics Configuration
GXCache supports Prometheus metrics for monitoring:
Architecture
GXCache Architecture
GXCache consists of two main components:
- GXCache Service: Provides the shader cache storage and retrieval functionality
- GXCache Webhook: Intercepts pod creation to inject shader cache configuration
Data Flow
- Pod Creation: Application pod is created with GPU requirements
- Webhook Interception: GXCache webhook intercepts pod creation
- Configuration Injection: Webhook injects shader cache configuration
- Shader Compilation: Application compiles shaders during runtime
- Cache Storage: Compiled shaders are stored in GXCache
- Cache Retrieval: Subsequent requests retrieve cached shaders
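The flow above can be pictured with a toy model. This is only an illustrative sketch, not how GXCache is implemented: the guide does not say what the webhook injects or how shaders are keyed, so the annotation key, the endpoint, the content-hash key, and the fake "compilation" step are all assumptions made for the example.

```python
import copy
import hashlib

# Hypothetical names: this annotation key and endpoint are placeholders,
# not values that GXCache is known to use.
CACHE_ANNOTATION = "example.com/shader-cache-endpoint"
CACHE_ENDPOINT = "gxcache.example.svc:8080"


def mutate_pod(pod: dict) -> dict:
    """Interception and injection: return a copy of the pod with cache config added."""
    patched = copy.deepcopy(pod)
    annotations = patched.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[CACHE_ANNOTATION] = CACHE_ENDPOINT
    return patched


class ShaderCache:
    """Compile on a miss, store the result, and serve it on later requests."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compile(self, source: str) -> bytes:
        # Key the entry by a hash of the shader source (an assumption).
        key = hashlib.sha256(source.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        binary = b"compiled:" + source.encode()  # stand-in for real compilation
        self._store[key] = binary
        return binary


pod = mutate_pod({"metadata": {"name": "renderer"}})
cache = ShaderCache()
cache.get_or_compile("void main() {}")  # first request: miss, compile, store
cache.get_or_compile("void main() {}")  # second request: served from cache
print(cache.hits, cache.misses)
# prints 1 1
```

The point of the model is the ordering: the pod is modified before it starts, the first request for a shader pays the compilation cost, and repeat requests do not.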
Performance Considerations
Storage Performance
For optimal shader cache performance:
- Storage Type: Use high-performance SSD storage (for example, gp3, io1, or io2 on AWS)
- Storage Size: Size based on expected shader cache usage (20 GB minimum)
- Access Mode: ReadWriteOnce is sufficient for most deployments
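These three recommendations can be turned into a quick pre-install check. The sketch below is hypothetical: the function and its parameters are not part of GXCache, and the set of storage classes contains only the examples named above, so a warning for another SSD-backed class is advisory rather than an error.

```python
SSD_CLASSES = {"gp3", "io1", "io2"}  # only the examples named in this section
MIN_SIZE_GB = 20                     # minimum size stated in this section


def storage_warnings(storage_class: str, size_gb: int, access_mode: str) -> list[str]:
    """Return advisory warnings for settings that differ from the guidance above."""
    warnings = []
    if storage_class not in SSD_CLASSES:
        warnings.append(
            f"storage class {storage_class!r} is not one of {sorted(SSD_CLASSES)}"
        )
    if size_gb < MIN_SIZE_GB:
        warnings.append(f"size {size_gb}GB is below the {MIN_SIZE_GB}GB minimum")
    if access_mode != "ReadWriteOnce":
        warnings.append(f"access mode {access_mode!r} differs from ReadWriteOnce")
    return warnings


print(storage_warnings("gp3", 50, "ReadWriteOnce"))
# prints []
```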
Network Configuration
GXCache should be deployed to minimize network latency:
- Deploy in the same availability zone as GPU nodes
- Use high-bandwidth network connections
- Consider dedicated network interfaces for cache traffic
Cache Size Planning
Plan cache size based on your workload:
- Small workloads: 20-50GB
- Medium workloads: 50-200GB
- Large workloads: 200GB+
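As a rough planning aid, the tiers above can be expressed as a small function. This is a sketch under stated assumptions: the ranges share their boundary values (50GB and 200GB), so the example assigns a boundary value to the lower tier, and the 25% headroom used when sizing the volume is an illustrative choice, not a figure from this guide.

```python
import math

MIN_VOLUME_GB = 20  # minimum size stated under Storage Performance


def workload_tier(expected_cache_gb: float) -> str:
    """Map expected shader cache usage to the tiers listed above."""
    if expected_cache_gb < 0:
        raise ValueError("expected cache size cannot be negative")
    if expected_cache_gb <= 50:   # boundary assigned to the lower tier (assumption)
        return "small"
    if expected_cache_gb <= 200:  # boundary assigned to the lower tier (assumption)
        return "medium"
    return "large"


def recommended_volume_gb(expected_cache_gb: float, headroom: float = 0.25) -> int:
    """Expected usage plus headroom, never below the 20GB minimum."""
    return max(MIN_VOLUME_GB, math.ceil(expected_cache_gb * (1 + headroom)))


print(workload_tier(120), recommended_volume_gb(120))
# prints medium 150
```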
Monitoring and Observability
Metrics
GXCache provides several key metrics:
- Cache hit ratio: Percentage of shader requests served from cache
- Cache size: Current size of cached shader data
- Request latency: Time to serve cached shaders
- Storage utilization: Disk space usage
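Two of these metrics are simple ratios, and it can help to see how they are derived from raw counts. The sketch below shows the arithmetic only. The function and parameter names are hypothetical, and this guide does not list the metric names that GXCache actually exports.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Percentage of shader requests served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return 0.0 if total == 0 else 100.0 * hits / total


def storage_utilization(used_bytes: int, capacity_bytes: int) -> float:
    """Percentage of the cache volume currently in use."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_bytes / capacity_bytes


print(cache_hit_ratio(900, 100))
# prints 90.0
```

A hit ratio that stays low while storage utilization is near 100% suggests the cache is too small for the workload, which ties into the Low Cache Hit Ratio checks in the troubleshooting section.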
Prometheus Integration
To enable Prometheus metrics:
- Install Prometheus Operator (if not already installed):
- Enable Metrics in GXCache:
- Access Prometheus:
Logging
GXCache logs include:
- Cache hit/miss events
- Shader compilation requests
- Error conditions and troubleshooting information
- Performance metrics
Troubleshooting
Common Issues
Webhook Not Working
- Verify webhook is running and healthy
- Check webhook configuration and secrets
- Ensure proper RBAC permissions
Cache Not Storing Shaders
- Verify GXCache service is running
- Check storage configuration and persistent volumes
- Review application logs for shader compilation errors
Low Cache Hit Ratio
- Review cache size configuration
- Check cache eviction policies
- Monitor storage performance
Storage Issues
- Verify storage class availability
- Check persistent volume claims
- Monitor disk space usage
Best Practices
Deployment
- Deploy GXCache before deploying GPU workloads
- Use high-performance storage for better cache performance
- Monitor cache performance and adjust configuration as needed
Configuration
- Size cache appropriately for your shader workload
- Use high-performance storage for better performance
- Enable metrics for monitoring and optimization
Security
- Use proper RBAC permissions for webhook and service
- Secure communication between components
- Regularly update GXCache images for security patches
Maintenance
- Monitor cache hit ratios and adjust cache size as needed
- Regularly review and clean up unused cached shaders
- Update GXCache images regularly for security patches
- Monitor storage usage and plan for growth