
Copyright © 2026, NVIDIA Corporation.

Function Creation

This page describes how to create functions within Cloud Functions.

Functions can be created in one of two ways:

  1. Custom Container

    • Enables any container-based workload as long as the container exposes an inference endpoint and a health check.
    • You can use any server, e.g. PyTriton, FastAPI, or Triton.
    • See Container-Based Function Creation.
  2. Helm Chart

    • Enables orchestration across multiple containers, for complex use cases where a single container isn’t flexible enough.
    • Requires one “mini-service” container defined as the inference entry point for the function.
    • Does not support partial response reporting, gRPC or HTTP streaming-based invocation.
    • See Helm-Based Function Creation.
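
As a rough illustration of the custom-container requirement above, the following is a minimal sketch of a server that exposes an inference endpoint and a health check, using only the Python standard library. The paths `/health` and `/infer`, port 8000, and the `run_inference` placeholder are assumptions made for this example, not values defined by Cloud Functions; use whatever paths and port your function configuration expects.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical values chosen for this sketch only.
HEALTH_PATH = "/health"
INFERENCE_PATH = "/infer"
PORT = 8000


def run_inference(payload: dict) -> dict:
    """Placeholder model: returns the input text in upper case."""
    return {"output": str(payload.get("text", "")).upper()}


class FunctionHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        # Health check: answers 200 once the server can handle requests.
        if self.path == HEALTH_PATH:
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != INFERENCE_PATH:
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self._send_json(400, {"error": "expected a JSON object"})
            return
        self._send_json(200, run_inference(payload))

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def make_server(port: int = PORT) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("0.0.0.0", port), FunctionHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

In practice a framework such as FastAPI or PyTriton would replace the hand-written handler; the standard library is used here only to keep the sketch self-contained.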

Additionally, Cloud Functions supports Low Latency Streaming (LLS) functions for video, audio, and data streaming via WebRTC.

For LLM functions, see LLM Gateway for OpenAI-compatible model route configuration.

Invocation Types

  • Generic HTTP function invocation: Invoke HTTP functions through the standard invocation route.
  • gRPC function invocation: Invoke gRPC functions through the Gateway TCP listener.
  • LLM invocation: Invoke OpenAI-compatible LLM functions through llm.invocation.<domain>.
  • LLS/WebRTC client connection: Connect browser or proxy clients to streaming functions.

Best Practices

Container Versioning

  • Tag any resource that you deploy to production with an explicit version that follows a standard versioning convention, rather than “latest”.

    • During autoscaling, each additional instance of a function pulls the same specified container image and version. If the version is set to “latest” and the “latest” image is updated between scaling events, instances of the same function can end up running different images, which leads to undefined behavior.
  • Function versions are immutable: the container image and version cannot be updated for a function without creating a new version of the function.
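
One way to enforce the versioning advice above is to reject unpinned image references before they reach a deployment. The sketch below is an illustration rather than part of Cloud Functions: the helper names `image_tag` and `is_pinned` and the example registry names are hypothetical. It treats a reference as pinned when it carries a digest or an explicit tag other than “latest”.

```python
from typing import Optional


def image_tag(image_ref: str) -> Optional[str]:
    """Return the tag of an image reference, or None if no tag is present."""
    name = image_ref.split("@", 1)[0]   # drop any digest suffix
    last = name.rsplit("/", 1)[-1]      # skip registry host:port segments
    if ":" in last:
        return last.rsplit(":", 1)[1]
    return None


def is_pinned(image_ref: str) -> bool:
    """True if the reference names a digest or an explicit non-latest tag."""
    if "@sha256:" in image_ref:
        return True
    tag = image_tag(image_ref)
    # A missing tag is resolved as "latest" by container runtimes.
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    for ref in (
        "registry.example.com/team/model:1.4.2",
        "registry.example.com/team/model:latest",
        "registry.example.com/team/model",
    ):
        print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```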

Security

  • Do not run containers as the root user: Running containers as root is not supported in Cloud Functions. Always specify a non-root user in your Dockerfile using the USER instruction.
  • Use Kubernetes Secrets: For sensitive information like API keys, credentials, or tokens, use Kubernetes Secrets instead of environment variables. This provides better security and follows Kubernetes best practices for secret management.
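
The two points above can be backed by a small startup check in application code. This is a minimal sketch under stated assumptions: it assumes the secret is exposed to the container as files under a mounted directory (a common Kubernetes pattern), and the directory `/var/run/secrets/app`, the secret name `api-key`, and the helper names are hypothetical. How secrets reach your container depends on your function configuration.

```python
import os
from pathlib import Path

# Hypothetical mount point, used only for this sketch.
SECRETS_DIR = Path("/var/run/secrets/app")


def running_as_root() -> bool:
    """True if the process has effective UID 0 (POSIX only)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Read one secret value from a file in the mounted secrets directory."""
    # Reject empty names and anything containing a path separator.
    if not name or Path(name).name != name:
        raise ValueError(f"invalid secret name: {name!r}")
    return (secrets_dir / name).read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    if running_as_root():
        raise SystemExit("Refusing to start as root; set USER in the Dockerfile.")
    api_key = read_secret("api-key")  # hypothetical secret name
    print("Loaded API key of length", len(api_key))
```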

Available Container Variables

The following variables are auto-populated by Cloud Functions in the headers of the invocation message and are accessible within the container.

For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.

| Name | Description |
| --- | --- |
| NVCF-REQID | Request ID for this request. |
| NVCF-SUB | Message subject. |
| NVCF-NCAID | Function’s organization’s NCA ID. |
| NVCF-FUNCTION-NAME | Function name. |
| NVCF-FUNCTION-ID | Function ID. |
| NVCF-FUNCTION-VERSION-ID | Function version ID. |
| NVCF-LARGE-OUTPUT-DIR | Large output directory path. |
| NVCF-MAX-RESPONSE-SIZE-BYTES | Max response size in bytes for the function. |
| NVCF-NSPECTID | NVIDIA reserved variable. |
| NVCF-BACKEND | Backend or “Cluster Group” the function is deployed on. |
| NVCF-INSTANCETYPE | Instance type the function is deployed on. |
| NVCF-REGION | Region or zone the function is deployed in. |
| NVCF-ENV | Spot environment if deployed on spot instances. |
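
As a hedged illustration of reading these variables, the sketch below pulls the NVCF-* entries out of a generic header mapping. It is not the NVCF Container Helper Functions API: the function names `extract_nvcf_headers` and `request_context` are hypothetical, the sample values are invented, and it assumes your server framework exposes request headers as a string-to-string mapping.

```python
from typing import Mapping, Optional

NVCF_HEADER_PREFIX = "NVCF-"


def extract_nvcf_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return only the NVCF-* headers, keyed by upper-case name.

    HTTP header names are case-insensitive, so names are normalized.
    """
    return {
        name.upper(): value
        for name, value in headers.items()
        if name.upper().startswith(NVCF_HEADER_PREFIX)
    }


def request_context(headers: Mapping[str, str]) -> dict[str, Optional[object]]:
    """Collect a few commonly useful fields for logging or output handling."""
    nvcf = extract_nvcf_headers(headers)
    max_bytes = nvcf.get("NVCF-MAX-RESPONSE-SIZE-BYTES")
    return {
        "request_id": nvcf.get("NVCF-REQID"),
        "function_id": nvcf.get("NVCF-FUNCTION-ID"),
        "function_version_id": nvcf.get("NVCF-FUNCTION-VERSION-ID"),
        "large_output_dir": nvcf.get("NVCF-LARGE-OUTPUT-DIR"),
        "max_response_bytes": int(max_bytes) if max_bytes is not None else None,
    }


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    sample = {
        "Content-Type": "application/json",
        "nvcf-reqid": "req-123",
        "NVCF-MAX-RESPONSE-SIZE-BYTES": "5000000",
    }
    print(request_context(sample))
```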

Environment Variables

The following environment variables are automatically injected into your function containers when they are deployed and can be accessed using standard environment variable access methods in your application code:

| Name | Description |
| --- | --- |
| NVCF_BACKEND | Backend or “Cluster Group” the function is deployed on. |
| NVCF_ENV | Spot environment if deployed on spot instances. |
| NVCF_FUNCTION_ID | Function ID. |
| NVCF_FUNCTION_NAME | Function name. |
| NVCF_FUNCTION_VERSION_ID | Function version ID. |
| NVCF_INSTANCETYPE | Instance type the function is deployed on. |
| NVCF_NCA_ID | Function’s organization’s NCA ID. |
| NVCF_REGION | Region or zone the function is deployed in. |

All environment variables with the NVCF_* prefix are reserved and should not be overridden in your application code or function configuration.
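
A minimal sketch of both points, reading the injected variables and guarding against overriding the reserved prefix, might look like the following. The helper names `nvcf_environment` and `reserved_overrides` are hypothetical, and the sample configuration is invented; only the variable names come from the table above.

```python
import os
from typing import Mapping, Optional

RESERVED_PREFIX = "NVCF_"

# Variable names taken from the table above.
INJECTED_VARS = (
    "NVCF_BACKEND",
    "NVCF_ENV",
    "NVCF_FUNCTION_ID",
    "NVCF_FUNCTION_NAME",
    "NVCF_FUNCTION_VERSION_ID",
    "NVCF_INSTANCETYPE",
    "NVCF_NCA_ID",
    "NVCF_REGION",
)


def nvcf_environment(
    environ: Mapping[str, str] = os.environ,
) -> dict[str, Optional[str]]:
    """Return the injected variables; unset ones map to None."""
    return {name: environ.get(name) for name in INJECTED_VARS}


def reserved_overrides(user_env: Mapping[str, str]) -> list[str]:
    """List user-supplied variable names that collide with the reserved prefix."""
    return sorted(name for name in user_env if name.startswith(RESERVED_PREFIX))


if __name__ == "__main__":
    print(nvcf_environment())
    # Invented user configuration, for illustration only.
    user_config = {"LOG_LEVEL": "info", "NVCF_REGION": "custom"}
    clashes = reserved_overrides(user_config)
    if clashes:
        print("Do not set reserved variables:", ", ".join(clashes))
```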