
Copyright © 2026, NVIDIA Corporation.


gRPC Function Invocation

gRPC invocation sends requests to functions in Cloud Functions that expose a gRPC service. These functions are reached through the gRPC proxy rather than the HTTP invocation route.

In self-hosted deployments, the gRPC route is exposed on the Gateway TCP listener. See Gateway Routing for listener and DNS configuration.

Invocation Path

[Figure: gRPC invocation path]

export GRPC_GATEWAY_ADDR=<grpc-gateway-address>
export FUNCTION_ID=<function-id>
export FUNCTION_VERSION_ID=<function-version-id>
export API_KEY=<api-key>
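
The Python example later on this page reads these variables from the environment. A minimal sketch of loading and validating them up front might look like the following. The `InvocationConfig` dataclass and `load_config` helper are hypothetical names used for illustration and are not part of Cloud Functions. `FUNCTION_VERSION_ID` is treated as optional here because the matching metadata key is not required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InvocationConfig:
    """Hypothetical container for the values exported above."""

    gateway_addr: str
    function_id: str
    api_key: str
    function_version_id: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> InvocationConfig:
    """Read the invocation settings, failing early if a required one is missing."""
    required = ("GRPC_GATEWAY_ADDR", "FUNCTION_ID", "API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return InvocationConfig(
        gateway_addr=env["GRPC_GATEWAY_ADDR"],
        function_id=env["FUNCTION_ID"],
        api_key=env["API_KEY"],
        # An empty or unset value means no specific version was requested.
        function_version_id=env.get("FUNCTION_VERSION_ID") or None,
    )
```

Failing before a channel is opened keeps configuration mistakes separate from network or authorization errors.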

Multi-Cluster View

In a global deployment, DNS selects a regional public gRPC endpoint. Each region keeps its own gRPC Proxy, NVCF API and NATS stateful request path, worker CONNECT registration, and customer gRPC service placement. In the diagram, the cross-cluster line represents the NATS traffic that coordinates the stateful request path between regions, when that coordination is configured.

[Figure: gRPC multi-cluster invocation path]

Metadata

Set these gRPC metadata values when invoking a function:

| Metadata key | Required | Description |
| --- | --- | --- |
| authorization | Yes | API key, formatted as Bearer <api-key>. You can also use gRPC call credentials. |
| function-id | Yes | Function ID to invoke. |
| function-version-id | No | Function version ID to target. |
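
As an illustration of the table above, the metadata can be assembled as a list of key/value tuples, which is the form the Python gRPC client accepts for the `metadata=` argument. The `build_metadata` helper is a hypothetical name. This sketch simply leaves out `function-version-id` when no version is given, since that key is optional.

```python
from typing import List, Optional, Tuple

Metadata = List[Tuple[str, str]]


def build_metadata(
    api_key: str,
    function_id: str,
    function_version_id: Optional[str] = None,
) -> Metadata:
    """Build the gRPC metadata for one invocation (keys must be lowercase)."""
    metadata: Metadata = [
        ("authorization", f"Bearer {api_key}"),
        ("function-id", function_id),
    ]
    # The version key is optional, so only send it when a version is targeted.
    if function_version_id:
        metadata.append(("function-version-id", function_version_id))
    return metadata


print(build_metadata("my-key", "my-function"))
# [('authorization', 'Bearer my-key'), ('function-id', 'my-function')]
```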

The data sent to your gRPC function is defined by the Protobuf messages your function implements. gRPC functions do not have an input request size limit.

gRPC connections stay alive for 30 seconds when idle. Close the gRPC client connection after your client is finished so function workers are not held longer than needed.
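
One way to make sure the channel is always closed, including when a call raises, is to scope it to a single unit of work. The sketch below assumes a hypothetical `with_channel` helper. By default it opens a plaintext channel with `grpc.insecure_channel`, and the `channel_factory` parameter exists only so that a different channel type (or a test double) can be supplied.

```python
from contextlib import closing
from typing import Any, Callable, Optional


def with_channel(
    target: str,
    call: Callable[[Any], Any],
    channel_factory: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Open a channel, run `call(channel)`, and close the channel afterwards."""
    if channel_factory is None:
        # Imported lazily so the helper can be exercised without grpcio installed.
        import grpc

        channel_factory = grpc.insecure_channel
    # closing() calls channel.close() on exit, whether `call` returns or raises.
    with closing(channel_factory(target)) as channel:
        return call(channel)
```

With a streaming RPC, the response iterator should be fully consumed inside `call`, because the channel is closed as soon as `call` returns.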

Python Example

This example uses a plaintext local or test gateway on port 10081. For a production TLS endpoint, use grpc.secure_channel("grpc.<domain>:443", grpc.ssl_channel_credentials()).

import os

import grpc

# Client stubs generated from the function's .proto file.
import grpc_service_pb2_grpc


def call_grpc(model_infer_request):
    # Plaintext channel to a local or test gateway on port 10081.
    channel = grpc.insecure_channel(f"{os.environ['GRPC_GATEWAY_ADDR']}:10081")
    try:
        grpc_client = grpc_service_pb2_grpc.GRPCInferenceServiceStub(channel)

        # Routing and authorization metadata sent with the call.
        metadata = [
            ("function-id", os.environ["FUNCTION_ID"]),
            ("function-version-id", os.environ["FUNCTION_VERSION_ID"]),
            ("authorization", f"Bearer {os.environ['API_KEY']}"),
        ]

        # Unary call; the response is handed back to the caller.
        return grpc_client.ModelInfer(model_infer_request, metadata=metadata)
    finally:
        # Close the channel even if the call fails, so workers are not held.
        channel.close()

The official gRPC term for authorization handling is call credentials. The example above sets the authorization metadata directly for clarity.
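
For the TLS endpoint, a sketch of the call-credentials approach could look like this. It uses `grpc.access_token_call_credentials`, which attaches an `authorization: Bearer <token>` header to each call, combined with `grpc.ssl_channel_credentials()` through `grpc.composite_channel_credentials`. The `tls_target` and `open_secure_channel` helpers are hypothetical names, and the `grpc.<domain>:443` address follows the pattern shown earlier on this page. In the Python client, call credentials are attached to secure channels, so this applies to the TLS endpoint rather than the plaintext test gateway.

```python
def tls_target(domain: str, port: int = 443) -> str:
    """Return the gateway address in the grpc.<domain>:443 form."""
    return f"grpc.{domain}:{port}"


def open_secure_channel(domain: str, api_key: str):
    """Open a TLS channel that sends the API key as call credentials."""
    # Imported lazily so tls_target() can be used without grpcio installed.
    import grpc

    channel_creds = grpc.ssl_channel_credentials()
    # Adds "authorization: Bearer <api_key>" to every call on the channel.
    call_creds = grpc.access_token_call_credentials(api_key)
    composite = grpc.composite_channel_credentials(channel_creds, call_creds)
    return grpc.secure_channel(tls_target(domain), composite)
```

The `function-id` key, and `function-version-id` if used, are still passed as per-call metadata, because call credentials here cover only the authorization header.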

Connection Reuse and Streaming

The gRPC proxy pins sessions to the TCP connection to support unmodified gRPC clients that ignore cookie headers. This matters when an intermediary proxy for streaming, such as Kit streaming or Low Latency Streaming (LLS), uses HTTP/2 and reuses connections.

[Figure: Single-client flow]

[Figure: Reconnect flow]

Do not pre-allocate streaming sessions with POST plus X-NVCF-ABSORB when a shared HTTP/2 client can reuse one TCP connection across multiple users or flows. Two separate requests sent over the same connection can receive the same request ID from the proxy, which can bind different users or flows to the same Kit pod.

Use on-demand binding through the WebSocket instead: establish the WebSocket, obtain the request ID from the proxy, and use that ID for subsequent requests.
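
The collision described above can be illustrated with a toy model. This is not the gRPC proxy's implementation: `ToyProxy`, the connection identifiers, and the `req-N` format are all hypothetical, and the model captures only the stated behavior that a session is pinned to a TCP connection. The second case assumes each flow is carried on its own connection, as it would be if every flow opened a separate WebSocket connection.

```python
import itertools
from typing import Dict


class ToyProxy:
    """Toy stand-in for a proxy that pins one session to each TCP connection."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._session_by_conn: Dict[str, str] = {}

    def request_id_for(self, conn_id: str) -> str:
        # The first request on a connection creates the session;
        # later requests on the same connection reuse it.
        if conn_id not in self._session_by_conn:
            self._session_by_conn[conn_id] = f"req-{next(self._ids)}"
        return self._session_by_conn[conn_id]


proxy = ToyProxy()

# Shared HTTP/2 client: both users' requests travel over one connection.
alice_shared = proxy.request_id_for("conn-shared")
bob_shared = proxy.request_id_for("conn-shared")

# One connection per flow: each flow is bound to its own session.
alice_own = proxy.request_id_for("conn-alice")
bob_own = proxy.request_id_for("conn-bob")

print(alice_shared == bob_shared)  # True: both users share one request ID
print(alice_own == bob_own)        # False: each flow has its own request ID
```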

For requirements and a sample intermediary proxy implementation, see Intermediary Proxy.