
# gRPC Function Invocation

gRPC invocation sends requests to functions in Cloud Functions that expose a
gRPC service. These functions are reached through the gRPC proxy rather than the
HTTP invocation route.

In self-hosted deployments, the gRPC route is exposed on the Gateway TCP
listener. See [Gateway Routing](/nvcf/dev/gateway-routing) for listener and DNS
configuration.

## Invocation Path

![gRPC invocation path](https://files.buildwithfern.com/nvidia-nvcf.docs.buildwithfern.com/nvcf/ff5ddc1c107d45608e711e4267a26e10c41f547f508399a2855cf3697df18010/_dot_dot_/docs/user/images/nvcf-grpc-invocation-path.svg)

The examples on this page read the following environment variables:

```bash
export GRPC_GATEWAY_ADDR=<grpc-gateway-address>
export FUNCTION_ID=<function-id>
export FUNCTION_VERSION_ID=<function-version-id>
export API_KEY=<api-key>
```

### Multi-Cluster View

In a global deployment, DNS selects a regional public gRPC endpoint. Each region
keeps its own gRPC Proxy, NVCF API and NATS stateful request path, worker CONNECT
registration, and customer gRPC service placement. The cross-cluster line in the
diagram represents NATS traffic that coordinates the regional stateful request
paths, when that coordination is configured.

![gRPC multi-cluster invocation path](https://files.buildwithfern.com/nvidia-nvcf.docs.buildwithfern.com/nvcf/e3961e719ae02993e974abcc2976daae7a84efa15585aa2ec2fc6da50787e76a/_dot_dot_/docs/user/images/nvcf-grpc-multicluster-invocation.svg)

## Metadata

Set these gRPC metadata values when invoking a function:

| Metadata key | Required | Description |
| --- | --- | --- |
| `authorization` | Yes | API key, formatted as `Bearer <api-key>`. You can also use gRPC call credentials. |
| `function-id` | Yes | Function ID to invoke. |
| `function-version-id` | No | Function version ID to target. |

The data sent to your gRPC function is defined by the Protobuf messages your
function implements. gRPC functions do not have an input request size limit.
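
As a minimal sketch of the metadata table above, the helper below assembles the
metadata list and adds `function-version-id` only when a version is supplied.
The name `build_metadata` is hypothetical and is not part of any NVCF SDK.

```python
from typing import List, Optional, Tuple


def build_metadata(
    function_id: str,
    api_key: str,
    function_version_id: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Build invocation metadata (hypothetical helper, not an NVCF API)."""
    if not function_id or not api_key:
        raise ValueError("function_id and api_key are required")

    metadata = [
        ("authorization", f"Bearer {api_key}"),
        ("function-id", function_id),
    ]
    # function-version-id is optional, so add it only when provided.
    if function_version_id:
        metadata.append(("function-version-id", function_version_id))
    return metadata
```

The returned list can be passed as the `metadata=` argument of a generated stub
method.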

Idle gRPC connections are kept open for 30 seconds. Close the gRPC client
connection as soon as your client is finished so function workers are not held
longer than needed.

## Python Example

This example uses a plaintext local or test gateway on port `10081`. For a
production TLS endpoint, use `grpc.secure_channel("grpc.<domain>:443",
grpc.ssl_channel_credentials())`.

```python
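import os

import grpc

# Generated gRPC stubs for the service your function exposes.
import grpc_service_pb2_grpc


def call_grpc(model_infer_request):
    # GRPC_GATEWAY_ADDR holds the host only; the plaintext port is appended here.
    target = f"{os.environ['GRPC_GATEWAY_ADDR']}:10081"

    metadata = [
        ("function-id", os.environ["FUNCTION_ID"]),
        # Optional: targets a specific function version.
        ("function-version-id", os.environ["FUNCTION_VERSION_ID"]),
        ("authorization", f"Bearer {os.environ['API_KEY']}"),
    ]

    # The context manager closes the channel when the call returns or raises,
    # so function workers are not held longer than needed.
    with grpc.insecure_channel(target) as channel:
        grpc_client = grpc_service_pb2_grpc.GRPCInferenceServiceStub(channel)
        return grpc_client.ModelInfer(model_infer_request, metadata=metadata)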
```

<Note>
The official gRPC term for authorization handling is
[call credentials](https://grpc.io/docs/guides/auth/#credential-types). The
example above sets the `authorization` metadata directly for clarity.

</Note>

## Connection Reuse and Streaming

The gRPC proxy pins sessions to the TCP connection to support unmodified gRPC
clients that ignore cookie headers. This matters when an intermediary proxy for
streaming, such as Kit streaming or Low Latency Streaming (LLS), uses HTTP/2 and
reuses connections.

![Single-client flow](https://files.buildwithfern.com/nvidia-nvcf.docs.buildwithfern.com/nvcf/7b7db5ef2d471152e50d5c8cd977263f7bb8274a5b0fe14e8e8e47843a942f26/_dot_dot_/docs/user/images/grpc-single-client.png)

![Reconnect flow](https://files.buildwithfern.com/nvidia-nvcf.docs.buildwithfern.com/nvcf/da2fc8fa735b915d7ec585707820fe528a631b392f3145645b0e0fe869fc6046/_dot_dot_/docs/user/images/grpc-reconnect-flow.png)

<Warning>
Do not pre-allocate streaming sessions with `POST` plus `X-NVCF-ABSORB` when a
shared HTTP/2 client can reuse one TCP connection across multiple users or
flows. Two separate requests sent over the same connection can receive the same
request ID from the proxy, which can bind different users or flows to the same
Kit pod.

Use on-demand binding through the WebSocket instead: establish the WebSocket,
obtain the request ID from the proxy, and use that ID for subsequent requests.

</Warning>
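
To illustrate the reasoning behind the warning, the toy model below imitates a
proxy that keys its session on the TCP connection. It is a simplified sketch
under stated assumptions, not the NVCF proxy's implementation: `ToyProxy`,
`request_id_for`, and the connection labels are all hypothetical.

```python
import itertools
from typing import Dict


class ToyProxy:
    """Toy stand-in for a proxy that pins one session per TCP connection."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._by_connection: Dict[str, str] = {}

    def request_id_for(self, connection: str) -> str:
        # The session is keyed on the connection, not on the user or flow,
        # so every request on the same connection maps to the same ID.
        if connection not in self._by_connection:
            self._by_connection[connection] = f"req-{next(self._ids)}"
        return self._by_connection[connection]


proxy = ToyProxy()

# A shared HTTP/2 client sends requests for two users over one connection.
alice_shared = proxy.request_id_for("conn-A")
bob_shared = proxy.request_id_for("conn-A")
print(alice_shared == bob_shared)  # True: both users map to one session

# A dedicated connection per flow keeps the sessions apart.
alice_own = proxy.request_id_for("conn-B")
bob_own = proxy.request_id_for("conn-C")
print(alice_own == bob_own)  # False
```

In this model, the collision comes entirely from connection reuse, which is why
binding a request ID per flow avoids it.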

For requirements and a sample intermediary proxy implementation, see
[Intermediary Proxy](/nvcf/dev/streaming-functions#intermediary-proxy).