Generic HTTP Function Invocation
HTTP invocation executes an inference request against a deployed Cloud Functions function through the invocation service. Use this page for standard HTTP request and response workloads, multipart or binary-style payloads, and HTTP streaming with Server-Sent Events (SSE).
If you invoke a function without specifying a function version ID, and multiple versions are deployed, Cloud Functions can route the request to any deployed version for that function.
For gRPC functions, see gRPC Function Invocation.
For HTTP examples on this page, see HTTP Invocation and HTTP Streaming.
HTTP Invocation
HTTP invocation uses the invocation route exposed by your gateway. In self-hosted
deployments, requests usually go to the gateway load balancer and are routed by
the Host header.
Invoke a function endpoint by using the function ID as the wildcard subdomain in
the Host header.
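A minimal client-side sketch of this routing. The gateway address, function ID, wildcard domain, and `/v1/invoke` route below are all placeholders, not the platform's actual values:

```python
import json
import urllib.request

GATEWAY_URL = "http://203.0.113.10"                   # placeholder load-balancer address
FUNCTION_ID = "0123abcd-0000-0000-0000-000000000000"  # placeholder function ID
DOMAIN = "functions.example.com"                      # placeholder wildcard domain

def build_invocation_request(payload: dict) -> urllib.request.Request:
    """Build a request routed by Host header: <function-id>.<domain>."""
    req = urllib.request.Request(
        f"{GATEWAY_URL}/v1/invoke",                   # placeholder invocation route
        data=json.dumps(payload).encode(),
        method="POST",
    )
    # The Host header, not the URL, selects the target function at the gateway.
    req.add_header("Host", f"{FUNCTION_ID}.{DOMAIN}")
    req.add_header("Content-Type", "application/json")
    return req

req = build_invocation_request({"inputs": [1, 2, 3]})
print(req.get_header("Host"))
```

The request is constructed but not sent; pass it to `urllib.request.urlopen` (or an equivalent client) against your own gateway.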
With production DNS and TLS, the same request can use the DNS hostname directly.
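For illustration, the direct-hostname form of the same request might look like this; the hostname and route are placeholders:

```python
import urllib.request

# Placeholder hostname: <function-id>.<your-functions-domain>
FUNCTION_HOSTNAME = "0123abcd-0000-0000-0000-000000000000.functions.example.com"

req = urllib.request.Request(
    f"https://{FUNCTION_HOSTNAME}/v1/invoke",   # placeholder route
    data=b'{"inputs": [1, 2, 3]}',
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.full_url)
```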
Function routes preserve endpoint paths and query parameters.
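A quick sketch of a URL whose path and query string pass through to the function unchanged; the hostname and endpoint path are invented for illustration:

```python
from urllib.parse import urlencode, urlsplit

base = "https://0123abcd.functions.example.com"   # placeholder hostname
query = urlencode({"top_k": 5, "lang": "en"})
url = f"{base}/v1/models/summarize/infer?{query}" # placeholder endpoint path

parts = urlsplit(url)
print(parts.path)   # forwarded to the function unchanged
print(parts.query)  # forwarded to the function unchanged
```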
You can also send multipart or binary-style payloads to a custom function endpoint.
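A hand-rolled multipart builder illustrates the payload shape; the field name, file bytes, and boundary handling here are a sketch, not a platform requirement:

```python
import io
import uuid

def build_multipart(fields: dict) -> tuple:
    """Assemble a multipart/form-data body from name -> bytes fields."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        buf.write(value + b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())   # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart({"image": b"\x89PNG\r\n\x1a\n..."})
print(content_type)
```

Send `body` as the request payload with `Content-Type` set to the returned `content_type`.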
Cloud Functions uses HTTP/2 persistent connections. For best performance, keep client connections open until the client no longer needs to communicate with the server.
Size multipart and binary requests to stay within your gateway, load balancer, and function container limits.
HTTP Streaming
HTTP streaming lets a function return an event stream to the client. The client
uses the same invocation endpoint and sends Accept: text/event-stream.
Prerequisites
- A deployed Cloud Functions function.
- A function endpoint that can return Content-Type: text/event-stream.
- Familiarity with Server-Sent Events (SSE).
Client Request
The client initiates streaming by sending a request with the
Accept: text/event-stream header.
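A client-side sketch of such a request, with a placeholder endpoint and payload; the commented-out `urlopen` call shows where the stream would be read:

```python
import urllib.request

req = urllib.request.Request(
    "https://0123abcd.functions.example.com/v1/invoke",  # placeholder endpoint
    data=b'{"prompt": "hello"}',                          # placeholder payload
    headers={
        "Content-Type": "application/json",
        "Accept": "text/event-stream",   # asks the function to stream events
    },
    method="POST",
)

# Against a live endpoint, read the stream incrementally:
# with urllib.request.urlopen(req) as resp:
#     for raw_line in resp:
#         print(raw_line.decode().rstrip())
print(req.get_header("Accept"))
```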
If the inference container response includes Content-Type: text/event-stream,
Cloud Functions keeps the client connection open and forwards events from the
container response.
The worker reads events from the inference container for up to the global request timeout, or until the inference container closes the connection. Do not create an infinite event stream. If the client disconnects, the worker eventually times out and closes the request.
Streaming reduces latency by sending data as it becomes available, avoids polling, and lets the inference container decide whether a given response should be streamed.
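On the client side, an event stream can be consumed with a minimal parser. This sketch handles only `data:` fields and blank-line event boundaries; other SSE fields (`event:`, `id:`, `retry:`) are ignored:

```python
def iter_sse_events(lines):
    """Yield event payloads: join data: lines, flush on a blank line."""
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:                     # blank line ends the current event
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
        # event:, id:, retry: fields are ignored in this sketch

stream = ['data: {"token": "Hel"}\n', "\n", 'data: {"token": "lo"}\n', "\n"]
print(list(iter_sse_events(stream)))
# → ['{"token": "Hel"}', '{"token": "lo"}']
```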
For the streaming request sample on this page, see Client Request.
Statuses and Errors
Direct HTTP invocation returns the status, headers, and body produced by your inference container. If your container returns an error, clients receive the container status code and response body.
For consistent client handling, return JSON from your inference container and
set Content-Type: application/json.
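One way an inference container handler might shape such an error; the status code and field names are illustrative, not a platform contract:

```python
import json

def error_response(status: int, code: str, message: str):
    """Build a JSON error response (status, headers, body)."""
    body = json.dumps({"error": code, "message": message})
    headers = {"Content-Type": "application/json"}
    return status, headers, body

status, headers, body = error_response(
    422, "invalid_input", "field 'inputs' must be a non-empty array"
)
print(status, headers["Content-Type"])
print(body)
```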
Cloud Functions adds invocation headers such as nvcf-reqid and nvcf-status
when a request is accepted by the invocation service. Platform-generated errors
can still use the platform error response format. For platform API behavior, see
API.
Emit logs from your inference container so invocation failures can be diagnosed. See Observability and Troubleshooting for logging and debugging guidance.