LLM Function Enablement | NVIDIA Cloud Functions

Enable the LLM addon before creating or invoking functions with functionType: "LLM" through the LLM invocation route. The addon deploys the LLM API Gateway and LLM request router, creates the external LLM invocation route, and configures worker pods to use the stargate-client sidecar for model-aware routing.

For LLM function payload shape and invocation examples, see Function Creation and LLM Gateway.

When to Enable

Enable the LLM addon when NVCF should route OpenAI-compatible requests by function and model through llm.invocation.<domain>. The gateway extracts the function ID from the OpenAI model field, applies LLM-specific validation and rate limits, and sends the request through the LLM request router.

Standard HTTP, gRPC, and LLS functions do not require this addon, even when a container exposes paths such as /v1/chat/completions, /v1/responses, or /v1/embeddings.

When enabled, the stack creates:

llm-api-gateway in the nvcf namespace.
llm-request-router in the nvcf namespace.
The llm.invocation.<domain> HTTPRoute when Gateway API ingress is enabled.
LLM worker pods with a stargate-client sidecar that forwards requests to the function container on the configured inferencePort.

Helmfile Configuration

Add the LLM addon block to your Helmfile environment file before applying the stack:

1 addons:
2   llm:
3     enabled: true
4     gateway:
5       replicaCount: 3
6     requestRouter:
7       replicaCount: 3

Use replicaCount: 1 for local or single-node test clusters. Use multiple replicas for shared or production environments.

If you mirror images to a registry that does not use the stack’s default global.image.registry and global.image.repository, override the stargate-client sidecar image passed to generated LLM workers:

1 api:
2   env:
3     NVCF_SIDECARS_LLM_ROUTER_CLIENT_IMAGE: <registry>/<repository>/stargate-client:0.2.0

The LLM API Gateway and request router images are resolved from the same stack artifact registry settings as the other control plane services.

Local Plaintext Transport

Local development clusters commonly run the API gRPC endpoint and worker router tunnel without TLS. In that case, add both plaintext controls:

1 addons:
2   llm:
3     enabled: true
4     gateway:
5       replicaCount: 1
6       auth:
7         grpcInsecure: true
8     requestRouter:
9       replicaCount: 1
10 
11 agentConfig:
12   mergeConfig: |
13     workload:
14       stargateQUICInsecure: true

addons.llm.gateway.auth.grpcInsecure: true configures the LLM API Gateway to talk to the local NVCF API over plaintext gRPC.

workload.stargateQUICInsecure: true configures generated LLM workers to pass the insecure local QUIC setting to stargate-client.

Use these insecure settings only for local or isolated test clusters. Production environments should use TLS-capable service configuration instead.

Apply and Verify

Apply the updated control plane environment before creating LLM functions:

$ make apply HELMFILE_ENV=<environment-name>

Apply or refresh the worker layer for each registered GPU cluster so the NVCA operator receives agentConfig.mergeConfig:

$ make install-nvca-operator HELMFILE_ENV=<environment-name> CLUSTER_NAME=<cluster-name>

Existing LLM function pods keep their current sidecar arguments. Recreate or redeploy those functions after refreshing the worker layer so new pods get the updated worker transport settings.

Verify the LLM control plane components:

$ kubectl get deploy -n nvcf llm-api-gateway llm-request-router
$ kubectl get pods -n nvcf | grep -E 'llm-api-gateway|llm-request-router'
$ kubectl get httproute -A | grep llm

After deploying an LLM function, verify the worker sidecar:

$ kubectl get pods -n nvcf-backend -L FUNCTION_ID
$ kubectl -n nvcf-backend get pod <function-pod> \
>   -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
$ kubectl -n nvcf-backend get pod <function-pod> \
>   -o jsonpath='{range .spec.containers[?(@.name=="llm-worker")].args[*]}{.}{"\n"}{end}'

The function pod should include an llm-worker container using stargate-client. For local plaintext clusters, the llm-worker args should include --quic-insecure.

Troubleshooting

404 no_eligible_candidates from llm.invocation.<domain> means the request reached the LLM Gateway, but the requested function or model was unknown or was not registered on the selected request router. Similar 503 candidate errors mean the router knows the target but has no active eligible backend. Check:

The LLM function is deployed and its pod is Running.
The request model value uses <function-id>/<model-name>.
The function’s models[].name matches the model suffix in the request.
models[].llmConfig.uris includes the invoked path.
The llm-worker sidecar connected to llm-request-router.
Local clusters using plaintext transport include both grpcInsecure and stargateQUICInsecure.

Useful logs:

$ kubectl logs -n nvcf deploy/llm-api-gateway --tail=100
$ kubectl logs -n nvcf deploy/llm-request-router --tail=100
$ kubectl logs -n nvcf-backend <function-pod> -c llm-worker --tail=100

In healthy routing, the request router logs show a reverse tunnel connection from the worker and at least one routing candidate for the requested function.