LLM Function Enablement
Enable the LLM addon before creating or invoking functions with
functionType: "LLM" through the LLM invocation route. The addon deploys the
LLM API Gateway and LLM request router, creates the external LLM invocation
route, and configures worker pods to use the stargate-client sidecar for
model-aware routing.
For LLM function payload shape and invocation examples, see Function Creation and LLM Gateway.
When to Enable
Enable the LLM addon when NVCF should route OpenAI-compatible requests by
function and model through llm.invocation.<domain>. The gateway extracts the
function ID from the OpenAI model field, applies LLM-specific validation and
rate limits, and sends the request through the LLM request router.
Standard HTTP, gRPC, and LLS functions do not require this addon, even when a
container exposes paths such as /v1/chat/completions, /v1/responses, or
/v1/embeddings.
When enabled, the stack creates:
llm-api-gatewayin thenvcfnamespace.llm-request-routerin thenvcfnamespace.- The
llm.invocation.<domain>HTTPRoute when Gateway API ingress is enabled. - LLM worker pods with a
stargate-clientsidecar that forwards requests to the function container on the configuredinferencePort.
Helmfile Configuration
Add the LLM addon block to your Helmfile environment file before applying the stack:
Use replicaCount: 1 for local or single-node test clusters. Use multiple
replicas for shared or production environments.
If you mirror images to a registry that does not use the stack’s default
global.image.registry and global.image.repository, override the
stargate-client sidecar image passed to generated LLM workers:
The LLM API Gateway and request router images are resolved from the same stack artifact registry settings as the other control plane services.
Local Plaintext Transport
Local development clusters commonly run the API gRPC endpoint and worker router tunnel without TLS. In that case, add both plaintext controls:
addons.llm.gateway.auth.grpcInsecure: true configures the LLM API Gateway to
talk to the local NVCF API over plaintext gRPC.
workload.stargateQUICInsecure: true configures generated LLM workers to pass
the insecure local QUIC setting to stargate-client.
Use these insecure settings only for local or isolated test clusters. Production environments should use TLS-capable service configuration instead.
Apply and Verify
Apply the updated control plane environment before creating LLM functions:
Apply or refresh the worker layer for each registered GPU cluster so the NVCA
operator receives agentConfig.mergeConfig:
Existing LLM function pods keep their current sidecar arguments. Recreate or redeploy those functions after refreshing the worker layer so new pods get the updated worker transport settings.
Verify the LLM control plane components:
After deploying an LLM function, verify the worker sidecar:
The function pod should include an llm-worker container using
stargate-client. For local plaintext clusters, the llm-worker args should
include --quic-insecure.
Troubleshooting
404 no_eligible_candidates from llm.invocation.<domain> means the request
reached the LLM Gateway, but the requested function or model was unknown or was
not registered on the selected request router. Similar 503 candidate errors
mean the router knows the target but has no active eligible backend. Check:
- The LLM function is deployed and its pod is
Running. - The request
modelvalue uses<function-id>/<model-name>. - The function’s
models[].namematches the model suffix in the request. models[].llmConfig.urisincludes the invoked path.- The
llm-workersidecar connected tollm-request-router. - Local clusters using plaintext transport include both
grpcInsecureandstargateQUICInsecure.
Useful logs:
In healthy routing, the request router logs show a reverse tunnel connection from the worker and at least one routing candidate for the requested function.