Self-hosted CLI
Self-hosted CLI
Self-hosted CLI
This page provides documentation for the NVCF Self-hosted CLI, a command-line interface for managing NVIDIA Cloud Functions in self-hosted deployments.
The NVCF Self-hosted CLI provides:
The CLI is available as a container image from NGC. See self-hosted-artifact-manifest for the full artifact path.
The CLI is available as a resource from NGC. See download-nvcf-cli for detailed download and extraction instructions.
The downloaded package includes:
nvcf-cli - The CLI binary.nvcf-cli.yaml.template - Configuration templateexamples/ - Sample configuration filesUSAGE-GUIDE.md - Detailed usage documentationThe CLI uses YAML configuration files. After extracting the CLI, copy the included template:
Configuration files are searched in this order:
--config flag (highest priority)./.nvcf-cli.yaml~/.nvcf-cli.yamlPlace your .nvcf-cli.yaml in the directory where you run the CLI for project-specific configuration, or in your home directory for global configuration.
For self-hosted deployments, the CLI must be configured to communicate with your gateway. The gateway uses hostname-based routing for HTTP services.
For a complete understanding of how the gateway routes traffic, including architecture diagrams, verification commands, and production DNS/HTTPS setup, see gateway-routing.
For one-click installs on a remote cluster, set up Gateway API ingress before
running self-hosted up. The command installs the control plane, then calls
the configured API, API Keys, invocation, and gRPC endpoints during health and
cluster registration phases.
Complete Gateway quickstart before you configure the CLI. The shared Gateway quickstart installs the Gateway API CRDs, creates and labels the required namespaces, installs Envoy Gateway, creates the GatewayClass and Gateway, waits for the Gateway to be programmed, and exports:
These are the same Gateway setup steps used by the one-click, Helmfile, and standalone install paths. Keep the exported values in your shell, then configure the CLI.
For test environments without production DNS, use the Gateway load balancer address as the stack domain:
For production environments, set STACK_DOMAIN to the DNS name that your
HTTPRoute hostnames use.
Create your configuration file:
Complete self-hosted configuration:
For test environments without production DNS, the URL fields and host fields can use the Gateway load balancer address:
After configuring the CLI, verify connectivity:
If you see a 404 error, verify:
api_keys_host value matches your HTTPRoute hostnamekubectl get pods -n api-keysWhy Host headers are needed: the Envoy Gateway uses hostname-based routing to
direct traffic to different backend services through a single load balancer.
Without the correct Host header, the gateway cannot match the request to a
route and returns 404.
gRPC does not need Host headers because it uses a dedicated TCP listener on port 10081. The gateway routes all traffic on that port directly to the gRPC service without hostname matching.
The Host header configuration above is designed for testing and development. For production deployments, configure proper DNS and TLS to eliminate the need for Host header overrides.
With proper DNS and HTTPS configured:
For complete instructions on setting up DNS records and TLS certificates, see production-dns-https in the Gateway Routing guide.
Use the --config flag to manage multiple environments with separate configuration files:
Each configuration maintains separate state files (e.g., ~/.nvcf-cli.dev.state for dev.yaml).
Enable debug mode for detailed logging by adding to your configuration file:
Or use the --debug flag or NVCF_DEBUG=true environment variable per-command.
For immediate testing, you can use load_tester_supreme from nvcf-onprem (see self-hosted-artifact-manifest), which supports the {"message": "hello world"} request body above. For more function samples, see the nv-cloud-function-helpers repository and function-creation for function creation documentation.
The CLI supports two types of authentication tokens:
Refresh your token while preserving function context:
Available scopes for API keys (all included by default):
Use these commands to install and inspect self-hosted NVCF deployments. For the full fresh-install walkthrough, see Quickstart.
For separate control-plane and GPU clusters, pass both kube contexts:
For a single cluster, omit both context flags.
All function create flags:
Example function JSON:
LLM functions use functionType: "LLM" and define model routing metadata under models[].llmConfig:
For LLM models, llmConfig.routingMethod accepts round_robin, power_of_two, groq_multiregion, pulsar, or random.
Supported LLM paths are /v1/chat/completions, /v1/responses, and /v1/embeddings.
llmConfig.tokenRateLimit accepts one or more comma-separated positive integer token limits in <value>-<unit> format. Supported units are S (seconds), M (minutes), H (hours), D (days), and W (weeks). Use 1000-S for a single limit, or 1000-S,5000-M,100000-H,500000-D,1000000-W for a combined limit with distinct units. Use JSON input for combined limits because inline CLI model specs use commas as field separators.
The function deploy command group manages deployments with the following subcommands:
Key function deploy create flags:
Example deployment JSON:
LLM model updates can also be provided in the input file:
Note: The CLI function invoke command detects LLM functions automatically.
For LLM functions, --model-name and --inference-url are required. The CLI uses the LLM invocation route and sets the OpenAI model value to <function-id>/<model-name>.
For LLM Gateway endpoint behavior, routing, and session stickiness details, see LLM Gateway.
For raw HTTP invocation, HTTP streaming, gRPC metadata, and invocation error behavior, see Generic HTTP Function Invocation and gRPC Function Invocation.
Additional function invoke flags:
Manage container registry credentials for function images and Helm charts. For comprehensive setup instructions including IAM configuration for AWS ECR, see third-party-registries-self-hosted.
401 Unauthorized on function creation:
403 Forbidden on invocation:
For additional troubleshooting, see self-hosted-troubleshooting.