Self-Managed NVCF gRPC Load Test
Prerequisites
Self-hosted CLI
You need a working nvcf-cli configured against your self-managed cluster.
If you have not set this up yet, follow the self-hosted-cli guide to
install the binary and the cli-configuration section to point it at your
gateway.
Verify the CLI can reach the cluster before continuing:
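For example, using the status subcommand this guide references later (this assumes `nvcf-cli` sits in your working directory):

```shell
# Any read-only call that round-trips through the gateway works here;
# a clean exit confirms the CLI can reach your cluster.
./nvcf-cli status
```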
Deploy the load test function
Use the `load_tester_supreme` container for load testing. It is purpose-built for high-throughput benchmarking and includes:
- gRPC + HTTP + SSE endpoints in a single image
- 500 gRPC worker threads by default (configurable via `WORKER_COUNT`), compared to 10 in the simpler `grpc_echo_sample`
- Tunable `repeats`, `delay`, and `size` fields to shape request/response profiles
- Built-in OpenTelemetry tracing
The `grpc_echo_sample` from the same repository shares the Echo/EchoMessage proto and will work for a quick smoke test, but its 10-thread pool will saturate under moderate concurrency. Always use `load_tester_supreme` for real load tests.
The source, build instructions, and registry push examples are in the nv-cloud-function-helpers repository. Build and push the image to whichever container registry your cluster has credentials for:
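For example (registry host and tag are placeholders; substitute your own):

```shell
# Build from the load_tester_supreme directory of nv-cloud-function-helpers.
docker build -t registry.example.com/load-tester-supreme:latest .

# Push to a registry your cluster has pull credentials for.
docker push registry.example.com/load-tester-supreme:latest
```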
To check which registries your cluster recognises, run `./nvcf-cli registry list`.
Then create the function and deploy it using the CLI:
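The exact flags vary by CLI build, so treat this as a hedged sketch: `function create` is the real subcommand, but every flag shown here is hypothetical. Check `./nvcf-cli function create --help` for the actual names.

```shell
# All flag names below are hypothetical; consult the CLI's own help output.
./nvcf-cli function create \
  --name grpc-load-test \
  --image registry.example.com/load-tester-supreme:latest
```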
Once deployed, note the following — you will need them for the run script:
- Function ID — the UUID returned by `function create`
- Function Version ID — the UUID of the specific deployed version
- gRPC endpoint — your gateway address on port 10081 (see below)
- API key — the key from `api-key generate` (begins with `nvapi-`)
Your gateway address is the external address of the Envoy Gateway deployed with the control plane. To retrieve it:
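One way to look it up with kubectl (the namespace is an assumption; adjust to wherever your Envoy Gateway service lives):

```shell
# The EXTERNAL-IP column of the gateway's LoadBalancer service
# is the address clients should use.
kubectl get svc -n envoy-gateway-system
```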
On AWS EKS this is an ELB hostname (e.g. `a1b2c3d4.us-east-1.elb.amazonaws.com`). For a local deployment (Kind, k3d, Docker Desktop) it is typically `localhost` or `127.0.0.1`.
The CLI saves the function and version IDs automatically. Run `./nvcf-cli status` to view them at any time.
Clone the load test scripts
Install k6
Install k6 if you don’t have it:
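For example, via Homebrew on macOS or the official container image:

```shell
# macOS
brew install k6

# any platform with Docker
docker pull grafana/k6
```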
Create your run script
The `run*.sh` scripts are gitignored, so each user creates their own locally.
Create `run_grpc_self_managed_test.sh` in the load-tests directory:
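A minimal sketch, written as a heredoc you can paste into a shell. The environment variable names and the k6 script filename (`grpc_test.js`) are assumptions; match them to what the k6 scripts in your checkout actually read:

```shell
# Write the run script (values in angle brackets are placeholders).
cat > run_grpc_self_managed_test.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Values noted after deployment.
export FUNCTION_ID="<function-uuid>"
export FUNCTION_VERSION_ID="<version-uuid>"
export GRPC_ENDPOINT="<gateway-address>:10081"
export API_KEY="<nvapi-key>"

# Script filename is an assumption; use the k6 script from the repo.
k6 run --vus "${VUS:-10}" --duration "${DURATION:-60s}" grpc_test.js
EOF
```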
Make it executable and run:
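That is, from the load-tests directory:

```shell
chmod +x run_grpc_self_managed_test.sh
./run_grpc_self_managed_test.sh
```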
Tune the load
Virtual users (VUs)
Each VU simulates a single concurrent client holding an open gRPC connection and sending requests in a loop. The number of VUs directly controls the concurrency hitting your endpoint.
Default control plane sizing: The default resource sizing that ships
with `nvcf-base` is designed to handle roughly 100 concurrent users. If you
need to test beyond that, you will need to scale the control plane components
first. Starting with `--vus 100` or the scratch config is a good baseline
for validating a default self-managed deployment.
Start low and increase gradually. If you see rising error rates or latency, you have found the saturation point.
Fixed VUs for a set duration (simplest approach):
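For example (the script filename is a placeholder for the k6 script in the load-tests directory):

```shell
# 50 concurrent clients, held for 10 minutes
k6 run --vus 50 --duration 10m grpc_test.js
```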
Ramping VUs with a config file (recommended for real load tests):
Config files let you gradually ramp users up, hold steady, and ramp down. This avoids slamming the endpoint all at once and gives more realistic results.
The `k6_long_scaling_test_config.json` ramps to 100 VUs over 5 minutes, holds
for 15 minutes, then steps through higher concurrency levels:
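Sketched as a k6 `ramping-vus` scenario; the stage durations and targets below are illustrative, following the description above:

```json
{
  "scenarios": {
    "long_scaling": {
      "executor": "ramping-vus",
      "startVUs": 0,
      "stages": [
        { "duration": "5m",  "target": 100 },
        { "duration": "15m", "target": 100 },
        { "duration": "10m", "target": 250 },
        { "duration": "10m", "target": 500 }
      ]
    }
  }
}
```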
The `k6_hammer_test_config.json` uses a `ramping-arrival-rate` executor that
ramps up to 100,000 requests/second — use this only when you want to push the
endpoint to its limit:
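The shape of a `ramping-arrival-rate` scenario looks roughly like this; every number except the 100,000 req/s ceiling is illustrative:

```json
{
  "scenarios": {
    "hammer": {
      "executor": "ramping-arrival-rate",
      "startRate": 100,
      "timeUnit": "1s",
      "preAllocatedVUs": 1000,
      "maxVUs": 10000,
      "stages": [
        { "duration": "5m", "target": 100000 }
      ]
    }
  }
}
```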
Other tuning parameters
Environment variables reference
Verifying your endpoint manually
Before running a load test, you can verify the endpoint works with grpcurl:
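A hedged sketch: the service and method names follow the Echo/EchoMessage proto mentioned earlier, but verify them (and any required auth headers) against the repo's proto file:

```shell
# Discover the exact service/method names if server reflection is enabled.
grpcurl -plaintext <gateway-address>:10081 list

# Send a single request (the field name is an assumption from the proto).
grpcurl -plaintext \
  -H "Authorization: Bearer <nvapi-key>" \
  -d '{"message": "hello"}' \
  <gateway-address>:10081 Echo/EchoMessage
```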