(self-managed-http-load-test)=
Self-Managed NVCF HTTP Load Test
Prerequisites
Self-hosted CLI
You need a working nvcf-cli configured against your self-managed cluster.
If you have not set this up yet, follow the {ref}`self-hosted-cli` guide to
install the binary and the {ref}`cli-configuration` section to point it at your
gateway.
Verify the CLI can reach the cluster before continuing:
Deploy the load test function
Use the load_tester_supreme container for load testing. It is purpose-built for high-throughput benchmarking and includes:
- gRPC + HTTP + SSE endpoints in a single image
- Tunable `repeats`, `delay`, and `size` fields to shape request/response profiles
- Built-in OpenTelemetry tracing
The source, build instructions, and registry push examples are in the nv-cloud-function-helpers repository. Build and push the image to whichever container registry your cluster has credentials for:
:::{tip}
To check which registries your cluster recognises, run
./nvcf-cli registry list.
:::
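As a rough sketch of the build-and-push step, assuming Docker is your build tool and you run it from the directory in the nv-cloud-function-helpers repository that contains the load_tester_supreme Dockerfile. The helper names `image_ref` and `build_and_push`, the registry host, and the tag are illustrative placeholders, not values defined by this guide.

```shell
# Compose a fully qualified image reference: <registry>/<name>:<tag>.
# A single trailing slash on the registry is tolerated.
image_ref() {
  printf '%s/%s:%s\n' "${1%/}" "$2" "$3"
}

# Build the image from the current directory and push it.
# Assumes you have already authenticated with `docker login` for that registry.
build_and_push() {
  ref="$(image_ref "$1" "$2" "$3")"
  docker build -t "$ref" . && docker push "$ref"
}

# Example (placeholder registry and tag):
#   build_and_push registry.example.com/my-team load_tester_supreme 0.1.0
```

Keeping the reference in one helper means the same string can be reused when you create the function.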
Then create the function and deploy it using the CLI:
Once deployed, note the following values; you will need them for the run script:
- Function ID: the UUID returned by `function create`
- Function Version ID: the UUID of the specific deployed version
- API key: from `./nvcf-cli api-key generate` (begins with `nvapi-`)
Obtain the gateway address
Your gateway address is the external address of the Envoy Gateway deployed with the control plane. To retrieve it:
On AWS EKS this is an ELB hostname (e.g.
a1b2c3d4.us-east-1.elb.amazonaws.com). For a local deployment (Kind,
k3d, Docker Desktop) it is typically localhost or 127.0.0.1.
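One possible way to look the address up with kubectl is sketched here. It assumes the gateway is exposed through a Kubernetes Gateway API `Gateway` resource whose status lists an address, which is how Envoy Gateway normally reports it; your control plane may expose it differently. The function name `gateway_address` is hypothetical.

```shell
# Print the first non-empty address reported by any Gateway resource in the cluster.
# Assumption: the Gateway API CRDs are installed and a Gateway has been programmed.
gateway_address() {
  kubectl get gateways.gateway.networking.k8s.io --all-namespaces \
    -o jsonpath='{range .items[*]}{.status.addresses[0].value}{"\n"}{end}' \
    | grep . | head -n 1
}

# Example:
#   GATEWAY_ADDR="$(gateway_address)"
```

If the cluster runs more than one Gateway, inspect the output of `kubectl get gateways.gateway.networking.k8s.io --all-namespaces` and pick the right one by name instead of taking the first.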
Clone the load test scripts
Install k6
Install k6 if you don’t have it:
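A small check along these lines can tell you whether k6 is already on your `PATH`. The `brew install k6` hint applies to macOS with Homebrew; for other platforms, follow the k6 installation documentation. The function name `check_k6` is illustrative.

```shell
# Print the installed k6 version, or a hint on stderr if k6 is not on PATH.
check_k6() {
  if command -v k6 >/dev/null 2>&1; then
    k6 version
  else
    echo "k6 not found; on macOS with Homebrew try: brew install k6" >&2
    return 1
  fi
}

# Example:
#   check_k6 || echo "install k6 before continuing"
```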
Create your run script
The run*.sh scripts are gitignored, so each user creates their own locally.
Create run_http_self_managed_test.sh in the load-tests directory:
Make it executable and run:
Tune the load
Virtual users (VUs)
Each VU simulates a single concurrent HTTP client, sending requests in a loop and holding the connection open while waiting for a response (long-polling). The number of VUs directly controls the concurrency hitting your endpoint.
Fixed VUs for a set duration (simplest approach):
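With plain k6, a fixed-VU run uses the `--vus` and `--duration` flags. The wrapper below is only a sketch: `run_fixed_vus` is a hypothetical helper, and the script path is a placeholder for whichever k6 script the load-tests directory provides. Your run script may expose these settings through its own variables instead.

```shell
# Run a k6 script with a fixed number of virtual users for a fixed duration.
# Usage: run_fixed_vus <vus> <duration> <script>
run_fixed_vus() {
  k6 run --vus "$1" --duration "$2" "$3"
}

# Example: 20 concurrent clients for 5 minutes (placeholder script path).
#   run_fixed_vus 20 5m ./your_http_test.js
```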
Ramping VUs with a config file (recommended for real load tests):
Example k6_rampup_config.json:
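A minimal sketch of such a file, written from the shell. It uses k6's standard `stages` option; the durations and targets are illustrative values rather than tuned recommendations, and the script path in the final comment is a placeholder.

```shell
# Write a three-stage profile: ramp up to 50 VUs, hold, then ramp down to 0.
cat > k6_rampup_config.json <<'EOF'
{
  "stages": [
    { "duration": "2m", "target": 50 },
    { "duration": "5m", "target": 50 },
    { "duration": "1m", "target": 0 }
  ]
}
EOF

# Plain k6 reads the file through --config:
#   k6 run --config k6_rampup_config.json ./your_http_test.js
```

Each stage moves the VU count linearly toward its `target` over its `duration`, so the hold phase is expressed as a stage whose target equals the previous one.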
Environment variables reference
Verifying your endpoint manually
Before starting a full run, verify that the endpoint works with a single curl request:
You should receive a 200 OK response with the Nvcf-Status: fulfilled
header.
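The check described above can be scripted roughly as follows. `check_fulfilled` is a hypothetical helper that reads response headers on stdin. In the usage comment, `INVOKE_URL` and `NVCF_API_KEY` are placeholders you must supply yourself, and the Bearer authorization header is an assumption about how your gateway expects the API key.

```shell
# Succeed only if the first status line is 200 and an Nvcf-Status header
# with the value "fulfilled" is present (header names are case-insensitive).
check_fulfilled() {
  headers="$(tr -d '\r')"   # strip carriage returns from the CRLF line endings
  printf '%s\n' "$headers" | head -n 1 | grep -Eq '^HTTP/[0-9.]+ 200( |$)' || return 1
  printf '%s\n' "$headers" | grep -Eiq '^nvcf-status:[[:space:]]*fulfilled[[:space:]]*$'
}

# Example (placeholders: INVOKE_URL, NVCF_API_KEY):
#   curl -s -o /dev/null -D - \
#     -H "Authorization: Bearer $NVCF_API_KEY" "$INVOKE_URL" \
#     | check_fulfilled && echo "endpoint OK"
```

The helper inspects only the first status line, so it does not follow redirects or interim responses.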