HTTP Load Testing

(self-managed-http-load-test)=

Self-Managed NVCF HTTP Load Test

Prerequisites

Self-hosted CLI

You need a working nvcf-cli configured against your self-managed cluster. If you have not set this up yet, follow the {ref}self-hosted-cli guide to install the binary and the {ref}cli-configuration section to point it at your gateway.

Verify the CLI can reach the cluster before continuing:

$./nvcf-cli init
$./nvcf-cli api-key generate

Deploy the load test function

Use the load_tester_supreme container for load testing. It is purpose-built for high-throughput benchmarking and includes:

  • gRPC + HTTP + SSE endpoints in a single image
  • Tunable repeats, delay, and size fields to shape request/response profiles
  • Built-in OpenTelemetry tracing

The source, build instructions, and registry push examples are in the nv-cloud-function-helpers repository. Build and push the image to whichever container registry your cluster has credentials for:

$git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
$cd nv-cloud-function-helpers/examples/function_samples/load_tester_supreme
$
$# Build
$docker build --platform linux/amd64 -t load_tester_supreme .
$
$# Tag and push (replace with your registry -- NGC, ECR, etc.)
$docker tag load_tester_supreme nvcr.io/<your-org>/load_tester_supreme:latest
$docker push nvcr.io/<your-org>/load_tester_supreme:latest

:::{tip}
To check which registries your cluster recognises, run ./nvcf-cli registry list.
:::

Then create the function and deploy it using the CLI:

$# Create the function (HTTP)
$./nvcf-cli function create \
> --name "load-tester-supreme" \
> --image "nvcr.io/<your-org>/load_tester_supreme:latest" \
> --inference-url "/echo" \
> --inference-port 8000 \
> --health-uri "/health" \
> --health-port 8000 \
> --health-timeout PT30S
$
$# Deploy (adjust GPU type and instance type for your cluster)
$./nvcf-cli function deploy create \
> --gpu H100 \
> --instance-type NCP.GPU.H100_1x \
> --min-instances 1 \
> --max-instances 1 \
> --function-id <function id> \
> --version-id <version id>
$
$# Generate an API key for invocations
$./nvcf-cli api-key generate

Once deployed, note the following — you will need them for the run script:

  • Function ID — the UUID returned by function create
  • Function Version ID — the UUID of the specific deployed version
  • API key — from ./nvcf-cli api-key generate (begins with nvapi-)

Obtain the gateway address

Your gateway address is the external address of the Envoy Gateway deployed with the control plane. To retrieve it:

$export GATEWAY_ADDR=$(kubectl get gateway nvcf-gateway -n envoy-gateway \
> -o jsonpath='{.status.addresses[0].value}')
$echo "Gateway Address: $GATEWAY_ADDR"

On AWS EKS this is an ELB hostname (e.g. a1b2c3d4.us-east-1.elb.amazonaws.com). For a local deployment (Kind, k3d, Docker Desktop) it is typically localhost or 127.0.0.1.
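
The run script later on this page derives two values from the gateway address: the invocation URL and the Host header. Below is a minimal sketch of that derivation, assuming the URL pattern used elsewhere on this page. The helper names `nvcf_invoke_url` and `nvcf_invoke_host` and the all-zero function ID are hypothetical placeholders, not part of the CLI.

```shell
# Hypothetical helpers: build the invocation URL and Host header value
# from a gateway address, following the URL pattern shown on this page.
nvcf_invoke_url() {
  # $1 = gateway address, $2 = function ID
  printf 'http://%s/v2/nvcf/pexec/functions/%s\n' "$1" "$2"
}

nvcf_invoke_host() {
  # $1 = gateway address
  printf 'invocation.%s\n' "$1"
}

# Example with placeholder values (not a real deployment).
nvcf_invoke_url "localhost" "00000000-0000-0000-0000-000000000000"
nvcf_invoke_host "localhost"
```

Printing both values before a run is a cheap way to catch an empty GATEWAY_ADDR, which would otherwise produce a URL such as `http:///v2/...`.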

Clone the load test scripts

$git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
$cd nv-cloud-function-helpers/examples/load-tests

Install k6

Install k6 if you don’t have it:

$# macOS
$brew install k6
$# Linux (Debian/Ubuntu)
$sudo gpg -k
$sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
> --keyserver hkp://keyserver.ubuntu.com:80 \
> --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
$echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
> | sudo tee /etc/apt/sources.list.d/k6.list
$sudo apt-get update && sudo apt-get install k6

Create your run script

The run*.sh scripts are gitignored, so each user creates their own locally. Create run_http_self_managed_test.sh in the load-tests directory:

$#!/bin/bash
$
$set -e
$
$export GATEWAY_ADDR=<your-gateway-address>
$export TOKEN=<your-nvapi-key>
$
$export HTTP_SUPREME_NVCF_URL="http://${GATEWAY_ADDR}/v2/nvcf/pexec/functions/<your-function-id>"
$export INVOKE_HOST="invocation.${GATEWAY_ADDR}"
$export SENT_MESSAGE_SIZE=32
$export RESPONSE_COUNT=1
$
$k6 run functions/supreme_http_test.js \
> --vus 10 --duration 60s \
> -e TOKEN=${TOKEN} \
> -e HTTP_SUPREME_NVCF_URL=${HTTP_SUPREME_NVCF_URL} \
> -e INVOKE_HOST=${INVOKE_HOST} \
> -e SENT_MESSAGE_SIZE=${SENT_MESSAGE_SIZE} \
> -e RESPONSE_COUNT=${RESPONSE_COUNT}

Make it executable and run:

$chmod +x run_http_self_managed_test.sh
$./run_http_self_managed_test.sh

Tune the load

Virtual users (VUs)

Each VU simulates a single concurrent HTTP client, sending requests in a loop and holding the connection open while waiting for a response (long-polling). The number of VUs directly controls the concurrency hitting your endpoint.

| VUs | Simulates |
| --- | --- |
| 1-5 | Smoke test: verify the endpoint works under minimal load |
| 10-50 | Light load: a small team or service calling the function |
| 100-500 | Moderate load: multiple services or a rollout with real traffic |
| 1000+ | Stress test: find the breaking point or max throughput |
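
Because each VU keeps at most one request in flight, a rough estimate of the request rate follows from Little's law: requests per second is approximately the number of VUs divided by the average response time in seconds. The sketch below does that arithmetic. It assumes the test script adds no sleep between iterations, which this page does not confirm, and the helper name `estimate_rps` is hypothetical.

```shell
# Hypothetical helper: estimate requests/second for a closed-loop test
# in which every VU sends its next request as soon as the last one returns.
# $1 = number of VUs, $2 = average response time in milliseconds
estimate_rps() {
  LC_ALL=C awk -v vus="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", vus * 1000 / ms }'
}

estimate_rps 10 200    # 10 VUs, 200 ms average response time
estimate_rps 200 500   # 200 VUs, 500 ms average response time
```

Treat the result as a planning figure only. Measured throughput will be lower once the function saturates and response times grow.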

Fixed VUs for a set duration (simplest approach):

$# 10 concurrent users for 1 minute
$k6 run functions/supreme_http_test.js --vus 10 --duration 60s ...
$
$# 200 concurrent users for 10 minutes
$k6 run functions/supreme_http_test.js --vus 200 --duration 10m ...

Ramping VUs with a config file (recommended for real load tests):

Example k6_rampup_config.json:

{
  "cloud": {
    "projectID": 3695020
  },
  "scenarios": {
    "rampup_scenario": {
      "executor": "ramping-vus",
      "startVUs": 0,
      "gracefulRampDown": "30s",
      "gracefulStop": "30s",
      "stages": [
        { "duration": "1m", "target": 5 },
        { "duration": "2m", "target": 5 },
        { "duration": "1m", "target": 25 },
        { "duration": "2m", "target": 25 },
        { "duration": "1m", "target": 100 },
        { "duration": "2m", "target": 100 },
        { "duration": "1m", "target": 500 },
        { "duration": "2m", "target": 500 },
        { "duration": "1m", "target": 1000 },
        { "duration": "2m", "target": 1000 },
        { "duration": "1m", "target": 0 }
      ]
    }
  }
}

Pass the config file to k6 with --config:

$k6 run functions/supreme_http_test.js \
> --config k6_rampup_config.json \
> -e TOKEN=${TOKEN} \
> -e HTTP_SUPREME_NVCF_URL=${HTTP_SUPREME_NVCF_URL} \
> -e INVOKE_HOST=${INVOKE_HOST} \
> -e SENT_MESSAGE_SIZE=${SENT_MESSAGE_SIZE} \
> -e RESPONSE_COUNT=${RESPONSE_COUNT}
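
When planning a ramped run, it helps to know the total wall-clock time of the stages. The sketch below sums the stage durations in a config file. It assumes every stage duration is written in whole minutes (such as "2m"), as in the example config on this page, and the function name `total_stage_minutes` is hypothetical. For mixed units, a real JSON parser would be a better fit.

```shell
# Hypothetical helper: add up all "duration": "<N>m" values in a k6 config.
# Only whole-minute durations are counted; other keys and units are ignored.
total_stage_minutes() {
  grep -oE '"duration"[[:space:]]*:[[:space:]]*"[0-9]+m"' "$1" \
    | grep -oE '[0-9]+' \
    | awk '{ total += $1 } END { print total + 0 }'
}

# Example: a two-stage config written to a temporary file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{ "scenarios": { "demo": { "executor": "ramping-vus", "stages": [
  { "duration": "1m", "target": 5 },
  { "duration": "2m", "target": 5 }
] } } }
EOF
total_stage_minutes "$cfg"
rm -f "$cfg"
```

Applied to the eleven stages in k6_rampup_config.json, the same sum comes to 16 minutes, not counting the graceful stop period.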

Environment variables reference

| Variable | Purpose |
| --- | --- |
| TOKEN | Your nvapi-* bearer token from ./nvcf-cli api-key generate |
| HTTP_SUPREME_NVCF_URL | HTTP URL: http://<gateway-addr>/v2/nvcf/pexec/functions/<function-id> |
| INVOKE_HOST | Host header value: invocation.<gateway-addr> |
| SENT_MESSAGE_SIZE | Size of the test payload in bytes |
| RESPONSE_COUNT | Number of responses the server should return |
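
A missing variable usually surfaces only as a failed request partway into a run, so a preflight check in the run script can save time. The following is a minimal sketch, not part of the load test repository. The function name `require_vars` is hypothetical.

```shell
# Hypothetical preflight check: return non-zero if any named variable
# is unset or empty, and list the missing names on stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before calling k6:
# require_vars TOKEN HTTP_SUPREME_NVCF_URL INVOKE_HOST \
#   SENT_MESSAGE_SIZE RESPONSE_COUNT || exit 1
```

The check only confirms that each variable is non-empty. It does not validate the token or the URL.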

Verifying your endpoint manually

Before starting a load test, confirm that the endpoint responds to a single request with curl:

$curl -v -X POST \
> http://$GATEWAY_ADDR/v2/nvcf/pexec/functions/<your-function-id> \
> -H "Content-Type: application/json" \
> -H "Authorization: Bearer $TOKEN" \
> -H "Host: invocation.$GATEWAY_ADDR" \
> -H "Nvcf-Poll-Seconds: 5" \
> -d '{"message": "hello", "repeats": 1}'

You should receive a 200 OK response with the Nvcf-Status: fulfilled header.
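
To script this check, the response headers can be saved with curl's -D option and inspected afterwards. The sketch below assumes a single header block (no redirects or interim responses). The helper name `response_fulfilled` is hypothetical.

```shell
# Hypothetical helper: inspect a header dump written by `curl -D <file>`
# and succeed only if the status is 200 and Nvcf-Status is "fulfilled".
response_fulfilled() {
  # Strip carriage returns so line-end anchors match HTTP header lines.
  headers=$(tr -d '\r' < "$1")
  printf '%s\n' "$headers" | head -n 1 | grep -qE '^HTTP/[0-9.]+ 200' || return 1
  # Header names are case-insensitive, so match without regard to case.
  printf '%s\n' "$headers" | grep -qiE '^nvcf-status:[[:space:]]*fulfilled[[:space:]]*$'
}

# Example with a hand-written header dump (not captured from a real cluster).
sample=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nNvcf-Status: fulfilled\r\n\r\n' > "$sample"
response_fulfilled "$sample" && echo "fulfilled"
rm -f "$sample"
```

In practice, add `-s -o /dev/null -D headers.txt` to the curl command above and pass headers.txt to the helper.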