Function Creation#

This page describes the steps to create a function within Cloud Functions.

Attention

Please ensure before function creation, you’ve installed and configured the NGC CLI for working with the NGC Private Registry.

Functions can be created in the ways listed below, which are also visible in the Cloud Functions UI.

  1. Custom Container

  • Requires building and pushing a Cloud Functions compatible container image to the NGC Private Registry.

  • See Container-Based Function Creation.

  2. Helm Chart

  • Enables orchestration across multiple containers. For complex use cases where a single container isn’t flexible enough.

  • Requires one “mini-service” container defined as the inference entry point for the function.

  • Does not support partial response reporting, gRPC or HTTP streaming-based invocation.

  • See Helm-Based Function Creation.

Working with NGC Private Registry#

Function creation requires your model, container, helm chart and/or static resources to be hosted within NGC Private Registry as a prerequisite. Follow the steps below to optimally configure the NGC CLI to work with NGC Private Registry and Cloud Functions.

Warning

NGC Private Registry has size constraints on layers, images, models and resources.

Ensure that your uploaded resources conform to these constraints.

Generate an NGC Personal API Key#

Do this by navigating to the Personal Keys Page. For more details see Generate an NGC Personal API Key.

Note

It’s recommended that the API Key that you generate includes both Cloud Functions and Private Registry scopes to enable ideal Cloud Functions workflows.

Download & Configure the NGC CLI#

  1. Navigate to the NGC CLI Installer Page to download the CLI and follow the installation instructions for your platform.

  2. Find your NGC organization name within the NGC Organization Profile Page. This is not the Display Name. For example: qdrlnbkss123.

  3. Run ngc config set and input the Personal API Key generated in the previous step, along with your organization name. If prompted, default to no-team and no-ace.

> ngc config set
Enter API key [****bi9Z]. Choices: [<VALID_APIKEY>, 'no-apikey']: <api key>
Enter CLI output format type [json]. Choices: ['ascii', 'csv', 'json']: json
Enter org [ax3ysqem02xw]. Choices: ['$ORG_NAME']: <org name>
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:

Authenticate with NGC Docker Registry#

  1. Run docker login nvcr.io and input the following. Note that $oauthtoken is the literal string to enter as the username, and the password is the Personal API Key generated in the first step.

> docker login nvcr.io
Username: $oauthtoken
Password: $API_KEY

(Optional) Push a Container to the NGC Private Registry#

You should now be able to push a container to the NGC Private Registry. Optionally, validate this by pushing an example container from the samples repository:

  1. First, clone and build the Docker image.

> git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
> cd nv-cloud-function-helpers/examples/function_samples/fastapi_echo_sample
> docker build . -t fastapi_echo_sample

  2. Tag and push the Docker image to the NGC Private Registry.

> docker tag fastapi_echo_sample:latest nvcr.io/$ORG_NAME/fastapi_echo_sample:latest
> docker push nvcr.io/$ORG_NAME/fastapi_echo_sample:latest

Warning

Note that any additional slashes in the path when tagging and pushing to nvcr.io will be detected by Private Registry as specifying a team. This is most likely not what you want.

  3. Once this finishes, you'll be able to see the new container in the NGC Private Registry Containers Page, and it will be available for use in function creation.

Best Practices with NGC Docker Registry and Cloud Functions#

Container Versioning#

  • Ensure that any resources you tag for deployment into production environments do not simply use “latest” but follow a standard versioning convention.

    • During autoscaling, a function scaling up additional instances will pull the same specified container image and version. If the version is set to “latest” and the “latest” container image is updated between scaling events, this can lead to undefined behavior.

  • Function versions are immutable: the container image and version cannot be updated for a function without creating a new version of the function.

Usage of NGC Teams#

  • For easier handling of authorization and accessibility, we recommend pushing your containers, helm charts, models and resources to the root of your NGC organization (i.e. “No Team”), not to a team within the organization.

  • Note that any additional slashes in the path when tagging and pushing to nvcr.io will be detected as an NGC team.

Security#

  • Do not run containers as root user: Running containers as root is not supported in Cloud Functions. Always specify a non-root user in your Dockerfile using the USER instruction.

  • Use Kubernetes Secrets: For sensitive information like API keys, credentials, or tokens, use Secrets instead of environment variables. This provides better security and follows Kubernetes best practices for secret management.
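Below is a minimal sketch of reading such a secret from inside a Python container. The mount path and the MY_API_KEY fallback variable are hypothetical; the actual location depends on how the Secret is mounted in your deployment.

import os
from pathlib import Path

# Hypothetical mount location for a Kubernetes Secret (adjust to your deployment)
secret_path = Path("/etc/secrets/api-key")

if secret_path.exists():
    api_key = secret_path.read_text().strip()
else:
    # Fallback for local development only; avoid baking credentials into images
    api_key = os.getenv("MY_API_KEY", "")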

Container-Based Function Creation#

Container-based functions require building and pushing a Cloud Functions compatible Docker container image to the NGC Private Registry.

Attention

Before proceeding, ensure that you have the NGC CLI installed and configured with an API Key that has the required scopes for Cloud Functions and Private Registry.

See Working with NGC Private Registry for instructions.

Resources#

  • Example containers can be found in the examples repository.

  • The repository also contains helper functions that are useful when authoring your container, including:

    • Helpers that parse Cloud Functions-specific parameters on invocation

    • Helpers that can be used to instrument your container with Cloud Functions compatible logs

  • After container creation, but before proceeding to deployment, it is strongly recommended to validate your container's configuration locally; see Deployment Validation.

  • It’s always a best practice to emit logs from your inference container. See Observability Guide for how to add logs to your container. Cloud Functions also supports third-party logging and metrics emission from your container.
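As a quick illustration, here is a minimal sketch of configuring Python logging to write to stdout so that container logs can be collected; adapt it to the patterns in the Observability Guide and the helper functions repository.

import logging
import sys

# Emit logs to stdout so the platform's log collection can pick them up
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("inference")
logger.info("Model loaded; ready to serve requests")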

Attention

Please note that container functions should not run as the root user; running as root is not formally supported on any Cloud Functions backend.

Container Endpoints#

Any server can be implemented within the container, as long as it implements the following:

  • For HTTP-based functions, a health check endpoint that returns a 200 HTTP Status Code on success.

  • For gRPC-based functions, a standard gRPC health check. See gRPC Health Checking for more information.

  • An inference endpoint (this endpoint will be called during function invocation).

These endpoints are expected to be served on the same port, defined as the inferencePort.

Warning

Cloud Functions reserves the following ports on your container for internal monitoring and metrics:

  • Port 8080

  • Port 8010

Cloud Functions also expects the following directories in the container to remain read-only for caching purposes:

  • /config/ directory

  • Nested directories created inside /config/

Composing a FastAPI Container#

It's possible to use any container with Cloud Functions as long as it implements a server with the endpoints above. Below is an example of a FastAPI-based container compatible with Cloud Functions; you can also clone the FastAPI echo example.

Create the “requirements.txt” File#

requirements.txt#
fastapi==0.110.0
uvicorn==0.29.0

Implement the Server#

http_echo_server.py#
import os
import time
import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI, status
from fastapi.responses import StreamingResponse


app = FastAPI()

class HealthCheck(BaseModel):
    status: str = "OK"

# Implement the health check endpoint
@app.get("/health", tags=["healthcheck"], summary="Perform a Health Check", response_description="Return HTTP Status Code 200 (OK)", status_code=status.HTTP_200_OK, response_model=HealthCheck)
def get_health() -> HealthCheck:
    return HealthCheck(status="OK")

class Echo(BaseModel):
    message: str
    delay: float = 0.000001
    repeats: int = 1
    stream: bool = False


# Implement the inference endpoint
@app.post("/echo")
async def echo(echo: Echo):
    if echo.stream:
        def stream_text():
            for _ in range(echo.repeats):
                time.sleep(echo.delay)
                yield f"data: {echo.message}\n\n"
        return StreamingResponse(stream_text(), media_type="text/event-stream")
    else:
        time.sleep(echo.delay)
        return echo.message*echo.repeats

# Serve the endpoints on a port
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=int(os.getenv('WORKER_COUNT', 500)))

Note in the example above, the function’s configuration during creation will be:

  • Inference Protocol: HTTP

  • Inference Endpoint: /echo

  • Health Endpoint: /health

  • Inference Port (also used for health check): 8000
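To sanity check the container locally before pushing it, you can run it and exercise both endpoints. The snippet below is a minimal sketch that assumes the container (or the server script itself) is running locally on port 8000; it uses the third-party requests package.

import requests

# Health check: Cloud Functions expects a 200 status code from this endpoint
health = requests.get("http://localhost:8000/health", timeout=5)
print(health.status_code)  # expect 200

# Inference call against the /echo endpoint defined above
payload = {"message": "hello", "repeats": 3, "stream": False}
response = requests.post("http://localhost:8000/echo", json=payload, timeout=10)
print(response.json())  # expect "hellohellohello"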

Create the Dockerfile#

Dockerfile#
FROM python:3.10.13-bookworm

ENV WORKER_COUNT=10

WORKDIR /app

COPY requirements.txt ./

RUN python -m pip install --no-cache-dir -U pip && \
    python -m pip install --no-cache-dir -r requirements.txt

COPY http_echo_server.py /app/

CMD uvicorn http_echo_server:app --host=0.0.0.0 --workers=$WORKER_COUNT

Build the Container & Create the Function#

See the Functions Quickstart for the remaining steps.

Composing a PyTriton Container#

NVIDIA's PyTriton is a Python-native solution for serving models with Triton Inference Server. A minimum version of 0.3.0 is required.

Create the “requirements.txt” File#

  • This file should list the Python dependencies required for your model.

  • Add nvidia-pytriton to your requirements.txt file.

Here is an example of a requirements.txt file:

requirements.txt#
--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy

Create the “run.py” File#

  1. Your run.py file (or similar Python file) needs to define a PyTriton model.

  2. This involves importing your model dependencies and creating a PyTritonServer class with an __init__ function, an _infer_fn function, and a run function that serves the inference function. The run function defines the model name, the inputs and outputs, and any optional configuration.

Here is an example of a run.py file:

run.py#
import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig
import time
....
class PyTritonServer:
    """triton server for timed_sleeper"""

    def __init__(self):
        # basically need to accept image, mask(PIL Images), prompt, negative_prompt(str), seed(int)
        self.model_name = "timed_sleeper"

    def _infer_fn(self, requests):
        responses = []
        for req in requests:
            req_data = req.data
            sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
            # deal with header dict keys being lowercase
            request_parameters_dict = uppercase_keys(req.parameters)
            time.sleep(sleep_duration)
            responses.append({"sleep_duration": np.array([sleep_duration])})

        return responses

    def run(self):
        """run triton server"""
        with Triton(
            config=TritonConfig(
                http_header_forward_pattern="NVCF-*",  # this is required
                http_port=8000,
                grpc_port=8001,
                metrics_port=8002,
            )
        ) as triton:
            triton.bind(
                model_name="timed_sleeper",
                infer_func=self._infer_fn,
                inputs=[
                    Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
                ],
                outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
                config=ModelConfig(batching=False),
            )
            triton.serve()

if __name__ == "__main__":
    server = PyTritonServer()
    server.run()
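Once the server is running locally (for example via python run.py inside the container), it can be exercised with PyTriton's client. The following is a minimal sketch assuming the server above is listening on port 8000:

import numpy as np
from pytriton.client import ModelClient

# Connect to the locally running PyTriton server and call the timed_sleeper model
with ModelClient("localhost:8000", "timed_sleeper") as client:
    result = client.infer_sample(sleep_duration=np.array([2], dtype=np.uint32))
    print(result["sleep_duration"])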

Create the “Dockerfile”#

  1. Create a file named Dockerfile in your model directory.

  2. It's strongly recommended to use NVIDIA-optimized containers such as CUDA, PyTorch, or TensorRT as your base container. They can be downloaded from the NGC Catalog.

  3. Make sure to install your Python requirements in your Dockerfile.

  4. Copy in your model source code, and model weights unless you plan to host them in NGC Private Registry.

Here is an example of a Dockerfile:

Dockerfile#
FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
    git \
    python3 \
    python3-pip \
    python-is-python3 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace/

# Install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENV DEBIAN_FRONTEND=noninteractive

# Copy model source code and weights
COPY model_weights /models
COPY model_source .
COPY run.py .

# Set run command to start PyTriton to serve the model
CMD python3 run.py

Build the Docker Image#

  1. Open a terminal or command prompt.

  2. Navigate to the my_model directory.

  3. Run the following command to build the docker image:

docker build -t my_model_image .

Replace my_model_image with the desired name for your docker image.

Push the Docker Image#

Before beginning, ensure that you have authenticated with the NGC Docker Registry.

  1. Tag and push the docker image to the NGC Private Registry.

> docker tag my_model_image:latest nvcr.io/$ORG_NAME/my_model_image:latest
> docker push nvcr.io/$ORG_NAME/my_model_image:latest

Create the Function#

  1. Create the function via the API by running the following curl command with your $API_KEY and $ORG_NAME. In this example, we set the inference port to 8000 and use the default inference and health endpoint paths.

API_KEY=<your api key>
ORG_NAME=<your organization name>

curl --location 'https://api.ngc.nvidia.com/v2/nvcf/functions' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $API_KEY" \
--data '{
    "name": "my-model-function",
    "inferenceUrl": "/v2/models/my_model_image/infer",
    "inferencePort": 8000,
    "containerImage": "nvcr.io/'$ORG_NAME'/my_model_image:latest",
    "health": {
                "protocol": "HTTP",
                "uri": "/v2/health/ready",
                "port": 8000,
                "timeout": "PT10S",
                "expectedStatusCode": 200
            }
}'

Additional Examples#

See more examples of containers that are Cloud Functions compatible in the function samples directory.

Creating Functions with NGC Models & Resources#

When creating a function, models and resources can be mounted to the function instance. The models will be available under /config/models/{modelName} and /config/resources/{resourceName} where modelName and resourceName are specified as part of the API request.

Here is an example where a model and resource are added to a function creation API call, for an echo sample function:

curl -X 'POST' \
  'https://api.ngc.nvidia.com/v2/nvcf/functions' \
  -H "Authorization: Bearer $API_KEY" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
     "name": "echo_function",
     "inferenceUrl": "/echo",
     "containerImage": "nvcr.io/'$ORG_NAME'/echo:latest",
     "apiBodyFormat": "CUSTOM",
     "models": [
     {
         "name": "simple_int8",
         "version": "1",
         "uri": "v2/org/'$ORG_NAME'/models/simple_int8/1/files"
     }
     ],
     "resources": [
     {
         "name": "simple_resource",
         "version": "1",
         "uri": "v2/org/'$ORG_NAME'/resources/simple_resource/1/files"
     }
     ]
 }'

Within the container, once the function instance is deployed, the model will be mounted at /config/models/simple_int8 and the resource at /config/resources/simple_resource.
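Inside the container, the mounted files can then be read like any other directory. The snippet below is an illustrative sketch based on the modelName and resourceName values used in the request above.

from pathlib import Path

# Mount points derived from the "name" fields in the models/resources arrays above
model_dir = Path("/config/models/simple_int8")
resource_dir = Path("/config/resources/simple_resource")

# List whatever files were uploaded to the NGC model and resource versions
for path in list(model_dir.iterdir()) + list(resource_dir.iterdir()):
    print(path)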

Creating gRPC-based Functions#

Cloud Functions supports function invocation via gRPC. During function creation, specify that the function is a gRPC function by setting the “Inference Protocol”, or inferenceUrl field to /grpc.

Prerequisites#

  • The function container must implement a gRPC port, endpoint, and health check. The health check is expected to be served on the gRPC inference port; there is no need to define a separate health endpoint path.
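Below is a minimal sketch of registering the standard gRPC health service on the same server that serves inference, using the grpcio-health-checking package; registration of your own inference service is elided.

from concurrent import futures

import grpc
from grpc_health.v1 import health, health_pb2, health_pb2_grpc

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))

# Register your inference service on this server here (elided)

# Register the standard gRPC health service on the same port as inference
health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)
health_servicer.set("", health_pb2.HealthCheckResponse.SERVING)

server.add_insecure_port("[::]:8001")
server.start()
server.wait_for_termination()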

gRPC Function Creation via UI#

In the Function Creation Page, set the “Inference Protocol” to gRPC and port to whatever your gRPC server has implemented.


gRPC Function Creation via CLI#

When creating the gRPC function, set the --inference-url argument to /grpc:

ngc cf function create --inference-port 8001 --container-image nvcr.io/$ORG_NAME/grpc_echo_sample:latest --name my-grpc-function --inference-url /grpc

gRPC Function Creation via API#

When creating the gRPC function, set the inferenceUrl field to /grpc:

curl --location 'https://api.ngc.nvidia.com/v2/nvcf/functions' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header "Authorization: Bearer $API_KEY" \
--data '{
    "name": "my-grpc-function",
    "inferenceUrl": "/grpc",
    "inferencePort": 8001,
    "containerImage": "nvcr.io/'$ORG_NAME'/grpc_echo_sample:latest"
}'

gRPC Function Invocation#

See gRPC Invocation for details on how to authenticate and invoke your gRPC function.

Creating Low Latency Streaming (LLS AKA GameStreamSDK/WebRTC) Functions#

Cloud Functions supports the ability to stream video, audio, and other data using WebRTC.

Here is a detailed diagram of the overall setup.

Low Latency Streaming Diagram

For complete examples of LLS streaming functions, see NVCF LLS Function Samples.

Currently, LLS streaming is only supported on GFN-based instances. Either a single container or a Helm chart can be used.

Building the Streaming Server Application#

The streaming application needs to be packaged inside a container and should leverage the StreamSDK. The streaming application needs to follow the guidelines below (a minimal Python sketch of the control server appears after this list):

  1. Expose an HTTP server at port CONTROL_SERVER_PORT with the following two endpoints:

    1. Health endpoint: This endpoint should return a 200 HTTP status code only when the streaming application container is ready to start streaming a session. If the streaming application container does not want to serve any more streaming sessions for the current container deployment, this endpoint should return HTTP status code 500.

       Request:
           GET /v1/streaming/ready

       Responses:
           200 OK
           500 Internal Server Error

    2. STUN creds endpoint: This endpoint should accept the access details and credentials for the STUN server and keep them cached in the memory of the streaming application. When a streaming request comes in, the streaming application can use these access details and credentials to communicate with the STUN server and request the opening of ports for streaming.

       Request:
           POST /v1/streaming/creds

           Headers:
               Content-Type: application/json

           Body:
               {
                   "stunIp": "<string>",
                   "stunPort": <int>,
                   "username": "<string>",
                   "password": "<string>"
               }

       Responses:
           200 OK

  2. Expose a server at port STREAMING_SERVER_PORT to accept WebSocket connections.

    1. An endpoint STREAMING_START_ENDPOINT should be exposed by this server.

  3. Post WebSocket connection establishment guidelines:

    1. When the browser client requests the opening of a port for a specific protocol (e.g. WebRTC), the streaming application needs to ask the STUN server to open a port. This port should be in the range 47998 to 48020, referred to as STREAMING_PORT_BINDING_RANGE in this document.

  4. Containerization guidelines:

    1. The container should make sure that CONTROL_SERVER_PORT, STREAMING_SERVER_PORT and STREAMING_PORT_BINDING_RANGE are exposed by the container and accessible from outside the container.

    2. If multiple back-to-back sessions need to be supported with a fresh start of the container, exit the container after a streaming session ends.
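The following is a minimal sketch of the control server described in item 1, written with FastAPI. The port default of 9000 and the in-memory caching approach are illustrative assumptions; the actual CONTROL_SERVER_PORT value and readiness logic depend on your streaming application.

import os

import uvicorn
from fastapi import FastAPI, Response, status
from pydantic import BaseModel

app = FastAPI()
stun_credentials = {}  # cached in memory for use when a streaming session starts
ready = True           # set to False when no further sessions should be served

class StunCreds(BaseModel):
    stunIp: str
    stunPort: int
    username: str
    password: str

@app.get("/v1/streaming/ready")
def streaming_ready(response: Response):
    # Return 200 only when the container is ready to start streaming a session
    if not ready:
        response.status_code = status.HTTP_500_INTERNAL_SERVER_ERROR
    return {"ready": ready}

@app.post("/v1/streaming/creds")
def streaming_creds(creds: StunCreds):
    # Cache the STUN access details for later use by the streaming application
    stun_credentials.update({
        "stunIp": creds.stunIp,
        "stunPort": creds.stunPort,
        "username": creds.username,
        "password": creds.password,
    })
    return {}

if __name__ == "__main__":
    # CONTROL_SERVER_PORT is a placeholder; 9000 is an arbitrary default for this sketch
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("CONTROL_SERVER_PORT", "9000")))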

Creating the LLS Streaming Function#

When creating the function, we need to ensure functionType is set to STREAMING:

curl -X 'POST' \
    'https://api.ngc.nvidia.com/v2/nvcf/functions' \
    -H "Authorization: Bearer $API_KEY" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "name": "'$STREAMING_FUNCTION_NAME'",
        "inferenceUrl": "/sign_in",
        "inferencePort": '$STREAMING_SERVER_PORT',
        "health": {
            "protocol": "HTTP",
            "uri": "/v1/streaming/ready",
            "port": '$CONTROL_SERVER_PORT',
            "timeout": "PT10S",
            "expectedStatusCode": 200
        },
        "containerImage": "'$STREAMING_CONTAINER_IMAGE'",
        "apiBodyFormat": "CUSTOM",
        "description": "'$STREAMING_FUNCTION_NAME'",
        "functionType": "STREAMING"
    }'

Connecting to a streaming function with a client#

Intermediary Proxy#

An intermediary proxy service needs to be deployed in order to facilitate the connection to the streaming function.

The intermediary proxy handles authentication and the headers that are required for NVCF, and aligns the connection behavior with NVCF in ways that the browser either cannot handle on its own or handles unpredictably.

Proxy Responsibilities

The intermediary proxy performs the following functionalities:

  1. Authenticate the user token coming from the browser to the intermediary proxy

  2. Authorize the user to have access to specific streaming function

  3. Once the user is authenticated and authorized, modify the websocket connection coming in to append the required NVCF headers (NVCF_API_KEY and STREAMING_FUNCTION_ID)

  4. Forward the websocket connection request to NVCF

Technical Implementation Guidance

nvcf-function-id Header

NVCF requires this header to be present to identify the function that needs to be reached. The browser has no mechanism to set headers on WebSocket connections other than Sec-Websocket-Protocol, so the intermediary proxy can either add the nvcf-function-id header on its own, or parse Sec-Websocket-Protocol (if the browser placed the function ID there) and extract the function ID from it.

See http-request add-header documentation in HAProxy.

Authentication

Originally, NVCF did not support client authentication tokens, so the role of intermediate proxy is to add the required server authentication here (e.g. http-request set-header Authorization "Bearer NVCF_BEARER_TOKEN").

Connection Keepalive

NVCF controls the session lifetime based on the TCP connection lifetime to the function and the type of disconnection that happens. The intermediate proxy helps to keep the connection with the browser alive.

Resume Support

NVCF returns a cookie containing nvcf-request-id, but the browser may reject the cookie since it is not from the same domain; the intermediary proxy helps bridge this.

CORS Headers

For browsers to allow traffic with NVCF, the intermediate proxy needs to add the relevant CORS headers to responses from NVCF:

  • access-control-expose-headers: *

  • access-control-allow-headers: *

  • access-control-allow-origin: *

For guidance on implementing this in HAProxy, see http-response set-header documentation.

Example HAProxy Dockerfile

Below is an all-in-one Dockerfile sample for setting up an HAProxy intermediary proxy with optional TLS/SSL support:

Note

This example focuses on NVCF integration. In production, you should also implement user authentication and authorization to control access to your streaming function.

For certain applications, TLS/SSL support is required. The proxy can be configured to use self-signed certificates for development and testing purposes by setting PROXY_SSL_INSECURE=true.

FROM haproxy:3.2

# Switch to root user for package installation
USER root

# Install necessary tools
RUN apt-get update && apt-get install -y \
    bash \
    gettext-base \
    lua5.3 \
    openssl \
    && rm -rf /var/lib/apt/lists/*

# Create directory for configuration and certificates
RUN mkdir -p /usr/local/etc/haproxy/lua \
    && mkdir -p /usr/local/etc/haproxy/certs \
    && chown -R haproxy:haproxy /usr/local/etc/haproxy

# Create certificate generation script
COPY <<EOF /usr/local/bin/generate-cert.sh
#!/bin/bash
cd /usr/local/etc/haproxy/certs
openssl req -x509 -newkey rsa:2048 -keyout server.key -out server.crt -days 365 -nodes -subj "/CN=localhost" -quiet
# Combine certificate and key into a single file for HAProxy
cat server.crt server.key > server.pem
chown haproxy:haproxy server.key server.crt server.pem
chmod 600 server.key server.pem
chmod 644 server.crt
EOF

RUN chmod +x /usr/local/bin/generate-cert.sh

# Create the HAProxy configuration template file
COPY --chown=haproxy:haproxy <<EOF /usr/local/etc/haproxy/haproxy.cfg.template
global
        log stdout    local0 info
        stats timeout 30s
        user haproxy

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # SSL server verification enabled for security
        ssl-server-verify required

        # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=3.2&config=intermediate
        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
        log     global
        option  httplog
        option  dontlognull
        option  logasap
        timeout connect 5000
        timeout client  50000
        timeout server  50000

frontend test_frontend
        log  global
        bind *:\${PROXY_PORT} \${PROXY_SSL_BIND_OPTIONS}
        mode http
        timeout client       7s
        timeout http-request 30m
        use_backend webrtc_backend

backend webrtc_backend
        log  global
        mode http
        timeout connect 4s
        timeout server  7s
        http-request set-header Host \${NVCF_SERVER}
        http-request set-header Authorization "Bearer \${NGC_CLI_API_KEY}"
        http-request set-header Function-ID \${STREAMING_FUNCTION_ID}
        server s1 \${NVCF_SERVER}:443 ssl ca-file /etc/ssl/certs/ca-certificates.crt verify required
EOF

# Create the entrypoint script
COPY <<EOF /entrypoint.sh
#!/bin/bash

# Check required environment variables
if [ -z "\${NGC_CLI_API_KEY:+x}" ]; then
    echo "NGC_CLI_API_KEY must be set"
    exit 1
fi

if [ -z "\${STREAMING_FUNCTION_ID:+x}" ]; then
    echo "STREAMING_FUNCTION_ID must be set"
    exit 1
fi

# Use default NVCF_SERVER if not set
if [ -z "\${NVCF_SERVER:+x}" ]; then
    export NVCF_SERVER=grpc.nvcf.nvidia.com
    echo "NVCF_SERVER not set, using default: \${NVCF_SERVER}"
fi

# Use default PROXY_PORT if not set
if [ -z "\${PROXY_PORT:+x}" ]; then
    export PROXY_PORT=49100
    echo "PROXY_PORT not set, using default: \${PROXY_PORT}"
fi

# Use default PROXY_SSL_INSECURE if not set
if [ -z "\${PROXY_SSL_INSECURE:+x}" ]; then
    export PROXY_SSL_INSECURE=false
    echo "PROXY_SSL_INSECURE not set, using default: \${PROXY_SSL_INSECURE}"
fi

echo "Launching intermediate proxy:"
echo "  API Key: \${NGC_CLI_API_KEY:0:6}**********\${NGC_CLI_API_KEY: -3}"
echo "  Function ID: \${STREAMING_FUNCTION_ID}"
echo "  Version ID: \${STREAMING_FUNCTION_VERSION_ID}"
echo "  NVCF Server: \${NVCF_SERVER}"
echo "  Proxy Port: \${PROXY_PORT}"
echo "  Proxy SSL (Insecure): \${PROXY_SSL_INSECURE}"

# Generate self-signed certificate if SSL is enabled
if [ "\${PROXY_SSL_INSECURE}" = "true" ]; then
    /usr/local/bin/generate-cert.sh
    export PROXY_SSL_BIND_OPTIONS="ssl crt /usr/local/etc/haproxy/certs/server.pem"
    echo "SSL enabled - self-signed certificate generated"
else
    export PROXY_SSL_BIND_OPTIONS=""
    echo "SSL disabled - running in HTTP mode"
fi

# Process the template and create the final config
envsubst < /usr/local/etc/haproxy/haproxy.cfg.template > /usr/local/etc/haproxy/haproxy.cfg

# Function to handle signals and forward them to HAProxy
handle_signal() {
    echo "Received signal, shutting down HAProxy..."
    if [ -n "\$HAPROXY_PID" ]; then
        kill -TERM "\$HAPROXY_PID" 2>/dev/null
        wait "\$HAPROXY_PID"
    fi
    exit 0
}

# Set up signal handlers
trap handle_signal SIGTERM SIGINT

# Start HAProxy in background and capture PID
echo "Starting HAProxy..."
haproxy -f /usr/local/etc/haproxy/haproxy.cfg &
HAPROXY_PID=\$!

# Wait for HAProxy process
wait "\$HAPROXY_PID"
EOF

RUN chmod +x /entrypoint.sh

# Switch back to haproxy user
USER haproxy

# Set the entrypoint
ENTRYPOINT ["/entrypoint.sh"]

Environment Variables

The following environment variables control proxy behavior:

  • NGC_CLI_API_KEY (required): Your NGC Personal API Key.

  • STREAMING_FUNCTION_ID (required): Your NVCF streaming function ID.

  • STREAMING_FUNCTION_VERSION_ID (optional): Specific version of your function.

  • NVCF_SERVER (optional, default: grpc.nvcf.nvidia.com): NVCF server endpoint.

  • PROXY_PORT (optional, default: 49100): Port for the proxy to listen on.

  • PROXY_SSL_INSECURE (optional, default: false): Enable SSL with a self-signed certificate (set to “true” to enable).

Usage Examples

1. HTTP Mode (Default)

Standard configuration without SSL:

export STREAMING_FUNCTION_ID=your-function-id
export NGC_CLI_API_KEY=your-ngc-cli-api-key
export PROXY_PORT=49100

docker build -t nvcf-haproxy-proxy .

docker run --rm -it \
    -p 127.0.0.1:${PROXY_PORT}:${PROXY_PORT}/tcp \
    -e PROXY_PORT="$PROXY_PORT" \
    -e NGC_CLI_API_KEY="$NGC_CLI_API_KEY" \
    -e STREAMING_FUNCTION_ID="$STREAMING_FUNCTION_ID" \
    nvcf-haproxy-proxy

2. HTTPS Mode with Self-Signed Certificate

Configuration with SSL enabled using a self-signed certificate:

export STREAMING_FUNCTION_ID=your-function-id
export NGC_CLI_API_KEY=your-ngc-cli-api-key
export PROXY_SSL_INSECURE=true
export PROXY_PORT=48322

docker build -t nvcf-haproxy-proxy .

docker run --rm -it \
    -p 127.0.0.1:${PROXY_PORT}:${PROXY_PORT}/tcp \
    -e PROXY_PORT=${PROXY_PORT} \
    -e PROXY_SSL_INSECURE=${PROXY_SSL_INSECURE} \
    -e NGC_CLI_API_KEY="$NGC_CLI_API_KEY" \
    -e STREAMING_FUNCTION_ID="$STREAMING_FUNCTION_ID" \
    nvcf-haproxy-proxy

Note

Since this configuration uses self-signed certificates for development and testing, you will need to configure your client to accept untrusted certificates. In production environments, you should use proper CA-signed certificates.

Web Browser Client#

Using the proxy, a browser client can be used to connect to the stream. The browser client needs to be developed by the customer leveraging the Ragnarok dev branch, version 0.0.1503. Please ensure that the following flags are set:

const configData: RagnarokConfigData = {
    overrideData: "disableworkerws=true"
}

ConfigureRagnarokSettings(configData);

Available Container Variables#

The following is a reference of the variables that are available via the headers of the invocation message (auto-populated by Cloud Functions) and accessible within the container.

For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.

  • NVCF-REQID: Request ID for this request.

  • NVCF-SUB: Message subject.

  • NVCF-NCAID: Function’s organization’s NCA ID.

  • NVCF-FUNCTION-NAME: Function name.

  • NVCF-FUNCTION-ID: Function ID.

  • NVCF-FUNCTION-VERSION-ID: Function version ID.

  • NVCF-LARGE-OUTPUT-DIR: Large output directory path.

  • NVCF-MAX-RESPONSE-SIZE-BYTES: Max response size in bytes for the function.

  • NVCF-NSPECTID: NVIDIA reserved variable.

  • NVCF-BACKEND: Backend or “Cluster Group” the function is deployed on.

  • NVCF-INSTANCETYPE: Instance type the function is deployed on.

  • NVCF-REGION: Region or zone the function is deployed in.

  • NVCF-ENV: Spot environment if deployed on spot instances.
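For example, a FastAPI-based inference endpoint can read these headers from the incoming request; the snippet below is an illustrative sketch, not part of the helper library.

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/echo")
async def echo(request: Request):
    # Header lookups are case-insensitive in Starlette/FastAPI
    request_id = request.headers.get("NVCF-REQID")
    function_id = request.headers.get("NVCF-FUNCTION-ID")
    max_response_size = request.headers.get("NVCF-MAX-RESPONSE-SIZE-BYTES")
    return {
        "request_id": request_id,
        "function_id": function_id,
        "max_response_size": max_response_size,
    }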

Environment Variables#

The following environment variables are automatically injected into your function containers when they are deployed and can be accessed using standard environment variable access methods in your application code:

  • NVCF_BACKEND: Backend or “Cluster Group” the function is deployed on.

  • NVCF_ENV: Spot environment if deployed on spot instances.

  • NVCF_FUNCTION_ID: Function ID.

  • NVCF_FUNCTION_NAME: Function name.

  • NVCF_FUNCTION_VERSION_ID: Function version ID.

  • NVCF_INSTANCETYPE: Instance type the function is deployed on.

  • NVCF_NCA_ID: Function’s organization’s NCA ID.

  • NVCF_REGION: Region or zone the function is deployed in.
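For example, in Python these values can be read with standard environment variable access; the default values below are illustrative only.

import os

# Injected automatically by Cloud Functions when the instance is deployed
function_id = os.getenv("NVCF_FUNCTION_ID", "unknown")
backend = os.getenv("NVCF_BACKEND", "unknown")
region = os.getenv("NVCF_REGION", "unknown")

print(f"Running function {function_id} on backend {backend} in region {region}")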

Note

All environment variables with the NVCF_* prefix are reserved and should not be overridden in your application code or function configuration.

Helm-Based Function Creation#

Cloud Functions supports Helm-based functions for orchestration across multiple containers.

Prerequisites#

Warning

Ensure that your Helm chart version does not contain a hyphen (-). For example, v1 is OK but v1-test will cause issues.

  1. The helm chart must have a “mini-service” container defined, which will be used as the inference entry point.

  2. The name of this service in your helm chart should be supplied by setting helmChartServiceName during the function definition. This allows Cloud Functions to communicate and make inference requests to the “mini-service” endpoint.

Attention

The servicePort defined within the helm chart should be used as the inferencePort supplied during function creation. Otherwise, Cloud Functions will not be able to reach the “mini-service”.

  3. Ensure you have the NGC CLI configured and have pushed your helm chart to NGC Private Registry. Refer to Managing Helm Charts Using the NGC CLI.

Secret Management#

For pulling containers defined as part of the helm chart from NGC Private Registry, define ngcImagePullSecretName in values.yaml. This value will be used in the deployment spec as spec.imagePullSecrets.name for the pods.

For nested Helm charts, define global.ngcImagePullSecretName in values.yaml, which will be referenced in the deployment spec under spec.imagePullSecrets.name for the pods.

Warning

Containers defined in the helm chart should be in the same NGC Organization and Team that the helm chart itself is being pulled from.

Create a Helm-based Function#

  1. Ensure your helm chart is uploaded to NGC Private Registry and adheres to the Prerequisites listed above.

  2. Create the function:

    • Include the following additional parameters in the function definition

      • helmChart

      • helmChartServiceName

    • The helmChart property should be set to the NGC Model Registry URL pointing to the helm chart that will deploy the “mini-service”. Please note that this helm chart URL must be accessible to the NGC org in which the function will eventually be deployed. The helm chart URL should follow the format https://helm.ngc.nvidia.com/$ORG_ID/$TEAM_NAME/charts/$NAME-X.Y.Z.tgz. For example, https://helm.ngc.nvidia.com/abc123/teamA/charts/nginx-0.1.5.tgz is a valid chart URL, but https://helm.ngc.nvidia.com/abc123/teamA/charts/nginx-0.1.5-hello.tgz is not.

    • The helmChartServiceName is used for checking if the “mini-service” is ready for inference and is also scraped for function metrics. At this time, templatized service names are not supported. This must match the service name of your “mini-service” with the exposed entry point port.

    • Important: The Helm chart name should not contain underscores or other special symbols, as that may cause issues during deployment.

Example Creation via API

Please see our sample helm chart used in this example for reference.

Below is an example function creation API call creating a helm-based function:

 1curl -X 'POST' \
 2    'https://api.ngc.nvidia.com/v2/nvcf/functions' \
 3    -H "Authorization: Bearer $API_KEY" \
 4    -H 'accept: application/json' \
 5    -H 'Content-Type: application/json' \
 6    -d '{
 7    "name": "function_name",
 8    "inferenceUrl": "v2/models/model_name/versions/model_version/infer",
 9    "inferencePort": 8001,
10    "helmChart": "https://helm.ngc.nvidia.com/'$ORG_ID'/'$TEAM_NAME'/charts/inference-test-1.0.tgz",
11    "helmChartServiceName": "service_name",
12    "apiBodyFormat": "CUSTOM"
13}'

Note

For gRPC-based functions, set "inferenceUrl": "/grpc". This signals to Cloud Functions that the function uses the gRPC protocol and is not expected to expose a /grpc endpoint for inference requests.

  3. Proceed with function deployment and invocation normally.

Multi-node helm deployment

To create a multi-node helm deployment, you need to use the following format for the instanceType: <CSP>.GPU.<GPU_NAME>_<number of gpus per node>x[.x<number of nodes>]. For example, DGXC.GPU.L40S_1x is a single L40S instance, while ON-PREM.GPU.B200_8x.x2 is two full nodes of 8-way B200.

A sample helm chart for a multi-node deployment can be found in the multi-node helm example.

Limitations#

When using Helm Charts to deploy a function, the following limitations need to be taken into consideration.

1. Asset caching#

NGC model and resource caching:

  • Automatic mounting of NGC Models and Resources for your container is not supported (coming soon).

  • For any downloads (such as assets or models) occurring within your function’s containers, download size is limited by the disk space on the VM. For GFN this is approximately 100GB; for other clusters this limit will vary.

2. Inference#

Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider opting for HTTP streaming or gRPC bidirectional support.

3. Security Constraints#

Helm charts must conform to certain security standards to be deployable as a function. This means that certain helm and Kubernetes features are restricted in NVCF backends. NVCF will process your helm chart on function creation, then later on deployment with your Helm values and other deployment metadata, to ensure standards are enforced.

NVCF may automatically modify certain objects in your chart so they conform to these standards; it will only do so if modification will not break your chart when it is installed in the targeted backend. Possible areas amenable to modification will be noted in the restrictions section below. Any standard that cannot be enforced by modification will result in error(s) during function creation.

Restrictions

  • Supported k8s artifacts under Helm Chart Namespace are listed below; others will be rejected:

    • ConfigMaps

    • Secrets

    • Services - Only type: ClusterIP or none

    • Deployments

    • ReplicaSets

    • StatefulSets

    • Jobs

    • CronJobs

    • Pods

    • ServiceAccounts (GFN backend only)

    • Roles (GFN backend only)

    • Rolebindings (GFN backend only)

    • PersistentVolumeClaims (GFN backend only)

  • The only allowed Pod or Pod template volume types are:

    • configMap

    • secret

    • projected.sources.* of any of the above

    • persistentVolumeClaim (GFN backend only)

    • emptyDir

  • No chart hooks are allowed; if specified in the chart, they will not be executed.

Note

CustomResourceDefinitions in helm charts will be skipped on installation. There is no need to modify your chart to remove them from helm template output for NVCF.

Helm charts _should_ conform to these additional security standards. While not enforced now, they will be enforced at a later date.

  • All containers have resource limits for at least cpu and memory (and nvidia.com/gpu, ephemeral-storage if required for certain containers).

  • All Pod’s and resources that define a Pod template conform to the Kubernetes Pod Security Standards Baseline and Restricted policies.

  • Pod and container securityContext’s conform to these parameters:

  • automountServiceAccountToken must be unset or set to false

  • runAsNonRoot must be explicitly set to true

  • hostIPC, hostPID, and hostNetwork must be unset or set to false

  • No privilege escalation, root capabilities, or non-default Seccomp, AppArmor, or SELinux profiles are allowed. See the Baseline and Restricted Pod security standards for fields that cannot be explicitly set.

Helm Chart Overrides#

To override keys in your helm chart values.yaml, you can provide the configuration parameter and supply the corresponding key-value pairs, in JSON format, that you would like to be overridden when the function is deployed.

Example helm chart override#
curl -X 'POST' \
 'https://api.ngc.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
 -H "Authorization: Bearer $API_KEY" \
 -H 'accept: application/json' \
 -H 'Content-Type: application/json' \
 -d '{
     "deploymentSpecifications": [{
         "gpu": "L40",
         "backend": "OCI",
         "maxInstances": 2,
         "minInstances": 1,
         "configuration": {
         "key_one": "<value>",
         "key_two": { "key_two_subkey_one": "<value>", "key_two_subkey_two": "<value>" }
     ...
     },
     {
         "gpu": "T10",
         "backend": "GFN",
         "maxInstances": 2,
         "minInstances": 1
     }]
 }'
23 }'