Container-Based Function Creation#

Container-based functions require building and pushing a Cloud Functions compatible Docker container image to your container registry.

Resources#

  • Example containers can be found in the examples repository.

  • The repository also contains helper functions that are useful when authoring your container, including:

    • Helpers that parse Cloud Functions-specific parameters on invocation

    • Helpers that can be used to instrument your container with Cloud Functions compatible logs

  • It’s always a best practice to emit logs from your inference container. Cloud Functions supports third-party logging and metrics emission from your container.

Attention

Container functions should not run as the root user; running as root is not formally supported on any Cloud Functions backend.

Container Endpoints#

Any server can be implemented within the container, as long as it implements the following:

  • For HTTP-based functions, a health check endpoint that returns a 200 HTTP Status Code on success.

  • For gRPC-based functions, a standard gRPC health check. See the gRPC Health Checking Protocol documentation for details.

  • An inference endpoint (this endpoint will be called during function invocation)

These endpoints are expected to be served on the same port, defined as the inferencePort.

Warning

Cloud Functions reserves the following ports on your container for internal monitoring and metrics:

  • Port 8080

  • Port 8010

Cloud Functions also expects the following directories in the container to remain read-only for caching purposes:

  • /config/ directory

  • Nested directories created inside /config/
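Because /config/ and its nested directories must remain read-only, any scratch or cache files your server writes at runtime should go to a writable location instead. A minimal sketch of that pattern; the specific paths and file layout are illustrative, not Cloud Functions requirements:

```python
import tempfile
from pathlib import Path


def load_config(config_dir: str = "/config") -> dict:
    """Read configuration from the read-only /config/ tree (layout is illustrative)."""
    config = {}
    config_path = Path(config_dir)
    if config_path.is_dir():
        for f in config_path.glob("*.txt"):
            config[f.stem] = f.read_text()
    return config


def write_scratch(data: bytes) -> str:
    """Write scratch data to a writable temporary directory, never under /config/."""
    scratch_dir = tempfile.mkdtemp(prefix="inference-scratch-")
    scratch_file = Path(scratch_dir) / "scratch.bin"
    scratch_file.write_bytes(data)
    return str(scratch_file)
```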

Composing a FastAPI Container#

It’s possible to use any container with Cloud Functions as long as it implements a server with the endpoints described above. Below is an example of a FastAPI-based container compatible with Cloud Functions. Clone the FastAPI echo example.

Create the “requirements.txt” File#

requirements.txt#
fastapi==0.110.0
uvicorn==0.29.0

Implement the Server#

http_echo_server.py#
import os
import time

import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI, status
from fastapi.responses import StreamingResponse


app = FastAPI()


class HealthCheck(BaseModel):
    status: str = "OK"


# Implement the health check endpoint
@app.get(
    "/health",
    tags=["healthcheck"],
    summary="Perform a Health Check",
    response_description="Return HTTP Status Code 200 (OK)",
    status_code=status.HTTP_200_OK,
    response_model=HealthCheck,
)
def get_health() -> HealthCheck:
    return HealthCheck(status="OK")


class Echo(BaseModel):
    message: str
    delay: float = 0.000001
    repeats: int = 1
    stream: bool = False


# Implement the inference endpoint
@app.post("/echo")
async def echo(echo: Echo):
    if echo.stream:
        def stream_text():
            for _ in range(echo.repeats):
                time.sleep(echo.delay)
                yield f"data: {echo.message}\n\n"
        return StreamingResponse(stream_text(), media_type="text/event-stream")
    else:
        time.sleep(echo.delay)
        return echo.message * echo.repeats


# Serve the endpoints on a port. The app is passed as an import string so
# uvicorn can honor the workers setting.
if __name__ == "__main__":
    uvicorn.run(
        "http_echo_server:app",
        host="0.0.0.0",
        port=8000,
        workers=int(os.getenv("WORKER_COUNT", 500)),
    )

Note that for the example above, the function’s configuration at creation time will be:

  • Inference Protocol: HTTP

  • Inference Endpoint: /echo

  • Health Endpoint: /health

  • Inference Port (also used for health check): 8000
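With the echo server above running locally (for example via `python http_echo_server.py`), the inference endpoint can be exercised before the container is deployed. A minimal standard-library sketch; the localhost URL assumes a local test run, not a deployed function:

```python
import json
import urllib.request


def build_echo_request(base_url: str, message: str, repeats: int = 1) -> urllib.request.Request:
    """Build a POST request matching the Echo model served at /echo."""
    body = json.dumps({"message": message, "repeats": repeats, "stream": False}).encode()
    return urllib.request.Request(
        f"{base_url}/echo",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_echo_request("http://localhost:8000", "hello", repeats=2)
# Against a running server:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```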

Create the Dockerfile#

Dockerfile#
FROM python:3.10.13-bookworm

ENV WORKER_COUNT=10

WORKDIR /app

COPY requirements.txt ./

RUN python -m pip install --no-cache-dir -U pip && \
    python -m pip install --no-cache-dir -r requirements.txt

COPY http_echo_server.py /app/

CMD uvicorn http_echo_server:app --host=0.0.0.0 --workers=$WORKER_COUNT

Build the Container & Create the Function#

See the Create the Function section below for the remaining steps.

Composing a PyTriton Container#

NVIDIA’s PyTriton is a Python-native interface for serving models with Triton Inference Server. Version 0.3.0 or later is required.

Create the “requirements.txt” File#

  • This file should list the Python dependencies required for your model.

  • Add nvidia-pytriton to your requirements.txt file.

Here is an example of a requirements.txt file:

requirements.txt#
--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy

Create the “run.py” File#

  1. Your run.py file (or similar Python file) needs to define a PyTriton model.

  2. This involves importing your model dependencies and creating a PyTritonServer class with an __init__ function, an _infer_fn function, and a run function that binds and serves the inference function. The class defines the model name, the inputs, and the outputs, along with optional configuration.

Here is an example of a run.py file:

run.py#
import time

import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# ... model-specific imports and helpers (numpy_array_to_variable, uppercase_keys) elided ...


class PyTritonServer:
    """triton server for timed_sleeper"""

    def __init__(self):
        self.model_name = "timed_sleeper"

    def _infer_fn(self, requests):
        responses = []
        for req in requests:
            req_data = req.data
            sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
            # deal with header dict keys being lowercase
            request_parameters_dict = uppercase_keys(req.parameters)
            time.sleep(sleep_duration)
            responses.append({"sleep_duration": np.array([sleep_duration])})

        return responses

    def run(self):
        """run triton server"""
        with Triton(
            config=TritonConfig(
                http_header_forward_pattern="NVCF-*",  # this is required
                http_port=8000,
                grpc_port=8001,
                metrics_port=8002,
            )
        ) as triton:
            triton.bind(
                model_name="timed_sleeper",
                infer_func=self._infer_fn,
                inputs=[
                    Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
                ],
                outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
                config=ModelConfig(batching=False),
            )
            triton.serve()


if __name__ == "__main__":
    server = PyTritonServer()
    server.run()
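The example references two helpers, numpy_array_to_variable and uppercase_keys, whose definitions are elided above. One plausible sketch of what they do; these are illustrations to adapt to your own payloads, not the repository's actual implementations:

```python
import numpy as np


def numpy_array_to_variable(arr: np.ndarray):
    """Unwrap a single-element NumPy array into a plain Python scalar."""
    return arr.item() if arr.size == 1 else arr


def uppercase_keys(d: dict) -> dict:
    """Normalize header-derived parameter keys, which arrive lowercased."""
    return {k.upper(): v for k, v in d.items()}
```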

Create the “Dockerfile”#

  1. Create a file named Dockerfile in your model directory.

  2. It’s strongly recommended to use NVIDIA-optimized containers such as the CUDA, PyTorch, or TensorRT containers as your base image. They can be downloaded from the NGC Catalog.

  3. Make sure to install your Python requirements in your Dockerfile.

  4. Copy in your model source code, and model weights.

Here is an example of a Dockerfile:

Dockerfile#
FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
    git \
    python3 \
    python3-pip \
    python-is-python3 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace/

# Install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENV DEBIAN_FRONTEND=noninteractive

# Copy model source code and weights
COPY model_weights /models
COPY model_source .
COPY run.py .

# Set run command to start PyTriton to serve the model
CMD python3 run.py

Build the Docker Image#

  1. Open a terminal or command prompt.

  2. Navigate to the my_model directory.

  3. Run the following command to build the Docker image:

docker build -t my_model_image .

Replace my_model_image with the desired name for your Docker image.

Push the Docker Image#

Tag the Docker image and push it to your container registry.

docker tag my_model_image:latest ${REGISTRY}/${REPOSITORY}/my_model_image:latest
docker push ${REGISTRY}/${REPOSITORY}/my_model_image:latest

Create the Function#

Create the function via the NVCF API. In this example, the inference port is defined as 8000, and the default Triton inference and health endpoint paths are used.

curl -s -X POST "http://${GATEWAY_ADDR}/v2/nvcf/functions" \
-H "Host: api.${GATEWAY_ADDR}" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $NVCF_TOKEN" \
-d '{
    "name": "my-model-function",
    "inferenceUrl": "/v2/models/my_model_image/infer",
    "inferencePort": 8000,
    "containerImage": "'${REGISTRY}'/'${REPOSITORY}'/my_model_image:latest",
    "health": {
                "protocol": "HTTP",
                "uri": "/v2/health/ready",
                "port": 8000,
                "timeout": "PT10S",
                "expectedStatusCode": 200
            }
}'
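The same call can be made from Python. A standard-library sketch that builds the request shown above; GATEWAY_ADDR, the token, and the image reference are placeholders you must supply:

```python
import json
import urllib.request


def build_create_function_request(gateway_addr: str, token: str, image: str) -> urllib.request.Request:
    """Build the POST /v2/nvcf/functions request used above."""
    payload = {
        "name": "my-model-function",
        "inferenceUrl": "/v2/models/my_model_image/infer",
        "inferencePort": 8000,
        "containerImage": image,
        "health": {
            "protocol": "HTTP",
            "uri": "/v2/health/ready",
            "port": 8000,
            "timeout": "PT10S",
            "expectedStatusCode": 200,
        },
    }
    return urllib.request.Request(
        f"http://{gateway_addr}/v2/nvcf/functions",
        data=json.dumps(payload).encode(),
        headers={
            "Host": f"api.{gateway_addr}",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_create_function_request("gateway.example.com", "my-token", "registry/repo/my_model_image:latest")
# To send it: urllib.request.urlopen(req)
```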

Additional Examples#

See more examples of containers that are Cloud Functions compatible in the function samples directory.

Creating gRPC-based Functions#

Cloud Functions supports function invocation via gRPC. During function creation, specify that the function is a gRPC function by setting the inferenceUrl field to /grpc.

Prerequisites#

  • The function container must implement a gRPC port, endpoint, and health check. The health check is expected to be served on the gRPC inference port; there is no need to define a separate health endpoint path.

gRPC Function Creation via API#

When creating the gRPC function, set the inferenceUrl field to /grpc:

curl -s -X POST "http://${GATEWAY_ADDR}/v2/nvcf/functions" \
-H "Host: api.${GATEWAY_ADDR}" \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $NVCF_TOKEN" \
-d '{
    "name": "my-grpc-function",
    "inferenceUrl": "/grpc",
    "inferencePort": 8001,
    "containerImage": "'${REGISTRY}'/'${REPOSITORY}'/grpc_echo_sample:latest"
}'

gRPC Function Invocation#

gRPC function invocation uses the same Authorization: Bearer $NVCF_TOKEN header as HTTP invocation, passed as gRPC metadata. See the gRPC invocation examples for details on how to authenticate and invoke your gRPC function.