The lifecycle of a Function includes the following states:
ACTIVE - At least one worker node is active. A Function can be invoked only when it is ACTIVE.
ERROR - All the worker nodes associated with the Function are in an ERROR state.
INACTIVE - When a Function is created, it is INACTIVE. When a Function is undeployed, its state changes from ACTIVE to INACTIVE.
DEPLOYING - The Function is being deployed and the instances or Workers are still coming up.
Functions can be created in one of three ways:
A Triton Inference Server compatible Model repository
A Docker Image
A Helm chart
Function Creation with your own Model
Bring Your Own Model Overview
This is a guide to enable model owners to make use of Triton Inference Server to serve a model within NVIDIA’s Cloud Functions and NGC Private Registry.
A single model can be easily deployed by leveraging Triton’s Auto-Generated Model Configuration functionality; simply upload the model and the inputs will be automatically discovered and deployed as part of the endpoint. However, more complicated configurations are supported as well, by manually specifying the configuration, by having multiple models controlled inside of an overall “ensemble”, or through backend logic scripting (BLS).
When using BYOM within Cloud Functions, you will need to take into consideration the following limitations:
Only an HTTP interface is exposed
The service is not stateful
Models must be loaded at start time
Creating the Model
As described above, a single model can be deployed and the configuration will be autogenerated. However, if a more complicated setup is required a complete model configuration will need to be created. Please see the Triton Model Configuration documentation to learn more.
We typically recommend using Triton’s Python model backend for some, if not all, of your models for several reasons:
There is no impact on performance for roughly 90% of use cases
Faster development
You can call out to other backends
Pre- or post-processing can be done within the same model
Refer to the model.py example to develop your model, and refer to the config.pbtxt example to create the file with the necessary configuration for your model.
If you are deploying on NVCF and using a model-only function, the input sent should not exceed 5MB. For container-based functions using assets as input, the asset should be read from a specific path. The python library NVCF Container Helpers is available to assist with common tasks like these.
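As an illustration of reading an asset from the container-side path, here is a minimal sketch. The function name and directory layout (asset file named after its asset ID, under the path given in the NVCF-ASSET-DIR header) are assumptions for illustration; the NVCF Container Helpers library provides the supported equivalents:

```python
import os


def load_asset_bytes(headers, asset_id):
    """Read an uploaded asset from the request's asset directory.

    Illustrative sketch only: assumes assets are laid out as
    <NVCF-ASSET-DIR>/<asset_id>. Use the NVCF Container Helpers
    library for the officially supported behavior.
    """
    asset_dir = headers["NVCF-ASSET-DIR"]
    with open(os.path.join(asset_dir, asset_id), "rb") as f:
        return f.read()
```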
There is also a Stable Diffusion based example available under examples/byom/sd_txt2img that demonstrates these techniques.
Here are some other examples of how to create more complex workflows within Triton:
Uploading the Model
For detailed instructions on uploading a model to NGC, please see Uploading a New NGC Model Version Using the NGC CLI.
Use the Triton model repository structure when uploading to NGC:
To upload your model to the NVIDIA GPU Cloud (NGC) registry, follow these steps:
Confirm that the model file is present in the correct directory.
Upload the model to your NGC private registry by running the following command (note rnwu0zzwflg6 is the org ID found under Organization -> Profile):
ngc registry model upload-version rnwu0zzwflg6/bis-test-2:v1 --gpu-model 'GV-100' --source ./kngo-test-customization2_v1/gpt2b_ptuning.llmservice.nemo
After the upload is complete, you will receive an upload summary confirming the status:
Model ID: bis-test-2[version=v1]
Upload status: Completed
...
Creating a Function with Models Only
Note that the “containerImage” is omitted from the request. This Function will use the Triton Inference Server. A model-only Function uses the Predict Protocol - Version 2 for the “requestBody” (input & output).
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "simple_int8_1",
"inferenceUrl": "v2/models/simple_int8/infer",
"models": [
{
"name": "simple_int8",
"version": "1",
"uri": "v2/org/cf/team/myteam/models/simple_int8/versions/1/zip"
}
],
"apiBodyFormat": "PREDICT_V2"
}'
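The PREDICT_V2 body format follows the Predict Protocol - Version 2 (the KServe/Triton inference protocol). As a rough sketch of building such a request body, with the tensor name and datatype below being placeholders rather than the simple_int8 model’s actual configuration:

```python
import json


def predict_v2_body(model_inputs):
    """Build a Predict Protocol - Version 2 request body.

    `model_inputs` maps tensor name -> (datatype, flat list of values).
    Shapes here are 1-D for simplicity; adjust them to match your
    model's config.pbtxt. Names/datatypes are illustrative.
    """
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(values)],
                "datatype": dtype,
                "data": values,
            }
            for name, (dtype, values) in model_inputs.items()
        ]
    }


body = predict_v2_body({"INPUT0": ("INT8", [1, 2, 3, 4])})
print(json.dumps(body))
```

The resulting JSON would be POSTed to the function's inferenceUrl (e.g. v2/models/simple_int8/infer).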
Function Creation with your Docker Container
Bring Your Own Container Overview
This is a guide to enable users to build a Docker image that will work within NVIDIA Cloud Functions.
Samples can be found here <https://github.com/NVIDIA/nv-cloud-function-helpers/tree/main/examples>
- Note: A health check is required by NVCF in order to deploy the function.
For HTTP-based functions, the specified endpoint must return a 200 status code.
gRPC-based functions require the standard gRPC health check; see the gRPC Health Checking documentation for more information.
First Method: Building a Triton Inference Server Container
NVCF is designed to work natively with Triton Inference Server based containers, including leveraging metrics and health checks from the server.
The pre-built Triton Docker images can be found within NGC’s Container catalog. A minimum version of 23.04 (2.33.0) is required.
When setting the Docker image’s run command to start tritonserver, the following command options are mandatory:
CMD tritonserver --model-repository=${YOUR_PATH_HERE} --http-header-forward-pattern NVCF-.*
Once the Docker image is built and ready, it can be uploaded to NGC:
Tag the Docker image:
docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Log in to NGC:
docker login nvcr.io
Enter your credentials
Push the Docker image to NGC:
docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Second Method: PyTriton
NVIDIA’s PyTriton is a Python-native interface to Triton Inference Server that works natively with NVCF. A minimum version of 0.3.0 is required.
Create the ``requirements.txt`` file:
This file should list the Python dependencies required for your model.
Add nvidia-pytriton to your requirements.txt file.
Create the ``run.py`` file:
Your run.py file (or similar Python file) needs to define a PyTriton model. This involves importing your model dependencies and creating a PyTritonServer class with an __init__ function, an _infer_fn function, and a run function that serves the inference function, defining the model name, the inputs, and the outputs along with optional configuration.
Here is an example of a run.py file:
run.py
import time

import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# helper functions such as numpy_array_to_variable and uppercase_keys
# are available from the nv-cloud-function-helpers repository
....


class PyTritonServer:
    """triton server for timed_sleeper"""

    def __init__(self):
        self.model_name = "timed_sleeper"

    def _infer_fn(self, requests):
        responses = []
        for req in requests:
            req_data = req.data
            sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
            # deal with header dict keys being lowercase
            request_parameters_dict = uppercase_keys(req.parameters)
            time.sleep(sleep_duration)
            responses.append({"sleep_duration": np.array([sleep_duration])})
        return responses

    def run(self):
        """run triton server"""
        with Triton(
            config=TritonConfig(
                http_header_forward_pattern="NVCF-*",  # this is required
                http_port=8000,
                grpc_port=8001,
                metrics_port=8002,
            )
        ) as triton:
            triton.bind(
                model_name="timed_sleeper",
                infer_func=self._infer_fn,
                inputs=[
                    Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
                ],
                outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
                config=ModelConfig(batching=False),
            )
            triton.serve()


if __name__ == "__main__":
    server = PyTritonServer()
    server.run()
Build the Dockerfile
Create a file named Dockerfile in your model directory. You can use containers like NVIDIA CUDA, PyTorch, or TensorRT as your base container. They can be downloaded from the NGC Catalog.
Make sure to install your Python requirements in your Dockerfile.
Copy in your model source code, and model weights unless you plan to host them in NGC Registry.
Here is a sample Dockerfile and requirements.txt:
Dockerfile
FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
git \
python3 \
python3-pip \
python-is-python3 \
libsm6 \
libxext6 \
libxrender-dev \
curl \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workspace/
# install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENV DEBIAN_FRONTEND=noninteractive
#copy model weights
COPY model_weights /models
COPY model_source .
COPY run.py .
CMD python3 run.py
requirements.txt
--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy
This Dockerfile will do the following:
Copy in the model source code
Copy in the model weights
Install the requirements.txt Python dependencies, including nvidia-pytriton
Set the run command to start PyTriton to serve the model
Build the Docker image
Open a terminal or command prompt.
Navigate to the my_model directory. Run the following command to build the Docker image:
docker build -t my_model_image .
Replace my_model_image with the desired name for your Docker image.
Use the Docker image and upload to NGC
Currently, NVCF only supports containers hosted in NGC Private Registry. For detailed instructions on uploading a container to NGC please see Uploading an NVIDIA Container Image.
NGC Private Registry has size constraints on layers, images, models and resources.
Tag the Docker image:
docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Log in to NGC:
docker login nvcr.io
Enter your credentials
Push the Docker image to NGC:
docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Creating a Function with a Custom Container
Note that the “models” array is omitted from the request.
If no port is specified for the container, it will default to 8000. If you want to override the port, add the inferencePort parameter with the desired port number.
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "echo_function",
"inferenceUrl": "/echo",
"containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
"apiBodyFormat": "CUSTOM"
}'
Function Creation with Helmchart
Prerequisites
A service must be included as part of the helm chart. The name of this service in your helm chart should be supplied by setting
helmChartServiceName
during function definition, see below. This allows the util container to communicate and make inference requests to the mini service entry point. Please note, the service port defined in the chart should be used as the inference port supplied during function creation; otherwise the util container will not be able to reach the mini service.
Secret Management
For pulling containers defined as part of the helm chart from NGC, a new value named
ngcImagePullSecretName
needs to be defined in the chart. The value is referenced in the deployment spec as spec.imagePullSecrets.name
of the pods in the chart. Please note, containers defined in the helm chart should be in the same NGC org and team that the helm chart itself is being pulled from.
How To
Upload your helm chart to NGC
Ensure adherence to the helm chart prerequisite (listed above).
Create your function
Include the following additional parameter in the function definition
helmChart
helmChartServiceName
The helmChart property in the function definition is an optional field. It should be a URL hosted by the NGC model registry pointing to the helm chart that will deploy the mini service. Please note, this helm chart URL should be accessible to the NGC org in which the function will eventually be deployed.
The helmChartServiceName field is required only when the helmChart property is supplied during function definition. It is used to check whether the mini service is ready for inference and is scraped for function metrics. At this time, templatized service names are not supported.
Here is an example:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "function_name",
"inferenceUrl": "v2/models/model_name/versions/model_version/infer",
"inferencePort": 8001,
"helmChart": "https://helm.ngc.nvidia.com/plq9i6ygkfzp/charts/inference-test-1.0.tgz",
"helmChartServiceName": "service_name",
"apiBodyFormat": "CUSTOM"
}'
Please note, for gRPC based NVCF functions, set "inferenceUrl": "/gRPC"
. This signals that NVCF and the workers are using the gRPC protocol; the function is not expected to have a /gRPC
endpoint exposed for inference requests. Here’s the inference-test-2.3.tgz helm chart example as a reference.
Proceed with function deployment and invocation.
- When using helm charts, the following limitations need to be taken into consideration
Model and asset download is handled as part of the helm chart (customer logic), but the download size is limited by the disk space on the VM (approximately 100GB for GFN; for “bring your own cluster” this limit will vary)
Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider opting for HTTP streaming or gRPC bidirectional support.
Supported k8s artifacts under Helm Chart Namespace (others will be rejected)
Deployment
Service
ServiceAccount
Role & RoleBindings
ConfigMaps
Secrets
Helm Chart Overrides
To override keys in your helm chart values.yml
, you can provide the configuration
parameter and supply corresponding key value pair in JSON format which you would like to be overridden when function is deployed.
Example helm chart override
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"deploymentSpecifications": [{
"gpu": "L40",
"backend": "OCI",
"maxInstances": 2,
"minInstances": 1,
"configuration": {
"key_one": "<value>",
"key_two": { "key_two_subkey_one": "<value>", "key_two_subkey_two": "<value>" }
...
}
},
{
"gpu": "T10",
"backend": "GFN",
"maxInstances": 2,
"minInstances": 1
}]
}'
When you first register a function, it will have an initial version ID created. You can create additional versions of this function by specifying other models/containers/helm charts to use. Here is a sample API call:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/1ccd6b12-1ee0-44a5-bfad-1d3630213418/versions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <Token>' \
-H 'Content-Type: application/json' \
-d '{
"name": "echo_function",
"inferenceUrl": "/echo",
"containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
"apiBodyFormat": "CUSTOM"
}'
Multiple function versions allow for different deployment configurations for each version while still being accessible through a single function endpoint. Deployments will be discussed in the following section. Multiple function versions can also be deployed to support A/B testing.
NOTE: Function versioning should only be used if the APIs between the various versions are compatible. Different APIs should be created as new Functions.
Listing Functions
This is used to list the available functions that can be run.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Listing Function Versions
This is used to list the versions of a specific Function ID.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Retrieve Function Version Details
This is used to list details of a specific Function version.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/2aca7eb1-1351-4072-a1ba-859abf893325/versions/f0620899-fdd5-4860-a619-149315188660' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Public Functions
Functions marked as public are visible in the list_functions response for all Cloud Functions users.
You can filter these out if you wish using your NCAID.
Deleting a Function Version
Use both the Function ID and Function Version ID to delete a Function version.
curl -X 'DELETE' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
To activate the function, it must be deployed as a Function version. This action requests the creation of worker pods to process requests for the invocation of the function. Once worker pods are successfully created, the status of the Function will transition to ACTIVE. If all worker pods fail to launch, the status will change to ERROR.
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"deploymentSpecifications": [
{
"gpu": "L40",
"backend": "OCI",
"maxInstances": 2,
"minInstances": 1
},
{
"gpu": "T10",
"backend": "GFN",
"maxInstances": 2,
"minInstances": 1
}
]
}'
Each function version can have a different deployment configuration, allowing for heterogeneous computing infrastructure to be used across a single function endpoint.
Delete Function Version Deployment
To delete a Function version deployment, you supply the Function ID and version ID.
curl -X 'DELETE' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf'
My function is stuck in the deploying state. What do I do?
Depending on the size of your containers and models, it usually takes up to 30 minutes for your function to deploy, although durations up to 2 hours are permitted. If you believe your function should have deployed already, or if it has entered an error state, review the logs to understand what happened.
In some cases, there may not be enough capacity available to fulfill your deployment. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.
I’m getting errors when invoking my function. What do I do?
Please review the error message and update your container or model as required. If the error message is emitting from your inference container, consider adding further logs in the container and redeploying to troubleshoot.
Common Deployment Failures
Failure Type | Description
---|---
Function configuration problems | This occurs when incorrect inference or health endpoints and ports are defined, causing the container to be marked unhealthy. Try the deployment validation tool on the container locally to rule out configuration issues.
Inadequate capacity for the chosen cluster | This will usually be indicated in the deployment failure error message in the UI. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.
Container in restart loop | This will be indicated in the inference container logs (if your container is configured to emit logs) and is fixed by debugging and updating your inference container code.
Common Function Invocation Failures
Failure Type | Description
---|---
Invocation response returning 4xx or 5xx status code | Check the “type” of the error message response; if the “type” includes “worker-service” or “container”, this indicates the error is coming from your inference container. Please check the Open API Docs for other possible status code failure reasons in cases where they are not generated from your inference container.
Invocation request taking long to get a result | Check the capacity of your function using the metrics UI to see if your function is queuing. Consider instrumenting your container with additional metrics to your chosen monitoring solution for further debugging - NVCF containers allow public egress. Set the NVCF-POLL-SECONDS header to 300 (the maximum) to wait for a sync response for up to 5 minutes to rule out errors in your polling logic.
Invocation response returning 401 or 403 | This indicates that the caller is unauthorized; ensure the Authorization header is correct per the documentation.
Container OOM | This is difficult to detect without instrumenting your container with additional metrics, unless your container is emitting logs that indicate out of memory. We recommend profiling the memory usage locally. For testing locally and in the function, you can look at a profile of the memory allocation using this guide.
Function Management
The table below provides an overview of the Function lifecycle API endpoints and their respective usages.
Name | Method | Endpoint | Usage
---|---|---|---
Register Function | POST | /v2/nvcf/functions | Creates a new Function.
Register Function Version | POST | /v2/nvcf/functions/{functionId}/versions | Creates a new version of a Function.
Delete Function Version | DELETE | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Deletes a Function version specified by its ID.
List Functions | GET | /v2/nvcf/functions | Retrieves a list of functions associated with the account.
List Function Versions | GET | /v2/nvcf/functions/{functionId}/versions | Retrieves a list of versions for a specific Function.
Retrieve Function Details | GET | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version.
Create Function Version Deployment | POST | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the deployment process for a Function version on worker nodes.
Delete Function Version Deployment | DELETE | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the undeployment process for a Function version.
Retrieve Function Version Deployment | GET | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version deployment.
Update Function Version Deployment | PUT | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Updates the configuration of a Function version deployment.
Function Metadata
When using the NVCF API to create a function, it’s possible to specify a function description and a list of tags as strings as part of the function creation request body. This metadata is then returned in all responses that include the function definition. This is an API only feature at this time. Please see the Open API Docs for more information.
Function Invocation
The table below provides an overview of the Function invocation API endpoints and their respective usages.
Name | Method | Endpoint | Usage
---|---|---|---
Invoke Function | POST | /v2/nvcf/pexec/functions/{functionId} | Invokes the specified Function to execute the job and returns the results, if available. NVCF randomly selects one of the active versions of the specified Function to execute the submitted job. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Invoke Function Version | POST | /v2/nvcf/pexec/functions/{functionId}/versions/{functionVersionId} | Invokes the specified version under the specified Function to execute the job and returns the results. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Get Function Invocation Status | GET | /v2/nvcf/pexec/status/{invocationRequestId} | Used to poll for the results of a job when 202 is returned. Avoid making this request to obtain the result if the original POST request returns a 200 response.
The result can be obtained just once, either via the original POST request or via a subsequent GET request.
If the result was included in the original POST response, then the status will be 200. In that case, any subsequent attempts to obtain the result using the GET request will result in 404.
If the original POST request responds with 202 (i.e. the result is pending), then the result should be obtained using the GET request. The GET request can respond with either 202 (result pending) or 200 (result ready). Once the GET request responds with 200, any subsequent attempts to obtain the result using the GET request will result in 404.
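The 200/202 rules above can be sketched as a small polling helper. This is an illustrative sketch, not an official client: `post_response` is the (status code, body) pair from the original POST, and `get_status` stands in for a GET to /v2/nvcf/pexec/status/{invocationRequestId}.

```python
import time


def fetch_result(post_response, get_status, poll_interval=1.0, max_polls=300):
    """Resolve an NVCF invocation to its final result (sketch).

    Per the rules above: a 200 on the POST already carries the result
    and must not be followed by a GET (a later GET would return 404);
    a 202 means the result is pending, so poll the status endpoint
    until it returns 200.
    """
    status, body = post_response
    if status == 200:            # result delivered inline -- do not poll
        return body
    if status != 202:
        raise RuntimeError(f"invocation failed with status {status}")
    for _ in range(max_polls):
        status, body = get_status()
        if status == 200:        # result ready; subsequent GETs would 404
            return body
        if status != 202:
            raise RuntimeError(f"polling failed with status {status}")
        time.sleep(poll_interval)
    raise TimeoutError("result not ready after max_polls attempts")
```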
Asset Management
Name | Method | Endpoint | Usage
---|---|---|---
Create Asset | POST | /v2/nvcf/assets | Creates an asset ID and a corresponding pre-signed URL to upload the file.
List Assets | GET | /v2/nvcf/assets | Returns a list of assets associated with the account.
Delete Asset | DELETE | /v2/nvcf/assets/{nvcf_asset_id} | Deletes an asset using the specified asset ID.
Visibility
Name | Method | Endpoint | Usage
---|---|---|---
Get Queue Length for Function id | GET | /v2/nvcf/queues/functions/{functionId} | Returns a list containing a single element with the corresponding queue length for the specified Function.
Get Queue Length for Version id | GET | /v2/nvcf/queues/functions/{functionId}/versions/{functionVersionId} | Returns a list containing a single element with the corresponding queue length for the specified Function version.
Get Available GPUs | GET | /v2/nvcf/supportedGpus | Returns a list of GPU types you have access to.
Get Queue Position for Request id | GET | /v2/nvcf/queues/{requestId}/position | Returns the estimated position in the queue, up to 1000, for a specific request ID of a function invocation request.
The following is a reference of available variables via the headers of the invocation message (auto-populated by NVCF), accessible within the container.
For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.
Name | Description
---|---
NVCF-REQID | Request ID for this request.
NVCF-SUB | Message subject.
NVCF-NCAID | Function’s organization’s NCA ID.
NVCF-FUNCTION-NAME | Function name.
NVCF-FUNCTION-ID | Function ID.
NVCF-FUNCTION-VERSION-ID | Function version ID.
NVCF-ASSET-DIR | Asset directory path. Not available for helm deployments.
NVCF-LARGE-OUTPUT-DIR | Large output directory path.
NVCF-MAX-RESPONSE-SIZE-BYTES | Max response size in bytes for the function.
NVCF-NSPECTID | NVIDIA reserved variable.
NVCF-BACKEND | Backend or “Cluster Group” the function is deployed on.
NVCF-INSTANCETYPE | Instance type the function is deployed on.
NVCF-REGION | Region or zone the function is deployed in.
NVCF-ENV | Spot environment if deployed on spot instances.
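Since header keys may arrive lowercased by intermediaries (as noted in the PyTriton example earlier), a small case-insensitive lookup helper can be handy. This is an illustrative sketch, not part of any NVCF library:

```python
def get_nvcf_header(headers, name, default=None):
    """Case-insensitive lookup for NVCF-* header variables.

    Proxies (and PyTriton's forwarded parameter dict) may lowercase
    header keys, so normalize both sides before comparing.
    """
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default
```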
This section gives an overview of the metrics and logs available within the Cloud Functions UI. Note that it is possible, and recommended for full observability and monitoring, to emit logs, metrics, analytics, etc. to any 3rd party service from within your container.
Emit and View Inference Container Logs
View inference container logs in the Cloud Functions UI via the “Logs” tab in the function details page. To get here, click any function version from the “Functions” list and click “View Details” on the side panel to the right.
Logs are currently available with up to 48 hours history, with the ability to view as expanded rows for scanning, or as a “window” view for ease of copying and pasting.
Note as a prerequisite, your inference container will have to be instrumented to emit logs. This is highly recommended.
How to Add Logs to Your Inference Container
Here is an example of adding NVCF compatible logs. The helper function for logging below, along with other helper functions, can be imported from the Helper Functions repository.
import logging
import sys
def get_logger() -> logging.Logger:
"""
gets a Logger that logs in a format compatible with NVCF
:return: logging.Logger
"""
sys.stdout.reconfigure(encoding="utf-8")
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s[%(levelname)s] [INFERENCE]%(message)s",
handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)
return logger
class MyServer:
def __init__(self):
self.logger = get_logger()
def _infer_fn(self, request):
self.logger.info("Got a request!")
View Function Metrics
NVCF exposes the following metrics by default.
Instance counts (current, min and max)
Invocation activity and queue depth
Total invocation count, success rate and failure count
Average inference time
Metrics are viewable upon clicking any function from the “Functions” list page. The function overview page will display aggregated values across all function versions.
When clicking into a function version’s details page, you will then see metrics for this specific function version.
There may be up to a 5 minute delay on metric ingestion. Any timeseries queries within the page are aggregated on 5 minute intervals with a step set to show 500 data points. All stat queries are based on the total selected time period and reduced to either show the latest total value or a mean value.
Below are instructions on setting up output directories and efficiently tracking and communicating progress using the utils container.
Setting Up the Output Directory
The utils container automatically configures the output directory for you. To access the path, simply read the NVCF-LARGE-OUTPUT-DIR
header. NVCF-LARGE-OUTPUT-DIR
points to the directory for that particular requestId.
Writing Large Outputs
When your Custom BLS generates large outputs, save them temporarily with the “*.partial” extension inside the NVCF-LARGE-OUTPUT-DIR
directory. For instance, if you’re writing an image, name it image1.partial
.
Finalizing Outputs
Once the writing of the output file is complete, rename it from “*.partial” to its appropriate extension. Continuing with our example, rename image1.partial
to image1.jpg
.
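The write-then-rename flow above can be sketched as follows; the helper name is illustrative, not part of NVCF:

```python
import os


def write_large_output(output_dir, final_name, data):
    """Write a large output safely for the utils container (sketch).

    The file is first written with the `.partial` extension so the
    watcher never picks up an incomplete file, then renamed to its
    final name once the write is complete.
    """
    stem, _ = os.path.splitext(final_name)
    partial_path = os.path.join(output_dir, stem + ".partial")
    final_path = os.path.join(output_dir, final_name)
    with open(partial_path, "wb") as f:
        f.write(data)
    os.rename(partial_path, final_path)  # atomic on POSIX filesystems
    return final_path
```

For example, `write_large_output(large_output_dir, "image1.jpg", image_bytes)` writes image1.partial and renames it to image1.jpg only after the bytes are fully on disk.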
Handling Progress Messages
The utils container actively observes the output directory for a file named ‘progress’. This file can be used to communicate progress and partial responses back to the caller.
Structure of the Progress File
This file should contain well-formed JSON data.
Structure the JSON content as follows:
{
"id": "<requestId>",
"progress": 50,
"partialResponse": {
"exampleKey": "Insert any well-formed JSON here, but ensure its size is less than 250K"
}
}
Replace <requestId>
with the actual request id if it’s present. Modify the progress integer as needed, ranging from 0 (just started) to 100 (fully complete). Within partialResponse
, insert any JSON content you want to send as a partial response, making sure it’s smaller than 250KB.
Transferring Data to the Utils Container
Once the output files and progress file are correctly set up in the output directory under the correct request id, the utils container will automatically detect them. The utils container will then send these as a “progress” message.
Best Practices
Always use the “.partial” extension to avoid sending partial or incomplete data.
Rename to the final extension only when the writing process is fully complete.
Ensure your progress file remains under 250KB to maintain efficiency and avoid errors.