Function Management

A Function can be in any of the following lifecycle states:

  • ACTIVE - At least one worker node is active. A Function can be invoked only when it is ACTIVE.

  • ERROR - All of the worker nodes associated with the Function are in an ERROR state.

  • INACTIVE - When a Function is created, it is INACTIVE. Also, when a Function is undeployed, the state is changed from ACTIVE to INACTIVE.

  • DEPLOYING - When the Function is being deployed and the instances or Workers are still coming up.

[Image: papi-deploy.png]
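To see which state a deployed version is currently in, you can poll the version-details endpoint described later in this section. The sketch below assumes the response JSON exposes the state in a status field; confirm the exact schema against the Open API Docs.

import time

import requests

API = "https://api.nvcf.nvidia.com"
TOKEN = "<Token>"
FUNCTION_ID = "<function-id>"
VERSION_ID = "<function-version-id>"


def wait_until_deployed(poll_seconds: int = 30) -> str:
    """Poll the version details until the version leaves the DEPLOYING state."""
    url = f"{API}/v2/nvcf/functions/{FUNCTION_ID}/versions/{VERSION_ID}"
    headers = {"Authorization": f"Bearer {TOKEN}", "accept": "application/json"}
    while True:
        body = requests.get(url, headers=headers).json()
        status = body.get("function", {}).get("status", "UNKNOWN")  # assumed field names
        if status != "DEPLOYING":
            return status  # ACTIVE, ERROR or INACTIVE
        time.sleep(poll_seconds)


if __name__ == "__main__":
    print(wait_until_deployed())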

Functions can be created in one of three ways:

  1. A Triton Inference Server compatible Model repository

  2. A Docker Image

  3. A Helm chart

Function Creation with your own Model

Bring Your Own Model Overview

This guide enables model owners to use Triton Inference Server to serve a model within NVIDIA’s Cloud Functions and the NGC Private Registry.

A single model can be easily deployed by leveraging Triton’s Auto-Generated Model Configuration functionality: simply upload the model, and the inputs will be automatically discovered and deployed as part of the endpoint. More complicated configurations are also supported, either by manually specifying the configuration, by controlling multiple models inside an overall “ensemble”, or through Business Logic Scripting (BLS).

When using BYOM within Cloud Functions, you will need to take into consideration the following limitations:

  • Only an HTTP interface is exposed

  • The service is not stateful

  • Models must be loaded at start time

Creating the Model

As described above, a single model can be deployed and the configuration will be autogenerated. However, if a more complicated setup is required, a complete model configuration will need to be created. Please see the Triton Model Configuration documentation to learn more.

We typically recommend using Triton’s Python model backend for most, if not all, of your models, for several reasons:

  • For roughly 90% of use cases, there is no performance impact

  • Faster development

  • You can call out to other backends

  • Pre- and post-processing can be handled within the backend itself

Refer to the model.py example to develop your model, and refer to the config.pbtxt example to create the file with the necessary configuration for your model.
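If you do not have the example handy, here is a minimal sketch of what a Triton Python-backend model.py looks like (this is not the example from the repository). It simply echoes a single input tensor back; the tensor names, datatypes and processing logic are placeholders you would replace, and a matching config.pbtxt declaring the same inputs and outputs is still required.

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Minimal Python-backend sketch; tensor names and logic are placeholders."""

    def initialize(self, args):
        # Load weights and warm up here; args carries the model config and paths.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")
            data = input_tensor.as_numpy()
            # Pre-processing, inference and post-processing would go here.
            output_tensor = pb_utils.Tensor("OUTPUT", data)
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses

    def finalize(self):
        # Release any resources acquired in initialize().
        pass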

If you are deploying on NVCF and using a model-only function, the input sent should not exceed 5MB. For container-based functions using assets as input, the asset should be read from a specific path. The Python library NVCF Container Helpers is available to assist with common tasks like these.

There is also a Stable Diffusion based example available under examples/byom/sd_txt2img that demonstrates how to create more complex workflows within Triton.

Uploading the Model

For detailed instructions on uploading a model to NGC, please see Uploading a New NGC Model Version Using the NGC CLI.

Use the Triton model repository structure when uploading to NGC.

To upload your model to the NVIDIA GPU Cloud (NGC) registry, follow these steps:

  1. Confirm that the model file is present in the correct directory.

  2. Upload the model to your NGC private registry by running the following command (note rnwu0zzwflg6 is the org ID found under Organization -> Profile):


ngc registry model upload-version rnwu0zzwflg6/bis-test-2:v1 --gpu-model 'GV-100' --source ./kngo-test-customization2_v1/gpt2b_ptuning.llmservice.nemo

  3. After the upload is complete, you will receive an upload summary confirming the status:


Model ID: bis-test-2[version=v1] Upload status: Completed ...

Creating a Function with Models Only

Note that the “containerImage” field is omitted from the request; this Function will use the Triton Inference Server. A model-only Function uses the Predict Protocol - Version 2 for the “requestBody” (input & output).


curl -X 'POST' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "simple_int8_1",
    "inferenceUrl": "v2/models/simple_int8/infer",
    "models": [
      {
        "name": "simple_int8",
        "version": "1",
        "uri": "v2/org/cf/team/myteam/models/simple_int8/versions/1/zip"
      }
    ],
    "apiBodyFormat": "PREDICT_V2"
  }'
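For reference, a Predict Protocol - Version 2 request body declares each input’s name, shape, datatype and data. The sketch below invokes the function created above through the invocation endpoint covered later in this document; the tensor names, shapes and datatypes are illustrative and must match your model’s configuration.

import requests

API = "https://api.nvcf.nvidia.com"
TOKEN = "<Token>"
FUNCTION_ID = "<function-id returned by the creation call above>"

# Illustrative Predict Protocol v2 body; align names, shapes and datatypes with your model.
payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 16], "datatype": "INT8", "data": [list(range(16))]}
    ],
    "outputs": [{"name": "OUTPUT0"}],
}

response = requests.post(
    f"{API}/v2/nvcf/pexec/functions/{FUNCTION_ID}",
    headers={"Authorization": f"Bearer {TOKEN}", "accept": "application/json"},
    json=payload,
)
print(response.status_code, response.json())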

Function Creation with your Docker Container

Bring Your Own Container Overview

This is a guide to enable users to build a Docker container that will work within NVIDIA Cloud Functions.

Samples can be found here <https://github.com/NVIDIA/nv-cloud-function-helpers/tree/main/examples>

Note: A health check is required by NVCF in order to deploy the function.

First Method: Building a Triton Inference Server Container

NVCF is designed to work natively with Triton Inference Server based containers, including leveraging metrics and health checks from the server.

The pre-built Triton Docker images can be found within NGC’s Container catalog. A minimum version of 23.04 (2.33.0) is required.

When setting the Docker image’s run command to start tritonserver, the following command options are mandatory:


CMD tritonserver --model-repository=${YOUR_PATH_HERE} --http-header-forward-pattern NVCF-.*

Once the Docker image is built and ready, it can be uploaded to NGC:

  1. Tag the Docker image:


    docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest

  2. Log in to NGC:


    docker login nvcr.io

    • Enter your credentials

  3. Push the Docker image to NGC:


    docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest

Second Method: PyTriton

NVIDIA’s PyTriton is a Python-native solution for Triton Inference Server that works natively with NVCF. A minimum version of 0.3.0 is required.

Create the ``requirements.txt`` file:

  • This file should list the Python dependencies required for your model.

  • Add nvidia-pytriton to your requirements.txt file.

Create the ``run.py`` file:

  1. Your run.py file (or similar python file) needs to define a PyTriton model.

  2. This involves importing your model dependencies and creating a PyTritonServer class with an __init__ function, an _infer_fn function, and a run function that serves the inference function; it also involves defining the model name, the inputs and the outputs, along with optional configuration.

Here is an example of a run.py file:

run.py


import time

import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# ... helper imports (numpy_array_to_variable, uppercase_keys) elided ...


class PyTritonServer:
    """triton server for timed_sleeper"""

    def __init__(self):
        self.model_name = "timed_sleeper"

    def _infer_fn(self, requests):
        responses = []
        for req in requests:
            req_data = req.data
            sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
            # deal with header dict keys being lowercase
            request_parameters_dict = uppercase_keys(req.parameters)
            time.sleep(sleep_duration)
            responses.append({"sleep_duration": np.array([sleep_duration])})
        return responses

    def run(self):
        """run triton server"""
        with Triton(
            config=TritonConfig(
                http_header_forward_pattern="NVCF-*",  # this is required
                http_port=8000,
                grpc_port=8001,
                metrics_port=8002,
            )
        ) as triton:
            triton.bind(
                model_name="timed_sleeper",
                infer_func=self._infer_fn,
                inputs=[
                    Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
                ],
                outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
                config=ModelConfig(batching=False),
            )
            triton.serve()


if __name__ == "__main__":
    server = PyTritonServer()
    server.run()


Build the Dockerfile

  1. Create a file named Dockerfile in your model directory.

  2. You can use containers like NVIDIA CUDA, Pytorch or TensorRT as your base container. They can be downloaded from the NGC Catalog.

  3. Make sure to install your Python requirements in your Dockerfile.

  4. Copy in your model source code, and model weights unless you plan to host them in NGC Registry.

Here is a sample Dockerfile and requirements.txt:

Dockerfile


FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y \
    git \
    python3 \
    python3-pip \
    python-is-python3 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/

# install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

ENV DEBIAN_FRONTEND=noninteractive

# copy model weights and source
COPY model_weights /models
COPY model_source .
COPY run.py .

CMD python3 run.py


requirements.txt


--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy


This Dockerfile does the following:

  • Copies in the model source code

  • Copies in the model weights

  • Installs the requirements.txt Python dependencies, including nvidia-pytriton

  • Sets the run command to start PyTriton and serve the model

Build the Docker image

  1. Open a terminal or command prompt.

  2. Navigate to the my_model directory.

  3. Run the following command to build the Docker image:


docker build -t my_model_image .

Replace my_model_image with the desired name for your Docker image.

Use the Docker image and upload to NGC

Currently, NVCF only supports containers hosted in NGC Private Registry. For detailed instructions on uploading a container to NGC please see Uploading an NVIDIA Container Image.

Note

NGC Private Registry has size constraints on layers, images, models and resources.

  1. Tag the Docker image:


    docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest

  2. Log in to NGC:


    docker login nvcr.io

    • Enter your credentials

  3. Push the Docker image to NGC:


    docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest

Creating a Function with a Custom Container

Note that the “models” array is omitted from the request.

If no port is specified for the container, it will default to 8000. If you want to override the port, add the parameter inferencePort with the desired port number.


curl -X 'POST' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "echo_function",
    "inferenceUrl": "/echo",
    "containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
    "apiBodyFormat": "CUSTOM"
  }'

Function Creation with a Helm Chart

Prerequisites

  • A service must be included as part of the helm chart. The name of this service should be supplied by setting helmChartServiceName during function definition (see below). This allows the util container to communicate with, and make inference requests to, the mini service entry point. Please note, the service port defined in the chart should be used as the inference port supplied during function creation; otherwise the util container will not be able to reach the mini service.

Secret Management

  • For pulling containers defined as part of the helm chart from NGC, a new value named ngcImagePullSecretName needs to be defined in the chart. This value is referenced in the deployment spec as spec.imagePullSecrets.name for the pods in the chart. Please note, containers defined in the helm chart should be in the same NGC org and team that the helm chart itself is being pulled from.

How To

  • Upload your helm chart to NGC

  • Ensure adherence to the helm chart prerequisite (listed above).

  • Create your function

    • Include the following additional parameters in the function definition

      • helmChart

      • helmChartServiceName

    • The helmChart property in the function definition is an optional field. It should be a URL, hosted by the NGC model registry, pointing to the helm chart that will deploy the mini service. Please note, this helm chart URL should be accessible to the NGC org in which the function will eventually be deployed.

    • The helmChartServiceName field is required only when the helmChart property is supplied during function definition. It is used to check whether the mini service is ready for inference and is scraped for function metrics. At this time, templatized service names are not supported.

    • Here is an example:


      curl -X 'POST' \
        'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
        -H 'Authorization: Bearer <Token>' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
          "name": "function_name",
          "inferenceUrl": "v2/models/model_name/versions/model_version/infer",
          "inferencePort": 8001,
          "helmChart": "https://helm.ngc.nvidia.com/plq9i6ygkfzp/charts/inference-test-1.0.tgz",
          "helmChartServiceName": "service_name",
          "apiBodyFormat": "CUSTOM"
        }'

Please note, for gRPC based NVCF functions, set "inferenceUrl": "/gRPC". This signals to NVCF that the workers use the gRPC protocol; the function is not expected to expose a /gRPC endpoint for inference requests. See the inference-test-2.3.tgz helm chart example as a reference.

  • Proceed with function deployment and invocation.

When using helm charts, the following limitations need to be taken into consideration:

  • Model and asset downloads are handled as part of the helm chart (customer logic), but the download size is limited by the disk space on the VM (approximately 100GB for GFN; for “bring your own cluster” this limit will vary)

  • Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider opting for HTTP streaming or gRPC bidirectional support.

  • Supported k8s artifacts under the Helm Chart Namespace (others will be rejected):

    • Deployment

    • Service

    • ServiceAccount

    • Role & RoleBindings

    • ConfigMaps

    • Secrets


Helm Chart Overrides

To override keys in your helm chart values.yml, provide the configuration parameter in the deployment request and supply the corresponding key-value pairs, in JSON format, that you would like to be overridden when the function is deployed.

Example helm chart override


curl -X 'POST' \
  'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "deploymentSpecifications": [
      {
        "gpu": "L40",
        "backend": "OCI",
        "maxInstances": 2,
        "minInstances": 1,
        "configuration": {
          "key_one": "<value>",
          "key_two": {
            "key_two_subkey_one": "<value>",
            "key_two_subkey_two": "<value>"
          }
          ...
        }
      },
      {
        "gpu": "T10",
        "backend": "GFN",
        "maxInstances": 2,
        "minInstances": 1
      }
    ]
  }'


When you first register a function, it will have an initial version ID created. You can create additional versions of this function by specifying other models/containers/helm charts to use. Here is a sample API call:


curl -X 'POST' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions/1ccd6b12-1ee0-44a5-bfad-1d3630213418/versions' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <Token>' \
  -d '{
    "name": "echo_function",
    "inferenceUrl": "/echo",
    "containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
    "apiBodyFormat": "CUSTOM"
  }'

Multiple function versions allow for different deployment configurations for each version while still being accessible through a single function endpoint. Deployments will be discussed in the following section. Multiple function versions can also be deployed to support A/B testing.

NOTE: Function versioning should only be used if the APIs of the various versions are compatible with each other. Different APIs should be created as new Functions.

Listing Functions

This is used to list the available functions that can be run.


curl -X 'GET' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json'

Listing Function Versions

This is used to list the versions of a specific Function ID.


curl -X 'GET' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json'

Retrieve Function Version Details

This is used to list details of a specific Function version.


curl -X 'GET' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions/2aca7eb1-1351-4072-a1ba-859abf893325/versions/f0620899-fdd5-4860-a619-149315188660' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json'

Public Functions

  • Functions marked as public are visible in the list_functions response for all Cloud Functions users.

  • You can filter these out if you wish using your NCAID.

Deleting a Function Version

Use both the Function ID and Function Version ID to delete a Function version.


curl -X 'DELETE' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json'

Deploying a Function Version

To activate the function, it must be deployed as a Function version. This action requests the creation of worker pods to process requests for the invocation of the function. Once worker pods are successfully created, the status of the Function will transition to ACTIVE. If all worker pods fail to launch, the status will change to ERROR.


curl -X 'POST' \
  'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
  -H 'Authorization: Bearer <Token>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "deploymentSpecifications": [
      {
        "gpu": "L40",
        "backend": "OCI",
        "maxInstances": 2,
        "minInstances": 1
      },
      {
        "gpu": "T10",
        "backend": "GFN",
        "maxInstances": 2,
        "minInstances": 1
      }
    ]
  }'

Each function version can have a different deployment configuration, allowing heterogeneous computing infrastructure to be used across a single function endpoint.

Delete Function Version Deployment

To delete a Function version deployment, you supply the Function ID and version ID.


curl -X 'DELETE' \
  'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf'

My function is stuck in the deploying state. What do I do?

Depending on the size of your containers and models, it usually takes up to 30 minutes for your function to deploy, although durations up to 2 hours are permitted. If you believe your function should have deployed already, or if it has entered an error state, review the logs to understand what happened.

In some cases, there may not be enough capacity available to fulfill your deployment. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.

I’m getting errors when invoking my function. What do I do?

Please review the error message and update your container or model as required. If the error message is emitted from your inference container, consider adding further logs in the container and redeploying to troubleshoot.

Common Deployment Failures

Failure Type | Description
Function configuration problems | This occurs when incorrect inference or health endpoints and ports are defined, causing the container to be marked unhealthy. Try the deployment validation tool on the container locally to rule out configuration issues.
Inadequate capacity for the chosen cluster | This will usually be indicated in the deployment failure error message in the UI. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.
Container in restart loop | This will be indicated in the inference container logs (if your container is configured to emit logs) and is fixed by debugging and updating your inferencing container code.

Common Function Invocation Failures

Failure Type | Description
Invocation response returning 4xx or 5xx status code | Check the “type” of the error message response; if the “type” includes “worker-service” or “container”, the error is coming from your inference container. Please check the Open API Docs for other possible status code failure reasons in cases where they are not generated from your inference container.
Invocation request taking long to get a result | Check the capacity of your function using the metrics UI to see if your function is queuing. Consider instrumenting your container with additional metrics to your chosen monitoring solution for further debugging - NVCF containers allow public egress. Set the NVCF-POLL-SECONDS header to 300 (the maximum) to wait for a sync response for up to 5 minutes and rule out errors in your polling logic.
Invocation response returning 401 or 403 | This indicates that the caller is unauthorized; ensure the Authorization header is correct per the documentation.
Container OOM | This is difficult to detect without instrumenting your container with additional metrics, unless your container is emitting logs that indicate out of memory. We recommend profiling the memory usage locally. For testing locally and in the function, you can look at a profile of the memory allocation using this guide.
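As a starting point for the local memory profiling mentioned above, the standard-library tracemalloc module gives a rough picture of where Python-level allocations happen (native and GPU memory require other tools). This is only one option and is not the specific guide referenced above.

import tracemalloc

tracemalloc.start()

# ... run one representative inference here ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)  # top allocation sites by source line

current, peak = tracemalloc.get_traced_memory()
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")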

Function Management

The table below provides an overview of the Function lifecycle API endpoints and their respective usages.

Name | Method | Endpoint | Usage
Register Function | POST | /v2/nvcf/functions | Creates a new Function.
Register Function Version | POST | /v2/nvcf/functions/{functionId}/versions | Creates a new version of a Function.
Delete Function Version | DELETE | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Deletes the specified Function version.
List Functions | GET | /v2/nvcf/functions | Retrieves a list of functions associated with the account.
List Function Versions | GET | /v2/nvcf/functions/{functionId}/versions | Retrieves a list of versions for a specific Function.
Retrieve Function Details | GET | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version.
Create Function Version Deployment | POST | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the deployment process for a Function version on worker nodes.
Delete Function Version Deployment | DELETE | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the undeployment process for a Function version.
Retrieve Function Version Deployment | GET | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version deployment.
Update Function Version Deployment | PUT | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Updates the configuration of a Function version deployment.

Function Metadata

When using the NVCF API to create a function, it’s possible to specify a function description and a list of tags as strings as part of the function creation request body. This metadata is then returned in all responses that include the function definition. This is an API only feature at this time. Please see the Open API Docs for more information.
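For example, the function creation request body can carry this metadata alongside the usual fields. The sketch below uses description and tags as the field names; these are assumptions, so confirm them against the Open API Docs.

import requests

API = "https://api.nvcf.nvidia.com"
TOKEN = "<Token>"

body = {
    "name": "echo_function",
    "inferenceUrl": "/echo",
    "containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
    "apiBodyFormat": "CUSTOM",
    "description": "Simple echo function used for smoke testing",  # assumed field name
    "tags": ["demo", "echo"],  # assumed field name
}

resp = requests.post(
    f"{API}/v2/nvcf/functions",
    headers={"Authorization": f"Bearer {TOKEN}", "accept": "application/json"},
    json=body,
)
print(resp.json())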

Function Invocation

The table below provides an overview of the Function invocation API endpoints and their respective usages.

Name | Method | Endpoint | Usage
Invoke Function | POST | /v2/nvcf/pexec/functions/{functionId} | Invokes the specified Function to execute the job and returns the results, if available. NVCF randomly selects one of the active versions of the specified Function to execute the submitted job. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Invoke Function Version | POST | /v2/nvcf/pexec/functions/{functionId}/versions/{functionVersionId} | Invokes the specified version under the specified Function to execute the job and returns the results. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Get Function Invocation Status | GET | /v2/nvcf/pexec/status/{invocationRequestId} | Used to poll for the results of a job when 202 is returned. Avoid making this request to obtain the result if the original POST request returns a 200 response.
Note

Result can be obtained just once either via the original POST request or via a subsequent GET request.

If the result was included in the original POST request, then the status will be 200. In that case, any subsequent attempts to obtain the result using the GET request will result in 404.

If the original POST request responds with 202 (i.e., the result is pending), then the result should be obtained using the GET request. The GET request can respond with either 202 (result still pending) or 200 (result ready). Once the GET request responds with 200, any subsequent attempts to obtain the result using the GET request will result in 404.
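Put together, a caller sends the POST, consumes the body immediately on 200, and falls back to polling the status endpoint on 202. Below is a minimal sketch of that flow; it assumes the 202 response exposes the invocation request id in an NVCF-REQID response header, so confirm the header name against the Open API Docs.

import time

import requests

API = "https://api.nvcf.nvidia.com"
TOKEN = "<Token>"
FUNCTION_ID = "<function-id>"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "accept": "application/json"}


def invoke(payload: dict) -> dict:
    resp = requests.post(
        f"{API}/v2/nvcf/pexec/functions/{FUNCTION_ID}",
        headers={**HEADERS, "NVCF-POLL-SECONDS": "300"},  # wait up to 5 minutes for a sync response
        json=payload,
    )
    if resp.status_code == 200:
        return resp.json()  # result is delivered exactly once; do not GET it afterwards
    resp.raise_for_status()  # anything other than 200/202 is an error
    # 202: result is pending, poll the status endpoint with the returned request id
    request_id = resp.headers["NVCF-REQID"]  # assumed header name
    while True:
        poll = requests.get(f"{API}/v2/nvcf/pexec/status/{request_id}", headers=HEADERS)
        if poll.status_code == 200:
            return poll.json()
        poll.raise_for_status()  # 202 falls through and we keep waiting
        time.sleep(5)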

Asset Management

Name | Method | Endpoint | Usage
Create Asset | POST | /v2/nvcf/assets | Creates an asset id and a corresponding pre-signed URL to upload a file.
List Assets | GET | /v2/nvcf/assets | Returns a list of assets associated with the account.
Delete Asset | DELETE | /v2/nvcf/assets/{nvcf_asset_id} | Deletes an asset using the specified asset id.
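A typical flow is to create the asset, upload the file to the returned pre-signed URL, and then reference the asset id when invoking the function. The request and response field names used below (contentType, description, assetId, uploadUrl) are assumptions; confirm them against the Open API Docs.

import requests

API = "https://api.nvcf.nvidia.com"
TOKEN = "<Token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}", "accept": "application/json"}

# 1. Create the asset id and a pre-signed upload URL.
created = requests.post(
    f"{API}/v2/nvcf/assets",
    headers=HEADERS,
    json={"contentType": "image/jpeg", "description": "input image"},  # assumed fields
).json()

# 2. Upload the file bytes to the pre-signed URL (no Authorization header needed here).
with open("input.jpg", "rb") as f:
    requests.put(created["uploadUrl"], data=f, headers={"Content-Type": "image/jpeg"})

print("asset id:", created["assetId"])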

Visibility

Name | Method | Endpoint | Usage
Get Queue Length for Function id | GET | /v2/nvcf/queues/functions/{functionId} | Returns a list containing a single element with the corresponding queue length for the specified Function.
Get Queue Length for Version id | GET | /v2/nvcf/queues/functions/{functionId}/versions/{functionVersionId} | Returns a list containing a single element with the corresponding queue length for the specified Function version id.
Get Available GPUs | GET | /v2/nvcf/supportedGpus | Returns a list of GPU types you have access to.
Get Queue Position for Request id | GET | /v2/nvcf/queues/{requestId}/position | Returns the estimated position in the queue, up to 1000, for a specific request id of a function invocation request.

The following is a reference of the variables that are available via the headers of the invocation message (auto-populated by NVCF) and accessible within the container.

For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.

Name | Description
NVCF-REQID | Request ID for this request.
NVCF-SUB | Message subject.
NVCF-NCAID | Function’s organization’s NCA ID.
NVCF-FUNCTION-NAME | Function name.
NVCF-FUNCTION-ID | Function ID.
NVCF-FUNCTION-VERSION-ID | Function version ID.
NVCF-ASSET-DIR | Asset directory path. Not available for helm deployments.
NVCF-LARGE-OUTPUT-DIR | Large output directory path.
NVCF-MAX-RESPONSE-SIZE-BYTES | Max response size in bytes for the function.
NVCF-NSPECTID | NVIDIA reserved variable.
NVCF-BACKEND | Backend or “Cluster Group” the function is deployed on.
NVCF-INSTANCETYPE | Instance type the function is deployed on.
NVCF-REGION | Region or zone the function is deployed in.
NVCF-ENV | Spot environment if deployed on spot instances.
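When the header-forwarding pattern shown earlier (NVCF-*) is configured, these values arrive with every request. The helpers below are a small sketch of normalizing and using them from inside a PyTriton _infer_fn (called with req.parameters); it assumes forwarded headers can arrive with lower-cased keys, which is what the NVCF Container Helper Functions handle for you.

import os


def extract_nvcf_context(parameters: dict) -> dict:
    """Normalize forwarded NVCF-* headers; keys may arrive lower-cased."""
    params = {k.upper(): v for k, v in parameters.items()}
    return {
        "request_id": params.get("NVCF-REQID"),
        "asset_dir": params.get("NVCF-ASSET-DIR"),  # not set for helm deployments
        "large_output_dir": params.get("NVCF-LARGE-OUTPUT-DIR"),
    }


def list_request_assets(parameters: dict) -> list:
    """List the files uploaded as assets for this request, if any."""
    asset_dir = extract_nvcf_context(parameters)["asset_dir"]
    return sorted(os.listdir(asset_dir)) if asset_dir else []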

This section gives an overview of the metrics and logs available within the Cloud Functions UI. Note that it is also possible, and recommended for full observability and monitoring, to emit logs, metrics, analytics, etc. to any third party from within your container.

Emit and View Inference Container Logs

View inference container logs in the Cloud Functions UI via the “Logs” tab in the function details page. To get here, click any function version from the “Functions” list and click “View Details” on the side panel to the right.

[Image: function_details_logs-tab.png]

Logs are currently available with up to 48 hours history, with the ability to view as expanded rows for scanning, or as a “window” view for ease of copying and pasting.

Warning

Note as a prerequisite, your inference container will have to be instrumented to emit logs. This is highly recommended.

How to Add Logs to Your Inference Container

Here is an example of adding NVCF compatible logs. The helper function for logging below, along with other helper functions, can be imported from the Helper Functions repository.


import logging
import sys


def get_logger() -> logging.Logger:
    """
    gets a Logger that logs in a format compatible with NVCF
    :return: logging.Logger
    """
    sys.stdout.reconfigure(encoding="utf-8")
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s[%(levelname)s] [INFERENCE]%(message)s",
        handlers=[logging.StreamHandler(sys.stdout)],
    )
    logger = logging.getLogger(__name__)
    return logger


class MyServer:
    def __init__(self):
        self.logger = get_logger()

    def _infer_fn(self, request):
        self.logger.info("Got a request!")

View Function Metrics

NVCF exposes the following metrics by default.

  • Instance counts (current, min and max)

  • Invocation activity and queue depth

  • Total invocation count, success rate and failure count

  • Average inference time

Metrics are viewable upon clicking any function from the “Functions” list page. The function overview page will display aggregated values across all function versions.

[Image: function_overview_metrics.png]

When clicking into a function version’s details page, you will then see metrics for this specific function version.

[Image: function_details_metrics.png]

Warning

There may be up to a 5 minute delay on metric ingestion. Any timeseries queries within the page are aggregated on 5 minute intervals with a step set to show 500 data points. All stat queries are based on the total selected time period and reduced to either show the latest total value or a mean value.

Below are instructions on setting up output directories and efficiently tracking and communicating progress using the utils container.

Setting Up the Output Directory

The utils container automatically configures the output directory for you. To access the path, simply read the NVCF-LARGE-OUTPUT-DIR header. NVCF-LARGE-OUTPUT-DIR points to the directory for that particular requestId.

Writing Large Outputs

When your Custom BLS generates large outputs, save them temporarily with the “*.partial” extension inside the NVCF-LARGE-OUTPUT-DIR directory. For instance, if you’re writing an image, name it image1.partial.

Finalizing Outputs

Once the writing of the output file is complete, rename it from “*.partial” to its appropriate extension. Continuing with our example, rename image1.partial to image1.jpg.

Handling Progress Messages

The utils container actively observes the output directory for a file named ‘progress’. This file can be used to communicate progress and partial responses back to the caller.

Structure of the Progress File

This file should contain well-formed JSON data.

Structure the JSON content as follows:


{ "id": "<requestId>", "progress": 50, "partialResponse": { "exampleKey": "Insert any well-formed JSON here, but ensure its size is less than 250K" } }

Replace <requestId> with the actual request id if it’s present. Modify the progress integer as needed, ranging from 0 (just started) to 100 (fully complete). Within partialResponse, insert any JSON content you want to send as a partial response, making sure it’s smaller than 250KB.

Transferring Data to the Utils Container

Once the output files and progress file are correctly set up in the output directory under the correct request id, the utils container will automatically detect them. The utils container will then send these as a “progress” message.
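Here is a minimal sketch of the write-then-rename convention and the progress file described above; in practice the output directory comes from the NVCF-LARGE-OUTPUT-DIR header and the request id from NVCF-REQID. Writing the progress file under a temporary name first is an extra precaution (the documented “.partial” rule applies to output files) so the watcher never reads a half-written JSON document.

import json
import os


def write_large_output(output_dir: str, final_name: str, data: bytes) -> None:
    """Write an output file as '<stem>.partial', then rename it when complete."""
    stem, _ = os.path.splitext(final_name)
    partial_path = os.path.join(output_dir, stem + ".partial")
    with open(partial_path, "wb") as f:
        f.write(data)
    # Rename to the final extension only once writing is fully complete.
    os.rename(partial_path, os.path.join(output_dir, final_name))


def report_progress(output_dir: str, request_id: str, percent: int, partial: dict) -> None:
    """Write the 'progress' file watched by the utils container (keep it under 250KB)."""
    progress = {"id": request_id, "progress": percent, "partialResponse": partial}
    tmp_path = os.path.join(output_dir, "progress.tmp")
    with open(tmp_path, "w") as f:
        json.dump(progress, f)
    os.rename(tmp_path, os.path.join(output_dir, "progress"))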

Best Practices

  • Always use the “.partial” extension to avoid sending partial or incomplete data.

  • Rename to the final extension only when the writing process is fully complete.

  • Ensure your progress file remains under 250KB to maintain efficiency and avoid errors.
