The lifecycle of a Function includes the following states:
ACTIVE - At least one worker node is active. A Function can be invoked only when it is ACTIVE.
ERROR - All the worker nodes associated with the Function are in an ERROR state.
INACTIVE - When a Function is created, it is INACTIVE. When a Function is undeployed, its state changes from ACTIVE to INACTIVE.
DEPLOYING - The Function is being deployed and the instances or Workers are still coming up.
Functions can be created in one of three ways:
A Triton Inference Server compatible Model repository
A Docker Image
A Helm chart
Function Creation with your own Model
Bring Your Own Model Overview
This is a guide to enable model owners to make use of Triton Inference Server to serve a model within NVIDIA’s Cloud Functions and NGC Private Registry.
A single model can be easily deployed by leveraging Triton’s Auto-Generated Model Configuration functionality; simply upload the model and the inputs will be automatically discovered and deployed as part of the endpoint. However, more complicated configurations are supported as well, by manually specifying the configuration, by having multiple models controlled inside of an overall “ensemble”, or through backend logic scripting (BLS).
When using BYOM within Cloud Functions, you will need to take into consideration the following limitations:
Only an HTTP interface is exposed
The service is not stateful
Models must be loaded at start time
Creating the Model
As described above, a single model can be deployed and the configuration will be autogenerated. However, if a more complicated setup is required a complete model configuration will need to be created. Please see the Triton Model Configuration documentation to learn more.
We typically recommend using Triton’s Python model backend for some, if not all, of your models for several reasons:
There is no impact on performance for roughly 90% of use cases
Faster development
You can call out to other backends
Pre- or post-processing can be done within the same model
Refer to the model.py example to develop your model, and refer to the config.pbtxt example to create the file with the necessary configuration for your model.
If you are deploying on NVCF and using a model-only function, the input sent should not exceed 5MB. For container-based functions using assets as input, the asset should be read from a specific path. The python library NVCF Container Helpers is available to assist with common tasks like these.
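As an illustration of reading an asset from the container-side path, here is a minimal sketch. The function name and directory layout (asset file named after its asset ID, under the path given in the NVCF-ASSET-DIR header) are assumptions for illustration; the NVCF Container Helpers library provides the supported equivalents:

```python
import os


def load_asset_bytes(headers, asset_id):
    """Read an uploaded asset from the request's asset directory.

    Illustrative sketch only: assumes assets are laid out as
    <NVCF-ASSET-DIR>/<asset_id>. Use the NVCF Container Helpers
    library for the officially supported behavior.
    """
    asset_dir = headers["NVCF-ASSET-DIR"]
    with open(os.path.join(asset_dir, asset_id), "rb") as f:
        return f.read()
```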
There is also a Stable Diffusion based example available under examples/byom/sd_txt2img that demonstrates these techniques.
Here are some other examples of how to create more complex workflows within Triton:
Uploading the Model
For detailed instructions on uploading a model to NGC, please see Uploading a New NGC Model Version Using the NGC CLI.
Use the Triton model repository structure when uploading to NGC:
To upload your model to the NVIDIA GPU Cloud (NGC) registry, follow these steps:
Confirm that the model file is present in the correct directory.
Upload the model to your NGC private registry by running the following command (note rnwu0zzwflg6 is the org ID found under Organization -> Profile):
ngc registry model upload-version rnwu0zzwflg6/bis-test-2:v1 --gpu-model 'GV-100' --source ./kngo-test-customization2_v1/gpt2b_ptuning.llmservice.nemo
After the upload is complete, you will receive an upload summary confirming the status:
Model ID: bis-test-2[version=v1]
Upload status: Completed
...
Creating a Function with Models Only
Note that the “containerImage” is omitted from the request. This Function will use the Triton Inference Server. A model-only Function uses the Predict Protocol - Version 2 for the “requestBody” (input & output).
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "simple_int8_1",
"inferenceUrl": "v2/models/simple_int8/infer",
"models": [
{
"name": "simple_int8",
"version": "1",
"uri": "v2/org/cf/team/myteam/models/simple_int8/versions/1/zip"
}
],
"apiBodyFormat": "PREDICT_V2"
}'
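The PREDICT_V2 body format follows the Predict Protocol - Version 2 (the KServe/Triton inference protocol). As a rough sketch of building such a request body, with the tensor name and datatype below being placeholders rather than the simple_int8 model’s actual configuration:

```python
import json


def predict_v2_body(model_inputs):
    """Build a Predict Protocol - Version 2 request body.

    `model_inputs` maps tensor name -> (datatype, flat list of values).
    Shapes here are 1-D for simplicity; adjust them to match your
    model's config.pbtxt. Names/datatypes are illustrative.
    """
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(values)],
                "datatype": dtype,
                "data": values,
            }
            for name, (dtype, values) in model_inputs.items()
        ]
    }


body = predict_v2_body({"INPUT0": ("INT8", [1, 2, 3, 4])})
print(json.dumps(body))
```

The resulting JSON would be POSTed to the function's inferenceUrl (e.g. v2/models/simple_int8/infer).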
Function Creation with your Docker Container
Bring Your Own Container Overview
This is a guide to enable users to build a Docker image that will work within NVIDIA Cloud Functions.
Samples can be found here <https://github.com/NVIDIA/nv-cloud-function-helpers/tree/main/examples>
- Note: A health check is required by NVCF in order to deploy the function.
For HTTP-based functions, the specified endpoint must return a 200 status code.
gRPC-based functions require the standard gRPC health check; see the gRPC Health Checking documentation for more information.
First Method: Building a Triton Inference Server Container
NVCF is designed to work natively with Triton Inference Server based containers, including leveraging metrics and health checks from the server.
The pre-built Triton Docker images can be found within NGC’s Container catalog. A minimum version of 23.04 (2.33.0) is required.
When setting the Docker image’s run command to start tritonserver, the following command options are mandatory:
CMD tritonserver --model-repository=${YOUR_PATH_HERE} --http-header-forward-pattern NVCF-.*
Once the Docker image is built and ready, it can be uploaded to NGC:
Tag the Docker image:
docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Log in to NGC:
docker login nvcr.io
Enter your credentials
Push the Docker image to NGC:
docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Second Method: PyTriton
NVIDIA’s PyTriton is a Python-native interface to Triton Inference Server that works natively with NVCF. A minimum version of 0.3.0 is required.
Create the ``requirements.txt`` file:
This file should list the Python dependencies required for your model.
Add nvidia-pytriton to your requirements.txt file.
Create the ``run.py`` file:
Your run.py file (or similar Python file) needs to define a PyTriton model. This involves importing your model dependencies and creating a PyTritonServer class with an __init__ function, an _infer_fn function, and a run function that serves the inference function, defining the model name, the inputs, and the outputs along with optional configuration.
Here is an example of a run.py file:
run.py
import time

import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig

# helper functions such as numpy_array_to_variable and uppercase_keys
# are available from the nv-cloud-function-helpers repository
....


class PyTritonServer:
    """triton server for timed_sleeper"""

    def __init__(self):
        self.model_name = "timed_sleeper"

    def _infer_fn(self, requests):
        responses = []
        for req in requests:
            req_data = req.data
            sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
            # deal with header dict keys being lowercase
            request_parameters_dict = uppercase_keys(req.parameters)
            time.sleep(sleep_duration)
            responses.append({"sleep_duration": np.array([sleep_duration])})
        return responses

    def run(self):
        """run triton server"""
        with Triton(
            config=TritonConfig(
                http_header_forward_pattern="NVCF-*",  # this is required
                http_port=8000,
                grpc_port=8001,
                metrics_port=8002,
            )
        ) as triton:
            triton.bind(
                model_name="timed_sleeper",
                infer_func=self._infer_fn,
                inputs=[
                    Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
                ],
                outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
                config=ModelConfig(batching=False),
            )
            triton.serve()


if __name__ == "__main__":
    server = PyTritonServer()
    server.run()
Build the Dockerfile
Create a file named Dockerfile in your model directory. You can use containers like NVIDIA CUDA, PyTorch, or TensorRT as your base container. They can be downloaded from the NGC Catalog.
Make sure to install your Python requirements in your Dockerfile.
Copy in your model source code, and model weights unless you plan to host them in NGC Registry.
Here is a sample Dockerfile and requirements.txt:
Dockerfile
FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
git \
python3 \
python3-pip \
python-is-python3 \
libsm6 \
libxext6 \
libxrender-dev \
curl \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workspace/
# install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENV DEBIAN_FRONTEND=noninteractive
#copy model weights
COPY model_weights /models
COPY model_source .
COPY run.py .
CMD python3 run.py
requirements.txt
--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy
This Dockerfile will do the following:
Copy in the model source code
Copy in the model weights
Install the requirements.txt Python dependencies, including nvidia-pytriton
Set the run command to start PyTriton to serve the model
Build the Docker image
Open a terminal or command prompt.
Navigate to the my_model directory. Run the following command to build the Docker image:
docker build -t my_model_image .
Replace my_model_image with the desired name for your Docker image.
Use the Docker image and upload to NGC
Currently, NVCF only supports containers hosted in NGC Private Registry. For detailed instructions on uploading a container to NGC please see Uploading an NVIDIA Container Image.
NGC Private Registry has size constraints on layers, images, models and resources.
Tag the Docker image:
docker tag my_model_image nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Log in to NGC:
docker login nvcr.io
Enter your credentials
Push the Docker image to NGC:
docker push nvcr.io/[ngc_org]/[ngc_team]/my_model_image:latest
Creating a Function with a Custom Container
Note that the “models” array is omitted from the request.
If no port is specified for the container, it will default to 8000. If you want to override the port, add the inferencePort parameter with the desired port number.
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "echo_function",
"inferenceUrl": "/echo",
"containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
"apiBodyFormat": "CUSTOM"
}'
Function Creation with Helmchart
Prerequisites
A service must be included as part of the helm chart. The name of this service in your helm chart should be supplied by setting
helmChartServiceName
during function definition, see below. This allows the util container to communicate and make inference requests to the mini service entry point. Please note, the service port defined in the chart should be used as the inference port supplied during function creation; otherwise the util container will not be able to reach the mini service.
Secret Management
For pulling containers defined as part of the helm chart from NGC, a new value named
ngcImagePullSecretName
needs to be defined in the chart. The value is referenced in the deployment spec as spec.imagePullSecrets.name
of the pods in the chart. Please note, containers defined in the helm chart should be in the same NGC org and team that the helm chart itself is being pulled from.
How To
Upload your helm chart to NGC
Ensure adherence to the helm chart prerequisite (listed above).
Create your function
Include the following additional parameter in the function definition
helmChart
helmChartServiceName
The helmChart property in the function definition is an optional field. It should be a URL hosted by the NGC model registry pointing to the helm chart that will deploy the mini service. Please note, this helm chart URL should be accessible to the NGC org in which the function will eventually be deployed.
The helmChartServiceName field is required only when the helmChart property is supplied during function definition. It is used to check whether the mini service is ready for inference and is scraped for function metrics. At this time, templatized service names are not supported.
Here is an example:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "function_name",
"inferenceUrl": "v2/models/model_name/versions/model_version/infer",
"inferencePort": 8001,
"helmChart": "https://helm.ngc.nvidia.com/plq9i6ygkfzp/charts/inference-test-1.0.tgz",
"helmChartServiceName": "service_name",
"apiBodyFormat": "CUSTOM"
}'
Please note, for gRPC based NVCF functions, set "inferenceUrl": "/gRPC"
. This signals that NVCF and the workers are using the gRPC protocol; the function is not expected to have a /gRPC
endpoint exposed for inference requests. Here’s the inference-test-2.3.tgz helm chart example as a reference.
Proceed with function deployment and invocation.
- When using helm charts, the following limitations need to be taken into consideration
Model and asset download is handled as part of the helm chart (customer logic), but the download size is limited by the disk space on the VM (approximately 100GB for GFN; for “bring your own cluster” this limit will vary)
Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider opting for HTTP streaming or gRPC bidirectional support.
Supported k8s artifacts under Helm Chart Namespace (others will be rejected)
Deployment
Service
ServiceAccount
Role & RoleBindings
ConfigMaps
Secrets
Helm Chart Overrides
To override keys in your helm chart values.yml
, you can provide the configuration
parameter and supply corresponding key value pair in JSON format which you would like to be overridden when function is deployed.
Example helm chart override
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"deploymentSpecifications": [{
"gpu": "L40",
"backend": "OCI",
"maxInstances": 2,
"minInstances": 1,
"configuration": {
"key_one": "<value>",
"key_two": { "key_two_subkey_one": "<value>", "key_two_subkey_two": "<value>" }
...
}
},
{
"gpu": "T10",
"backend": "GFN",
"maxInstances": 2,
"minInstances": 1
}]
}'
When you first register a function, it will have an initial version ID created. You can create additional versions of this function by specifying other models/containers/helm charts to use. Here is a sample API call:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/1ccd6b12-1ee0-44a5-bfad-1d3630213418/versions' \
-H 'accept: application/json' \
-H 'Authorization: Bearer <Token>' \
-H 'Content-Type: application/json' \
-d '{
"name": "echo_function",
"inferenceUrl": "/echo",
"containerImage": "nvcr.io/qtfpt1h06ieu/c-tryhhjju67hjgjf/echo:latest",
"apiBodyFormat": "CUSTOM"
}'
Multiple function versions allow for different deployment configurations for each version while still being accessible through a single function endpoint. Deployments will be discussed in the following section. Multiple function versions can also be deployed to support A/B testing.
NOTE: Function versioning should only be used if the APIs between the various versions are compatible. Different APIs should be created as new Functions.
Listing Functions
This is used to list the available functions that can be run.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Listing Function Versions
This is used to list the versions of a specific Function ID.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Retrieve Function Version Details
This is used to list details of a specific Function version.
curl -X 'GET' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/2aca7eb1-1351-4072-a1ba-859abf893325/versions/f0620899-fdd5-4860-a619-149315188660' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
Public Functions
Functions marked as public are visible in the list_functions response for all Cloud Functions users.
You can filter these out if you wish using your NCAID.
Deleting a Function Version
Use both the Function ID and Function Version ID to delete a Function version.
curl -X 'DELETE' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json'
To activate the function, it must be deployed as a Function version. This action requests the creation of worker pods to process requests for the invocation of the function. Once worker pods are successfully created, the status of the Function will transition to ACTIVE. If all worker pods fail to launch, the status will change to ERROR.
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer <Token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"deploymentSpecifications": [
{
"gpu": "L40",
"backend": "OCI",
"maxInstances": 2,
"minInstances": 1
},
{
"gpu": "T10",
"backend": "GFN",
"maxInstances": 2,
"minInstances": 1
}
]
}'
Each function version can have a different deployment configuration, allowing for heterogeneous computing infrastructure to be used across a single function endpoint.
Delete Function Version Deployment
To delete a Function version deployment, you supply the Function ID and version ID.
curl -X 'DELETE' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions/fe6e6589-12bb-423a-9bf'
My function is stuck in the deploying state. What do I do?
Depending on the size of your containers and models, it usually takes up to 30 minutes for your function to deploy, although durations up to 2 hours are permitted. If you believe your function should have deployed already, or if it has entered an error state, review the logs to understand what happened.
In some cases, there may not be enough capacity available to fulfill your deployment. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.
I’m getting errors when invoking my function. What do I do?
Please review the error message and update your container or model as required. If the error message is emitting from your inference container, consider adding further logs in the container and redeploying to troubleshoot.
Common Deployment Failures
Failure Type | Description
---|---
Function configuration problems | This occurs when incorrect inference or health endpoints and ports are defined, causing the container to be marked unhealthy. Try the deployment validation tool on the container locally to rule out configuration issues.
Inadequate capacity for the chosen cluster | This will usually be indicated in the deployment failure error message in the UI. Try reducing the number of instances you are requesting or changing the GPU/instance type used by your function.
Container in restart loop | This will be indicated in the inference container logs (if your container is configured to emit logs) and is fixed by debugging and updating your inference container code.
Common Function Invocation Failures
Failure Type | Description
---|---
Invocation response returning 4xx or 5xx status code | Check the “type” of the error message response; if the “type” includes “worker-service” or “container”, this indicates the error is coming from your inference container. Please check the Open API Docs for other possible status code failure reasons in cases where they are not generated from your inference container.
Invocation request taking long to get a result | Check the capacity of your function using the metrics UI to see if your function is queuing. Consider instrumenting your container with additional metrics to your chosen monitoring solution for further debugging - NVCF containers allow public egress. Set the NVCF-POLL-SECONDS header to 300 (the maximum) to wait for a sync response for up to 5 minutes to rule out errors in your polling logic.
Invocation response returning 401 or 403 | This indicates that the caller is unauthorized; ensure the Authorization header is correct per the documentation.
Container OOM | This is difficult to detect without instrumenting your container with additional metrics, unless your container is emitting logs that indicate out of memory. We recommend profiling the memory usage locally. For testing locally and in the function, you can look at a profile of the memory allocation using this guide.
Function Management
The table below provides an overview of the Function lifecycle API endpoints and their respective usages.
Name | Method | Endpoint | Usage
---|---|---|---
Register Function | POST | /v2/nvcf/functions | Creates a new Function.
Register Function Version | POST | /v2/nvcf/functions/{functionId}/versions | Creates a new version of a Function.
Delete Function Version | DELETE | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Deletes a Function version specified by its ID.
List Functions | GET | /v2/nvcf/functions | Retrieves a list of functions associated with the account.
List Function Versions | GET | /v2/nvcf/functions/{functionId}/versions | Retrieves a list of versions for a specific Function.
Retrieve Function Details | GET | /v2/nvcf/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version.
Create Function Version Deployment | POST | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the deployment process for a Function version on worker nodes.
Delete Function Version Deployment | DELETE | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Initiates the undeployment process for a Function version.
Retrieve Function Version Deployment | GET | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Retrieves details of a specific Function version deployment.
Update Function Version Deployment | PUT | /v2/nvcf/deployments/functions/{functionId}/versions/{functionVersionId} | Updates the configuration of a Function version deployment.
Function Metadata
When using the NVCF API to create a function, it’s possible to specify a function description and a list of tags as strings as part of the function creation request body. This metadata is then returned in all responses that include the function definition. This is an API only feature at this time. Please see the Open API Docs for more information.
Function Invocation
The table below provides an overview of the Function invocation API endpoints and their respective usages.
Name | Method | Endpoint | Usage
---|---|---|---
Invoke Function | POST | /v2/nvcf/pexec/functions/{functionId} | Invokes the specified Function to execute the job and returns the results, if available. NVCF randomly selects one of the active versions of the specified Function to execute the submitted job. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Invoke Function Version | POST | /v2/nvcf/pexec/functions/{functionId}/versions/{functionVersionId} | Invokes the specified version under the specified Function to execute the job and returns the results. Avoid making a GET request to obtain the result if the original POST request returns a 200 response.
Get Function Invocation Status | GET | /v2/nvcf/pexec/status/{invocationRequestId} | Used to poll for the results of a job when 202 is returned. Avoid making this request to obtain the result if the original POST request returns a 200 response.
The result can be obtained just once, either via the original POST request or via a subsequent GET request.
If the result was included in the original POST response, then the status will be 200. In that case, any subsequent attempts to obtain the result using the GET request will result in 404.
If the original POST request responds with 202 (i.e. the result is pending), then the result should be obtained using the GET request. The GET request can respond with either 202 (result pending) or 200 (result ready). Once the GET request responds with 200, any subsequent attempts to obtain the result using the GET request will result in 404.
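The 200/202 rules above can be sketched as a small polling helper. This is an illustrative sketch, not an official client: `post_response` is the (status code, body) pair from the original POST, and `get_status` stands in for a GET to /v2/nvcf/pexec/status/{invocationRequestId}.

```python
import time


def fetch_result(post_response, get_status, poll_interval=1.0, max_polls=300):
    """Resolve an NVCF invocation to its final result (sketch).

    Per the rules above: a 200 on the POST already carries the result
    and must not be followed by a GET (a later GET would return 404);
    a 202 means the result is pending, so poll the status endpoint
    until it returns 200.
    """
    status, body = post_response
    if status == 200:            # result delivered inline -- do not poll
        return body
    if status != 202:
        raise RuntimeError(f"invocation failed with status {status}")
    for _ in range(max_polls):
        status, body = get_status()
        if status == 200:        # result ready; subsequent GETs would 404
            return body
        if status != 202:
            raise RuntimeError(f"polling failed with status {status}")
        time.sleep(poll_interval)
    raise TimeoutError("result not ready after max_polls attempts")
```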
Asset Management
Name | Method | Endpoint | Usage
---|---|---|---
Create Asset | POST | /v2/nvcf/assets | Creates an asset ID and a corresponding pre-signed URL to upload the file.
List Assets | GET | /v2/nvcf/assets | Returns a list of assets associated with the account.
Delete Asset | DELETE | /v2/nvcf/assets/{nvcf_asset_id} | Deletes an asset using the specified asset ID.
Visibility
Name | Method | Endpoint | Usage
---|---|---|---
Get Queue Length for Function id | GET | /v2/nvcf/queues/functions/{functionId} | Returns a list containing a single element with the corresponding queue length for the specified Function.
Get Queue Length for Version id | GET | /v2/nvcf/queues/functions/{functionId}/versions/{functionVersionId} | Returns a list containing a single element with the corresponding queue length for the specified Function version.
Get Available GPUs | GET | /v2/nvcf/supportedGpus | Returns a list of GPU types you have access to.
Get Queue Position for Request id | GET | /v2/nvcf/queues/{requestId}/position | Returns the estimated position in the queue, up to 1000, for a specific request ID of a function invocation request.
The following is a reference of available variables via the headers of the invocation message (auto-populated by NVCF), accessible within the container.
For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.
Name | Description
---|---
NVCF-REQID | Request ID for this request.
NVCF-SUB | Message subject.
NVCF-NCAID | Function’s organization’s NCA ID.
NVCF-FUNCTION-NAME | Function name.
NVCF-FUNCTION-ID | Function ID.
NVCF-FUNCTION-VERSION-ID | Function version ID.
NVCF-ASSET-DIR | Asset directory path. Not available for helm deployments.
NVCF-LARGE-OUTPUT-DIR | Large output directory path.
NVCF-MAX-RESPONSE-SIZE-BYTES | Max response size in bytes for the function.
NVCF-NSPECTID | NVIDIA reserved variable.
NVCF-BACKEND | Backend or “Cluster Group” the function is deployed on.
NVCF-INSTANCETYPE | Instance type the function is deployed on.
NVCF-REGION | Region or zone the function is deployed in.
NVCF-ENV | Spot environment if deployed on spot instances.
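Since header keys may arrive lowercased by intermediaries (as noted in the PyTriton example earlier), a small case-insensitive lookup helper can be handy. This is an illustrative sketch, not part of any NVCF library:

```python
def get_nvcf_header(headers, name, default=None):
    """Case-insensitive lookup for NVCF-* header variables.

    Proxies (and PyTriton's forwarded parameter dict) may lowercase
    header keys, so normalize both sides before comparing.
    """
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default
```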
This section gives an overview of the metrics and logs available within the Cloud Functions UI. Note that it is possible, and recommended for full observability and monitoring, to emit logs, metrics, analytics, etc. to any 3rd party service from within your container.
Emit and View Inference Container Logs
View inference container logs in the Cloud Functions UI via the “Logs” tab in the function details page. To get here, click any function version from the “Functions” list and click “View Details” on the side panel to the right.
Logs are currently available with up to 48 hours history, with the ability to view as expanded rows for scanning, or as a “window” view for ease of copying and pasting.
Note as a prerequisite, your inference container will have to be instrumented to emit logs. This is highly recommended.
How to Add Logs to Your Inference Container
Here is an example of adding NVCF compatible logs. The helper function for logging below, along with other helper functions, can be imported from the Helper Functions repository.
import logging
import sys
def get_logger() -> logging.Logger:
"""
gets a Logger that logs in a format compatible with NVCF
:return: logging.Logger
"""
sys.stdout.reconfigure(encoding="utf-8")
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s[%(levelname)s] [INFERENCE]%(message)s",
handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)
return logger
class MyServer:
def __init__(self):
self.logger = get_logger()
def _infer_fn(self, request):
self.logger.info("Got a request!")
View Function Metrics
NVCF exposes the following metrics by default.
Instance counts (current, min and max)
Invocation activity and queue depth
Total invocation count, success rate and failure count
Average inference time
Metrics are viewable upon clicking any function from the “Functions” list page. The function overview page will display aggregated values across all function versions.
When clicking into a function version’s details page, you will then see metrics for this specific function version.
There may be up to a 5 minute delay on metric ingestion. Any timeseries queries within the page are aggregated on 5 minute intervals with a step set to show 500 data points. All stat queries are based on the total selected time period and reduced to either show the latest total value or a mean value.
Below are instructions on setting up output directories and efficiently tracking and communicating progress using the utils container.
Setting Up the Output Directory
The utils container automatically configures the output directory for you. To access the path, simply read the NVCF-LARGE-OUTPUT-DIR
header. NVCF-LARGE-OUTPUT-DIR
points to the directory for that particular requestId.
Writing Large Outputs
When your Custom BLS generates large outputs, save them temporarily with the “*.partial” extension inside the NVCF-LARGE-OUTPUT-DIR
directory. For instance, if you’re writing an image, name it image1.partial
.
Finalizing Outputs
Once the writing of the output file is complete, rename it from “*.partial” to its appropriate extension. Continuing with our example, rename image1.partial
to image1.jpg
.
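The write-then-rename flow above can be sketched as follows; the helper name is illustrative, not part of NVCF:

```python
import os


def write_large_output(output_dir, final_name, data):
    """Write a large output safely for the utils container (sketch).

    The file is first written with the `.partial` extension so the
    watcher never picks up an incomplete file, then renamed to its
    final name once the write is complete.
    """
    stem, _ = os.path.splitext(final_name)
    partial_path = os.path.join(output_dir, stem + ".partial")
    final_path = os.path.join(output_dir, final_name)
    with open(partial_path, "wb") as f:
        f.write(data)
    os.rename(partial_path, final_path)  # atomic on POSIX filesystems
    return final_path
```

For example, `write_large_output(large_output_dir, "image1.jpg", image_bytes)` writes image1.partial and renames it to image1.jpg only after the bytes are fully on disk.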
Handling Progress Messages
The utils container actively observes the output directory for a file named ‘progress’. This file can be used to communicate progress and partial responses back to the caller.
Structure of the Progress File
This file should contain well-formed JSON data.
Structure the JSON content as follows:
{
"id": "<requestId>",
"progress": 50,
"partialResponse": {
"exampleKey": "Insert any well-formed JSON here, but ensure its size is less than 250K"
}
}
Replace <requestId>
with the actual request id if it’s present. Modify the progress integer as needed, ranging from 0 (just started) to 100 (fully complete). Within partialResponse
, insert any JSON content you want to send as a partial response, making sure it’s smaller than 250KB.
Transferring Data to the Utils Container
Once the output files and progress file are correctly set up in the output directory under the correct request id, the utils container will automatically detect them. The utils container will then send these as a “progress” message.
Best Practices
Always use the “.partial” extension to avoid sending partial or incomplete data.
Rename to the final extension only when the writing process is fully complete.
Ensure your progress file remains under 250KB to maintain efficiency and avoid errors.