This page describes the steps to create a function within Cloud Functions.
Before creating a function, ensure you've installed and configured the NGC CLI for working with the NGC Private Registry.
Functions can be created in one of three ways, listed below, and also visible in the Cloud Functions UI.
![function-creation-start.png](https://docscontent.nvidia.com/dims4/default/a583a5e/2147483647/strip/true/crop/2616x1162+0+0/resize/1440x640!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018f-f319-d1d5-a3bf-f3b9b89f0000%2Fcloud-functions%2Fuser-guide%2Flatest%2F_images%2Ffunction-creation-start.png)
Custom Container
Enables any container-based workload as long as the container exposes an inference endpoint and a health check.
Option to leverage any server, e.g. PyTriton, FastAPI, or Triton.
More easily take advantage of Cloud Functions features such as the Asset Management API for input sizes >200KB, HTTP streaming or gRPC, and partial response reporting.
Triton Model
Only supports Triton Inference Server compatible models and HTTP based invocation.
Leverages Triton’s Auto-Generated Model Configuration.
Models are loaded at start time, and the resulting deployed function instance is stateless.
Helm Chart
Enables orchestration across multiple containers. For complex use cases where a single container isn’t flexible enough.
Requires one “mini-service” container defined as the inference entry point for the function.
Does not support partial response reporting, gRPC or HTTP streaming based invocation.
Function creation requires your model, container, helm chart and/or static resources to be hosted within NGC Private Registry as a prerequisite. Follow the steps below to optimally configure the NGC CLI to work with NGC Private Registry and Cloud Functions.
NGC Private Registry has size constraints on layers, images, models and resources.
Ensure that your uploaded resources conform to these constraints.
Generate an NGC Personal API Key
Do this by navigating to the Personal Keys Page. For more details see Generate an NGC Personal API Key.
It's recommended that the API Key you generate includes both Cloud Functions and Private Registry scopes to enable the complete Cloud Functions workflow.
Download & Configure the NGC CLI
Navigate to the NGC CLI Installer Page to download the CLI and follow the installation instructions for your platform.
Find your NGC organization name within the NGC Organization Profile Page. This is not the Display Name. For example:
qdrlnbkss123
Run `ngc config set` and input the Personal API Key generated in the previous step, along with your organization name. If prompted, default to `no-team` and `no-ace`.
> ngc config set
Enter API key [****bi9Z]. Choices: [<VALID_APIKEY>, 'no-apikey']: $API_KEY
Enter CLI output format type [json]. Choices: ['ascii', 'csv', 'json']: json
Enter org [ax3ysqem02xw]. Choices: ['$ORG_NAME']: $ORG_NAME
Enter team [no-team]. Choices: ['no-team']:
Enter ace [no-ace]. Choices: ['no-ace']:
Authenticate with NGC Docker Registry
Run `docker login nvcr.io` and input the following. Note that `$oauthtoken` is the literal string to enter as the username, and `$API_KEY` is the Personal API Key generated in the first step.
> docker login nvcr.io
Username: $oauthtoken
Password: $API_KEY
(Optional) Push a Container to NGC Private Registry
You should now be able to push a container to NGC Private Registry. Optionally, validate this by pushing an example container from the samples repository:
First clone and build the docker image.
> git clone https://github.com/NVIDIA/nv-cloud-function-helpers.git
> cd nv-cloud-function-helpers/examples/fastapi_echo_sample
> docker build . -t fastapi_echo_sample
Now tag and push the docker image to NGC Private Registry.
> docker tag fastapi_echo_sample:latest nvcr.io/$ORG_NAME/fastapi_echo_sample:latest
> docker push nvcr.io/$ORG_NAME/fastapi_echo_sample:latest
Once the push finishes, the new container will appear in the NGC Private Registry Containers Page and be available for use in function creation.
Container-based functions require building and pushing a Cloud Functions compatible Docker container image to NGC Private Registry.
Before proceeding, ensure that you have the NGC CLI installed and configured with an API Key that has the required scopes for Cloud Functions and Private Registry.
See Working with NGC Private Registry for instructions.
Resources
Example containers can be found here.
The repository also contains helper functions that are useful when authoring your container, including:
Helpers that parse Cloud Functions-specific parameters on invocation
Helpers that can be used to instrument your container with Cloud Functions compatible logs
Helpers for working with assets
After container creation, but before proceeding to deployment, it is strongly recommended to validate your container’s configuration locally, see Deployment Validation.
It’s always a best practice to emit logs from your inference container. See Logging and Metrics for how to add logs to your container. Cloud Functions also supports third-party logging and metrics emission from your container.
Container Endpoints
Any server can be implemented within the container, as long as it implements the following:
For HTTP-based functions, a health check endpoint that returns a 200 HTTP Status Code on success.
For gRPC-based functions, a standard gRPC health check. See gRPC Health Checking for more information.
An inference endpoint (this endpoint will be called during function invocation)
These endpoints are expected to be served on the same port, defined as the `inferencePort`.
Cloud Functions reserves the following ports on your container for internal monitoring and metrics:
Port `8080`
Port `8010`
Composing a FastAPI Container
It’s possible to use any container with Cloud Functions as long as it implements a server with the above endpoints. The below is an example of a FastAPI-based container compatible with Cloud Functions. Clone the full example here.
Create the “requirements.txt” File
requirements.txt
fastapi==0.110.0
uvicorn==0.29.0
Implement the Server
http_echo_server.py
import os
import time
import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI, status
from fastapi.responses import StreamingResponse
app = FastAPI()
class HealthCheck(BaseModel):
status: str = "OK"
# Implement the health check endpoint
@app.get("/health", tags=["healthcheck"], summary="Perform a Health Check", response_description="Return HTTP Status Code 200 (OK)", status_code=status.HTTP_200_OK, response_model=HealthCheck)
def get_health() -> HealthCheck:
return HealthCheck(status="OK")
class Echo(BaseModel):
message: str
delay: float = 0.000001
repeats: int = 1
stream: bool = False
# Implement the inference endpoint
@app.post("/echo")
async def echo(echo: Echo):
if echo.stream:
def stream_text():
for _ in range(echo.repeats):
time.sleep(echo.delay)
yield f"data:{echo.message}\n\n"
return StreamingResponse(stream_text(), media_type="text/event-stream")
else:
time.sleep(echo.delay)
return echo.message*echo.repeats
# Serve the endpoints on a port
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000, workers=int(os.getenv('WORKER_COUNT', 500)))
Note in the example above, the function’s configuration during creation will be:
Inference Protocol: HTTP
Inference Endpoint: `/echo`
Health Endpoint: `/health`
Inference Port (also used for health check): `8000`
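Assuming the container above is running locally on port 8000, an invocation body for `/echo` mirrors the fields of the `Echo` model defined in `http_echo_server.py`. A minimal sketch of composing that payload:

```python
import json

# Fields mirror the Echo pydantic model in http_echo_server.py.
payload = {
    "message": "hello",   # text to echo back
    "delay": 0.001,       # seconds to sleep per repeat
    "repeats": 3,         # number of echoes
    "stream": False,      # True returns a text/event-stream response
}

# Serialize as the JSON body for POST http://localhost:8000/echo
body = json.dumps(payload)
print(body)
```

With the container running, this body can be POSTed to `http://localhost:8000/echo` with a `Content-Type: application/json` header.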
Create the Dockerfile
Dockerfile
FROM python:3.10.13-bookworm
ENV WORKER_COUNT=10
WORKDIR /app
COPY requirements.txt ./
RUN python -m pip install --no-cache-dir -U pip && \
python -m pip install --no-cache-dir -r requirements.txt
COPY http_echo_server.py /app/
CMD uvicorn http_echo_server:app --host=0.0.0.0 --workers=$WORKER_COUNT
Build the Container & Create the Function
See the Quickstart for remaining steps.
Composing a PyTriton Container
NVIDIA's PyTriton is a Python-native framework for serving models with Triton Inference Server. A minimum version of 0.3.0 is required.
Create the “requirements.txt” File
This file should list the Python dependencies required for your model. Add `nvidia-pytriton` to your `requirements.txt` file. Here is an example of a `requirements.txt` file:
requirements.txt
--extra-index-url https://pypi.ngc.nvidia.com
opencv-python-headless
pycocotools
matplotlib
torch==2.1.0
nvidia-pytriton==0.3.0
numpy
Create the “run.py” File
Your `run.py` file (or similar Python file) needs to define a PyTriton model. This involves importing your model dependencies and creating a `PyTritonServer` class with an `__init__` function, an `_infer_fn` function, and a `run` function that serves the inference function, defining the model name, the inputs, and the outputs along with optional configuration.
Here is an example of a `run.py` file:
run.py
import numpy as np
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton, TritonConfig
import time
....
class PyTritonServer:
"""triton server for timed_sleeper"""
def __init__(self):
# basically need to accept image, mask(PIL Images), prompt, negative_prompt(str), seed(int)
self.model_name = "timed_sleeper"
def _infer_fn(self, requests):
responses = []
for req in requests:
req_data = req.data
sleep_duration = numpy_array_to_variable(req_data.get("sleep_duration"))
# deal with header dict keys being lowercase
request_parameters_dict = uppercase_keys(req.parameters)
time.sleep(sleep_duration)
responses.append({"sleep_duration": np.array([sleep_duration])})
return responses
def run(self):
"""run triton server"""
with Triton(
config=TritonConfig(
http_header_forward_pattern="NVCF-*", # this is required
http_port=8000,
grpc_port=8001,
metrics_port=8002,
)
) as triton:
triton.bind(
model_name="timed_sleeper",
infer_func=self._infer_fn,
inputs=[
Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)),
],
outputs=[Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,))],
config=ModelConfig(batching=False),
)
triton.serve()
if __name__ == "__main__":
server = PyTritonServer()
server.run()
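PyTriton serves models over the KServe Predict Protocol v2 HTTP API, so an invocation body for the `timed_sleeper` model above can be sketched as follows. The tensor name, datatype, and shape must match the `Tensor` definitions in `run.py`:

```python
import json

# Predict Protocol v2 request body for the timed_sleeper model.
# The single input matches Tensor(name="sleep_duration", dtype=np.uint32, shape=(1,)).
request_body = {
    "inputs": [
        {
            "name": "sleep_duration",
            "datatype": "UINT32",
            "shape": [1],
            "data": [2],  # sleep for 2 seconds
        }
    ]
}

# POSTed to http://<host>:8000/v2/models/timed_sleeper/infer
print(json.dumps(request_body))
```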
Create the “Dockerfile”
Create a file named `Dockerfile` in your model directory. It's strongly recommended to use NVIDIA-optimized containers like CUDA, PyTorch or TensorRT as your base container. They can be downloaded from the NGC Catalog.
Make sure to install your Python requirements in your `Dockerfile`. Copy in your model source code, and model weights unless you plan to host them in NGC Private Registry.
Here is an example of a `Dockerfile`:
Dockerfile
FROM nvcr.io/nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y \
git \
python3 \
python3-pip \
python-is-python3 \
libsm6 \
libxext6 \
libxrender-dev \
curl \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workspace/
# Install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
ENV DEBIAN_FRONTEND=noninteractive
# Copy model source code and weights
COPY model_weights /models
COPY model_source .
COPY run.py .
# Set run command to start PyTriton to serve the model
CMD python3 run.py
Build the Docker Image
Open a terminal or command prompt.
Navigate to the `my_model` directory. Run the following command to build the docker image:
docker build -t my_model_image .
Replace `my_model_image` with the desired name for your docker image.
Push the Docker Image
Before beginning, ensure that you have authenticated with the NGC Docker Registry.
Tag and push the docker image to NGC Private Registry.
> docker tag my_model_image:latest nvcr.io/$ORG_NAME/my_model_image:latest
> docker push nvcr.io/$ORG_NAME/my_model_image:latest
Create the Function
Create the function via API by running the following curl with an `$API_KEY` and your `$ORG_NAME`. In this example, we defined the inference port as `8000` and are using the default inference and health endpoint paths.
curl --location 'https://api.ngc.nvidia.com/v2/nvcf/functions' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer $API_KEY' \
--data '{
"name": "my-model-function",
"inferenceUrl": "/v2/models/my_model_image/infer",
"healthUri": "/v2/health/ready",
"inferencePort": 8000,
"containerImage": "nvcr.io/$ORG_NAME/my_model_image:latest"
}'
Additional Examples
See more examples of PyTriton containers that are Cloud Functions compatible here.
Triton-based Container Configuration
NVIDIA Cloud Functions is designed to work natively with Triton Inference Server based containers, including leveraging metrics and health checks from the server.
Pre-built Triton docker images can be found within NGC’s Container catalog. A minimum version of 23.04 (2.33.0) is required.
Configuration
The default health endpoint (`/v2/health/ready`), port (`8000`), and inference endpoint (`v2/models/$MODEL_NAME/infer`) work automatically with Triton-based containers.
The docker image’s run command must be configured with the following:
CMD tritonserver --model-repository=${MODEL_PATH} --http-header-forward-pattern NVCF-.*
Here is an example of a `Dockerfile`:
Dockerfile
FROM nvcr.io/nvidia/tritonserver:24.01-py3
# install requirements file
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt
COPY model_repository /model_repository
ENV CUDA_MODULE_LOADING LAZY
ENV LOG_VERBOSE 0
CMD tritonserver --log-verbose ${LOG_VERBOSE} --http-header-forward-pattern (nvcf-.*|NVCF-.*) \
--model-repository /model_repository/ --model-control-mode=none --strict-readiness 1
See a full example of a Triton container.
Creating Functions with NGC Models & Resources
When creating a function, models and resources can be mounted to the function instance. The models will be available under `/config/models/{modelName}` and `/config/resources/{resourceName}`, where `modelName` and `resourceName` are specified as part of the API request.
Here is an example where a model and resource are added to a function creation API call, for an echo sample function:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer $API_KEY' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "echo_function",
"inferenceUrl": "/echo",
"containerImage": "nvcr.io/$ORG_NAME/echo:latest",
"apiBodyFormat": "CUSTOM",
"models": [
{
"name": "simple_int8",
"version": "1",
"uri": "v2/org/cf/$ORG_NAME/models/simple_int8/versions/1/zip"
}
],
"resources": [
{
"name": "simple_resource",
"version": "1",
"uri": "v2/org/cf/$ORG_NAME/resources/simple_resource/versions/1/zip"
}
]
}'
Within the container, once the function instance is deployed, the model will be mounted at `/config/models/simple_int8` and the resource at `/config/resources/simple_resource`.
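Inside the container, the mount locations can be derived from the names supplied in the request. A small helper, hypothetical and for illustration only, that builds those paths:

```python
import posixpath

def model_mount_path(model_name: str, root: str = "/config") -> str:
    """Path where a named NGC model is mounted inside the function container."""
    return posixpath.join(root, "models", model_name)

def resource_mount_path(resource_name: str, root: str = "/config") -> str:
    """Path where a named NGC resource is mounted inside the function container."""
    return posixpath.join(root, "resources", resource_name)

print(model_mount_path("simple_int8"))         # /config/models/simple_int8
print(resource_mount_path("simple_resource"))  # /config/resources/simple_resource
```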
Creating gRPC-based Functions
Cloud Functions supports function invocation via gRPC. During function creation, specify that the function is a gRPC function by setting the "Inference Protocol", or `inferenceUrl` field, to `grpc.nvcf.nvidia.com:443`.
Prerequisites
The function container must implement a gRPC port, endpoint and health check. The health check is expected to be served by the gRPC inference port; there is no need to define a separate health endpoint path.
See gRPC health checking.
See an example container with a gRPC server that is Cloud Functions compatible.
gRPC Function Creation via UI
In the Function Creation Page, set the "Inference Protocol" to `gRPC` and the port to the one your gRPC server listens on.
![grpc-function-creation.png](https://docscontent.nvidia.com/dims4/default/a9be1e0/2147483647/strip/true/crop/2694x1310+0+0/resize/1440x700!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2Fsphinx%2F0000018f-f319-d1d5-a3bf-f3b9b89f0000%2Fcloud-functions%2Fuser-guide%2Flatest%2F_images%2Fgrpc-function-creation.png)
gRPC Function Creation via CLI
When creating the gRPC function, set the `--inference-url` argument to `grpc.nvcf.nvidia.com:443`:
ngc cf function create --inference-port 8001 --container-image nvcr.io/$ORG_NAME/grpc_echo_sample:latest --name my-grpc-function --inference-url grpc.nvcf.nvidia.com:443
gRPC Function Creation via API
When creating the gRPC function, set the `inferenceUrl` field to `grpc.nvcf.nvidia.com:443`:
curl --location 'https://api.ngc.nvidia.com/v2/nvcf/functions' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer $API_KEY' \
--data '{
"name": "my-grpc-function",
"inferenceUrl": "grpc.nvcf.nvidia.com:443",
"inferencePort": 8001,
"containerImage": "nvcr.io/$ORG_NAME/grpc_echo_sample:latest"
}'
gRPC Function Invocation
See gRPC Invocation for details on how to authenticate and invoke your gRPC function.
Available Container Variables
The following is a reference of available variables via the headers of the invocation message (auto-populated by Cloud Functions), accessible within the container.
For examples of how to extract and use some of these variables, see NVCF Container Helper Functions.
Name | Description |
---|---|
NVCF-REQID | Request ID for this request. |
NVCF-SUB | Message subject. |
NVCF-NCAID | Function’s organization’s NCA ID. |
NVCF-FUNCTION-NAME | Function name. |
NVCF-FUNCTION-ID | Function ID. |
NVCF-FUNCTION-VERSION-ID | Function version ID. |
NVCF-ASSET-DIR | Asset directory path. Not available for helm deployments. |
NVCF-LARGE-OUTPUT-DIR | Large output directory path. |
NVCF-MAX-RESPONSE-SIZE-BYTES | Max response size in bytes for the function. |
NVCF-NSPECTID | NVIDIA reserved variable. |
NVCF-BACKEND | Backend or “Cluster Group” the function is deployed on. |
NVCF-INSTANCETYPE | Instance type the function is deployed on. |
NVCF-REGION | Region or zone the function is deployed in. |
NVCF-ENV | Spot environment if deployed on spot instances. |
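In the spirit of the helper functions mentioned above (but not the official API), a hedged sketch of collecting these variables from a request's header mapping, case-insensitively:

```python
def nvcf_headers(headers: dict) -> dict:
    """Collect NVCF-* variables from a request-header mapping, case-insensitively."""
    return {
        key.upper(): value
        for key, value in headers.items()
        if key.upper().startswith("NVCF-")
    }

# Hypothetical incoming headers; servers often lowercase header names.
incoming = {
    "content-type": "application/json",
    "nvcf-reqid": "c2b7a2f3-0000",          # hypothetical request ID
    "NVCF-Function-Name": "echo_function",
}
print(nvcf_headers(incoming))
```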
Adding Partial Response (Progress)
Below are instructions on setting up output directories and efficiently tracking and communicating inferencing progress using Cloud Functions. This functionality is only supported for container-based functions.
Cloud Functions automatically configures the output directory for you. To access the path, simply read the `NVCF-LARGE-OUTPUT-DIR` header. `NVCF-LARGE-OUTPUT-DIR` points to the directory for that particular `requestId`.
To enable partial progress reporting, you will need to store partial and completed outputs, and create a `progress` file in the output directory. Once the output file and progress file are correctly set up in the output directory under the correct request ID, Cloud Functions will automatically detect them.
When using the invocation API to poll for a response, `progress` will be returned as the header `NVCF-PERCENT-COMPLETE`, along with any partial response data.
Storing Partial and Complete Outputs
When your Custom BLS generates large outputs, save them temporarily with the "*.partial" extension inside the `NVCF-LARGE-OUTPUT-DIR` directory. For instance, if you're writing an image, name it `image1.partial`. Once the writing of the output file is complete, rename it from "*.partial" to its appropriate extension. Continuing with our example, rename `image1.partial` to `image1.jpg`.
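The write-then-rename pattern above can be sketched with the standard library (the file names and contents here are illustrative):

```python
import os
import tempfile

def write_large_output(output_dir: str, final_name: str, data: bytes) -> str:
    """Write under a *.partial name first, then rename once the write is complete."""
    stem, _ = os.path.splitext(final_name)        # "image1.jpg" -> "image1"
    partial_path = os.path.join(output_dir, stem + ".partial")
    final_path = os.path.join(output_dir, final_name)
    with open(partial_path, "wb") as f:
        f.write(data)
    os.rename(partial_path, final_path)           # only now is the file eligible for pickup
    return final_path

# Demo against a temporary directory standing in for NVCF-LARGE-OUTPUT-DIR.
demo_dir = tempfile.mkdtemp()
result_path = write_large_output(demo_dir, "image1.jpg", b"fake-jpeg-bytes")
print(result_path)
```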
Creating a Progress File
Cloud Functions actively observes the output directory for a file named `progress`. This file is used to communicate progress and partial responses back to the caller.
This file should contain well-formed JSON data. Structure the JSON content as follows:
{
"id": "{requestId}",
"progress": 50,
"partialResponse": {
"exampleKey": "Insert any well-formed JSON here, but ensure its size is less than 250K"
}
}
Replace `requestId` with the actual request ID if it's present. Modify the progress integer as needed, ranging from 0 (just started) to 100 (fully complete). Within `partialResponse`, insert any JSON content you want to send as a partial response, making sure it's smaller than 250KB.
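Putting the pieces together, a hedged sketch of writing the progress file. Applying the same write-then-rename precaution to the `progress` file itself is an assumption here, meant to keep the watcher from reading a half-written file:

```python
import json
import os
import tempfile

def write_progress(output_dir: str, request_id: str, progress: int, partial: dict) -> str:
    """Write the `progress` file Cloud Functions watches for in the output directory."""
    content = json.dumps({
        "id": request_id,
        "progress": progress,          # 0 (just started) .. 100 (fully complete)
        "partialResponse": partial,    # must stay under 250KB
    })
    assert len(content.encode()) < 250_000, "progress payload too large"
    tmp_path = os.path.join(output_dir, "progress.partial")
    with open(tmp_path, "w") as f:
        f.write(content)
    os.rename(tmp_path, os.path.join(output_dir, "progress"))
    return content

# Demo with a temp dir and a hypothetical request ID.
demo_dir = tempfile.mkdtemp()
written = write_progress(demo_dir, "req-123", 50, {"exampleKey": "halfway there"})
print(written)
```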
Best Practices
Always use the “.partial” extension to avoid sending partial or incomplete data.
Rename to the final extension only when the writing process is fully complete.
Ensure your progress file remains under 250KB to maintain efficiency and avoid errors.
It's possible to deploy a model hosted within NGC Private Registry directly, leveraging Triton Inference Server and Triton's Auto-Generated Model Configuration. However, if a more complicated setup is required, a complete model configuration will need to be created. Please see the Triton Model Configuration documentation to learn more.
Once the model is uploaded, the inputs will be automatically discovered and deployed as part of the endpoint.
Model-only function creation has the following limitations:
Function invocation is only supported via HTTP
Defining custom logic within the function is not possible
Models are loaded in at start time
Inference inputs should not exceed 5MB
Model Creation Resources
Refer to the triton_echo_sample model.py example to develop your model, and the config.pbtxt example as a reference for creating the necessary configuration.
Upload the Model
Ensure you have an API key created, see Generate an NGC Personal API Key.
Ensure you have the NGC CLI configured.
Via terminal, navigate to the directory where the model file is present.
Upload the model to your NGC Private Registry by running the following command:
ngc registry model upload-version $ORG_NAME/$MODEL_NAME:v1 --gpu-model '$GPU' --source ./$PATH/$TO/$MODEL_NAME
Refer to NGC Models for further guidance on creating models.
After the upload is complete, you will receive an upload summary confirming the status:
Model ID: $MODEL_NAME[version=v1]
Upload status: Completed
...
Create a Model-Only Function
See below for an example API call creating a function based only on a model.
Note that the `containerImage` is omitted from the request. This function will use the Triton Inference Server.
Model-only functions use the Predict Protocol - Version 2 for the `requestBody` (input & output); therefore, you must set the `apiBodyFormat` to `PREDICT_V2`.
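Because the body format is `PREDICT_V2`, invocation payloads take the Predict Protocol v2 shape. A hypothetical example (the tensor name, datatype, and shape must match your model's config.pbtxt):

```python
import json

# Hypothetical Predict Protocol v2 body for a model with one FP32 input tensor.
predict_v2_body = {
    "inputs": [
        {
            "name": "INPUT0",       # must match the input name in config.pbtxt
            "datatype": "FP32",
            "shape": [1, 4],
            "data": [0.1, 0.2, 0.3, 0.4],
        }
    ]
}
print(json.dumps(predict_v2_body))
```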
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer $API_KEY' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "my-test-function",
"inferenceUrl": "v2/models/my-model/infer",
"models": [
{
"name": "my-model",
"version": "1",
"uri": "v2/org/cf/$ORG_NAME/models/my-model/versions/1/zip"
}
],
"apiBodyFormat": "PREDICT_V2"
}'
Cloud Functions supports helm-based functions for orchestration across multiple containers.
Prerequisites
The helm chart must have a “mini-service” container defined, which will be used as the inference entry point.
The name of this service in your helm chart should be supplied by setting `helmChartServiceName` during function definition. This allows Cloud Functions to communicate and make inference requests to the "mini-service" endpoint.
The `servicePort` defined within the helm chart should be used as the `inferencePort` supplied during function creation. Otherwise, Cloud Functions will not be able to reach the "mini-service".
Ensure you have the NGC CLI configured and have pushed your helm chart to NGC Private Registry. Refer to Managing Helm Charts Using the NGC CLI.
Secret Management
For pulling containers defined as part of the helm chart from NGC Private Registry, a new value named `ngcImagePullSecretName` needs to be defined in the chart. The value is referenced in the deployment spec as `spec.imagePullSecrets.name` of the pods in the chart.
Containers defined in the helm chart should be in the same NGC Organization and Team that the helm chart itself is being pulled from.
Create a Helm-based Function
Ensure your helm chart is uploaded to NGC Private Registry and adheres to the Prerequisites listed above.
Create the function:
Include the following additional parameters in the function definition:
`helmChart`
`helmChartServiceName`
The `helmChart` property should be set to the URL hosted by the NGC Model Registry pointing to the helm chart that will deploy the "mini-service". Please note, this helm chart URL should be accessible to the NGC org in which the function will eventually be deployed.
The `helmChartServiceName` is used for checking if the "mini-service" is ready for inference and is also scraped for function metrics. At this time, templatized service names are not supported. This must match the service name of your "mini-service" with the exposed entrypoint port.
Example Creation via API
Please see our sample helm chart used in this example for reference.
Below is an example function creation API call creating a helm-based function:
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/functions' \
-H 'Authorization: Bearer $API_KEY' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"name": "function_name",
"inferenceUrl": "v2/models/model_name/versions/model_version/infer",
"inferencePort": 8001,
"helmChart": "https://helm.ngc.nvidia.com/$ORG_NAME/charts/inference-test-1.0.tgz",
"helmChartServiceName": "service_name",
"apiBodyFormat": "CUSTOM"
}'
For gRPC-based functions, set `inferenceUrl` to `/gRPC`. This signals to Cloud Functions that the function is using the gRPC protocol and is not expected to have a `/gRPC` endpoint exposed for inferencing requests.
Proceed with function deployment and invocation normally.
Limitations
When using helm charts, the following limitations need to be taken into consideration:
Automatic mounting of NGC Models and Resources for your container is not supported.
For any downloads (such as of assets or models) occurring within your function's containers, download size is limited by the disk space on the VM. For GFN this is approximately 100GB; for other clusters this limit will vary.
Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider opting for HTTP streaming or gRPC bidirectional support.
Supported k8s artifacts under Helm Chart Namespace are listed below. Others will be rejected:
Deployment
Service
ServiceAccount
Role & RoleBindings
ConfigMaps
Secrets
Helm Chart Overrides
To override keys in your helm chart `values.yml`, you can provide the `configuration` parameter and supply corresponding key-value pairs in JSON format which you would like to be overridden when the function is deployed.
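Conceptually, the `configuration` keys behave like a deep merge over the chart's `values.yml` defaults. The sketch below is only an illustrative model of that behavior, with hypothetical keys; Cloud Functions' actual merge semantics are applied server-side:

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively apply override keys on top of base values (illustrative only)."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical values.yml defaults and a "configuration" override block.
values = {"image": {"tag": "1.0"}, "replicas": 1}
configuration = {"image": {"tag": "1.1"}, "replicas": 2}
print(deep_merge(values, configuration))
```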
Example helm chart override
curl -X 'POST' \
'https://api.nvcf.nvidia.com/v2/nvcf/deployments/functions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4/versions/fe6e6589-12bb-423a-9bf6-8b9d028b8bf4' \
-H 'Authorization: Bearer $API_KEY' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"deploymentSpecifications": [{
"gpu": "L40",
"backend": "OCI",
"maxInstances": 2,
"minInstances": 1,
"configuration": {
"key_one": "<value>",
"key_two": { "key_two_subkey_one": "<value>", "key_two_subkey_two": "<value>" }
}
},
{
"gpu": "T10",
"backend": "GFN",
"maxInstances": 2,
"minInstances": 1
}]
}'