Helm-Based Function Creation#

Cloud Functions supports helm-based functions for orchestration across multiple containers.

Prerequisites#

Warning

Ensure that your helm chart's version does not contain a hyphen (-). For example, v1 is fine, but v1-test will cause issues.
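The version rule above can be checked before you push the chart; a minimal sketch (the version strings are illustrative):

```shell
# Reject chart versions containing a hyphen, per the warning above.
check_chart_version() {
  case "$1" in
    *-*) echo "invalid: $1 (version must not contain '-')" ;;
    *)   echo "valid: $1" ;;
  esac
}

check_chart_version "1.0.0"
check_chart_version "1.0.0-test"
```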

  1. The helm chart must have a “mini-service” container defined, which will be used as the inference entry point.

  2. The name of this service in your helm chart should be supplied by setting helmChartServiceName during the function definition. This allows Cloud Functions to communicate and make inference requests to the “mini-service” endpoint.

Attention

The servicePort defined within the helm chart should be used as the inferencePort supplied during function creation. Otherwise, Cloud Functions will not be able to reach the “mini-service”.

  3. Ensure you have pushed your helm chart to your OCI container registry.
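For reference, a minimal "mini-service" Service satisfying these prerequisites could look like the following sketch; the name, selector, and port are illustrative assumptions, not taken from a real chart:

```yaml
# Illustrative Service for the "mini-service" entry point.
apiVersion: v1
kind: Service
metadata:
  name: service_name        # supply this value as helmChartServiceName
spec:
  type: ClusterIP           # only ClusterIP Services are permitted (see Limitations)
  selector:
    app: inference-test
  ports:
    - port: 8001            # must match the inferencePort supplied at function creation
      targetPort: 8001
```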

Pull Secret Management#

All Pod specs in your helm chart will be updated with pull secrets at runtime, so all images can be pulled automatically without additional authorization. No other configuration is needed.

Create a Helm-based Function#

  1. Ensure your helm chart is uploaded to your registry and adheres to the Prerequisites listed above.

  2. Create the function:

    • Include the following additional parameters in the function definition

      • helmChart

      • helmChartServiceName

    • The helmChart property should be set to the OCI URL of the helm chart that deploys the “mini-service”. The URL should follow the format oci://${REGISTRY}/${REPOSITORY}/charts/$NAME-X.Y.Z.tgz, and the version string (X.Y.Z) must not contain a hyphen (-).

    • The helmChartServiceName is used for checking if the “mini-service” is ready for inference and is also scraped for function metrics. At this time, templatized service names are not supported. This must match the service name of your “mini-service” with the exposed entry point port.

    • Important: The Helm chart name should not contain underscores or other special symbols, as that may cause issues during deployment.

Example Creation via API

Please see our sample helm chart used in this example for reference.

Below is an example function creation API call creating a helm-based function:

curl -s -X POST "http://${GATEWAY_ADDR}/v2/nvcf/functions" \
    -H "Host: api.${GATEWAY_ADDR}" \
    -H "Authorization: Bearer $NVCF_TOKEN" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "name": "function_name",
    "inferenceUrl": "v2/models/model_name/versions/model_version/infer",
    "inferencePort": 8001,
    "helmChart": "oci://'${REGISTRY}'/'${REPOSITORY}'/charts/inference-test-1.0.0.tgz",
    "helmChartServiceName": "service_name",
    "apiBodyFormat": "CUSTOM"
}'

Note

For gRPC-based functions, set "inferenceUrl": "/gRPC". This signals to Cloud Functions that the function uses the gRPC protocol and is not expected to expose a /gRPC endpoint for inference requests.
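For illustration, only these fields of the creation payload differ for a gRPC-based function (a sketch; port 8001 is an assumption for your gRPC server port):

```shell
# Fragment of the creation payload for a gRPC function; all other
# fields are as in the curl example above.
GRPC_FIELDS='{"inferenceUrl": "/gRPC", "inferencePort": 8001}'
echo "$GRPC_FIELDS"
```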

  3. Proceed with function deployment and invocation normally.

Multi-node Helm Deployment#

To create a multi-node helm deployment, use the following format for the instanceType: <CSP>.GPU.<GPU_NAME>_<number of gpus per node>x[.x<number of nodes>]. For example, DGXC.GPU.L40S_1x is a single L40S instance, while ON-PREM.GPU.B200_8x.x2 is two full nodes of 8-way B200.

A sample helm chart for a multi-node deployment can be found in the multi-node helm example.
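The instanceType format above can be sketched as a small helper (the function name and arguments are illustrative, not part of the API):

```shell
# Build an instanceType string: <CSP>.GPU.<GPU_NAME>_<gpus per node>x[.x<nodes>]
make_instance_type() {
  csp="$1"; gpu="$2"; per_node="$3"; nodes="${4:-1}"
  if [ "$nodes" -gt 1 ]; then
    echo "${csp}.GPU.${gpu}_${per_node}x.x${nodes}"
  else
    echo "${csp}.GPU.${gpu}_${per_node}x"
  fi
}

make_instance_type DGXC L40S 1        # single L40S instance
make_instance_type ON-PREM B200 8 2   # two full nodes of 8-way B200
```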

Limitations#

When using Helm charts to deploy a function, the following limitations apply.

1. Asset caching#

  • For any downloads (such as of assets or models) occurring within your function’s containers, download size is limited by the disk space on the node.

2. Inference#

Progress/partial response reporting is not supported, including any additional artifacts generated during inferencing. Consider using HTTP streaming or gRPC bidirectional streaming instead.

3. Security Constraints#

Helm charts must conform to certain security standards to be deployable as a function. This means that certain helm and Kubernetes features are restricted in NVCF backends. NVCF will process your helm chart on function creation, then later on deployment with your Helm values and other deployment metadata, to ensure standards are enforced.

NVCF may automatically modify certain objects in your chart so they conform to these standards; it will only do so if modification will not break your chart when it is installed in the targeted backend. Possible areas amenable to modification will be noted in the restrictions section below. Any standard that cannot be enforced by modification will result in error(s) during function creation.

Restrictions
  • Supported k8s artifacts under Helm Chart Namespace are listed below; others will be rejected:

    • ConfigMaps

    • Secrets

    • Services - Only type: ClusterIP or none

    • Deployments

    • ReplicaSets

    • StatefulSets

    • Jobs

    • CronJobs

    • Pods

    • ServiceAccounts

    • Roles

    • RoleBindings

    • PersistentVolumeClaims

  • A rendered Helm chart may contain a maximum of 300 of the aforementioned objects.

  • The only allowed Pod or Pod template volume types are:

    • configMap

    • secret

    • projected.sources.* of any of the above

    • persistentVolumeClaim

    • emptyDir

  • No chart hooks are allowed; if specified in the chart, they will not be executed.
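As a sanity check before function creation, the object kinds emitted by helm template can be screened against the allowlist above; a minimal shell sketch (the helper name is illustrative, and in practice the input would come from `helm template <chart> | grep '^kind:'`):

```shell
# Kinds permitted in a rendered chart, per the restrictions above.
ALLOWED_KINDS="ConfigMap Secret Service Deployment ReplicaSet StatefulSet \
Job CronJob Pod ServiceAccount Role RoleBinding PersistentVolumeClaim"

check_kind() {
  for k in $ALLOWED_KINDS; do
    if [ "$1" = "$k" ]; then echo "allowed: $1"; return; fi
  done
  echo "rejected: $1"
}

check_kind Deployment
check_kind DaemonSet
```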

Note

CustomResourceDefinitions in helm charts will be skipped on installation. There is no need to modify your chart to remove them from helm template output for NVCF.

Helm charts _should_ conform to these additional security standards. While not enforced now, they will be at a later date.

  • All containers have resource limits for at least cpu and memory (and nvidia.com/gpu, ephemeral-storage if required for certain containers).

  • All Pods and resources that define a Pod template conform to the Kubernetes Pod Security Standards Baseline and Restricted policies.

  • Pod and container securityContexts conform to these parameters:

  • automountServiceAccountToken must be unset or set to false

  • runAsNonRoot must be explicitly set to true

  • hostIPC, hostPID, and hostNetwork must be unset or set to false

  • No privilege escalation, root capabilities, or non-default Seccomp, AppArmor, or SELinux profiles are allowed. See the Baseline and Restricted Pod security standards for fields that cannot be explicitly set.
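Put together, a Pod template that already satisfies these upcoming standards might look like the following sketch; the values follow the Kubernetes Restricted profile, and the container name and resource limits are illustrative assumptions:

```yaml
# Illustrative Pod template fragment; adapt to your chart.
spec:
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault      # default profile; custom profiles are not allowed
  containers:
    - name: mini-service
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
          nvidia.com/gpu: 1
```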

Helm Chart Overrides#

To override keys in your helm chart's values.yaml, provide the configuration parameter and supply the key-value pairs, in JSON format, that you would like overridden when the function is deployed.

Example helm chart override#
curl -s -X POST "http://${GATEWAY_ADDR}/v2/nvcf/deployments/functions/${FUNCTION_ID}/versions/${FUNCTION_VERSION_ID}" \
 -H "Host: api.${GATEWAY_ADDR}" \
 -H "Authorization: Bearer $NVCF_TOKEN" \
 -H 'accept: application/json' \
 -H 'Content-Type: application/json' \
 -d '{
     "deploymentSpecifications": [{
         "gpu": "L40",
         "backend": "nvcf-default",
         "maxInstances": 2,
         "minInstances": 1,
         "configuration": {
             "key_one": "<value>",
             "key_two": { "key_two_subkey_one": "<value>", "key_two_subkey_two": "<value>" },
             ...
         }
     }]
 }'