Working with Third-Party Registries
This guide covers how to configure and manage registry credentials for function container images and Helm charts in self-hosted NVCF deployments.
Overview
In NVCF, third-party registries refer to container registries used for hosting:
- Function container images - The containers that run your inference workloads
- Helm charts - Used to deploy Helm chart functions
When a function is created or deployed, registry credentials are used by several components:
- NVCF API - Stores and manages registry credentials, validates that images exist during function creation. See self-hosted-api for the full API specification.
- NVCA (Cluster Agent) - Renders Helm charts or pod specs for container functions and handles the deployment lifecycle. Generates image pull credentials based on the registry type.
- Worker init container - Responsible for pulling the function container images during deployment.
Supported Registries
NVCF supports the following container registries:
- Amazon ECR (Elastic Container Registry)
- NVIDIA NGC (nvcr.io)
- Azure ACR (Azure Container Registry)
- VolcEngine CR (Volcano Engine Container Registry)
- JFrog Artifactory
- Harbor
Registry Credential Format
All registry credentials in NVCF use the format username:password, base64-encoded.
The specific username and password values depend on your registry type. See the sections below for registry-specific instructions.
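Here is a minimal sketch of this format in Python. The two functions are hypothetical helpers, not part of NVCF or nvcf-cli. The point is to encode exactly the bytes `username:password` with no trailing newline. That is easy to get wrong when piping `echo` output into `base64` in a shell.

```python
import base64


def encode_registry_credential(username: str, password: str) -> str:
    """Return base64("username:password") as an ASCII string."""
    if ":" in username:
        # This sketch splits on the first colon when decoding, so a colon
        # inside the username would make the credential ambiguous.
        raise ValueError("username must not contain ':'")
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_registry_credential(encoded: str) -> tuple[str, str]:
    """Inverse of encode_registry_credential; splits on the first colon."""
    raw = base64.b64decode(encoded, validate=True).decode("utf-8")
    username, sep, password = raw.partition(":")
    if not sep:
        raise ValueError("decoded credential has no ':' separator")
    return username, password


if __name__ == "__main__":
    token = encode_registry_credential("my-user", "my-password")
    print(token)
    print(decode_registry_credential(token))
```

For example, `encode_registry_credential("a", "b")` returns `YTpi`.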
Bootstrap vs. Runtime Credentials
Understanding Initial Bootstrap
When you first deploy the NVCF control plane, registry credentials are configured in the secrets/<environment>-secrets.yaml file under api.accountBootstrap.registryCredentials. Example using ECR:
Example using Volcano Engine Container Registry:
These bootstrap credentials are loaded during the initial helmfile sync deployment and stored in the NVCF backend.
The registryHostname must be the registry base URL only (e.g., 779846807323.dkr.ecr.us-west-2.amazonaws.com), not including repository paths.
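You can catch this mistake before deployment with a quick pre-flight check, sketched below in Python. `check_registry_hostname` is a hypothetical helper. Two parts of it are assumptions of this sketch: the regular expression (DNS labels plus an optional `:port`) and the rejection of a URL scheme. The guide's examples only ever show bare hostnames.

```python
import re

# DNS labels separated by dots, optionally followed by ":port".
# Assumption: NVCF expects a bare host with no scheme.
_HOST_RE = re.compile(
    r"[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(:\d+)?",
    re.IGNORECASE,
)


def check_registry_hostname(value: str) -> str:
    """Return value unchanged if it looks like a bare registry hostname."""
    if "://" in value:
        raise ValueError("drop the scheme; use the bare hostname")
    if "/" in value:
        raise ValueError("remove repository paths and trailing slashes")
    # fullmatch (not match with $) so a trailing newline is rejected too.
    if not _HOST_RE.fullmatch(value):
        raise ValueError(f"not a valid hostname: {value!r}")
    return value
```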
Managing Credentials After Deployment
After initial deployment, registry credentials can be managed dynamically using the NVCF API or CLI without redeploying the control plane:
- Add new registry credentials
- List existing credentials
- Update credentials (e.g., rotate secrets)
- Delete unused credentials
This is the recommended approach for rotating credentials and for adding or modifying registries after the control plane is deployed.
Adding AWS ECR Registry Credentials
For AWS ECR, NVCF requires permanent IAM user credentials (Access Key ID + Secret Access Key). Temporary SSO/STS credentials will not work.
Why SSO credentials don’t work: AWS SSO and assumed role credentials include a session token that must be passed alongside the access key and secret. The NVCF registry credential format (ACCESS_KEY_ID:SECRET_ACCESS_KEY) does not support session tokens, so temporary credentials will fail with UnrecognizedClientException.
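AWS documents distinct prefixes for access key IDs: `AKIA` marks a long-term IAM user key and `ASIA` marks a temporary STS key. The sketch below combines that prefix with the presence of a session token. The goal is to flag credentials that cannot fit NVCF's format before you register them. `nvcf_ecr_key_problem` is a hypothetical helper. Reading `AWS_ACCESS_KEY_ID` and `AWS_SESSION_TOKEN` from the environment is an assumption about where your keys live.

```python
import os
from typing import Optional


def nvcf_ecr_key_problem(key_id: str, session_token: Optional[str]) -> Optional[str]:
    """Return why a key pair is unsuitable for NVCF, or None if it looks usable."""
    if session_token:
        return "credentials carry a session token, which ACCESS_KEY_ID:SECRET_ACCESS_KEY cannot hold"
    if key_id.startswith("ASIA"):
        return "ASIA prefix marks a temporary STS key (SSO or assumed role)"
    if not key_id.startswith("AKIA"):
        return "unrecognized prefix; expected AKIA for a long-term IAM user key"
    return None


if __name__ == "__main__":
    problem = nvcf_ecr_key_problem(
        os.environ.get("AWS_ACCESS_KEY_ID", ""),
        os.environ.get("AWS_SESSION_TOKEN"),
    )
    print(problem or "looks like a long-term IAM user key")
```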
NVCF supports both ECR Private and ECR Public registries:
- ECR Private: <account-id>.dkr.ecr.<region>.amazonaws.com
- ECR Public: public.ecr.aws
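If you need to tell the two forms apart programmatically, a small classifier can match a hostname against the patterns above. This is an illustrative sketch only, and `ecr_kind` is a hypothetical helper. It covers the standard `amazonaws.com` partition. It does not attempt other AWS partitions (such as AWS China) or FIPS endpoints.

```python
import re
from typing import Optional, Tuple

# AWS account IDs are 12 digits; region is captured loosely.
_ECR_PRIVATE = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com"
)


def ecr_kind(hostname: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (kind, account_id, region) for an ECR hostname, else None."""
    if hostname == "public.ecr.aws":
        return ("public", None, None)
    m = _ECR_PRIVATE.fullmatch(hostname)
    if m:
        return ("private", m.group("account"), m.group("region"))
    return None
```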
Step 1: Create an IAM User
Step 2: Create and Attach ECR Policy
Create a least-privilege IAM policy for your ECR type.
For Private ECR:
About REPO_PREFIX: ECR repository names can include path-like prefixes for organization. The REPO_PREFIX scopes the IAM policy to only allow access to repositories matching that prefix.
Examples:
- If your repositories are named nvcf/echo-server, nvcf/triton-server, etc., set REPO_PREFIX="nvcf".
- If you use the nvcf-base Terraform, which creates repos like my-cluster/nvcf-api, set REPO_PREFIX="my-cluster".
- To allow access to all repositories in the account, set REPO_PREFIX="*" (less secure).
The resulting IAM resource ARN arn:aws:ecr:<region>:<account>:repository/<prefix>/* will match all repositories starting with that prefix.
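The guide's own policy document is not reproduced here. As a hedged sketch, the Python below builds a read-only policy scoped by REPO_PREFIX, using the ARN pattern above. The action list is an assumption: the standard ECR pull actions, plus `ecr:DescribeImages` on the guess that NVCF needs it to check that images exist. Compare it against your own requirements before use. `ecr:GetAuthorizationToken` does not support resource-level scoping, so it gets `"Resource": "*"`.

```python
import json


def private_ecr_pull_policy(region: str, account_id: str, repo_prefix: str) -> dict:
    """Build a hypothetical least-privilege read policy for private ECR."""
    if repo_prefix == "*":
        # Match every repository, including names without a "/".
        repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/*"
    else:
        repo_arn = f"arn:aws:ecr:{region}:{account_id}:repository/{repo_prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Not resource-scoped in IAM, so it must use "*".
                "Sid": "EcrAuthToken",
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
            {
                "Sid": "EcrReadRepos",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:DescribeImages",  # assumed, for image-existence checks
                ],
                "Resource": repo_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = private_ecr_pull_policy("us-west-2", "779846807323", "nvcf")
    print(json.dumps(policy, indent=2))
```

This sketch treats `"*"` as a special case by design. Substituting it literally into the prefix pattern would give `repository/*/*`, which only matches repository names that contain a slash.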
For Public ECR:
About REPO_PREFIX for Public ECR: ECR Public uses aliases instead of account-based paths. When you create a public repository, you choose an alias (e.g., nvidia, my-company). Images are then referenced as public.ecr.aws/<alias>/<repo-name>:<tag>.
Set REPO_PREFIX to your ECR Public alias to scope the policy to your repositories.
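ECR Public images follow the `public.ecr.aws/<alias>/<repo-name>:<tag>` pattern, so you can read the alias for REPO_PREFIX straight off an image reference. This is a sketch, and `parse_public_ecr_image` is a hypothetical helper. It assumes plain tag references and rejects digest references. When no tag is given, it applies the usual `latest` default.

```python
from typing import NamedTuple


class PublicEcrRef(NamedTuple):
    alias: str
    repository: str
    tag: str


def parse_public_ecr_image(image: str) -> PublicEcrRef:
    """Split public.ecr.aws/<alias>/<repo-name>[:<tag>] into its parts."""
    host, _, rest = image.partition("/")
    if host != "public.ecr.aws":
        raise ValueError(f"not an ECR Public image: {image!r}")
    alias, _, name_and_tag = rest.partition("/")
    if not alias or not name_and_tag:
        raise ValueError("expected public.ecr.aws/<alias>/<repo-name>[:<tag>]")
    if "@" in name_and_tag:
        raise ValueError("digest references are not handled by this sketch")
    name, sep, tag = name_and_tag.rpartition(":")
    if not sep:
        # No tag given: container runtimes default to "latest".
        name, tag = name_and_tag, "latest"
    return PublicEcrRef(alias, name, tag)
```

Given such a reference, `parse_public_ecr_image(image).alias` is the value to use for REPO_PREFIX.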
Step 3: Create Access Keys
Example response:
Save these credentials immediately! The secret access key is only shown once. Store it securely (e.g., in a password manager or secrets vault).
Step 4: Add ECR Credentials via CLI
Generate the base64-encoded credential and add it using the CLI.
For Private ECR:
For Public ECR:
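You can chain Steps 3 and 4. The JSON printed by `aws iam create-access-key` nests `AccessKeyId` and `SecretAccessKey` under `AccessKey`. The NVCF credential is those two values joined by a colon and base64-encoded. The sketch below trims the response to the fields it uses. It relies on AWS's documented placeholder keys and a hypothetical user name, `nvcf-ecr-reader`. Pass the resulting string to the CLI in Step 4. Avoid leaving the real secret in shell history or on disk.

```python
import base64
import json

# Trimmed example of `aws iam create-access-key` output. The keys are
# AWS's documented placeholders; the user name is hypothetical.
SAMPLE_RESPONSE = """
{
  "AccessKey": {
    "UserName": "nvcf-ecr-reader",
    "AccessKeyId": "AKIAIOSFODNN7EXAMPLE",
    "Status": "Active",
    "SecretAccessKey": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
  }
}
"""


def ecr_credential_from_response(response_json: str) -> str:
    """Return base64("ACCESS_KEY_ID:SECRET_ACCESS_KEY") from the CLI response."""
    key = json.loads(response_json)["AccessKey"]
    raw = f"{key['AccessKeyId']}:{key['SecretAccessKey']}"
    return base64.b64encode(raw.encode("ascii")).decode("ascii")


if __name__ == "__main__":
    print(ecr_credential_from_response(SAMPLE_RESPONSE))
```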
Clean Up IAM Resources (If Needed)
To remove the IAM user and associated resources:
Adding NGC Registry Credentials
NGC (NVIDIA GPU Cloud) uses API keys for authentication.
Step 1: Generate NGC API Key
- Navigate to https://ngc.nvidia.com/setup/api-key and generate a new API key, or use an existing one if you have it saved.
- Copy the key (format: nvapi-xxxxxxxxxxxx).
Step 2: Add NGC Credentials via CLI
For NGC, the username is always $oauthtoken:
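In a shell, `$oauthtoken` must be single-quoted. Otherwise the shell expands it as an (empty) variable and the credential silently loses its username. Building the string in Python avoids that pitfall. This is a sketch, and `ngc_credential` is a hypothetical helper. Its `nvapi-` prefix check simply enforces the key format shown in Step 1.

```python
import base64

# Literal username required by NGC; not a variable.
NGC_USERNAME = "$oauthtoken"


def ngc_credential(api_key: str) -> str:
    """Return base64("$oauthtoken:<api_key>") for an NGC API key."""
    if not api_key.startswith("nvapi-"):
        raise ValueError("expected an NGC API key beginning with 'nvapi-'")
    raw = f"{NGC_USERNAME}:{api_key}"
    return base64.b64encode(raw.encode("ascii")).decode("ascii")
```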
Adding Volcano Engine Container Registry Credentials
Volcano Engine Container Registry (VCR) uses access keys for authentication.
Step 1: Get Volcano Engine Access Key
- Log in to the Volcano Engine Console.
- Go to the Access Key management page: https://console.volcengine.com/iam/keymanage.
- Copy the Access Key ID and Secret Access Key. (Click Create Access Key if you don't have one already.)
Step 2: Add VCR Credentials via CLI
Listing Registry Credentials
To view all configured registry credentials:
Deleting Registry Credentials
To remove a registry credential:
How NVCF Matches Registries to Images
When you create or deploy a function, NVCF automatically matches the containerImage path to the appropriate registry credentials based on the hostname.
Example: ECR Private
NVCF extracts the hostname 779846807323.dkr.ecr.us-west-2.amazonaws.com and looks for a matching registry credential.
Example: ECR Public
NVCF extracts the hostname public.ecr.aws and looks for a matching registry credential.
If credentials are found, they are used to:
- Validate the image exists (during function creation)
- Pull the image (during function deployment)
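This guide does not show NVCF's matching code. Below is a minimal sketch of hostname extraction and lookup. It assumes NVCF follows the common container-image convention: the first path component is a registry host only if it contains `.` or `:` or is `localhost`; otherwise the image is on Docker Hub. The exact, case-sensitive dictionary lookup is also an assumption. `registry_hostname` and `find_credential` are hypothetical helpers.

```python
from typing import Mapping, Optional


def registry_hostname(image: str) -> str:
    """Return the registry host portion of an image reference."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    # No explicit registry: the image is on Docker Hub.
    return "docker.io"


def find_credential(image: str, credentials: Mapping[str, str]) -> Optional[str]:
    """Look up a credential by exact registryHostname; None if not configured."""
    return credentials.get(registry_hostname(image))


if __name__ == "__main__":
    creds = {"779846807323.dkr.ecr.us-west-2.amazonaws.com": "<base64 credential>"}
    image = "779846807323.dkr.ecr.us-west-2.amazonaws.com/my-repo:1.0"
    print(registry_hostname(image), find_credential(image, creds) is not None)
```

A `None` result here corresponds to the "Registry Not Found" case under Troubleshooting.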
Troubleshooting
Incorrect registryHostname Format
The registryHostname in your credentials must exactly match the hostname portion of your container image path. Do not include repository paths in the hostname.
ECR Private:
- Correct: 779846807323.dkr.ecr.us-west-2.amazonaws.com
- Incorrect: 779846807323.dkr.ecr.us-west-2.amazonaws.com/my-repo
ECR Public:
- Correct: public.ecr.aws
- Incorrect: public.ecr.aws/my-alias
NGC:
- Correct: nvcr.io
- Incorrect: nvcr.io/nvidia/pytorch
UnrecognizedClientException
Error:
Cause: This AWS-specific error indicates a malformed or invalid security token. Common causes include:
- Using temporary AWS credentials (SSO, STS assumed role) that include a session token, which NVCF’s credential format does not support
- Incorrectly formatted credentials (e.g., wrong base64 encoding, missing or extra characters)
- Expired or revoked access keys
Solution:
- Verify you are using permanent IAM user credentials (Access Key ID + Secret Access Key), not temporary credentials
- Regenerate the base64-encoded credential, ensuring the encoded value is exactly ACCESS_KEY_ID:SECRET_ACCESS_KEY with no trailing newline or other extra characters.
- If you use SSO or assumed roles, create a dedicated IAM user instead. See Adding AWS ECR Registry Credentials.
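To check an existing credential for the formatting problems listed above, decode it and inspect the result. The sketch below uses a hypothetical helper, `diagnose_credential`. It flags four problems: invalid base64, whitespace around the encoded string, and a missing `:` separator. It also flags a trailing newline in the decoded value, the typical result of piping `echo` output into `base64` without `echo -n`. It cannot tell whether a key is expired or revoked.

```python
import base64


def diagnose_credential(encoded: str) -> list[str]:
    """Return human-readable problems found in a base64 registry credential."""
    problems = []
    if encoded != encoded.strip():
        problems.append("whitespace or newline around the base64 string")
    try:
        raw = base64.b64decode(encoded.strip(), validate=True)
    except ValueError:  # binascii.Error subclasses ValueError
        return problems + ["not valid base64"]
    text = raw.decode("utf-8", errors="replace")
    if text != text.rstrip("\r\n"):
        problems.append("decoded value ends with a newline")
    user, sep, secret = text.rstrip("\r\n").partition(":")
    if not sep or not user or not secret:
        problems.append("decoded value is not in username:password form")
    return problems
```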
Registry Not Found
Error:
Cause: The hostname in your containerImage doesn’t match any configured registry credential.
Solution:
- List your credentials with ./nvcf-cli registry list
- Verify the hostname matches exactly (no trailing slashes, no repository paths)
- Add the missing registry credential if needed
Authentication Failed
Error:
Cause: The credentials are malformed or the password/key is incorrect.
Solution:
- Verify the credential is username:password, base64-encoded
- For ECR: Use ACCESS_KEY_ID:SECRET_ACCESS_KEY
- For NGC: Use $oauthtoken:NGC_API_KEY
- Re-add the credential with correct values