FAQ#
Overview#
This page answers frequently asked questions. In some cases, steps for debugging and resolving issues are provided. Any questions not addressed on this page or in Known Issues should be raised on the official forum.
Prerequisite FAQs
Deployment FAQs
Prerequisite FAQs#
Failed to fetch blueprint: 403 Forbidden#
This can occur for multiple reasons, most commonly:
The account does not have VSS EA enablement
The wrong NGC API key was used
To debug this issue, first ensure you followed the steps for setting up a new account with EA enablement:
When you were approved into the program, you received two emails: 1) Confirming your acceptance into the VSS EA program and 2) “Welcome to NVIDIA NGC”.
Click on the link in the “Welcome to NVIDIA NGC” email.
Note
If you get an INVALID_REQUEST status when clicking on the link, that means you have already created a new NGC account with enablement and must select that account when logging in.
After clicking the link, you are brought to an accounts page where you must select Create New NVIDIA Cloud Account, as this specific Cloud Account will be the one with the VSS EA enablement. Do not click an existing account!
Note
If you click on an existing account with “owner” access, you will get an INTERNAL_ERROR.
Select a distinct name for your cloud account that will be associated with your VSS enablement.
You should now be able to follow the steps to Obtain NGC API Key.
If you followed the above steps and still receive the 403 Forbidden error, please check that:
You have the correct Organization/Team selected in the top right.
The selected Organization/Team has the NVIDIA VSS Early Access subscription and it is active (Organization > Subscriptions).
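If both of these look correct, a quick sanity check of the NGC API key itself can help isolate the problem. The snippet below is only a sketch: it assumes docker is available on the host and that the key is expected to grant access to the nvcr.io registry (NGC API keys authenticate with the literal username $oauthtoken).
# Sanity-check the NGC API key against the NGC container registry (illustrative).
# An authentication failure here usually means the key is wrong or belongs to an
# organization without the VSS EA enablement.
export NGC_API_KEY=<your-ngc-api-key>
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin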
Deployment FAQs#
How do I monitor progress of VSS helm chart deployment?#
Use the microk8s kubectl get po -A and microk8s kubectl describe po commands to see the progress of the VSS deployment.
Check Default Deployment Topology and Models in Use to see the names of pods involved in VSS deployment.
You can use microk8s kubectl describe po POD_NAME to individually check the status of each pod while it is initializing. The unique pod name can be found using microk8s kubectl get po -A.
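For convenience, the commands above can be combined as in the sketch below. POD_NAME and NAMESPACE are placeholders; substitute the values reported by microk8s kubectl get po -A.
# Watch all pods until they reach the Running/Completed state
sudo watch microk8s kubectl get po -A
# Inspect a specific pod that is still initializing
microk8s kubectl describe po POD_NAME -n NAMESPACE
# Check recent events in the namespace if a pod is stuck in Pending or Init
microk8s kubectl get events -n NAMESPACE --sort-by=.lastTimestamp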
What are some of the common errors users may encounter trying to deploy VSS?#
Insufficient VRAM on deployment GPUs or insufficient CPU RAM.
Check Prerequisites for the exact memory requirements.
OpenAI API Key not having GPT-4o model access.
Make sure you have access to the GPT-4o model API endpoint at https://platform.openai.com/apps.
Make sure you have enough credits available under Settings > Usage (https://platform.openai.com/settings/organization/usage) and review the rate limits under Settings > Limits. A quick way to verify model access is shown in the sketch after this list.
Incorrect version of NVIDIA Drivers.
Check Prerequisites for the exact NVIDIA driver requirement and a link to download the driver.
GPU operator issues were observed with newer driver versions.
Invalid access to NGC.
Make sure you provide a valid NGC API key by setting up the secrets as described in Create Required Secrets (see the secret-creation sketch after this list).
Use the microk8s kubectl get po -A and microk8s kubectl describe po commands to see the progress of the VSS deployment.
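Two of the checks above can be scripted. The snippets below are sketches only: the curl call uses the public OpenAI models endpoint, and the secret names shown are placeholders; use the exact names given in Create Required Secrets.
# Verify that the OpenAI API key has access to the GPT-4o model
# (returns model metadata on success, an error object otherwise)
curl -s https://api.openai.com/v1/models/gpt-4o -H "Authorization: Bearer $OPENAI_API_KEY"
# Recreate the NGC secrets if access to NGC fails (secret names here are illustrative)
sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret \
    --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
sudo microk8s kubectl create secret generic ngc-api-key-secret --from-literal=NGC_API_KEY=$NGC_API_KEY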
Where to see VSS logs?#
VSS logs are written to the /tmp/via-logs/ directory inside the running VSS container in the pod vss-vss-deployment-*.
Refer to VSS Observability for more information on VSS observability.
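To inspect the logs directly, a minimal sketch (assuming the default namespace; POD_SUFFIX is a placeholder for the generated pod name suffix):
# Find the full VSS pod name
microk8s kubectl get po | grep vss-vss-deployment
# Stream the pod's stdout/stderr logs
microk8s kubectl logs -f vss-vss-deployment-POD_SUFFIX
# List the log files written inside the container
microk8s kubectl exec -it vss-vss-deployment-POD_SUFFIX -- ls /tmp/via-logs/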
Why do I see VSS deployment pod restart?#
If the VSS pod errors out for any reason, it should restart and try to self-correct.
One reason for this is an LLM Exception error showing that VSS internally exceeded max-retries while trying to connect to the LLM pod.
More details on this in Known Issues.
If this happens, please wait a few additional minutes and observe whether a restart fixes the issue.
Users can monitor this using sudo watch microk8s kubectl get pod.
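To confirm whether restarts are occurring and why, a minimal sketch (pod name placeholder as above):
# The RESTARTS column shows how many times each pod has restarted
sudo watch microk8s kubectl get pod
# View the logs of the previous (crashed) container instance to find the LLM Exception
microk8s kubectl logs --previous vss-vss-deployment-POD_SUFFIX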