FAQ#
Overview#
This page answers frequently asked questions. In some cases, steps for debugging and resolving issues are provided. Any questions not addressed on this page or in Known Issues should be raised on the official forum.
Prerequisite FAQs
Deployment FAQs
Prerequisite FAQs#
Failed to fetch blueprint: 403 Forbidden#
This can occur for multiple reasons, most commonly:
The account does not have VSS EA enablement
The wrong NGC API key was used
To debug this issue, first ensure you followed the steps for setting up a new account with EA enablement:
When you were approved into the program, you received two emails: 1) Confirming your acceptance into the VSS EA program and 2) “Welcome to NVIDIA NGC”.
Click on the link in the “Welcome to NVIDIA NGC” email.
Note
If you get an INVALID_REQUEST status when clicking on the link, that means you have already created a new NGC account with enablement and must select that account when logging in.
After clicking the link, you are brought to an accounts page where you must select Create New NVIDIA Cloud Account, as this specific Cloud Account will be the one with the VSS EA enablement. Do not click an existing account!
Note
If you click on an existing account with “owner” access, you will get an INTERNAL_ERROR.
Select a distinct name for your cloud account that will be associated with your VSS enablement.
You should now be able to follow the steps to Obtain NGC API Key.
If you followed the above steps and still receive the 403 Forbidden error, please check that:
You have the correct Organization/Team selected in the top right.
The selected Organization/Team has the NVIDIA VSS Early Access subscription and it is active (Organization > Subscriptions).
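If both of these look correct, a quick sanity check of the NGC API key itself can help isolate the problem. The snippet below is only a sketch: it assumes docker is available on the host and that the key is expected to grant access to the nvcr.io registry (NGC API keys authenticate with the literal username $oauthtoken).
# Sanity-check the NGC API key against the NGC container registry (illustrative).
# An authentication failure here usually means the key is wrong or belongs to an
# organization without the VSS EA enablement.
export NGC_API_KEY=<your-ngc-api-key>
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin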
Deployment FAQs#
How do I monitor progress of VSS helm chart deployment?#
Use the microk8s kubectl get po -A and microk8s kubectl describe po commands to see the progress of the VSS deployment.
Check Default Deployment Topology and Models in Use to see the names of pods involved in VSS deployment.
You can use microk8s kubectl describe po POD_NAME to individually check the status of each pod while it is initializing. The unique pod name can be found using microk8s kubectl get po -A.
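For convenience, the commands above can be combined as in the sketch below. POD_NAME and NAMESPACE are placeholders; substitute the values reported by microk8s kubectl get po -A.
# Watch all pods until they reach the Running/Completed state
sudo watch microk8s kubectl get po -A
# Inspect a specific pod that is still initializing
microk8s kubectl describe po POD_NAME -n NAMESPACE
# Check recent events in the namespace if a pod is stuck in Pending or Init
microk8s kubectl get events -n NAMESPACE --sort-by=.lastTimestamp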
What are some of the common errors users may encounter trying to deploy VSS?#
Insufficient VRAM on deployment GPUs or insufficient CPU RAM.
Check Prerequisites for the exact memory requirements.
OpenAI API Key not having GPT-4o model access.
Make sure you have access to the GPT-4o model API endpoint at https://platform.openai.com/apps.
Make sure you have enough credits available under Settings > Usage (https://platform.openai.com/settings/organization/usage) and review the rate limits under Settings > Limits. A quick way to verify model access is shown in the sketch after this list.
Incorrect version of NVIDIA Drivers.
Check Prerequisites for the exact NVIDIA driver requirement and a link to download the driver.
GPU operator issues were observed with newer driver versions.
Invalid access to NGC.
Make sure you provide a valid NGC API key by setting up the secrets as described in Create Required Secrets (see the secret-creation sketch after this list).
Use the microk8s kubectl get po -A and microk8s kubectl describe po commands to see the progress of the VSS deployment.
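Two of the checks above can be scripted. The snippets below are sketches only: the curl call uses the public OpenAI models endpoint, and the secret names shown are placeholders; use the exact names given in Create Required Secrets.
# Verify that the OpenAI API key has access to the GPT-4o model
# (returns model metadata on success, an error object otherwise)
curl -s https://api.openai.com/v1/models/gpt-4o -H "Authorization: Bearer $OPENAI_API_KEY"
# Recreate the NGC secrets if access to NGC fails (secret names here are illustrative)
sudo microk8s kubectl create secret docker-registry ngc-docker-reg-secret \
    --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
sudo microk8s kubectl create secret generic ngc-api-key-secret --from-literal=NGC_API_KEY=$NGC_API_KEY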
Where to see VSS logs?#
VSS logs are written to the /tmp/via-logs/ directory inside the running VSS container in the pod vss-vss-deployment-*.
Refer to VSS Observability for more information on VSS observability.
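To inspect the logs directly, a minimal sketch (assuming the default namespace; POD_SUFFIX is a placeholder for the generated pod name suffix):
# Find the full VSS pod name
microk8s kubectl get po | grep vss-vss-deployment
# Stream the pod's stdout/stderr logs
microk8s kubectl logs -f vss-vss-deployment-POD_SUFFIX
# List the log files written inside the container
microk8s kubectl exec -it vss-vss-deployment-POD_SUFFIX -- ls /tmp/via-logs/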
Why do I see VSS deployment pod restart?#
If the VSS pod errors out for any reason, it should restart and try to self-correct.
One reason for this is an LLM Exception error showing that VSS internally exceeded max-retries while trying to connect to the LLM pod.
More details on this in Known Issues.
If this happens, please wait a few additional minutes and observe whether a restart fixes the issue.
Users can monitor this using sudo watch microk8s kubectl get pod.
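To confirm whether restarts are occurring and why, a minimal sketch (pod name placeholder as above):
# The RESTARTS column shows how many times each pod has restarted
sudo watch microk8s kubectl get pod
# View the logs of the previous (crashed) container instance to find the LLM Exception
microk8s kubectl logs --previous vss-vss-deployment-POD_SUFFIX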