Create Dedicated Endpoint from NVIDIA NIM
For enhanced performance and seamless compatibility, NVIDIA-optimized models using NIM container registry are also available on DGX Cloud Lepton.
Prerequisites
The NVIDIA-optimized models require an NVIDIA account with access to the NIM container registry.
NVIDIA Registry
You need to have an NVIDIA account with access to the NIM container registry, and configure the registry auth key on DGX Cloud Lepton.
Refer to this guide for more details. Once the registry auth key is created, you can create a private registry via Settings > Registries > New Registry auth.

Choose NVIDIA as the registry type, and paste the registry auth key in the API Key field.

NGC API Key
Besides the registry auth key, you also need to have an NGC API key. Navigate to the NGC API key creation page, click on Generate Personal Key.
In the Service Included field, select Public API Endpoints.

Then you can store the NGC API key on DGX Cloud Lepton as a secret.
Create Endpoint from NVIDIA NIM
Navigate to the create endpoint page on the dashboard.
For Endpoint name, you can set it to nim-endpoint
or any other name you prefer.
For Model, click on the Load from Hugging Face button and type keywords to search for the model you want to use. For example, you can use meta-llama/Llama-3.1-8B-Instruct
. If the model is gated, you need to provide a Hugging Face token to access it. You can create a new token in your Hugging Face account and save it as a secret in your workspace.
For Resource, choose the appropriate resource for your model based on the model size.
For NIM configuration, select the NVIDIA registry auth you created as the registry auth, and choose the NGC API key you saved as a secret in your workspace.
For other endpoint-related configurations, refer to this guide.
For NIM engine-related configurations, refer to this guide. By adding corresponding environment variables, you can configure the NIM engine according to your needs.
Once all configurations are complete, click on the Create Endpoint button to create the endpoint.