Python SDK Reference
Tutorial on using the Python SDK to interact with DGX Cloud Lepton
DGX Cloud Lepton supports the REST API protocol and includes a Python SDK for interacting with workspaces. Common tasks include launching and monitoring batch jobs and endpoints. This document provides an overview of how to use the Python SDK with DGX Cloud Lepton.
Installation and authentication
First, install the Python SDK and authenticate with your workspace. Install the SDK with:
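```bash
# Installs the SDK from PyPI; the lep CLI is included with the package
pip install -U leptonai
```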
Next, authenticate with your workspace:
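```bash
lep login
```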
This prompts you to authenticate with your DGX Cloud Lepton workspace. If you're in a GUI-supported environment such as a desktop, a browser will open to the credentials page in your workspace. Otherwise, a URL will be displayed. Open this URL in a browser.
On the credentials page, create an authentication token by following the prompts. The page will display a secret token which is used for authentication. Copy the workspace ID and token shown in the second field and paste it back into your terminal. The format should look like xxxxxx:**************************. You should now be authenticated with DGX Cloud Lepton. You only need to authenticate once locally as long as your credentials remain valid.
Validate installation
After authentication, validate the installation by running:
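```bash
lep workspace list
```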
This lists your available workspaces. If authentication was successful, your workspace ID and name appear in the output.
Basic Python SDK flow
Nearly all workflows using the Python SDK follow the same basic flow:
- Initialize a client
- Define the task to perform
- Execute the task
The following sections break down these steps and provide a complete example.
Initialize a client
Initializing the client is straightforward. Import the Lepton API module and instantiate the client:
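```python
# The import path below assumes a recent SDK version; it may vary slightly
from leptonai.api.v1.client import APIClient

# Creates a client bound to the workspace you authenticated with via
# lep login; credentials are read from the stored local login state
client = APIClient()
```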
The client variable can be reused for multiple tasks.
Define the task to perform
Most tasks available to users on DGX Cloud Lepton are supported via the SDK. The following API resources are accessible:
- Batch Jobs
- Dev Pods
- Endpoints
- Events
- Health Checks
- Ingress
- Logs
- Monitoring
- Node Groups
- Queue
- Readiness
- Replicas
- Secrets
- Templates
- Workspaces
Each of these resources expects a specific type for the API request. For example, the Batch Jobs API expects a job to be submitted as a leptonai.api.v1.types.job.LeptonJob object. Similarly, Endpoints (known as "Deployments" in the SDK) expect a leptonai.api.v1.types.deployment.LeptonDeployment object, as do Dev Pods, which the backend handles the same way as Endpoints. The API specs are defined in the leptonai.api.v1.types module of the SDK source. Open the file for the specific task you need and review its specification.
All jobs require a Resource Shape to be specified. This tells the platform what resources should be allocated in the container, such as number and type of GPUs, CPU cores and memory, and local storage. To view available shapes, run lep node resource-shape in CLI version 0.26.4 or later. This returns a table of all available resource shapes in your node groups. The Shapes column shows the name of the shape. Use the desired name for the resource_shape field in the job specifications.
For a batch job, you need a LeptonJob object containing a LeptonJobUserSpec. Review the LeptonJobUserSpec class in the SDK source for the settings required to launch a job. The following is a quick sketch of defining a batch job spec, expanding on the previous code that instantiated the client. The node group name, image, and shape are placeholders, and exact import paths and attribute names may vary slightly between SDK versions:
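```python
from leptonai.api.v1.types.affinities import LeptonResourceAffinity
from leptonai.api.v1.types.common import Metadata
from leptonai.api.v1.types.deployment import LeptonContainer
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec

# Find the ID of the node group to run in (replace the name with one
# of your own node groups)
node_group_name = "my-node-group"
node_group = next(
    ng for ng in client.nodegroup.list_all()
    if ng.metadata.name == node_group_name
)

# Get the IDs of all nodes in the node group; the job may be scheduled
# on any of these nodes (some SDK versions expose the ID as .id
# rather than .id_)
nodes = client.nodegroup.list_nodes(node_group)
valid_node_ids = [node.metadata.id_ for node in nodes]

# Specify the job spec: resource shape, scheduling affinity, container
# image and command, and number of workers
job_spec = LeptonJobUserSpec(
    resource_shape="gpu.1xh100",  # a shape listed by `lep node resource-shape`
    affinity=LeptonResourceAffinity(
        allowed_dedicated_node_groups=[node_group.metadata.id_],
        allowed_nodes_in_node_group=valid_node_ids,
    ),
    container=LeptonContainer(
        image="nvcr.io/nvidia/pytorch:25.01-py3",
        command=["/bin/bash", "-c", "nvidia-smi"],
    ),
    completions=1,
    parallelism=1,
)

# Define the job by passing in the job spec and giving it a name
job = LeptonJob(
    metadata=Metadata(name="my-sdk-job"),
    spec=job_spec,
)
```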
The example above does the following:
- Imports all required modules
- Finds the ID of the specified node group (update the node group name for your environment)
- Gets the IDs of all nodes in the node group, which specifies the nodes the job can be scheduled on
- Specifies the job spec, including the resource shape, container, command, and number of workers
- Defines the job by passing in the job spec and giving it a name
Mounting shared storage
If shared storage is available in your node group, you can mount it in a job by pointing to the storage address and indicating which directory to mount from storage and where to mount it in the container.
To find your shared storage name, navigate to Nodes in the UI and select your desired node group. Next, click Actions > Config > Storage in the top-right corner of the page to list the available storage in the node group. For example, a node group with shared storage might list a volume named lepton-shared-fs; this is the name of the shared storage that can be mounted in containers.
With the storage name captured, we can define the mount specifications. To mount the volume, you will need to create a Python dictionary as follows:
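```python
# Mount the root of the shared storage at /my-mount inside the container
mount = {
    "path": "/",
    "mount_path": "/my-mount",
    "from": "node-nfs:lepton-shared-fs",
}
```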
More information on each value:
- path: The directory from your shared storage to be mounted. Setting this to / mounts the root directory of the shared storage inside your container.
- mount_path: The location inside the container where the path directory from storage is mounted. Setting this to /my-mount mounts the path directory from storage at /my-mount inside the container. Note that this value cannot be /.
- from: The name of the storage to mount. When using shared storage, you must specify the storage type, followed by a colon, then your storage name. The storage type is most commonly node-nfs. Following the example above, this value should be node-nfs:lepton-shared-fs.
You can also specify multiple mounts by making the dictionary above a list of dictionaries, similar to the following:
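```python
# Hypothetical example: mount two directories from the same shared storage
mounts = [
    {
        "path": "/",
        "mount_path": "/my-mount",
        "from": "node-nfs:lepton-shared-fs",
    },
    {
        "path": "/datasets",
        "mount_path": "/data",
        "from": "node-nfs:lepton-shared-fs",
    },
]
```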
After defining your mount points, attach them to the job spec. One way to do this, assuming the SDK exposes a Mount type in leptonai.api.v1.types.deployment matching these fields:
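```python
from leptonai.api.v1.types.deployment import Mount

# Convert each dictionary into the SDK's Mount type and attach the
# result to the job spec. This assumes the "from" key is populated via
# a field alias, since "from" is a reserved word in Python.
job_spec.mounts = [Mount(**mount) for mount in mounts]
```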
If you have a single dictionary and not a list, this would be:
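```python
job_spec.mounts = [Mount(**mount)]
```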
Authenticating with private container registries
If you are using a private container registry that requires authentication, you need to specify the private container secret in the job definition.
To create a new secret, navigate to the Settings > Registries page in the UI and click the green + New registry auth button in the top-right corner of the page. Follow the prompts to add your private image token.
To add your secret to the job specification, enter it in the image_pull_secrets field like below:
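```python
# "my-registry-secret" is a placeholder for the name you gave the
# registry auth entry on the Settings > Registries page
job_spec.image_pull_secrets = ["my-registry-secret"]
```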
Note that image_pull_secrets expects a list of strings, allowing multiple secrets to be added.
Running in a node reservation
If you have an existing node reservation for scheduling jobs on dedicated resources, you can add that to your request using the node_reservation field as follows:
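```python
# "my-reservation" is a placeholder for your node reservation ID
job_spec.node_reservation = "my-reservation"
```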
Execute the task
After the job has been defined in the previous step, it can be launched using the client. Since we are launching a job, we would use:
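```python
# Launch the previously defined job
client.job.create(job)
```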
This adds the job to the queue and schedules it when resources become available. The job should appear in the UI after the create function runs.
Example job submission via SDK
The following is a self-contained example of launching a batch job using the Python SDK, following the flow outlined earlier. As with the earlier snippets, the node group name, image, shape, and job name are placeholders, and import paths may vary slightly between SDK versions.
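```python
from leptonai.api.v1.client import APIClient
from leptonai.api.v1.types.affinities import LeptonResourceAffinity
from leptonai.api.v1.types.common import Metadata
from leptonai.api.v1.types.deployment import LeptonContainer
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec

# 1. Initialize a client (uses the credentials stored by `lep login`)
client = APIClient()

# 2. Define the task: locate the node group and its nodes, then build
#    the job spec and the job object
node_group_name = "my-node-group"  # replace with your node group
node_group = next(
    ng for ng in client.nodegroup.list_all()
    if ng.metadata.name == node_group_name
)
nodes = client.nodegroup.list_nodes(node_group)
valid_node_ids = [node.metadata.id_ for node in nodes]

job_spec = LeptonJobUserSpec(
    resource_shape="gpu.1xh100",  # a shape listed by `lep node resource-shape`
    affinity=LeptonResourceAffinity(
        allowed_dedicated_node_groups=[node_group.metadata.id_],
        allowed_nodes_in_node_group=valid_node_ids,
    ),
    container=LeptonContainer(
        image="nvcr.io/nvidia/pytorch:25.01-py3",
        command=["/bin/bash", "-c", "nvidia-smi"],
    ),
    completions=1,
    parallelism=1,
)

job = LeptonJob(
    metadata=Metadata(name="my-sdk-job"),
    spec=job_spec,
)

# 3. Execute the task: submit the job to the queue
client.job.create(job)
print("Job submitted; check the Batch Jobs page in the UI for status.")
```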
Save the script above to a file such as run.py and launch it with:
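```bash
python run.py
```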
Once the script completes, the launched job should be viewable in the UI.