DGX Cloud Lepton exposes a REST API and includes a Python SDK for interacting with workspaces. Common tasks include launching and monitoring batch jobs and endpoints. This document provides an overview of how to use the Python SDK for DGX Cloud Lepton.
Installation and authentication
First, you must install the Python SDK and authenticate with your workspace. Install the SDK with:
pip3 install -U leptonai
Next, authenticate with your workspace:
lep login
If credentials have not been previously provided, a browser page will open prompting you to log in to your DGX Cloud Lepton account. You will then be redirected to a credentials page that displays your login token. Copy the command shown on that page and paste it into the terminal where you ran lep login to complete authentication.
You only need to authenticate once locally as long as your credentials remain valid.
Validate installation
After authentication, validate the installation by running:
lep workspace list
This will list all of your available workspaces and should look similar to the following if authentication was successful:
Current workspace: xxxxxxxx
All workspaces:
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ ID       ┃ Name                 ┃ URL                                                               ┃ Auth Token ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ xxxxxxxx │ My-DGXC-Lepton-WS    │ https://gateway.dgxc-lepton.nvidia.com/api/v2/workspaces/xxxxxxxx │ nv****XX   │
└──────────┴──────────────────────┴───────────────────────────────────────────────────────────────────┴────────────┘
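The same check can be scripted, for example as a pre-flight step before running SDK code. The following is a minimal sketch. The helper name check_lep_cli is hypothetical, and it assumes that lep workspace list exits with a nonzero status when the CLI is not authenticated, which this document does not confirm.

```python
import shutil
import subprocess


def check_lep_cli(executable: str = "lep") -> bool:
    """Return True if `<executable> workspace list` runs and exits with status 0."""
    # The CLI is not installed or is not on PATH.
    if shutil.which(executable) is None:
        return False
    # Assumption: a zero exit status indicates a working, authenticated CLI.
    result = subprocess.run(
        [executable, "workspace", "list"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

Calling check_lep_cli() with no arguments checks the lep command on the current PATH. If it returns False, rerun the installation and authentication steps above.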
Basic Python SDK flow
Nearly all workflows using the Python SDK follow the same basic flow:
- Initialize a client
- Define the task to perform
- Execute the task
The following sections break down these steps and provide a complete example.
Initialize a client
To initialize the client, import APIClient from the Lepton API module and instantiate it:
from leptonai.api.v2.client import APIClient
client = APIClient()
The client variable can be reused for multiple tasks.
Define the task to perform
Most tasks available to users on DGX Cloud Lepton are supported via the SDK. The following API resources are accessible:
- Batch Jobs
- Endpoints
- Events
- Health Checks
- Ingress
- Logs
- Monitoring
- Node Groups
- Queue
- Readiness
- Replicas
- Secrets
- Templates
- Workspaces
Each of these resources expects a specific type for its API request. For example, the Batch Jobs API expects a job to be submitted as a leptonai.api.v1.types.job.LeptonJob object. Similarly, Endpoints (known as "Deployments" in the SDK) expect a leptonai.api.v1.types.deployment.LeptonDeployment object. The list of API specs can be found here. Open the file for the specific task you need and review its specification.
For a batch job, you need a LeptonJob object with a LeptonJobUserSpec. Review the LeptonJobUserSpec in the Python script for the list of settings which are required for launching a job. The following is a quick example of defining a batch job spec (this expands upon the previous code which instantiated the client):
# Import the batch job API specifications
from leptonai.api.v1.types.affinity import LeptonResourceAffinity
from leptonai.api.v1.types.common import Metadata
from leptonai.api.v1.types.deployment import LeptonContainer
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec
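# Look up the node group to run on by name.
# Note: despite its name, node_group_id holds the full node group object;
# the ID itself is read from node_group_id.metadata.id_ further below.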
node_groups = client.nodegroup.list_all()
node_group_map = {ng.metadata.name: ng for ng in node_groups}
# Replace "my-dgxc-lepton-node-group" with the name of your node group
node_group_id = node_group_map["my-dgxc-lepton-node-group"]
# Get a list of all node IDs available in the node group
valid_node_ids = set()
node_ids = client.nodegroup.list_nodes(node_group_id)
for node in node_ids:
    valid_node_ids.add(node.metadata.id_)
job_spec = LeptonJobUserSpec(
    resource_shape="my-resource-shape",  # Specify your resource shape here
    affinity=LeptonResourceAffinity(
        allowed_dedicated_node_groups=[node_group_id.metadata.id_],
        allowed_nodes_in_node_group=valid_node_ids,
    ),
    container=LeptonContainer(
        image="my-container-image:tag",  # Specify the container here
        command=["my", "command", "to", "run"],  # Specify the container command here
    ),
    completions=1,
    parallelism=1,  # Specify the number of workers here
)
job = LeptonJob(
    spec=job_spec,
    metadata=Metadata(id="my-job-name")  # Specify the job name here
)
The example above does the following:
- Imports all required modules
- Looks up the specified node group by name; update the node group name for your environment
- Collects the IDs of all nodes in the node group, which determine where the job can be scheduled
- Defines the job spec, including the resource shape, container, command, and number of workers
- Defines the job by passing the job spec and giving it a name
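The node group lookup in the example raises a bare KeyError if the name is misspelled. The sketch below wraps the same two calls used above, client.nodegroup.list_all() and client.nodegroup.list_nodes(), in helpers that report the available names instead. The helper names find_node_group and list_node_ids are hypothetical and are not part of the SDK.

```python
def find_node_group(client, name):
    """Return the node group whose metadata.name equals `name`."""
    groups = {ng.metadata.name: ng for ng in client.nodegroup.list_all()}
    if name not in groups:
        # List the valid names so a typo is easy to spot.
        available = ", ".join(sorted(groups)) or "<none>"
        raise KeyError(f"node group {name!r} not found; available: {available}")
    return groups[name]


def list_node_ids(client, node_group):
    """Return the sorted IDs of all nodes in `node_group`."""
    nodes = client.nodegroup.list_nodes(node_group)
    return sorted(node.metadata.id_ for node in nodes)
```

With these helpers, the lookup becomes find_node_group(client, "my-dgxc-lepton-node-group"), and the result of list_node_ids can be used in place of the set of node IDs built in the example.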
Execute the task
After the job has been defined in the previous step, it can be launched using the client. To launch a job, call the create function of the client's job API:
launched_job = client.job.create(job)
This adds the job to the queue and schedules it when resources become available. The job should appear in the UI after the create function runs.
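Because a submitted job may wait in the queue before it runs, scripts often poll until the job reaches a terminal state. This document does not show how to read a job's state through the SDK, so the sketch below is deliberately generic: the caller supplies a get_state callable that returns the current state. The helper name wait_for_state and its parameters are hypothetical.

```python
import time


def wait_for_state(get_state, done_states, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call get_state() until it returns a value in done_states or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in done_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out; last observed state: {state!r}")
        # Wait before polling again to avoid hammering the API.
        sleep(interval_s)
```

The get_state callable would wrap whatever SDK call returns the job's status. Check the API specs for the available call and the actual state names before relying on this pattern.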
Example job submission via SDK
The following is a self-contained example of launching a batch job using the Python SDK following the flow outlined earlier.
from leptonai.api.v2.client import APIClient
from leptonai.api.v1.types.affinity import LeptonResourceAffinity
from leptonai.api.v1.types.common import Metadata
from leptonai.api.v1.types.deployment import LeptonContainer
from leptonai.api.v1.types.job import LeptonJob, LeptonJobUserSpec
client = APIClient()
node_groups = client.nodegroup.list_all()
node_group_map = {ng.metadata.name: ng for ng in node_groups}
# Replace "my-dgxc-lepton-node-group" with the name of your node group
node_group_id = node_group_map["my-dgxc-lepton-node-group"]
valid_node_ids = set()
node_ids = client.nodegroup.list_nodes(node_group_id)
for node in node_ids:
    valid_node_ids.add(node.metadata.id_)
job_spec = LeptonJobUserSpec(
    resource_shape="my-resource-shape",  # Specify your resource shape here
    affinity=LeptonResourceAffinity(
        allowed_dedicated_node_groups=[node_group_id.metadata.id_],
        allowed_nodes_in_node_group=valid_node_ids,
    ),
    container=LeptonContainer(
        image="nvcr.io/nvidia/pytorch:25.06-py3",  # Specify the container here
        command=["echo", "hello world!"],  # Specify the container command here
    ),
    completions=1,
    parallelism=1,
)
job = LeptonJob(
    spec=job_spec,
    metadata=Metadata(id="test-python-sdk")
)
launched_job = client.job.create(job)
Save the script above to a file such as run.py and launch it with:
python3 run.py
Once the script completes, the launched job should be viewable in the UI.
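To reuse run.py across node groups and jobs without editing the file, the hard-coded values can be read from the command line. The following sketch uses only the standard library argparse module. The function name parse_args and the flag names are hypothetical choices, and the defaults mirror the values used in the example script.

```python
import argparse


def parse_args(argv=None):
    """Parse the values that the example script hard-codes."""
    parser = argparse.ArgumentParser(description="Submit a batch job with the Python SDK")
    parser.add_argument("--node-group", required=True, help="Name of the node group to run on")
    parser.add_argument("--resource-shape", required=True, help="Resource shape for the job")
    parser.add_argument("--job-name", default="test-python-sdk", help="Name of the job")
    parser.add_argument("--image", default="nvcr.io/nvidia/pytorch:25.06-py3", help="Container image")
    parser.add_argument("--workers", type=int, default=1, help="Number of workers")
    return parser.parse_args(argv)
```

The parsed values can then replace the literals in the script, for example args.node_group for the node group name, args.job_name for the job name, and args.workers for parallelism.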