Export Settings

DGX Cloud Lepton provides a way to export your logs and metrics data to your own storage or analytics platform.

Note

Only workspace admins can configure export settings.

How to Set Up

1. Log in to the Lepton Dashboard.
2. Navigate to the Observability settings page.
3. The page shows two export configurations:
   - Logs Export: Exports logs for all workloads created in this workspace. Logs are streamed in real time to the specified external destination.
   - Metrics Export: Collects and exports the custom metrics that developers instrument in their applications, for all workloads created in this workspace.

Logs Configuration

Destination Type

Currently, we only support exporting logs to Datadog.

Endpoint URL

The endpoint URL is the URL of your Datadog instance. To find the endpoint URL for your account, refer to the Datadog documentation.

Secret Key

Select the secret that holds the key for your Datadog instance. You can add the secret directly here or under Settings -> Secrets in the Lepton dashboard; refer to this guide for more details.

For details on the Datadog secret key, refer to the official Datadog documentation.
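Before saving the key as a Lepton secret, you may want to confirm that Datadog accepts it. The sketch below, which uses only the Python standard library, calls Datadog's API key validation endpoint (`GET /api/v1/validate` with the key in the `DD-API-KEY` header). It assumes your account is on the default `datadoghq.com` site; for another site, pass its domain (for example `datadoghq.eu`). The helper names are hypothetical and are not part of Lepton.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a request for Datadog's API key validation endpoint (hypothetical helper)."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def datadog_key_is_valid(api_key: str, site: str = "datadoghq.com", timeout: float = 10.0) -> bool:
    """Return True if Datadog reports the key as valid, False if it rejects the key."""
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.load(response)
    except urllib.error.HTTPError as err:
        # Datadog typically answers an invalid key with HTTP 403; re-raise anything else.
        if err.code == 403:
            return False
        raise
    # A successful response body looks like {"valid": true}.
    return bool(body.get("valid", False))
```

For example, `datadog_key_is_valid(os.environ["DD_API_KEY"])` (assuming you keep the key in a `DD_API_KEY` environment variable) should return `True` for a working key. Because the same kind of key is used for both logs and metrics export, one check covers both configurations.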

Metrics Configuration

Destination Type

Currently, we only support exporting metrics to Datadog.

Endpoint URL

The endpoint URL is the URL of your Datadog instance. To find the endpoint URL for your account, refer to the Datadog documentation.

Secret Key

Select the secret that holds the key for your Datadog instance. You can add the secret directly here or under Settings -> Secrets in the Lepton dashboard; refer to this guide for more details.

For details on the Datadog secret key, refer to the official Datadog documentation.

Enable Metrics Export When Creating Workload

To collect and export metrics from a workload, enable metrics export when creating the workload and specify the export port and path.

You can specify the export port and path under Advanced Configuration when creating the workload.

The Lepton platform expects the workload's application to expose an HTTP endpoint that serves metrics in the Prometheus format on the specified port and path. Most applications do this with an official Prometheus client library.
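To show what the Prometheus text format looks like on the wire, the following sketch renders a single gauge by hand with only the standard library. It is illustrative only: it covers just the `# HELP` and `# TYPE` header lines plus one sample with optional labels, and it does not escape label values. In a real application, use the Prometheus client library, which handles escaping and the other metric types. The function `render_gauge` is hypothetical.

```python
from typing import Dict, Optional


def render_gauge(name: str, help_text: str, value: float,
                 labels: Optional[Dict[str, str]] = None) -> str:
    """Render one gauge in the Prometheus text exposition format (simplified sketch)."""
    label_part = ""
    if labels:
        # Labels are written as {key="value",...}, here sorted by key for stable output.
        # Real clients also escape backslashes, quotes, and newlines in values.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_part = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_part} {value}\n"
    )
```

Calling `render_gauge("http_response_time_seconds", "HTTP response time in seconds", 0.25)` yields the same three-line shape that the example application in this section serves for its gauge.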

For a Python application, the HTTP handler might look like the following example. Here, the export port and path should be configured as 8000 and /metrics.

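import random
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

from prometheus_client import CONTENT_TYPE_LATEST, Gauge, generate_latest

# Gauge holding the most recent simulated response time.
response_time_gauge = Gauge('http_response_time_seconds', 'HTTP response time in seconds')


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            # Serialize every metric in the default registry in the Prometheus text format.
            metrics_data = generate_latest()
            self.send_response(200)
            self.send_header('Content-Type', CONTENT_TYPE_LATEST)
            self.end_headers()
            self.wfile.write(metrics_data)
        else:
            self.send_response(404)
            self.end_headers()
            self.wfile.write(b"404 Not Found")


def process_request():
    # Simulate handling a request and record how long it took.
    response_time = random.uniform(0.1, 0.5)
    response_time_gauge.set(response_time)
    time.sleep(response_time)


if __name__ == '__main__':
    # Serve /metrics on port 8000 to match the export port and path configured for the workload.
    server = HTTPServer(('0.0.0.0', 8000), MetricsHandler)
    # Run the HTTP server in a background thread so scrapes never block metric updates.
    threading.Thread(target=server.serve_forever, daemon=True).start()

    while True:
        process_request()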

The endpoint's output includes lines like the following:

# HELP http_response_time_seconds HTTP response time in seconds
# TYPE http_response_time_seconds gauge
http_response_time_seconds 0.4385008384432796

Here, http_response_time_seconds is the gauge defined by the example application.
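Before enabling metrics export, it can help to confirm that the configured port and path actually serve parseable metrics. The sketch below scrapes the endpoint once (assuming the example application is running locally on port 8000) and maps each sample to its value, skipping `# HELP` and `# TYPE` comment lines. It is a simplified parser that assumes samples carry no trailing timestamp; it is not a full implementation of the exposition format, and the helper names are hypothetical.

```python
import urllib.request
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Map each sample (metric name plus any label set) to its value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comments.
        if not line or line.startswith("#"):
            continue
        # Assumes no trailing timestamp, so the value is the last space-separated field.
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:8000/metrics", timeout: float = 5.0) -> Dict[str, float]:
    """Scrape the endpoint once and return its parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))
```

While the example application is running, `"http_response_time_seconds" in fetch_metrics()` should be `True`. The client library also exports its default process and runtime metrics, so expect additional keys.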

For details on the Prometheus Python client library, refer to its official documentation.

Labels of Logs and Metrics

The following table lists the Lepton-specific labels attached to exported logs and metrics. These labels are available in the export destination (e.g., Datadog) and can be used to filter logs and metrics.

Label	Description
`lepton_job_name`	Job name; present only on metrics from Lepton jobs
`lepton_deployment_name`	Deployment name; present only on metrics from Lepton endpoints and dev pods
`lepton_replica_id`	Replica ID
`lepton_workspace`	Lepton workspace ID
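If you build filters or dashboards on these labels, it can help to encode which labels to expect on metrics from each workload type. The sketch below does that directly from the table. The workload type strings (`"job"`, `"endpoint"`, `"dev_pod"`) and the helper names are hypothetical identifiers, not values defined by Lepton.

```python
from typing import Dict, FrozenSet, Mapping

# Labels present on metrics from every workload, per the table.
COMMON_LABELS: FrozenSet[str] = frozenset({"lepton_replica_id", "lepton_workspace"})

# Workload-specific labels; the keys are hypothetical workload type identifiers.
SPECIFIC_LABELS: Dict[str, FrozenSet[str]] = {
    "job": frozenset({"lepton_job_name"}),
    "endpoint": frozenset({"lepton_deployment_name"}),
    "dev_pod": frozenset({"lepton_deployment_name"}),
}


def expected_lepton_labels(workload_type: str) -> FrozenSet[str]:
    """Return the Lepton labels expected on metrics from the given workload type."""
    if workload_type not in SPECIFIC_LABELS:
        raise ValueError(f"unknown workload type: {workload_type!r}")
    return COMMON_LABELS | SPECIFIC_LABELS[workload_type]


def missing_labels(workload_type: str, labels: Mapping[str, str]) -> FrozenSet[str]:
    """Return expected Lepton labels that are absent from an exported sample's labels."""
    return expected_lepton_labels(workload_type) - frozenset(labels)
```

For instance, a dev pod sample that carries only `lepton_workspace` and `lepton_replica_id` would be reported as missing `lepton_deployment_name`.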
