# nemo_curator.metrics.utils

## Module Contents

### Functions

| Name                                                                                                                             | Description                                                                            |
| -------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| [`_get_all_discovery_paths`](#nemo_curator-metrics-utils-_get_all_discovery_paths)                                               | Extract all file paths from all file\_sd\_configs entries in the Prometheus config.    |
| [`_is_process_running_from_pidfile`](#nemo_curator-metrics-utils-_is_process_running_from_pidfile)                               | Check if a process is running by reading its PID from a file and verifying it's alive. |
| [`_resolve_metrics_dir`](#nemo_curator-metrics-utils-_resolve_metrics_dir)                                                       | Resolve the metrics directory, defaulting to DEFAULT\_NEMO\_CURATOR\_METRICS\_PATH.    |
| [`_write_ray_default_dashboards`](#nemo_curator-metrics-utils-_write_ray_default_dashboards)                                     | Generate and write Ray's default Grafana dashboards to the dashboards directory.       |
| [`add_ray_prometheus_metrics_service_discovery`](#nemo_curator-metrics-utils-add_ray_prometheus_metrics_service_discovery)       | Add the Ray Prometheus metrics service discovery to the Prometheus config.             |
| [`download_and_extract_prometheus`](#nemo_curator-metrics-utils-download_and_extract_prometheus)                                 | Download the Prometheus tarball and extract it to the metrics directory.               |
| [`download_grafana`](#nemo_curator-metrics-utils-download_grafana)                                                               | Download the Grafana tarball and extract it to the metrics directory.                  |
| [`get_prometheus_port`](#nemo_curator-metrics-utils-get_prometheus_port)                                                         | Get the port number that Prometheus is running on by reading the port file.            |
| [`is_grafana_running`](#nemo_curator-metrics-utils-is_grafana_running)                                                           | Check if Grafana is currently running for this metrics instance.                       |
| [`is_prometheus_running`](#nemo_curator-metrics-utils-is_prometheus_running)                                                     | Check if Prometheus is currently running for this metrics instance.                    |
| [`launch_grafana`](#nemo_curator-metrics-utils-launch_grafana)                                                                   | Launch the Grafana server.                                                             |
| [`remove_ray_prometheus_metrics_service_discovery`](#nemo_curator-metrics-utils-remove_ray_prometheus_metrics_service_discovery) | Remove the Ray Prometheus metrics service discovery from the Prometheus config.        |
| [`run_prometheus`](#nemo_curator-metrics-utils-run_prometheus)                                                                   | Run the Prometheus server.                                                             |
| [`write_grafana_configs`](#nemo_curator-metrics-utils-write_grafana_configs)                                                     | Write the Grafana configs to the Grafana directory.                                    |

### API

```python
nemo_curator.metrics.utils._get_all_discovery_paths(
    prometheus_config: dict
) -> list[str]
```

Extract every file path listed under the `file_sd_configs` entries of the Prometheus config.
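
To make the shape of the input concrete, here is a minimal sketch of this kind of extraction. It assumes the dict mirrors a parsed `prometheus.yml`, where each item in `scrape_configs` may carry a `file_sd_configs` list whose entries have a `files` list. The function name `collect_discovery_paths` and the sample config are hypothetical; this is not the library's implementation.

```python
def collect_discovery_paths(prometheus_config: dict) -> list[str]:
    """Return every path found under any `file_sd_configs` entry (illustrative only)."""
    paths: list[str] = []
    # `or []` tolerates keys that are present but set to None.
    for scrape_config in prometheus_config.get("scrape_configs") or []:
        for sd_config in scrape_config.get("file_sd_configs") or []:
            paths.extend(sd_config.get("files") or [])
    return paths


# Hypothetical config: one job uses file-based discovery, the other does not.
example_config = {
    "scrape_configs": [
        {
            "job_name": "ray",
            "file_sd_configs": [
                {"files": ["/tmp/ray/prom_metrics_service_discovery.json"]}
            ],
        },
        {"job_name": "static", "static_configs": [{"targets": ["localhost:9090"]}]},
    ]
}
discovered = collect_discovery_paths(example_config)
```

Jobs without `file_sd_configs` contribute nothing, so the result contains only the paths that Prometheus would watch for target files.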

```python
nemo_curator.metrics.utils._is_process_running_from_pidfile(
    pid_file_path: str
) -> bool
```

Check if a process is running by reading its PID from a file and verifying it's alive.
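
A common way to perform this kind of check on POSIX systems is to send signal `0` to the PID, which tests for existence without affecting the process. The sketch below illustrates that technique under the assumption that the PID file holds a single integer; the helper name `pid_from_file_is_alive` is hypothetical, and the library may implement the check differently.

```python
import os
import tempfile


def pid_from_file_is_alive(pid_file_path: str) -> bool:
    """Illustrative POSIX-only liveness check based on a PID file."""
    if os.name != "posix":
        # On Windows, os.kill with signal 0 terminates the target, so refuse to run.
        raise NotImplementedError("signal-0 probing is POSIX-only")
    try:
        with open(pid_file_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed PID file: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


# Demonstration: a PID file that points at the current process.
with tempfile.TemporaryDirectory() as tmp:
    pid_file = os.path.join(tmp, "example.pid")
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))
    own_process_alive = pid_from_file_is_alive(pid_file)
    missing_file_alive = pid_from_file_is_alive(os.path.join(tmp, "missing.pid"))
```

A stale PID file can point at a PID that the operating system has since reused, so a check of this kind confirms that some process with that PID exists, not necessarily the original one.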

```python
nemo_curator.metrics.utils._resolve_metrics_dir(
    metrics_dir: str | None
) -> str
```

Resolve the metrics directory, defaulting to DEFAULT\_NEMO\_CURATOR\_METRICS\_PATH.
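
The public functions in this module accept `metrics_dir: str | None = None`, so a resolver like this one lets them all fall back to the same location. A minimal sketch of the pattern follows; the constant value and the name `resolve_dir` are hypothetical stand-ins, since the actual value of `DEFAULT_NEMO_CURATOR_METRICS_PATH` is not shown here.

```python
import os
from typing import Optional

# Hypothetical placeholder for the real default path constant.
HYPOTHETICAL_DEFAULT_METRICS_PATH = os.path.join(
    os.path.expanduser("~"), ".example_metrics"
)


def resolve_dir(metrics_dir: Optional[str]) -> str:
    """Return the caller's directory, or the shared default when None is passed."""
    return metrics_dir if metrics_dir is not None else HYPOTHETICAL_DEFAULT_METRICS_PATH


explicit = resolve_dir("/data/metrics")
fallback = resolve_dir(None)
```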

```python
nemo_curator.metrics.utils._write_ray_default_dashboards(
    dashboards_path: str
) -> None
```

Generate and write Ray's default Grafana dashboards to the dashboards directory.

```python
nemo_curator.metrics.utils.add_ray_prometheus_metrics_service_discovery(
    ray_temp_dir: str,
    metrics_dir: str | None = None
) -> None
```

Add the Ray Prometheus metrics service discovery to the Prometheus config.
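
Ray writes a service discovery file named `prom_metrics_service_discovery.json` into its temp directory, and Prometheus can pick it up through a `file_sd_configs` entry. The sketch below shows one way such an entry could be added to an in-memory config without creating duplicates. The function name, the `job_name` default, and the in-memory approach are assumptions for illustration; the library function takes `ray_temp_dir` and `metrics_dir` and may work differently.

```python
import os

RAY_SD_FILENAME = "prom_metrics_service_discovery.json"


def add_discovery_file(prometheus_config: dict, ray_temp_dir: str, job_name: str = "ray") -> dict:
    """Add a file_sd_configs entry for Ray's discovery file (illustrative only)."""
    sd_path = os.path.join(ray_temp_dir, RAY_SD_FILENAME)
    scrape_configs = prometheus_config.setdefault("scrape_configs", [])

    # Reuse the existing job if there is one, otherwise create it.
    job = next((c for c in scrape_configs if c.get("job_name") == job_name), None)
    if job is None:
        job = {"job_name": job_name, "file_sd_configs": []}
        scrape_configs.append(job)

    sd_configs = job.setdefault("file_sd_configs", [])
    already_present = any(sd_path in sd.get("files", []) for sd in sd_configs)
    if not already_present:
        sd_configs.append({"files": [sd_path]})
    return prometheus_config


config = {"scrape_configs": []}
add_discovery_file(config, "/tmp/ray")
add_discovery_file(config, "/tmp/ray")  # second call leaves the config unchanged
```

Keeping the operation idempotent means several Ray clusters can register against the same Prometheus instance without accumulating duplicate entries.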

```python
nemo_curator.metrics.utils.download_and_extract_prometheus(
    metrics_dir: str | None = None,
    os_type = None,
    architecture = None,
    prometheus_version = None
) -> str
```

Download the Prometheus tarball and extract it to the metrics directory.
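
The `os_type`, `architecture`, and `prometheus_version` parameters suggest that the tarball is selected per platform. As a rough sketch of how such a selection could work, the snippet below builds a download URL following the naming pattern of official Prometheus release assets on GitHub. The function name, the fallback to `platform` detection, and the example version are assumptions; the library's defaults are not documented here.

```python
import platform
from typing import Optional


def prometheus_tarball_url(
    version: str,
    os_type: Optional[str] = None,
    architecture: Optional[str] = None,
) -> str:
    """Build a release URL such as .../v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz."""
    os_type = os_type or platform.system().lower()
    machine = (architecture or platform.machine()).lower()
    # Map common machine names to the names used in release assets.
    arch = {"x86_64": "amd64", "aarch64": "arm64"}.get(machine, machine)
    name = f"prometheus-{version}.{os_type}-{arch}.tar.gz"
    return f"https://github.com/prometheus/prometheus/releases/download/v{version}/{name}"


# Example values only; pick the version you actually need.
url = prometheus_tarball_url("2.45.0", os_type="linux", architecture="x86_64")
```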

```python
nemo_curator.metrics.utils.download_grafana(
    metrics_dir: str | None = None
) -> str
```

Download the Grafana tarball and extract it to the metrics directory.

```python
nemo_curator.metrics.utils.get_prometheus_port(
    metrics_dir: str | None = None
) -> int
```

Get the port number that Prometheus is running on by reading the port file.
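
Reading a port back from a file usually comes down to parsing an integer and validating its range. The following sketch shows that pattern; the file name `prometheus.port` and the helper `read_port_file` are hypothetical, as the actual port file name and location inside the metrics directory are not specified here.

```python
import os
import tempfile


def read_port_file(port_file_path: str) -> int:
    """Parse a TCP port from a text file and validate its range (illustrative only)."""
    with open(port_file_path) as f:
        port = int(f.read().strip())
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Demonstration with a temporary, hypothetical port file.
with tempfile.TemporaryDirectory() as tmp:
    port_file = os.path.join(tmp, "prometheus.port")
    with open(port_file, "w") as f:
        f.write("9090\n")
    port = read_port_file(port_file)
```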

```python
nemo_curator.metrics.utils.is_grafana_running(
    metrics_dir: str | None = None
) -> bool
```

Check if Grafana is currently running for this metrics instance.

```python
nemo_curator.metrics.utils.is_prometheus_running(
    metrics_dir: str | None = None
) -> bool
```

Check if Prometheus is currently running for this metrics instance.

```python
nemo_curator.metrics.utils.launch_grafana(
    grafana_dir: str,
    grafana_ini_path: str,
    grafana_web_port: int,
    metrics_dir: str | None = None
) -> None
```

Launch the Grafana server.

```python
nemo_curator.metrics.utils.remove_ray_prometheus_metrics_service_discovery(
    ray_temp_dir: str,
    metrics_dir: str | None = None
) -> None
```

Remove the Ray Prometheus metrics service discovery from the Prometheus config.
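
Removal is the inverse operation: drop the discovery path for one Ray temp directory while leaving other entries intact. A minimal sketch under the same assumptions about the config layout (`scrape_configs` items holding `file_sd_configs` entries with `files` lists) follows. The name `remove_discovery_file` and the sample paths are hypothetical, and the library function may behave differently.

```python
def remove_discovery_file(prometheus_config: dict, sd_path: str) -> dict:
    """Remove one discovery file path from every file_sd_configs entry (illustrative only)."""
    for job in prometheus_config.get("scrape_configs") or []:
        if "file_sd_configs" not in job:
            continue
        kept = []
        for sd_config in job["file_sd_configs"] or []:
            files = [f for f in sd_config.get("files", []) if f != sd_path]
            if files:
                # Keep the entry only if it still lists at least one file.
                kept.append({**sd_config, "files": files})
        job["file_sd_configs"] = kept
    return prometheus_config


# Hypothetical config tracking two Ray clusters.
config = {
    "scrape_configs": [
        {
            "job_name": "ray",
            "file_sd_configs": [
                {"files": ["/tmp/ray/a/prom_metrics_service_discovery.json"]},
                {"files": ["/tmp/ray/b/prom_metrics_service_discovery.json"]},
            ],
        }
    ]
}
remove_discovery_file(config, "/tmp/ray/a/prom_metrics_service_discovery.json")
```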

```python
nemo_curator.metrics.utils.run_prometheus(
    prometheus_dir: str,
    prometheus_web_port: int,
    metrics_dir: str | None = None
) -> None
```

Run the Prometheus server.

```python
nemo_curator.metrics.utils.write_grafana_configs(
    grafana_web_port: int,
    prometheus_web_port: int,
    metrics_dir: str | None = None
) -> str
```

Write the Grafana configs to the Grafana directory.