Install on Slurm (bare metal)

Install Topograph on a Slurm head node so it can generate topology configuration (topology.conf or per-partition topology.yaml) for the Slurm controller to consume.

Prerequisites

  • Slurm cluster with a head node you can install system packages on
  • Go and make, to build the package from source (see go.mod for the required Go version), or a pre-built Debian/RPM package if your organization distributes one
  • A supported provider for your environment — see the provider documentation for per-provider setup

Install

Clone the repo and build a native package for your distribution:

$ git clone https://github.com/NVIDIA/topograph.git
$ cd topograph

$ make deb   # Debian / Ubuntu: produces a .deb under dist/
# or
$ make rpm   # RHEL / Rocky / SUSE: produces an .rpm under dist/

Install the resulting package:

$ sudo dpkg -i dist/topograph_*.deb    # Debian / Ubuntu
# or
$ sudo rpm -ivh dist/topograph-*.rpm   # RHEL / Rocky / SUSE

The package installs the service but does not start it. Edit /etc/topograph/topograph-config.yaml to set at minimum:

http:
  port: 49021
provider: <provider>  # aws, gcp, oci, nebius, netq, infiniband-bm, ...
engine: slurm
requestAggregationDelay: 15s
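Before starting the service, it can help to confirm that the edited file still contains the minimum settings listed above. The following is a minimal sketch, not part of Topograph: the function name check_topograph_config is hypothetical, and the check is a plain-text grep rather than a real YAML parse, so it only catches missing keys, not invalid values.

```shell
#!/usr/bin/env bash
# Sketch: confirm the minimum settings are present in the config file.
# Plain-text matching only; this does not validate YAML syntax or values.
# The function name is a hypothetical helper, not a Topograph command.

check_topograph_config() {
  local file="$1" key missing=0

  # Top-level keys must start at column 0.
  for key in http provider engine requestAggregationDelay; do
    if ! grep -Eq "^${key}:" "$file"; then
      echo "missing key: ${key}" >&2
      missing=1
    fi
  done

  # The port is nested under http, so it must be indented and numeric.
  if ! grep -Eq '^[[:space:]]+port:[[:space:]]*[0-9]+' "$file"; then
    echo "missing key: http.port" >&2
    missing=1
  fi

  return "$missing"
}
```

For example, check_topograph_config /etc/topograph/topograph-config.yaml returns 0 when every key is found and 1 otherwise, printing each missing key to stderr.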

Then enable and start the service:

$ sudo systemctl enable --now topograph.service

Verify

Check that the service is running and the API is reachable:

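$ curl -i http://localhost:49021/healthz

An HTTP 200 status means the API server is up. The -i flag makes curl print the response status line and headers, so the status code is visible.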
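The service may need a moment after startup before the endpoint responds, so a single request can fail spuriously. The sketch below polls the health endpoint until it returns HTTP 200 or the attempts run out. The function name wait_for_health and the default retry count and delay are illustrative assumptions; only the URL and port come from the configuration shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: poll a health endpoint until it returns HTTP 200.
# wait_for_health is a hypothetical helper; the retry count and delay
# defaults are arbitrary choices, not Topograph recommendations.

wait_for_health() {
  local url="$1" retries="${2:-10}" delay="${3:-2}"
  local attempt=1 code=""

  while [ "$attempt" -le "$retries" ]; do
    # -s hides progress output, -o discards the body,
    # -w prints only the HTTP status code ("000" if the connection fails).
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done

  echo "not healthy after ${retries} attempt(s); last status: ${code:-none}" >&2
  return 1
}
```

A typical call would be wait_for_health http://localhost:49021/healthz 10 2. If it fails, systemctl status topograph.service and journalctl -u topograph.service are the usual places to look for the cause.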

Where to go next