Configuration and API

Configuration

Topograph accepts its configuration file path using the -c command-line parameter. The configuration file is a YAML document. A sample configuration file is located at config/topograph-config.yaml.

The configuration file supports the following parameters:

# serving topograph endpoint
http:
  # port: specifies the port on which the API server will listen (required).
  port: 49021
  # ssl: enables HTTPS protocol if set to `true` (optional).
  ssl: false

# provider: the provider that topograph will use (optional)
# Valid options include "aws", "oci", "gcp", "nebius", "netq", "dra", "infiniband-k8s", "infiniband-bm" or "test".
# Can be overridden if the provider is specified in a topology request to topograph
provider: test

# engine: the engine that topograph will use (optional)
# Valid options include "slurm", "k8s", or "slinky".
# Can be overridden if the engine is specified in a topology request to topograph
engine: slurm

# requestAggregationDelay: defines the delay before processing a request (required).
# Topograph aggregates multiple sequential requests within this delay into a single request,
# processing only if no new requests arrive during the specified duration.
requestAggregationDelay: 15s

# pageSize: sets the page size for topology requests against a CSP API (optional).
pageSize: 100

# ssl: specifies the paths to the TLS certificate, private key,
# and CA certificate (required if `http.ssl=true`).
ssl:
  cert: /etc/topograph/ssl/server-cert.pem
  key: /etc/topograph/ssl/server-key.pem
  ca_cert: /etc/topograph/ssl/ca-cert.pem

# credentialsPath: specifies the path to a YAML file containing API credentials (optional).
# When using credentials in Kubernetes-based engines ("k8s" or "slinky"),
# the secret file must be named `credentials.yaml`. For example:
# `kubectl create secret generic <secret-name> --from-file=credentials.yaml=<path to credentials>`
# For more details about credential configuration, refer to the docs/providers section.
# credentialsPath:

# env: environment variable names and values to inject into Topograph's shell (optional).
# The `PATH` variable, if provided, will append the specified value to the existing `PATH`.
# env:
#   SLURM_CONF: /etc/slurm/slurm.conf
#   PATH:
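
The requestAggregationDelay comment describes what is often called debounce behavior: each new request restarts the wait, and processing happens only after a quiet period. The sketch below illustrates one reading of that comment using plain arrival timestamps. It is not Topograph's implementation; the function name `aggregate_requests` and the use of seconds as the unit are assumptions made for illustration.

```python
from typing import List


def aggregate_requests(arrivals: List[float], delay: float) -> List[List[float]]:
    """Group request arrival times (in seconds) into batches.

    A batch closes once `delay` seconds pass with no new arrival, which is
    one reading of the requestAggregationDelay comment. Hypothetical helper,
    not part of Topograph.
    """
    batches: List[List[float]] = []
    for t in sorted(arrivals):
        # A request arriving before the quiet period expires joins the open
        # batch and restarts the timer; otherwise it opens a new batch.
        if batches and t - batches[-1][-1] < delay:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches


if __name__ == "__main__":
    # With a 15s delay, requests at 0s, 5s, and 12s collapse into one batch;
    # the request at 40s arrives after the quiet period and starts a new one.
    for batch in aggregate_requests([0, 5, 12, 40], delay=15):
        print(f"batch {batch} would be processed at t={batch[-1] + delay}s")
```

Under this reading, a steady stream of requests spaced closer than the delay keeps postponing processing, so a shorter delay trades fewer aggregated requests for faster results.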

API

Topograph exposes three endpoints for interacting with the service. Below are the details of each endpoint:

1. Health Endpoint

  • URL: GET http://<server>:<port>/healthz
  • Description: This endpoint verifies the service status. It returns a “200 OK” HTTP response if the service is reachable.

2. Topology Request Endpoint

  • URL: POST http://<server>:<port>/v1/generate

  • Description: This endpoint is used to request a new cluster topology.

  • Payload: The request body is a JSON object organized into three top-level sections:

    • provider: (optional) Selects the topology source and provides any provider-specific authentication or parameters.
      • name: (optional) A string specifying the Service Provider, such as aws, oci, gcp, nebius, netq, dra, infiniband-k8s, infiniband-bm or test. This parameter will override the provider set in the topograph config.
      • creds: (optional) A key-value map with provider-specific parameters for authentication.
      • params: (optional) A key-value map with provider-specific parameters. The test provider uses these parameters for response simulation; for complete behavior and examples, see Test Mode and Test Provider.
    • engine: (optional) Selects the topology output and provides any engine-specific parameters.
      • name: (optional) A string specifying the topology output, either slurm, k8s, or slinky. This parameter will override the engine set in the topograph config.
      • params: (optional) A key-value map with engine-specific parameters.
        • plugin: (optional) Used in: [slurm, slinky]. A string specifying the cluster-wide topology plugin: topology/tree or topology/block. For slurm, this defaults to topology/tree when neither plugin nor topologies is set. Do not set plugin together with topologies.
        • blockSizes: (optional) Used in: [slurm, slinky]. An array of block sizes for topology/block.
        • topologyConfigPath: Used in: [slurm, slinky]. Optional for slurm; required for slinky. For slurm, a file path for the topology configuration; if omitted, the topology config content is returned in the HTTP response. For slinky, the key for the topology config in the ConfigMap.
        • topologies: (optional) Used in: [slurm, slinky]. A map of named per-partition topology settings. Do not set top-level plugin together with topologies.
          • plugin: Used in: [slurm, slinky]. A required string specifying the per-partition topology plugin: topology/tree, topology/block, or topology/flat.
          • blockSizes: (optional) Used in: [slurm, slinky]. An array of block sizes for topology/block.
          • nodes: (optional) Used in: [slurm, slinky]. An explicit list of SLURM nodes for this topology. If omitted, Topograph can discover membership from podSelector (slinky only) or partition.
          • partition: (optional) Used in: [slurm, slinky]. A SLURM partition name used to discover nodes with scontrol show partition when nodes is not set. For slinky, this fallback is used only when the topology entry does not set podSelector.
          • podSelector: (optional) Used in: [slinky]. A Kubernetes label selector for slurmd pods in this partition. nodes and podSelector are mutually exclusive on the same topology entry.
          • clusterDefault: (optional) Used in: [slurm, slinky]. If true, marks this topology as the default for nodes not assigned to another topology; commonly used with plugin: topology/flat.
        • reconfigure: (optional) Used in: [slurm]. If true, invoke scontrol reconfigure after topology config is generated. Default false.
        • namespace: Used in: [slinky]. The required namespace where the SLURM cluster is running.
        • podSelector: Used in: [slinky]. A required Kubernetes label selector for pods running SLURM nodes.
        • nodeSelector: (optional) Used in: [k8s, slinky]. A Kubernetes node label map that filters which nodes participate in topology generation.
        • topologyConfigmapName: Used in: [slinky]. The required name of the ConfigMap containing the topology config.
        • useDynamicNodes: (optional) Used in: [slinky]. If true, Kubernetes nodes matched by nodeSelector will be annotated with the topology spec.
        • configUpdateMode: (optional) Used in: [slinky]. By default, the full topology YAML is written to the Slurm ConfigMap. Set skeleton-only to write only switches or blocks (no node lines), or none to skip updating the topology key in the ConfigMap.
    • nodes: (optional) Supplies the cluster nodes used for topology generation as an array of regions mapping instance IDs to node names.

    Example:

{
  "provider": {
    "name": "aws",
    "creds": {
      "accessKeyId": "id",
      "secretAccessKey": "secret"
    }
  },
  "engine": {
    "name": "slurm",
    "params": {
      "plugin": "topology/block",
      "blockSizes": [30, 120]
    }
  },
  "nodes": [
    {
      "region": "region1",
      "instances": {
        "instance1": "node1",
        "instance2": "node2",
        "instance3": "node3"
      }
    },
    {
      "region": "region2",
      "instances": {
        "instance4": "node4",
        "instance5": "node5",
        "instance6": "node6"
      }
    }
  ]
}
  • Response: This endpoint immediately returns a “202 Accepted” status with a unique request ID if the request is valid. If not, it returns an appropriate error code.
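
Several engine parameters constrain one another: plugin must not be combined with topologies, each topologies entry requires its own plugin, and nodes and podSelector are mutually exclusive within an entry. The following is a minimal client-side sketch of those rules as stated above. The helper `check_engine_params`, the constant names, and the partition name `gpu` are hypothetical; the server remains the authority on what counts as a valid request.

```python
from typing import Any, Dict, List

# Plugin names taken from the parameter descriptions above.
CLUSTER_PLUGINS = {"topology/tree", "topology/block"}
PARTITION_PLUGINS = CLUSTER_PLUGINS | {"topology/flat"}


def check_engine_params(params: Dict[str, Any]) -> List[str]:
    """Return rule violations found in engine.params (empty list if none).

    Hypothetical pre-flight check, not part of Topograph.
    """
    errors: List[str] = []
    topologies = params.get("topologies") or {}

    if "plugin" in params and topologies:
        errors.append("plugin and topologies must not be set together")
    if "plugin" in params and params["plugin"] not in CLUSTER_PLUGINS:
        errors.append(f"unsupported cluster-wide plugin: {params['plugin']}")

    for name, topo in topologies.items():
        if topo.get("plugin") not in PARTITION_PLUGINS:
            errors.append(f"{name}: plugin is required and must be one of "
                          f"{sorted(PARTITION_PLUGINS)}")
        if "nodes" in topo and "podSelector" in topo:
            errors.append(f"{name}: nodes and podSelector are mutually exclusive")
    return errors


# Example engine.params with per-partition topologies. The partition name
# "gpu" is an assumption made for illustration.
EXAMPLE_PARAMS = {
    "topologies": {
        "gpu": {"plugin": "topology/block", "blockSizes": [30, 120], "partition": "gpu"},
        "default": {"plugin": "topology/flat", "clusterDefault": True},
    }
}

if __name__ == "__main__":
    print(check_engine_params(EXAMPLE_PARAMS) or "no violations found")
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation before submitting the request.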

3. Topology Result Endpoint

  • URL: GET http://<server>:<port>/v1/topology
  • Description: This endpoint retrieves the result of a topology request.
  • URL Query Parameters:
    • uid: Specifies the request ID returned by the topology request endpoint.
  • Response: Depending on the request’s execution stage, this endpoint can return:
    • “200 OK” - The request has completed successfully.
    • “202 Accepted” - The request is still in progress and has not completed yet.
    • “404 Not Found” - The specified request ID does not exist.
    • Other error responses encountered by Topograph during request execution.

Example usage:

id=$(curl -s -X POST -H "Content-Type: application/json" -d @payload.json http://localhost:49021/v1/generate)

curl -s "http://localhost:49021/v1/topology?uid=$id"
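
Because the result endpoint returns “202 Accepted” while a request is still in progress, a client usually needs to poll until it sees “200 OK”. The sketch below shows one way to do that with Python's standard library. The function names, the 5-second interval, and the attempt limit are assumptions, and the request ID is assumed to be the body returned by the generate endpoint, as the curl example implies.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# A fetch function takes a URL and returns (HTTP status code, response body).
Fetch = Callable[[str], Tuple[int, str]]


def http_get(url: str) -> Tuple[int, str]:
    """Perform a GET request and return (status, body), including error statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; convert them back into a status.
        return err.code, err.read().decode()


def wait_for_topology(base_url: str, uid: str, fetch: Fetch = http_get,
                      interval: float = 5.0, attempts: int = 60) -> str:
    """Poll /v1/topology until the request identified by uid completes.

    Hypothetical client helper; interval and attempts are illustrative defaults.
    """
    url = f"{base_url}/v1/topology?uid={urllib.parse.quote(uid)}"
    for _ in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body  # request completed successfully
        if status == 404:
            raise LookupError(f"unknown request ID: {uid}")
        if status != 202:
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        time.sleep(interval)  # 202: still in progress, try again later
    raise TimeoutError(f"request {uid} still in progress after {attempts} attempts")
```

A call such as `wait_for_topology("http://localhost:49021", uid)` would then return the response body of the completed request. Passing `fetch` as a parameter keeps the polling logic separate from the HTTP transport, so it can be exercised without a running server.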