Configuration and API
Configuration
Topograph reads the path to its configuration file from the `-c` command-line parameter. The configuration file is a YAML document; a sample is located at `config/topograph-config.yaml`.
The configuration file supports the following parameters:
API
Topograph exposes three endpoints for interacting with the service. Below are the details of each endpoint:
1. Health Endpoint
- URL: GET http://<server>:<port>/healthz
- Description: This endpoint verifies the service status. It returns a “200 OK” HTTP response if the service is reachable.
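A health probe needs nothing beyond Python's standard library. The sketch below assumes only what the endpoint description states: a “200 OK” response means the service is up. The helper names `healthz_url` and `is_healthy` and the 2-second default timeout are hypothetical choices, and `server`/`port` are supplied by the caller.

```python
import urllib.error
import urllib.request


def healthz_url(server: str, port: int) -> str:
    """Build the health endpoint URL (hypothetical helper)."""
    return f"http://{server}:{port}/healthz"


def is_healthy(server: str, port: int, timeout: float = 2.0) -> bool:
    """Return True only if GET /healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(healthz_url(server, port), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 4xx/5xx responses are raised as HTTPError: reachable but not healthy.
        return False
    except OSError:
        # Connection refused, DNS failure, or timeout: the service is unreachable.
        return False
```

Treating “unreachable” and “error status” the same way keeps the probe usable as a simple boolean readiness check.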
2. Topology Request Endpoint
- URL: POST http://<server>:<port>/v1/generate
- Description: This endpoint is used to request a new cluster topology.
- Payload: The request body is a JSON object organized into three top-level sections:
  - provider: (optional) Selects the topology source and provides any provider-specific authentication or parameters.
    - name: (optional) A string specifying the service provider, such as `aws`, `oci`, `gcp`, `nebius`, `netq`, `dra`, `infiniband-k8s`, `infiniband-bm`, or `test`. This parameter overrides the provider set in the Topograph config.
    - creds: (optional) A key-value map with provider-specific parameters for authentication.
    - params: (optional) A key-value map with provider-specific parameters. The `test` provider uses these parameters for response simulation; for complete behavior and examples, see Test Mode and Test Provider.
  - engine: (optional) Selects the topology output and provides any engine-specific parameters.
    - name: (optional) A string specifying the topology output: `slurm`, `k8s`, or `slinky`. This parameter overrides the engine set in the Topograph config.
    - params: (optional) A key-value map with engine-specific parameters.
      - plugin: (optional) Used in: [`slurm`, `slinky`]. A string specifying the cluster-wide topology plugin: `topology/tree` or `topology/block`. For `slurm`, this defaults to `topology/tree` when neither `plugin` nor `topologies` is set. Do not set `plugin` together with `topologies`.
      - blockSizes: (optional) Used in: [`slurm`, `slinky`]. An array of block sizes for `topology/block`.
      - topologyConfigPath: Used in: [`slurm`, `slinky`]. Optional for `slurm`; required for `slinky`. For `slurm`, a file path for the topology configuration; if omitted, the topology config content is returned in the HTTP response. For `slinky`, the key for the topology config in the ConfigMap.
      - topologies: (optional) Used in: [`slurm`, `slinky`]. A map of named per-partition topology settings. Do not set top-level `plugin` together with `topologies`. Each entry supports:
        - plugin: Used in: [`slurm`, `slinky`]. A required string specifying the per-partition topology plugin: `topology/tree`, `topology/block`, or `topology/flat`.
        - blockSizes: (optional) Used in: [`slurm`, `slinky`]. An array of block sizes for `topology/block`.
        - nodes: (optional) Used in: [`slurm`, `slinky`]. An explicit list of SLURM nodes for this topology. If omitted, Topograph can discover membership from `podSelector` (`slinky` only) or `partition`.
        - partition: (optional) Used in: [`slurm`, `slinky`]. A SLURM partition name used to discover nodes with `scontrol show partition` when `nodes` is not set. For `slinky`, this fallback is used only when the topology entry does not set `podSelector`.
        - podSelector: (optional) Used in: [`slinky`]. A Kubernetes label selector for slurmd pods in this partition. `nodes` and `podSelector` are mutually exclusive on the same topology entry.
        - clusterDefault: (optional) Used in: [`slurm`, `slinky`]. If `true`, marks this topology as the default for nodes not assigned to another topology; commonly used with `plugin: topology/flat`.
      - reconfigure: (optional) Used in: [`slurm`]. If `true`, invokes `scontrol reconfigure` after the topology config is generated. Default: `false`.
      - namespace: Used in: [`slinky`]. The required namespace where the SLURM cluster is running.
      - podSelector: Used in: [`slinky`]. A required Kubernetes label selector for pods running SLURM nodes.
      - nodeSelector: (optional) Used in: [`k8s`, `slinky`]. A Kubernetes node label map that filters which nodes participate in topology generation.
      - topologyConfigmapName: Used in: [`slinky`]. The required name of the ConfigMap containing the topology config.
      - useDynamicNodes: (optional) Used in: [`slinky`]. If `true`, Kubernetes nodes matched by the node selector are annotated with the topology spec.
      - configUpdateMode: (optional) Used in: [`slinky`]. By default, the full topology YAML is written to the Slurm ConfigMap. `skeleton-only` writes only the switches or blocks (no node lines); `none` skips updating the topology key in the ConfigMap.
  - nodes: (optional) Supplies the cluster nodes used for topology generation as an array of regions mapping instance IDs to node names.
Example:
- Response: This endpoint immediately returns a “202 Accepted” status with a unique request ID if the request is valid. If not, it returns an appropriate error code.
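The payload rules above can be checked on the client before a request is sent. The following is a sketch, not part of Topograph: `build_generate_request` and its checks are hypothetical, and they enforce only constraints stated in the parameter list (top-level `plugin` versus `topologies`, the allowed plugin values, `nodes` versus `podSelector`, `podSelector` entries being `slinky`-only, and the parameters marked required for `slinky`). The provider name and partition names in the example are placeholders, and the top-level `nodes` section is omitted.

```python
import json
from typing import Any, Optional

ENGINES = {"slurm", "k8s", "slinky"}
CLUSTER_PLUGINS = {"topology/tree", "topology/block"}
PER_PARTITION_PLUGINS = {"topology/tree", "topology/block", "topology/flat"}
# Engine parameters documented as required when the engine is slinky.
SLINKY_REQUIRED = ("namespace", "podSelector", "topologyConfigmapName", "topologyConfigPath")


def build_generate_request(
    provider: Optional[str],
    engine: str,
    params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a /v1/generate body and check the documented constraints.

    Hypothetical client-side helper; Topograph's own validation is authoritative.
    """
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}")
    params = dict(params or {})
    topologies = params.get("topologies") or {}

    # Top-level plugin and per-partition topologies are mutually exclusive.
    if "plugin" in params and topologies:
        raise ValueError("do not set top-level plugin together with topologies")
    if "plugin" in params and params["plugin"] not in CLUSTER_PLUGINS:
        raise ValueError("cluster-wide plugin must be topology/tree or topology/block")

    for name, entry in topologies.items():
        if entry.get("plugin") not in PER_PARTITION_PLUGINS:
            raise ValueError(f"topology {name!r}: plugin must be tree, block, or flat")
        if "nodes" in entry and "podSelector" in entry:
            raise ValueError(f"topology {name!r}: nodes and podSelector are exclusive")
        if "podSelector" in entry and engine != "slinky":
            raise ValueError(f"topology {name!r}: podSelector is slinky-only")

    if engine == "slinky":
        missing = [key for key in SLINKY_REQUIRED if key not in params]
        if missing:
            raise ValueError(f"slinky engine is missing required params: {missing}")

    body: dict[str, Any] = {"engine": {"name": engine, "params": params}}
    if provider is not None:  # omitting provider falls back to the config file
        body["provider"] = {"name": provider}
    return body


if __name__ == "__main__":
    # Placeholder partitions: "gpu" uses block topology, everything else is flat.
    body = build_generate_request(
        provider="test",
        engine="slurm",
        params={
            "topologies": {
                "gpu": {"plugin": "topology/block", "blockSizes": [4, 8], "partition": "gpu"},
                "rest": {"plugin": "topology/flat", "clusterDefault": True},
            },
        },
    )
    print(json.dumps(body, indent=2))
```

Validating locally catches obvious mistakes early, but the service may reject bodies this sketch accepts, so the returned status code remains the final word.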
3. Topology Result Endpoint
- URL: GET http://<server>:<port>/v1/topology
- Description: This endpoint retrieves the result of a topology request.
- URL Query Parameters:
  - uid: Specifies the request ID returned by the topology request endpoint.
- Response: Depending on the request’s execution stage, this endpoint can return:
  - “200 OK” - The request has completed successfully.
  - “202 Accepted” - The request is still in progress and has not completed yet.
  - “404 Not Found” - The specified request ID does not exist.
  - Other error responses encountered by Topograph during request execution.
Example usage:
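The two-step flow (POST to `/v1/generate`, then poll `/v1/topology?uid=...` with the returned ID) can be sketched with Python's standard library. Assumptions to verify against your deployment: the POST response body is the request ID as plain text, and the 10-second timeout, polling interval, and attempt limit are arbitrary caller choices. The names `http_get`, `submit`, and `poll` are hypothetical; `poll` takes an injectable `fetch` callable so its status handling can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# fetch(url) -> (HTTP status, response body as text)
Fetch = Callable[[str], Tuple[int, str]]


def http_get(url: str) -> Tuple[int, str]:
    """GET a URL and return (status, body), including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and body worth reporting.
        return err.code, err.read().decode()


def submit(base_url: str, payload: dict) -> str:
    """POST the payload to /v1/generate and return the request ID.

    Assumes a 202 response whose body is the request ID as plain text.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        if resp.status != 202:
            raise RuntimeError(f"unexpected status {resp.status}")
        return resp.read().decode().strip()


def poll(base_url: str, uid: str, fetch: Fetch = http_get,
         interval: float = 1.0, max_attempts: int = 60) -> str:
    """Poll /v1/topology until the request finishes; return the result body."""
    url = f"{base_url}/v1/topology?" + urllib.parse.urlencode({"uid": uid})
    for _ in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body  # completed successfully
        if status == 202:
            time.sleep(interval)  # still in progress; try again later
            continue
        if status == 404:
            raise LookupError(f"unknown request ID {uid!r}")
        raise RuntimeError(f"request failed with HTTP {status}: {body}")
    raise TimeoutError(f"request {uid!r} still pending after {max_attempts} attempts")
```

With `base = "http://<server>:<port>"`, a call such as `poll(base, submit(base, body))` blocks until the request completes. For the `slurm` engine without `topologyConfigPath`, the returned body is the topology config content itself.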