Modeling and API Simulation
Topograph models are YAML files used to simulate discovered topology without querying a real cloud API, NetQ instance, InfiniBand fabric, or Kubernetes cluster. They are primarily used by tests and local development, but they are also useful when validating a scheduler integration against known topology shapes.
A model describes the same canonical topology that real providers eventually produce:
- A switch tree, used for Slurm topology/tree output and Kubernetes leaf/spine/core labels
- Node membership in accelerated domains, used for block topology and accelerator labels
- Optional per-node attributes used by provider simulations
Model loading lives in pkg/models. Model fixtures live under tests/models/.
Where Models Are Used
Models are consumed in two different simulation flows.
Test Provider
The test provider simulates the Topograph API lifecycle itself. It can return successful topology output, delayed completion, malformed-request failures, provider failures, or a request that remains pending.
Use it when testing clients that call:
- POST /v1/generate
- GET /v1/topology?uid=<request-id>
For the complete API status-code simulation behavior, see Test Mode and Test Provider.
Provider Simulations
Several providers also have simulation variants, such as:
- aws-sim
- gcp-sim
- oci-sim
- nebius-sim
- lambdai-sim
- dsx-sim
These providers load a model file and then simulate that provider’s API responses. This is useful when you want to exercise the normal provider translation logic without real provider credentials or infrastructure.
Simulation providers share these common parameters:
Example request:
Model File Shape
A model has three top-level sections:
All three sections are maps. nodes and capacity_blocks are flexible: you can specify node membership in either section, and Topograph completes the missing side during model loading.
Switches
The switches map describes the network hierarchy. Each key is the switch ID. Each value may contain:
Example:
Switch rules:
- A switch can have at most one parent switch.
- A node can be attached to at most one switch.
- If a switch references a node, that node must either exist in nodes or be generated from capacity_blocks.
- Switch nodes entries are expanded through the same compact range syntax used elsewhere.
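The first two switch rules can be checked mechanically. The following is a minimal Python sketch, not Topograph's loader (which lives in pkg/models). It assumes the model has already been parsed into a dict with node ranges expanded, and that each switch entry lists child switches under a `switches` key and attached nodes under a `nodes` key. The `switches` child key and the function name `check_switch_rules` are assumptions made for illustration.

```python
def check_switch_rules(switches):
    """Return a list of rule violations for a parsed `switches` map.

    Assumed (hypothetical) entry shape:
        {"switches": [child switch ids], "nodes": [node names]}
    """
    errors = []
    parent_of = {}  # child switch id -> parent switch id
    switch_of = {}  # node name -> switch id

    for switch_id, entry in switches.items():
        entry = entry or {}
        # Rule: a switch can have at most one parent switch.
        for child in entry.get("switches", []):
            if child in parent_of:
                errors.append(
                    f"switch {child} has two parents: {parent_of[child]} and {switch_id}"
                )
            else:
                parent_of[child] = switch_id
        # Rule: a node can be attached to at most one switch.
        for node in entry.get("nodes", []):
            if node in switch_of:
                errors.append(
                    f"node {node} is attached to two switches: {switch_of[node]} and {switch_id}"
                )
            else:
                switch_of[node] = switch_id
    return errors


model = {
    "core": {"switches": ["leaf1", "leaf2"]},
    "leaf1": {"nodes": ["n1", "n2"]},
    "leaf2": {"nodes": ["n2", "n3"]},  # n2 is attached twice
}
print(check_switch_rules(model))
```

Collecting violations in a list, rather than failing on the first one, makes it easier to fix a new fixture in a single pass.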
Nodes
The nodes map describes compute nodes directly. Each key is the node name. The value may contain:
Example:
Node rules:
- capacity_block_id is optional.
- Nodes without capacity_block_id are still valid compute nodes.
- If capacity_block_id is set and capacity_blocks is omitted, Topograph creates the capacity block and adds the node to it.
- If a node is listed under capacity_blocks.<id>.nodes, Topograph fills in the node’s missing capacity_block_id.
- If both sides specify different capacity block IDs for the same node, model loading fails.
Capacity Blocks
The capacity_blocks map describes accelerated domains. Each key is the capacity block ID. The value may contain:
Example:
Capacity block rules:
- The entire capacity_blocks section may be omitted.
- Individual capacity block entries may omit nodes.
- Capacity block entries with no corresponding nodes are allowed and preserved.
- If top-level nodes is omitted, capacity_blocks.<id>.nodes creates node entries automatically.
- If top-level nodes is present, capacity_blocks.<id>.nodes must reference nodes in the top-level nodes map.
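The node rules and capacity block rules describe one reconciliation step that runs in both directions. The sketch below illustrates that step in Python under stated assumptions: the model is already parsed into dicts, node ranges are already expanded, and `reconcile_membership` is a hypothetical name. It is not the code in pkg/models, and it leaves out attribute propagation between nodes and capacity blocks.

```python
def reconcile_membership(nodes=None, capacity_blocks=None):
    """Complete node and capacity block membership in both directions.

    Pass nodes=None to model an omitted top-level `nodes` section.
    """
    nodes_given = nodes is not None
    nodes = {name: dict(entry or {}) for name, entry in (nodes or {}).items()}
    blocks = {
        cb_id: {**(entry or {}), "nodes": list((entry or {}).get("nodes", []))}
        for cb_id, entry in (capacity_blocks or {}).items()
    }

    # Direction 1: capacity_blocks.<id>.nodes -> node capacity_block_id
    for cb_id, block in blocks.items():
        for name in block["nodes"]:
            if name not in nodes:
                if nodes_given:
                    raise ValueError(f"capacity block {cb_id} references unknown node {name}")
                nodes[name] = {}  # nodes section omitted: create the entry
            current = nodes[name].get("capacity_block_id")
            if current is not None and current != cb_id:
                raise ValueError(f"node {name} is in {current} but listed under {cb_id}")
            nodes[name]["capacity_block_id"] = cb_id

    # Direction 2: node capacity_block_id -> capacity_blocks.<id>.nodes
    for name, entry in nodes.items():
        cb_id = entry.get("capacity_block_id")
        if cb_id is None:
            continue  # nodes without a capacity block stay valid
        block = blocks.setdefault(cb_id, {"nodes": []})
        if name not in block["nodes"]:
            block["nodes"].append(name)
    return nodes, blocks


nodes, blocks = reconcile_membership(
    nodes={"n1": {"capacity_block_id": "cb1"}, "n2": {}},
    capacity_blocks=None,
)
print(blocks)  # cb1 is created and contains n1
```

Capacity block entries with no nodes pass through unchanged, which matches the rule that such entries are allowed and preserved.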
Compact Ranges
Model node lists support compact ranges:
These expand to:
Ranges are accepted in:
- switches.<switch>.nodes
- capacity_blocks.<id>.nodes
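To make the expansion step concrete, here is a small Python sketch of one way to expand a compact range. The bracket form used here (`n[1-3]`, in the style of Slurm hostlists) is an assumption for illustration only; consult the fixtures under tests/models/ for the syntax Topograph actually accepts. The function names are hypothetical.

```python
import re

# Assumed syntax: prefix[start-end]suffix, with one numeric range per entry.
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_entry(entry):
    """Expand one list entry; entries without a range are returned unchanged."""
    match = _RANGE.match(entry)
    if match is None:
        return [entry]
    start, end = match.group("start"), match.group("end")
    first, last = int(start), int(end)
    if first > last:
        raise ValueError(f"descending range in {entry!r}")
    # Keep zero padding when the range start is written with a leading zero.
    width = len(start) if start.startswith("0") else 0
    return [
        f"{match.group('prefix')}{str(i).zfill(width)}{match.group('suffix')}"
        for i in range(first, last + 1)
    ]


def expand_nodes(entries):
    """Expand every entry of a node list, preserving order."""
    expanded = []
    for entry in entries:
        expanded.extend(expand_entry(entry))
    return expanded


print(expand_nodes(["n[1-3]", "gpu[08-10]", "login"]))
# ['n1', 'n2', 'n3', 'gpu08', 'gpu09', 'gpu10', 'login']
```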
Derived Data
After YAML parsing, Topograph completes the model before simulation uses it:
- Switch node ranges are expanded.
- Capacity block node ranges are expanded.
- Node names are copied from their map keys.
- Switch names are copied from their map keys.
- Missing nodes can be created from capacity_blocks.<id>.nodes.
- Missing capacity block entries can be created from node capacity_block_id values.
- Node NetLayers is derived from the switch path from leaf to root.
- Node Metadata is built by merging switch metadata along the same path.
- Instances is derived from node names and grouped by metadata.region; nodes without a region use none.
These derived fields are not written in YAML.
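The path-based derivations can be sketched in a few lines of Python. This is an illustration under assumptions, not the pkg/models implementation: child switches are assumed to be listed under a `switches` key, switch metadata under a `metadata` key, and when two switches on the path set the same metadata key, the switch nearest the node is assumed to win. The function names and the `net_layers` field name are hypothetical.

```python
def derive_node_fields(switches):
    """Derive per-node network layers and merged metadata from the switch tree."""
    parent_of = {}
    for switch_id, entry in switches.items():
        for child in (entry or {}).get("switches", []):
            parent_of[child] = switch_id

    derived = {}
    for switch_id, entry in switches.items():
        for node in (entry or {}).get("nodes", []):
            layers, metadata = [], {}
            current = switch_id
            # Walk from the leaf switch up to the root.
            while current is not None:
                if current in layers:
                    raise ValueError(f"cycle in switch tree at {current}")
                layers.append(current)
                switch_meta = (switches.get(current) or {}).get("metadata") or {}
                for key, value in switch_meta.items():
                    metadata.setdefault(key, value)  # assumption: nearest switch wins
                current = parent_of.get(current)
            derived[node] = {"net_layers": layers, "metadata": metadata}
    return derived


def group_instances(derived):
    """Group node names by metadata region; nodes without a region use 'none'."""
    regions = {}
    for node, fields in derived.items():
        region = fields["metadata"].get("region", "none")
        regions.setdefault(region, []).append(node)
    return regions


switches = {
    "core": {"switches": ["leaf"], "metadata": {"region": "r1"}},
    "leaf": {"nodes": ["n1", "n2", "n3"]},
}
derived = derive_node_fields(switches)
print(derived["n1"]["net_layers"])  # ['leaf', 'core']
print(group_instances(derived))     # {'r1': ['n1', 'n2', 'n3']}
```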
Complete Examples
Nodes From Capacity Blocks
This compact model omits the nodes section. Nodes are created from capacity block membership.
After loading:
- n1 and n2 belong to cb1 and have attributes.nvlink: nvl1
- n3 belongs to cb2 and has attributes.nvlink: nvl2
- All three nodes have network layers [leaf, core]
Capacity Blocks From Nodes
This model omits capacity_blocks. Topograph creates cb1 from n1.capacity_block_id.
After loading:
- cb1.nodes contains n1
- cb1.attributes.nvlink is populated from n1.attributes.nvlink
- n2 remains a valid node without capacity block membership
Orphan Capacity Block
This is valid. It declares a capacity block that currently has no nodes.
After loading:
- cb1.nodes contains n1
- cb2 remains present with no nodes
Simulating the API
To simulate the Topograph API lifecycle, configure the test provider:
Then submit a request that names a model:
Expected flow:
- POST /v1/generate returns 202 Accepted and a request ID.
- GET /v1/topology?uid=<request-id> returns 202 Accepted while the request is queued or processing.
- When processing completes, /v1/topology returns 200 OK with the selected engine output.
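A client that follows this flow needs a polling loop that treats 202 as "keep waiting" and 200 as "done". The Python sketch below shows one way to structure it. The HTTP call is injected as a caller-supplied function so the loop can be exercised without a running server; `wait_for_topology`, its parameters, and the fake responses are hypothetical and not part of Topograph.

```python
import time


def wait_for_topology(get_topology, request_id, interval=1.0, max_attempts=30,
                      sleep=time.sleep):
    """Poll until the topology request completes and return the response body.

    `get_topology(request_id)` is a caller-supplied function that performs
    GET /v1/topology?uid=<request-id> and returns (status_code, body).
    """
    for _ in range(max_attempts):
        status, body = get_topology(request_id)
        if status == 200:
            return body  # processing completed
        if status != 202:
            # Anything other than 200 or 202 is treated as a failure.
            raise RuntimeError(
                f"topology request {request_id} failed with {status}: {body}"
            )
        sleep(interval)  # still queued or processing
    raise TimeoutError(
        f"topology request {request_id} still pending after {max_attempts} attempts"
    )


# Fake transport standing in for the real endpoint: pending twice, then done.
responses = iter([(202, ""), (202, ""), (200, "topology output")])
result = wait_for_topology(lambda uid: next(responses), "req-1",
                           sleep=lambda seconds: None)
print(result)  # topology output
```

Bounding the number of attempts matters when testing against a request that is configured to remain pending, since that request would otherwise be polled forever.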
To simulate API failures, set generateResponseCode, topologyResponseCode, and errorMessage in provider.params. For example:
Choosing the Right Simulation Path
Use the test provider when you want to validate API-client behavior:
- Request IDs
- Polling
- Pending responses
- Error status codes
- Retry behavior
Use a *-sim provider when you want to validate provider-specific topology translation:
- AWS, GCP, OCI, Nebius, Lambda AI, or DSX topology paths
- Pagination behavior in simulated provider APIs
- Engine output generated from provider-shaped data
- Tree and block topology output from the same model
Validation Checklist
Before using a new model in a regression test:
- Confirm every switch child has only one parent.
- Confirm every switched node is defined in nodes or generated from capacity_blocks.
- Confirm no node appears under two switches.
- Confirm capacity block membership does not conflict with node capacity_block_id.
- Run the relevant provider simulation test or API flow with the target engine.