Test Mode and Test Provider

View as Markdown

Topograph test mode uses the test provider to simulate topology-generation responses without querying a cloud API, NetQ, InfiniBand fabric, or Kubernetes labels. It is intended for integration and regression testing of downstream components that consume Topograph output.

Use test mode when you need to verify how a client handles successful topology generation, delayed topology generation, unknown request IDs, malformed requests, and simulated provider failures. This is especially useful for preventing regressions where an unhandled Topograph error causes a downstream system to discard a previously valid topology configuration.

Run Topograph in Test Mode

Set the default provider to test in topograph-config.yaml, and choose the engine whose output your client consumes:

1http:
2 port: 49021
3 ssl: false
4
5provider: test
6engine: slurm
7
8requestAggregationDelay: 2s

Then start Topograph:

1make build
2bin/topograph -c config/topograph-config.yaml

You can also leave the global provider and engine unset and specify them in each /v1/generate request payload. This is useful when one regression suite needs to exercise multiple engines.

Model files can be referenced by basename, such as small-tree.yaml, to load one of the embedded fixtures from tests/models/. You can also provide an absolute or relative path to a YAML model file.

API Flow

Topology generation uses two API endpoints.

/v1/generate

POST /v1/generate starts a topology-generation request.

Possible responses:

ResponseMeaningClient guidance
202 AcceptedThe request was accepted. The response body contains the request ID.Poll /v1/topology?uid=<request-id>.
4xxThe request is invalid or cannot be accepted.Do not retry the same request without changing it. Investigate the payload and configuration.
5xxTopograph returned a server-side failure.Retry the generate request according to the client’s retry policy.

/v1/topology

GET /v1/topology?uid=<request-id> retrieves the result for a previously accepted request.

Possible responses:

ResponseMeaningClient guidance
200 OKTopology generation completed. The response body contains the engine output.Consume the returned topology.
202 AcceptedThe request is still queued, still processing, or intentionally simulated as pending.Retry for a bounded period. Topology discovery should normally finish within about 2 minutes.
404 Not FoundThe request ID is unknown or no longer in the request history.Do not retry the same request ID. Submit a new /v1/generate request.
Other errorsTopology generation failed.Do not retry the same /v1/topology request indefinitely. Submit a new /v1/generate request if the client policy allows it.

Topograph internally retries topology processing up to 5 attempts, with exponential backoff starting at 2 seconds, for these retryable HTTP status codes:

  • 408 Request Timeout
  • 429 Too Many Requests
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

While those internal retries are running, /v1/topology continues to return 202 Accepted for the request ID.

Request Payload

The test provider is configured through provider.params in the /v1/generate request:

1{
2 "provider": {
3 "name": "test",
4 "params": {
5 "testcaseName": "optional short test case name",
6 "description": "optional test case description",
7 "generateResponseCode": 202,
8 "topologyResponseCode": 200,
9 "modelFileName": "small-tree.yaml",
10 "errorMessage": "optional error message"
11 }
12 },
13 "engine": {
14 "name": "slurm"
15 }
16}

The engine object follows the normal Topograph engine configuration. For example, use slurm parameters to request topology/tree or topology/block output, use k8s parameters to write node labels, or use slinky parameters to update a Slinky ConfigMap.

Test Provider Parameters

ParameterRequiredDefaultDescription
testcaseNameNoEmptyHuman-readable name for the scenario. Topograph does not interpret this value.
descriptionNoEmptyLonger scenario description. Topograph does not interpret this value.
generateResponseCodeNo202Status code to return from /v1/generate. Valid values are 202 and HTTP error codes from 400 through 599. Any other value returns 400 Bad Request.
topologyResponseCodeNo200Status code to return from /v1/topology after the request finishes queueing and processing. Valid values are 200, 202, and HTTP error codes from 400 through 599. Any other value returns 400 Bad Request.
modelFileNameNoBuilt-in test treeModel file used when topologyResponseCode is 200. Ignored for error responses. If the model cannot be loaded, Topograph returns 400 Bad Request.
errorMessageNoEmptyResponse body used for simulated error responses.

Processing Behavior

When /v1/generate receives a request for provider test, Topograph decodes the test parameters before putting the request into the async queue.

  • If generateResponseCode is between 400 and 599, Topograph immediately returns that status code and errorMessage.
  • If generateResponseCode is 202, Topograph accepts the request and returns a request ID.
  • If generateResponseCode is any other value, Topograph returns 400 Bad Request.

For accepted requests, /v1/topology behaves like the normal asynchronous Topograph flow.

  • If the request ID is unknown, Topograph returns 404 Not Found.
  • If the request is still waiting for requestAggregationDelay to expire, or is still processing, Topograph returns 202 Accepted.
  • If topologyResponseCode is 202, Topograph keeps returning 202 Accepted after processing. This simulates a topology request that never completes.
  • If topologyResponseCode is between 400 and 599, Topograph returns that status code and errorMessage after processing. Retryable codes are retried internally first.
  • If topologyResponseCode is 200, Topograph loads the requested model, translates it through the selected engine, and returns the generated output.

Examples

Successful Topology Discovery

1{
2 "provider": {
3 "name": "test",
4 "params": {
5 "testcaseName": "success-case-01",
6 "description": "Return 202 for generate and then a valid topology.",
7 "generateResponseCode": 202,
8 "topologyResponseCode": 200,
9 "modelFileName": "small-tree.yaml"
10 }
11 },
12 "engine": {
13 "name": "slurm"
14 }
15}

Expected behavior:

  • /v1/generate returns 202 Accepted with a request ID.
  • /v1/topology returns 202 Accepted until the aggregation delay and processing complete.
  • /v1/topology then returns 200 OK with the generated Slurm topology configuration.

Generate Request Failure

1{
2 "provider": {
3 "name": "test",
4 "params": {
5 "testcaseName": "failure-case-01",
6 "description": "Return 500 from generate.",
7 "generateResponseCode": 500,
8 "errorMessage": "Internal Server Error"
9 }
10 },
11 "engine": {
12 "name": "slurm"
13 }
14}

Expected behavior:

  • /v1/generate returns 500 Internal Server Error.
  • No request ID is created.
  • The client should not call /v1/topology for this request.

Topology Request Failure

1{
2 "provider": {
3 "name": "test",
4 "params": {
5 "testcaseName": "failure-case-02",
6 "description": "Return 408 from topology after processing.",
7 "generateResponseCode": 202,
8 "topologyResponseCode": 408,
9 "errorMessage": "Request to AWS timed out"
10 }
11 },
12 "engine": {
13 "name": "slurm"
14 }
15}

Expected behavior:

  • /v1/generate returns 202 Accepted with a request ID.
  • /v1/topology returns 202 Accepted while the request is queued and while Topograph performs its internal retries.
  • /v1/topology eventually returns 408 Request Timeout with the configured error message.

Curl Workflow

Save a test payload to payload.json, submit it, and poll the result:

1uid=$(curl -sS -X POST \
2 -H "Content-Type: application/json" \
3 -d @payload.json \
4 http://localhost:49021/v1/generate)
5
6curl -i "http://localhost:49021/v1/topology?uid=${uid}"

For ready-made regression payloads, see tests/integration/ and tests/payloads/.