Data Flow Architecture

Data transformations in the five-stage workflow.

Overview

Data flows through five stages:

System Config → Snapshot → Recipe → Validate → Bundle → Deployment
(Raw) (Capture) (Optimize) (Check) (Package) (Deploy)

Each stage transforms input data into a different format:

  • Snapshot: Captures raw system state (OS, GPU, Kubernetes, SystemD)
  • Recipe: Generates configuration recommendations by matching query parameters against overlay rules
  • Validate: Checks recipe constraints against actual system measurements
  • Bundle: Produces deployment artifacts (Helm values, manifests, scripts)
  • Deployment: Transforms bundle artifacts into deployer-specific formats (Helm, Argo CD)

Stage 1: Snapshot (Data Capture)

Input Sources

SystemD Services:

  • Source: systemctl show containerd.service
  • Data: Service configuration, resource limits, cgroup delegates
  • Format: Key-value pairs from SystemD properties

OS Configuration:

  • grub: /proc/cmdline - Boot parameters
  • kmod: /proc/modules - Loaded kernel modules
  • sysctl: /proc/sys/**/* - Kernel runtime parameters
  • release: /etc/os-release - OS identification

Kubernetes Cluster:

  • Source: Kubernetes API via client-go
  • server: Version info from /version endpoint
  • image: Container images from all pods across namespaces
  • policy: GPU Operator ClusterPolicy custom resource

GPU Hardware:

  • Source: nvidia-smi command-line tool
  • Data: Driver version, CUDA version, MIG settings, device info
  • Format: Parsed XML/text output

Snapshot Data Structure

┌─────────────────────────────────────────────────────────┐
│ Snapshot (aicr.nvidia.com/v1alpha1) │
├─────────────────────────────────────────────────────────┤
│ metadata: │
│ created: timestamp │
│ hostname: string │
│ │
│ measurements: []Measurement │
│ ├─ SystemD │
│ │ └─ subtypes: [containerd.service, ...] │
│ │ └─ data: map[string]Reading │
│ │ │
│ ├─ OS │
│ │ └─ subtypes: [grub, kmod, sysctl, release] │
│ │ └─ data: map[string]Reading │
│ │ │
│ ├─ K8s │
│ │ └─ subtypes: [server, image, policy] │
│ │ └─ data: map[string]Reading │
│ │ │
│ ├─ GPU │
│ │ └─ subtypes: [smi, driver, device] │
│ │ └─ data: map[string]Reading │
│ │ │
│ └─ NodeTopology │
│ └─ subtypes: [summary, taint, label] │
│ └─ data: map[string]Reading │
└─────────────────────────────────────────────────────────┘

Output Destinations:

  • File: aicr snapshot --output system.yaml
  • Stdout: aicr snapshot (default, pipe to other commands)
  • ConfigMap: aicr snapshot --output cm://namespace/name (Kubernetes-native)

ConfigMap Storage Pattern:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aicr-snapshot
  namespace: gpu-operator
data:
  snapshot.yaml: |
    # Complete snapshot YAML stored as ConfigMap data
    apiVersion: aicr.nvidia.com/v1alpha1
    kind: Snapshot
    measurements: [...]

Agent Deployment:
Kubernetes Job writes snapshots directly to ConfigMap without volumes:

$ aicr snapshot --output cm://gpu-operator/aicr-snapshot

Reading Interface:

type Reading interface {
    Any() interface{}  // Type-safe value extraction
    String() string    // String representation
    // Supports: int, string, bool, float64
}

Collection Process

Parallel Collection

┌──────────────┐
│ Snapshotter │
└──────┬───────┘
│ errgroup.WithContext()
├────────────┬─────────────┬─────────────┐
│ │ │ │
┌────▼────┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│ SystemD │ │ OS │ │ K8s │ │ GPU │
│Collector│ │Collect│ │Collect│ │Collect│
└────┬────┘ └───┬───┘ └───┬───┘ └───┬───┘
│ │ │ │
└────────────┴─────────────┴─────────────┘
┌─────▼──────┐
│ Snapshot │
│ (YAML) │
└────────────┘

Context Propagation:

  • All collectors respect context cancellation
  • First error cancels remaining operations
  • Timeout: 30 seconds per collector

Stage 2: Recipe (Data Optimization)

Recipe Input Options

Query Mode - Direct generation from parameters:

$ aicr recipe --os ubuntu --accelerator h100 --service eks --intent training --platform kubeflow

Snapshot Mode (File) - Analyze captured snapshot:

$ aicr snapshot --output system.yaml
$ aicr recipe --snapshot system.yaml --intent training --platform kubeflow

Snapshot Mode (ConfigMap) - Read from Kubernetes:

# Agent or CLI writes snapshot to ConfigMap
$ aicr snapshot --output cm://gpu-operator/aicr-snapshot

# CLI reads from ConfigMap to generate recipe
$ aicr recipe --snapshot cm://gpu-operator/aicr-snapshot --intent training --platform kubeflow

# Recipe can also be written to ConfigMap
$ aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
>   --intent training \
>   --platform kubeflow \
>   --output cm://gpu-operator/aicr-recipe

Query Extraction (Snapshot Mode)

When a snapshot is provided, the recipe builder extracts query parameters:

Snapshot → Query Extractor → Recipe Query

Extraction mapping

K8s/server/version → k8s (version)
K8s/image/gpu-operator → service (eks/gke/aks detection)
K8s/config/* → intent hints
OS/release/ID → os (family)
OS/release/VERSION_ID → osv (version)
OS/grub/BOOT_IMAGE → kernel (version)
GPU/smi/model → accelerator (type)
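The mapping above can be sketched as a small translation step. extractQuery and the get lookup helper are hypothetical names for illustration; only a few rows of the table are shown:

```go
package main

import "fmt"

// extractQuery applies the extraction mapping to snapshot readings.
// get is a hypothetical lookup: (type, subtype, key) → reading value.
func extractQuery(get func(typ, subtype, key string) string) map[string]string {
	return map[string]string{
		"k8s": get("K8s", "server", "version"),
		"os":  get("OS", "release", "ID"),
		"osv": get("OS", "release", "VERSION_ID"),
	}
}

func main() {
	readings := map[string]string{
		"K8s/server/version":    "1.30",
		"OS/release/ID":         "ubuntu",
		"OS/release/VERSION_ID": "24.04",
	}
	get := func(typ, subtype, key string) string { return readings[typ+"/"+subtype+"/"+key] }
	fmt.Println(extractQuery(get)) // map[k8s:1.30 os:ubuntu osv:24.04]
}
```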

Recipe Generation

Inheritance Chain Resolution

When a query matches a leaf recipe that has a spec.base reference, the system resolves the full inheritance chain before merging:

┌─────────────────────────────────────────────────────────────┐
│ Inheritance Resolution │
├─────────────────────────────────────────────────────────────┤
│ │
│ Query: {service: eks, accelerator: gb200, os: ubuntu, │
│ intent: training} │
│ │
│ 1. Find matching recipes (by specificity): │
│ - eks (specificity: 1) │
│ - eks-training (specificity: 2) │
│ - gb200-eks-training (specificity: 3) │
│ - gb200-eks-ubuntu-training (specificity: 4) │
│ │
│ 2. Resolve inheritance chain for each: │
│ gb200-eks-ubuntu-training.spec.base = "gb200-eks-training"
│ gb200-eks-training.spec.base = "eks-training" │
│ eks-training.spec.base = "eks" │
│ eks.spec.base = "" (implicit base) │
│ │
│ 3. Build chain (root to leaf): │
│ [base] → [eks] → [eks-training] → [gb200-eks-training] │
│ → [gb200-eks-ubuntu-training] │
│ │
│ 4. Merge in order (later overrides earlier): │
│ result = base │
│ result = merge(result, eks) │
│ result = merge(result, eks-training) │
│ result = merge(result, gb200-eks-training) │
│ result = merge(result, gb200-eks-ubuntu-training) │
│ │
└─────────────────────────────────────────────────────────────┘
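Steps 2-4 above amount to walking spec.base references to the root, then folding the chain forward. A minimal sketch, with hypothetical Recipe/Values shapes standing in for the real types:

```go
package main

import "fmt"

// Recipe holds a spec.base reference; Values stands in for the merged
// configuration. Hypothetical shapes for illustration.
type Recipe struct {
	Name   string
	Base   string // spec.base; "" terminates the chain at the implicit base
	Values map[string]string
}

// resolveChain walks spec.base references leaf → root, prepending so the
// returned slice runs root → leaf, ready for in-order merging.
func resolveChain(all map[string]Recipe, leaf string) []Recipe {
	var chain []Recipe
	for name := leaf; name != ""; {
		r := all[name]
		chain = append([]Recipe{r}, chain...)
		name = r.Base
	}
	return chain
}

// mergeChain folds the chain in order; later recipes override earlier ones.
func mergeChain(chain []Recipe) map[string]string {
	out := map[string]string{}
	for _, r := range chain {
		for k, v := range r.Values {
			out[k] = v
		}
	}
	return out
}

func main() {
	all := map[string]Recipe{
		"eks":                {Name: "eks", Values: map[string]string{"driver": "570", "rdma": "off"}},
		"eks-training":       {Name: "eks-training", Base: "eks", Values: map[string]string{"rdma": "on"}},
		"gb200-eks-training": {Name: "gb200-eks-training", Base: "eks-training", Values: map[string]string{"driver": "580"}},
	}
	merged := mergeChain(resolveChain(all, "gb200-eks-training"))
	fmt.Println(merged["driver"], merged["rdma"]) // 580 on
}
```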

Base and Overlay Merging

┌────────────────────────────────────────────────────────┐
│ Recipe Builder │
├────────────────────────────────────────────────────────┤
│ │
│ 1. Load base measurements (universal config) │
│ └─ From embedded overlays/base.yaml │
│ │
│ 2. Match query to overlays (by criteria) │
│ ├─ Query matches recipes where: │
│ │ - Recipe "any" field = wildcard (matches any) │
│ │ - Query "any" field = only matches recipe "any"│
│ └─ Resolve inheritance chain for each match │
│ │
│ 3. Merge inheritance chain in order │
│ ├─ Base values (from overlays/base.yaml) │
│ ├─ + eks (EKS-specific settings) │
│ ├─ + eks-training (training optimizations) │
│ ├─ + gb200-eks-training (GB200 overrides) │
│ └─ + gb200-eks-ubuntu-training (Ubuntu specifics) │
│ │
│ 4. Apply mixins (if spec.mixins declared) │
│ ├─ Load mixin files from recipes/mixins/ │
│ ├─ Append mixin constraints and componentRefs │
│ └─ If snapshot provided, evaluate mixin constraints│
│ │
│ 5. Strip context (if !context) │
│ └─ Remove context maps from all subtypes │
│ │
│ 6. Return recipe │
│ │
└────────────────────────────────────────────────────────┘

Overlay Matching Algorithm

// Overlay matches if all specified fields match query
// Omitted fields act as wildcards

overlay.key {
    service: "eks"        // Must match
    accelerator: "gb200"  // Must match
    os: <omitted>         // Wildcard (any OS)
}

query {
    service: "eks"
    accelerator: "gb200"
    os: "ubuntu"
}

Result: MATCH (os wildcarded)
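The matching rule reduces to "every specified overlay field must equal the query's value". A runnable sketch, with a hypothetical Key struct standing in for the real overlay key type:

```go
package main

import "fmt"

// Key holds the query / overlay fields from the example above; an empty
// string on the overlay side means the field was omitted (wildcard).
type Key struct {
	Service, Accelerator, OS, Intent string
}

// matches reports whether every field the overlay specifies equals the
// query's value; omitted overlay fields match any query value.
func matches(overlay, query Key) bool {
	eq := func(o, q string) bool { return o == "" || o == q }
	return eq(overlay.Service, query.Service) &&
		eq(overlay.Accelerator, query.Accelerator) &&
		eq(overlay.OS, query.OS) &&
		eq(overlay.Intent, query.Intent)
}

func main() {
	overlay := Key{Service: "eks", Accelerator: "gb200"} // os omitted → wildcard
	query := Key{Service: "eks", Accelerator: "gb200", OS: "ubuntu"}
	fmt.Println(matches(overlay, query)) // true (os wildcarded)
}
```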

Recipe Data Structure

┌─────────────────────────────────────────────────────────┐
│ Recipe (aicr.nvidia.com/v1alpha1) │
├─────────────────────────────────────────────────────────┤
│ metadata: │
│ version: recipe format version │
│ created: timestamp │
│ appliedOverlays: inheritance chain (root to leaf) │
│ │
│ criteria: Criteria (service, accelerator, intent, os) │
│ │
│ componentRefs: []ComponentRef │
│ ├─ name: component name │
│ ├─ version: component version │
│ ├─ order: deployment order │
│ └─ repository: Helm repository URL │
│ │
│ constraints: │
│ └─ driver: version, cudaVersion │
└─────────────────────────────────────────────────────────┘

Applied Overlays Example (with inheritance):

metadata:
  appliedOverlays:
    - base
    - eks
    - eks-training
    - gb200-eks-training
    - gb200-eks-ubuntu-training

Stage 3: Validate (Constraint Checking)

Validation Process

The validate stage compares recipe constraints against actual measurements from a cluster snapshot.

┌────────────────────────────────────────────────────────┐
│ Validator │
├────────────────────────────────────────────────────────┤
│ │
│ Recipe Constraints + Snapshot → Validation Results │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Recipe │ │ Snapshot │ │
│ │ constraints: │ │ measurements: │ │
│ │ - K8s.version │ │ - K8s/server │ │
│ │ - OS.release │ │ - OS/release │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ └───────────┬──────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Constraint │ │
│ │ Evaluation │ │
│ │ ├─ Version cmp │ │
│ │ ├─ Equality │ │
│ │ └─ Exact match │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Results │ │
│ │ ├─ Passed │ │
│ │ ├─ Failed │ │
│ │ └─ Skipped │ │
│ └─────────────────┘ │
│ │
└────────────────────────────────────────────────────────┘

Constraint Path Format

Constraints use fully qualified paths: {Type}.{Subtype}.{Key}

Path                                  Description
K8s.server.version                    Kubernetes server version
OS.release.ID                         Operating system family (ubuntu, rhel)
OS.release.VERSION_ID                 OS version (22.04, 24.04)
OS.sysctl./proc/sys/kernel/osrelease  Kernel version
GPU.driver.version                    NVIDIA driver version

Supported Operators

Operator  Description            Example
>=        Greater than or equal  K8s.server.version>=1.28
<=        Less than or equal     K8s.server.version<=1.30
>         Greater than           OS.release.VERSION_ID>22.04
<         Less than               OS.release.VERSION_ID<25.00
==        Exactly equal          OS.release.ID==ubuntu
!=        Not equal              OS.release.ID!=rhel
(none)    Exact match            GPU.driver.version
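Splitting a constraint string into its path, operator, and value is straightforward as long as two-character operators are tried first. A sketch (parseConstraint is a hypothetical name; the real evaluator also compares semantic versions, which this omits):

```go
package main

import (
	"fmt"
	"strings"
)

// parseConstraint splits a constraint like "K8s.server.version>=1.28"
// into path, operator, and value. A bare path (no operator) means an
// exact match against the recipe-pinned value.
func parseConstraint(c string) (path, op, value string) {
	// ">=", "<=", "==", "!=" must be tried before ">" and "<".
	for _, candidate := range []string{">=", "<=", "==", "!=", ">", "<"} {
		if i := strings.Index(c, candidate); i >= 0 {
			return c[:i], candidate, c[i+len(candidate):]
		}
	}
	return c, "", ""
}

func main() {
	path, op, value := parseConstraint("K8s.server.version>=1.28")
	fmt.Println(path, op, value) // K8s.server.version >= 1.28
}
```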

Narrower subsets per validator. A few validators accept only a subset of these operators, because the evaluator honors only one form: a broader operator would be silently reinterpreted as the honored form, so the validator rejects it with ErrCodeInvalidRequest at parse time instead. Current narrowings:

Validator / metric    Accepted operator  Rationale
inference-throughput  >= only            Evaluator enforces throughput >= threshold * 0.9 (10% tolerance); strict >, ==, !=, bare, and inverted forms are all coerced to the same check and would mislead recipe authors.
inference-ttft-p99    <= only            Evaluator enforces ttftP99 <= threshold * 1.1; same rationale as throughput, opposite direction.
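The tolerance the table describes is a fixed band around the declared threshold. A sketch of just that check (throughputPass/ttftP99Pass are hypothetical names; the tolerance factors are from the table above):

```go
package main

import "fmt"

// The narrowed evaluators apply a fixed tolerance around the declared
// threshold: throughput passes at >= 90% of the threshold, and TTFT p99
// passes at <= 110% of the threshold.
func throughputPass(measured, threshold float64) bool {
	return measured >= threshold*0.9
}

func ttftP99Pass(measuredMS, thresholdMS float64) bool {
	return measuredMS <= thresholdMS*1.1
}

func main() {
	fmt.Println(throughputPass(4600, 5000)) // true: 4600 >= 4500
	fmt.Println(ttftP99Pass(215, 200))      // true: 215 ms <= 220 ms
}
```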

Input Sources

File-based:

$ aicr validate --recipe recipe.yaml --snapshot snapshot.yaml

ConfigMap-based:

$ aicr validate \
>   --recipe recipe.yaml \
>   --snapshot cm://gpu-operator/aicr-snapshot

HTTP/HTTPS:

$ aicr validate \
>   --recipe https://example.com/recipe.yaml \
>   --snapshot https://example.com/snapshot.yaml

Validation Output

Results are output in CTRF (Common Test Report Format) JSON:

{
  "reportFormat": "CTRF",
  "specVersion": "0.0.1",
  "timestamp": "2026-03-10T20:10:44Z",
  "generatedBy": "aicr",
  "results": {
    "tool": { "name": "aicr", "version": "v0.10.3-next" },
    "summary": {
      "tests": 16, "passed": 13, "failed": 0, "skipped": 3,
      "pending": 0, "other": 0,
      "start": 1773173400872, "stop": 1773173799002
    },
    "tests": [
      {
        "name": "operator-health",
        "status": "passed",
        "duration": 0,
        "suite": ["deployment"],
        "stdout": ["Found 1 gpu-operator pod(s)", "Running: 1/1"]
      },
      {
        "name": "nccl-all-reduce-bw",
        "status": "passed",
        "duration": 234000,
        "suite": ["performance"],
        "stdout": ["NCCL All Reduce bandwidth: 488.37 GB/s", "Constraint: >= 100 → true"]
      },
      {
        "name": "inference-perf",
        "status": "passed",
        "duration": 612000,
        "suite": ["performance"],
        "stdout": [
          "RESULT: Inference throughput: 37961.24 tokens/sec",
          "RESULT: Inference TTFT p99: 146.30 ms",
          "Throughput constraint: >= 5000 → PASS",
          "TTFT p99 constraint: <= 200 → PASS"
        ]
      }
    ]
  }
}

CI/CD Integration

By default, the command exits with non-zero status on validation failures (ideal for CI/CD):

$ aicr validate \
>   --recipe recipe.yaml \
>   --snapshot cm://gpu-operator/aicr-snapshot

# Exit code: 0 = all passed, 1 = failures detected
# Use --fail-on-error=false for informational mode without failing

Stage 4: Bundle (Data Packaging)

Bundler Framework

┌────────────────────────────────────────────────────────┐
│ Bundle Generator │
├────────────────────────────────────────────────────────┤
│ │
│ RecipeResult → Bundler Registry → Parallel Execution │
│ │
│ ┌─────────────────┐ │
│ │ RecipeResult │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Get Component │ (GetComponentRef) │
│ │ ├─ Name │ │
│ │ ├─ Version │ │
│ │ └─ Values map │ (GetValuesForComponent) │
│ └────────┬────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ Parallel │ │
│ ├─────────────┤ │
│ ├─ GPU Operator │
│ │ ├─ values map → values.yaml │
│ │ ├─ values map → clusterpolicy.yaml │
│ │ └─ ScriptData → install.sh, README.md │
│ │ │
│ ├─ Network Operator │
│ │ ├─ values map → values.yaml │
│ │ └─ ScriptData → install.sh, README.md │
│ │ │
│ ├─ Cert-Manager │
│ │ └─ values map → values.yaml │
│ │ │
│ ├─ NVSentinel │
│ │ └─ values map → values.yaml │
│ │ │
│ └─ Nodewright │
│ ├─ values map → values.yaml │
│ └─ values map → nodewright-cr.yaml │
│ │
│ ┌────────▼────────┐ │
│ │ Template Engine │ (go:embed templates) │
│ │ ├─ values.yaml │ │
│ │ ├─ manifests/ │ │
│ │ └─ checksums.txt│ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Generate Files │ │
│ │ └─ checksums │ │
│ └─────────────────┘ │
│ │
└────────────────────────────────────────────────────────┘

Configuration Extraction

RecipeResult Pattern

Bundlers receive RecipeResult with component references and values maps:

// Get component reference and values from RecipeResult
component := input.GetComponentRef("gpu-operator")
values := input.GetValuesForComponent("gpu-operator")

// Values map contains nested configuration
// {
//   "driver": {"enabled": true, "version": "580.82.07"},
//   "mig": {"strategy": "single"},
//   "gds": {"enabled": false}
// }

Template Usage:

# Helm values.yaml - receives values map
driver:
  version: {{ index .Values "driver.version" }}

# README.md - receives combined map with Values + Script
Driver Version: {{ index .Values "driver.version" }}
Namespace: {{ .Script.Namespace }}

ScriptData for Metadata

// ScriptData struct for scripts and README metadata
type ScriptData struct {
	Timestamp        string
	Version          string
	Namespace        string
	HelmRepository   string
	HelmChartVersion string
}
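Rendering with the combined map can be sketched with the standard text/template package. The render helper and its map shapes are hypothetical; note that dotted keys like "driver.version" require the index builtin, while the Script fields are reachable with plain dot access:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render executes a template against the combined map described above:
// .Values (dotted keys, accessed via index) plus .Script (metadata).
func render(tmpl string, values map[string]any, script map[string]string) (string, error) {
	t, err := template.New("readme").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, map[string]any{"Values": values, "Script": script})
	return buf.String(), err
}

func main() {
	out, _ := render(
		`Driver Version: {{ index .Values "driver.version" }}, Namespace: {{ .Script.Namespace }}`,
		map[string]any{"driver.version": "580.82.07"},
		map[string]string{"Namespace": "gpu-operator"},
	)
	fmt.Println(out) // Driver Version: 580.82.07, Namespace: gpu-operator
}
```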

Bundle Structure

The deployer generates the final output structure. See Deployer-Specific Output for details per deployer type.

Stage 5: Deployment (GitOps Integration)

Deployer Framework

After bundlers generate artifacts, the deployer framework transforms them into deployment-specific formats based on the --deployer flag.

┌────────────────────────────────────────────────────────┐
│ Deployer Selection │
├────────────────────────────────────────────────────────┤
│ │
│ Bundle Artifacts + Recipe → Deployer → Output │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Bundle Output │ │ Recipe │ │
│ │ ├─ values.yaml │ │ deploymentOrder │ │
│ │ ├─ manifests/ │ │ componentRefs │ │
│ │ └─ scripts/ │ └────────┬────────┘ │
│ └────────┬────────┘ │ │
│ │ │ │
│ └───────────┬──────────┘ │
│ │ │
│ ┌────────────────────▼────────────────────┐ │
│ │ Deployer Selection (--deployer flag) │ │
│ │ │ │
│ │ ├─ helm (default) │ │
│ │ │ └─ Helm charts + README │ │
│ │ │ │ │
│ │ └─ argocd │ │
│ │ └─ Argo CD Application + sync-wave │ │
│ └─────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────┘

Deployment Order Flow

The deploymentOrder field in recipes specifies component deployment sequence. Each deployer implements ordering differently:

┌─────────────────────────────────────────────────────────┐
│ Deployment Order Processing │
├─────────────────────────────────────────────────────────┤
│ │
│ Recipe deploymentOrder: │
│ 1. cert-manager │
│ 2. gpu-operator │
│ 3. network-operator │
│ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ orderComponentsByDeployment() │ │
│ │ Sorts components based on deploymentOrder │ │
│ │ Returns: []orderedComponent{Name, Order} │ │
│ └───────────────────────┬──────────────────────────┘ │
│ │ │
│ ┌────────────────┴────────────────┐ │
│ ▼ ▼ │
│ ┌────────────┐ ┌────────────┐ │
│ │ Helm │ │ Argo CD │ │
│ │ Deployer │ │ Deployer │ │
│ │ (default) │ │ │ │
│ └──────┬─────┘ └──────┬─────┘ │
│ │ │ │
│ ▼ ▼ │
│ Per-component dirs sync-wave: │
│ + deploy.sh script - cert-manager:0 │
│ - gpu-operator:1 │
│ - network-op:2 │
│ │
└─────────────────────────────────────────────────────────┘
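orderComponentsByDeployment() in the diagram can be sketched as a rank-then-sort. A minimal version under assumed shapes (the real function's signature may differ; unlisted components are placed last here by choice):

```go
package main

import (
	"fmt"
	"sort"
)

type orderedComponent struct {
	Name  string
	Order int
}

// orderComponentsByDeployment sorts components by their position in the
// recipe's deploymentOrder list; components not listed sort after all
// ordered ones.
func orderComponentsByDeployment(components, deploymentOrder []string) []orderedComponent {
	rank := make(map[string]int, len(deploymentOrder))
	for i, name := range deploymentOrder {
		rank[name] = i
	}
	out := make([]orderedComponent, 0, len(components))
	for _, c := range components {
		r, ok := rank[c]
		if !ok {
			r = len(deploymentOrder) // unlisted → last
		}
		out = append(out, orderedComponent{Name: c, Order: r})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Order < out[j].Order })
	return out
}

func main() {
	ordered := orderComponentsByDeployment(
		[]string{"network-operator", "gpu-operator", "cert-manager"},
		[]string{"cert-manager", "gpu-operator", "network-operator"},
	)
	for _, c := range ordered {
		fmt.Println(c.Order, c.Name)
	}
}
```

The Helm deployer walks this slice to number the steps in deploy.sh; the Argo CD deployer reuses the Order value as the sync-wave annotation.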

Deployer-Specific Output

Helm Deployer (default):

bundle-output/
├── README.md # Root deployment guide with ordered steps
├── deploy.sh # Automation script (chmod +x)
├── recipe.yaml # Copy of the input recipe
├── checksums.txt # SHA256 checksums of all files
├── cert-manager/
│ ├── values.yaml # Component Helm values
│ └── README.md # Component install/upgrade/uninstall
├── gpu-operator/
│ ├── values.yaml # Component Helm values
│ ├── README.md # Component install/upgrade/uninstall
│ └── manifests/ # Optional manifest files
│ └── dcgm-exporter.yaml
└── network-operator/
├── values.yaml
└── README.md

Argo CD Deployer:

bundle-output/
├── app-of-apps.yaml # Parent Application (bundle root)
├── gpu-operator/
│ ├── values.yaml
│ ├── manifests/
│ └── argocd/
│ └── application.yaml # With sync-wave annotation
├── network-operator/
│ ├── values.yaml
│ └── argocd/
│ └── application.yaml # With sync-wave annotation
└── README.md

Argo CD Application with multi-source:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-operator
  annotations:
    argocd.argoproj.io/sync-wave: "1"  # After cert-manager (0)
spec:
  sources:
    # Helm chart from upstream
    - repoURL: https://helm.ngc.nvidia.com/nvidia
      targetRevision: v25.3.3
      chart: gpu-operator
      helm:
        valueFiles:
          - $values/gpu-operator/values.yaml
    # Values from GitOps repo
    - repoURL: <YOUR_GIT_REPO>
      targetRevision: main
      ref: values
    # Additional manifests (if present)
    - repoURL: <YOUR_GIT_REPO>
      targetRevision: main
      path: gpu-operator/manifests

Deployer Data Flow

┌──────────────────────────────────────────────────────────────┐
│ Complete Bundle + Deploy Flow │
├──────────────────────────────────────────────────────────────┤
│ │
│ aicr bundle -r recipe.yaml --deployer argocd \ │
│ --repo https://github.com/my-org/my-repo.git -o ./out │
│ │
│ 1. Parse recipe │
│ └─ Extract componentRefs + deploymentOrder │
│ │
│ 2. Order components │
│ └─ orderComponentsByDeployment() │
│ │
│ 3. Run bundlers (parallel) │
│ ├─ cert-manager → values.yaml, manifests/ │
│ ├─ gpu-operator → values.yaml, manifests/ │
│ └─ network-operator → values.yaml, manifests/ │
│ │
│ 4. Run deployer (argocd) → per-component argocd/ dirs │
│ ├─ cert-manager/argocd/application.yaml (wave: 0) │
│ ├─ gpu-operator/argocd/application.yaml (wave: 1) │
│ └─ network-operator/argocd/application.yaml (wave: 2) │
│ └─ app-of-apps.yaml (bundle root, uses --repo URL) │
│ │
│ 5. Generate checksums │
│ └─ checksums.txt for each component │
│ │
└──────────────────────────────────────────────────────────────┘

Data Serialization

Formats Supported

JSON:

{
  "apiVersion": "v1",
  "kind": "Recipe",
  "measurements": [...]
}

YAML:

apiVersion: v1
kind: Recipe
measurements:
  - type: K8s
    subtypes: [...]

Table (Human-readable):

TYPE SUBTYPE KEY VALUE
K8s image gpu-operator v25.3.3
K8s image driver 580.82.07
GPU driver version 580.82.07

Serialization Pipeline

Go Struct → Interface → Marshaler → Output Format

Measurement{
    Type:     "K8s",
    Subtypes: []Subtype{...},
}
        │  json.Marshal() / yaml.Marshal()
        ▼
{"type":"K8s","subtypes":[...]}

API Server Data Flow

Request Processing

HTTP Request → Middleware Chain → Handler → Response
1. Metrics Middleware (record request)
2. Version Middleware (check API version)
3. RequestID Middleware (add/echo request ID)
4. Panic Recovery (catch panics)
5. Rate Limit (100 req/s)
6. Logging (structured logs)
7. Handler:
├─ Parse query parameters
├─ Build Query
├─ recipe.Builder.Build(ctx, query)
├─ Serialize response
└─ Return JSON

Response Headers

HTTP/1.1 200 OK
Content-Type: application/json
X-Request-Id: 550e8400-e29b-41d4-a716-446655440000
Cache-Control: public, max-age=300
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1735650000
{recipe JSON}

Data Storage

Embedded Data

Recipe Data:

  • Location: recipes/overlays/*.yaml (including base.yaml), recipes/mixins/*.yaml
  • Embedded at compile time via //go:embed directives
  • Loaded once per process, cached in memory
  • TTL: 5 minutes (in-memory cache)

Bundle Templates:

  • Location: pkg/bundler/*/templates/*.tmpl
  • Embedded at compile time: //go:embed templates/*.tmpl
  • Parsed once per bundler initialization

No External Dependencies:

  • No database
  • No configuration files
  • No network calls (except Kubernetes API for snapshots)
  • Fully self-contained binaries

Performance Characteristics

Snapshot Collection

  • Parallel: All collectors run concurrently
  • Timeout: 30 seconds per collector
  • Memory: ~10-50MB depending on cluster size
  • Duration: 1-5 seconds typical

Recipe Generation

  • Cached: Recipe data cached in memory (5min TTL)
  • Overlay Matching: O(n) where n = number of overlays
  • Memory: <1MB per request
  • Duration: <100ms typical (in-memory only)

Bundle Generation

  • Parallel: All bundlers run concurrently
  • Template Rendering: Minimal overhead (<10ms per template)
  • File I/O: ~10-20 files per bundler
  • Duration: <1 second typical

API Server

  • Concurrency: 100 req/s sustained, 200 burst
  • Latency: p50: 50ms, p95: 150ms, p99: 300ms
  • Memory: ~100MB baseline + 1MB per concurrent request
  • CPU: Minimal (<5% single core at 100 req/s)

Data Validation

Input Validation

Query Parameters:

  • Type checking (string, int, bool)
  • Enum validation (eks, gke, aks, etc.)
  • Version format validation (regex)
  • Range validation (if applicable)

Snapshot Files:

  • YAML/JSON schema validation
  • Required fields presence
  • Type consistency
  • Measurement structure validation

Output Validation

Recipes:

  • Valid apiVersion and kind
  • Metadata with version and timestamp
  • Criteria properly populated
  • ComponentRefs have required fields (name, version)

Bundles:

  • All required files generated
  • Templates rendered successfully
  • Checksums computed
  • File permissions correct (scripts executable)
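Checksum computation is plain SHA-256 over file contents. A sketch of one checksums.txt entry in the conventional sha256sum layout (checksumLine is a hypothetical helper; the real file list is whatever the bundlers emitted):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// checksumLine formats one checksums.txt entry:
// "<64-hex-digest>  <filename>" (two spaces, sha256sum convention).
func checksumLine(name string, contents []byte) string {
	return fmt.Sprintf("%x  %s", sha256.Sum256(contents), name)
}

func main() {
	fmt.Println(checksumLine("values.yaml", []byte("driver:\n  version: 580.82.07\n")))
}
```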

See Also