Data Flow Architecture
Data transformations in the four-stage workflow.
Overview
Data flows through four stages: Snapshot, Recipe, Validate, and Bundle.
Each stage transforms input data into a different format:
- Snapshot: Captures raw system state (OS, GPU, Kubernetes, SystemD)
- Recipe: Generates configuration recommendations by matching query parameters against overlay rules
- Validate: Checks recipe constraints against actual system measurements
- Bundle: Produces deployment artifacts (Helm values, manifests, scripts)
Stage 1: Snapshot (Data Capture)
Input Sources
SystemD Services:
- Source: systemctl show containerd.service
- Data: Service configuration, resource limits, cgroup delegates
- Format: Key-value pairs from SystemD properties
OS Configuration:
- grub: /proc/cmdline - Boot parameters
- kmod: /proc/modules - Loaded kernel modules
- sysctl: /proc/sys/**/* - Kernel runtime parameters
- release: /etc/os-release - OS identification
Kubernetes Cluster:
- Source: Kubernetes API via client-go
- server: Version info from /version endpoint
- image: Container images from all pods across namespaces
- policy: GPU Operator ClusterPolicy custom resource
GPU Hardware:
- Source: nvidia-smi command-line tool
- Data: Driver version, CUDA version, MIG settings, device info
- Format: Parsed XML/text output
Snapshot Data Structure
Output Destinations:
- File: aicr snapshot --output system.yaml
- Stdout: aicr snapshot (default, pipe to other commands)
- ConfigMap: aicr snapshot --output cm://namespace/name (Kubernetes-native)
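The cm://namespace/name destination form can be split into its two parts with the standard URL parser. The following is a minimal sketch, not the tool's actual parser: the function name ParseConfigMapURI, the error wording, and the example namespace and name in the usage note are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ParseConfigMapURI splits a cm://namespace/name destination into its parts.
// The function name and error messages are hypothetical, not taken from aicr.
func ParseConfigMapURI(raw string) (namespace, name string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "cm" {
		return "", "", fmt.Errorf("unsupported scheme %q, want cm", u.Scheme)
	}
	// In cm://namespace/name the namespace lands in the host position
	// and the ConfigMap name is the single path segment.
	namespace = u.Host
	name = strings.TrimPrefix(u.Path, "/")
	if namespace == "" || name == "" || strings.Contains(name, "/") {
		return "", "", fmt.Errorf("want cm://namespace/name, got %q", raw)
	}
	return namespace, name, nil
}
```

Calling ParseConfigMapURI("cm://gpu-operator/aicr-snapshot") yields the namespace gpu-operator and the name aicr-snapshot, while a destination without a name segment is rejected.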
ConfigMap Storage Pattern:
Agent Deployment:
Kubernetes Job writes snapshots directly to ConfigMap without volumes:
Reading Interface:
Collection Process
Parallel Collection
Context Propagation:
- All collectors respect context cancellation
- First error cancels remaining operations
- Timeout: 30 seconds per collector
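The three propagation rules above (cancellation is respected, the first error cancels the rest, and each collector has its own timeout) can be sketched with the standard library alone. This is an illustration under assumptions: the Collector type, CollectAll, Snapshot, and the Fixed, Failing, and Blocked helpers are hypothetical names, and the real collectors may be wired differently.

```go
package main

import (
	"context"
	"errors"
	"sync"
	"time"
)

// Collector is a hypothetical stand-in for one snapshot source (OS, GPU, ...).
type Collector func(ctx context.Context) (string, error)

// Fixed returns a collector that succeeds immediately with value.
func Fixed(value string) Collector {
	return func(context.Context) (string, error) { return value, nil }
}

// Failing returns a collector that fails immediately with msg.
func Failing(msg string) Collector {
	return func(context.Context) (string, error) { return "", errors.New(msg) }
}

// Blocked returns a collector that waits until its context ends.
func Blocked() Collector {
	return func(ctx context.Context) (string, error) {
		<-ctx.Done()
		return "", ctx.Err()
	}
}

// CollectAll runs every collector concurrently. Each collector gets its own
// timeout, and the first error cancels the context shared by the others.
func CollectAll(parent context.Context, perCollector time.Duration, cs map[string]Collector) (map[string]string, error) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
		out      = make(map[string]string, len(cs))
	)
	for name, c := range cs {
		wg.Add(1)
		go func(name string, c Collector) {
			defer wg.Done()
			cctx, ccancel := context.WithTimeout(ctx, perCollector)
			defer ccancel()
			data, err := c(cctx)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
					cancel() // stop the remaining collectors
				}
				return
			}
			out[name] = data
		}(name, c)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return out, nil
}

// Snapshot runs CollectAll with a background context.
func Snapshot(perCollector time.Duration, cs map[string]Collector) (map[string]string, error) {
	return CollectAll(context.Background(), perCollector, cs)
}
```

With a 30-second budget, Snapshot(30*time.Second, collectors) returns either the full result map or the first error. A collector that ignores its context would still delay the return, which is why every collector needs to honor cancellation.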
Stage 2: Recipe (Data Optimization)
Recipe Input Options
Query Mode - Direct generation from parameters:
Snapshot Mode (File) - Analyze captured snapshot:
Snapshot Mode (ConfigMap) - Read from Kubernetes:
Query Extraction (Snapshot Mode)
When a snapshot is provided, the recipe builder extracts query parameters:
Extraction mapping
Recipe Generation
Inheritance Chain Resolution
When a query matches a leaf recipe that has a spec.base reference, the system resolves the full inheritance chain before merging:
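One way to picture chain resolution is to follow spec.base links from the leaf up to the root, then merge root-first so the leaf wins. The sketch below assumes a simplified Recipe type with flat string values; the type, its fields, and the cycle check are illustrative and not taken from the actual recipe schema.

```go
package main

import "fmt"

// Recipe is a hypothetical, trimmed-down recipe: a name, an optional
// spec.base reference, and flat key/value settings.
type Recipe struct {
	Name   string
	Base   string // empty for the root recipe
	Values map[string]string
}

// ResolveChain walks spec.base links from the leaf to the root and returns
// the chain ordered root-first, so later entries override earlier ones.
func ResolveChain(all map[string]Recipe, leaf string) ([]Recipe, error) {
	var chain []Recipe
	seen := map[string]bool{}
	for name := leaf; name != ""; {
		if seen[name] {
			return nil, fmt.Errorf("inheritance cycle at %q", name)
		}
		seen[name] = true
		r, ok := all[name]
		if !ok {
			return nil, fmt.Errorf("unknown recipe %q", name)
		}
		chain = append([]Recipe{r}, chain...) // prepend: root ends up first
		name = r.Base
	}
	return chain, nil
}

// Merge applies the chain in order; leaf values win over base values.
func Merge(chain []Recipe) map[string]string {
	out := map[string]string{}
	for _, r := range chain {
		for k, v := range r.Values {
			out[k] = v
		}
	}
	return out
}
```

Resolving the whole chain before merging keeps the override order explicit: a value set in an intermediate recipe is visible to the leaf but can still be replaced by it.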
Base and Overlay Merging
Overlay Matching Algorithm
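As a rough illustration of matching query parameters against overlay rules, the sketch below treats an overlay as applicable when every criterion it specifies equals the corresponding query value, and treats an unset criterion as a wildcard. The Criteria fields, the wildcard rule, and the overlay names in the usage note are assumptions, not the documented algorithm.

```go
package main

// Criteria holds query parameters. The field names are hypothetical.
// In an overlay, an empty field means "matches any value".
type Criteria struct {
	Service     string
	Accelerator string
	Intent      string
}

// Overlay pairs a name with the criteria it applies to.
type Overlay struct {
	Name  string
	Match Criteria
}

// fieldMatches treats an empty rule as a wildcard.
func fieldMatches(rule, query string) bool {
	return rule == "" || rule == query
}

// Matches reports whether every criterion the overlay sets equals the query.
func (o Overlay) Matches(q Criteria) bool {
	return fieldMatches(o.Match.Service, q.Service) &&
		fieldMatches(o.Match.Accelerator, q.Accelerator) &&
		fieldMatches(o.Match.Intent, q.Intent)
}

// MatchOverlays scans the overlays once and returns the names that apply,
// in declaration order.
func MatchOverlays(overlays []Overlay, q Criteria) []string {
	var applied []string
	for _, o := range overlays {
		if o.Matches(q) {
			applied = append(applied, o.Name)
		}
	}
	return applied
}
```

For a query with Service set to eks, an overlay with no criteria and an overlay restricted to eks both apply, while one restricted to gke does not. The single pass over the overlay list is linear in the number of overlays.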
Recipe Data Structure
Applied Overlays Example (with inheritance):
Stage 3: Validate (Constraint Checking)
Validation Process
The validate stage compares recipe constraints against actual measurements from a cluster snapshot.
Constraint Path Format
Constraints use fully qualified paths: {Type}.{Subtype}.{Key}
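A path in this form can be split into its three parts as sketched below. Two details are assumptions rather than documented behavior: the key is allowed to contain further dots (as sysctl names do), so only the first two dots act as separators, and the example path in the usage note is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ConstraintPath is the parsed form of {Type}.{Subtype}.{Key}.
type ConstraintPath struct {
	Type    string
	Subtype string
	Key     string
}

// ParseConstraintPath splits on the first two dots only, so the key may
// itself contain dots. That choice is an assumption of this sketch.
func ParseConstraintPath(p string) (ConstraintPath, error) {
	parts := strings.SplitN(p, ".", 3)
	if len(parts) != 3 {
		return ConstraintPath{}, fmt.Errorf("constraint path %q: want {Type}.{Subtype}.{Key}", p)
	}
	for _, s := range parts {
		if s == "" {
			return ConstraintPath{}, fmt.Errorf("constraint path %q: empty segment", p)
		}
	}
	return ConstraintPath{Type: parts[0], Subtype: parts[1], Key: parts[2]}, nil
}
```

Under these assumptions, OS.sysctl.net.ipv4.ip_forward parses to type OS, subtype sysctl, and key net.ipv4.ip_forward, and a two-segment path is rejected.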
Supported Operators
Narrower subsets per validator. A small number of validators accept only a subset of these operators, because the evaluator honors only that form. A broader operator would be silently reinterpreted as the honored form, so the validator rejects it with ErrCodeInvalidRequest at parse time instead. Current narrowings:
Input Sources
File-based:
ConfigMap-based:
HTTP/HTTPS:
Validation Output
Results are output in CTRF (Common Test Report Format) JSON:
CI/CD Integration
By default, the command exits with non-zero status on validation failures (ideal for CI/CD):
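A CI step that consumes the report can derive its own pass/fail status from the CTRF summary. The sketch below reads only the results.summary counters defined by CTRF; the ExitCode function and the choice of 2 for an unreadable report are assumptions, not the command's documented exit codes.

```go
package main

import "encoding/json"

// ctrfReport models only the part of a CTRF document this sketch needs.
type ctrfReport struct {
	Results struct {
		Summary struct {
			Tests  int `json:"tests"`
			Passed int `json:"passed"`
			Failed int `json:"failed"`
		} `json:"summary"`
	} `json:"results"`
}

// ExitCode maps a CTRF report to a process exit status:
// 0 when nothing failed, 1 when at least one test failed,
// and 2 (an assumption of this sketch) when the report cannot be parsed.
func ExitCode(report []byte) int {
	var r ctrfReport
	if err := json.Unmarshal(report, &r); err != nil {
		return 2
	}
	if r.Results.Summary.Failed > 0 {
		return 1
	}
	return 0
}
```

A pipeline could pass the result to os.Exit so that a failed constraint stops the job.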
Stage 4: Bundle (Data Packaging)
Bundler Framework
Configuration Extraction
RecipeResult Pattern
Bundlers receive RecipeResult with component references and values maps:
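To make the shape concrete, the sketch below models a RecipeResult as a list of component references plus a values map per component, and renders a template against one component. Only the name and version fields are grounded in this document; the Values layout, the Render helper, and the template data keys Component and Values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// ComponentRef carries the required name and version fields.
type ComponentRef struct {
	Name    string
	Version string
}

// RecipeResult is a hypothetical layout: component references plus one
// values map per component, keyed by component name.
type RecipeResult struct {
	ComponentRefs []ComponentRef
	Values        map[string]map[string]any
}

// Render executes tmplText for a single component. Missing values are an
// error rather than rendering as "<no value>".
func Render(tmplText string, rr RecipeResult, component string) (string, error) {
	var ref *ComponentRef
	for i := range rr.ComponentRefs {
		if rr.ComponentRefs[i].Name == component {
			ref = &rr.ComponentRefs[i]
			break
		}
	}
	if ref == nil {
		return "", fmt.Errorf("component %q not in recipe", component)
	}
	tmpl, err := template.New(component).Option("missingkey=error").Parse(tmplText)
	if err != nil {
		return "", err
	}
	data := map[string]any{"Component": *ref, "Values": rr.Values[component]}
	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

Setting missingkey=error is a design choice of this sketch: a template that references a value the recipe did not provide fails at render time instead of emitting a placeholder into a manifest.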
Template Usage:
ScriptData for Metadata
Bundle Structure
The deployer generates the final output structure. See Deployer-Specific Output for details per deployer type.
Stage 5: Deployment (GitOps Integration)
Deployer Framework
After bundlers generate artifacts, the deployer framework transforms them into deployment-specific formats based on the --deployer flag.
Deployment Order Flow
The deploymentOrder field in recipes specifies component deployment sequence. Each deployer implements ordering differently:
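Whatever mechanism a deployer uses, the common first step is to arrange components by their position in deploymentOrder. A minimal sketch of that step follows; placing components that are absent from deploymentOrder at the end, in their original order, is an assumption, and the component names in the usage note are examples only.

```go
package main

import "sort"

// OrderComponents returns components sorted by their position in
// deploymentOrder. Components not listed there keep their relative order
// and are placed last (an assumption of this sketch).
func OrderComponents(components, deploymentOrder []string) []string {
	rank := make(map[string]int, len(deploymentOrder))
	for i, name := range deploymentOrder {
		rank[name] = i
	}
	out := append([]string(nil), components...) // do not mutate the input
	sort.SliceStable(out, func(i, j int) bool {
		ri, iListed := rank[out[i]]
		rj, jListed := rank[out[j]]
		if iListed && jListed {
			return ri < rj
		}
		return iListed && !jListed
	})
	return out
}
```

Given the order cert-manager, gpu-operator, network-operator, a component list that starts with network-operator is rearranged so cert-manager comes first, with any unlisted component left at the end.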
Deployer-Specific Output
Helm Deployer (default):
Argo CD Deployer:
Argo CD Application with multi-source:
Deployer Data Flow
Data Serialization
Formats Supported
JSON:
YAML:
Table (Human-readable):
Serialization Pipeline
API Server Data Flow
Request Processing
Response Headers
Data Storage
Embedded Data
Recipe Data:
- Location: recipes/overlays/*.yaml (including base.yaml), recipes/mixins/*.yaml
- Embedded at compile time via //go:embed directives
- Loaded once per process, cached in memory
- TTL: 5 minutes (in-memory cache)
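The caching behavior listed above can be approximated by a small time-bounded cache around a loader function. This is a sketch only: TTLCache, NewTTLCache, and the string payload are hypothetical, and the loader stands in for parsing the embedded recipe files.

```go
package main

import (
	"sync"
	"time"
)

// TTLCache keeps one loaded value in memory and reloads it after ttl.
// The type and its loader signature are hypothetical.
type TTLCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	load    func() (string, error)
	value   string
	expires time.Time
	loaded  bool
}

// NewTTLCache wraps load, which is called lazily on the first Get.
func NewTTLCache(ttl time.Duration, load func() (string, error)) *TTLCache {
	return &TTLCache{ttl: ttl, load: load}
}

// Get returns the cached value while it is fresh and reloads it otherwise.
// A failed reload returns the error and leaves the previous entry untouched.
func (c *TTLCache) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	if c.loaded && now.Before(c.expires) {
		return c.value, nil
	}
	v, err := c.load()
	if err != nil {
		return "", err
	}
	c.value, c.expires, c.loaded = v, now.Add(c.ttl), true
	return v, nil
}
```

With NewTTLCache(5*time.Minute, loader), repeated calls to Get within five minutes invoke the loader once. Holding the mutex during the load also prevents concurrent requests from triggering duplicate loads.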
Bundle Templates:
- Location: pkg/bundler/*/templates/*.tmpl
- Embedded at compile time: //go:embed templates/*.tmpl
- Parsed once per bundler initialization
No External Dependencies:
- No database
- No configuration files
- No network calls (except Kubernetes API for snapshots)
- Fully self-contained binaries
Performance Characteristics
Snapshot Collection
- Parallel: All collectors run concurrently
- Timeout: 30 seconds per collector
- Memory: ~10-50MB depending on cluster size
- Duration: 1-5 seconds typical
Recipe Generation
- Cached: Recipe data cached in memory (5min TTL)
- Overlay Matching: O(n) where n = number of overlays
- Memory: <1MB per request
- Duration: <100ms typical (in-memory only)
Bundle Generation
- Parallel: All bundlers run concurrently
- Template Rendering: Minimal overhead (<10ms per template)
- File I/O: ~10-20 files per bundler
- Duration: <1 second typical
API Server
- Concurrency: 100 req/s sustained, 200 burst
- Latency: p50: 50ms, p95: 150ms, p99: 300ms
- Memory: ~100MB baseline + 1MB per concurrent request
- CPU: Minimal (<5% single core at 100 req/s)
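A sustained rate with a separate burst allowance is the profile a token bucket produces. Whether the API server enforces its limits this way is an assumption; the sketch below only shows how figures such as 100 req/s sustained and 200 burst map onto a bucket, using hypothetical TokenBucket and NewTokenBucket names.

```go
package main

import (
	"sync"
	"time"
)

// TokenBucket admits up to burst requests at once and refills at rate
// tokens per second. It is an illustration, not the server's limiter.
type TokenBucket struct {
	mu     sync.Mutex
	rate   float64 // tokens added per second
	burst  float64 // bucket capacity
	tokens float64
	last   time.Time
}

// NewTokenBucket starts with a full bucket.
func NewTokenBucket(rate, burst float64) *TokenBucket {
	return &TokenBucket{rate: rate, burst: burst, tokens: burst, last: time.Now()}
}

// Allow refills the bucket for the elapsed time, then spends one token
// if one is available.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```

NewTokenBucket(100, 200) would admit a burst of 200 requests immediately and then roughly 100 more per second; a handler would typically answer a rejected request with HTTP 429.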
Data Validation
Input Validation
Query Parameters:
- Type checking (string, int, bool)
- Enum validation (eks, gke, aks, etc.)
- Version format validation (regex)
- Range validation (if applicable)
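The checks listed above could look roughly like the following. The accepted service set is limited to the three values named here, and the version pattern, the nodes parameter, and its lower bound are assumptions chosen for illustration rather than the tool's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// knownServices covers only the enum values named in this document.
	knownServices = map[string]bool{"eks": true, "gke": true, "aks": true}
	// versionPattern accepts forms like 1.33, 1.33.2, or v1.33.2 (assumed).
	versionPattern = regexp.MustCompile(`^v?\d+\.\d+(\.\d+)?$`)
)

// ValidateQuery applies enum, format, and range checks in that order and
// returns the first problem found. The nodes parameter is hypothetical.
func ValidateQuery(service, k8sVersion string, nodes int) error {
	if !knownServices[service] {
		return fmt.Errorf("service %q is not one of eks, gke, aks", service)
	}
	if !versionPattern.MatchString(k8sVersion) {
		return fmt.Errorf("version %q does not look like MAJOR.MINOR[.PATCH]", k8sVersion)
	}
	if nodes < 1 {
		return fmt.Errorf("nodes must be at least 1, got %d", nodes)
	}
	return nil
}
```

Type checking is handled by the function signature in this sketch; a query-string front end would convert and reject malformed integers before calling it.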
Snapshot Files:
- YAML/JSON schema validation
- Required fields presence
- Type consistency
- Measurement structure validation
Output Validation
Recipes:
- Valid apiVersion and kind
- Metadata with version and timestamp
- Criteria properly populated
- ComponentRefs have required fields (name, version)
Bundles:
- All required files generated
- Templates rendered successfully
- Checksums computed
- File permissions correct (scripts executable)
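For the checksum step, one plausible form is a sha256sum-style listing over the generated files. The hash algorithm, the line format, and the Checksums function are assumptions; the document does not state which digest the bundler uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
	"strings"
)

// Checksums renders one "<hex digest>  <path>" line per file, sorted by
// path so the output is stable across runs. SHA-256 is an assumption.
func Checksums(files map[string][]byte) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	for _, p := range paths {
		sum := sha256.Sum256(files[p])
		b.WriteString(hex.EncodeToString(sum[:]))
		b.WriteString("  ")
		b.WriteString(p)
		b.WriteString("\n")
	}
	return b.String()
}
```

Sorting the paths matters because Go map iteration order is randomized; without it, two runs over identical files could produce differently ordered listings.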
See Also
- Data Architecture - Recipe data architecture
- API Reference - API endpoint details
- Automation - CI/CD integration patterns
- CONTRIBUTING.md - Developer guide