# Production Considerations
This page covers operational guidance for running the AI-Q blueprint in production environments.
## Database

### Use Managed PostgreSQL

The default compose stack includes a PostgreSQL container, but for production workloads consider a managed database service:

- Amazon RDS for PostgreSQL
- Google Cloud SQL for PostgreSQL
- Azure Database for PostgreSQL

Set the following environment variables to point to your managed database:
| Variable | Driver | Example |
|---|---|---|
|  |  |  |
|  |  |  |
|  |  |  |
### Database Initialization

When using a managed database, you must run the initialization SQL manually (or as a migration step), since the `init-db.sql` Docker entrypoint script only executes on a fresh PostgreSQL container volume. The script:

- Creates the `aiq_checkpoints` database.
- Grants permissions to the application user.
- Creates the `job_info` table with performance indices in `aiq_jobs`.

Refer to `deploy/compose/init-db.sql` for the full schema.
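As a sketch, the script can be applied to a managed instance with `psql`; the host and admin credentials below are placeholders for your environment, not values from the blueprint:

```
# One-time initialization against a managed instance.
# Replace the host and credentials with your managed database's values.
psql "host=<your-db-host> port=5432 user=postgres dbname=postgres" \
  -f deploy/compose/init-db.sql
```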
### Backup Strategy

Back up the following databases regularly:

- `aiq_jobs` – Contains the `job_info` table (job metadata) and the `job_events` table (event stream). This is the critical operational data store.
- `aiq_checkpoints` – Contains LangGraph agent state checkpoints, which allow resumption of interrupted research workflows.

For managed databases, enable automated daily backups with at least 7 days of retention. For self-managed PostgreSQL, run `pg_dump` on a schedule:
```shell
pg_dump -U aiq -d aiq_jobs > aiq_jobs_$(date +%Y%m%d).sql
pg_dump -U aiq -d aiq_checkpoints > aiq_checkpoints_$(date +%Y%m%d).sql
```
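These commands can be scheduled with cron. A crontab sketch, assuming a `/var/backups/aiq` directory; note that `%` must be escaped in crontab entries, since cron otherwise treats it as a newline:

```
# Nightly dumps at 02:00 (backup directory is an assumption).
0 2 * * * pg_dump -U aiq -d aiq_jobs > /var/backups/aiq/aiq_jobs_$(date +\%Y\%m\%d).sql
0 2 * * * pg_dump -U aiq -d aiq_checkpoints > /var/backups/aiq/aiq_checkpoints_$(date +\%Y\%m\%d).sql
```

Pair the schedule with a retention job (for example, `find /var/backups/aiq -name '*.sql' -mtime +7 -delete`) to match the 7-day retention recommended above.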
## Scaling

### Horizontal Backend Scaling
The backend is stateless apart from database connections, so it can be horizontally scaled behind a load balancer.
Docker Compose: Run multiple backend containers by scaling the service and using a reverse proxy (such as Traefik or NGINX) in front:
```shell
docker compose --env-file ../.env -f docker-compose.yaml up -d --scale aiq-agent=3
```
Note that each scaled instance starts its own embedded Dask scheduler and worker. For a shared Dask cluster, deploy Dask separately and set `NAT_DASK_SCHEDULER_ADDRESS` to point to the external scheduler.
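For example, a shared scheduler could be wired in via `deploy/.env`; the hostname below is an assumption, though 8786 is Dask's default scheduler port:

```shell
# Point every backend replica at one external Dask scheduler.
NAT_DASK_SCHEDULER_ADDRESS=tcp://dask-scheduler:8786
```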
### Dask Workers
Each backend container runs an embedded Dask scheduler with a configurable number of workers and threads:
| Variable | Default | Guidance |
|---|---|---|
|  |  | Increase for higher job throughput. Each worker consumes memory proportional to the research workflow depth. |
|  |  | Increase for I/O-bound workloads (web searches, API calls). |
### Resource Requirements
Deep research workflows are memory- and compute-intensive due to multi-phase LLM calls. Recommended minimums:
| Component | CPU | Memory | Notes |
|---|---|---|---|
| Backend | 2 cores | 4 GB | Increase for deep research or multiple concurrent users. |
| Frontend | 0.5 cores | 512 MB | Lightweight Next.js server. |
| PostgreSQL | 1 core | 2 GB | Increase for high write throughput. |
## Security

### Non-Root Execution

The Docker image runs as a non-root user (`aiq`, UID 1000) in both the dev and release targets. The NVIDIA distroless base image has no shell and no package manager, reducing the attack surface.
### Read-Only Configuration Mounts

The compose stack mounts `configs/` as read-only (`:ro`), preventing the application from modifying its own configuration at runtime.
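A minimal compose fragment illustrating the pattern; the service name and container path here are assumptions, so check the actual definitions under `deploy/compose/`:

```yaml
services:
  aiq-agent:
    volumes:
      # :ro makes the mount read-only inside the container.
      - ./configs:/app/configs:ro
```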
### Secrets Management

Store API keys in `deploy/.env` and ensure the file is not committed to version control (it is listed in `.gitignore`). Never embed keys in configuration files or Dockerfiles.
## Monitoring

### Health Endpoint

The backend exposes a health endpoint at `/health` for liveness and readiness probes:
```shell
curl http://localhost:8000/health
```
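If you deploy on Kubernetes, the endpoint maps onto standard HTTP probes. A sketch (port and timings are assumptions); `httpGet` probes also suit the distroless image, since the kubelet issues the request itself rather than executing a command inside the shell-less container:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
```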
### Log Tailing

Backend logs show agent execution, tool calls, LLM interactions, and job lifecycle events:
```shell
docker logs aiq-agent -f
```
Set `LOG_LEVEL=DEBUG` for verbose output during troubleshooting. Use `LOG_LEVEL=WARNING` in production to reduce log volume.
### Tracing
The backend supports OpenTelemetry-compatible tracing. See Observability for setup guides covering Phoenix, LangSmith, Weave, and the OTEL Collector with privacy redaction.
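As a sketch, a collector endpoint is typically supplied through the standard OpenTelemetry environment variables; whether the backend reads these directly depends on the exporter setup described in the Observability guide, and the collector hostname is an assumption:

```shell
# Standard OTLP exporter settings (4317 is the default OTLP gRPC port).
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=aiq-agent
```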
### Metrics to Watch
| Metric | Source | What to look for |
|---|---|---|
| Backend response time | Health endpoint, access logs | Increasing latency indicates resource pressure or LLM API slowdowns. |
| Job queue depth |  | Growing backlog means Dask workers cannot keep up. |
| Database connections | PostgreSQL | Connection exhaustion from too many backend replicas. |
| Container restarts | Docker | Frequent restarts indicate OOM kills or startup failures. |
| Dask worker memory | Dask dashboard (port 8787) | Memory growth in workers during deep research. |