PostgreSQL Provider for NVSentinel
Overview
NVSentinel supports PostgreSQL as an alternative datastore to MongoDB. The PostgreSQL provider offers the same functionality as the MongoDB provider on a relational backend, which gives you more flexibility in deployment.
Features
- Full feature parity with MongoDB provider
- TLS/SSL support with client certificate authentication
- Change stream emulation using triggers and polling
- JSONB storage for flexible document-like data structures
- Automatic schema management with tables, indexes, and triggers
- Production-ready with connection pooling and error handling
When to Use PostgreSQL
Choose PostgreSQL if you:
- Already have PostgreSQL infrastructure in your organization
- Prefer relational databases for operational reasons
- Need to leverage existing PostgreSQL expertise
- Want to use PostgreSQL-specific features or tools
- Have compliance requirements that favor PostgreSQL
Architecture
Database Schema
The PostgreSQL provider uses the following tables:
- health_events - Stores GPU and system health events
- maintenance_events - Stores CSP maintenance events
- datastore_changelog - Tracks changes for change stream emulation
- resume_tokens - Stores resume tokens for change stream clients
Each table stores the full document in a JSONB column and extracts key fields into regular columns for indexing.
Change Stream Emulation
PostgreSQL doesn’t have native change streams like MongoDB. The provider emulates them using:
- Triggers on tables to capture INSERT/UPDATE/DELETE operations
- Changelog table to store change events
- Polling mechanism to detect new changes
- Resume tokens to track client progress and enable resumption
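The interplay between the changelog, polling, and resume tokens can be sketched in a few lines of Go. This is a minimal in-memory illustration, not the provider's implementation: the ChangeEvent and Changelog types, the Append and Poll methods, and the use of an integer sequence number as the resume token are all assumptions made for the example. In the real provider the changelog lives in the datastore_changelog table and is populated by triggers.

```go
package main

import "fmt"

// ChangeEvent is a hypothetical stand-in for one changelog row.
type ChangeEvent struct {
	ID        int64  // increasing sequence number, used as the resume token
	Table     string // table the change happened on
	Operation string // "INSERT", "UPDATE", or "DELETE"
}

// Changelog is an in-memory stand-in for the changelog table.
type Changelog struct {
	events []ChangeEvent
}

// Append plays the role of the trigger: every write adds one changelog row.
func (c *Changelog) Append(table, operation string) {
	id := int64(len(c.events) + 1)
	c.events = append(c.events, ChangeEvent{ID: id, Table: table, Operation: operation})
}

// Poll returns all events newer than resumeToken, plus the token the client
// should store so that the next poll continues where this one stopped.
func (c *Changelog) Poll(resumeToken int64) ([]ChangeEvent, int64) {
	next := resumeToken
	var fresh []ChangeEvent
	for _, e := range c.events {
		if e.ID > resumeToken {
			fresh = append(fresh, e)
			next = e.ID
		}
	}
	return fresh, next
}

func main() {
	changelog := &Changelog{}
	changelog.Append("health_events", "INSERT")
	changelog.Append("health_events", "UPDATE")

	// First poll starts from token 0 and sees everything written so far.
	events, token := changelog.Poll(0)
	fmt.Println("first poll:", len(events), "events, token", token)

	// A later poll with the stored token sees only the new change.
	changelog.Append("maintenance_events", "INSERT")
	events, token = changelog.Poll(token)
	fmt.Println("second poll:", len(events), "events, token", token)
}
```

Because the client persists the token after each poll, a restarted client can resume from its last position instead of replaying the whole changelog.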
Configuration
Helm Chart Deployment
Using values-postgresql.yaml
Deploy NVSentinel with PostgreSQL using the provided values file:
Custom Configuration
Create your own values file with PostgreSQL settings:
TLS/SSL Configuration
TLS secures connections between NVSentinel components and PostgreSQL. The Helm chart automatically configures:
- cert-manager resources for certificate generation
- Client certificates for application authentication
- Server certificates for PostgreSQL TLS
- InitContainers to fix certificate permissions
SSL Modes
- disable - No SSL (not recommended for production)
- require - SSL required but no certificate verification
- verify-ca - Verify server certificate against CA
- verify-full - Full verification including hostname (recommended)
Environment Variables
The provider recognizes these environment variables:
- DATASTORE_PROVIDER - Set to “postgresql”
- DATASTORE_HOST - PostgreSQL hostname
- DATASTORE_PORT - PostgreSQL port (default: 5432)
- DATASTORE_DATABASE - Database name
- DATASTORE_USERNAME - Database username
- DATASTORE_SSLMODE - SSL mode
- DATASTORE_SSLCERT - Client certificate path
- DATASTORE_SSLKEY - Client key path
- DATASTORE_SSLROOTCERT - CA certificate path
- POSTGRESQL_CLIENT_CERT_MOUNT_PATH - Certificate directory path
MONGODB_CLIENT_CERT_MOUNT_PATH is also accepted in place of POSTGRESQL_CLIENT_CERT_MOUNT_PATH for backward compatibility; the SDK handles both.
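To make the role of these variables concrete, the following sketch shows one way they could be mapped onto a libpq-style keyword/value connection string. It is an illustration under stated assumptions, not the provider's actual code: the buildDSN and quote helpers are hypothetical, and how the provider really assembles its connection parameters is not described here. The keywords themselves (host, port, dbname, user, sslmode, sslcert, sslkey, sslrootcert) are standard PostgreSQL connection parameters.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// escaper applies the escaping that libpq requires inside single-quoted values:
// backslashes and single quotes are prefixed with a backslash.
var escaper = strings.NewReplacer("\\", "\\\\", "'", "\\'")

// quote wraps a value in single quotes so that spaces and special characters
// are passed through safely.
func quote(value string) string {
	return "'" + escaper.Replace(value) + "'"
}

// buildDSN (hypothetical helper) assembles a keyword/value connection string
// from the DATASTORE_* variables. getenv is injected to keep it testable.
func buildDSN(getenv func(string) string) string {
	// Each pair is (connection keyword, environment variable).
	params := [][2]string{
		{"host", "DATASTORE_HOST"},
		{"port", "DATASTORE_PORT"},
		{"dbname", "DATASTORE_DATABASE"},
		{"user", "DATASTORE_USERNAME"},
		{"sslmode", "DATASTORE_SSLMODE"},
		{"sslcert", "DATASTORE_SSLCERT"},
		{"sslkey", "DATASTORE_SSLKEY"},
		{"sslrootcert", "DATASTORE_SSLROOTCERT"},
	}
	var parts []string
	for _, p := range params {
		value := getenv(p[1])
		if value == "" && p[0] == "port" {
			value = "5432" // documented default port
		}
		if value == "" {
			continue // leave unset variables out of the connection string
		}
		parts = append(parts, p[0]+"="+quote(value))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Reads the real process environment; unset variables are omitted.
	fmt.Println(buildDSN(os.Getenv))
}
```

Passing the lookup function as a parameter keeps the sketch independent of the process environment, which makes it straightforward to exercise with a plain map in tests.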
Development with Tilt
Using PostgreSQL in Tilt
Set the environment variable before running Tilt:
This will:
- Load values-tilt-postgresql.yaml automatically
- Deploy PostgreSQL instead of MongoDB
- Generate PostgreSQL certificates via cert-manager
- Configure all services to use PostgreSQL
Switching Back to MongoDB
Simply unset the variable and restart:
Development Values
The values-tilt-postgresql.yaml file includes:
- Single-replica PostgreSQL for faster startup
- Development password (nvsentinel-dev)
- Auto-initialized schema with tables and triggers
- Control plane node selector for PostgreSQL pod
Migration Guide
From MongoDB to PostgreSQL
- Export data from MongoDB (if preserving data):
- Deploy PostgreSQL:
- Import data to PostgreSQL (if needed):
Testing the Migration
Verify PostgreSQL deployment:
Schema Management
Automatic Initialization
The PostgreSQL subchart automatically runs initialization scripts that:
- Create tables if they don’t exist
- Create indexes for query performance
- Set up triggers for change tracking
- Create helper functions
Manual Schema Updates
If you need to modify the schema:
- Connect to PostgreSQL:
- Run your schema updates:
Schema Reference
See postgresql-schema.sql for the complete schema definition.
Performance Tuning
Connection Pooling
The provider uses Go’s database/sql package with connection pooling:
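As a rough sketch of what pool tuning with database/sql looks like, the example below applies the four standard pool controls on *sql.DB: SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime, and SetConnMaxIdleTime. The PoolSettings struct, the poolTuner interface, the applyPoolSettings helper, and every numeric value are assumptions chosen for illustration; they are not the provider's actual defaults.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// poolTuner lists the pool-related methods of *sql.DB used below. Depending
// on the interface lets the sketch run without a live database.
type poolTuner interface {
	SetMaxOpenConns(n int)
	SetMaxIdleConns(n int)
	SetConnMaxLifetime(d time.Duration)
	SetConnMaxIdleTime(d time.Duration)
}

// Compile-time check that *sql.DB really satisfies poolTuner.
var _ poolTuner = (*sql.DB)(nil)

// PoolSettings groups hypothetical pool parameters.
type PoolSettings struct {
	MaxOpen     int           // upper bound on open connections
	MaxIdle     int           // idle connections kept ready for reuse
	MaxLifetime time.Duration // recycle connections older than this
	MaxIdleTime time.Duration // close connections idle longer than this
}

// applyPoolSettings forwards the settings to the pool.
func applyPoolSettings(db poolTuner, s PoolSettings) {
	db.SetMaxOpenConns(s.MaxOpen)
	db.SetMaxIdleConns(s.MaxIdle)
	db.SetConnMaxLifetime(s.MaxLifetime)
	db.SetConnMaxIdleTime(s.MaxIdleTime)
}

// recorder is a stand-in for *sql.DB that just remembers what was set.
type recorder struct{ PoolSettings }

func (r *recorder) SetMaxOpenConns(n int)              { r.MaxOpen = n }
func (r *recorder) SetMaxIdleConns(n int)              { r.MaxIdle = n }
func (r *recorder) SetConnMaxLifetime(d time.Duration) { r.MaxLifetime = d }
func (r *recorder) SetConnMaxIdleTime(d time.Duration) { r.MaxIdleTime = d }

func main() {
	// Illustrative values only; size the pool against the server's
	// max_connections and the number of NVSentinel replicas.
	settings := PoolSettings{
		MaxOpen:     10,
		MaxIdle:     5,
		MaxLifetime: 30 * time.Minute,
		MaxIdleTime: 5 * time.Minute,
	}
	r := &recorder{}
	applyPoolSettings(r, settings)
	fmt.Printf("%+v\n", r.PoolSettings)
}
```

With a real connection, the same helper would be called with the *sql.DB returned by sql.Open once a PostgreSQL driver is registered.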
Indexing Strategy
Key indexes for performance:
- node_name - Queries by node
- status - Status filtering
- created_at/event_timestamp - Time-based queries
- agent/csp - Event source filtering
- is_fatal - Fatal event queries
Query Optimization
The provider extracts frequently-queried fields from JSONB:
- Reduces JSONB query overhead
- Enables efficient B-tree index usage
- Maintains flexibility for additional JSONB fields
Troubleshooting
Connection Issues
Problem: “connection refused” or “no route to host”
Solutions:
- Verify PostgreSQL service is running: kubectl get svc nvsentinel-postgresql -n nvsentinel
- Check network policies aren’t blocking access
- Verify DNS resolution: nslookup nvsentinel-postgresql.nvsentinel.svc.cluster.local
Certificate Issues
Problem: “certificate verify failed” or permission denied
Solutions:
- Check cert-manager created certificates: kubectl get certificate -n nvsentinel
- Verify certificate secret exists: kubectl get secret postgresql-client-cert -n nvsentinel
- Check initContainer logs: kubectl logs -n nvsentinel deployment/fault-quarantine -c fix-cert-permissions
- Ensure certificate permissions are correct (key: 600, cert/ca: 644)
SSL Mode Issues
Problem: “SSL required” or “SSL not supported”
Solutions:
- Verify sslmode is set correctly in values
- Check PostgreSQL server has TLS enabled
- Verify server certificate is valid
Change Stream Not Working
Problem: Services not receiving updates
Solutions:
- Check triggers are created: Connect to DB and run \d health_events to see triggers
- Verify changelog table has entries: SELECT COUNT(*) FROM datastore_changelog;
- Check resume tokens are being updated: SELECT * FROM resume_tokens;
- Review component logs for polling errors
Performance Issues
Problem: Slow queries or high CPU
Solutions:
- Check indexes exist: \d health_events in psql
- Analyze slow queries: Enable PostgreSQL slow query log
- Increase connection pool size if needed
- Consider vacuuming: VACUUM ANALYZE health_events;
Monitoring
Metrics
Monitor these PostgreSQL metrics:
- Connection pool utilization
- Query latency (P50, P95, P99)
- Transaction rate
- Table sizes and growth
- Index hit ratio
- Replication lag (if using replicas)
Health Checks
Verify datastore health:
Security Considerations
Authentication
- Client certificates - Mutual TLS authentication required
- Password authentication - Not recommended for production
- SCRAM-SHA-256 - Supported for password-based auth
Authorization
Configure PostgreSQL user permissions:
Network Security
- Use NetworkPolicies to restrict access
- Enable TLS for all connections
- Use private networks when possible
- Rotate certificates regularly
Backup and Recovery
Backup Strategy
The PostgreSQL subchart supports automated backups:
Manual Backup
Recovery
Comparison: PostgreSQL vs MongoDB
References
- PostgreSQL Official Documentation
- PostgreSQL JSONB Documentation
- Bitnami PostgreSQL Helm Chart
- NVSentinel PostgreSQL Schema
Support
For issues or questions:
- Check this documentation and troubleshooting section
- Review logs: kubectl logs -n nvsentinel <pod-name>
- File an issue: https://github.com/nvidia/nvsentinel/issues