PostgreSQL Provider for NVSentinel

View as Markdown

Overview

NVSentinel supports PostgreSQL as an alternative datastore to MongoDB. The PostgreSQL provider offers similar functionality with a relational database backend, providing flexibility in deployment options.

Features

  • Full feature parity with MongoDB provider
  • TLS/SSL support with client certificate authentication
  • Change stream emulation using triggers and polling
  • JSONB storage for flexible document-like data structures
  • Automatic schema management with tables, indexes, and triggers
  • Production-ready with connection pooling and error handling

When to Use PostgreSQL

Choose PostgreSQL if you:

  • Already have PostgreSQL infrastructure in your organization
  • Prefer relational databases for operational reasons
  • Need to leverage existing PostgreSQL expertise
  • Want to use PostgreSQL-specific features or tools
  • Have compliance requirements that favor PostgreSQL

Architecture

Database Schema

The PostgreSQL provider uses the following tables:

  1. health_events - Stores GPU and system health events
  2. maintenance_events - Stores CSP maintenance events
  3. datastore_changelog - Tracks changes for change stream emulation
  4. resume_tokens - Stores resume tokens for change stream clients

Each table uses JSONB columns to store document data while extracting key fields for indexing.

Change Stream Emulation

PostgreSQL doesn’t have native change streams like MongoDB. The provider emulates this using:

  • Triggers on tables to capture INSERT/UPDATE/DELETE operations
  • Changelog table to store change events
  • Polling mechanism to detect new changes
  • Resume tokens to track client progress and enable resumption

Configuration

Helm Chart Deployment

Using values-postgresql.yaml

Deploy NVSentinel with PostgreSQL using the provided values file:

$helm install nvsentinel ./distros/kubernetes/nvsentinel \
> -f distros/kubernetes/nvsentinel/values-postgresql.yaml \
> --namespace nvsentinel \
> --create-namespace

Custom Configuration

Create your own values file with PostgreSQL settings:

1global:
2 datastore:
3 provider: "postgresql"
4 connection:
5 host: "your-postgresql-host"
6 port: 5432
7 database: "nvsentinel"
8 username: "postgres"
9 sslmode: "verify-full"
10 sslcert: "/etc/ssl/client-certs/tls.crt"
11 sslkey: "/etc/ssl/client-certs/tls.key"
12 sslrootcert: "/etc/ssl/client-certs/ca.crt"
13
14 # Disable MongoDB when using PostgreSQL
15 mongodbStore:
16 enabled: false
17
18# Enable PostgreSQL subchart
19postgresql:
20 enabled: true
21 auth:
22 postgresPassword: "your-secure-password"
23 database: "nvsentinel"

TLS/SSL Configuration

PostgreSQL requires TLS for secure connections. The Helm chart automatically configures:

  1. cert-manager resources for certificate generation
  2. Client certificates for application authentication
  3. Server certificates for PostgreSQL TLS
  4. InitContainers to fix certificate permissions

SSL Modes

  • disable - No SSL (not recommended for production)
  • require - SSL required but no certificate verification
  • verify-ca - Verify server certificate against CA
  • verify-full - Full verification including hostname (recommended)

Environment Variables

The provider recognizes these environment variables:

  • DATASTORE_PROVIDER - Set to “postgresql”
  • DATASTORE_HOST - PostgreSQL hostname
  • DATASTORE_PORT - PostgreSQL port (default: 5432)
  • DATASTORE_DATABASE - Database name
  • DATASTORE_USERNAME - Database username
  • DATASTORE_SSLMODE - SSL mode
  • DATASTORE_SSLCERT - Client certificate path
  • DATASTORE_SSLKEY - Client key path
  • DATASTORE_SSLROOTCERT - CA certificate path
  • POSTGRESQL_CLIENT_CERT_MOUNT_PATH - Certificate directory path

Alternatively, use MONGODB_CLIENT_CERT_MOUNT_PATH for backward compatibility - the SDK handles both.

Development with Tilt

Using PostgreSQL in Tilt

Set the environment variable before running Tilt:

$export USE_POSTGRESQL=1
$tilt up

This will:

  • Load values-tilt-postgresql.yaml automatically
  • Deploy PostgreSQL instead of MongoDB
  • Generate PostgreSQL certificates via cert-manager
  • Configure all services to use PostgreSQL

Switching Back to MongoDB

Simply unset the variable and restart:

$unset USE_POSTGRESQL
$tilt down
$tilt up

Development Values

The values-tilt-postgresql.yaml file includes:

  • Single-replica PostgreSQL for faster startup
  • Development password (nvsentinel-dev)
  • Auto-initialized schema with tables and triggers
  • Control plane node selector for PostgreSQL pod

Migration Guide

From MongoDB to PostgreSQL

  1. Export data from MongoDB (if preserving data):

    $# Export collections to JSON
    $mongoexport --uri="mongodb://..." --db=nvsentinel --collection=health_events --out=health_events.json
    $mongoexport --uri="mongodb://..." --db=nvsentinel --collection=maintenance_events --out=maintenance_events.json
  2. Deploy PostgreSQL:

    $helm upgrade nvsentinel ./distros/kubernetes/nvsentinel \
    > -f values-postgresql.yaml \
    > --namespace nvsentinel
  3. Import data to PostgreSQL (if needed):

    $# Connect to PostgreSQL and import JSONB data
    $# (Custom import script required based on your data)

Testing the Migration

Verify PostgreSQL deployment:

$# Check PostgreSQL is running
$kubectl get statefulset nvsentinel-postgresql -n nvsentinel
$
$# Check pods are ready
$kubectl get pods -n nvsentinel | grep postgresql
$
$# Verify services are using PostgreSQL
$kubectl logs -n nvsentinel deployment/fault-quarantine | grep -i postgres

Schema Management

Automatic Initialization

The PostgreSQL subchart automatically runs initialization scripts that:

  • Create tables if they don’t exist
  • Create indexes for query performance
  • Set up triggers for change tracking
  • Create helper functions

Manual Schema Updates

If you need to modify the schema:

  1. Connect to PostgreSQL:

    $kubectl exec -it nvsentinel-postgresql-0 -n nvsentinel -- psql -U postgres -d nvsentinel
  2. Run your schema updates:

    1-- Example: Add a new index
    2CREATE INDEX idx_custom ON health_events ((document->>'customField'));

Schema Reference

See postgresql-schema.sql for the complete schema definition.

Performance Tuning

Connection Pooling

The provider uses Go’s database/sql package with connection pooling:

1// Default settings (can be tuned via environment variables)
2MaxOpenConns: 25
3MaxIdleConns: 25
4ConnMaxLifetime: 5 minutes

Indexing Strategy

Key indexes for performance:

  • node_name - Queries by node
  • status - Status filtering
  • created_at / event_timestamp - Time-based queries
  • agent / csp - Event source filtering
  • is_fatal - Fatal event queries

Query Optimization

The provider extracts frequently-queried fields from JSONB:

  • Reduces JSONB query overhead
  • Enables efficient B-tree index usage
  • Maintains flexibility for additional JSONB fields

Troubleshooting

Connection Issues

Problem: “connection refused” or “no route to host”

Solutions:

  • Verify PostgreSQL service is running: kubectl get svc nvsentinel-postgresql -n nvsentinel
  • Check network policies aren’t blocking access
  • Verify DNS resolution: nslookup nvsentinel-postgresql.nvsentinel.svc.cluster.local

Certificate Issues

Problem: “certificate verify failed” or permission denied

Solutions:

  • Check cert-manager created certificates: kubectl get certificate -n nvsentinel
  • Verify certificate secret exists: kubectl get secret postgresql-client-cert -n nvsentinel
  • Check initContainer logs: kubectl logs -n nvsentinel deployment/fault-quarantine -c fix-cert-permissions
  • Ensure certificate permissions are correct (key: 600, cert/ca: 644)

SSL Mode Issues

Problem: “SSL required” or “SSL not supported”

Solutions:

  • Verify sslmode is set correctly in values
  • Check PostgreSQL server has TLS enabled
  • Verify server certificate is valid

Change Stream Not Working

Problem: Services not receiving updates

Solutions:

  • Check triggers are created: Connect to DB and run \d health_events to see triggers
  • Verify changelog table has entries: SELECT COUNT(*) FROM datastore_changelog;
  • Check resume tokens are being updated: SELECT * FROM resume_tokens;
  • Review component logs for polling errors

Performance Issues

Problem: Slow queries or high CPU

Solutions:

  • Check indexes exist: \d health_events in psql
  • Analyze slow queries: Enable PostgreSQL slow query log
  • Increase connection pool size if needed
  • Consider vacuuming: VACUUM ANALYZE health_events;

Monitoring

Metrics

Monitor these PostgreSQL metrics:

  • Connection pool utilization
  • Query latency (P50, P95, P99)
  • Transaction rate
  • Table sizes and growth
  • Index hit ratio
  • Replication lag (if using replicas)

Health Checks

Verify datastore health:

$# Check PostgreSQL pod health
$kubectl get pods -n nvsentinel -l app.kubernetes.io/name=postgresql
$
$# Check service connectivity from a pod
$kubectl exec -it deployment/fault-quarantine -n nvsentinel -- \
> psql "postgresql://postgres:password@nvsentinel-postgresql:5432/nvsentinel?sslmode=require"
$
$# Check for errors in logs
$kubectl logs -n nvsentinel deployment/fault-quarantine | grep -i error

Security Considerations

Authentication

  • Client certificates - Mutual TLS authentication required
  • Password authentication - Not recommended for production
  • SCRAM-SHA-256 - Supported for password-based auth

Authorization

Configure PostgreSQL user permissions:

1-- Create limited user for applications
2CREATE USER nvsentinel_app WITH PASSWORD 'secure-password';
3GRANT CONNECT ON DATABASE nvsentinel TO nvsentinel_app;
4GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO nvsentinel_app;

Network Security

  • Use NetworkPolicies to restrict access
  • Enable TLS for all connections
  • Use private networks when possible
  • Rotate certificates regularly

Backup and Recovery

Backup Strategy

The PostgreSQL subchart supports automated backups:

1postgresql:
2 backup:
3 enabled: true
4 cronjob:
5 schedule: "0 2 * * *" # Daily at 2 AM
6 storage:
7 size: 10Gi

Manual Backup

$# Backup all data
$kubectl exec nvsentinel-postgresql-0 -n nvsentinel -- \
> pg_dump -U postgres nvsentinel > backup.sql
$
$# Backup specific table
$kubectl exec nvsentinel-postgresql-0 -n nvsentinel -- \
> pg_dump -U postgres -t health_events nvsentinel > health_events_backup.sql

Recovery

$# Restore from backup
$kubectl exec -i nvsentinel-postgresql-0 -n nvsentinel -- \
> psql -U postgres nvsentinel < backup.sql

Comparison: PostgreSQL vs MongoDB

FeaturePostgreSQLMongoDB
Data ModelRelational with JSONBDocument-oriented
Change StreamsEmulated via triggersNative
TransactionsFull ACIDMulti-document ACID
Query LanguageSQLMQL
IndexingB-tree, GiST, GINB-tree, Geospatial
ScalabilityVertical + read replicasHorizontal sharding
MaturityVery mature (30+ years)Mature (15+ years)
Ops ComplexityWell-known, standardSpecialized knowledge
Change Stream Latency~1s (polling)<100ms (native)

References

Support

For issues or questions:

  1. Check this documentation and troubleshooting section
  2. Review logs: kubectl logs -n nvsentinel <pod-name>
  3. File an issue: https://github.com/nvidia/nvsentinel/issues