Runbook: Health Event Analyzer High Error Rate
Prometheus Alert: HealthEventAnalyzerHighErrorRate
Metric: High ratio of health_event_analyzer_event_processing_errors to health_event_analyzer_events_received_total
Common Causes:
The Health Events Analyzer (HEA) processes health events from MongoDB and applies rules defined in the health-events-analyzer-config ConfigMap. These rules contain MongoDB aggregation pipeline queries. Errors can occur due to:
Misspelled field names (for example, generatedtimestap instead of generatedtimestamp)
Invalid operators (for example, $notequal instead of $ne)
The Health Events Analyzer executes the rules defined in the configuration file. Each rule contains queries written in MongoDB aggregation pipeline syntax, so a typo or syntax error causes a processing failure every time the rule is evaluated against an incoming event.
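The two kinds of mistake mentioned above (a misspelled field name and an invalid operator) can often be caught before a rule is deployed. The following is a minimal sketch of such a check, not part of the Health Events Analyzer itself. The function name find_suspect_keys, the allow-lists, and both sample pipelines are assumptions made for illustration; only generatedtimestamp, $ne, and the two typos come from this runbook. It treats every key without a leading $ as a field name, which only holds for $match-style stages.

```python
from typing import Any, List

# Partial allow-list for illustration only; MongoDB defines many more operators.
KNOWN_OPERATORS = {
    "$match", "$sort", "$limit", "$count",
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$and", "$or",
}

# Hypothetical set of field names expected on a health event document.
# Only "generatedtimestamp" is taken from the runbook text.
KNOWN_FIELDS = {"generatedtimestamp"}


def find_suspect_keys(node: Any) -> List[str]:
    """Recursively collect keys that are in neither allow-list."""
    suspects: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("$"):
                # Keys with a leading "$" are stages or operators.
                if key not in KNOWN_OPERATORS:
                    suspects.append(key)
            elif key.split(".")[-1] not in KNOWN_FIELDS:
                # Other keys are treated as (possibly dotted) field paths.
                suspects.append(key)
            suspects.extend(find_suspect_keys(value))
    elif isinstance(node, list):
        for item in node:
            suspects.extend(find_suspect_keys(item))
    return suspects


# Hypothetical rule pipelines: one with both typos, one corrected.
broken_pipeline = [{"$match": {"generatedtimestap": {"$notequal": 0}}}]
fixed_pipeline = [{"$match": {"generatedtimestamp": {"$ne": 0}}}]

print(find_suspect_keys(broken_pipeline))
print(find_suspect_keys(fixed_pipeline))
```

A non-empty result points at the keys to review in the rule. An empty result only means that nothing outside the allow-lists was found, not that the pipeline is valid.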
Diagnosis:
Solution:
The Health Events Analyzer connects to MongoDB to listen for inserted events and to publish new health events. Connection errors prevent event processing and increase the health_event_analyzer_event_processing_errors metric for the execute_pipeline_error error type.
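To see whether the execute_pipeline_error error type dominates the error counter, the metric can be broken down by error type. The sketch below parses Prometheus text exposition output and sums the counter per label value. It assumes the label is named error_type, which this runbook does not confirm, and the sample text, including the second error type and all values, is invented for illustration.

```python
import re
from typing import Dict

METRIC = "health_event_analyzer_event_processing_errors"

# Matches lines such as: metric_name{error_type="value"} 12
# The label name "error_type" is an assumption.
LINE_RE = re.compile(
    r"^" + re.escape(METRIC)
    + r'\{[^}]*error_type="([^"]+)"[^}]*\}\s+([0-9.eE+-]+)\s*$'
)


def errors_by_type(metrics_text: str) -> Dict[str, float]:
    """Sum the error counter per error_type label value."""
    totals: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            label, value = match.group(1), float(match.group(2))
            totals[label] = totals.get(label, 0.0) + value
    return totals


# Invented sample in Prometheus text exposition format.
sample = """
health_event_analyzer_event_processing_errors{error_type="execute_pipeline_error"} 42
health_event_analyzer_event_processing_errors{error_type="hypothetical_other_error"} 3
health_event_analyzer_events_received_total 500
"""

print(errors_by_type(sample))
```

If most of the count falls under execute_pipeline_error, both rule query errors and MongoDB connectivity are worth checking.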
Follow the MongoDB Connection Error runbook to diagnose and resolve MongoDB connection issues.
If MongoDB pods are in a healthy state, check for connectivity errors in the health-events-analyzer pod: