Runbook: Node Event Creation Failures
Overview
Node events provide visibility into non-fatal hardware problems. When node event creation fails, those warning signs never reach operators.
Key points:
- Node events are for non-fatal health issues (warnings)
- Failures typically indicate API server issues
Symptoms
- Metric nvsentinel_node_event_operations_total{operation="create", status="failed"} is increasing
- Health events are present in MongoDB but not visible in kubectl describe node
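To confirm the second symptom for a specific node, it can help to list the Kubernetes events attached to that node directly. The following is a minimal sketch, not a command taken from this runbook: the function names are hypothetical, and it assumes node events are ordinary Event objects whose involved object is the Node.

```shell
# Build a field selector that matches events attached to one node.
# $1 = node name
node_event_selector() {
  printf 'involvedObject.kind=Node,involvedObject.name=%s' "$1"
}

# List events for one node across all namespaces (requires cluster access).
# $1 = node name
list_node_events() {
  kubectl get events --all-namespaces \
    --field-selector "$(node_event_selector "$1")"
}

# Example (node name is a placeholder):
#   list_node_events gpu-node-1
```

If a health event for the node exists in MongoDB but nothing matching appears here, event creation for that node is likely failing.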
Procedure
1. Check Platform-Connector Logs
Look for error codes:
- 429 → API server throttling
- 403 → RBAC permission denied
- Connection refused/timeout → API server unreachable
- 409 → Conflict (should auto-resolve with retries)
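The error codes above can be tallied from a log stream with a small filter. This is a sketch under stated assumptions: the function name is hypothetical, the match patterns are guesses about how the errors appear in the log text, and the namespace and label selector in the usage comment are placeholders to replace with the values from your deployment.

```shell
# Count log lines by API error class. Reads log lines on stdin.
# Patterns are assumptions about the log format; adjust them to your logs.
classify_event_errors() {
  awk '{
    # Pad the line so status codes at either end still have a neighbor.
    line = " " tolower($0) " "
    if      (line ~ /[^0-9]429[^0-9]/) n429++      # API server throttling
    else if (line ~ /[^0-9]403[^0-9]/) n403++      # RBAC permission denied
    else if (line ~ /[^0-9]409[^0-9]/) n409++      # conflict, retried
    else if (line ~ /connection refused|timeout|timed out/) nconn++
  }
  END {
    printf "429=%d 403=%d 409=%d unreachable=%d\n", n429, n403, n409, nconn
  }'
}

# Example (namespace and selector are placeholders):
#   kubectl logs -n "$NAMESPACE" -l "$SELECTOR" --tail=500 | classify_event_errors
```

A dominant 429 count points at throttling, 403 at RBAC, and unreachable at connectivity; a handful of 409s on their own is usually harmless because conflicts are retried.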
2. Verify API Server is Reachable
If pods are in CrashLoopBackOff or Error, API connectivity may be broken.
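One way to run this check is sketched below. The function names are hypothetical and the namespace argument is a placeholder. Note that the readiness probe runs from wherever kubectl is invoked, so it shows the API server is up but does not prove that pods inside the cluster can reach it.

```shell
# Print names of pods whose STATUS is CrashLoopBackOff or Error.
# Expects `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
unhealthy_pods() {
  awk '$3 == "CrashLoopBackOff" || $3 == "Error" { print $1 }'
}

# Probe the API server readiness endpoint, then list unhealthy pods.
# $1 = namespace to inspect (placeholder supplied by the caller)
check_api_connectivity() {
  kubectl get --raw='/readyz' || return 1
  echo
  kubectl get pods -n "$1" --no-headers | unhealthy_pods
}

# Example (namespace is a placeholder):
#   check_api_connectivity "$NAMESPACE"
```

An empty pod list after a successful readiness probe suggests connectivity is not the cause, and the other steps are the better place to look.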
3. Verify RBAC Permissions
The permission check should return yes. If it returns no, check the ClusterRole, which should include the create, update, and list verbs for the events resource.
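The permission check can be run with kubectl auth can-i while impersonating the service account. The sketch below is illustrative only: the default namespace, service account name, and ClusterRole name are hypothetical placeholders, not values confirmed by this runbook, and impersonation requires that your own user is allowed to impersonate service accounts.

```shell
# Hypothetical defaults; substitute the values from your deployment.
NAMESPACE="${NAMESPACE:-nvsentinel}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-platform-connector}"

# Build the impersonation subject for a service account.
# $1 = namespace, $2 = service account name
sa_subject() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Ask the API server whether the service account may create, update,
# and list events in all namespaces. Prints yes or no per verb.
check_event_verbs() {
  subject="$(sa_subject "$NAMESPACE" "$SERVICE_ACCOUNT")"
  for verb in create update list; do
    printf '%s events: ' "$verb"
    kubectl auth can-i "$verb" events --as="$subject" --all-namespaces
  done
}

# Examples (ClusterRole name is a placeholder):
#   check_event_verbs
#   kubectl get clusterrole "$CLUSTER_ROLE" -o yaml
```

If any verb reports no, compare the rules in the ClusterRole output against the verbs listed above, and confirm the role is actually bound to the service account.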