Runbook: Log Rotation Failures
Symptoms
- Prometheus alert: LogRotationFailures or FileServerLowDiskSpace
- Metric fileserver_log_rotation_failed_total increasing
- Metric fileserver_disk_space_free_bytes decreasing to critical levels
- Old log files not being deleted
- File server pod disk full
Diagnosis Steps
1. Check Cleanup Service Status
2. Check Disk Usage
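As a minimal sketch of this check, the snippet below reports how full the filesystem behind a given path is. It uses only the Python standard library; the path passed in is a placeholder, since the actual log directory of the file server is not stated here.

```python
import shutil


def disk_usage_report(path: str) -> dict:
    """Return total, used, and free bytes for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "percent_used": round(percent_used, 1),
    }


# Example call; replace "." with the log volume's mount point (assumption).
report = disk_usage_report(".")
print(report)
```

The free_bytes value should roughly track what fileserver_disk_space_free_bytes reports, assuming that metric is measured on the same volume.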
3. Check Rotation Configuration
4. Check Metrics
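The two metrics named under Symptoms can be read through the Prometheus HTTP API instant-query endpoint (/api/v1/query). The sketch below is one way to do that from Python; the Prometheus base URL and the 1h window are assumptions, not values taken from this runbook.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload: dict) -> list:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    samples = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        samples.append((series["metric"], float(value)))
    return samples


def query_prometheus(base_url: str, expr: str, timeout: float = 10.0) -> list:
    """Run the query over HTTP and return parsed samples."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, query_prometheus("http://prometheus.example:9090", "increase(fileserver_log_rotation_failed_total[1h])") would return the failure increase per series over the last hour, where the host name is a placeholder.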
Common Failure Causes and Solutions
Issue 1: Cleanup Container Not Running
Error: Cleanup container is missing or crashed
Solution:
Issue 2: Permission Errors
Error in logs:
Cause: Incorrect file permissions or security context
Solution:
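Before changing permissions or the security context, it can help to confirm which directories actually block deletion. The following is a minimal sketch, not the cleanup service's own tooling: it must run as the same user as the cleanup process, and the root path is whatever log directory applies in your deployment (an assumption here).

```python
import os
import stat


def dirs_blocking_delete(root: str) -> list:
    """List directories under `root` where the current user cannot remove entries.

    Deleting a file needs write and search (execute) permission on its
    parent directory, not on the file itself.
    """
    blocked = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if not os.access(dirpath, os.W_OK | os.X_OK):
            blocked.append(dirpath)
    return blocked


def describe_owner(path: str) -> dict:
    """Return owner uid, gid, and permission bits for `path`."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
    }
```

Comparing describe_owner output for a blocked directory with the uid and gid the cleanup container runs as usually shows whether ownership or mode bits are the mismatch.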
Issue 3: Timeout During Cleanup
Error in logs:
Cause: Too many files to process within timeout period
Solution:
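One general way to handle a backlog that exceeds the timeout is to delete in bounded passes and resume on the next cycle. The sketch below illustrates that idea under stated assumptions: it is not the cleanup service's implementation, and the function name, the flat (non-recursive) directory layout, and the use of modification time as the age are all hypothetical.

```python
import os
import time


def cleanup_with_budget(root, max_age_seconds, budget_seconds, now=None):
    """Delete old files in `root` until done or the time budget runs out.

    Returns (deleted_count, finished). `finished` is False when the budget
    expired first, so the caller can schedule another pass.
    """
    now = time.time() if now is None else now
    deadline = time.monotonic() + budget_seconds
    deleted = 0
    with os.scandir(root) as entries:  # streams entries, no full listing in memory
        for entry in entries:
            if time.monotonic() >= deadline:
                return deleted, False
            if not entry.is_file(follow_symlinks=False):
                continue
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                os.remove(entry.path)
                deleted += 1
    return deleted, True
```

Because each pass removes the files it has already processed, repeated passes make progress even when a single pass cannot finish within the budget.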
Issue 4: Incorrect Retention Configuration
Error: Files not being deleted at expected age
Solution:
Issue 5: Disk Full - Emergency Response
Critical: Disk is full and blocking new uploads
Immediate action:
Issue 6: Cleanup Running But Files Not Deleted
Error: Cleanup reports success but files remain
Solution:
Resolution Steps
- Immediate triage:
- Fix root cause using solutions above
- Verify cleanup is working:
- Test with a dry run:
- Force cleanup cycle:
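The dry-run and forced-cycle steps can be illustrated with a small standalone sketch. It assumes retention is measured in days against file modification time and that logs live under a single root directory; the function name and parameters are hypothetical and do not describe the cleanup service's real interface.

```python
import os
import time


def run_cleanup(root, retention_days, dry_run=True, now=None):
    """Return files older than the retention period, deleting them unless dry_run.

    With dry_run=True nothing is removed, so the result shows what a real
    cycle would delete.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    affected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime < cutoff:
                if not dry_run:
                    os.remove(path)
                affected.append(path)
    return affected
```

Running it first with dry_run=True and reviewing the returned paths, then repeating with dry_run=False, mirrors the order of the last two steps above.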