Runbooks
A service runbook is a concise, structured document that outlines the standard operating procedures for running, troubleshooting, and maintaining a service.
Its purpose is to give engineers, especially on-call engineers and incident responders, the exact steps they need to understand the service quickly, handle common problems, and recover from failures without digging through large amounts of documentation.
- Caches Runbook
- DDCS: No Active Shards
- DDCS: Cache Misses and Performance Degradation
- DDCS: Disk Space Exhaustion
- DDCS: RocksDB Corruption or Failures
- DDCS: Network Bandwidth or Latency Bottlenecks
- UCC: Connection Saturation and High Response Times
- UCC: Metadata Cache Undersizing
- UCC: Data Disk Bandwidth Bottlenecks
- UCC: Upstream S3 Connection Spikes and High Connect Time
- UCC: Network Bandwidth Saturation