Best Practices for Root Cause Analysis


Root cause analysis is about understanding not just what happened but why it happened. It examines where our assumptions about a system or service diverged from reality, so that the fix addresses the underlying cause instead of simply rolling back the latest deployment.

This guide presents best practices for using root cause analysis to understand why an outage happened in the first place, so that teams can prevent it from recurring.

You'll also learn how to:

  • Understand the context of a problem and learn from outages
  • Effectively and efficiently find the root cause of problems
  • Approach root cause analysis with the right mindset