The Complete Guide to Observability

A practical guide for developers

Observability helps developers and operators (“DevOps”) understand distributed systems: what’s slow, what’s broken, and what needs to be done to improve performance.

Distributed systems, however, present unique operational and maintenance challenges. When something breaks, it can be hard to restore service quickly, or even to know where to begin.

To manage and understand multi-layered architectures, we need more than traditional logs and infrastructure metrics.

In this guide, we cover:
  • Common observability challenges in distributed systems
  • Understanding telemetry data: logs, metrics, and traces
  • The “three pillars of observability”
  • Requirements for effective observability solutions
  • Managing observability with SLAs, SLOs, and SLIs

James Burns, LightStep Head of Research