Observability of Distributed Systems

1 Øbservability of Distributed Systems
Øredev 2019
Photo by Daniel Cossio

2 About me
Expedia Group Proprietary and Confidential
- Software Engineer at Expedia Group
- Zipkin core team member and open source contributor for observability projects
@jcchavezs - #oredev2019

3 Distributed Systems & Complexity
Photo by Claudio Testa

4 Distributed systems
A collection of independent components that appears to its users as a single coherent system.
Image source: https://link.medium.com/jey42ga7p1

5 Complexity (noun)
1. the state of having many parts and being difficult to understand or find an answer to.
Cambridge Dictionary

6 The three-body problem (1687)
Given the initial positions and velocities of three masses, find their subsequent paths of motion according to the laws of motion and universal gravitation.
TL;DR
- Known initial conditions
- Unpredictable state of the system at a given time

7 Distributed systems are complex
System complexity can be described as a measure of how understandable a system is and how difficult it is to understand an operation within it.
Sources of complexity in systems:
- Task-Structure Complexity
- Unpredictability
- Size Complexity
- Chaotic Complexity
- Algorithmic Complexity

8 Why is it hard to operate a distributed system?
- Systems change all the time
- Things fail in unexpected ways
- Unknown unknowns
- Most problems are the convergence of many different things failing at once
- Everyone on the team is expected to respond with the same level of confidence and tooling regardless of experience or expertise, yet the more components there are, the less each individual knows about them

9 Distributed systems are never "up"; they exist in a constant state of partially degraded service.
Source: https://opensource.com/article/17/7/state-systems-administration
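One way to see why a distributed system is never fully "up": if a request must pass through several components and any one of them failing breaks the request, end-to-end availability is the product of the components' availabilities. The sketch below assumes independent failures and strictly serial dependencies (both simplifications), and the per-component figures are hypothetical.

```python
from math import prod
from typing import Sequence


def serial_availability(availabilities: Sequence[float]) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes component failures are independent, which real systems rarely
    guarantee (shared infrastructure tends to correlate failures)."""
    return prod(availabilities)


# Hypothetical example: ten components, each available 99.9% of the time.
components = [0.999] * 10
overall = serial_availability(components)
print(f"end-to-end availability: {overall:.4%}")
```

With ten such components, roughly 1% of requests would hit at least one failing dependency even though each component on its own looks healthy.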

10 Observability
Photo by Toa Heftiba

11 What is Observability?
"[...] is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals... one can determine the behavior of the entire system from the system’s outputs. If a system is not observable, this means that the current values of some of its state variables cannot be determined through output sensors. This implies that their value is unknown to the controller (although they can be estimated by various means)."
Wikipedia
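For linear time-invariant systems, control theory gives the Wikipedia definition quoted above a precise test (the Kalman rank condition): a system with state dynamics x' = Ax and outputs y = Cx is observable if and only if the observability matrix [C; CA; CA^2; ...; CA^(n-1)] has full rank n. The sketch below checks this in plain Python with exact fractions; the matrices are hypothetical examples, not taken from the talk.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(m):
    """Rank via Gaussian elimination using exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column; move to the next one
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(a, c):
    """Stack C, CA, CA^2, ..., CA^(n-1) for an n-state system."""
    blocks, current = [], c
    for _ in range(len(a)):
        blocks.extend(current)
        current = matmul(current, a)
    return blocks


def is_observable(a, c):
    """Kalman rank test: observable iff the observability matrix has rank n."""
    return rank(observability_matrix(a, c)) == len(a)


# Hypothetical system: state = (position, velocity); position integrates velocity.
A = [[0, 1], [0, 0]]
print(is_observable(A, [[1, 0]]))  # output sensor measures position
print(is_observable(A, [[0, 1]]))  # output sensor measures only velocity
```

Measuring position makes both state variables recoverable, while measuring only velocity leaves position undeterminable from the outputs: exactly the "not observable" case the quote describes.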

12 What is Observability?
Observability is the property of a system that allows us to understand its internal states from its input and output signals, in a way that lets us distill actions from that understanding.
That means:
- Observability is not tooling
- It is fundamentally tied to control
- Signals are not data but measurements connected to something we need to know

13 What is Observability?
Source: https://twitter.com/popsysdig/status/1139505998299877377

14 Three pillars of observability
Image source: https://twitter.com/autoletics/status/1163345131128401920
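Logs, metrics and traces are the three signals usually meant by this phrase (they are also the subject of Adrian Cole's "Observability 3 ways" in the See also list). As a minimal sketch, assuming a hypothetical `handle_request` operation, the code below emits all three for one request using only the standard library. The in-memory `METRICS` and `SPANS` structures are stand-ins for a real metrics backend and a tracer such as Zipkin; they do not mirror those tools' APIs.

```python
import logging
import time
import uuid
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

METRICS = Counter()  # hypothetical stand-in for a metrics backend: name -> count
SPANS = []           # hypothetical stand-in for a tracer's reporter: finished spans


def handle_request(order_id, trace_id=None):
    """Hypothetical operation instrumented with a log, a metric and a span."""
    trace_id = trace_id or uuid.uuid4().hex
    span = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
            "name": "handle_request", "start": time.time()}
    try:
        # ... business logic would go here ...
        # Log: a discrete event, tagged with the trace ID so it can be correlated.
        log.info("order processed order_id=%s trace_id=%s", order_id, trace_id)
        # Metric: an aggregatable count, cheap to store and to alert on.
        METRICS["orders_processed_total"] += 1
    finally:
        # Trace: timing (and, with parent IDs, causality) for this one request.
        span["duration_s"] = time.time() - span["start"]
        SPANS.append(span)
    return trace_id


handle_request("order-42")
print(METRICS["orders_processed_total"], len(SPANS))
```

Sharing the trace ID between the log line and the span is what lets a reader jump from one signal to the other for the same request.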

15 Three aggregates for signals

16 Why should we invest in observability?
- Gives real-time feedback from signals
- Helps to understand unknown unknowns
- Eases debugging by providing context and scope for signals
- Improves the resilience of systems by making baseline failure modes visible during the development cycle
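Context is what turns isolated signals into something you can debug: a trace ID carried across service boundaries lets every span and log line from one request be found together. The sketch below injects and extracts Zipkin's B3 propagation headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`) using a plain dict of HTTP headers. The `SpanContext` class and the helper functions are hypothetical and are not the API of any Zipkin library.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical minimal trace context; IDs are lowercase hex strings."""
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None
    sampled: bool = True


def new_id() -> str:
    """64-bit random ID encoded as 16 lowercase hex characters."""
    return secrets.token_hex(8)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing headers (B3 multi-header format)."""
    headers["X-B3-TraceId"] = ctx.trace_id
    headers["X-B3-SpanId"] = ctx.span_id
    if ctx.parent_id:
        headers["X-B3-ParentSpanId"] = ctx.parent_id
    headers["X-B3-Sampled"] = "1" if ctx.sampled else "0"


def extract_child(headers: Dict[str, str]) -> SpanContext:
    """Read incoming B3 headers and start a child span in the same trace.

    Starts a new root trace (128-bit trace ID) when no trace ID is present.
    Simplifications: header lookup is case-sensitive here (HTTP headers are
    not), and a missing X-B3-Sampled is treated as sampled, whereas B3
    treats it as "defer the sampling decision"."""
    trace_id = headers.get("X-B3-TraceId")
    if trace_id is None:
        return SpanContext(trace_id=new_id() + new_id(), span_id=new_id())
    return SpanContext(trace_id=trace_id, span_id=new_id(),
                       parent_id=headers.get("X-B3-SpanId"),
                       sampled=headers.get("X-B3-Sampled", "1") != "0")


# Service A starts a trace and calls service B.
root = extract_child({})
outgoing: Dict[str, str] = {}
inject(root, outgoing)
# Service B continues the same trace as a child of A's span.
child = extract_child(outgoing)
print(child.trace_id == root.trace_id, child.parent_id == root.span_id)
```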

17 Building observable systems

18 Observability as part of the software lifecycle
- During development, make sure your system emits meaningful signals.
- When testing, make sure actionable failure modes can be surfaced.
- At deploy time, use observability signals to understand the impact of the changes being released.
Image source: https://link.medium.com/zvm1AfYvy0
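As a minimal sketch of using signals at deploy time, assuming you can count requests and errors for the current (baseline) version and the newly released (canary) version over the same window, the check below flags a release whose error rate exceeds the baseline's by more than a chosen tolerance. The function name, the tolerance and the counts are hypothetical; real canary analysis usually also applies a statistical test and compares latency, not just errors.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Request and error totals observed for one version over the same window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def release_looks_healthy(baseline: ErrorCounts, canary: ErrorCounts,
                          max_increase: float = 0.01,
                          min_requests: int = 100) -> bool:
    """Return False when the canary's error rate is worse than the baseline's
    by more than `max_increase` (absolute), given enough canary traffic."""
    if canary.requests < min_requests:
        raise ValueError("not enough canary traffic to judge the release")
    return canary.error_rate - baseline.error_rate <= max_increase


# Hypothetical numbers read from a metrics backend after a deploy.
baseline = ErrorCounts(requests=10_000, errors=50)  # 0.5% errors
canary = ErrorCounts(requests=1_000, errors=40)     # 4.0% errors
print(release_looks_healthy(baseline, canary))      # False: consider rolling back
```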

19 Observability as part of the software lifecycle
- When operating a system, use signals to:
  - understand health
  - detect anomalies
  - triage problems
  - evolve the system
- When in support, you can re-scope issues based on the signal context
Image source: https://link.medium.com/zvm1AfYvy0
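One simple way to detect anomalies in a signal, sketched here under the assumption that recent samples are roughly normally distributed, is to flag values that lie more than a few standard deviations from a trailing baseline (a z-score). The window size, threshold and latency values below are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def anomalies(samples: Sequence[float], window: int = 10,
              threshold: float = 3.0) -> List[int]:
    """Indices whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines, where the z-score is undefined.
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical p99 latencies in milliseconds, one sample per minute.
latencies_ms = [120, 118, 125, 122, 119, 121, 124, 117, 123, 120, 410, 122]
print(anomalies(latencies_ms))  # [10]
```

A spike also inflates the baseline for the following `window` samples, which makes this detector less sensitive right after an incident; production systems often use robust statistics (median, percentiles) for that reason.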

20 Building an observability culture

21 Building an observability culture: Ownership
Landing observability in an engineering department needs champions who:
- Raise awareness about the problems that observability can solve
- Understand teams' pains when operating and triaging the system, and choose the right tools for those pains
- Set practices, evolve them and help replicate them across teams

22 Building an observability culture: Tooling
Observability is not tooling, but tooling is key to achieving good observability. What is needed:
- Suitable observability platforms and instrumentation in place
- Tools and dashboards that connect the dots among stakeholders
- Automated checks that make sure signal outputs make sense after a deploy
- The right processes to keep Personally Identifiable Information (PII) safe
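Part of keeping PII safe can be automated at the instrumentation layer. As a sketch, assuming signals carry key/value attributes (log fields or span tags) and that the team maintains a denylist of sensitive keys, the function below masks those values and any email-like strings before the signal leaves the process. The key names and the email pattern are hypothetical examples; pattern matching alone cannot guarantee that no PII escapes, so this complements rather than replaces the processes on this slide.

```python
import re
from typing import Dict, Mapping

# Hypothetical denylist of attribute keys that must never leave the process.
SENSITIVE_KEYS = {"email", "phone", "card_number", "passport"}
# Rough email pattern to catch PII embedded in free-text values.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MASK = "[REDACTED]"


def redact(attributes: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the attributes with known PII masked."""
    safe = {}
    for key, value in attributes.items():
        if key.lower() in SENSITIVE_KEYS:
            safe[key] = MASK
        else:
            safe[key] = EMAIL_RE.sub(MASK, str(value))
    return safe


tags = {"http.path": "/bookings/123", "email": "ana@example.com",
        "error": "no booking found for ana@example.com"}
print(redact(tags))
```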

23 Building an observability culture: Business value
Observability can also benefit other stakeholders of the system:
- Helping to achieve SLOs by improving the triage experience
- Giving support teams and engineers a common context to understand and fix problems in production
- Improving support teams' awareness by foreseeing failure trends
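To make the link to SLOs concrete: an availability SLO implies an error budget. With a target of 99.9% over a 30-day window (43,200 minutes), the allowed unavailability is 0.1% of that window, i.e. 43.2 minutes. The sketch below computes the budget and how much of it has been spent; the target, window and observed downtime are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO (e.g. 0.999)."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def budget_consumed(slo: float, downtime_minutes: float,
                    window_days: int = 30) -> float:
    """Fraction of the error budget already spent (1.0 means exhausted)."""
    return downtime_minutes / error_budget_minutes(slo, window_days)


budget = error_budget_minutes(0.999)  # about 43.2 minutes
print(f"budget: {budget:.1f} min")
print(f"consumed: {budget_consumed(0.999, 30):.0%}")  # after 30 min of downtime
```

Faster triage shortens each incident, which is what keeps the consumed fraction below 1.0 for the rest of the window.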

24 Summary
- Systems are, and will remain, complex; observability helps us better understand their failure modes.
- Observability is not a goal in itself; it only matters if we close the loop with the actions we take based on our observations.
- Observability benefits not only developers and operators but all stakeholders of the system.
- As with everything else in the software industry, building the culture is more important than the code, infrastructure and tooling.

25 Thank you
Q&A

26 See also
- Does software understand complexity? (Michael Feathers)
- What is the Complexity of a Distributed System? (Anand Ranganathan, Roy H. Campbell)
- Observability: The significant parts (William Louth)
- Observations on observability (Colin Breck)
- Observability 3 ways: Logging, Metrics & Tracing (Adrian Cole)
