Slide 1

Slide 1 text

Map & Territory: a story of visibility

Slide 2

Slide 2 text

Pierre-Yves @pyr https://github.com/pyr

Slide 3

Slide 3 text

https://exoscale.ch

Slide 4

Slide 4 text

Visibility

Slide 5

Slide 5 text

How do we work?

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

How do we improve?

Slide 8

Slide 8 text

Avoid Shortcuts!

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

We want lower defect rates

Slide 11

Slide 11 text

We want to make informed decisions

Slide 12

Slide 12 text

Design, Build, Live

Slide 13

Slide 13 text

Visibility

Slide 14

Slide 14 text

Extracting meaningful state data from heterogeneous event sources, over time

Slide 15

Slide 15 text

Meaningful (relates to business value)

Slide 16

Slide 16 text

State Data (structured payload)

Slide 17

Slide 17 text

Heterogeneous (everyone is involved)

Slide 18

Slide 18 text

Over time (tracking)

Slide 19

Slide 19 text

How does it help my system's lifecycle?

Slide 20

Slide 20 text

Map =/= Territory

Slide 21

Slide 21 text

Break out of our mental model

Slide 22

Slide 22 text

"I'll push this minor change, it cannot do any harm"

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

"I'll just add this static route"

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Better lifecycle, informed decisions, better maps

Slide 27

Slide 27 text

Systems are (increasingly) complex

Slide 28

Slide 28 text

Web Infrastructure circa '00 (2 servers)

Slide 29

Slide 29 text

Visibility circa '00

Slide 30

Slide 30 text

Web Infrastructure circa '12 (27 nodes)

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Visibility circa '12

Slide 33

Slide 33 text

Q: How is business doing today? A:

Slide 34

Slide 34 text

Q: How is business doing today? A: Based on these key metrics, we're looking good.

Slide 35

Slide 35 text

Figure out those key metrics

Slide 36

Slide 36 text

We need appropriate tooling

Slide 37

Slide 37 text

Events across: system, components, software

Slide 38

Slide 38 text

The event stream approach

Slide 39

Slide 39 text

Plenty of small producers, few big consumers

Slide 40

Slide 40 text

Production: anything that happens or moves (logs too!). Normalize & stream.
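
To make "normalize & stream" concrete, here is a minimal sketch of a small producer. Nothing in it comes from the talk itself: the event fields (host, service, metric, state, description, time), the access-log layout, and the function names are all assumptions chosen for illustration.

```python
import json
import re
import time

# Hypothetical common event shape; the field names are illustrative assumptions.
def make_event(host, service, metric=None, state="ok", description="", timestamp=None):
    return {
        "host": host,
        "service": service,
        "metric": metric,
        "state": state,
        "description": description,
        "time": time.time() if timestamp is None else timestamp,
    }

# Assumed log layout: "<METHOD> <path> <status> <duration_ms>"
ACCESS_LINE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)$")

def normalize_log_line(host, line, timestamp=None):
    """Turn one raw log line into an event with the common shape."""
    text = line.strip()
    match = ACCESS_LINE.match(text)
    if match is None:
        # Unparseable lines are still events; keep them rather than drop them.
        return make_event(host, "web.unparsed", state="warning",
                          description=text, timestamp=timestamp)
    status = int(match.group("status"))
    return make_event(
        host,
        "web.request_ms",
        metric=float(match.group("ms")),
        state="critical" if status >= 500 else "ok",
        description=f"{match.group('method')} {match.group('path')} {status}",
        timestamp=timestamp,
    )

def stream(events, sink):
    """Write one JSON document per line to any file-like sink."""
    for event in events:
        sink.write(json.dumps(event, sort_keys=True) + "\n")
```

Any file-like object works as the sink (a socket file, a pipe, or `sys.stdout`), so the producer stays small and leaves the heavy lifting to the consumers.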

Slide 41

Slide 41 text

Consumption: aggregate, correlate, decide

Slide 42

Slide 42 text

Aggregation: compute compound metrics (ratios, sums)
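
As an illustration of compound metrics, the sketch below sums a metric per service and derives a ratio from two of the sums. It assumes events are plain dicts with "service" and "metric" keys; the service names in the usage note are hypothetical.

```python
from collections import defaultdict

def sum_by_service(events):
    """Sum the metric of every event, grouped by service name."""
    totals = defaultdict(float)
    for event in events:
        if event.get("metric") is not None:
            totals[event["service"]] += event["metric"]
    return dict(totals)

def ratio(totals, numerator, denominator):
    """Return totals[numerator] / totals[denominator], or None if undefined."""
    denom = totals.get(denominator, 0.0)
    if denom == 0:
        return None  # no traffic in the window: the ratio is meaningless
    return totals.get(numerator, 0.0) / denom
```

For example, `ratio(sum_by_service(window), "http.errors", "http.requests")` yields an error rate for the events in `window`, which is usually closer to business value than either raw counter alone.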

Slide 43

Slide 43 text

Correlation
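
One possible reading of correlation, sketched under assumptions: pair each change event (a deploy, a new static route) with the error events seen on the same host shortly afterwards. The "host" and "time" keys and the five-minute window are illustrative choices, not something the talk specifies.

```python
def correlate(changes, errors, window_seconds=300):
    """Pair each change with the errors on the same host that follow it
    within window_seconds. Changes with no following errors are omitted."""
    pairs = []
    for change in changes:
        following = [
            error for error in errors
            if error["host"] == change["host"]
            and 0 <= error["time"] - change["time"] <= window_seconds
        ]
        if following:
            pairs.append((change, following))
    return pairs
```

This is a naive quadratic scan, which is fine for a sketch; a real consumer would index events by host and time.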

Slide 44

Slide 44 text

Decision: track, alert, ignore, scale
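
The four outcomes can be sketched as a small rule function. The inputs (an error ratio and a load figure between 0 and 1), the thresholds, and the rule ordering are all assumptions made for illustration; real rules depend on the key metrics chosen.

```python
def decide(error_ratio, load, track_above=0.01, alert_above=0.05, scale_above=0.8):
    """Map two aggregated metrics to one of: track, alert, ignore, scale."""
    if error_ratio is None:
        return "ignore"   # nothing measured, nothing to act on
    if error_ratio >= alert_above:
        return "alert"    # defects come first: wake someone up
    if load >= scale_above:
        return "scale"    # healthy but saturated: add capacity
    if error_ratio >= track_above:
        return "track"    # not urgent, but keep it on a graph
    return "ignore"
```

Checking the alert rule before the scale rule is a deliberate choice here: adding capacity to a failing system tends to hide the defect rather than fix it.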

Slide 45

Slide 45 text

Implementing: on premise, SaaS, or in between?

Slide 46

Slide 46 text

SaaS: loggly, papertrail, librato, datadog, ...

Slide 47

Slide 47 text

On premise: collectd, logstash, graphite, statsd, riemann

Slide 48

Slide 48 text

The path to visibility:
Find key metrics
Find the right tools
Rely on an event stream
Involve everyone
Challenge your mental model
Hopefully, improve quality and lower defect rates in the process!

Slide 49

Slide 49 text

Questions?