Slide 1

Slide 1 text

Unified Observability of Distributed Systems
Aditya Mukerjee
Systems Engineer at Stripe
@chimeracoder

Slide 2

Slide 2 text

Slide 3

Slide 3 text

Why are we here?

Slide 4

Slide 4 text

It’s 3:07 AM

Slide 5

Slide 5 text

Dashboard Count: 1

Slide 6

Slide 6 text

Dashboard Count: 2

Slide 7

Slide 7 text

Dashboard Count: 3

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Dashboard Count: 4

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

What tools can we use?
• Metrics/dashboards?
• Logs?
• Request traces?
No context! Hard to aggregate! Require planning!

Slide 13

Slide 13 text

Monitoring information is only as good as developers’ ability to predict the future

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Diagram label: Application

Slide 18

Slide 18 text

What’s the difference?
• If you squint, it’s hard to tell them apart
• A log is a metric with “longer” information
• A trace is a metric that allows “inner joins”
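
One way to read the bullets above is that a metric, a log line, and a trace span can all be views of the same underlying record. The sketch below illustrates that reading only; it is not Veneur's actual format. The `Event` type, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical unified record; field names are illustrative only."""
    name: str
    value: float                      # e.g. a duration in milliseconds
    tags: Dict[str, str] = field(default_factory=dict)
    message: str = ""                 # the "longer" information a log carries
    trace_id: Optional[str] = None    # shared by all events in one request
    span_id: Optional[str] = None
    parent_id: Optional[str] = None   # lets an event be joined to its parent


def as_metric(e: Event) -> Tuple[str, float, Dict[str, str]]:
    """Metric view: keep only the name, the number, and the tags."""
    return (e.name, e.value, e.tags)


def as_log(e: Event) -> str:
    """Log view: the same record plus its free-form message."""
    return f"{e.name} value={e.value} {e.message}".strip()


def join_children(events: List[Event]) -> List[Tuple[str, str]]:
    """Trace view: an inner join of events on parent_id == span_id."""
    by_span = {e.span_id: e for e in events if e.span_id is not None}
    return [
        (by_span[e.parent_id].name, e.name)
        for e in events
        if e.parent_id in by_span
    ]


events = [
    Event("api.request", 120.0, {"host": "host1"}, "GET /charge", "t1", "s1"),
    Event("db.query", 80.0, {"host": "host2"}, "SELECT ...", "t1", "s2", "s1"),
]
```

Here `as_metric(events[0])` drops everything except the name, value, and tags, while `join_children(events)` pairs the database query with the request that caused it. Events without a parent simply fall out of the join.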

Slide 19

Slide 19 text

What if we could have all three, all the time?

Slide 20

Slide 20 text

Standard Sensor Format

Slide 21

Slide 21 text

Slide 22

Slide 22 text

Slide 23

Slide 23 text

Diagram label: Application

Slide 24

Slide 24 text

Integrated Views

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Flexibility and Data Migrations

Slide 29

Slide 29 text

Diagram labels: Application, A, B, C

Slide 30

Slide 30 text

“Because we’re in control of our pipeline, we could add a new data backend or migrate vendors without having to touch our application code at all.”
“Having ownership over our pipeline gave us trust in our data. It made us confident that we hadn’t overlooked any parts of the migration process.”
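
The quotes above describe adding or swapping backends without touching application code. A minimal sketch of that idea follows, assuming a fan-out layer between the application and its backends. The `Pipeline` class, the sink names, and the sample shape are hypothetical and are not taken from Veneur.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]          # hypothetical sample shape
Sink = Callable[[Sample], None]     # a backend is anything that accepts a sample


class Pipeline:
    """Hypothetical fan-out layer sitting between applications and backends."""

    def __init__(self) -> None:
        self._sinks: Dict[str, Sink] = {}

    def add_sink(self, name: str, sink: Sink) -> None:
        self._sinks[name] = sink

    def remove_sink(self, name: str) -> None:
        self._sinks.pop(name, None)

    def emit(self, sample: Sample) -> None:
        # The application calls only this method; it never names a backend.
        for sink in self._sinks.values():
            sink(sample)


old_vendor: List[Sample] = []   # stand-ins for two real backends
new_vendor: List[Sample] = []

pipeline = Pipeline()
pipeline.add_sink("old", old_vendor.append)
pipeline.emit({"name": "checkout.count", "value": 1})

# Migration: run both backends side by side, then retire the old one.
pipeline.add_sink("new", new_vendor.append)
pipeline.emit({"name": "checkout.count", "value": 1})
pipeline.remove_sink("old")
pipeline.emit({"name": "checkout.count", "value": 1})
```

During the overlap both backends receive identical samples, so their outputs can be compared before the old one is removed. The `emit` call in the application never changes.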

Slide 31

Slide 31 text

Tradeoffs: Stacking the Deck

Slide 32

Slide 32 text

Distributed Collection
Diagram labels: host1, host2, host3, Dashboard Tool

Slide 33

Slide 33 text

Aggregation
Diagram labels: host1, host2, host3, Global Aggregator, Dashboard Tool

Slide 34

Slide 34 text

Distributed Aggregation
Diagram labels: host1, host2, host3, Dashboard Tool
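
The slide itself carries only a diagram. As a rough illustration of what distributed aggregation can mean for counters, the sketch below has each host sum its own increments and then merges the partial sums. The function names and sample data are hypothetical, and this is not a description of how Veneur implements it.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def aggregate_locally(samples: Iterable[Tuple[str, int]]) -> Counter:
    """Sum counter increments on one host before anything leaves it."""
    local = Counter()
    for name, increment in samples:
        local[name] += increment
    return local


def merge(partials: List[Counter]) -> Counter:
    """Combine per-host partial sums; addition makes the order irrelevant."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


host1 = aggregate_locally([("requests", 1), ("requests", 1), ("errors", 1)])
host2 = aggregate_locally([("requests", 1)])
host3 = aggregate_locally([("requests", 1), ("errors", 1)])

global_view = merge([host1, host2, host3])
```

Because addition is associative and commutative, the merged totals are the same whether the partial sums are combined in one place or in several stages.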

Slide 35

Slide 35 text

Stacking the Deck
Histogram: t-digests
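
Percentiles, unlike counters, cannot be combined by adding or averaging per-host results, which is why a mergeable summary is useful. The sketch below is not a t-digest. It swaps in a much simpler fixed-width bucket histogram purely to show the merge-then-query pattern. The bucket width, function names, and latency values are hypothetical.

```python
from collections import Counter
from typing import Iterable

BUCKET_MS = 10  # hypothetical fixed bucket width; a t-digest adapts its centroids instead


def sketch(latencies_ms: Iterable[float]) -> Counter:
    """Summarize one host's latencies as counts per fixed-width bucket."""
    return Counter(int(v // BUCKET_MS) for v in latencies_ms)


def merge(a: Counter, b: Counter) -> Counter:
    """Merging two summaries is just adding bucket counts."""
    return a + b


def quantile(hist: Counter, q: float) -> float:
    """Return the upper edge of the bucket containing the q-th quantile."""
    total = sum(hist.values())
    target = q * total
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= target:
            return (bucket + 1) * BUCKET_MS
    raise ValueError("empty histogram")


host1 = sketch([12, 15, 18, 22])      # a fast host
host2 = sketch([95, 110, 130, 480])   # a slow host
combined = merge(host1, host2)
p50 = quantile(combined, 0.5)
```

With this data the per-host medians land in very different buckets, and averaging them would not give the median of the combined traffic. Querying the merged summary does, up to the bucket width.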

Slide 36

Slide 36 text

Trying out Veneur
• Free and open source! http://github.com/stripe/veneur
• Six-week release cycle
• Drop-in support for statsd, Graphite, Datadog, SignalFx, Prometheus, and more
• Native Kubernetes support
• Public images on Docker Hub

Slide 37

Slide 37 text

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Veneur in 2017
• High availability
• Host-local metrics
• Global aggregate metrics
• Sketching data structures
• … and more!

Veneur in 2018
• Automatic cardinality detection
• Expanded cross-dashboard integration
• Unified client instrumentation
• … help us decide the rest!

Slide 41

Slide 41 text

Let’s build the world we want to see

Slide 42

Slide 42 text

Thank you!
https://github.com/stripe/veneur
#veneur on Freenode
Aditya Mukerjee
@chimeracoder

Slide 43

Slide 43 text

References