Slide 1

Slide 1 text

Tracing Production Services at Stripe
Aditya Mukerjee
Systems Engineer at Stripe
@chimeracoder

Slide 2

Slide 2 text

Tracing is about more than HTTP requests

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

https://veneur.org

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

It’s 3:07 AM

Slide 7

Slide 7 text

Dashboard Count: 1

Slide 8

Slide 8 text

Dashboard Count: 2

Slide 9

Slide 9 text

Dashboard Count: 3

Slide 10

Slide 10 text

Dashboard Count: 4

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

If you need to look at logs, there’s a gap in your observability tools

Slide 13

Slide 13 text

Dashboard Count: 5

Slide 14

Slide 14 text

•Metrics/dashboards? No context!
•Logs? Hard to aggregate!
•Request traces? Require planning!

Slide 15

Slide 15 text

Monitoring information is only as good as developers’ ability to predict the future

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Application

Slide 20

Slide 20 text

What’s the difference?
•If you squint, it’s hard to tell them apart
•A log is a metric with “longer” information
•A trace is a metric that allows “inner joins”
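
One way to picture the claim on this slide is a single record type that can be viewed as a metric, a log line, or part of a trace. The sketch below is illustrative only: the `Event` class, its field names, and the three view functions are hypothetical and are not taken from Veneur or from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical unified record; not Veneur's actual schema."""
    name: str
    value: float                 # the "metric" part
    message: str = ""            # the "longer" information a log carries
    trace_id: str = ""           # key shared by every event in one request
    tags: dict = field(default_factory=dict)


def as_metric(events, name):
    """Metric view: sum the values for one name and discard everything else."""
    return sum(e.value for e in events if e.name == name)


def as_log(events):
    """Log view: keep only the events that carry a message."""
    return [f"{e.name}: {e.message}" for e in events if e.message]


def as_trace(events, trace_id):
    """Trace view: select the events that share one trace_id (the join key)."""
    return [e for e in events if e.trace_id == trace_id]


# Made-up sample data.
events = [
    Event("api.request", 1, trace_id="t1"),
    Event("db.query", 1, message="slow query on charges", trace_id="t1"),
    Event("api.request", 1, trace_id="t2"),
]
```

Under these assumptions, the same three events yield a request count, one log line, and the two-event path of request `t1`, without the application emitting three separate kinds of telemetry.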

Slide 21

Slide 21 text

What if we could have all three, all the time?

Slide 22

Slide 22 text

Standard Sensor Format

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Application

Slide 26

Slide 26 text

Tradeoffs: Stacking the Deck

Slide 27

Slide 27 text

Distributed Collection
Diagram labels: host1, host2, host3, Dashboard Tool

Slide 28

Slide 28 text

Aggregation
Diagram labels: host1, host2, host3, Global Aggregator, Dashboard Tool

Slide 29

Slide 29 text

Distributed Aggregation
Diagram labels: host1, host2, host3, Dashboard Tool
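
One possible reading of the tradeoff between these three diagrams is that some metric types can be aggregated on each host and combined later, while others need a global view. The sketch below uses made-up latencies and a hypothetical `percentile` helper to show the difference: counts merge exactly by summing, but averaging per-host percentiles does not give the fleet-wide percentile.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up request latencies in milliseconds, one list per host.
latencies = {
    "host1": [10, 12, 11, 13],
    "host2": [10, 11, 500, 12],
    "host3": [9, 10, 11, 12],
}

# Counters merge exactly: per-host counts can simply be summed.
total_requests = sum(len(v) for v in latencies.values())

# Percentiles do not: the mean of per-host p90s is not the fleet-wide p90.
averaged_p90 = sum(percentile(v, 90) for v in latencies.values()) / len(latencies)
global_p90 = percentile([x for v in latencies.values() for x in v], 90)

print(total_requests, averaged_p90, global_p90)
```

With this data a single slow request on host2 drags the averaged figure far away from the percentile computed over all requests, which is why percentile-style metrics are the ones that need either a global aggregator or a mergeable summary.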

Slide 30

Slide 30 text

Stacking the Deck
Histogram: t-digests
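
A t-digest summarizes a distribution as a small set of weighted centroids, kept small near the tails and larger near the median, and two digests can be merged into one. The sketch below is a simplified t-digest-style summary written for illustration, assuming the size bound 4·N·q·(1−q)/δ per centroid; the function names and the `delta` default are hypothetical, and this is not Veneur's implementation.

```python
def compress(centroids, delta=50):
    """Greedily merge sorted (mean, weight) centroids under a t-digest-style size bound."""
    centroids = sorted(centroids)
    total = sum(w for _, w in centroids)
    out = []
    cum = 0.0  # weight already emitted to the left of the current centroid
    cur_mean, cur_w = centroids[0]
    for mean, w in centroids[1:]:
        q = (cum + (cur_w + w) / 2) / total          # approximate quantile of the merged centroid
        limit = 4 * total * q * (1 - q) / delta      # small near the tails, large near the median
        if cur_w + w <= max(1.0, limit):
            cur_mean = (cur_mean * cur_w + mean * w) / (cur_w + w)
            cur_w += w
        else:
            out.append((cur_mean, cur_w))
            cum += cur_w
            cur_mean, cur_w = mean, w
    out.append((cur_mean, cur_w))
    return out


def digest(values, delta=50):
    """Summarize raw values on one host."""
    return compress([(float(v), 1.0) for v in values], delta)


def merge(a, b, delta=50):
    """Combine two host-local digests without access to the raw values."""
    return compress(a + b, delta)


def quantile(centroids, q):
    """Estimate a quantile as the mean of the centroid covering that rank (no interpolation)."""
    target = q * sum(w for _, w in centroids)
    cum = 0.0
    for mean, w in centroids:
        cum += w
        if cum >= target:
            return mean
    return centroids[-1][0]


# Made-up data: two hosts that each saw half of the values 1..1000.
host_a = digest(range(1, 501))
host_b = digest(range(501, 1001))
merged = merge(host_a, host_b)
median_estimate = quantile(merged, 0.5)
```

The merged summary holds far fewer entries than the raw data while still giving a usable median estimate, which is the property that lets each host summarize locally and ship only the summary.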

Slide 31

Slide 31 text

Let’s build the world we want to see

Slide 32

Slide 32 text

It’s 3:07 AM

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Veneur in 2017
•High availability
•Host-local metrics
•Global aggregate metrics
•Probabilistic data structures
•… and more!

Veneur in 2018
•Automatic cardinality detection
•Cross-dashboard integration
•Unified client instrumentation
•… help us decide the rest!

Slide 35

Slide 35 text

Thank you!
https://veneur.org
#veneur on Freenode
Aditya Mukerjee
@chimeracoder