Slide 1

@rakyll
monitoring and debugging containerized systems
Jaana B. Dogan, Google
jbd@google.com

Slide 2

me
- overly frustrated engineer
- 15+ years in networking systems
- making systems more reliable

Slide 3

the new old monitoring? (maybe)

Slide 4

systems are growing... and you are not in control

Slide 5

bare metal
kernel
network stack
cloud stack
libraries
frameworks
your code

Slide 7

complexity is inevitable

Slide 8

container

Slide 9

container

Slide 10

container container

Slide 11

container container

Slide 12

container container message queue

Slide 13

container container storage/database

Slide 14

container container load balancer location=us-west location=europe-central

Slide 15

host host container container load balancer

Slide 16

container container container container container orchestrated hot mess

Slide 17

areas of issues:
- lack of locality
- networking
- scheduling
- dependencies

Slide 18

bare metal
kernel
network stack
cloud stack
libraries
frameworks
your code

Slide 19

“my job is done here”

Slide 20

after going to production...
1. monitor
2. alert
3. troubleshoot
4. fix

Slide 22

load balancer

Slide 23

load balancer critical path

Slide 24

- discovering critical paths
- making them reliable then fast
- making them debuggable
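
One way to picture "discovering critical paths": a minimal sketch, assuming each request is recorded as a tree of timed spans. The `Span` type, the span names, and the timings are all hypothetical, and the rule used here (follow the child that finishes last) is a simplification, not how any particular tracing backend works.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One timed operation in a request; all names and timings are made up."""
    name: str
    start_ms: float
    end_ms: float
    children: List["Span"] = field(default_factory=list)


def critical_path(root: Span) -> List[str]:
    """Walk from the root, always following the child that finishes last.

    Simplifying assumption: a parent cannot finish before its slowest
    child, so the last-finishing child is the one holding the request up.
    """
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda s: s.end_ms)
        path.append(node.name)
    return path


# Hypothetical request: the load balancer calls auth and a frontend,
# and the frontend waits on a database query.
trace = Span("load balancer", 0, 120, [
    Span("auth", 5, 20),
    Span("frontend", 22, 115, [
        Span("cache lookup", 25, 30),
        Span("database query", 32, 110),
    ]),
])

print(" -> ".join(critical_path(trace)))
# prints: load balancer -> frontend -> database query
```

In this made-up trace, speeding up "auth" or "cache lookup" would not shorten the request; only the spans on the printed path would.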

Slide 26

Latency Numbers Every Programmer Should Know by Jeff Dean

Slide 28

ping pong pongservice:6996 project: ping the pong server.

Slide 29

opencensus.io

Slide 30

not my team!

Slide 31

where is the source code?

Slide 32

who to page?

Slide 33

who to page?

Slide 34

give me the logs, runtime events, profiles...

Slide 38

http://server:9999/tracez

Slide 39

challenges...

Slide 40

no wire standards

Slide 42

traceparent: ---
Example:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
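
The example header above can be split into its fields. This is a minimal sketch, assuming the four-field layout of the W3C Trace Context proposal (version, trace-id, parent-id, trace-flags); the `TraceParent` type and `parse_traceparent` function are hypothetical names, not part of any library.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: version "-" trace-id "-" parent-id "-" trace-flags,
# all lowercase hex. This sketch accepts only the four-field form.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header value, or None if it is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are treated as invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest flag bit marks the request as sampled.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


parsed = parse_traceparent(
    "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
)
print(parsed)
```

A service that receives this header would keep the trace-id, replace the parent-id with its own span's ID, and forward the result on outgoing calls, which is what lets spans from different services join into one trace.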

Slide 43

no export standards

Slide 44

areas of issues:
- locality
- networking
- scheduling
- dependencies

Slide 45

fin
jbd@google.com