Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

systems? who does that?

Slide 5

Slide 5 text

jaana b. dogan
6+ years at Google, touched many projects

Slide 6

Slide 6 text

early days (of a company)

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

growing...

Slide 9

Slide 9 text

what growth looks like
[diagram labels: service A, service B, service C, service D, service E, service auth email]

Slide 10

Slide 10 text

one becomes many
failure in isolation
who to ping in failure?

Slide 11

Slide 11 text

and it goes larger...

Slide 12

Slide 12 text

good guy jeff

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

code search

Slide 16

Slide 16 text

go_library(
    name = "logs",
    srcs = ["logs.go"],
    visibility = ["//visibility:public"],
    deps = [ …. ],
)

References (641 occurrences)
- //source/ads/monitoring/BUILD
- //source/ads/analysis/BUILD
- //source/ads/mobile/BUILD
...

Slide 17

Slide 17 text

[diagram labels: frontend server, authentication, users, images, memcache, blobservice, memcache, memcache, (metadata), (disks), load balancer]

Slide 18

Slide 18 text

[diagram labels: frontend server, authentication, users, images, memcache, blobservice, memcache, memcache, (metadata), (disks), load balancer]
critical path

Slide 19

Slide 19 text

cpdd (critical path driven development)

Slide 20

Slide 20 text

discover the critical paths
make them reliable and fast
make them debuggable

Slide 21

Slide 21 text

how do we get there?
events or tracing

Slide 22

Slide 22 text

why? why? why? why? why?

Slide 23

Slide 23 text

GET /timeline
edge-lb
sched
api-server
auth.Auth
cache.Get
mysql.Query
user.Profile
cache.Get
mysql.Query
images.Filter
blobstore.Get

Slide 24

Slide 24 text

bare metal
kernel
process scheduler
network stack
cloud stack
user process
frameworks
your code

Slide 25

Slide 25 text

GET /timeline
edge-lb
sched
api-server
auth.Auth
cache.Get
mysql.Query
user.Profile
cache.Get
mysql.Query
images.Filter
blobstore.Get

not my fault

Slide 26

Slide 26 text

GET /timeline
auth.Auth
cache.Get
mysql.Query
user.Profile
cache.Get
mysql.Query
images.Filter
blobstore.Get
cache.Get
mysql.Query
blob.Get

where is the source code?

Slide 27

Slide 27 text

GET /timeline
auth.Auth
cache.Get
mysql.Query
user.Profile
cache.Get
mysql.Query
images.Filter
blobstore.Get
cache.Get
mysql.Query
blob.Get

who to call?

Slide 28

Slide 28 text

GET /timeline
auth.Auth
cache.Get
mysql.Query
user.Profile
cache.Get
mysql.Query
images.Filter
blobstore.Get
cache.Get
mysql.Query
blob.Get

give me the logs, runtime events, profiles...

Slide 29

Slide 29 text

challenges...

Slide 30

Slide 30 text

CPDD CHALLENGE #1:
this is an organizational problem

Slide 31

Slide 31 text

github.com/w3c/distributed-tracing
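
Assuming this slide points at the W3C work that defines the `traceparent` HTTP header (the Trace Context specification), the sketch below shows how a service might read that header so every team propagates the same trace identity. It is a simplified Go illustration for version 00 only: the `TraceParent` type and `parseTraceParent` function are hypothetical names, and the checks cover only field count, field lengths, and hex encoding, not the full validation rules of the specification.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceParent holds the four fields of a version-00 traceparent header.
// The type name is hypothetical, chosen for this example.
type TraceParent struct {
	Version  string
	TraceID  string
	ParentID string
	Sampled  bool
}

// parseTraceParent splits and checks "version-traceid-parentid-flags".
// Simplified: it checks field count, lengths, and hex encoding only.
func parseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	wantLen := [4]int{2, 32, 16, 2} // hex characters per field
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return TraceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not hex: %v", i, err)
		}
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return TraceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Sampled:  flags[0]&0x01 == 0x01, // lowest bit is the sampled flag
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("trace=%s parent=%s sampled=%v\n", tp.TraceID, tp.ParentID, tp.Sampled)
}
```

A shared header format like this is what lets a trace survive hops between services owned by different teams, which is why the problem is organizational as much as technical.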

Slide 32

Slide 32 text

CPDD CHALLENGE #2:
engineers don’t know where to start

Slide 33

Slide 33 text

CPDD CHALLENGE #3:
infra is still a blackbox

Slide 34

Slide 34 text

CPDD CHALLENGE #4:
instrumentation is expensive

Slide 35

Slide 35 text

CPDD CHALLENGE #5:
dynamic capabilities are underestimated

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

cpdd: a tool to close knowledge gaps (which we don’t talk about)

Slide 38

Slide 38 text