Slide 1

Slide 1 text

wide event analytics @igorwhilefalse

Slide 2

Slide 2 text

hello!

Slide 3

Slide 3 text

@igorwhilefalse

Slide 4

Slide 4 text

gentle constructive rant

Slide 5

Slide 5 text

debugging large scale systems using events

Slide 6

Slide 6 text

understanding system behaviour

Slide 7

Slide 7 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users

Slide 8

Slide 8 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users  (you are here)

Slide 9

Slide 9 text

software is becoming increasingly complex

Slide 10

Slide 10 text

Storage hierarchy of a WSC (Figure 3.15):
one server: DRAM 256GB, 100ns, 150GB/s; flash 4TB, 100us, 3GB/s; disk 80TB, 10ms, 800MB/s
local rack (40 servers): DRAM 10TB, 20us, 5GB/s; flash 160TB, 120us, 5GB/s; disk 3.2PB, 10ms, 5GB/s
cluster (125 racks): DRAM 1.28PB, 50us, 1.2GB/s; flash 20PB, 150us, 1.2GB/s; disk 400PB, 10ms, 1.2GB/s
The Datacenter as a Computer, Barroso et al

Slide 11

Slide 11 text

Jaeger, Uber

Slide 12

Slide 12 text

Philippe M Desveaux

Slide 13

Slide 13 text

Alexandre Baron

Slide 14

Slide 14 text

logs vs metrics: a false dichotomy Nick Stenning

Slide 15

Slide 15 text

10.2.3.4 - - [1/Jan/1970:18:32:20 +0000] "GET / HTTP/1.1" 200 5324 "-" "curl/7.54.0" "-"

Slide 16

Slide 16 text

Honeycomb

Slide 17

Slide 17 text

we can derive metrics from log streams

Slide 18

Slide 18 text

$ cat access.log | grep ... | awk ... | sort | uniq -c
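The same aggregation sketched in Python rather than a shell pipeline; a minimal illustration assuming the access-log format from the previous slide (the sample lines are made up):

from collections import Counter

# count requests per status code from access-log lines (field 9 is the status)
log_lines = [
    '10.2.3.4 - - [1/Jan/1970:18:32:20 +0000] "GET / HTTP/1.1" 200 5324 "-" "curl/7.54.0" "-"',
    '10.2.3.5 - - [1/Jan/1970:18:32:21 +0000] "GET /about HTTP/1.1" 404 312 "-" "curl/7.54.0" "-"',
]

status_counts = Counter(line.split()[8] for line in log_lines)
print(status_counts)   # Counter({'200': 1, '404': 1})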

Slide 19

Slide 19 text

{ time = "1970-01-01T18:32:20" status = 200 method = "GET" path = ... host = "i-123456af" client_ip = "10.2.3.4" user_agent = "curl/7.54.0" request_dur_ms = 325 request_bytes = 2456 response_bytes = 5324 }

Slide 20

Slide 20 text

structured logs
summary events
canonical log lines
arbitrarily wide data blobs

Slide 21

Slide 21 text

~ events ~

Slide 22

Slide 22 text

a metric is an aggregation of events

Slide 23

Slide 23 text

why do we aggregate?

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

count p50 p99 max histogram
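A minimal sketch of these aggregations computed from raw per-request latencies (numpy assumed; the values are made up):

import numpy as np

request_dur_ms = np.array([12, 15, 18, 22, 40, 95, 120, 340, 900, 1400])

print("count:", request_dur_ms.size)
print("p50:  ", np.percentile(request_dur_ms, 50))
print("p99:  ", np.percentile(request_dur_ms, 99))
print("max:  ", request_dur_ms.max())

# bucket counts for a latency histogram (bucket edges are illustrative)
counts, edges = np.histogram(request_dur_ms, bins=[0, 50, 100, 500, 1000, 5000])
print("histogram:", list(zip(edges[:-1].tolist(), counts.tolist())))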

Slide 27

Slide 27 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users  (you are here)

Slide 28

Slide 28 text

prometheus and the problem with metrics

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

domaso

Slide 31

Slide 31 text

"it's slow"

Slide 32

Slide 32 text

Honeycomb

Slide 33

Slide 33 text

p99(request_latency) > 1000ms

Slide 34

Slide 34 text

300 requests were slow
 ... which ones?!

Slide 35

Slide 35 text

group by

Slide 36

Slide 36 text

most monitoring questions are ✨top-k

Slide 37

Slide 37 text

top traffic by IP address
top resource usage by customer
top latency by country
top error count by host
top request size by client

Slide 38

Slide 38 text

how many users are impacted?

Slide 39

Slide 39 text

SELECT user_id, COUNT(*) FROM requests WHERE request_latency >= 1000 GROUP BY user_id

Slide 40

Slide 40 text

metrics will not tell you this

Slide 41

Slide 41 text

✨ cardinality

Slide 42

Slide 42 text

Honeycomb

Slide 43

Slide 43 text

Honeycomb

Slide 44

Slide 44 text

http_requests_total{status=200}
http_requests_total{status=201}
http_requests_total{status=301}
http_requests_total{status=304}
...
http_requests_total{status=503}
10
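Why label cardinality hurts: each distinct combination of label values becomes its own time series. A toy sketch (a plain dict standing in for per-series counters, not the real Prometheus client):

from collections import Counter

series = Counter()

def inc(status, user_id=None):
    # one counter per distinct (metric, label values) combination
    series[("http_requests_total", status, user_id)] += 1

inc(200)
inc(200)
inc(503)
print(len(series))         # a handful of series with status alone

for uid in range(10_000):  # now add user_id as a label ...
    inc(200, user_id=uid)
print(len(series))         # ... and the series count explodes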

Slide 45

Slide 45 text

user_id 10k

Slide 46

Slide 46 text

ip address space = 2^32
 4 billion possible values 100k

Slide 47

Slide 47 text

kubectl get pods 100

Slide 48

Slide 48 text

build_id 100

Slide 49

Slide 49 text

the curse of dimensionality

Slide 50

Slide 50 text

{
  status = 200
  method = "GET"
  path = ...
  host = "i-123456af"
  zone = "eu-central-1a"
  client_ip = "10.2.3.4"
  user_agent = "curl/7.54.0"
  client_country = "de"
  user_id = 30032
  partition_id = 31
  build_id = "9045e1"
  customer_plan = "platinum"
  endpoint = "tweet_detail"
}

Slide 51

Slide 51 text

{
  status = 200                   10
  method = "GET"                 5
  path = ...                     300
  host = "i-123456af"            20
  zone = "eu-central-1a"         5
  client_ip = "10.2.3.4"         1k
  user_agent = "curl/7.54.0"     300
  client_country = "de"          20
  user_id = 30032                1k
  partition_id = 31              32
  build_id = "9045e1"            10
  customer_plan = "platinum"     3
  endpoint = "tweet_detail"      20
}

Slide 52

Slide 52 text

10 ✖ 5 ✖ 300 ✖ 20 ✖ 5 ✖ 1k ✖ 300 ✖ 20 ✖ 1k ✖ 32 ✖ 10 ✖ 3 ✖ 20 = 172'800'000'000'000'000'000

Slide 53

Slide 53 text

TheUjulala

Slide 54

Slide 54 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users  (you are here)

Slide 55

Slide 55 text

recording events

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

{ time = "1970-01-01T18:32:20" status = 200 method = "GET" path = ... host = "i-123456af" region = "eu-central-1" zone = "eu-central-1a" client_ip = "10.2.3.4" user_agent = "curl/7.54.0" client_country = "de" kernel = "5.0.0-1018-aws" user_id = 30032 tweet_id = 2297111098 partition_id = 31
 build_id = "9045e1"
 request_id = "f2a3bdc4" customer_plan = "platinum" feature_blub = true cache = "miss" endpoint = "tweet_detail" request_dur_ms = 325 db_dur_ms = 5 db_pool_dur_ms = 3 db_query_count = 63 cache_dur_ms = 2 svc_a_dur_ms = 32 svc_b_dur_ms = 90 request_bytes = 2456 response_bytes = 5324 }

Slide 58

Slide 58 text

{ time = "1970-01-01T18:32:20" status = 200 method = "GET" path = ... host = "i-123456af" region = "eu-central-1" zone = "eu-central-1a" client_ip = "10.2.3.4" user_agent = "curl/7.54.0" client_country = "de" kernel = "5.0.0-1018-aws" }

Slide 59

Slide 59 text

{
  user_id = 30032
  tweet_id = 2297111098
  partition_id = 31
  build_id = "9045e1"
  request_id = "f2a3bdc4"
  customer_plan = "platinum"
  feature_blub = true
  cache = "miss"
  endpoint = "tweet_detail"
}

Slide 60

Slide 60 text

{
  request_dur_ms = 325
  db_dur_ms = 5
  db_pool_dur_ms = 3
  db_query_count = 63
  cache_dur_ms = 2
  svc_a_dur_ms = 32
  svc_b_dur_ms = 90
  request_bytes = 2456
  response_bytes = 5324
}

Slide 61

Slide 61 text

Jaeger, Uber

Slide 62

Slide 62 text

traces vs events: a false dichotomy

Slide 63

Slide 63 text

we can derive events from traces
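One way to picture this: collapse all the spans of a trace into a single wide event, keeping the root span's duration plus per-service child durations. A hypothetical sketch (the span layout and service names are made up, not a real tracing API):

spans = [
    {"trace_id": "f2a3bdc4", "service": "api",   "dur_ms": 325, "root": True},
    {"trace_id": "f2a3bdc4", "service": "db",    "dur_ms": 5},
    {"trace_id": "f2a3bdc4", "service": "cache", "dur_ms": 2},
]

root = next(s for s in spans if s.get("root"))
event = {"request_id": root["trace_id"], "request_dur_ms": root["dur_ms"]}
for span in spans:
    if span.get("root"):
        continue
    key = span["service"] + "_dur_ms"            # e.g. db_dur_ms, cache_dur_ms
    event[key] = event.get(key, 0) + span["dur_ms"]
print(event)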

Slide 64

Slide 64 text

Canopy, Facebook: engineers instrument components with Canopy's instrumentation APIs; at runtime requests propagate a TraceID and emit events. Canopy's tailer aggregates those events, constructs model-based traces, evaluates user-supplied feature extraction functions, and pipes the output to user-defined datasets (e.g. Scuba) for queries, dashboards, and deeper trace exploration.

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

stick those events in kafka
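A minimal sketch of that step, assuming the kafka-python client; the broker address and topic name are illustrative:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# one message per request-level wide event
event = {"time": "1970-01-01T18:32:20", "status": 200, "endpoint": "tweet_detail",
         "user_id": 30032, "request_dur_ms": 325}
producer.send("events", event)
producer.flush()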

Slide 67

Slide 67 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users  (you are here)

Slide 68

Slide 68 text

columnar storage changed my life

Slide 69

Slide 69 text

These rough operation latencies help engineers reason about throughput, latency, and capacity to a first-order approximation.
Table 2.3: Latency numbers that every WSC engineer should know (updated version of the table from [Dea09]):
L1 cache reference: 1.5 ns
L2 cache reference: 5 ns
Branch misprediction: 6 ns
Uncontended mutex lock/unlock: 20 ns
L3 cache reference: 25 ns
Main memory reference: 100 ns
Decompress 1 KB with Snappy: 500 ns
"Far memory"/fast NVM reference: 1,000 ns (1 us)
Compress 1 KB with Snappy: 2,000 ns (2 us)
Read 1 MB sequentially from memory: 12,000 ns (12 us)
SSD random read: 100,000 ns (100 us)
Read 1 MB sequentially from SSD: 500,000 ns (500 us)
Read 1 MB sequentially from 10Gbps network: 1,000,000 ns (1 ms)
Read 1 MB sequentially from disk: 10,000,000 ns (10 ms)
Disk seek: 10,000,000 ns (10 ms)
Send packet California→Netherlands→California: 150,000,000 ns (150 ms)
The Datacenter as a Computer, Barroso et al

Slide 70

Slide 70 text

• 1TB Hitachi Deskstar 7K1000
• disk seek time = 14ms
• transfer rate = 69MB/s
• 62.5 billion rows (= 1TB / 16 bytes)
• 28 years (= 62.5 billion rows * 14 ms/row / 32×10^9 ms/year)
The Trouble with Point Queries, Bradley C. Kuszmaul

Slide 71

Slide 71 text

• 1TB Hitachi Deskstar 7K1000
• transfer rate = 69MB/s
• 4 hours (= 1.000.000MB / 69MB/s / 3600 s/hour)

Slide 72

Slide 72 text

• SSD
• transfer rate = 1GB/s
• ~17 minutes (= 1.000GB / 1GB/s / 60 s/min)

Slide 73

Slide 73 text

10GB

Slide 74

Slide 74 text

Dremel: Interactive Analysis of Web-Scale Datasets, Google

Slide 75

Slide 75 text

10 GB / 8 bytes per data point = 1.3 billion events

Slide 76

Slide 76 text

status (raw column): 200 200 200 200 404 200 200 200 404 200
status (encoded): 4 * 200, 404, 3 * 200, 404, 200
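The compression trick shown above is run-length encoding: a repetitive column stores (value, run length) pairs instead of every value. A minimal Python sketch:

from itertools import groupby

status = [200, 200, 200, 200, 404, 200, 200, 200, 404, 200]
encoded = [(value, len(list(run))) for value, run in groupby(status)]
print(encoded)   # [(200, 4), (404, 1), (200, 3), (404, 1), (200, 1)]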

Slide 77

Slide 77 text

time-based partitioning
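A minimal sketch of the idea: bucket events by the hour of their timestamp so a query over a time range only scans the matching partitions (the field names and bucket size are illustrative):

from collections import defaultdict
from datetime import datetime

events = [
    {"time": "1970-01-01T18:32:20", "status": 200},
    {"time": "1970-01-01T18:59:01", "status": 404},
    {"time": "1970-01-01T19:02:11", "status": 200},
]

partitions = defaultdict(list)
for e in events:
    hour = datetime.fromisoformat(e["time"]).strftime("%Y-%m-%dT%H")   # partition key
    partitions[hour].append(e)

print({k: len(v) for k, v in partitions.items()})   # {'1970-01-01T18': 2, '1970-01-01T19': 1}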

Slide 78

Slide 78 text

dynamic sampling
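A minimal sketch of one flavour of dynamic sampling: keep only a fraction of high-volume, boring events, keep every interesting one, and record the sample rate on each kept event so aggregates can be re-weighted at query time (the per-class rates are made up):

import random

def sample(event):
    rate = 100 if event["status"] < 400 else 1   # 1-in-100 successes, every error
    if random.randrange(rate) == 0:
        event["sample_rate"] = rate              # stored so counts can be re-weighted
        return event
    return None

events = [{"status": 200} for _ in range(1000)] + [{"status": 500} for _ in range(5)]
kept = [s for s in (sample(e) for e in events) if s is not None]
estimated_total = sum(e["sample_rate"] for e in kept)   # approximates the original 1005
print(len(kept), estimated_total)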

Slide 79

Slide 79 text

it's lossy, but that's fine

Slide 80

Slide 80 text

vectorized processing
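A minimal sketch of what vectorized execution buys you: each field lives in its own array, and a query applies comparisons and aggregates to whole columns at once instead of looping row by row (numpy assumed; the data is made up):

import numpy as np

status = np.array([200, 200, 500, 200, 404, 500, 200])
dur_ms = np.array([120,  80, 950,  60,  30, 870,  45])

slow_errors = (status >= 500) & (dur_ms > 500)   # whole-column comparisons
print(slow_errors.sum())                         # how many slow 5xx requests
print(dur_ms[status >= 500].mean())              # mean latency of just the 5xx requests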

Slide 81

Slide 81 text

Scuba: Diving into Data at Facebook, Facebook

Slide 82

Slide 82 text

sequential scans ✖ columnar layout ✖ time-based partitioning ✖ compression / sampling ✖ vectorized processing ✖ sharding

Slide 83

Slide 83 text

No content

Slide 84

Slide 84 text

putting it all together

Slide 85

Slide 85 text

app → { k: v } events → column store → analytical queries (SELECT ... GROUP BY) → users

Slide 86

Slide 86 text

we need more of this in the monitoring space!

Slide 87

Slide 87 text

SELECT user_id, COUNT(*) FROM requests WHERE status >= 500 GROUP BY user_id ORDER BY COUNT(*) DESC LIMIT 10

Slide 88

Slide 88 text

✨ top-k ✨ cardinality ✨ events

Slide 89

Slide 89 text

No content

Slide 90

Slide 90 text

• Dremel: Interactive Analysis of Web-Scale Datasets, Google, 2010
• Scuba: Diving into Data at Facebook, Facebook, 2016
• Canopy: An End-to-End Performance Tracing and Analysis System, Facebook, 2017
• Look at Your Data, John Rauser, Velocity 2011
• Observability for Emerging Infra, Charity Majors, Strange Loop 2017
• Why We Built Our Own Distributed Column Store, Sam Stokes, Strange Loop 2017
• The Design and Implementation of Modern Column-Oriented Database Systems, Abadi et al, 2013
• Designing Data-Intensive Applications, Martin Kleppmann, 2017
• Monitoring in the time of Cloud Native, Cindy Sridharan, 2017
• Logs vs. metrics: a false dichotomy, Nick Stenning, 2019
• Using Canonical Log Lines for Online Visibility, Brandur Leach, 2016
• The Datacenter as a Computer: Designing Warehouse-Scale Machines, Barroso et al, 2018

Slide 91

Slide 91 text

@igorwhilefalse hi@igor.io