Slide 1

Slide 1 text

Scuba: Diving Into Data at Facebook http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p767-wiener.pdf

Slide 2

Slide 2 text

@mipsytipsy engineer, cofounder, CEO

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Me

Slide 5

Slide 5 text

Scuba at Facebook
Overview
Limitations and tradeoffs
What’s changed
Why you should care

Slide 6

Slide 6 text

Scuba at Facebook (Requirements)
It must be *fast*
It must be (ridiculously) flexible
Fast and nearly-right is infinitely better than slow and perfectly-right
Should be usable by non-engineers
Results must be live in a few seconds.

Slide 7

Slide 7 text

Scuba at Facebook (Implementation, Examples)
Started sometime in 2011, to improve dire MySQL visibility
hack(athon) upon hack(athon)
Less than 10k lines of C++
Designed → evolved
Changed my life (and not just mine)

Slide 8

Slide 8 text

Paper overview (Introduction) “We used to rely on metrics and pre-aggregated time series, but our databases are exploding (and we don’t know why)” — Facebook, 2011

Slide 9

Slide 9 text

Embedded in this whitepaper are lots of clues about how to do event-driven or distsys debugging at scale.

Slide 10

Slide 10 text

Paper overview (Use Cases)
Site Reliability
Bug Monitoring
Performance Tuning
Trend Analysis
Feature Adoption
Ad Impressions, Clicks, Revenue
A/B Testing
Egress/Ingress
Bug Report Monitoring
Real Time Post Content Monitoring
Pattern Mining
… etc

Slide 11

Slide 11 text

This makes it … interactive. Exploratory. Ad hoc.

Slide 12

Slide 12 text

Paper overview (Ingestion/Distribution)
Accept structured events. Timestamped by the client, with a sample_rate per row.
No schema, no indexes. Wide and sparse ‘tables’.
Pick two leaves at random for each incoming write; send the batch to the leaf with more free memory. (Load variance goes from O(log N) to O(1); see the sketch below.)
Delete old data at the same rate as new data is written.
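
A minimal sketch of that two-choice placement in Python (the Leaf class, the free-memory accounting, and the sample event are hypothetical stand-ins; the paper specifies only the policy of picking two random leaves and writing to the one with more free memory):

```python
import random

class Leaf:
    """Hypothetical stand-in for a Scuba leaf server: all data lives in
    memory, so placement is driven purely by free memory."""
    def __init__(self, name, free_bytes):
        self.name = name
        self.free_bytes = free_bytes

    def accept(self, batch):
        # Rows are schemaless dicts: a client timestamp and a sample_rate
        # per row, plus whatever sparse columns the client chose to send.
        self.free_bytes -= sum(len(repr(row)) for row in batch)

def place_batch(leaves, batch):
    # Two-choice placement: pick two leaves at random, send the batch to
    # whichever has more free memory. Per the paper, this drops load
    # variance from O(log N) (one random choice) to O(1).
    a, b = random.sample(leaves, 2)
    target = a if a.free_bytes >= b.free_bytes else b
    target.accept(batch)
    return target

leaves = [Leaf(f"leaf-{i}", free_bytes=64 * 2**20) for i in range(8)]
batch = [{"time": 1314700000, "sample_rate": 10, "host": "db42", "errno": 1213}]
print(place_batch(leaves, batch).name)
```

Retention then falls out of the same accounting: leaves age out old rows at roughly the rate new batches arrive, so memory stays full but never overflows.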

Slide 13

Slide 13 text

Paper overview (Ingestion Diagram)

Slide 14

Slide 14 text

Paper overview (Query Execution)
Root / Intermediate / Leaf aggregators, plus Leaf servers.
Root: parse and validate the query, then fan out to self+4.
Intermediate: fan out to self+4 (until only talking to the local Leaf Aggregator, based on the sum of records to tally); consolidate records and pass them back up to the Root.
Leaf server: full table scan lol
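
A toy, single-process sketch of that query tree in Python (the fanout constant matches the slide’s “self+4”; the shard layout and the count-style aggregate are assumptions for illustration, not the real distributed system):

```python
FANOUT = 4  # each aggregator fans out to "self + 4"

def leaf_scan(records, predicate):
    # Leaf server: no indexes, so every query is a full table scan.
    return sum(1 for r in records if predicate(r))

def aggregate(shards, predicate):
    # Aggregator: once down to a single shard, act as the local leaf
    # aggregator and hand off to the leaf server. Otherwise split the
    # shards into up to five groups (modeling "self + 4"), recurse, and
    # consolidate the partial results on the way back up to the Root.
    if len(shards) == 1:
        return leaf_scan(shards[0], predicate)
    step = -(-len(shards) // (FANOUT + 1))  # ceiling division
    groups = [shards[i:i + step] for i in range(0, len(shards), step)]
    return sum(aggregate(g, predicate) for g in groups)

# Hypothetical data: 8 leaves, each holding a slice of one sparse table.
shards = [[{"errno": e} for e in range(i, i + 50)] for i in range(0, 400, 50)]
print(aggregate(shards, lambda r: r["errno"] % 2 == 0))  # -> 200
```

The consolidation step is also where “fast and nearly-right” lives: in the real system, aggregators wait only so long before returning whatever fraction of leaves has answered.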

Slide 15

Slide 15 text

Paper overview (Query Execution)

Slide 16

Slide 16 text

Paper overview (Performance Model & Client Experiments) “We wanted to write a whitepaper, so we slapped some sorta formal-looking stuff on here lol”

Slide 17

Slide 17 text

Paper overview (Performance Model & Client Experiments) “We wanted to write a whitepaper, so we slapped some sorta formal-looking stuff on here lol”

Slide 18

Slide 18 text

Ways in which we violate good computer science: many

Slide 19

Slide 19 text

Fucks given: few

Slide 20

Slide 20 text

Why you should care
In the future, every system will be a distributed system.
You don’t know what you don’t know.
You can’t predict what data you will need.
You NEED high-cardinality tooling.
You need exploratory, ad hoc analysis for unknown unknowns.
Everything is a tradeoff, but these are better tradeoffs for the future.

Slide 21

Slide 21 text

(screenshots: Scuba on the left, Honeycomb on the right)

Slide 22

Slide 22 text

welcome to the future :)

Slide 23

Slide 23 text

Resources:
1. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p767-wiener.pdf
2. https://research.fb.com/publications/scuba-diving-into-data-at-facebook/
3. https://www.facebook.com/notes/facebook-engineering/under-the-hood-data-diving-with-scuba/10150599692628920/
4. https://news.ycombinator.com/item?id=13463016
5. https://honeycomb.io, https://interana.com (related startups)

Slide 24

Slide 24 text

Charity Majors @mipsytipsy