
Charity Majors on Scuba: Diving into Data at Facebook

Facebook takes performance monitoring seriously. Performance issues can impact over one billion users, so we track thousands of servers, hundreds of PB of daily network traffic, hundreds of daily code changes, and many other metrics. We require latencies of under a minute from events occurring (a client request on a phone, a bug report filed, a code change checked in) to graphs showing those events on developers’ monitors.

Scuba is the data management system Facebook uses for most real-time analysis. Scuba is a fast, scalable, distributed, in-memory database built at Facebook. It currently ingests millions of rows (events) per second and expires data at the same rate. Scuba stores data completely in memory on hundreds of servers each with 144 GB RAM. To process each query, Scuba aggregates data from all servers. Scuba processes almost a million queries per day. Scuba is used extensively for interactive, ad hoc, analysis queries that run in under a second over live data. In addition, Scuba is the workhorse behind Facebook’s code regression analysis, bug report monitoring, ads revenue monitoring, and performance debugging.
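
To make “interactive, ad hoc analysis queries that run in under a second over live data” concrete, here is a minimal sketch in Python of the kind of question such a query answers; the field names and shape are assumptions for illustration, not Scuba’s actual interface.

    # Hypothetical ad hoc query: average request latency over the last five
    # minutes, grouped by endpoint. Field names (time, endpoint, latency_ms)
    # are made up for illustration.
    import time
    from collections import defaultdict

    def avg_latency_by_endpoint(rows, window_secs=300):
        cutoff = time.time() - window_secs
        sums, counts = defaultdict(float), defaultdict(int)
        for row in rows:                  # rows are wide, sparse records
            if row.get("time", 0) < cutoff or "latency_ms" not in row:
                continue                  # outside the window, or field absent
            key = row.get("endpoint", "<missing>")
            sums[key] += row["latency_ms"]
            counts[key] += 1
        return {k: sums[k] / counts[k] for k in counts}

In Scuba the scan behind a query like this runs in parallel across all the leaf servers holding the table’s data, which is what the query-execution slide in the transcript below describes.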

Papers_We_Love

June 26, 2017

Transcript

  1. Me

  2. Scuba at Facebook (Requirements): It must be *fast*. It must be (ridiculously) flexible. Fast and nearly-right is infinitely better than slow and perfectly-right. It should be usable by non-engineers. Results must be live in a few seconds.
  3. Scuba at Facebook (Implementation, Examples): Started sometime in 2011 to improve dire MySQL visibility. Built hack(athon) upon hack(athon). Less than 10k lines of C++. Designed? Evolved. Changed my life (and not just me).
  4. Paper overview (Introduction): “We used to rely on metrics and pre-aggregated time series, but our databases are exploding (and we don’t know why)” — Facebook, 2011
  5. Embedded in this whitepaper are lots of clues about how to do event-driven or distsys debugging at scale.
  6. Paper overview (Use Cases): Site Reliability; Bug Monitoring; Performance Tuning; Trend Analysis; Feature Adoption; Ad Impressions, Clicks, Revenue; A/B Testing; Egress/Ingress; Bug Report Monitoring; Real-Time Post Content Monitoring; Pattern Mining; … etc.
  7. Paper overview (Ingestion/Distribution): Accept structured events, timestamped by the client, with a sample_rate per row. No schema, no indexes. Wide and sparse ‘tables’. Pick two leaves for each incoming write and send the batch to the leaf with more free memory (load variance goes from O(log N) to O(1)). Delete old data at the same rate as new data is written. (See the placement sketch after the transcript.)
  8. Paper overview (Query Execution): Root/Intermediate/Leaf aggregators & Leaf servers. Root: parse and validate the query, fan out to self + 4. Intermediate: fan out to self + 4 (until only talking to the local Leaf Aggregator, based on the sum of records to tally), consolidate records and pass them back up to the Root. Leaf server: full table scan lol. (See the fan-out sketch after the transcript.)
  9. Paper overview (Performance Model & Client Experiments): “We wanted to write a whitepaper, so we slapped some sorta formal-looking stuff on here lol”
  10. Paper overview (Performance Model & Client Experiments): “We wanted to write a whitepaper, so we slapped some sorta formal-looking stuff on here lol”
  11. Why you should care: In the future, every system will be a distributed system. You don’t know what you don’t know. You can’t predict what data you will need. You NEED high-cardinality tooling. You need exploratory, ad hoc analysis for unknown unknowns. Everything is a tradeoff, but these are better tradeoffs in the future.
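
Here is a sketch of the write-placement trick from slide 7, the “power of two choices”: pick two leaves at random and send the batch to the one with more free memory. This is illustrative Python under assumed data structures, not Scuba’s code.

    import random

    class Leaf:
        """A leaf server, modeled only by its name and remaining memory."""
        def __init__(self, name, free_bytes):
            self.name = name
            self.free_bytes = free_bytes
            self.batches = []

    def place_batch(leaves, batch, batch_bytes):
        """Two-choice placement: sample two leaves, write to the freer one."""
        a, b = random.sample(leaves, 2)
        target = a if a.free_bytes >= b.free_bytes else b
        target.batches.append(batch)
        target.free_bytes -= batch_bytes
        return target

    # e.g.: leaves = [Leaf(f"leaf{i}", free_bytes=144 * 2**30) for i in range(100)]
    #       place_batch(leaves, batch=[...], batch_bytes=10_000)

Compared with sending each batch to a single randomly chosen leaf, sampling two and taking the freer one keeps memory usage much more even across hundreds of leaves, which is the point the slide makes with its O(log N) to O(1) variance comparison.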
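And a sketch of the query path from slide 8: each aggregator fans out to itself plus four children, leaf servers do a full scan of their in-memory rows, and partial results are consolidated on the way back up to the root. The tree shape, the leaf scan, and merging by summation follow the slide; the Python representation is an assumption, and the leaf scan weights each matching row by its per-row sample_rate (slide 7) so counts approximate the unsampled totals.

    from collections import Counter

    FANOUT = 4  # each aggregator talks to itself + 4 children (per slide 8)

    def leaf_scan(rows, predicate, group_by):
        """Full table scan over this leaf's in-memory rows."""
        counts = Counter()
        for row in rows:
            if predicate(row):
                # Weight by the row's sample_rate to approximate unsampled counts.
                counts[row.get(group_by, "<missing>")] += row.get("sample_rate", 1)
        return counts

    def aggregate(node, predicate, group_by):
        """Recursively fan out, then consolidate partial counts upward."""
        if "rows" in node:                            # leaf server: scan locally
            return leaf_scan(node["rows"], predicate, group_by)
        total = Counter()
        for child in node["children"][: FANOUT + 1]:  # self + 4
            total.update(aggregate(child, predicate, group_by))  # sums counts
        return total

    # e.g.: node = {"children": [{"rows": [...]}, {"rows": [...]}]}
    #       aggregate(node, lambda r: r.get("error") is not None, "dc")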