Charity Majors on Scuba: Diving into Data at Facebook

Facebook takes performance monitoring seriously. Performance issues can impact over one billion users, so we track thousands of servers, hundreds of PB of daily network traffic, hundreds of daily code changes, and many other metrics. We require latencies of under a minute from events occurring (a client request on a phone, a bug report filed, a code change checked in) to graphs showing those events on developers’ monitors.

Scuba is the data management system Facebook uses for most real-time analysis. Scuba is a fast, scalable, distributed, in-memory database built at Facebook. It currently ingests millions of rows (events) per second and expires data at the same rate. Scuba stores data completely in memory on hundreds of servers, each with 144 GB of RAM. To process each query, Scuba aggregates data from all servers. Scuba processes almost a million queries per day. Scuba is used extensively for interactive, ad hoc analysis queries that run in under a second over live data. In addition, Scuba is the workhorse behind Facebook’s code regression analysis, bug report monitoring, ads revenue monitoring, and performance debugging.
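
To make “rows (events)” concrete, here is a minimal sketch of the kind of wide, sparse, schema-free event Scuba ingests. The field names are invented for illustration; the sample_rate convention (one stored row standing in for roughly N original events) follows the paper’s description of client-side sampling.

```python
# A Scuba "row" is a timestamped bag of columns. Different rows in the
# same table may carry entirely different column sets (wide and sparse,
# with no schema declared up front). All field names are hypothetical.
event = {
    "time": 1372201200,    # client-side timestamp, seconds since epoch
    "sample_rate": 10,     # this stored row represents ~10 real events
    "page": "/home",
    "latency_ms": 87,
    "datacenter": "prn1",
}
```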

Papers_We_Love

June 26, 2017

Transcript

  1. Scuba: Diving Into Data at Facebook
    http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p767-wiener.pdf

  2. @mipsytipsy
    engineer, cofounder, CEO

  3. Scuba at Facebook
    Overview
    Limitations and tradeoffs
    What’s changed
    Why you should care

  4. Scuba at Facebook (Requirements)
    It must be *fast*
    It must be (ridiculously) flexible
    Fast and nearly-right is infinitely better than slow and perfectly-right
    Should be usable by non-engineers
    Results must be live in a few seconds.

  5. Scuba at Facebook (Implementation, Examples)
    Started sometime in 2011, to improve dire MySQL visibility
    hack(athon) upon hack(athon)
    Less than 10k lines of C++
    Designed → Evolved
    Changed my life (and not just me)

  6. Paper overview (Introduction)
    “We used to rely on metrics and pre-aggregated time series, but our
    databases are exploding (and we don’t know why)” — Facebook, 2011

  7. Embedded in this whitepaper are lots of clues about
    how to do event-driven or distsys debugging at scale.

  8. Paper overview (Use Cases)
    Site Reliability
    Bug Monitoring
    Performance Tuning
    Trend Analysis
    Feature Adoption
    Ad Impressions, Clicks, Revenue
    A/B Testing
    Egress/Ingress
    Bug Report Monitoring
    Real Time Post Content Monitoring
    Pattern Mining
    … etc

  9. This makes it … interactive. Exploratory. Ad hoc.

  10. Paper overview (Ingestion/Distribution)
    Accept structured events. Timestamped by client, sample_rate per row.
    No schema, no indexes. Wide and sparse ‘tables’.
    Pick two leaves at random for each incoming write and send the batch to the
    leaf with more free memory (load variance drops from O(log N) to O(1); see
    the sketch below).
    Delete old data at the same rate as new data is written.
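
A minimal sketch of that two-choice placement, assuming each leaf can report its free memory. The Leaf class and its method names are hypothetical; only the rule itself (pick two leaves at random, write the batch to the emptier one) comes from the paper.

```python
import random

class Leaf:
    """Hypothetical stand-in for a Scuba leaf server."""
    def __init__(self, name, capacity_bytes):
        self.name = name
        self.capacity = capacity_bytes
        self.used = 0

    def free_memory(self):
        return self.capacity - self.used

    def append(self, batch_bytes):
        self.used += batch_bytes

def place_batch(leaves, batch_bytes):
    """Two-choice placement: sample two leaves at random and send the
    incoming batch to whichever currently has more free memory."""
    first, second = random.sample(leaves, 2)
    target = first if first.free_memory() >= second.free_memory() else second
    target.append(batch_bytes)
    return target
```

Compared with writing each batch to a single uniformly random leaf, checking just one extra candidate keeps the leaves far better balanced, which is the point of the parenthetical above.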

  11. Paper overview (Ingestion Diagram)

  12. Paper overview (Query Execution)
    Root, intermediate, and leaf aggregators, plus leaf servers.
    Root: parses and validates the query, then fans out to itself + 4 intermediates.
    Intermediate: fans out to itself + 4 (until, based on the number of records
    left to tally, it is talking only to its local leaf aggregator), then
    consolidates the returned records and passes them back up to the root.
    Leaf server: full table scan lol (see the sketch below).
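
A toy sketch of that query tree for a plain count query. The leaf-level full scan, the consolidation on the way back up, and the small fixed fanout (the slide’s “self + 4”, simplified to four subtrees here) follow the slide; the function names and in-memory “leaves” are invented, and the sample_rate weighting (a row with sample_rate N counts as ~N events) is modeled on the paper rather than taken from its code.

```python
FANOUT = 4

def leaf_scan(rows, predicate):
    """Leaf server: no indexes, so every query is a full table scan.
    A row with sample_rate N stands in for ~N original events, so we
    weight by sample_rate to estimate the unsampled count."""
    return sum(row["sample_rate"] for row in rows if predicate(row))

def aggregate(leaves, predicate):
    """Root/intermediate aggregator: with few enough leaves it queries
    them directly; otherwise it fans out to up to FANOUT subtrees and
    consolidates the partial counts on the way back up."""
    if len(leaves) <= FANOUT:
        return sum(leaf_scan(rows, predicate) for rows in leaves)
    chunk = (len(leaves) + FANOUT - 1) // FANOUT
    return sum(aggregate(leaves[i:i + chunk], predicate)
               for i in range(0, len(leaves), chunk))

# Example: estimate hits to /home across 64 leaves, each holding five
# matching rows sampled at 1-in-10 (sample_rate = 10) -> ~3200 events.
leaves = [[{"page": "/home", "sample_rate": 10}] * 5 for _ in range(64)]
print(aggregate(leaves, lambda row: row["page"] == "/home"))  # 3200
```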

  13. Paper overview (Query Execution)

  14. Paper overview (Performance Model & Client Experiments)
    “We wanted to write a whitepaper, so we slapped
    some sorta formal-looking stuff on here lol”

  15. Paper overview (Performance Model & Client Experiments)
    “We wanted to write a whitepaper, so we slapped
    some sorta formal-looking stuff on here lol”

  16. Ways in which we violate good computer science: many

  17. Fucks given: few

  18. Why you should care
    In the future, every system will be a distributed system
    You don’t know what you don’t know
    You can’t predict what data you will need
    You NEED high-cardinality tooling
    You need exploratory, ad hoc analysis for unknown unknowns
    Everything is a tradeoff, but these are the better tradeoffs for the future.

  19. <- scuba
    honeycomb ->

  20. welcome to the future :)

  21. Resources:
    1. http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p767-wiener.pdf

    2. https://research.fb.com/publications/scuba-diving-into-data-at-facebook/

    3. https://www.facebook.com/notes/facebook-engineering/under-the-hood-data-diving-with-scuba/10150599692628920/

    4. https://news.ycombinator.com/item?id=13463016

    5. https://honeycomb.io, https://interana.com (related startups)

  22. Charity Majors
    @mipsytipsy
