
Gwen Shapira on Realtime Data Processing at Facebook

Realtime data processing powers many use cases at Facebook, including realtime reporting of the aggregated, anonymized voice of Facebook users, analytics for mobile applications, and insights for Facebook page administrators. Many companies have developed their own systems; we have a realtime data processing ecosystem at Facebook that handles hundreds of Gigabytes per second across hundreds of data pipelines.

Many decisions must be made while designing a realtime stream processing system. In this paper, we identify five important design decisions that affect their ease of use, performance, fault tolerance, scalability, and correctness. We compare the alternative choices for each decision and contrast what we built at Facebook to other published systems.

Our main decision was targeting seconds of latency, not milliseconds. Seconds is fast enough for all of the use cases we support and it allows us to use a persistent message bus for data transport. This data transport mechanism then paved the way for fault tolerance, scalability, and multiple options for correctness in our stream processing systems Puma, Swift, and Stylus...

Papers_We_Love

June 26, 2017


Transcript

  1. 5 This is NOT the one true architecture. Please don’t cargo-cult this paper.
  2. 6 A few real-time systems at Facebook • Chorus – aggregate trends • Realtime feedback for mobile app developers • Page analytics – likes, engagement… • Offload CPU-intensive dashboard queries
  6. 13 Decision #1 – Language Paradigm • Declarative (SQL) – easy & limited • Functional • Procedural (C++, Java, Python) – most flexibility, control, and performance, but a longer dev cycle (see the first sketch after the transcript)
  8. 15 Decision #2: Data Transfer • RPC (MillWheel, Flink, Spark Streaming) – all about speed • Message-forwarding broker (Heron) – applies back-pressure, multiplexes • Persistent stream storage (Samza, Kafka’s Streams API) – most reliable, decouples processors (sketch after the transcript)
  9. 17 Love Song to Scribe: “Independent stream processing nodes / and storing inputs / outputs / made everything great”
  10. 19 Decision #3 – Processing Semantics. Facebook verdict: it depends on the requirements • The ranker writes to an idempotent system – at-least-once • Scuba can lose data but cannot handle duplicates – at-most-once • Exactly-once is REALLY HARD and requires transactions (sketch after the transcript)
  11. 20 Don’t miss the side-note on side-effects • Exactly-once means writing output + offsets to a transactional system • This takes time • Why just wait when you can deserialize, and maybe do other stateless stuff? (sketch after the transcript)
  12. 21 Decision #4 – State Saving • In-memory state with replication (old VoltDB) – requires lots of hardware and network • Local database (Samza, Kafka Streams API) • Remote database (MillWheel) • Upstream (i.e. replay everything on failure) • Global consistent snapshot (Flink)
  13. 23 Best part of the paper – by far: how do you efficiently work with state in a remote DB? (sketch after the transcript)
  15. 25 Decision #5 – Reprocessing • Stream only – requires long retention in the stream store • Maintain both batch and stream systems • Develop systems that can run in both stream and batch modes (Flink, Spark). Facebook verdict: SQL runs everywhere, and binary generation FTW (sketch after the transcript)
  16. 27 Lessons Learned! The biggest win is pipelines composed of independent processors • Mixing multiple systems lets us move fast • High-level abstractions let us improve the implementation • Ease of debugging – independent nodes and the ability to replay • Ease of deployment – Puma as-a-service • Ease of monitoring – lag is the most important metric, and everything is instrumented out of the box (sketch after the transcript) • In the future – auto-scale based on lag
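
Code sketches

The sketches referenced from the transcript follow. They are illustrative Python written for this summary, not code from the paper or from Facebook; every class, function, and parameter name in them is hypothetical.

Decision #1 – Language Paradigm. The trade-off on slide 13 is easiest to see side by side: the commented query is what a declarative (Puma/SQL-style) engine might accept, while the function spells out the same windowed aggregation procedurally.

```python
# Hypothetical declarative version of the same job:
#   SELECT topic, COUNT(*) AS events
#   FROM input_stream
#   GROUP BY topic, window_start(event_time, '5 minutes')
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute tumbling windows

def window_of(event_time: float) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return int(event_time) // WINDOW_SECONDS * WINDOW_SECONDS

def count_by_topic(events):
    """Procedural equivalent of the GROUP BY above: (window, topic) -> count."""
    counts = defaultdict(int)
    for event in events:  # each event is a dict: {"topic": str, "time": float}
        counts[(window_of(event["time"]), event["topic"])] += 1
    return counts
```

The procedural version is more code for the same result, but it is also the version you can extend with arbitrary logic, which is exactly the flexibility-versus-effort trade-off the slide describes.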
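
Decision #2 – Data Transfer. A toy stand-in for a persistent message bus (Scribe in the paper; this is not Scribe’s API) shows why persistent stream storage decouples processors: producer and consumer never talk directly, and a crashed consumer resumes, or replays, from any offset.

```python
class PersistentLog:
    """Toy append-only log; a stand-in for a durable stream store."""
    def __init__(self):
        self._entries = []

    def append(self, message) -> int:
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the written message

    def read(self, offset: int):
        """Read everything from `offset` onward; replay is just re-reading."""
        return self._entries[offset:]

log = PersistentLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

# A consumer that crashed after processing offset 0 resumes from offset 1,
# without the producer ever knowing or caring.
assert log.read(1) == ["b", "c"]
```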
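
Decision #3 – Processing Semantics. The two cheaper guarantees fall out of nothing more than the order of processing and checkpointing; `process` and `checkpoint` below are hypothetical callables.

```python
def at_least_once(messages, process, checkpoint):
    # Process first, checkpoint after: a crash between the two steps replays
    # the message on restart, so duplicates are possible but nothing is lost.
    for offset, msg in enumerate(messages):
        process(msg)
        checkpoint(offset)

def at_most_once(messages, process, checkpoint):
    # Checkpoint first, process after: a crash between the two steps skips
    # the message on restart, so data can be lost but is never duplicated.
    for offset, msg in enumerate(messages):
        checkpoint(offset)
        process(msg)

processed, committed = [], []
at_least_once(["m0", "m1"], processed.append, committed.append)
assert processed == ["m0", "m1"] and committed == [0, 1]
```

At-least-once into an idempotent sink (the ranker case on slide 19) looks like exactly-once from the reader’s side, which is why the verdict is “it depends on requirements.”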
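
The side-note on slide 20, sketched under the assumption that `transactional_write` atomically commits both output and offsets: while that commit is in flight, stateless work such as deserializing the next batch has no side effects and is safe to overlap.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def process_exactly_once(batches, transactional_write):
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for raw_batch in batches:
            # Stateless deserialization overlaps with the pending commit.
            records = [json.loads(line) for line in raw_batch]
            if pending is not None:
                pending.result()  # wait for the previous transaction to land
            pending = pool.submit(transactional_write, records)
        if pending is not None:
            pending.result()

out = []
process_exactly_once([['{"v": 1}'], ['{"v": 2}']], out.append)
assert out == [[{"v": 1}], [{"v": 2}]]
```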
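
Item 13’s question (efficient state in a remote DB) admits several answers; one common tactic, sketched here with a fake client rather than any real database API, is to buffer increments locally and flush them as a single batched write instead of doing a remote read-modify-write per event.

```python
from collections import defaultdict

class FakeRemoteDB:
    """Stand-in for a remote store that can apply a batch of increments."""
    def __init__(self):
        self.counters = defaultdict(int)
        self.calls = 0

    def apply_increments(self, increments):
        self.calls += 1  # one network round-trip per batch, not per event
        for key, amount in increments.items():
            self.counters[key] += amount

class BatchedCounterState:
    """Write-behind counter state: accumulate locally, flush in batches."""
    def __init__(self, remote_db, flush_every=1000):
        self._db = remote_db
        self._pending = defaultdict(int)
        self._flush_every = flush_every
        self._events = 0

    def increment(self, key, amount=1):
        self._pending[key] += amount  # local only, no network round-trip
        self._events += 1
        if self._events >= self._flush_every:
            self.flush()

    def flush(self):
        if self._pending:
            self._db.apply_increments(dict(self._pending))
        self._pending.clear()
        self._events = 0

db = FakeRemoteDB()
state = BatchedCounterState(db, flush_every=3)
for _ in range(3):
    state.increment("page:123:likes")
assert db.counters["page:123:likes"] == 3 and db.calls == 1
```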
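
Decision #5 – Reprocessing. One way to get the “runs in both stream and batch” property is to write the core logic once over a plain iterable and feed it either a bounded backfill or a live source; this is a sketch of the idea, not Facebook’s binary-generation mechanism.

```python
def transform(events):
    """Core logic, agnostic to whether `events` is bounded or unbounded."""
    for event in events:
        if event.get("valid"):
            yield {"user": event["user"], "score": event["score"] * 2}

backfill = [{"user": "a", "score": 1, "valid": True},
            {"user": "b", "score": 2, "valid": False}]
live = iter(backfill)  # stand-in for an unbounded stream consumer

assert list(transform(backfill)) == list(transform(live))
```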
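
Finally, the monitoring lesson from slide 27: lag is just the distance between what the stream store has accepted and what the consumer has processed, which is also the natural signal for the auto-scaling mentioned at the end.

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """Messages written to the stream but not yet processed."""
    return max(0, log_end_offset - committed_offset)

assert consumer_lag(log_end_offset=1_500, committed_offset=1_200) == 300
```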