Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

June 14, 2018

Transcript

  1. PAT HELLAND AND ME: HOW TO BUILD STATEFUL DISTRIBUTED APPLICATIONS THAT CAN SCALE ALMOST INFINITELY
  2. PAT HELLAND AND ME

  3. SEAN T. ALLEN: VP OF ENGINEERING AT WALLAROO LABS, MEMBER OF THE PONY CORE TEAM, AUTHOR OF “STORM APPLIED”, LOVER OF FRENCH STREET ART. @SEANTALLEN @WALLAROOLABS @PONYLANG

  4. DATABASES: APPARENTLY, I LIKE TO STICK THEM IN THINGS…

  5. SOME AXIOMS…

  6. TO SCALE INFINITELY, WE HAVE TO SCALE HORIZONTALLY

  7. TO SCALE INFINITELY, WE MUST AVOID COORDINATION

  8. DISTRIBUTED TRANSACTIONS ARE A FORM OF COORDINATION

  9. THEREFORE… TO SCALE INFINITELY, WE CAN’T USE TRANSACTIONS

  10. WELCOME TO DISTRIBUTED SYSTEMS! OH, BY THE WAY, ALL THE RULES HAVE CHANGED
  11. WHAT IS SCALING?

  12. MORE AND MORE THINGS: BUT, THEY DON’T GET BIGGER. THERE’S JUST… MORE OF THEM. LOTS MORE.

  13. WE SCALE ENTITIES. ENTITIES: LIVE ON A SINGLE MACHINE AND ARE MANIPULATED INDIVIDUALLY
  14. WHAT IS AN ENTITY?

  15. ENTITIES ARE BOUNDARIES OF ATOMICITY

  16. Bob 6 3 5 8 Alice 4 2 7 1

  17. Bob 6 3 5 8 Alice 4 2 7 1

  18. Bob 6 3 5 8 Alice 4 7 1 2

  19. Bob 6 3 5 8 Alice 4 2 7 1

  20. Bob 6 3 5 8 Alice 4 2 7 1

  21. DENORMALIZE… ALL THE THINGS!

  22. TWO LAYER ARCHITECTURE

  23. scale-agnostic scale-aware API

  24. scale-agnostic scale-aware API

  25. scale-agnostic scale-aware API

  26. scale-agnostic scale-aware API

  27. scale-agnostic scale-aware API

  28. scale-agnostic scale-aware API

  29. TO SCALE INFINITELY, YOUR BUSINESS LOGIC HAS TO BE INDEPENDENT OF SCALE

  30. WALLAROO: SCALE INDEPENDENT COMPUTING FOR PYTHON

  31. AND IT’S NOT A DATABASE

  32. ENTITIES: BUT WE CALL THEM… “STATE OBJECTS”

  33. TWO LAYER ARCHITECTURE: BUT WE CALL IT… “SCALE INDEPENDENCE”

  34. user supplied logic Wallaroo runtime Wallaroo API

  35. user supplied logic Wallaroo runtime Wallaroo API

  36. user supplied logic Wallaroo runtime Wallaroo API

  37. user supplied logic Wallaroo runtime Wallaroo API

  38. WHAT’S HARD? ALL OF IT? YOU’RE BUILDING A DISTRIBUTED SYSTEMS *FRAMEWORK*
  39. CAP THEOREM: CONSISTENCY VS AVAILABILITY… YOU CAN’T ESCAPE IT.

  40. MESSAGE DELIVERY: AT-MOST-ONCE? AT-LEAST-ONCE? EFFECTIVELY-ONCE? EXACTLY-ONCE?

  41. MESSAGE ORDERING: WILL YOU MAINTAIN THE ORDERING AS YOU RECEIVED IT?

  42. LOCAL KNOWLEDGE: YOU HAVE TO WORK HARD TO AVOID COORDINATION.

  43. PROGRAMMING MODEL: YOU CAN GET WITH THIS, OR YOU CAN GET WITH THAT.

  44. PERFORMANCE: IT’S A WORD IN THE DICTIONARY

  45. NETWORK OVERHEAD: YOU AREN’T IN LOCAL MEMORY ANYMORE

  46. DATA SERIALIZATION: LE SIGH…

  47. VERIFICATION: LET’S NOT GO THERE… THAT’S AN ENTIRE LECTURE SERIES.

  48. BTW… YOUR MULTI-CORE COMPUTER IS ALSO A DISTRIBUTED SYSTEM. BUT THAT’S A STORY FOR ANOTHER DAY.
  49. LEARN MORE: GITHUB.COM/SEANTALLEN/PAT-HELLAND-AND-ME
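
The two-layer idea from slides 13 to 29 (entities as boundaries of atomicity, scale-agnostic logic on top of a scale-aware layer) can be hard to picture from slide titles alone. What follows is a minimal Python sketch, not code from the talk and not the Wallaroo API. The names `apply_deposit`, `Worker`, `Router`, and `run` are hypothetical, and hashing the entity key to pick an owner is an assumed placement strategy chosen only for illustration.

```python
import hashlib
from collections import defaultdict


# Scale-agnostic layer (hypothetical): business logic that sees exactly
# one entity's state and knows nothing about machines or partitioning.
def apply_deposit(balance, amount):
    return balance + amount


class Worker:
    """Stands in for one machine. Each entity lives on a single worker
    and is only ever updated there, one message at a time."""

    def __init__(self):
        self.entities = defaultdict(int)

    def handle(self, key, amount):
        self.entities[key] = apply_deposit(self.entities[key], amount)


class Router:
    """Scale-aware layer (hypothetical): maps an entity key to its owner.
    Assumption: a stable hash of the key selects the worker."""

    def __init__(self, n_workers):
        self.workers = [Worker() for _ in range(n_workers)]

    def owner(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.workers)
        return self.workers[index]

    def send(self, key, amount):
        self.owner(key).handle(key, amount)

    def balance(self, key):
        return self.owner(key).entities[key]


def run(n_workers):
    """Feed the same messages through a cluster of the given size."""
    router = Router(n_workers)
    for key, amount in [("alice", 4), ("bob", 6), ("alice", 2), ("bob", 3)]:
        router.send(key, amount)
    return {key: router.balance(key) for key in ("alice", "bob")}


if __name__ == "__main__":
    print(run(1))
    print(run(4))
```

Because `apply_deposit` never touches more than one entity, the result is the same whether `run` is called with one worker or many. Only `Router` changes behaviour with cluster size, which is the sense in which the business logic stays independent of scale.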