Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

June 14, 2018

Transcript

  1. PAT HELLAND AND ME
    HOW TO BUILD STATEFUL DISTRIBUTED APPLICATIONS THAT CAN SCALE ALMOST INFINITELY

  2. PAT HELLAND
    AND ME

  3. SEAN T. ALLEN
    VP OF ENGINEERING AT WALLAROO LABS
    MEMBER OF THE PONY CORE TEAM
    AUTHOR OF “STORM APPLIED”
    LOVER OF FRENCH STREET ART
    @SEANTALLEN
    @WALLAROOLABS
    @PONYLANG

  4. DATABASES
    APPARENTLY, I LIKE TO
    STICK THEM IN THINGS…

  5. SOME AXIOMS…

  6. TO SCALE INFINITELY,
    WE HAVE TO SCALE HORIZONTALLY

  7. TO SCALE INFINITELY,
    WE MUST AVOID COORDINATION

  8. DISTRIBUTED TRANSACTIONS ARE
    A FORM OF COORDINATION

  9. THEREFORE…
    TO SCALE INFINITELY,
    WE CAN’T USE TRANSACTIONS

  10. WELCOME TO
    DISTRIBUTED
    SYSTEMS!
    OH, BY THE WAY, ALL THE
    RULES HAVE CHANGED

  11. WHAT IS SCALING?

  12. MORE AND
    MORE THINGS
    BUT, THEY DON’T GET
    BIGGER. THERE’S JUST…
    MORE OF THEM. LOTS MORE.

  13. WE SCALE
    ENTITIES
    ENTITIES:
    LIVE ON A SINGLE MACHINE
    AND ARE MANIPULATED
    INDIVIDUALLY

  14. WHAT IS AN ENTITY?

  15. ENTITIES ARE BOUNDARIES OF
    ATOMICITY
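
Slides 13 to 15 describe an entity as something that lives on a single machine, is manipulated individually, and marks the boundary of atomicity. A minimal Python sketch of that idea follows. The `Entity` class and its `apply` method are hypothetical names chosen for illustration, not code from the talk, and the Bob and Alice values are borrowed from the diagram slides only as sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity: all of its state lives together in one place."""
    key: str
    items: set = field(default_factory=set)

    def apply(self, add=(), remove=()):
        """Apply a multi-part change to this one entity, all or nothing."""
        add, remove = set(add), set(remove)
        missing = remove - self.items
        if missing:
            # Reject before touching any state, so a failed change leaves no trace.
            raise ValueError(f"{self.key} does not hold {sorted(missing)}")
        # A single assignment: the change becomes visible whole or not at all.
        self.items = (self.items - remove) | add


bob = Entity("Bob", {6, 3, 5, 8})
alice = Entity("Alice", {4, 2, 7, 1})

# One atomic change inside a single entity.
alice.apply(add={9}, remove={2})

# A rejected change leaves the entity exactly as it was.
try:
    bob.apply(add={10}, remove={99})
except ValueError:
    pass
```

Nothing in this sketch changes Bob and Alice together in one step. A change that involves both would be two separate `apply` calls, which is the restriction the axioms on slides 7 to 9 impose.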

  16-20. [Diagram, repeated across five slides: two entities, Bob (items 6, 3, 5, 8) and Alice (items 4, 2, 7, 1)]

  21. DENORMALIZE…
    ALL THE THINGS!

  22. TWO LAYER ARCHITECTURE

  23-28. [Diagram labels, repeated across six slides: scale-agnostic, scale-aware, API]

  29. TO SCALE INFINITELY,
    YOUR BUSINESS LOGIC HAS TO BE
    INDEPENDENT OF SCALE
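
The two-layer split on slides 22 to 29 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's code and not the Wallaroo API: `Router`, `send`, `read`, and `add_to_total` are hypothetical names, and hash-modulo placement is a simplification that ignores what happens to existing state when the worker count changes.

```python
import hashlib


# Scale-agnostic layer: business logic sees one entity's state and one message.
# It has no idea how many workers exist or where the entity lives.
def add_to_total(state, amount):
    return state + amount


# Scale-aware layer: knows the worker count and which worker owns each key.
class Router:
    def __init__(self, worker_count):
        # Each dict stands in for the entity storage of one worker.
        self.workers = [dict() for _ in range(worker_count)]

    def _worker_for(self, key):
        # Stable hash of the key picks the owning worker (assumed placement rule).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.workers)
        return self.workers[index]

    def send(self, key, message, logic, initial=0):
        # Route the message to the entity's owner and run the logic there.
        worker = self._worker_for(key)
        worker[key] = logic(worker.get(key, initial), message)
        return worker[key]

    def read(self, key, default=None):
        return self._worker_for(key).get(key, default)


messages = [("Bob", 6), ("Alice", 4), ("Bob", 3), ("Alice", 2)]
small, large = Router(1), Router(8)
for key, amount in messages:
    small.send(key, amount, add_to_total)
    large.send(key, amount, add_to_total)
```

The same `add_to_total` function runs unchanged against one worker and against eight. Only the `Router` knows the difference, which is the sense in which the business logic is independent of scale.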

  30. WALLAROO
    SCALE INDEPENDENT
    COMPUTING FOR PYTHON

  31. AND IT’S NOT A DATABASE

  32. ENTITIES
    BUT WE CALL THEM…
    “STATE OBJECTS”

  33. TWO LAYER
    ARCHITECTURE
    BUT WE CALL IT…
    “SCALE INDEPENDENCE”

  34-37. [Diagram labels, repeated across four slides: user supplied logic, Wallaroo runtime, Wallaroo API]

  38. WHAT’S HARD?
    ALL OF IT? YOU’RE BUILDING
    A DISTRIBUTED SYSTEMS
    *FRAMEWORK*

  39. CAP THEOREM
    CONSISTENCY VS
    AVAILABILITY…
    YOU CAN’T ESCAPE IT.

  40. MESSAGE
    DELIVERY
    AT-MOST-ONCE? AT-LEAST-ONCE?
    EFFECTIVELY-ONCE? EXACTLY-ONCE?

  41. MESSAGE
    ORDERING
    WILL YOU MAINTAIN THE
    ORDERING AS YOU RECEIVED
    IT?
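
Slides 40 and 41 only pose the delivery and ordering questions. One common way to get effectively-once behavior on top of at-least-once delivery is to make each entity ignore messages it has already applied. The sketch below is an assumption-laden illustration, not a description of how any particular system does it: the `Counter` class is hypothetical, and it assumes every sender numbers its messages 0, 1, 2, … and that messages from one sender arrive in order, possibly more than once.

```python
class Counter:
    """Hypothetical entity that tolerates redelivered messages."""

    def __init__(self):
        self.total = 0
        self.last_seq = {}  # sender -> highest sequence number already applied

    def receive(self, sender, seq, amount):
        """Apply the message once; return False if it was a duplicate."""
        if seq <= self.last_seq.get(sender, -1):
            return False  # already applied, so a redelivery changes nothing
        # Both fields belong to the same entity, so they change together.
        self.total += amount
        self.last_seq[sender] = seq
        return True


counter = Counter()
deliveries = [
    ("a", 0, 5),
    ("a", 1, 7),
    ("a", 1, 7),  # redelivered after a retry
    ("b", 0, 1),
    ("a", 0, 5),  # very late redelivery
]
applied = [counter.receive(*delivery) for delivery in deliveries]
```

Under the in-order assumption the duplicates are dropped and the total reflects each message once. If a sender's messages could arrive out of order, this check would also discard a late first delivery, which is why the ordering question on slide 41 cannot be separated from the delivery question on slide 40.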

  42. LOCAL
    KNOWLEDGE
    YOU HAVE TO WORK HARD
    TO AVOID COORDINATION.

  43. PROGRAMMING
    MODEL
    YOU CAN GET WITH THIS, OR
    YOU CAN GET WITH THAT.

  44. PERFORMANCE
    IT’S A WORD IN THE
    DICTIONARY

  45. NETWORK
    OVERHEAD
    YOU AREN’T IN LOCAL
    MEMORY ANYMORE

  46. DATA
    SERIALIZATION
    LE SIGH…

  47. VERIFICATION
    LET’S NOT GO THERE…
    THAT’S AN ENTIRE LECTURE
    SERIES.

  48. BTW…
    YOUR
    MULTI-CORE
    COMPUTER
    IS ALSO A DISTRIBUTED
    SYSTEM.
    BUT THAT’S A STORY FOR
    ANOTHER DAY.

  49. LEARN MORE
    GITHUB.COM/SEANTALLEN/
    PAT-HELLAND-AND-ME
