Pat Helland and Me: How to Build Stateful Distributed Applications That Can Scale Almost Infinitely - Velocity NY October 2018

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

October 03, 2018

Transcript

  1. PAT HELLAND AND ME
    HOW TO BUILD STATEFUL DISTRIBUTED APPLICATIONS THAT CAN SCALE ALMOST INFINITELY

  2. SEAN T. ALLEN
    VP OF ENGINEERING AT WALLAROO LABS
    MEMBER OF THE PONY CORE TEAM
    AUTHOR OF “STORM APPLIED”
    LOVER OF FRENCH STREET ART
    @SEANTALLEN
    @WALLAROOLABS
    @PONYLANG

  3. PAT HELLAND
    AND ME

  4. PAT HELLAND
    WRITER OF PAPERS I LOVE

  5. PAT HELLAND
    LIFE BEYOND DISTRIBUTED
    TRANSACTIONS

  6. WHAT’S IN THIS TALK…

  7. WHAT IS SCALING?

  8. MORE AND MORE
    THINGS
    BUT, THEY DON’T GET BIGGER.
    THERE’S JUST…
    MORE OF THEM. LOTS MORE.

  9. SOME AXIOMS…

  10. TO SCALE INFINITELY,
    WE HAVE TO SCALE HORIZONTALLY

  11. TO SCALE INFINITELY,
    WE MUST AVOID COORDINATION

  12. DISTRIBUTED TRANSACTIONS ARE
    A FORM OF COORDINATION

  13. THEREFORE…
    TO SCALE INFINITELY,
    WE CAN’T USE TRANSACTIONS

  14. WE SCALE ENTITIES
    ENTITIES:
    LIVE ON A SINGLE MACHINE
    AND ARE MANIPULATED INDIVIDUALLY

  15. WHAT IS AN ENTITY?

  16. ENTITIES ARE BOUNDARIES OF ATOMICITY
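
A minimal sketch of an entity as a boundary of atomicity, assuming a hypothetical `Entity` class that is not from the talk. Each update touches exactly one entity, so it can be applied atomically on the single machine that holds it. The item numbers echo the Bob and Alice diagram, but which items belong to whom is a guess.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity: the unit of atomic update."""
    key: str
    items: set = field(default_factory=set)

    def apply(self, op, item):
        # Every change is confined to this one entity, so it never
        # needs a transaction that spans machines.
        if op == "add":
            self.items.add(item)
        elif op == "remove":
            self.items.discard(item)
        else:
            raise ValueError(f"unknown operation: {op}")


# Two independent entities; nothing here coordinates across them.
bob = Entity("Bob", {3, 5, 6, 8})
alice = Entity("Alice", {1, 2, 4, 7})

# Moving item 2 is two separate single-entity updates, not one
# distributed transaction. Between the two steps the item is visible
# on neither entity, and the application has to tolerate that.
alice.apply("remove", 2)
bob.apply("add", 2)
```

The cost of avoiding coordination shows up in the last two lines: the application, not the storage layer, has to cope with the intermediate state.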

  17. [Diagram, shown with small variations through slide 21: the numbers 1 through 8 arranged around two entities, Bob and Alice]

  22. DENORMALIZE…
    ALL THE THINGS!

  23. TWO-LAYER ARCHITECTURE

  24. [Diagram, repeated through slide 29: a scale-agnostic layer and a scale-aware layer, joined by an API]

  30. TO SCALE INFINITELY,
    YOUR BUSINESS LOGIC HAS TO BE
    INDEPENDENT OF SCALE
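
One way to picture the two layers, as a sketch under stated assumptions: `count_trades` and `ScaleAwareLayer` are hypothetical names, and the CRC32 routing is only a stand-in for whatever a real scale-aware layer would do. The business logic receives one entity's state and one message, and never learns how many workers exist.

```python
import zlib


def count_trades(state, trade):
    """Scale-agnostic business logic: it sees one entity's state and one
    message, and knows nothing about workers or partitions."""
    state["trades"] = state.get("trades", 0) + 1
    return state["trades"]


class ScaleAwareLayer:
    """Hypothetical lower layer that decides where each entity lives."""

    def __init__(self, worker_count):
        # Each dict stands in for the memory of one worker process.
        self.workers = [{} for _ in range(worker_count)]

    def dispatch(self, key, message, logic):
        # The API between the layers: pick the worker that owns the key,
        # then hand that entity's state to the business logic.
        index = zlib.crc32(key.encode("utf-8")) % len(self.workers)
        state = self.workers[index].setdefault(key, {})
        return logic(state, message)


# The same function runs unchanged on one worker or on eight.
small = ScaleAwareLayer(worker_count=1)
large = ScaleAwareLayer(worker_count=8)
for layer in (small, large):
    for symbol in ["MSFT", "IBM", "MSFT"]:
        layer.dispatch(symbol, {"symbol": symbol}, count_trades)
```

Changing `worker_count` changes where state lives but not what `count_trades` computes, which is the property the slide asks for.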

  31. TWO BIG IDEAS
    A WORLD OF POSSIBILITIES

  32. WALLAROO
    SCALE INDEPENDENT COMPUTING
    FOR PYTHON

  33. ENTITIES
    BUT WE CALL THEM…
    “STATE OBJECTS”

  34. TWO-LAYER
    ARCHITECTURE
    BUT WE CALL IT…
    “SCALE INDEPENDENCE”

  35. [Diagram, repeated through slide 38: user supplied logic and the Wallaroo runtime, joined by the Wallaroo API]

  39. WALLAROO API
    MARKET SPREAD EXAMPLE

  40. MARKET SPREAD
    REAL-TIME “SOMETHING AIN’T RIGHT” TRADE CHECKS
    [Diagram: Market Data feeds an Update step and Orders feed a Check step, both against Market Spread State keyed by symbol (APPL, MSFT); Check emits Rejections]

  41. MARKET SPREAD
    TWO SOURCES OF DATA
    [Diagram: the Market Spread data flow, as on slide 40]

  42. MARKET SPREAD
    ONE SINK
    [Diagram: the Market Spread data flow, as on slide 40]

  43. MARKET SPREAD
    ORDER PIPELINE
    [Diagram: the Market Spread data flow, as on slide 40]

  44. MARKET SPREAD
    MARKET DATA PIPELINE
    [Diagram: the Market Spread data flow, as on slide 40]

  45. MARKET SPREAD
    APPLICATION DEFINITION

  46. APPLICATION DEFINITION
    FLOW OF DATA FROM SOURCE TO SINK

  47. TWO DATA PIPELINES
    ORDERS

  48. TWO DATA PIPELINES
    MARKET DATA

  49. DEFINE OUR SOURCES
    1 PER PIPELINE

  50. DEFINE OUR OPERATIONS
    1 PER PIPELINE

  51. DEFINE OUR OPERATIONS
    CHECK ORDER AGAINST SYMBOL DATA

  52. DEFINE OUR OPERATIONS
    UPDATE SYMBOL DATA WITH LATEST MARKET DATA

  53. DEFINE OUR SINKS
    1 PER PIPELINE

  54. DEFINE OUR SINKS
    ORDERS PIPELINE MIGHT HAVE OUTPUT

  55. DEFINE OUR SINKS
    MARKET DATA ONLY UPDATES SYMBOL DATA, NO OUTPUT

  56. SCALE INDEPENDENT
    ONLY FLOW OF DATA AND OPERATIONS
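
The code shown on these slides is not in the transcript. As a rough stand-in, here is a sketch of what a scale-independent application definition could look like. It is plain Python and deliberately not the Wallaroo API: `Pipeline`, `to_state_operation`, `to_sink`, the feed names, and the placeholder operations are all hypothetical.

```python
class Pipeline:
    """Hypothetical description of one pipeline: source, operation, sink."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        self.operation = None
        self.key_of = None
        self.sink = None  # None means the pipeline produces no output

    def to_state_operation(self, operation, key_of):
        self.operation = operation
        self.key_of = key_of
        return self

    def to_sink(self, sink):
        self.sink = sink
        return self


def check_order(order, symbol_data):
    # Placeholder logic: reject while the symbol is flagged.
    if symbol_data.get("should_reject", False):
        return {"order_id": order["id"], "symbol": order["symbol"]}
    return None


def update_market_data(quote, symbol_data):
    # Placeholder logic: the flag arrives precomputed on the quote.
    symbol_data["should_reject"] = quote["should_reject"]
    return None


def symbol_of(message):
    return message["symbol"]


# The definition names sources, operations, and sinks. Nothing in it
# says how many workers exist or where any state object lives.
application = [
    Pipeline("Orders", source="orders-feed")
    .to_state_operation(check_order, key_of=symbol_of)
    .to_sink("rejections"),
    Pipeline("Market Data", source="market-data-feed")
    .to_state_operation(update_market_data, key_of=symbol_of),
]
```

Only the Orders pipeline gets a sink, matching the slides: the Market Data pipeline exists solely to update symbol data.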

  57. USER SUPPLIED
    LOGIC

  58. UPDATE MARKET DATA STATE COMPUTATION
    UPDATES SYMBOL DATA STATE
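
A sketch of what such a state computation might look like, with a plain dict standing in for the symbol data state. The threshold, the field names, and the rejection rule are assumptions made for illustration; the slides do not give the actual rule.

```python
# Hypothetical threshold; the slides do not give the real limit.
MAX_RELATIVE_SPREAD = 0.05


def update_market_data(quote, symbol_data):
    """Fold one market data message into the state for its symbol."""
    bid, offer = quote["bid"], quote["offer"]
    mid = (bid + offer) / 2.0
    symbol_data["last_bid"] = bid
    symbol_data["last_offer"] = offer
    # Flag the symbol when the spread is wide relative to the mid price.
    symbol_data["should_reject_trades"] = (
        mid <= 0 or (offer - bid) / mid >= MAX_RELATIVE_SPREAD
    )
    return None  # this pipeline has no sink, so nothing is emitted


msft = {}
update_market_data({"symbol": "MSFT", "bid": 100.0, "offer": 100.5}, msft)
```

The function only ever sees the state for one symbol, so it stays the same whether the symbols live on one worker or many.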

  59. WALLAROO
    RUNTIME
    MESH NETWORK OF COOPERATING
    PROCESSES

  60. STATE OBJECTS
    ONE BIG MAP?

  61. STATE OBJECTS
    CONCEPTUALLY IT’S LIKE A BIG MAP
    Market
    Data Update State

  62. STATE OBJECTS
    WITH A KEY FOR EACH OBJECT
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data Update

  63. STATE OBJECTS
    WHERE WE MAP FROM INCOMING DATA’S KEY
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data MSFT

  64. STATE OBJECTS
    TO THE STATE OBJECT FOR THAT KEY
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data MSFT
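
The "big map" picture from the last few slides can be sketched in a few lines. This is a conceptual model only, assuming the partition key is the message's symbol; `route` and `count_update` are hypothetical helpers, not part of any real API.

```python
state_objects = {}  # conceptually one big map: key -> state object


def key_of(message):
    # Assumption: the partition key is the message's symbol.
    return message["symbol"]


def route(message, computation):
    # Map from the incoming data's key to the state object for that key,
    # creating the state object the first time the key is seen.
    state = state_objects.setdefault(key_of(message), {"updates": 0})
    return computation(message, state)


def count_update(message, state):
    state["updates"] += 1
    return state["updates"]


route({"symbol": "MSFT", "bid": 100.0}, count_update)
route({"symbol": "IBM", "bid": 140.0}, count_update)
route({"symbol": "MSFT", "bid": 100.1}, count_update)
```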

  65. HASH PARTITIONING
    DISTRIBUTING STATE OBJECTS
    ACROSS A CLUSTER
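
A minimal sketch of hash partitioning, assuming the simplest possible scheme: hash the key and take it modulo the worker count. The slides do not say which scheme Wallaroo uses, so `worker_for` and `placement` are illustrative names only.

```python
import hashlib


def worker_for(key, worker_count):
    # A stable hash keeps placement identical across processes and
    # restarts (Python's built-in hash() is randomized for strings).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def placement(keys, worker_count):
    return {key: worker_for(key, worker_count) for key in keys}


symbols = ["MSFT", "IBM", "AMZN", "INTC", "NVDA"]

before = placement(symbols, 1)  # single worker: all symbols together
after = placement(symbols, 2)   # add another worker

# State objects whose owner changed have to be moved to the new worker.
moved = [key for key in symbols if before[key] != after[key]]
```

With plain modulo hashing, changing the worker count can reassign a large share of the keys; schemes such as consistent hashing exist to reduce that movement.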

  66. SINGLE WORKER
    ALL SYMBOLS TOGETHER
    APPL
    AMZN
    MSFT
    IBM

  68. ADD ANOTHER WORKER
    STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER
    APPL
    AMZN
    MSFT
    IBM

  73. ADD ANOTHER WORKER
    STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER
    APPL
    AMZN
    IBM
    MSFT

  75. STATE OBJECTS

  76. A WALLAROO STATE OBJECT
    PLAIN OLD PYTHON
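
The slide's code is not in the transcript, so this is a guess at the shape of such an object rather than the one shown. `SymbolData` and its fields are hypothetical. The point it illustrates is that a plain Python object can be built, tested, and copied without any runtime present.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SymbolData:
    """Hypothetical state object: no base class, no framework imports."""
    last_bid: float = 0.0
    last_offer: float = 0.0
    should_reject_trades: bool = True

    def allows(self, order):
        # Ordinary method call; nothing here knows about the cluster.
        return not self.should_reject_trades


# Because it is plain Python, it can be built and tested directly.
data = SymbolData(last_bid=100.0, last_offer=100.5, should_reject_trades=False)

# It can also be copied by any ordinary serializer. JSON is used here
# only to illustrate moving state, not as what the runtime actually does.
restored = SymbolData(**json.loads(json.dumps(asdict(data))))
```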

  77. LEARN MORE
    GITHUB.COM/SEANTALLEN/
    PAT-HELLAND-AND-ME
