Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely - Salesforce July 2018

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

July 23, 2018

Transcript

  1. PAT HELLAND AND ME
    HOW TO BUILD STATEFUL DISTRIBUTED APPLICATIONS THAT CAN SCALE ALMOST INFINITELY

  2. SEAN T. ALLEN
    VP OF ENGINEERING AT WALLAROO LABS
    MEMBER OF THE PONY CORE TEAM
    AUTHOR OF “STORM APPLIED”
    LOVER OF FRENCH STREET ART
    @SEANTALLEN
    @WALLAROOLABS
    @PONYLANG

  3. PAT HELLAND
    AND ME

  4. SOME AXIOMS…

  5. TO SCALE INFINITELY,
    WE HAVE TO SCALE HORIZONTALLY

  6. TO SCALE INFINITELY,
    WE MUST AVOID COORDINATION

  7. DISTRIBUTED TRANSACTIONS ARE
    A FORM OF COORDINATION

  8. THEREFORE…
    TO SCALE INFINITELY,
    WE CAN’T USE TRANSACTIONS

  9. WHAT IS SCALING?

  10. MORE AND
    MORE THINGS
    BUT, THEY DON’T GET
    BIGGER. THERE’S JUST…
    MORE OF THEM. LOTS MORE.

  11. WE SCALE
    ENTITIES
    ENTITIES:
    LIVE ON A SINGLE MACHINE
    AND ARE MANIPULATED
    INDIVIDUALLY

  12. WHAT IS AN ENTITY?

  13. ENTITIES ARE BOUNDARIES OF
    ATOMICITY
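
A minimal sketch of what "boundary of atomicity" can mean in practice, assuming an in-memory entity guarded by a lock. The `Entity` class, its `apply` and `snapshot` methods, and the sample state are hypothetical names chosen for illustration; they do not come from the talk or from Helland's paper. An update to one entity is all-or-nothing, and nothing here offers atomicity across two entities.

```python
import copy
import threading


class Entity:
    """Hypothetical entity: state that lives on one machine under one key."""

    def __init__(self, key, state):
        self.key = key
        self._state = state
        self._lock = threading.Lock()

    def apply(self, update):
        """Run update(state) atomically: all of its changes land, or none do."""
        with self._lock:
            draft = copy.deepcopy(self._state)
            update(draft)          # may raise; the real state is untouched if so
            self._state = draft    # commit by swapping in the finished draft

    def snapshot(self):
        """Return a copy of the current state."""
        with self._lock:
            return copy.deepcopy(self._state)


def bad_update(state):
    """Changes the draft, then fails before the update can commit."""
    state["items"].append(99)
    raise ValueError("validation failed after a partial change")


bob = Entity("Bob", {"items": [1, 2, 3]})
bob.apply(lambda s: s["items"].append(4))
try:
    bob.apply(bad_update)
except ValueError:
    pass  # the failed update left no partial change behind
```

Because the lock and the draft copy are both local to one entity, no coordination with any other machine is needed to keep a single entity consistent.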

  14. [Diagram: two entities, Bob and Alice, each holding its own numbered items (Bob: 6, 3, 5, 8; Alice: 4, 2, 7, 1)]

  19. DENORMALIZE…
    ALL THE THINGS!

  20. TWO-LAYER ARCHITECTURE

  21. scale-agnostic
    scale-aware
    API

  27. TO SCALE INFINITELY,
    YOUR BUSINESS LOGIC HAS TO BE
    INDEPENDENT OF SCALE
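
One way to picture the two layers, offered as a sketch rather than as Helland's or Wallaroo's actual design. The scale-agnostic function `add_to_total` sees one entity's state and one message. The hypothetical `ScaleAwareLayer` class is the only code that knows how many workers exist, and its `deliver` method stands in for the API between the layers. All names and the byte-sum placement rule are assumptions made for illustration.

```python
from collections import defaultdict


def add_to_total(state, amount):
    """Scale-agnostic business logic: one entity's state, one message."""
    state["total"] = state.get("total", 0) + amount
    return state["total"]


class ScaleAwareLayer:
    """Hypothetical lower layer: the only code that knows the worker count."""

    def __init__(self, worker_count):
        # Each worker is modeled as a dict of entity key -> entity state.
        self.workers = [defaultdict(dict) for _ in range(worker_count)]

    def _worker_for(self, key):
        # Deterministic placement: sum of the key's bytes modulo worker count.
        return self.workers[sum(key.encode("utf-8")) % len(self.workers)]

    def deliver(self, key, logic, message):
        """The API between the layers: route a message to the entity's home."""
        state = self._worker_for(key)[key]
        return logic(state, message)


small = ScaleAwareLayer(worker_count=1)
large = ScaleAwareLayer(worker_count=8)
for layer in (small, large):
    layer.deliver("alice", add_to_total, 5)
    layer.deliver("bob", add_to_total, 7)
    result = layer.deliver("alice", add_to_total, 10)
```

The same `add_to_total` function runs unchanged on one worker or eight; only the lower layer's constructor argument differs.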

  28. WALLAROO
    SCALE INDEPENDENT
    COMPUTING FOR PYTHON

  29. ENTITIES
    BUT WE CALL THEM…
    “STATE OBJECTS”

  30. TWO-LAYER
    ARCHITECTURE
    BUT WE CALL IT…
    “SCALE INDEPENDENCE”

  31. user supplied logic
    Wallaroo runtime
    Wallaroo API

  35. WALLAROO API
    MARKET SPREAD EXAMPLE

  36. MARKET SPREAD
    REAL-TIME “SOMETHING AIN’T RIGHT” TRADE CHECKS
    [Diagram: Market Data feeds Update and Orders feeds Check, both operating on Market Spread State (keys shown: APPL, MSFT); Check outputs Rejections]

  37. MARKET SPREAD
    TWO SOURCES OF DATA

  38. MARKET SPREAD
    ONE SINK

  39. MARKET SPREAD
    ORDER PIPELINE

  40. MARKET SPREAD
    MARKET DATA PIPELINE

  41. MARKET SPREAD
    APPLICATION DEFINITION

  42. APPLICATION DEFINITION
    FLOW OF DATA FROM SOURCE TO SINK

  43. TWO DATA PIPELINES
    ORDERS

  44. TWO DATA PIPELINES
    MARKET DATA

  45. DEFINE OUR SOURCES
    1 PER PIPELINE

  46. DEFINE OUR OPERATIONS
    1 PER PIPELINE

  47. DEFINE OUR OPERATIONS
    CHECK ORDER AGAINST SYMBOL DATA

  48. DEFINE OUR OPERATIONS
    UPDATE SYMBOL DATA WITH LATEST MARKET DATA

  49. DEFINE OUR SINKS
    1 PER PIPELINE

  50. DEFINE OUR SINKS
    ORDERS PIPELINE MIGHT HAVE OUTPUT

  51. DEFINE OUR SINKS
    MARKET DATA ONLY UPDATES SYMBOL DATA; NO OUTPUT

  52. SCALE INDEPENDENT
    ONLY FLOW OF DATA AND OPERATIONS
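
The application definition walked through on the preceding slides can be sketched as plain data. The code shown in the talk is not part of this transcript, so the `Pipeline` class and every field name below are hypothetical stand-ins and not the Wallaroo API. The point of the sketch is what is absent: nothing in the definition mentions workers, machines, or partitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical description of one pipeline: source -> operation -> sink."""
    name: str
    source: str
    operation: str
    sink: Optional[str]  # None when the pipeline produces no output


# Two pipelines, each with one source and one operation.
APPLICATION = [
    Pipeline(
        name="Orders",
        source="orders",
        operation="check order against symbol data",
        sink="rejections",  # the orders pipeline might have output
    ),
    Pipeline(
        name="Market Data",
        source="market data",
        operation="update symbol data with latest market data",
        sink=None,  # only updates symbol data, no output
    ),
]


def describe(pipeline):
    """Render the flow of data for one pipeline."""
    sink = pipeline.sink if pipeline.sink is not None else "(no output)"
    return f"{pipeline.source} -> {pipeline.operation} -> {sink}"


flows = [describe(p) for p in APPLICATION]
```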

  53. USER SUPPLIED
    LOGIC

  54. UPDATE MARKET DATA STATE COMPUTATION
    UPDATES SYMBOL DATA STATE
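
A sketch of the two state computations as plain Python functions over a plain state object. The field names (`bid`, `ask`, `should_reject`), the 5% spread threshold, and the function signatures are assumptions invented for illustration. The talk's actual code is not in this transcript, and this is not the Wallaroo API.

```python
class SymbolData:
    """Plain Python state object for one symbol (hypothetical fields)."""

    def __init__(self):
        self.bid = 0.0
        self.ask = 0.0
        self.should_reject = True  # reject orders until market data arrives


def update_market_data(quote, state, max_spread=0.05):
    """Market data pipeline: update the symbol's state, emit no output."""
    state.bid = quote["bid"]
    state.ask = quote["ask"]
    mid = (state.bid + state.ask) / 2
    # Assumed rule: a spread wider than max_spread of the mid price looks wrong.
    state.should_reject = mid <= 0 or (state.ask - state.bid) / mid > max_spread
    return None


def check_order(order, state):
    """Orders pipeline: emit a rejection only when the symbol looks wrong."""
    if state.should_reject:
        return {
            "order_id": order["order_id"],
            "symbol": order["symbol"],
            "reason": "market spread check failed",
        }
    return None


msft = SymbolData()
early = check_order({"order_id": 1, "symbol": "MSFT"}, msft)
update_market_data({"bid": 100.0, "ask": 100.5}, msft)
later = check_order({"order_id": 2, "symbol": "MSFT"}, msft)
```

Both functions touch exactly one symbol's state, so each call stays inside a single entity.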

  55. WALLAROO
    RUNTIME
    MESH NETWORK OF
    COOPERATING PROCESSES

  56. STATE OBJECTS
    ONE BIG MAP?

  57. STATE OBJECTS
    CONCEPTUALLY IT’S LIKE A BIG MAP
    Market
    Data Update State

  58. STATE OBJECTS
    WITH A KEY FOR EACH OBJECT
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data Update

  59. STATE OBJECTS
    WHERE WE MAP FROM INCOMING DATA’S KEY
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data MSFT

  60. STATE OBJECTS
    TO THE STATE OBJECT FOR THAT KEY
    APPL IBM
    MSFT AMZN
    INTC NVDA
    Market
    Data MSFT
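
The "big map" picture from the last few slides can be sketched as follows. `StateMap`, `SymbolState`, and `key_of` are hypothetical names, and the message shape is an assumption; a real runtime would spread this map across processes rather than hold it in one dictionary.

```python
class SymbolState:
    """Hypothetical per-symbol state object."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.last_price = None
        self.updates = 0


class StateMap:
    """Conceptual 'big map': one state object per key, created on first use."""

    def __init__(self, factory):
        self._factory = factory
        self._objects = {}

    def route(self, key):
        """Map an incoming message's key to the state object for that key."""
        if key not in self._objects:
            self._objects[key] = self._factory(key)
        return self._objects[key]

    def keys(self):
        """Return the known keys in sorted order."""
        return sorted(self._objects)


def key_of(message):
    # Assumption: each market data message carries its symbol.
    return message["symbol"]


states = StateMap(SymbolState)
for message in [
    {"symbol": "MSFT", "price": 101.0},
    {"symbol": "IBM", "price": 142.5},
    {"symbol": "MSFT", "price": 101.2},
]:
    state = states.route(key_of(message))
    state.last_price = message["price"]
    state.updates += 1
```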

  61. HASH
    PARTITIONING
    DISTRIBUTING STATE
    OBJECTS ACROSS A
    CLUSTER

  62. SINGLE WORKER
    ALL SYMBOLS TOGETHER
    APPL
    AMZN
    MSFT
    IBM

  64. ADD ANOTHER WORKER
    STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER
    APPL
    AMZN
    MSFT
    IBM

  69. ADD ANOTHER WORKER
    STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER
    APPL
    AMZN
    IBM
    MSFT
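
A minimal sketch of hash partitioning and of what "redistributed" means when a worker joins, assuming simple modulo placement over a SHA-256 digest. The function names are hypothetical, and the slides do not say which hashing scheme Wallaroo uses, so treat the scheme itself as an assumption. A stable digest is used instead of Python's built-in `hash()` because string hashes are randomized per process by default.

```python
import hashlib


def worker_for(key, worker_count):
    """Pick a worker by hashing the key (stable across processes and runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def partition(keys, worker_count):
    """Group keys by the worker that would own their state objects."""
    placement = {worker: [] for worker in range(worker_count)}
    for key in keys:
        placement[worker_for(key, worker_count)].append(key)
    return placement


def keys_to_move(keys, old_count, new_count):
    """Keys whose owner changes when the cluster is resized."""
    return [k for k in keys
            if worker_for(k, old_count) != worker_for(k, new_count)]


symbols = ["APPL", "AMZN", "MSFT", "IBM"]
one_worker = partition(symbols, 1)    # every symbol lives on worker 0
two_workers = partition(symbols, 2)   # symbols split by hash
moved = keys_to_move(symbols, 1, 2)   # state objects that must migrate
```

Modulo placement is the simplest option but can move many keys on each resize; schemes such as consistent hashing exist to reduce that movement.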

  71. DATA MODEL
    YOU CAN GET WITH THIS, OR
    YOU CAN GET WITH THAT.

  72. A WALLAROO STATE OBJECT
    PLAIN OLD PYTHON

  73. PERFORMANCE
    THERE’S NO REAL WORLD
    SCALING WITHOUT IT

  74. LEARN MORE
    GITHUB.COM/SEANTALLEN/
    PAT-HELLAND-AND-ME
