Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely - Salesforce July 2018

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

July 23, 2018

Transcript

  1. PAT HELLAND AND ME HOW TO BUILD STATEFUL DISTRIBUTED APPLICATIONS THAT CAN SCALE ALMOST INFINITELY

  2. SEAN T. ALLEN VP OF ENGINEERING AT WALLAROO LABS MEMBER OF THE PONY CORE TEAM AUTHOR OF “STORM APPLIED” LOVER OF FRENCH STREET ART @SEANTALLEN @WALLAROOLABS @PONYLANG

  3. PAT HELLAND AND ME

  4. SOME AXIOMS…

  5. TO SCALE INFINITELY, WE HAVE TO SCALE HORIZONTALLY

  6. TO SCALE INFINITELY, WE MUST AVOID COORDINATION

  7. DISTRIBUTED TRANSACTIONS ARE A FORM OF COORDINATION

  8. THEREFORE… TO SCALE INFINITELY, WE CAN’T USE TRANSACTIONS

  9. WHAT IS SCALING?

  10. MORE AND MORE THINGS BUT, THEY DON’T GET BIGGER. THERE’S JUST… MORE OF THEM. LOTS MORE.

  11. WE SCALE ENTITIES ENTITIES: LIVE ON A SINGLE MACHINE AND ARE MANIPULATED INDIVIDUALLY

  12. WHAT IS AN ENTITY?

  13. ENTITIES ARE BOUNDARIES OF ATOMICITY

  14. Bob 6 3 5 8 Alice 4 2 7 1

  15. Bob 6 3 5 8 Alice 4 2 7 1

  16. Bob 6 3 5 8 Alice 4 7 1 2

  17. Bob 6 3 5 8 Alice 4 2 7 1

  18. Bob 6 3 5 8 Alice 4 2 7 1

  19. DENORMALIZE… ALL THE THINGS!

  20. TWO-LAYER ARCHITECTURE

  21. scale-agnostic scale-aware API

  22. scale-agnostic scale-aware API

  23. scale-agnostic scale-aware API

  24. scale-agnostic scale-aware API

  25. scale-agnostic scale-aware API

  26. scale-agnostic scale-aware API

  27. TO SCALE INFINITELY, YOUR BUSINESS LOGIC HAS TO BE INDEPENDENT OF SCALE

  28. WALLAROO SCALE INDEPENDENT COMPUTING FOR PYTHON

  29. ENTITIES BUT WE CALL THEM… “STATE OBJECTS”

  30. TWO-LAYER ARCHITECTURE BUT WE CALL IT… “SCALE INDEPENDENCE”

  31. user supplied logic Wallaroo runtime Wallaroo API

  32. user supplied logic Wallaroo runtime Wallaroo API

  33. user supplied logic Wallaroo runtime Wallaroo API

  34. user supplied logic Wallaroo runtime Wallaroo API

  35. WALLAROO API MARKET SPREAD EXAMPLE

  36. MARKET SPREAD REAL-TIME “SOMETHING AIN’T RIGHT” TRADE CHECKS (diagram labels: Market Spread State, Market Data, Orders, Update, APPL, Check, MSFT, Rejections)

  37. MARKET SPREAD TWO SOURCES OF DATA (same diagram)

  38. MARKET SPREAD ONE SINK (same diagram)

  39. MARKET SPREAD ORDER PIPELINE (same diagram)

  40. MARKET SPREAD MARKET DATA PIPELINE (same diagram)

  41. MARKET SPREAD APPLICATION DEFINITION

  42. APPLICATION DEFINITION FLOW OF DATA FROM SOURCE TO SINK

  43. TWO DATA PIPELINES ORDERS

  44. TWO DATA PIPELINES MARKET DATA

  45. DEFINE OUR SOURCES 1 PER PIPELINE

  46. DEFINE OUR OPERATIONS 1 PER PIPELINE

  47. DEFINE OUR OPERATIONS CHECK ORDER AGAINST SYMBOL DATA

  48. DEFINE OUR OPERATIONS UPDATE SYMBOL DATA WITH LATEST MARKET DATA

  49. DEFINE OUR SINKS 1 PER PIPELINE

  50. DEFINE OUR SINKS ORDERS PIPELINE MIGHT HAVE OUTPUT

  51. DEFINE OUR SINKS MARKET DATA ONLY UPDATES SYMBOL DATA - NO OUTPUT

  52. SCALE INDEPENDENT ONLY FLOW OF DATA AND OPERATIONS

  53. USER SUPPLIED LOGIC

  54. UPDATE MARKET DATA STATE COMPUTATION UPDATES SYMBOL DATA STATE

  55. WALLAROO RUNTIME MESH NETWORK OF COOPERATING PROCESSES

  56. STATE OBJECTS ONE BIG MAP?

  57. STATE OBJECTS CONCEPTUALLY IT’S LIKE A BIG MAP (diagram labels: Market Data, Update, State)

  58. STATE OBJECTS WITH A KEY FOR EACH OBJECT (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA, Market Data, Update)

  59. STATE OBJECTS WHERE WE MAP FROM INCOMING DATA’S KEY (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA, Market Data, MSFT)

  60. STATE OBJECTS TO THE STATE OBJECT FOR THAT KEY (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA, Market Data, MSFT)

  61. HASH PARTITIONING DISTRIBUTING STATE OBJECTS ACROSS A CLUSTER

  62. SINGLE WORKER ALL SYMBOLS TOGETHER APPL AMZN MSFT IBM

  63. SINGLE WORKER ALL SYMBOLS TOGETHER APPL AMZN MSFT IBM

  64. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  65. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  66. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  67. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  68. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  69. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN IBM MSFT

  70. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN IBM MSFT

  71. DATA MODEL YOU CAN GET WITH THIS, OR YOU CAN GET WITH THAT.

  72. A WALLAROO STATE OBJECT PLAIN OLD PYTHON

  73. PERFORMANCE THERE’S NO REAL WORLD SCALING WITHOUT IT

  74. LEARN MORE GITHUB.COM/SEANTALLEN/PAT-HELLAND-AND-ME
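
The slides on entities, the two-layer architecture, and hash partitioning (13, 20-27, 56-70) can be illustrated with a small plain-Python sketch. This is not Wallaroo's API and not the code from the talk: `SymbolData`, `update_market_data`, `check_order`, `owner`, and `Cluster` are hypothetical names, the 5% spread threshold is an invented rule, and the modulo-hash placement is a deliberately naive stand-in for whatever partitioning a real runtime uses. The point is only that the business logic touches one state object at a time and never sees the worker list.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SymbolData:
    """A state object (an "entity"): plain Python, owned by exactly one worker."""
    last_bid: float = 0.0
    last_offer: float = 0.0
    should_reject_trades: bool = True  # reject until market data has arrived


# --- Scale-agnostic layer: business logic that sees one state object only ---

def update_market_data(state, bid, offer, max_spread=0.05):
    """Update one symbol's state; the 5% threshold is an assumed rule."""
    state.last_bid = bid
    state.last_offer = offer
    mid = (bid + offer) / 2
    state.should_reject_trades = mid <= 0 or (offer - bid) / mid >= max_spread


def check_order(state, order_id):
    """Return a rejection record if the symbol's state says so, else None."""
    if state.should_reject_trades:
        return {"order_id": order_id, "rejected": True}
    return None


# --- Scale-aware layer: knows about workers, keys, and placement ---

def owner(key, workers):
    """Pick the worker for a key using a stable hash (naive modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return workers[int.from_bytes(digest[:8], "big") % len(workers)]


class Cluster:
    def __init__(self, workers):
        self.workers = list(workers)
        self.state = {w: {} for w in self.workers}  # worker -> {key: SymbolData}

    def route(self, key):
        """Map incoming data's key to the state object for that key."""
        return self.state[owner(key, self.workers)].setdefault(key, SymbolData())

    def add_worker(self, name):
        """Grow the cluster and redistribute existing state objects."""
        old = self.state
        self.workers.append(name)
        self.state = {w: {} for w in self.workers}
        for objects in old.values():
            for key, obj in objects.items():
                self.state[owner(key, self.workers)][key] = obj

    def placement(self):
        return {key: w for w, objs in self.state.items() for key in objs}


cluster = Cluster(["worker-1"])
for symbol in ["APPL", "AMZN", "MSFT", "IBM"]:
    update_market_data(cluster.route(symbol), bid=100.0, offer=100.5)

cluster.add_worker("worker-2")  # business logic above is unchanged
print(cluster.placement())
print(check_order(cluster.route("MSFT"), order_id=1))
```

Because each state object is only ever updated through `route`, no operation spans two keys, so no cross-worker transaction is needed. A production system would likely prefer a placement scheme that moves fewer keys when a worker joins; modulo hashing is used here only to keep the sketch short.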