Pat Helland and me: A talk about “Life Beyond Distributed Transactions: An Apostate’s Opinion”

In 2007, Pat Helland published “Life Beyond Distributed Transactions: An Apostate’s Opinion,” in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean T. Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications. Sean explores some of Helland’s ideas through practical examples from his experience building data processing systems using tools like Apache Storm and, more recently, developing a stateful distributed stream processor at Wallaroo Labs.

Sean T Allen

July 26, 2018

Transcript

  1. PAT HELLAND AND ME: A TALK ABOUT “LIFE BEYOND DISTRIBUTED TRANSACTIONS: AN APOSTATE’S OPINION”

  2. SEAN T. ALLEN: VP OF ENGINEERING AT WALLAROO LABS, MEMBER OF THE PONY CORE TEAM, AUTHOR OF “STORM APPLIED”, LOVER OF FRENCH STREET ART. @SEANTALLEN @WALLAROOLABS @PONYLANG

  3. PAT HELLAND AND ME

  4. PAT HELLAND WRITER OF PAPERS I LOVE

  5. PAT HELLAND LIFE BEYOND DISTRIBUTED TRANSACTIONS

  6. WHAT’S IN THIS TALK…

  7. SOME AXIOMS…

  8. TO SCALE INFINITELY, WE HAVE TO SCALE HORIZONTALLY

  9. TO SCALE INFINITELY, WE MUST AVOID COORDINATION

  10. DISTRIBUTED TRANSACTIONS ARE A FORM OF COORDINATION

  11. THEREFORE… TO SCALE INFINITELY, WE CAN’T USE TRANSACTIONS

  12. WHAT IS SCALING?

  13. MORE AND MORE THINGS, BUT THEY DON’T GET BIGGER. THERE’S JUST… MORE OF THEM. LOTS MORE.

  14. WE SCALE ENTITIES. ENTITIES: LIVE ON A SINGLE MACHINE AND ARE MANIPULATED INDIVIDUALLY

  15. WHAT IS AN ENTITY?

  16. ENTITIES ARE BOUNDARIES OF ATOMICITY

  17. Bob 6 3 5 8 Alice 4 2 7 1

  18. Bob 6 3 5 8 Alice 4 2 7 1

  19. Bob 6 3 5 8 Alice 4 7 1 2

  20. Bob 6 3 5 8 Alice 4 2 7 1

  21. Bob 6 3 5 8 Alice 4 2 7 1

  22. DENORMALIZE.. ALL THE THINGS!

  23. TWO-LAYER ARCHITECTURE

  24. scale-agnostic scale-aware API

  25. scale-agnostic scale-aware API

  26. scale-agnostic scale-aware API

  27. scale-agnostic scale-aware API

  28. scale-agnostic scale-aware API

  29. scale-agnostic scale-aware API

  30. TO SCALE INFINITELY, YOUR BUSINESS LOGIC HAS TO BE INDEPENDENT OF SCALE

  31. TWO BIG IDEAS A WORLD OF POSSIBILITIES

  32. WALLAROO SCALE INDEPENDENT COMPUTING FOR PYTHON

  33. ENTITIES BUT WE CALL THEM… “STATE OBJECTS”

  34. TWO-LAYER ARCHITECTURE BUT WE CALL IT… “SCALE INDEPENDENCE”

  35. user supplied logic Wallaroo runtime Wallaroo API

  36. user supplied logic Wallaroo runtime Wallaroo API

  37. user supplied logic Wallaroo runtime Wallaroo API

  38. user supplied logic Wallaroo runtime Wallaroo API

  39. WALLAROO API MARKET SPREAD EXAMPLE

  40. MARKET SPREAD: REAL-TIME “SOMETHING AIN’T RIGHT” TRADE CHECKS (diagram labels: Market Spread State, Market Data, Orders, Update, Check, Rejections, APPL, MSFT)

  41. MARKET SPREAD: TWO SOURCES OF DATA (same diagram)

  42. MARKET SPREAD: ONE SINK (same diagram)

  43. MARKET SPREAD: ORDER PIPELINE (same diagram)

  44. MARKET SPREAD: MARKET DATA PIPELINE (same diagram)

  45. MARKET SPREAD APPLICATION DEFINITION

  46. APPLICATION DEFINITION FLOW OF DATA FROM SOURCE TO SINK

  47. TWO DATA PIPELINES ORDERS

  48. TWO DATA PIPELINES MARKET DATA

  49. DEFINE OUR SOURCES 1 PER PIPELINE

  50. DEFINE OUR OPERATIONS 1 PER PIPELINE

  51. DEFINE OUR OPERATIONS CHECK ORDER AGAINST SYMBOL DATA

  52. DEFINE OUR OPERATIONS UPDATE SYMBOL DATA WITH LATEST MARKET DATA

  53. DEFINE OUR SINKS 1 PER PIPELINE

  54. DEFINE OUR SINKS ORDERS PIPELINE MIGHT HAVE OUTPUT

  55. DEFINE OUR SINKS MARKET DATA ONLY UPDATES SYMBOL DATA, NO OUTPUT

  56. SCALE INDEPENDENT ONLY FLOW OF DATA AND OPERATIONS

  57. USER SUPPLIED LOGIC

  58. UPDATE MARKET DATA STATE COMPUTATION UPDATES SYMBOL DATA STATE

  59. WALLAROO RUNTIME MESH NETWORK OF COOPERATING PROCESSES

  60. STATE OBJECTS ONE BIG MAP?

  61. STATE OBJECTS: CONCEPTUALLY IT’S LIKE A BIG MAP (diagram labels: Market Data, Update, State)

  62. STATE OBJECTS: WITH A KEY FOR EACH OBJECT (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA, Market Data, Update)

  63. STATE OBJECTS: WHERE WE MAP FROM INCOMING DATA’S KEY (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA; Market Data keyed MSFT)

  64. STATE OBJECTS: TO THE STATE OBJECT FOR THAT KEY (diagram labels: APPL, IBM, MSFT, AMZN, INTC, NVDA; Market Data keyed MSFT)

  65. HASH PARTITIONING DISTRIBUTING STATE OBJECTS ACROSS A CLUSTER

  66. SINGLE WORKER ALL SYMBOLS TOGETHER APPL AMZN MSFT IBM

  67. SINGLE WORKER ALL SYMBOLS TOGETHER APPL AMZN MSFT IBM

  68. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  69. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  70. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  71. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  72. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN MSFT IBM

  73. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN IBM MSFT

  74. ADD ANOTHER WORKER STATE OBJECTS WILL BE REDISTRIBUTED ACROSS THE CLUSTER APPL AMZN IBM MSFT

  75. STATE OBJECTS

  76. A WALLAROO STATE OBJECT PLAIN OLD PYTHON

  77. LEARN MORE GITHUB.COM/SEANTALLEN/PAT-HELLAND-AND-ME
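
The slides on entities, the two-layer architecture, state objects and hash partitioning can be tied together in a small sketch. What follows is plain Python and does not use the Wallaroo API. The names `SymbolData`, `update_market_data`, `check_order`, `worker_for` and `Cluster`, the message fields, and the 5% spread rule are all hypothetical and chosen only for illustration. The user-supplied functions see one state object at a time and know nothing about workers (scale-agnostic), while the toy runtime decides which worker owns each key (scale-aware).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SymbolData:
    """State object: one per symbol, the boundary of atomicity."""
    bid: float = 0.0
    offer: float = 0.0
    should_reject: bool = True  # reject orders until market data arrives


# --- Scale-agnostic layer: user logic touches exactly one state object ---

def update_market_data(quote, state):
    """Fold the latest quote into the symbol's state. Produces no output."""
    state.bid, state.offer = quote["bid"], quote["offer"]
    mid = (state.bid + state.offer) / 2
    # Assumed rule for this sketch: reject when the spread is >= 5% of mid.
    state.should_reject = mid == 0 or (state.offer - state.bid) / mid >= 0.05
    return None


def check_order(order, state):
    """Return the order as a rejection if the symbol is flagged, else None."""
    return order if state.should_reject else None


# --- Scale-aware layer: a toy runtime that owns placement of state ---

def worker_for(key, worker_count):
    """Hash partitioning: a stable hash of the key picks the owning worker."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


class Cluster:
    def __init__(self, worker_count=1):
        # Each worker is modeled as a dict of key -> state object.
        self.workers = [{} for _ in range(worker_count)]

    def run(self, computation, key, message):
        """Route the message to the worker that owns the key's state object."""
        states = self.workers[worker_for(key, len(self.workers))]
        state = states.setdefault(key, SymbolData())
        return computation(message, state)

    def add_worker(self):
        """Grow the cluster and redistribute existing state objects."""
        old = self.workers
        self.workers = [{} for _ in range(len(old) + 1)]
        for states in old:
            for key, state in states.items():
                self.workers[worker_for(key, len(self.workers))][key] = state
```

Because `update_market_data` and `check_order` never mention workers, the same logic runs unchanged before and after `add_worker()`. The modulo-based placement used here is the simplest form of hash partitioning and can move many keys when the worker count changes; a scheme such as consistent hashing would move fewer, and the sketch makes no claim about which scheme Wallaroo itself uses.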