designing for concurrency with riak

Mathias Meyer

May 29, 2012

Transcript

  1. designing for
    concurrency
    with riak
    nosql matters
    mathias meyer, @roidrage

  2. http://riakhandbook.com

  3. design for concurrency?

  4. design data for concurrency

  5. data starts out simple

  6. single source of truth

  7. always consistent

  8. mostly consistent

  9. increase number of sources

  10. ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]

  11. ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]

  12. ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]

  13. eventual consistency*
    * if no new updates are made to the object, eventually all accesses will return the last updated value.
    werner vogels, 2008, http://queue.acm.org/detail.cfm?id=1466448

  14. multiple clients

  15. ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    ID Username Email
    1 roidrage [email protected]
    2 thomas [email protected]
    3 karen [email protected]
    Client 1: PUT
    Client 2: PUT

  16. conflicting writes

  17. data diverges

  18. the challenge

  19. determine the winner

  20. determine order

  21. designing data
    for concurrency

  22. designing data
    for non-monotonic writes

  23. no atomicity in riak

  24. no coordination

  25. all state is in the data

  26. (eventual) consistency and
    logical monotonicity
    * hellerstein: the declarative imperative: experiences and conjectures in distributed logic (2010)

  27. designing data
    with conflicts in mind

  28. write now, converge later

  29. rethink the data structures

  30. ID Username Email
    1 roidrage [email protected]
    {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]"
    }

  31. track updates

  32. {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]",
       "changes": [
           {
               "client": "client-1",
               "timestamp": 1337001337,
               "updates": {
                   "firstname": "Mathias",
                   "lastname": "Meyer"
               }
           }
       ]
    }

  33. {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]",
       "changes": [
           {
               "client": "client-2",
               "timestamp": 1337001337,
               "updates": {
                   "email": "[email protected]"
               }
           }
       ]
    }

  34. apply all updates
    ordered by time
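
A minimal sketch of what "apply all updates ordered by time" could look like on the client, assuming each sibling carries the same base attributes plus its own `changes` list with `updates` stored as a map of attribute to value. The function name `converge`, the example email addresses, and the use of the client name as a tie-breaker for equal timestamps are assumptions made for illustration, not part of the talk.

```python
def converge(siblings):
    """Merge sibling documents by replaying all recorded changes in time order."""
    # Assumption: every sibling descends from the same base document, so the
    # non-changelog attributes of the first sibling serve as the starting point.
    merged = {k: v for k, v in siblings[0].items() if k != "changes"}

    # Collect each change once, keyed by (timestamp, client).
    unique = {}
    for sibling in siblings:
        for change in sibling.get("changes", []):
            unique[(change["timestamp"], change["client"])] = change

    # Replay in timestamp order; the client name only breaks ties so that
    # every reader arrives at the same result.
    ordered = [unique[key] for key in sorted(unique)]
    for change in ordered:
        merged.update(change["updates"])

    merged["changes"] = ordered
    return merged


sibling_1 = {
    "id": 1, "username": "roidrage", "email": "old@example.com",
    "changes": [{"client": "client-1", "timestamp": 1337001337,
                 "updates": {"firstname": "Mathias", "lastname": "Meyer"}}],
}
sibling_2 = {
    "id": 1, "username": "roidrage", "email": "old@example.com",
    "changes": [{"client": "client-2", "timestamp": 1337001337,
                 "updates": {"email": "new@example.com"}}],
}
merged = converge([sibling_1, sibling_2])
```

Because the merged document keeps the combined changelog, running `converge` again over the same siblings in a different order gives the same document.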

  35. what about removing data?

  36. {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]",
       "changes": [{
           "client": "client-1",
           "timestamp": 1337001337,
           "updates": [
               {
                   "_op": "delete",
                   "attribute": "email"
               }
           ]
       }]
    }

  37. {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]",
       "changes": [{
           "client": "client-2",
           "timestamp": 1337001337,
           "updates": [
               {
                   "_op": "add",
                   "attribute": "email",
                   "value": "[email protected]"
               }
           ]
       }]
    }

  38. keep a changelog

  39. client converges data

  40. time as a means of ordering*
    * leslie lamport: time, clocks, and the ordering of events in a distributed system (1978)

  41. time is not a guarantee
    for uniqueness

  42. vector clocks?

  43. {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]",
       "changes": [{
           "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
           "timestamp": 1337001337,
           "updates": [
               {
                   "_op": "delete",
                   "attribute": "email"
               }
           ]
       }]
    }
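
Once every change carries a unique ID, a client can drop duplicates and break timestamp ties deterministically. The sketch below shows one possible way to replay such a changelog with `add` and `delete` operations. The function name `apply_changelog`, the example addresses, and the choice to order equal timestamps by ID are assumptions for illustration.

```python
def apply_changelog(base, changes):
    """Replay add/delete operations onto a copy of the base attributes."""
    doc = dict(base)
    seen = set()
    # Sort by timestamp, then by change ID, so equal timestamps still have
    # a stable order on every client (an arbitrary but consistent choice).
    for change in sorted(changes, key=lambda c: (c["timestamp"], c["id"])):
        if change["id"] in seen:
            continue  # the same change arrived through more than one sibling
        seen.add(change["id"])
        for update in change["updates"]:
            if update["_op"] == "delete":
                doc.pop(update["attribute"], None)
            elif update["_op"] == "add":
                doc[update["attribute"]] = update["value"]
    return doc


base = {"id": 1, "username": "roidrage", "email": "old@example.com"}
delete_email = {
    "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
    "timestamp": 1337001337,
    "updates": [{"_op": "delete", "attribute": "email"}],
}
add_email = {
    "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",  # hypothetical ID for the second change
    "timestamp": 1337001337,
    "updates": [{"_op": "add", "attribute": "email", "value": "new@example.com"}],
}
# The delete appears twice, as it would after merging two siblings.
converged = apply_changelog(base, [add_email, delete_email, delete_email])
```

Ordering by ID is only a tie-breaker; it says nothing about which write actually happened first.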

  44. timelines*
    * riak at yammer: http://basho.com/blog/technical/2011/03/28/Riak-and-Scala-at-Yammer/

  45. time-ordered series of events

  46. kept per user

  47. {
       "events": [
           {
               "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
               "timestamp": 1337001337,
               "event": {
                   "type": "push",
                   "repository": "rails/rails",
                   "sha1": "0ea43bf"
               }
           }, {
               "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
               "timestamp": 1337001337,
               "event": {
                   "type": "pull_request",
                   "repository": "rails/rails",
                   "sha1": "84efda0"
               }
           }
       ]
    }

  48. clients dedup, sort and
    truncate
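
The three client-side steps can be sketched as follows, assuming each sibling is a document with an `events` list whose entries have an `id` and a `timestamp`. The function name `merge_timelines`, the `limit` parameter, and the newest-first ordering are assumptions for illustration.

```python
def merge_timelines(siblings, limit=1000):
    """Dedup events by ID, sort newest first, and keep at most `limit` entries."""
    by_id = {}
    for sibling in siblings:
        for event in sibling["events"]:
            by_id[event["id"]] = event  # dedup: one entry per event ID

    # Sort by timestamp, with the ID as a tie-breaker for equal timestamps.
    ordered = sorted(by_id.values(),
                     key=lambda e: (e["timestamp"], e["id"]),
                     reverse=True)
    return {"events": ordered[:limit]}  # truncate to bound the object size


sibling_a = {"events": [{"id": "a", "timestamp": 1}, {"id": "b", "timestamp": 2}]}
sibling_b = {"events": [{"id": "b", "timestamp": 2}, {"id": "c", "timestamp": 3}]}
timeline = merge_timelines([sibling_a, sibling_b], limit=2)
```

Truncation keeps the stored object from growing without bound, at the cost of dropping the oldest events.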

  49. observation:
    clients manage the data

  50. sets, counters, graphs

  51. monotonic data structures

  52. an unordered bag
    of unique items

  53. simplest thing that could
    possibly work...in riak

  54. secondary indexes

  55. X-Riak-Index-tags_bin: nosql, cloud, infrastructure
    {
       "id": 1,
       "username": "roidrage",
       "email": "[email protected]"
    }

  56. always unique

  57. useful for simple things

  58. useful for object associations

  59. set:
    time-ordered list of
    operations

  60. {
       "set": [
           {
               "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
               "timestamp": 1337001337,
               "op": "add",
               "value": "roidrage"
           }
       ]
    }

  61. {
       "set": [
           {
               "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
               "timestamp": 1337001337,
               "op": "add",
               "value": "roidrage"
           }, {
               "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
               "timestamp": 1337001339,
               "op": "add",
               "value": "josh"
           }
       ]
    }

  62. {
       "set": [
           {
               "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
               "timestamp": 1337001337,
               "op": "add",
               "value": "roidrage"
           }, {
               "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
               "timestamp": 1337001339,
               "op": "add",
               "value": "josh"
           }, {
               "id": "a525f16c-a968-11e1-8b07-1093e90b5d80",
               "timestamp": 1337001343,
               "op": "remove",
               "value": "josh"
           }
       ]
    }
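
To read the current members out of such an operation log, a client can replay the operations in time order. This is a minimal sketch under that assumption; the function name `set_members` and the ID tie-breaker are illustrative choices, not something the talk prescribes.

```python
def set_members(operations):
    """Replay add/remove operations in time order and return the resulting set."""
    members = set()
    seen = set()
    for op in sorted(operations, key=lambda o: (o["timestamp"], o["id"])):
        if op["id"] in seen:
            continue  # ignore operations duplicated by a sibling merge
        seen.add(op["id"])
        if op["op"] == "add":
            members.add(op["value"])
        elif op["op"] == "remove":
            members.discard(op["value"])
    return members


log = [
    {"id": "e018f024-a74e-11e1-9feb-1093e90b5d80", "timestamp": 1337001337,
     "op": "add", "value": "roidrage"},
    {"id": "56707cee-a757-11e1-8e1b-1093e90b5d80", "timestamp": 1337001339,
     "op": "add", "value": "josh"},
    {"id": "a525f16c-a968-11e1-8b07-1093e90b5d80", "timestamp": 1337001343,
     "op": "remove", "value": "josh"},
]
members = set_members(log)
```

Every read replays the whole log, which is why this representation gets more expensive as the number of operations grows.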

  63. slightly inefficient

  64. 2-phase set*
    * https://github.com/aphyr/meangirls

  65. {
       "set": {
           "adds": ["roidrage", "josh"],
           "removes": ["josh"]
       }
    }
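
A two-phase set converges by taking the union of the `adds` and the union of the `removes`; the members are whatever was added and never removed. The sketch below works on the inner `adds`/`removes` structure; the function names are assumptions for illustration.

```python
def merge_two_phase_sets(a, b):
    """Merge two siblings: union both the adds and the removes."""
    return {
        "adds": sorted(set(a["adds"]) | set(b["adds"])),
        "removes": sorted(set(a["removes"]) | set(b["removes"])),
    }


def two_phase_members(s):
    """An element is a member if it was added and never removed."""
    return set(s["adds"]) - set(s["removes"])


sibling_a = {"adds": ["roidrage", "josh"], "removes": []}
sibling_b = {"adds": ["roidrage"], "removes": ["josh"]}
merged_set = merge_two_phase_sets(sibling_a, sibling_b)
current = two_phase_members(merged_set)
```

The trade-off of this structure is that a removed element cannot be added again: the entry in `removes` always wins.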

  66. increment, decrement

  67. {
       "counter": [
           {
               "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
               "timestamp": 1337001337,
               "op": "incr",
               "value": 4
           }
       ]
    }

  68. g-counters*
    *a comprehensive study of convergent and commutative replicated data types
    http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  69. {
       "elements": {
           "client-1": 1,
           "client-2": 3,
           "client-3": 5
       }
    }
    value = 1 + 3 + 5 = 9
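
A grow-only counter keeps one slot per client. Each client only increments its own slot, siblings merge by taking the per-client maximum, and the value is the sum of all slots. A minimal sketch, with function names chosen for illustration:

```python
def g_counter_increment(counter, client, amount=1):
    """Return a new counter with the client's own slot increased."""
    if amount < 0:
        raise ValueError("a g-counter can only grow")
    elements = dict(counter["elements"])
    elements[client] = elements.get(client, 0) + amount
    return {"elements": elements}


def merge_g_counters(a, b):
    """Merge two siblings by taking the maximum of each client's slot."""
    clients = a["elements"].keys() | b["elements"].keys()
    return {"elements": {c: max(a["elements"].get(c, 0), b["elements"].get(c, 0))
                         for c in clients}}


def g_counter_value(counter):
    """The counter's value is the sum over all client slots."""
    return sum(counter["elements"].values())


sibling_a = {"elements": {"client-1": 1, "client-2": 3}}
sibling_b = {"elements": {"client-2": 2, "client-3": 5}}
merged_counter = merge_g_counters(sibling_a, sibling_b)
total = g_counter_value(merged_counter)
```

Taking the maximum is safe only because slots never decrease, which is why this works for increment-only counters.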

  70. counters are easy
    when you increment only

  71. convergent replicated
    data types*
    *shapiro et al.: a comprehensive study of convergent and commutative replicated data types
    http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  72. statebox for erlang*
    * https://github.com/mochi/statebox

  73. knockbox for clojure*
    * https://github.com/reiddraper/knockbox

  74. data represents state

  75. state-based means growth

  76. data increases
    with lots of updates

  77. dealing with growth

  78. roll up, discard

  79. {
       "counter": [{
           "id": "458f5936-a752-11e1-a876-1093e90b5d80",
           "timestamp": 1337001347,
           "op": "incr",
           "value": 1
       }],
       "value": 2
    }
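
One way to picture "roll up, discard": fold every operation older than some cutoff into the stored `value` and keep only the newer operations in the log. The sketch below assumes that layout; the function names, the `before` cutoff parameter, and the `decr` operation name are hypothetical.

```python
def signed_amount(op):
    """Positive for increments, negative for decrements (op names are assumed)."""
    if op["op"] in ("inc", "incr"):
        return op["value"]
    if op["op"] in ("dec", "decr"):
        return -op["value"]
    raise ValueError("unknown operation: " + op["op"])


def roll_up(doc, before):
    """Fold operations older than `before` into `value` and discard them."""
    total = doc.get("value", 0)
    kept = []
    for op in doc["counter"]:
        if op["timestamp"] < before:
            total += signed_amount(op)
        else:
            kept.append(op)
    return {"counter": kept, "value": total}


def counter_value(doc):
    """Current value: the rolled-up total plus all operations still in the log."""
    return doc.get("value", 0) + sum(signed_amount(op) for op in doc["counter"])


doc = {
    "counter": [
        {"id": "op-1", "timestamp": 1337001337, "op": "incr", "value": 1},
        {"id": "op-2", "timestamp": 1337001340, "op": "incr", "value": 1},
        {"id": "op-3", "timestamp": 1337001347, "op": "incr", "value": 1},
    ],
    "value": 0,
}
compacted = roll_up(doc, before=1337001347)
```

The roll-up is only safe if no replica later reintroduces an operation that has already been folded into `value`; this sketch does nothing to guard against that.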

  80. garbage collection

  81. not easy with riak

  82. not easy with
    stateful data

  83. garbage collection
    requires coordination

  84. network partitions
    cause stale data

  85. the solution?

  86. trade off
    data size vs. consistency

  87. commutative replicated
    data types*
    *shapiro et al.: a comprehensive study of convergent and commutative replicated data types
    http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  88. operations instead of state

  89. not yet possible with riak

  90. eventual consistency is hard
