
designing for concurrency with riak

Mathias Meyer

May 29, 2012

Transcript

  1. designing for concurrency with riak. nosql matters. mathias meyer, @roidrage

  2. (slide without text)
  3. http://riakhandbook.com

  4. design for concurrency?

  5. design data for concurrency

  6. data starts out simple

  7. ID  Username  Email
     1   roidrage  meyer@paperplanes.de
     2   thomas    thomas@example.com
     3   karen     karen@example.com

  8. single source of truth

  9. always consistent

  10. mostly consistent

  11. monotonic

  12. increase number of sources

  13. replication

  14. (diagram: two copies of the users table from slide 7)

  15. (diagram: three copies of the users table)

  16. (diagram: six copies of the users table)

  17. eventual consistency*

      * "if no new updates are made to the object, eventually all accesses will return the last updated value." (werner vogels, 2008, http://queue.acm.org/detail.cfm?id=1466448)

  18. multiple clients

  19. (diagram: two copies of the users table; Client 1 and Client 2 each send a PUT)

  20. conflicting writes

  21. siblings

  22. data diverges

  23. the challenge

  24. determine the winner

  25. determine order

  26. designing data for concurrency

  27. designing data for non-monotonic writes

  28. no atomicity in riak

  29. no coordination

  30. all state is in the data

  31. (eventual) consistency and logical monotonicity*

      * hellerstein: the declarative imperative: experiences and conjectures in distributed logic (2010)

  32. designing data with conflicts in mind

  33. write now, converge later

  34. rethink the data structures

  35. ID  Username  Email
      1   roidrage  meyer@paperplanes.de

      {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de"
      }

  36. track updates

  37. {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de",
        "changes": [
          {
            "client": "client-1",
            "timestamp": 1337001337,
            "updates": {
              "firstname": "Mathias",
              "lastname": "Meyer"
            }
          }
        ]
      }

  38. {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de",
        "changes": [
          {
            "client": "client-2",
            "timestamp": 1337001337,
            "updates": {
              "email": "info@riakhandbook.com"
            }
          }
        ]
      }

  39. apply all updates ordered by time
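
A minimal Python sketch of "apply all updates ordered by time": collect the changes from every sibling, sort them by timestamp and replay them onto the shared base attributes. It assumes that each change's `updates` is an object mapping attribute names to new values and that all siblings share the same base attributes; breaking timestamp ties by client name is also an assumption, added only so every reader replays identical timestamps in the same order.

```python
# Minimal sketch: converge sibling documents by replaying their changelogs.
# Assumptions: all siblings share the same base attributes, and "updates"
# is an object mapping attribute names to new values.


def converge(siblings):
    """Merge sibling documents by replaying all recorded changes in time order."""
    # The base attributes are everything except the changelog itself.
    base = {k: v for k, v in siblings[0].items() if k != "changes"}
    changes = [change for sibling in siblings for change in sibling["changes"]]
    # Sort by timestamp; the client name breaks ties (an assumed rule), so every
    # reader replays changes with identical timestamps in the same order.
    changes.sort(key=lambda c: (c["timestamp"], c["client"]))
    merged = dict(base)
    for change in changes:
        merged.update(change["updates"])
    merged["changes"] = changes
    return merged


sibling_a = {
    "id": 1, "username": "roidrage", "email": "meyer@paperplanes.de",
    "changes": [{"client": "client-1", "timestamp": 1337001337,
                 "updates": {"firstname": "Mathias", "lastname": "Meyer"}}],
}
sibling_b = {
    "id": 1, "username": "roidrage", "email": "meyer@paperplanes.de",
    "changes": [{"client": "client-2", "timestamp": 1337001337,
                 "updates": {"email": "info@riakhandbook.com"}}],
}

merged = converge([sibling_a, sibling_b])
print(merged["email"], merged["firstname"])  # info@riakhandbook.com Mathias
```

Because both changes carry the same timestamp, the order relies entirely on the tie-break. Without a unique id per change, a change present in both siblings would also be replayed and listed twice.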

  40. what about removing data?

  41. {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de",
        "changes": [
          {
            "client": "client-1",
            "timestamp": 1337001337,
            "updates": [
              {"_op": "delete", "attribute": "email"}
            ]
          }
        ]
      }

  42. {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de",
        "changes": [
          {
            "client": "client-2",
            "timestamp": 1337001337,
            "updates": [
              {"_op": "add", "attribute": "email", "value": "info@paperplanes.de"}
            ]
          }
        ]
      }

  43. keep a changelog

  44. client converges data
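
The same replay works for explicit operations. The Python sketch below applies the `_op` entries from slides 41 and 42; `apply_op` and `converge` are hypothetical helper names, and breaking timestamp ties by client name is again an assumption, not something the slides prescribe.

```python
# Minimal sketch: replay "_op" entries (as on slides 41 and 42) in time order.
# Ties on the timestamp are broken by client name, an assumption made only so
# that every reader picks the same order.


def apply_op(doc, op):
    """Apply a single recorded operation to a plain attribute dictionary."""
    if op["_op"] == "delete":
        # Deleting an attribute that is already gone is a no-op.
        doc.pop(op["attribute"], None)
    elif op["_op"] == "add":
        doc[op["attribute"]] = op["value"]
    else:
        raise ValueError(f"unknown operation: {op['_op']}")


def converge(siblings):
    """Return the attributes obtained by replaying every sibling's changelog."""
    base = {k: v for k, v in siblings[0].items() if k != "changes"}
    changes = sorted(
        (change for sibling in siblings for change in sibling["changes"]),
        key=lambda c: (c["timestamp"], c["client"]),
    )
    doc = dict(base)
    for change in changes:
        for op in change["updates"]:
            apply_op(doc, op)
    return doc


base = {"id": 1, "username": "roidrage", "email": "meyer@paperplanes.de"}
deleting = dict(base, changes=[{
    "client": "client-1", "timestamp": 1337001337,
    "updates": [{"_op": "delete", "attribute": "email"}],
}])
adding = dict(base, changes=[{
    "client": "client-2", "timestamp": 1337001337,
    "updates": [{"_op": "add", "attribute": "email", "value": "info@paperplanes.de"}],
}])

# Same timestamp: the tie-break decides that the add runs after the delete.
print(converge([deleting, adding]))
```

Both siblings carry the same timestamp, so whether the email survives is decided purely by the tie-break, which is exactly the weakness of relying on time for ordering.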

  45. time as a means of ordering*

      * leslie lamport: time, clocks, and the ordering of events in a distributed system (1978)

  46. time is not a guarantee for uniqueness
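
Time alone cannot tell two changes from the same second apart. Giving every change its own id, as slide 48 does, keeps such changes distinct and lets a client drop copies that both siblings inherited. A minimal Python sketch, assuming `uuid.uuid1()` ids like the time-based ones shown on slide 48 (any unique id would do); ordering by `(timestamp, id)` is an assumed deterministic tie-break, and the email values are made up.

```python
import uuid

# Minimal sketch: every change gets a unique id (uuid1 here, an assumption
# mirroring the time-based ids on slide 48), so changes from the same second
# stay distinct and copies inherited by several siblings collapse into one.


def new_change(timestamp, updates):
    """Create a change record with a fresh unique id."""
    return {"id": str(uuid.uuid1()), "timestamp": timestamp, "updates": updates}


def merge_changelogs(*logs):
    """Union of changelogs keyed by id, ordered by (timestamp, id)."""
    by_id = {}
    for log in logs:
        for change in log:
            by_id[change["id"]] = change  # same id means same change: keep one
    return sorted(by_id.values(), key=lambda c: (c["timestamp"], c["id"]))


# Both siblings inherited `shared`; each then added its own change in the
# same second. The email addresses are made up for the example.
shared = new_change(1337001337, [{"_op": "delete", "attribute": "email"}])
log_a = [shared, new_change(1337001340, [{"_op": "add", "attribute": "email",
                                          "value": "a@example.com"}])]
log_b = [shared, new_change(1337001340, [{"_op": "add", "attribute": "email",
                                          "value": "b@example.com"}])]

merged = merge_changelogs(log_a, log_b)
print(len(merged))  # 3: the shared change once, plus both same-second changes
```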

  47. vector clocks?
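
Vector clocks answer a narrower question than ordering: they tell you whether one write has seen another or whether the two are concurrent (roughly how Riak decides to keep siblings when `allow_mult` is enabled), but they do not say which concurrent write should win. A minimal, generic Python sketch, not Riak's internal implementation; clocks are plain dicts from actor name to update count, and the helper names are hypothetical.

```python
# Minimal, generic vector clock sketch (not Riak's internal implementation).
# A clock maps an actor name to the number of updates it has made.


def increment(clock, actor):
    """Return a copy of clock with actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two clocks as 'equal', 'before', 'after' or 'concurrent'."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "after"
    if b_covers_a:
        return "before"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum of two clocks."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


base = increment({}, "client-1")       # both clients read this version
write_1 = increment(base, "client-1")  # client 1 writes
write_2 = increment(base, "client-2")  # client 2 writes concurrently
print(compare(write_1, write_2))       # concurrent
```

A client that resolves the siblings can write back a clock that descends from both (merge, then increment), so the conflict disappears; choosing the resolved value is still up to the application.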

  48. {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de",
        "changes": [
          {
            "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
            "timestamp": 1337001337,
            "updates": [
              {"_op": "delete", "attribute": "email"}
            ]
          }
        ]
      }

  49. timelines*

      * riak at yammer: http://basho.com/blog/technical/2011/03/28/Riak-and-Scala-at-Yammer/

  50. time-ordered series of events

  51. kept per user

  52. {
        "events": [
          {
            "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
            "timestamp": 1337001337,
            "event": {
              "type": "push",
              "repository": "rails/rails",
              "sha1": "0ea43bf"
            }
          },
          {
            "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
            "timestamp": 1337001337,
            "event": {
              "type": "pull_request",
              "repository": "rails/rails",
              "sha1": "84efda0"
            }
          }
        ]
      }

  53. clients dedup, sort and truncate
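
A minimal Python sketch of "dedup, sort and truncate" for sibling timelines shaped like the document on slide 52. The third event and the cap of two entries (`LIMIT`) are hypothetical, chosen to keep the example small; the `(timestamp, id)` sort key is an assumed tie-break for events from the same second.

```python
# Minimal sketch of "dedup, sort and truncate" for sibling timelines shaped
# like slide 52. LIMIT and the third event are hypothetical.

LIMIT = 2


def merge_timelines(siblings, limit=LIMIT):
    """Dedup events by id, order newest first, keep at most `limit` events."""
    by_id = {}
    for sibling in siblings:
        for event in sibling["events"]:
            by_id[event["id"]] = event
    # (timestamp, id) is an assumed tie-break for events from the same second.
    newest_first = sorted(by_id.values(),
                          key=lambda e: (e["timestamp"], e["id"]), reverse=True)
    return {"events": newest_first[:limit]}


push = {"id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80", "timestamp": 1337001337,
        "event": {"type": "push", "repository": "rails/rails", "sha1": "0ea43bf"}}
pull = {"id": "e018f024-a74e-11e1-9feb-1093e90b5d80", "timestamp": 1337001337,
        "event": {"type": "pull_request", "repository": "rails/rails", "sha1": "84efda0"}}
other = {"id": "hypothetical-event-3", "timestamp": 1337001339,
         "event": {"type": "push", "repository": "rails/rails", "sha1": "hypothetical"}}

sibling_a = {"events": [pull, push]}   # one client appended the pull request
sibling_b = {"events": [other, push]}  # another client appended a newer event
merged = merge_timelines([sibling_a, sibling_b])
print([e["id"] for e in merged["events"]])
```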

  54. observation: clients manage the data

  55. sets, counters, graphs

  56. monotonic data structures

  57. sets

  58. an unordered bag of unique items

  59. simplest thing that could possibly work... in riak

  60. secondary indexes

  61. X-Riak-Index-tags_bin: nosql, cloud, infrastructure

      {
        "id": 1,
        "username": "roidrage",
        "email": "meyer@paperplanes.de"
      }

  62. always unique

  63. useful for simple things

  64. useful for object associations

  65. add-only

  66. set: time-ordered list of operations

  67. {
        "set": [
          {
            "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
            "timestamp": 1337001337,
            "op": "add",
            "value": "roidrage"
          }
        ]
      }

  68. {
        "set": [
          {
            "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
            "timestamp": 1337001337,
            "op": "add",
            "value": "roidrage"
          },
          {
            "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
            "timestamp": 1337001339,
            "op": "add",
            "value": "josh"
          }
        ]
      }

  69. {
        "set": [
          {
            "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
            "timestamp": 1337001337,
            "op": "add",
            "value": "roidrage"
          },
          {
            "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
            "timestamp": 1337001339,
            "op": "add",
            "value": "josh"
          },
          {
            "id": "a525f16c-a968-11e1-8b07-1093e90b5d80",
            "timestamp": 1337001343,
            "op": "remove",
            "value": "josh"
          }
        ]
      }

  70. slightly inefficient
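
Treating the set as a log (slides 66 to 69) means a client merges siblings by taking the union of their operations, deduplicated by id, and replays them in time order to get the members. A minimal Python sketch using the operations from slide 69; `merge`, `members` and the `(timestamp, id)` tie-break are assumptions for illustration.

```python
# Minimal sketch: a set stored as a log of add/remove operations, using the
# operations from slides 67 to 69. merge and members are hypothetical names.


def by_time(op):
    # (timestamp, id) is an assumed tie-break for operations in the same second.
    return (op["timestamp"], op["id"])


def merge(*docs):
    """Union the sibling operation logs, deduplicated by operation id."""
    by_id = {op["id"]: op for doc in docs for op in doc["set"]}
    return {"set": sorted(by_id.values(), key=by_time)}


def members(doc):
    """Replay the operations in time order to get the current members."""
    result = set()
    for op in sorted(doc["set"], key=by_time):
        if op["op"] == "add":
            result.add(op["value"])
        elif op["op"] == "remove":
            result.discard(op["value"])
    return result


add_roidrage = {"id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
                "timestamp": 1337001337, "op": "add", "value": "roidrage"}
add_josh = {"id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
            "timestamp": 1337001339, "op": "add", "value": "josh"}
remove_josh = {"id": "a525f16c-a968-11e1-8b07-1093e90b5d80",
               "timestamp": 1337001343, "op": "remove", "value": "josh"}

# Two siblings that diverged after both had seen the first two operations.
sibling_a = {"set": [add_roidrage, add_josh]}
sibling_b = {"set": [add_roidrage, add_josh, remove_josh]}
print(members(merge(sibling_a, sibling_b)))  # {'roidrage'}
```

Every operation stays in the document forever, so the log keeps growing even when the set itself does not.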

  71. 2-phase set*

      * https://github.com/aphyr/meangirls
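
A two-phase set (its JSON form is on slide 72) replaces the log with two grow-only collections: merging is a plain union of each half, and membership is `adds` minus `removes`. A minimal Python sketch with hypothetical helper names; keeping both halves as sorted lists is only an assumption to keep them JSON-friendly.

```python
# Minimal sketch of a two-phase set (2P-set) with the same shape as slide 72.
# Both halves only grow; an element is a member if added and never removed.


def add(s, element):
    return {"adds": sorted(set(s["adds"]) | {element}), "removes": list(s["removes"])}


def remove(s, element):
    # A 2P-set only allows removing elements that have been added.
    if element not in s["adds"]:
        raise KeyError(element)
    return {"adds": list(s["adds"]), "removes": sorted(set(s["removes"]) | {element})}


def merge(a, b):
    """Union both halves; the order and number of merges do not matter."""
    return {
        "adds": sorted(set(a["adds"]) | set(b["adds"])),
        "removes": sorted(set(a["removes"]) | set(b["removes"])),
    }


def members(s):
    return set(s["adds"]) - set(s["removes"])


empty = {"adds": [], "removes": []}
a = add(add(empty, "roidrage"), "josh")
b = remove(a, "josh")  # one replica removes josh
merged = merge(a, b)
print(members(merged))  # {'roidrage'}

# Once removed, an element cannot come back: re-adding josh has no effect.
print(members(add(merged, "josh")))  # {'roidrage'}
```

The price is visible in the last line: once an element is in `removes`, it can never be added again.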

  72. {
        "set": {
          "adds": ["roidrage", "josh"],
          "removes": ["josh"]
        }
      }

  73. counters

  74. increment, decrement
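
Counting with an operation log works like the set, except that no ordering is needed: addition commutes, so a client only has to deduplicate operations by id before summing them. A minimal Python sketch with operations shaped like the document on slide 75; the `decr` operation name and the second operation's id are assumptions, as the slide shows only an `incr`.

```python
# Minimal sketch: a counter stored as a list of operations shaped like slide 75.
# "decr" and the id "hypothetical-op-2" are assumptions; slide 75 only shows "incr".

SIGN = {"incr": 1, "decr": -1}


def merge(*docs):
    """Union of the sibling operation lists, deduplicated by operation id."""
    by_id = {op["id"]: op for doc in docs for op in doc["counter"]}
    return {"counter": list(by_id.values())}


def value(doc):
    # Addition commutes, so unlike the set, no sorting is needed.
    return sum(SIGN[op["op"]] * op["value"] for op in doc["counter"])


incr_4 = {"id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
          "timestamp": 1337001337, "op": "incr", "value": 4}
decr_1 = {"id": "hypothetical-op-2",
          "timestamp": 1337001340, "op": "decr", "value": 1}

sibling_a = {"counter": [incr_4]}
sibling_b = {"counter": [incr_4, decr_1]}
print(value(merge(sibling_a, sibling_b)))  # 3
```

Without the ids, merging two siblings that both contain the same `incr` would count it twice.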

  75. {
        "counter": [
          {
            "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
            "timestamp": 1337001337,
            "op": "incr",
            "value": 4
          }
        ]
      }

  76. g-counters*

      * a comprehensive study of convergent and commutative replicated data types, http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  77. {
        "elements": {
          "client-1": 1,
          "client-2": 3,
          "client-3": 5
        }
      }

      value = 1 + 3 + 5 = 9

  78. counters are easy when you increment only
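
A G-counter keeps one entry per client and only lets each client increment its own entry; merging takes the per-client maximum and the value is the sum, as in the document on slide 77. A minimal Python sketch with hypothetical helper names.

```python
# Minimal G-counter sketch shaped like slide 77: one entry per client, and
# each client only ever increments its own entry.


def increment(counter, client, amount=1):
    if amount < 0:
        raise ValueError("a g-counter can only grow")
    elements = dict(counter["elements"])
    elements[client] = elements.get(client, 0) + amount
    return {"elements": elements}


def merge(a, b):
    """Entry-wise maximum: commutative, associative and idempotent."""
    ea, eb = a["elements"], b["elements"]
    return {"elements": {c: max(ea.get(c, 0), eb.get(c, 0))
                         for c in ea.keys() | eb.keys()}}


def value(counter):
    return sum(counter["elements"].values())


doc = {"elements": {"client-1": 1, "client-2": 3, "client-3": 5}}
print(value(doc))  # 9

# Two clients increment concurrently, starting from the same version.
a = increment(doc, "client-1")
b = increment(doc, "client-2", 2)
merged = merge(a, b)
print(value(merged))  # 12
```

Because the merge is commutative, associative and idempotent, siblings can be merged in any order and any number of times. A decrement that lowered an entry would simply be undone by the per-client maximum, which is why the increment-only case is the easy one.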

  79. convergent replicated data types*

      * shapiro et al.: a comprehensive study of convergent and commutative replicated data types, http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  80. statebox for erlang*

      * https://github.com/mochi/statebox

  81. knockbox for clojure*

      * https://github.com/reiddraper/knockbox

  82. data represents state

  83. state-based means growth

  84. data increases with lots of updates

  85. dealing with growth

  86. truncate

  87. roll up, discard

  88. {
        "counter": [
          {
            "id": "458f5936-a752-11e1-a876-1093e90b5d80",
            "timestamp": 1337001347,
            "op": "inc",
            "value": 1
          }
        ],
        "value": 2
      }

  89. garbage collection

  90. not easy with riak

  91. not easy with stateful data

  92. garbage collection requires coordination

  93. network partitions cause stale data
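
Rolling up (slides 87 and 88) folds old operations into a stored `value` and discards them. The roll-up itself is easy; the hard part is a replica that missed it. A minimal Python sketch; the cutoff rule, `naive_merge`, the `dec` name and the first two operations' ids are assumptions for illustration. The last lines show the hazard: a replica that was cut off by a partition still holds the discarded operations, and merging it back counts them twice.

```python
# Minimal sketch of "roll up, discard" for an operation-based counter, shaped
# like slide 88. The cutoff rule, naive_merge, the "dec" name and the first two
# operation ids are assumptions for illustration.

SIGN = {"inc": 1, "dec": -1}


def signed(op):
    return SIGN[op["op"]] * op["value"]


def current_value(doc):
    return doc.get("value", 0) + sum(signed(op) for op in doc["counter"])


def roll_up(doc, cutoff):
    """Fold operations older than `cutoff` into "value" and discard them."""
    old = [op for op in doc["counter"] if op["timestamp"] < cutoff]
    recent = [op for op in doc["counter"] if op["timestamp"] >= cutoff]
    return {"counter": recent,
            "value": doc.get("value", 0) + sum(signed(op) for op in old)}


def naive_merge(a, b):
    """Union operations by id, keep the larger rolled-up value (deliberately naive)."""
    by_id = {op["id"]: op for doc in (a, b) for op in doc["counter"]}
    return {"counter": list(by_id.values()),
            "value": max(a.get("value", 0), b.get("value", 0))}


ops = [
    {"id": "hypothetical-op-1", "timestamp": 1337001337, "op": "inc", "value": 1},
    {"id": "hypothetical-op-2", "timestamp": 1337001340, "op": "inc", "value": 1},
    {"id": "458f5936-a752-11e1-a876-1093e90b5d80", "timestamp": 1337001347,
     "op": "inc", "value": 1},
]
full = {"counter": ops}
rolled = roll_up(full, cutoff=1337001347)
print(rolled["value"], len(rolled["counter"]))  # 2 1

# A replica cut off by a partition missed the roll-up and still holds the
# first two operations; merging it back counts them twice.
stale = {"counter": ops[:2]}
print(current_value(naive_merge(stale, rolled)))  # 5, although only 3 increments happened
```

Avoiding the double count would require every replica to agree on what has already been rolled up, which is the coordination that garbage collection needs.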

  94. the solution?

  95. trade off data size vs. consistency

  96. commutative replicated data types*

      * shapiro et al.: a comprehensive study of convergent and commutative replicated data types, http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf

  97. operations instead of state
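
In the operation-based (commutative) variant, replicas ship operations instead of state. A minimal Python sketch for a counter, simulated locally with hypothetical helper names: because increments and decrements commute, every delivery order produces the same value, as long as each operation arrives exactly once. The talk notes this is not yet possible with Riak; the sketch only illustrates the idea.

```python
from functools import reduce
from itertools import permutations

# Minimal sketch of the op-based idea: replicas exchange operations rather
# than state. Increments and decrements commute, so every delivery order gives
# the same result, provided each operation is delivered exactly once.


def apply_op(value, op):
    kind, amount = op
    if kind == "incr":
        return value + amount
    if kind == "decr":
        return value - amount
    raise ValueError(f"unknown operation: {kind}")


def replay(ops, start=0):
    return reduce(apply_op, ops, start)


ops = [("incr", 4), ("decr", 1), ("incr", 2)]
results = {replay(order) for order in permutations(ops)}
print(results)  # {5}

# Delivering one operation twice breaks convergence, which is why op-based
# types need reliable, exactly-once delivery underneath.
print(replay(ops + [("incr", 2)]))  # 7
```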

  98. not yet possible with riak

  99. eventual consistency is hard

  100. thanks