Riak Overview

Riak is a key-value datastore. I hosted the first Stockholm Riak Meetup and started off by presenting an overview of Riak.

Approximately 45 minutes.

Mårten Gustafson

March 31, 2011

Transcript

  1. 1.

    Overview
    Mårten Gustafson, Stockholm Riak Meetup #1, 2011-03-31
    Notes: 1 -> Don't forget to start the demo instance. 2 -> Reset Chrome and move to the proper space.
  2. 2.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
  4. 4.

    Buckets
    • Buckets contain things
    • Things are keys that have values
    • A bucket is like a namespace (and it's cheap; it could even be free)
    Photo: http://www.flickr.com/photos/linneberg/4481309196/
  5. 5.

    Operations
    [Diagram: an HTTP client talking to the Riak key/value store, which holds several buckets, each containing items]
    • PUT /bucket/key
    • GET /bucket/key
    • DELETE /bucket/key
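The three operations map directly onto HTTP verbs. As a sketch, the helper below only builds the method, URL and headers for each request and does not contact a server. It assumes the `/riak/<bucket>/<key>` URL layout and the `http://127.0.0.1:8088` base URL of the demo instance shown later in the transcript; `riakRequest` is a hypothetical name, not part of any Riak client library.

```javascript
// Hypothetical helper: builds (but does not send) one of the three basic
// requests. Assumes the /riak/<bucket>/<key> URL layout and the demo base URL.
const BASE_URL = "http://127.0.0.1:8088";

function riakRequest(method, bucket, key, body, contentType) {
  if (!["PUT", "GET", "DELETE"].includes(method)) {
    throw new Error(`unsupported method: ${method}`);
  }
  // Bucket and key are arbitrary names, so escape them for use in a URL path.
  const path = `/riak/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`;
  const request = { method, url: BASE_URL + path, headers: {} };
  if (method === "PUT") {
    // A value is a binary blob plus a MIME type, sent as Content-Type.
    // The octet-stream fallback is an assumption of this sketch.
    request.headers["Content-Type"] = contentType || "application/octet-stream";
    request.body = body;
  }
  return request;
}

const put = riakRequest("PUT", "meetup", "foo.html", "<h1>foo</h1>", "text/html");
const get = riakRequest("GET", "meetup", "foo.html");
const del = riakRequest("DELETE", "meetup", "foo.html");
console.log(put.method, put.url); // PUT http://127.0.0.1:8088/riak/meetup/foo.html
```

In an environment with `fetch`, such an object could be sent with `fetch(request.url, request)`; `fetch` ignores the extra `url` property.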
  6. 7.

    An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket: arbitrary name
  7. 8.

    An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket: arbitrary name
    key: arbitrary name
  8. 9.

    An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket: arbitrary name
    key: arbitrary name
    bucket + key: forms the path to the value
  9. 10.

    An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket: arbitrary name
    key: arbitrary name
    bucket + key: forms the path to the value
    value: a binary blob and a MIME type
  10. 11.

    An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket: arbitrary name
    key: arbitrary name
    bucket + key: forms the path to the value
    value: a binary blob and a MIME type
    = Store anything, yay!
  11. 14.

    Store anything
    bucket: meetup
    key: foo.html, mime: text/html
        <html><body> <h1> <a href="bar.html">foo</a> </h1> </body></html>
    key: bar.html, mime: text/html
        <html><body> <h1>bar!</h1> </body></html>
  12. 15.

    Store anything
    URL: http://127.0.0.1:8088/riak/meetup/foo.html
    bucket: meetup
    key: foo.html, mime: text/html
        <html><body> <h1> <a href="bar.html">foo</a> </h1> </body></html>
    key: bar.html, mime: text/html
        <html><body> <h1>bar!</h1> </body></html>
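To make the entry model concrete, here is a toy in-memory stand-in, not Riak itself: each bucket is a `Map` from key to `{ data, contentType }`, and a bucket springs into existence on its first write, much as Riak buckets need not be created up front. `ToyStore` and its methods are hypothetical; the sample data mirrors the `meetup` bucket above.

```javascript
// Toy in-memory model of the bucket/key/value layout (hypothetical):
// bucket name -> (key -> { data, contentType }).
class ToyStore {
  constructor() {
    this.buckets = new Map();
  }

  put(bucket, key, data, contentType) {
    // Buckets are cheap: one is created implicitly on first write.
    if (!this.buckets.has(bucket)) this.buckets.set(bucket, new Map());
    this.buckets.get(bucket).set(key, { data, contentType });
  }

  get(bucket, key) {
    const b = this.buckets.get(bucket);
    return b ? b.get(key) : undefined;
  }

  delete(bucket, key) {
    const b = this.buckets.get(bucket);
    return b ? b.delete(key) : false;
  }
}

const store = new ToyStore();
store.put("meetup", "foo.html",
  '<html><body><h1><a href="bar.html">foo</a></h1></body></html>', "text/html");
store.put("meetup", "bar.html", "<html><body><h1>bar!</h1></body></html>", "text/html");
console.log(store.get("meetup", "foo.html").contentType); // text/html
```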
  13. 16.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
  14. 17.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
    Notes: Database -> persistence -> durability -> storage back-ends
  15. 18.

    Storage
    [Diagram: the Riak key/value store with pluggable storage back-ends]
    Notes: Pluggable back-ends, per bucket. Bitcask keeps the key set in memory, so in some (rare) cases it might not fit; then look to InnoDB. ETS is built-in Erlang storage; DETS is ETS on disk.
  16. 19.

    Storage
    [Diagram: the Riak key/value store with pluggable storage back-ends]
    Back-ends: Bitcask, InnoDB, DETS, ETS, balanced trees, file system, LRU
  17. 20.

    Storage
    [Diagram: the Riak key/value store with pluggable storage back-ends]
    Back-ends: Bitcask, InnoDB, DETS, ETS, balanced trees, file system, LRU
    The back-ends are grouped as RAM-based (not durable) or disk-based (durable).
  18. 21.

    Storage
    [Diagram: the Riak key/value store with pluggable storage back-ends]
    Back-ends: Bitcask, InnoDB, DETS, ETS, balanced trees, file system, LRU
    The back-ends are grouped as RAM-based (not durable) or disk-based (durable), and the diagram additionally labels back-ends as "default" and "common".
  19. 22.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
    Notes: Pluggable back-ends, per bucket
  20. 23.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
    Notes: Decentralized -> cluster: no master node, no single point of failure
  21. 24.

    The Ring
    • The "storage space" is partitioned into a ring
    • Each key and its value have a primary partition
  22. 25.

    The Ring
    • A ring size of 1024 should accommodate most needs
    • Once you've set your ring size, it's fixed
    • The only way to change it is to back up and restore your entire cluster
  23. 26.

    The Ring
    [Diagram: a ring of size 12, with partitions numbered 1 to 12]
    • A ring size of 1024 should accommodate most needs
    • Once you've set your ring size, it's fixed
    • The only way to change it is to back up and restore your entire cluster
  24. 27.

    Consistent Hashing
    [Diagram: the ring with partitions 1 to 12]
    Always maps key "x" to partition "y"
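Riak's consistent hashing places keys on a 160-bit ring using SHA-1, so the same key always lands in the same partition. A minimal sketch of that mapping, assuming the key is hashed as the UTF-8 string `bucket/key`; Riak actually hashes an Erlang term of bucket and key, so these partition numbers will not match a real cluster. The ring size of 12 follows the slide (Riak itself expects a power of two), and `partitionFor` is a hypothetical name.

```javascript
const crypto = require("crypto");

// SHA-1 gives a 160-bit hash, so positions on the ring run from 0 to 2^160 - 1.
const HASH_SPACE = 2n ** 160n;

// Hypothetical helper: map bucket/key to a partition index in [0, ringSize).
// Assumption: hashes the UTF-8 string "bucket/key"; Riak hashes an Erlang
// term of bucket and key, so the indices will not match a real cluster.
function partitionFor(bucket, key, ringSize) {
  const digest = crypto.createHash("sha1").update(`${bucket}/${key}`).digest("hex");
  const position = BigInt("0x" + digest); // where the key lands on the ring
  // Equal-width partitions: scale the position down to [0, ringSize).
  return Number((position * BigInt(ringSize)) / HASH_SPACE);
}

const p = partitionFor("meetup", "foo.html", 12);
console.log(`meetup/foo.html -> partition ${p}`);
console.log(partitionFor("meetup", "foo.html", 12) === p); // true: always the same partition
```

Scaling by `ringSize / 2^160` rather than dividing by a rounded partition width keeps every index in range even when the ring size does not divide 2^160 evenly, as with 12.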
  26. 31.

    Cluster
    One Ring size to rule them all, One Ring size to find them, One Ring size to bring them all and in the cluster bind them...
  27. 32.

    Cluster
    [Diagram: nodes A, B and C]
    ring size = 12, instances = 3
    ring size / nodes = ~slices per instance (12 / 3 = 4)
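The slide's arithmetic, ring size / nodes = slices per node, can be checked with a simple round-robin assignment of partitions to nodes. Round-robin is an assumption made for illustration; Riak's actual claim algorithm is different, although it also aims for an even spread. `claimPartitions` and `slicesPerNode` are hypothetical names.

```javascript
// Hypothetical round-robin claim: partition i is owned by nodes[i % nodes.length].
function claimPartitions(ringSize, nodes) {
  const owners = [];
  for (let i = 0; i < ringSize; i++) {
    owners.push(nodes[i % nodes.length]);
  }
  return owners;
}

// Count how many slices of the ring each node ends up owning.
function slicesPerNode(owners) {
  const counts = {};
  for (const node of owners) {
    counts[node] = (counts[node] || 0) + 1;
  }
  return counts;
}

const owners = claimPartitions(12, ["A", "B", "C"]);
console.log(slicesPerNode(owners)); // { A: 4, B: 4, C: 4 }
```

When the ring size is not a multiple of the node count (12 partitions over 5 nodes, say), the split is only approximately even, which is why the slide writes "~slices".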
  29. 35.

    Cluster: Read
    [Diagram: nodes A, B and C]
    "I can haz ?"
    "Hm, hashes to a slice of the ring owned by node C."
  30. 36.

    Cluster: Read
    [Diagram: nodes A, B and C]
    "I can haz ?"
    "Hey C! I need"
    "Okidoki, now where's he... ah yeah, in my fourth slice"
  31. 44.

    ...concurrent writes?
    [Diagram: client 1 and client 2, nodes A, B and C]
    "hey, A! save for me ?"
    "hey, C! save for me"
  32. 45.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
  33. 47.

    The N to the R to the W to the DW and the RW
    Buckets have defaults for R, W, DW and RW
  34. 48.
  35. 49.

    N_VAL = 1
    [Diagram: a client, a bucket and nodes A, B and C]
    Number of replicas, set per bucket
  36. 50.

    N_VAL = 2
    [Diagram: a client, a bucket and nodes A, B and C]
    Number of replicas, set per bucket
  37. 51.

    N_VAL = 3
    [Diagram: a client, a bucket and nodes A, B and C]
    Number of replicas, set per bucket
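Riak follows Amazon's Dynamo design here: the N replicas of a key live on its primary partition and the next N - 1 partitions around the ring. A sketch under that rule, assuming round-robin ownership of a 12-partition ring by nodes A, B and C and ignoring failed nodes (hinted handoff); `preferenceList` is a hypothetical name.

```javascript
// Hypothetical sketch: the N partitions responsible for a key are its primary
// partition plus the next N - 1 partitions around the ring (wrapping at the end).
// owners[i] is the node that owns partition i.
function preferenceList(primaryPartition, nVal, owners) {
  const ringSize = owners.length;
  if (nVal > ringSize) throw new Error("n_val cannot exceed the ring size");
  const list = [];
  for (let i = 0; i < nVal; i++) {
    const partition = (primaryPartition + i) % ringSize;
    list.push({ partition, node: owners[partition] });
  }
  return list;
}

// Ring of 12 partitions owned round-robin by nodes A, B and C (assumption).
const owners = Array.from({ length: 12 }, (_, i) => ["A", "B", "C"][i % 3]);
const replicas = preferenceList(11, 3, owners);
console.log(replicas.map((r) => `${r.partition}@${r.node}`).join(" ")); // 11@C 0@A 1@B
```

With this ownership, three consecutive partitions always belong to three different nodes, so N_VAL = 3 puts one copy on each of A, B and C.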
  38. 52.

    R
    [Diagram: a client, a bucket and nodes A, B and C]
    GET /bucket/key?R=1
    R = Read: the request goes to all replica nodes, and the client gets an answer once R nodes have agreed
  40. 54.

    R
    [Diagram: a client, a bucket and nodes A, B and C]
    GET /bucket/key?R=2
    R = Read: the request goes to all replica nodes, and the client gets an answer once R nodes have agreed
  42. 56.

    R
    [Diagram: a client, a bucket and nodes A, B and C, asking "Agree?"]
    GET /bucket/key?R=2
    R = Read: the request goes to all replica nodes, and the client gets an answer once R nodes have agreed
  43. 57.

    R
    [Diagram: a client, a bucket and nodes A, B and C]
    GET /bucket/key?R=2
    R = Read: the request goes to all replica nodes, and the client gets an answer once R nodes have agreed
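A sketch of the R parameter, simulated without a network: the request has gone to every replica, and replies are consumed in arrival order until R have arrived. It assumes that "agree" means identical values; real Riak compares vector clocks and may return siblings instead. `quorumRead` and the sample replies are hypothetical.

```javascript
// Hypothetical simulation of a read with read quorum R. Replies from the N
// replicas are given in arrival order; only the first R are waited for.
// Assumption: "agree" means identical values (Riak compares vector clocks).
function quorumRead(replies, r) {
  if (r < 1 || r > replies.length) throw new Error("R must be between 1 and N");
  const first = replies.slice(0, r); // the first R replies to arrive
  const answeredBy = first.map((reply) => reply.node);
  const values = new Set(first.map((reply) => reply.value));
  return values.size === 1
    ? { ok: true, value: first[0].value, answeredBy }
    : { ok: false, conflicting: [...values], answeredBy };
}

// Replies from three replicas, fastest first.
const replies = [
  { node: "B", value: "Tuesday" },
  { node: "A", value: "Tuesday" },
  { node: "C", value: "Thursday" },
];
console.log(quorumRead(replies, 1).value); // Tuesday
console.log(quorumRead(replies, 2).value); // Tuesday
console.log(quorumRead(replies, 3).ok);    // false
```

A smaller R answers sooner but may miss a newer value held by a slower replica; a larger R waits longer and is more likely to notice disagreement.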
  44. 58.

    The N to the R to the W to the DW and the RW
    Buckets have defaults for R, W, DW and RW
  45. 59.

    The N to the R to the W to the DW and the RW
    • N: number of copies, i.e. distribute to N nodes
    Buckets have defaults for R, W, DW and RW
  46. 60.

    The N to the R to the W to the DW and the RW
    • N: number of copies, i.e. distribute to N nodes
    • R: read, i.e. have R nodes agree
    Buckets have defaults for R, W, DW and RW
  47. 61.

    The N to the R to the W to the DW and the RW
    • N: number of copies, i.e. distribute to N nodes
    • R: read, i.e. have R nodes agree
    • W: write, i.e. acknowledged by W nodes
    Buckets have defaults for R, W, DW and RW
  48. 62.

    The N to the R to the W to the DW and the RW
    • N: number of copies, i.e. distribute to N nodes
    • R: read, i.e. have R nodes agree
    • W: write, i.e. acknowledged by W nodes
    • DW: durable write, i.e. persistently written by DW nodes
    Buckets have defaults for R, W, DW and RW
  49. 63.

    The N to the R to the W to the DW and the RW
    • N: number of copies, i.e. distribute to N nodes
    • R: read, i.e. have R nodes agree
    • W: write, i.e. acknowledged by W nodes
    • DW: durable write, i.e. persistently written by DW nodes
    • RW: read-write, i.e. persistently deleted by RW nodes
    Buckets have defaults for R, W, DW and RW
  50. 64.

    The Quorum
    quorum = floor(N / 2) + 1, where N is the number of replicas (n_val)
    quorum = majority
    "quorum" is a valid value for r, w, dw and rw
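The quorum formula, and why a majority is a useful default, fits in a few lines. The overlap check encodes the standard counting argument: when R + W > N, any R replicas and any W replicas out of the same N must share at least one replica, assuming no node failures or hinted handoff. The function names are hypothetical.

```javascript
// Quorum for N replicas: a strict majority, floor(N / 2) + 1.
function quorum(n) {
  return Math.floor(n / 2) + 1;
}

// With R + W > N, any R of the N replicas and any W of them must overlap, so a
// read reaches at least one replica that acknowledged the latest write.
function quorumsOverlap(n, r, w) {
  return r + w > n;
}

for (const n of [1, 2, 3, 4, 5]) {
  const q = quorum(n);
  console.log(`N=${n} quorum=${q} R=W=quorum overlaps: ${quorumsOverlap(n, q, q)}`);
}
```

Because 2 * quorum(N) is always greater than N, setting both R and W to "quorum" always satisfies the overlap condition, while R = W = 1 with N = 3 does not.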
  51. 68.

    I don't care!
    Bucket properties:
    • allow_mult = false
    • last_write_wins = true
    Notes: allow_mult = false -> siblings are never returned; last_write_wins = true
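These are bucket properties, so they are set per bucket. Over Riak's HTTP interface that is a PUT of a JSON `props` object to the bucket's URL. The helper only builds that request; it assumes the demo instance's base URL, and `bucketPropsRequest` is a hypothetical name.

```javascript
// Hypothetical helper: builds (but does not send) the request that sets bucket
// properties through the HTTP interface: PUT /riak/<bucket> with a JSON body.
function bucketPropsRequest(baseUrl, bucket, props) {
  return {
    method: "PUT",
    url: `${baseUrl}/riak/${encodeURIComponent(bucket)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ props }),
  };
}

// "I don't care": never return siblings; the most recent write wins.
const req = bucketPropsRequest("http://127.0.0.1:8088", "meetup", {
  allow_mult: false,
  last_write_wins: true,
});
console.log(req.body); // {"props":{"allow_mult":false,"last_write_wins":true}}
```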
  52. 70.

    I do care!
    • Resolve conflicts in application logic
    • Conflicts are exposed as siblings beneath a key (with allow_mult = true)
    • Response is HTTP 300 Multiple Choices
    • Served as multipart/mixed
  53. 71.

    Example
        HTTP/1.1 300 Multiple Choices
        X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30G==
        Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Length: 368

        --ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Type: text/plain

        Tuesday
        --ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Type: text/plain

        Thursday
        --ZZ3eyjUllBi7GXRRMJsUublFxjn--
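On the client side, "resolve conflicts in application logic" means splitting the `multipart/mixed` body into its siblings and choosing or merging a value. A rough sketch, assuming the boundary is read from the `Content-Type` header and that parts use CRLF line endings as in standard MIME; `parseSiblings` and the earliest-weekday rule in `resolve` are hypothetical, and a real client should use a proper MIME parser.

```javascript
// Hypothetical parser for a 300 Multiple Choices body: split a multipart/mixed
// payload into sibling parts. Assumes CRLF line endings and an unquoted
// boundary; not a full MIME parser.
function parseSiblings(contentType, body) {
  const match = /boundary=([^;\s]+)/.exec(contentType);
  if (!match) throw new Error("no boundary in Content-Type");
  const delimiter = "--" + match[1];
  return body
    .split(delimiter)
    .slice(1) // drop the preamble before the first delimiter
    .filter((part) => !part.startsWith("--")) // drop the closing "--" marker
    .map((part) => {
      // Part headers and content are separated by the first blank line.
      const [head, ...rest] = part.split("\r\n\r\n");
      const type = /Content-Type:\s*([^\r\n]+)/i.exec(head);
      return {
        contentType: type ? type[1].trim() : null,
        data: rest.join("\r\n\r\n").replace(/\r\n$/, ""),
      };
    });
}

// Hypothetical application rule: the meeting goes ahead on the earliest day.
const WEEK = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];
function resolve(siblings) {
  return siblings
    .map((s) => s.data)
    .sort((a, b) => WEEK.indexOf(a) - WEEK.indexOf(b))[0];
}

const contentType = "multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn";
const body = [
  "",
  "--ZZ3eyjUllBi7GXRRMJsUublFxjn",
  "Content-Type: text/plain",
  "",
  "Tuesday",
  "--ZZ3eyjUllBi7GXRRMJsUublFxjn",
  "Content-Type: text/plain",
  "",
  "Thursday",
  "--ZZ3eyjUllBi7GXRRMJsUublFxjn--",
  "",
].join("\r\n");

const siblings = parseSiblings(contentType, body);
console.log(siblings.map((s) => s.data)); // [ 'Tuesday', 'Thursday' ]
console.log(resolve(siblings));           // Tuesday
```

After choosing a value, the client would typically write it back together with the `X-Riak-Vclock` header from the 300 response, so that Riak can replace the siblings with the resolved value.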
  54. 72.

    What is Riak and what's the agenda?
    • Decentralized key-value store
    • A database ideally suited for web applications
    • A flexible map/reduce engine
  55. 74.

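    Map / Reduce count words
        function(v) {
          // Lowercase the object's first value and split it into words;
          // \w* also matches empty strings, which are skipped below.
          var words = v.values[0].data.toLowerCase().match(/\w*/g);
          var counts = [];
          for (var word in words)
            if (words[word] != '') {
              // Emit one {word: 1} object per occurrence.
              var count = {};
              count[words[word]] = 1;
              counts.push(count);
            }
          return counts;
        }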
  56. 75.

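    Map / Reduce count words
        function(values) {
          // Sum the per-word counts from all map (or earlier reduce) results.
          var result = {};
          for (var value in values) {
            for (var word in values[value]) {
              // hasOwnProperty avoids inherited names such as "constructor".
              if (result.hasOwnProperty(word)) result[word] += values[value][word];
              else result[word] = values[value][word];
            }
          }
          return [result];
        }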
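Riak runs these functions inside its own JavaScript engine, but because they are plain functions they can be sanity-checked locally before submitting a job. The harness below is a sketch that assumes each stored object looks like `{ values: [{ data: "..." }] }`, matching what the map function reads; `countWordsMap`, `countWordsReduce` and the sample documents are made up for the test. Riak may call a reduce function again on its own earlier output, which is why the reduce returns a one-element list.

```javascript
// Map: emit one {word: 1} object per word in the object's first value.
function countWordsMap(v) {
  var words = v.values[0].data.toLowerCase().match(/\w*/g);
  var counts = [];
  for (var word in words) {
    if (words[word] != "") {
      var count = {};
      count[words[word]] = 1;
      counts.push(count);
    }
  }
  return counts;
}

// Reduce: sum the per-word counts; the one-element result can be re-reduced.
function countWordsReduce(values) {
  var result = {};
  for (var value in values) {
    for (var word in values[value]) {
      if (result.hasOwnProperty(word)) result[word] += values[value][word];
      else result[word] = values[value][word];
    }
  }
  return [result];
}

// Local harness (not Riak): run map over each fake object, then reduce once.
const objects = [
  { values: [{ data: "Riak is a key-value store" }] },
  { values: [{ data: "A store for the web" }] },
];
const mapped = objects.flatMap(countWordsMap);
const [totals] = countWordsReduce(mapped);
console.log(totals.store, totals.a); // 2 2
```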
  57. 76.

    Map & Reduce count words
        {"inputs": "bucket",
         "query": [
           {"map": {"language": "javascript",
                    "source": "function(v) { var words = v.values[0].data.toLowerCase().match(/\\w*/g); var counts = []; for (var word in words) if (words[word] != '') { var count = {}; count[words[word]] = 1; counts.push(count); } return counts; }"}},
           {"reduce": {"language": "javascript",
                       "source": "function(values) { var result = {}; for (var value in values) { for (var word in values[value]) { if (result.hasOwnProperty(word)) result[word] += values[value][word]; else result[word] = values[value][word]; } } return [result]; }"}}
         ]}
    Put this in your POST request to /mapred and let Riak smoke it.
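Typing the job by hand means every backslash in a function source must itself be escaped for JSON (the `\w` in the map function has to appear as `\\w`). A sketch that sidesteps this by building the job as an object and letting `JSON.stringify` do the escaping. It assumes Riak's HTTP `/mapred` endpoint on the demo instance; `buildMapRedJob` is a hypothetical name and the request is only built, not sent.

```javascript
// Hypothetical helper: build (but do not send) a job for Riak's /mapred endpoint.
// JSON.stringify escapes the function sources, e.g. "\w" becomes "\\w".
function buildMapRedJob(baseUrl, inputs, mapSource, reduceSource) {
  const job = {
    inputs, // a bucket name: run the job over every key in that bucket
    query: [
      { map: { language: "javascript", source: mapSource } },
      { reduce: { language: "javascript", source: reduceSource } },
    ],
  };
  return {
    method: "POST",
    url: `${baseUrl}/mapred`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

// String.raw keeps the backslash in \w exactly as written in the source text.
const mapSource = String.raw`function(v) { var words = v.values[0].data.toLowerCase().match(/\w*/g); var counts = []; for (var word in words) if (words[word] != '') { var count = {}; count[words[word]] = 1; counts.push(count); } return counts; }`;
const reduceSource = String.raw`function(values) { var result = {}; for (var value in values) { for (var word in values[value]) { if (result.hasOwnProperty(word)) result[word] += values[value][word]; else result[word] = values[value][word]; } } return [result]; }`;

const job = buildMapRedJob("http://127.0.0.1:8088", "bucket", mapSource, reduceSource);
console.log(job.url); // http://127.0.0.1:8088/mapred
```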
  58. 77.

    Map & Reduce count words
    (The same request as the previous slide, with the map function highlighted.)
  59. 78.

    Map & Reduce count words
    (The same request as the previous slide, with the reduce function highlighted.)
  60. 82.

    The whole enchilada
    [Diagram of the Riak stack]
    • Erlang / OTP
    • Riak Core: partitioning (consistent hashing, hinted handoff), membership management (leave/join), work distribution, cluster state gossip protocol
    • Riak Key/Value Store, with storage back-ends: Bitcask, InnoDB, DETS, ETS, balanced trees, file system, LRU
    • Riak Search
    • HTTP API
    • Luwak