Riak Overview

Riak is a key-value datastore. I hosted the first Stockholm Riak Meetup and started off by presenting an overview of Riak.

Approx 45 minutes.

Mårten Gustafson

March 31, 2011

Transcript

  1. Overview. Mårten Gustafson, Stockholm Riak Meetup #1, 2011-03-31

    Speaker notes: 1 -> Don’t forget to start the demo instance. 2 -> Reset Chrome and move to the proper space.
  2. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine
  4. Buckets (photo: http://www.flickr.com/photos/linneberg/4481309196/)

    Buckets contain things. Things are keys that have values. A bucket is like a namespace (and it’s cheap; it could even be free).
  5. Operations

    Diagram: an HTTP client talks to the Riak Key/Value Store, which holds buckets, each containing items. PUT /bucket/key • GET /bucket/key • DELETE /bucket/key
  6. An entry • lives in a bucket • has a

    key • has a value
  7. An entry • lives in a bucket • has a

    key • has a value bucket arbitrary name
  8. An entry • lives in a bucket • has a

    key • has a value bucket key arbitrary name arbitrary name
  9. An entry • lives in a bucket • has a

    key • has a value bucket key arbitrary name arbitrary name forms the path to the value
  10. An entry • lives in a bucket • has a

    key • has a value bucket key value arbitrary name arbitrary name a binary blob and mime type forms the path to the value
  11. An entry • lives in a bucket • has a key • has a value

    bucket: arbitrary name • key: arbitrary name • bucket and key: form the path to the value • value: a binary blob and mime type = Store anything, yay!
  12. Store anything bucket meetup

  13. Store anything

    bucket meetup • key foo.html, mime text/html: <html><body> <h1> <a href="bar.html">foo</a> </h1> </body></html>
  14. Store anything

    bucket meetup • key foo.html, mime text/html: <html><body> <h1> <a href="bar.html">foo</a> </h1> </body></html> • key bar.html, mime text/html: <html><body> <h1>bar!</h1> </body></html>
  15. Store anything

    http://127.0.0.1:8088/riak/meetup/foo.html • bucket meetup • key foo.html, mime text/html: <html><body> <h1> <a href="bar.html">foo</a> </h1> </body></html> • key bar.html, mime text/html: <html><body> <h1>bar!</h1> </body></html>
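
As a rough sketch of the three HTTP operations applied to the entries above, the helper below only builds request descriptions (method, URL, headers, body) and sends nothing. The host, port and `/riak/<bucket>/<key>` path shape are taken from the slide's URL; the function name `entryRequest` is made up for illustration.

```javascript
// Base URL taken from the slide; adjust it for your own node.
const BASE_URL = "http://127.0.0.1:8088/riak";

// Hypothetical helper: describes an HTTP request for one entry.
// It only builds the description; nothing is sent over the network.
function entryRequest(method, bucket, key, value, mimeType) {
  const request = {
    method: method,
    url: BASE_URL + "/" + encodeURIComponent(bucket) + "/" + encodeURIComponent(key),
    headers: {},
  };
  if (method === "PUT") {
    // The value is an opaque blob; the mime type says how to serve it back.
    request.headers["Content-Type"] = mimeType;
    request.body = value;
  }
  return request;
}

const store = entryRequest("PUT", "meetup", "foo.html",
  '<html><body><h1><a href="bar.html">foo</a></h1></body></html>', "text/html");
const fetchIt = entryRequest("GET", "meetup", "foo.html");
const remove = entryRequest("DELETE", "meetup", "foo.html");

console.log(store.method, store.url, store.headers["Content-Type"]);
console.log(fetchIt.method, fetchIt.url);
console.log(remove.method, remove.url);
```

Each description can be handed to whatever HTTP client you prefer; bucket and key together form the path, and the mime type travels as the `Content-Type` header.
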
  16. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine
  17. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine Database -> Persistence -> Durability -> Storage backends
  18. Storage: Riak Key/Value Store

    Pluggable back-ends, per bucket. Bitcask keeps the key set in memory (i.e. in some rare cases it might not fit; then look to InnoDB). ETS is Erlang’s built-in storage; DETS is ETS on disk.
  19. Storage: Riak Key/Value Store

    Back-ends: Bitcask • InnoDB • DETS • ETS • Balanced trees • File system • LRU
  20. Storage: Riak Key/Value Store

    Back-ends: Bitcask • InnoDB • DETS • ETS • Balanced trees • File system • LRU. Diagram labels: ram based, not durable • disk based, durable
  21. Storage: Riak Key/Value Store

    Back-ends: Bitcask • InnoDB • DETS • ETS • Balanced trees • File system • LRU. Diagram labels: ram based, not durable • disk based, durable • default • common
  22. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine Pluggable back-ends, per bucket
  23. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine Decentralized -> Cluster - No master node - No single point of failure
  24. The Ring

    The “storage space” is partitioned into a ring. Keys and their values have a primary partition.
  25. The Ring

    A ring size of 1024 should accommodate most needs. Once you’ve set your ring size, it’s fixed. The only way to change it is to back up and restore your entire cluster.
  26. The Ring

    ring size = 12 (diagram: partitions 1 to 12 arranged in a ring)
  27. Consistent Hashing

    (diagram: ring of partitions 1 to 12) Always maps key “x” to partition “y”
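
A minimal sketch of "always maps key x to partition y" for the 12-partition example ring. The FNV-1a hash is only a dependency-free stand-in, not a claim about the hash Riak itself uses, and `partitionFor` is a made-up name.

```javascript
// Ring size from the example diagram on the slides.
const RING_SIZE = 12;

// FNV-1a, a simple 32-bit string hash (one step per UTF-16 code unit).
// Stand-in only: chosen to keep the sketch free of dependencies.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Split the 32-bit hash space into RING_SIZE equal ranges and return the
// 1-based number of the range that the bucket/key pair falls into.
function partitionFor(bucket, key) {
  const rangeWidth = Math.ceil(2 ** 32 / RING_SIZE);
  return Math.floor(fnv1a(bucket + "/" + key) / rangeWidth) + 1;
}

console.log("foo.html lives in partition", partitionFor("meetup", "foo.html"));
console.log("bar.html lives in partition", partitionFor("meetup", "bar.html"));
```

Because the result depends only on the bucket, the key and the ring size, every node computes the same partition for the same entry. That is also why the ring size has to stay fixed once it is chosen.
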
  29. Read

    “I want [item]” requires us to know: where is [item] on the ring?
  31. Cluster One Ring size to rule them all, One Ring

    size to find them, One Ring size to bring them all and in the cluster bind them...
  32. Cluster: node A, node B, node C

    ring size = 12 • instances = 3 • ring size / nodes = ~slices per instance (12 / 3 = 4)
  34. Cluster - Read: node A, node B, node C

  35. Cluster - Read: node A, node B, node C

    Client: “I can haz [item]?” Receiving node: “Hm, [item] hashes to a slice of the ring owned by node C.”
  36. Cluster - Read: node A, node B, node C

    Client: “I can haz [item]?” Receiving node: “Hey C! I need [item].” Node C: “Okidoki, now where’s he... ah yeah, in my fourth slice.”
  37. Cluster - Read: node A, node B, node C

    Client: “I can haz [item]?”
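
The cluster slides above (ring size 12, three nodes) can be sketched as follows. The round-robin assignment of partitions to nodes is an assumption made for this example, and `ownerOf`, `slicesPerNode` and `routeRead` are illustrative names.

```javascript
const RING_SIZE = 12;
const NODES = ["A", "B", "C"];

// Assumption for this sketch: partitions are handed out round-robin, so
// partition 1 -> A, 2 -> B, 3 -> C, 4 -> A, and so on.
function ownerOf(partition) {
  return NODES[(partition - 1) % NODES.length];
}

// Group the partitions by owning node: ring size / nodes = slices per node.
function slicesPerNode() {
  const slices = {};
  for (const node of NODES) slices[node] = [];
  for (let p = 1; p <= RING_SIZE; p++) slices[ownerOf(p)].push(p);
  return slices;
}

// Any node can take the request; it looks up the owner of the partition and
// either serves the read itself or forwards it.
function routeRead(receivingNode, partition) {
  const owner = ownerOf(partition);
  return owner === receivingNode
    ? receivingNode + " serves the read locally"
    : receivingNode + " forwards the read to " + owner;
}

console.log(slicesPerNode());
console.log(routeRead("A", 12));
```

With these numbers each node owns four slices, and a read for an entry in partition 12 that arrives at node A is passed on to node C.
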
  38. So what about...

  39. ...network partitions? node A node B node C

  40. ...network partitions? node A node B node C X

  41. ...failed nodes? node A node B node C

  42. ...failed nodes? node A node B

  43. ...concurrent writes? node A, node B, node C • client 1 • client 2

  44. ...concurrent writes? node A, node B, node C

    client 1: “hey, A! save [item] for me” • client 2: “hey, C! save [item] for me”
  45. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine
  46. Livin’ on the web • concurrent users • unreliable networks

    • node failures
  47. The N to the R to the W to the

    DW and the RW Buckets have defaults for R, W, DW and RW
  48. Node A Node B Node C N_VAL Bucket Client Number

    of replicas, set per bucket
  49. Node A Node B Node C N_VAL Bucket 1 Client

    Number of replicas, set per bucket
  50. Node A Node B Node C N_VAL Bucket 2 Client

    Number of replicas, set per bucket
  51. Node A Node B Node C N_VAL Bucket 3 Client

    Number of replicas, set per bucket
  52. R: Node A, Node B, Node C • Bucket • Client

    GET /bucket/key?R=1. R = Read. Will pose a request to all nodes and answer the client when R nodes have agreed.
  54. R: Node A, Node B, Node C • Bucket • Client

    GET /bucket/key?R=2
  56. R: Node A, Node B, Node C • Bucket • Client

    GET /bucket/key?R=2 • Agree?
  57. R: Node A, Node B, Node C • Bucket • Client

    GET /bucket/key?R=2
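
A simplified model of "answer the client when R nodes have agreed". It follows the slide's wording only and is not Riak's implementation; the `replies` array and the name `readWithQuorum` are assumptions for the example.

```javascript
// Hypothetical model: `replies` holds the values returned by the replicas,
// in the order the answers arrive.
function readWithQuorum(replies, r) {
  const votes = new Map();
  for (const value of replies) {
    const count = (votes.get(value) || 0) + 1;
    votes.set(value, count);
    // Answer as soon as R replicas have reported the same value.
    if (count >= r) return value;
  }
  return null; // fewer than R replicas agreed
}

const replies = ["Tuesday", "Thursday", "Tuesday"];
console.log(readWithQuorum(replies, 1)); // the first reply is enough
console.log(readWithQuorum(replies, 2)); // waits for a second matching reply
```

A low R answers faster but may return a stale value; a higher R waits for more replicas before answering.
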
  58. The N to the R to the W to the

    DW and the RW Buckets have defaults for R, W, DW and RW
  59. The N to the R to the W to the

    DW and the RW Number of copies ie. distribute to N nodes Buckets have defaults for R, W, DW and RW
  60. The N to the R to the W to the

    DW and the RW Number of copies ie. distribute to N nodes Read ie. have R nodes agree Buckets have defaults for R, W, DW and RW
  61. The N to the R to the W to the

    DW and the RW Number of copies ie. distribute to N nodes Read ie. have R nodes agree Write ie. ack’d by W nodes Buckets have defaults for R, W, DW and RW
  62. The N to the R to the W to the

    DW and the RW Durable write ie. persistently written by DW nodes Number of copies ie. distribute to N nodes Read ie. have R nodes agree Write ie. ack’d by W nodes Buckets have defaults for R, W, DW and RW
  63. The N to the R to the W to the DW and the RW

    N: number of copies, i.e. distribute to N nodes • R: read, i.e. have R nodes agree • W: write, i.e. ack’d by W nodes • DW: durable write, i.e. persistently written by DW nodes • RW: read-write, i.e. persistently deleted by RW nodes. Buckets have defaults for R, W, DW and RW.
  64. The Quorum

    quorum = ([node count] / 2) + 1 • quorum = majority • “quorum” is a valid value for r, w, dw and rw
  65. Bucket properties http://127.0.0.1:8088/riak/meetup/ Buckets have default values Clients can override

    per request
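
The quorum formula and the "defaults, overridable per request" rule can be combined in a small sketch. It assumes integer division in the formula and that the count in question is the bucket's N value; the default values and function names are illustrative, not read from a real bucket.

```javascript
// quorum = (count / 2) + 1, using integer division (an assumption; the
// slide does not say how odd counts are rounded).
function quorum(count) {
  return Math.floor(count / 2) + 1;
}

// Example defaults for one bucket; the numbers are illustrative only.
const bucketDefaults = { n_val: 3, r: "quorum", w: "quorum", dw: "quorum", rw: "quorum" };

// Per-request parameters win over the bucket defaults; the symbolic value
// "quorum" is turned into a number based on n_val.
function effectiveSettings(defaults, requestParams) {
  const merged = Object.assign({}, defaults, requestParams);
  const settings = { n_val: merged.n_val };
  for (const name of ["r", "w", "dw", "rw"]) {
    settings[name] = merged[name] === "quorum" ? quorum(merged.n_val) : merged[name];
  }
  return settings;
}

console.log(effectiveSettings(bucketDefaults, {}));       // bucket defaults only
console.log(effectiveSettings(bucketDefaults, { r: 1 })); // r overridden for one request
```

With three copies the quorum is two, so a request that passes `r: 1` trades agreement for speed on that single read while writes keep the bucket default.
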
  66. Conflicts

  67. When worlds collide Conflicts

  68. I don’t care! allow_mult = false last_write_wins = true Bucket

    properties allow_mult = false -> siblings never returned last_write_wins = true
  69. I do care! allow_mult = true

  70. I do care!

    • Resolve conflicts in application logic • Conflicts exposed as siblings beneath a key • Response is HTTP 300 Multiple Choices • Served as multipart/mixed
  71. Example

        HTTP/1.1 300 Multiple Choices
        X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30G==
        Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Length: 368

        --ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Type: text/plain

        Tuesday
        --ZZ3eyjUllBi7GXRRMJsUublFxjn
        Content-Type: text/plain

        Thursday
        --ZZ3eyjUllBi7GXRRMJsUublFxjn--
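
Resolving conflicts in application logic starts with pulling the siblings out of such a response. The parser below is a minimal sketch for bodies shaped like the example; it assumes LF or CRLF line endings and one blank line between each part's headers and content, and `parseSiblings` is a made-up name.

```javascript
// Hypothetical parser for a multipart/mixed sibling response body.
function parseSiblings(body, boundary) {
  const siblings = [];
  for (const part of body.split("--" + boundary)) {
    const text = part.replace(/\r\n/g, "\n");
    // Skip the preamble before the first boundary and the closing "--" marker.
    if (text.trim() === "" || text.trim() === "--") continue;
    const separator = text.indexOf("\n\n");
    if (separator === -1) continue;
    siblings.push({
      headers: text.slice(0, separator).trim(),
      value: text.slice(separator + 2).trim(), // trimming is fine for this text example
    });
  }
  return siblings;
}

// Body rebuilt from the example response on the slide.
const boundary = "ZZ3eyjUllBi7GXRRMJsUublFxjn";
const body = [
  "--" + boundary,
  "Content-Type: text/plain",
  "",
  "Tuesday",
  "--" + boundary,
  "Content-Type: text/plain",
  "",
  "Thursday",
  "--" + boundary + "--",
  "",
].join("\n");

const siblings = parseSiblings(body, boundary);
console.log(siblings.map(function (s) { return s.value; }));
```

The application then picks or merges a value ("Tuesday" or "Thursday" here) and writes it back; that write would normally carry the `X-Riak-Vclock` value from the response so the store can tell which siblings it supersedes.
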
  72. What is Riak and what’s the agenda? Decentralized key-value store

    A database ideally suited for web applications A flexible map/reduce engine
  73. Map / Reduce • JavaScript or Erlang • Exposed in

    the HTTP API
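  74. Map / Reduce: count words

        // Map phase: emit one {word: 1} object per word in the stored value.
        function(v) {
          // match() returns null when the value contains no words
          var words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
          var counts = [];
          for (var i = 0; i < words.length; i++) {
            var count = {};
            count[words[i]] = 1;
            counts.push(count);
          }
          return counts;
        }
  75. Map / Reduce: count words

        // Reduce phase: sum the partial counts into a single object.
        function(values) {
          var result = {};
          for (var i = 0; i < values.length; i++) {
            for (var word in values[i]) {
              // hasOwnProperty avoids clashes with inherited names such as "constructor"
              if (Object.prototype.hasOwnProperty.call(result, word)) result[word] += values[i][word];
              else result[word] = values[i][word];
            }
          }
          return [result];
        }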
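
To see what the two word-count phases compute without a Riak node, they can be run locally. The sample objects below are made up and only mimic the `v.values[0].data` shape used on the slides; the small driver is a stand-in for the map/reduce engine.

```javascript
// Map phase: one {word: 1} object per word in the stored value.
function map(v) {
  const words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
  return words.map(function (word) {
    const count = {};
    count[word] = 1;
    return count;
  });
}

// Reduce phase: sum the partial counts into a single object.
function reduce(values) {
  const result = {};
  for (const partial of values) {
    for (const word of Object.keys(partial)) {
      const seen = Object.prototype.hasOwnProperty.call(result, word);
      result[word] = (seen ? result[word] : 0) + partial[word];
    }
  }
  return [result];
}

// Stand-ins for stored objects, shaped like the `v` argument on the slide.
const objects = [
  { values: [{ data: "Riak is a key-value store" }] },
  { values: [{ data: "A key has a value" }] },
];

// Tiny local driver: map every object, then reduce the combined output.
const mapped = objects.reduce(function (all, v) { return all.concat(map(v)); }, []);
const totals = reduce(mapped)[0];
console.log(totals);
```

The reduce function accepts its own output as input, so partial results from several nodes can be reduced again into one total.
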
  76. Map & Reduce: count words

        {"inputs": "bucket",
         "query": [
           {"map": {"language": "javascript",
                    "source": "function(v) { var words = v.values[0].data.toLowerCase().match(/\\w+/g) || []; var counts = []; for (var i = 0; i < words.length; i++) { var count = {}; count[words[i]] = 1; counts.push(count); } return counts; }"}},
           {"reduce": {"language": "javascript",
                       "source": "function(values) { var result = {}; for (var i = 0; i < values.length; i++) { for (var word in values[i]) { if (Object.prototype.hasOwnProperty.call(result, word)) result[word] += values[i][word]; else result[word] = values[i][word]; } } return [result]; }"}}]}

    Put this in your POST request and let Riak smoke it
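
Hand-writing the job JSON is error-prone because the function source has to be escaped inside a string (the backslash in `\w` is a typical trap). A possible way around it, sketched here, is to keep the functions as ordinary JavaScript and let `JSON.stringify` do the escaping. `buildJob` is a made-up helper name.

```javascript
// The word-count functions, written as ordinary function expressions.
const mapFn = function (v) {
  var words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
  var counts = [];
  for (var i = 0; i < words.length; i++) {
    var count = {};
    count[words[i]] = 1;
    counts.push(count);
  }
  return counts;
};

const reduceFn = function (values) {
  var result = {};
  for (var i = 0; i < values.length; i++) {
    for (var word in values[i]) {
      if (Object.prototype.hasOwnProperty.call(result, word)) result[word] += values[i][word];
      else result[word] = values[i][word];
    }
  }
  return [result];
};

// Hypothetical helper: assemble the job description for one bucket.
// Function.prototype.toString() returns the source text of each function.
function buildJob(bucket, mapFunction, reduceFunction) {
  return {
    inputs: bucket,
    query: [
      { map: { language: "javascript", source: mapFunction.toString() } },
      { reduce: { language: "javascript", source: reduceFunction.toString() } },
    ],
  };
}

const payload = JSON.stringify(buildJob("bucket", mapFn, reduceFn));
console.log(payload);
```

The resulting string is what goes into the body of the POST request, sent with a JSON content type to the node's map/reduce resource (`/mapred` in Riak's HTTP interface, as far as I know).
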
  79. Map / Reduce Demo

  80. Links • Non-enforced • Traversable Link walking uses M/R behind

    the scenes
  81. Links Demo Link walking uses M/R behind the scenes
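
The slides do not show what a link looks like on the wire. The sketch below builds a `Link` header value and a link-walking URL in the shape Riak's HTTP interface of that era used, as far as I know; treat both formats, the tag name `next` and the helper names as assumptions to verify against the Riak documentation.

```javascript
// Hypothetical helper: value for a Link header pointing at another entry.
function linkHeader(bucket, key, tag) {
  return "</riak/" + bucket + "/" + key + '>; riaktag="' + tag + '"';
}

// Hypothetical helper: URL that walks links starting from one entry.
// Each step is "bucket,tag,keep"; "_" means "match anything".
function walkUrl(bucket, key, steps) {
  const path = steps.map(function (step) {
    const keep = step.keep === undefined ? "_" : (step.keep ? "1" : "0");
    return [step.bucket || "_", step.tag || "_", keep].join(",");
  });
  return "/riak/" + bucket + "/" + key + "/" + path.join("/");
}

console.log(linkHeader("meetup", "bar.html", "next"));
console.log(walkUrl("meetup", "foo.html", [{ bucket: "meetup", tag: "next" }]));
```

Links are not enforced, so nothing stops a link from pointing at a key that does not exist; walking simply follows whatever matches.
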

  82. The whole enchilada

    Components: Erlang / OTP • Riak Key/Value Store • Riak Core • Riak Search • HTTP API • Luwak. Cluster mechanics: partitioning (consistent hashing, hinted handoff) • membership management (leave/join) • work distribution • cluster state (gossip protocol). Storage back-ends: Bitcask • InnoDB • DETS • ETS • Balanced trees • File system • LRU
  83. Try it • downloads.basho.com • brew install riak • Web admin @ github.com/gmaurice/Riaktive

  84. Resources • Riak Fast Track @ wiki.basho.com • #riak @ freenode • github.com/basho/

  85. Thanks for listening • Mårten Gustafson • @martengustafson • http://marten.gustafson.pp.se/ • marten.gustafson@gmail.com