Riak Overview

Riak is a key-value datastore. I hosted the first Stockholm Riak Meetup and started off by presenting an overview of Riak.

Approx 45 minutes.

Mårten Gustafson

March 31, 2011

Transcript

  1. Overview
    Mårten Gustafson
    Stockholm Riak Meetup #1
    2011-03-31
    1 -> Don’t forget to start the demo instance
    2 -> Reset Chrome and move to proper space

  2. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine

  3. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine

  4. Buckets
    Photo: http://www.flickr.com/photos/linneberg/4481309196/
    Buckets contain things
    Things are keys, that have values
    A bucket is like a namespace (and it’s cheap - could even be free)

  5. Operations
    Riak Key/Value Store
    Bucket Bucket Bucket
    Item
    Item
    Item
    Item
    Item
    Item
    Item Item
    HTTP Client
    PUT /bucket/key
    GET /bucket/key
    DELETE /bucket/key
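
The three operations above can be sketched as plain request descriptions. This is a minimal illustration, not a client library: `entryUrl` and `operations` are hypothetical helper names, nothing is sent over the network, and the base URL is the demo address used elsewhere in this deck.

```javascript
// Hypothetical helper: builds the URL of an entry from a base URL, bucket and key.
function entryUrl(base, bucket, key) {
  return base + "/" + encodeURIComponent(bucket) + "/" + encodeURIComponent(key);
}

// Describes store/fetch/remove as plain objects; no request is actually made.
function operations(base, bucket, key, value, contentType) {
  const url = entryUrl(base, bucket, key);
  return {
    store: { method: "PUT", url: url, headers: { "Content-Type": contentType }, body: value },
    fetch: { method: "GET", url: url },
    remove: { method: "DELETE", url: url },
  };
}

const ops = operations("http://127.0.0.1:8088/riak", "meetup", "foo.html", "<h1>foo</h1>", "text/html");
console.log(ops.store.method, ops.store.url); // PUT http://127.0.0.1:8088/riak/meetup/foo.html
```

Any HTTP client could then execute these descriptions; the point is only that bucket and key together form the path.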

  6. An entry
    • lives in a bucket
    • has a key
    • has a value

  7. An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket
    arbitrary name

  8. An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket
    key
    arbitrary name
    arbitrary name

  9. An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket
    key
    arbitrary name
    arbitrary name
    forms the path
    to the value

  10. An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket
    key
    value
    arbitrary name
    arbitrary name
    a binary blob
    and mime type
    forms the path
    to the value

  11. An entry
    • lives in a bucket
    • has a key
    • has a value
    bucket
    key
    value
    arbitrary name
    arbitrary name
    forms the path
    to the value
    a binary blob
    and mime type
    = Store anything, yay!
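
As a rough mental model (a toy in-memory structure, not how Riak stores data), an entry can be pictured as a binary value plus a MIME type, addressed by bucket and key. The `ToyStore` class below is hypothetical and exists only for illustration.

```javascript
// Toy model: bucket name -> key -> { contentType, data }.
class ToyStore {
  constructor() {
    this.buckets = new Map();
  }
  put(bucket, key, data, contentType) {
    // Buckets are cheap: one is created the first time it is written to.
    if (!this.buckets.has(bucket)) this.buckets.set(bucket, new Map());
    // The value is kept as an opaque binary blob next to its MIME type.
    this.buckets.get(bucket).set(key, { contentType: contentType, data: Buffer.from(data) });
  }
  get(bucket, key) {
    const entries = this.buckets.get(bucket);
    return entries ? entries.get(key) : undefined;
  }
}

const store = new ToyStore();
store.put("meetup", "foo.html", "<h1>foo</h1>", "text/html");
store.put("meetup", "logo.png", [0x89, 0x50, 0x4e, 0x47], "image/png");
console.log(store.get("meetup", "foo.html").contentType); // text/html
```

Because the value is just bytes with a content type, HTML and image data sit side by side in the same bucket.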

  12. Store anything
    bucket meetup

  13. Store anything


    foo


    key foo.html mime text/html
    bucket meetup

  14. Store anything


    foo



    bar!

    key foo.html mime text/html key bar.html
    mime text/html
    bucket meetup

  15. Store anything
    http://127.0.0.1:8088/riak/meetup/foo.html


    foo



    bar!

    key foo.html mime text/html key bar.html
    mime text/html
    bucket meetup

  16. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine

  17. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine
    Database -> Persistence -> Durability -> Storage backends

  18. Storage
    Riak Key/Value Store
    Pluggable back-ends, per bucket
    Bitcask keeps the key set in memory (i.e., in some rare cases it might not fit; then look to InnoDB)
    ETS - built-in Erlang storage; DETS is ETS on disk

  19. Storage
    Riak Key/Value Store
    Bitcask InnoDB DETS ETS
    Balanced trees
    File system LRU
    Pluggable back-ends, per bucket
    Bitcask keeps the key set in memory (i.e., in some rare cases it might not fit; then look to InnoDB)
    ETS - built-in Erlang storage; DETS is ETS on disk

  20. Storage
    Riak Key/Value Store
    Bitcask InnoDB DETS ETS
    Balanced trees
    File system LRU
    ram based,
    not durable
    disk based,
    durable
    Pluggable back-ends, per bucket
    Bitcask keeps the key set in memory (i.e., in some rare cases it might not fit; then look to InnoDB)
    ETS - built-in Erlang storage; DETS is ETS on disk

  21. Storage
    Riak Key/Value Store
    Bitcask InnoDB DETS ETS
    Balanced trees
    File system LRU
    ram based,
    not durable
    disk based,
    durable
    default
    common
    Pluggable back-ends, per bucket
    Bitcask keeps the key set in memory (i.e., in some rare cases it might not fit; then look to InnoDB)
    ETS - built-in Erlang storage; DETS is ETS on disk

  22. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine
    Pluggable back-ends, per bucket

  23. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine
    Decentralized -> Cluster
    - No master node
    - No single point of failure

  24. The Ring
    The “storage space” is partitioned into a ring
    Keys and their values have a primary partition

  25. The Ring
    A ring size of 1024 should accommodate most needs
    Once you’ve set your ring size, it’s fixed
    The only way to change it is to back up and restore your entire cluster

  26. The Ring
    ring size = 12
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    A ring size of 1024 should accommodate most needs
    Once you’ve set your ring size, it’s fixed
    The only way to change it is to back up and restore your entire cluster

  27. Consistent Hashing
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    Always maps key “x” to partition “y”

  28. Consistent Hashing
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    Always maps key “x” to partition “y”
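
A minimal sketch of the idea, assuming SHA-1 over the string "bucket/key" and a 160-bit hash space split into equally sized partitions. Riak's exact hash input encoding differs, and `partitionFor` is a hypothetical name; partitions here are numbered from 0 rather than from 1 as on the slide.

```javascript
const crypto = require("crypto");

// Maps a bucket/key pair to one of `ringSize` partitions (0-based).
// Assumption: SHA-1 over "bucket/key"; the real hash input in Riak is encoded differently.
function partitionFor(bucket, key, ringSize) {
  const hex = crypto.createHash("sha1").update(bucket + "/" + key).digest("hex");
  const hash = BigInt("0x" + hex); // 160-bit integer
  // Scale the hash from [0, 2^160) down to [0, ringSize).
  return Number((hash * BigInt(ringSize)) >> 160n);
}

// The same key always lands on the same partition, no matter which node computes it.
console.log(partitionFor("meetup", "foo.html", 12) === partitionFor("meetup", "foo.html", 12)); // true
```

Since the mapping depends only on the key and the ring size, any node can work out where a key lives without asking a master.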

  29. Read
    “I want [item]”
    requires us to know:
    where is [item] on the ring?

  30. Read
    “I want [item]”
    requires us to know:
    where is [item] on the ring?

  31. Cluster
    One Ring size to rule them all, One Ring size to
    find them, One Ring size to bring them all and in
    the cluster bind them...

  32. Cluster
    node A
    node B
    node C
    ring size = 12
    instances = 3
    ring size / nodes ≈ slices per instance

  33. Cluster
    node A
    node B
    node C
    ring size = 12
    instances = 3
    ring size / nodes ≈ slices per instance
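
The arithmetic above can be illustrated with a toy assignment of partitions to nodes. Round-robin claiming is an assumption made for this sketch (the deck does not describe the actual claim strategy), and both function names are hypothetical.

```javascript
// Assigns partitions 0..ringSize-1 to nodes round-robin (assumed strategy).
function claimPartitions(ringSize, nodes) {
  const owners = [];
  for (let p = 0; p < ringSize; p++) owners.push(nodes[p % nodes.length]);
  return owners;
}

// Counts how many slices of the ring each node ended up with.
function slicesPerNode(owners) {
  const counts = {};
  for (const node of owners) counts[node] = (counts[node] || 0) + 1;
  return counts;
}

const owners = claimPartitions(12, ["A", "B", "C"]);
console.log(slicesPerNode(owners)); // { A: 4, B: 4, C: 4 }
console.log(owners[11]);            // C
```

With a ring size that is not a multiple of the node count, some nodes simply own one slice more than others, hence the "approximately".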

  34. Cluster - Read
    node A node B node C

  35. Cluster - Read
    node A node B node C
    I can haz [item]?
    Hm, [item] hashes to a slice of the ring owned by node C.

  36. Cluster - Read
    node A node B node C
    I can haz [item]?
    Hey C! I need [item]
    Okidoki, now where’s he... ah yeah, in my fourth slice

  37. Cluster - Read
    node A node B node C
    I can haz ?

  38. So what about...

  39. ...network partitions?
    node A node B node C

  40. ...network partitions?
    node A node B node C
    X

  41. ...failed nodes?
    node A node B node C

  42. ...failed nodes?
    node A node B

  43. ...concurrent writes?
    node A node B node C
    client 1 client 2

  44. ...concurrent writes?
    node A node B node C
    client 1 client 2
    hey, A!
    save for me
    ?
    hey, C!
    save for me

  45. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine

  46. Livin’ on the web
    • concurrent users
    • unreliable networks
    • node failures

  47. The N to the R to the W to the DW and the RW
    Buckets have defaults for R, W, DW and RW

  48. Node A Node B Node C
    N_VAL
    Bucket
    Client
    Number of replicas, set per bucket

  49. Node A Node B Node C
    N_VAL
    Bucket
    1
    Client
    Number of replicas, set per bucket

  50. Node A Node B Node C
    N_VAL
    Bucket
    2
    Client
    Number of replicas, set per bucket

  51. Node A Node B Node C
    N_VAL
    Bucket
    3
    Client
    Number of replicas, set per bucket
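
One way to picture N_VAL is as a walk around the ring: the value goes to its primary partition and to the partitions that follow it. The sketch below assumes replicas land on consecutive partitions and uses 0-based partition indexes; `replicaPartitions` is a hypothetical helper.

```javascript
// Returns the partitions holding the N replicas: the primary plus the next
// N-1 partitions around the ring, wrapping at ringSize (0-based indexes).
function replicaPartitions(primary, nVal, ringSize) {
  const result = [];
  for (let i = 0; i < nVal; i++) result.push((primary + i) % ringSize);
  return result;
}

console.log(replicaPartitions(10, 3, 12)); // [ 10, 11, 0 ]
```

If neighbouring partitions are owned by different nodes, the N copies end up on N different machines.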

  52. Node A Node B Node C
    R
    Bucket
    GET /bucket/key?R=1
    Client
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed

  53. Node A Node B Node C
    R
    Bucket
    GET /bucket/key?R=1
    Client
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed

  54. Node A Node B Node C
    R
    Bucket
    Client GET /bucket/key?R=2
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed

  55. Node A Node B Node C
    R
    Bucket
    Client GET /bucket/key?R=2
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed

  56. Node A Node B Node C
    R
    Bucket
    Agree?
    Client GET /bucket/key?R=2
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed

  57. Node A Node B Node C
    R
    Bucket
    Client GET /bucket/key?R=2
    R = Read
    Sends the request to all nodes and answers the client once R nodes have agreed
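
The read behaviour can be simulated in a few lines. This is a simplified sketch: replies are compared by plain equality, whereas a real cluster compares object versions, and `quorumRead` is a hypothetical name.

```javascript
// Simulates a coordinator that receives replica replies in arrival order
// and answers as soon as R replicas have reported the same value.
function quorumRead(replies, r) {
  const votes = new Map();
  for (const value of replies) {
    const count = (votes.get(value) || 0) + 1;
    votes.set(value, count);
    if (count >= r) return { ok: true, value: value };
  }
  return { ok: false }; // fewer than R replicas agreed
}

console.log(quorumRead(["Tuesday", "Thursday", "Tuesday"], 2)); // { ok: true, value: 'Tuesday' }
console.log(quorumRead(["Tuesday", "Thursday"], 2));            // { ok: false }
console.log(quorumRead(["Thursday"], 1));                       // { ok: true, value: 'Thursday' }
```

With R=1 the first reply wins, which is fast but may be stale; a higher R trades latency for agreement.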

  58. The N to the R to the W to the DW and the RW
    Buckets have defaults for R, W, DW and RW

  59. The N to the R to the W to the DW and the RW
    Number of copies
    ie. distribute to N nodes
    Buckets have defaults for R, W, DW and RW

  60. The N to the R to the W to the DW and the RW
    Number of copies
    ie. distribute to N nodes
    Read
    ie. have R nodes agree
    Buckets have defaults for R, W, DW and RW

  61. The N to the R to the W to the DW and the RW
    Number of copies
    ie. distribute to N nodes
    Read
    ie. have R nodes agree
    Write
    ie. ack’d by
    W nodes
    Buckets have defaults for R, W, DW and RW

  62. The N to the R to the W to the DW and the RW
    Durable write
    ie. persistently written
    by DW nodes
    Number of copies
    ie. distribute to N nodes
    Read
    ie. have R nodes agree
    Write
    ie. ack’d by
    W nodes
    Buckets have defaults for R, W, DW and RW

  63. The N to the R to the W to the DW and the RW
    Read-write
    ie. persistently deleted
    by RW nodes
    Durable write
    ie. persistently written
    by DW nodes
    Number of copies
    ie. distribute to N nodes
    Read
    ie. have R nodes agree
    Write
    ie. ack’d by
    W nodes
    Buckets have defaults for R, W, DW and RW

  64. The Quorum
    floor(N / 2) + 1, where N is the number of replicas (n_val)
    quorum = majority
    quorum = valid value for r, w, dw, rw
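
As a worked example of the majority rule, assuming integer division before adding one (`quorum` is a hypothetical helper):

```javascript
// Majority quorum for a replica count n: integer-divide by two, then add one.
function quorum(n) {
  return Math.floor(n / 2) + 1;
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log("n=" + n + " quorum=" + quorum(n));
}
// n=1 quorum=1
// n=2 quorum=2
// n=3 quorum=2
// n=4 quorum=3
// n=5 quorum=3
```

With three replicas a quorum is two, so one replica can be unreachable and a quorum read or write still succeeds.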

  65. Bucket properties
    http://127.0.0.1:8088/riak/meetup/
    Buckets have default values
    Clients can override per request
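
The relationship between bucket defaults and per-request overrides can be sketched as a simple merge. The default values below are made up for the example, and `effectiveParams` is a hypothetical helper rather than part of any client.

```javascript
// Per-request parameters win over the bucket's defaults; defaults are left untouched.
function effectiveParams(bucketDefaults, requestParams) {
  return Object.assign({}, bucketDefaults, requestParams);
}

const bucketDefaults = { r: 2, w: 2, dw: 2, rw: 2 }; // assumed example defaults
const forThisRead = effectiveParams(bucketDefaults, { r: 1 });
console.log(forThisRead); // { r: 1, w: 2, dw: 2, rw: 2 }
```

A latency-sensitive read could thus ask for r=1 while every other request keeps using the bucket's values.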

  66. Conflicts

  67. When worlds collide
    Conflicts

  68. I don’t care!
    allow_mult = false
    last_write_wins = true
    Bucket properties
    allow_mult = false -> siblings never returned
    last_write_wins = true

  69. I do care!
    allow_mult = true

  70. I do care!
    • Resolve conflicts in application logic
    • Conflicts exposed as siblings beneath a key
    • Response is HTTP 300 Multiple Choices
    • Served as multipart/mixed
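
What "resolve conflicts in application logic" might look like depends entirely on the data. The sketch below assumes a made-up data model in which every sibling is a JSON-encoded list of items and the sensible merge is their union; `resolveSiblings` is a hypothetical function, not part of Riak.

```javascript
// Hypothetical resolver: each sibling body is a JSON array of items;
// the resolved value is the union of all siblings, sorted for stable output.
function resolveSiblings(siblings) {
  const merged = new Set();
  for (const body of siblings) {
    for (const item of JSON.parse(body)) merged.add(item);
  }
  return JSON.stringify(Array.from(merged).sort());
}

console.log(resolveSiblings(['["milk","eggs"]', '["eggs","bread"]'])); // ["bread","eggs","milk"]
```

Other data calls for other strategies (pick the newest, ask the user, merge field by field); the store only hands over the siblings.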

  71. Example
    HTTP/1.1 300 Multiple Choices
    X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30G==
    Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
    Content-Length: 368
    --ZZ3eyjUllBi7GXRRMJsUublFxjn
    Content-Type: text/plain
    Tuesday
    --ZZ3eyjUllBi7GXRRMJsUublFxjn
    Content-Type: text/plain
    Thursday
    --ZZ3eyjUllBi7GXRRMJsUublFxjn--

  72. What is Riak
    and what’s the agenda?
    Decentralized key-value store
    A database ideally suited for web applications
    A flexible map/reduce engine

  73. Map / Reduce
    • Javascript or Erlang
    • Exposed in the HTTP API

  74. Map / Reduce
    count words
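    function(v) {
      // Split the stored value into lower-case words (regex literal with the global flag).
      var words = v.values[0].data.toLowerCase().match(/\w*/g);
      var counts = [];
      for (var word in words)
        if (words[word] != '') {
          // Emit one {word: 1} object per non-empty match.
          var count = {};
          count[words[word]] = 1;
          counts.push(count);
        }
      return counts;
    }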

  75. Map / Reduce
    count words
    function(values) {
      var result = {};
      for (var value in values) {
        for (var word in values[value]) {
          if (word in result)
            result[word] += values[value][word];
          else
            result[word] = values[value][word];
        }
      }
      return [result];
    }
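
To see how the map and reduce phases fit together, both can be run locally on fake input. This is a sketch outside Riak: the sample objects only mimic the shape of the `v` argument used on the slides, the function names are made up, and the regex uses `\w+` so no empty matches need filtering.

```javascript
// Map phase: one {word: 1} object per word in the stored value.
function mapWords(v) {
  var words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
  return words.map(function (word) {
    var count = {};
    count[word] = 1;
    return count;
  });
}

// Reduce phase: sum the per-word counts into a single object.
function reduceCounts(values) {
  var result = {};
  values.forEach(function (counts) {
    for (var word in counts) {
      result[word] = (result[word] || 0) + counts[word];
    }
  });
  return [result];
}

// Fake stored objects shaped like the `v` argument on the slides.
var objects = [
  { values: [{ data: "Riak is a key-value store" }] },
  { values: [{ data: "A store for web applications" }] },
];

// Flatten the per-object map output, then reduce it in one go.
var mapped = [].concat.apply([], objects.map(mapWords));
var totals = reduceCounts(mapped)[0];
console.log(totals.store, totals.a, totals.riak); // 2 2 1
```

Because the reduce output has the same shape as its input, it can be fed back into `reduceCounts` together with further map output without changing the totals.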

  76. Map & Reduce
    count words
    {"inputs":"bucket",
     "query":[
      {"map":{"language":"javascript",
              "source":"function(v) { var words = v.values[0].data.toLowerCase().match(/\\w*/g); var counts = []; for(var word in words) if (words[word] != '') { var count = {}; count[words[word]] = 1; counts.push(count); } return counts; }"}},
      {"reduce":{"language":"javascript",
                 "source":"function(values) { var result = {}; for (var value in values) { for(var word in values[value]) { if (word in result) result[word] += values[value][word]; else result[word] = values[value][word]; } } return [result]; }"}}]}
    Put this in your POST request and let Riak smoke it
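
Hand-writing that JSON is error-prone (the backslash in the regex has to be escaped, and the function source must stay on one line), so the request body can instead be generated. A minimal sketch, assuming the job format shown above; `buildJob` and the two small example functions are hypothetical.

```javascript
// Builds the MapReduce job document from function objects.
// JSON.stringify takes care of escaping quotes and backslashes in the source.
function buildJob(bucket, mapFn, reduceFn) {
  return JSON.stringify({
    inputs: bucket,
    query: [
      { map: { language: "javascript", source: mapFn.toString() } },
      { reduce: { language: "javascript", source: reduceFn.toString() } },
    ],
  });
}

// Example phases: count the words in each value, then sum the counts.
const countWords = function (v) {
  var words = v.values[0].data.match(/\w+/g) || [];
  return [words.length];
};
const sum = function (values) {
  return [values.reduce(function (a, b) { return a + b; }, 0)];
};

const job = buildJob("bucket", countWords, sum);
console.log(JSON.parse(job).query.length); // 2
```

The resulting string is what goes into the POST body, sent with a JSON content type.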

  77. Map & Reduce
    count words
    {"inputs":"bucket", "query":[{"map":{"language":"javascript",
    "source":"function(v) { var words = v.values[0].data.toLowerCase().match(/
    \w*/g); var counts = []; for(var word in words) if (words[word] != '')
    { var count = {}; count[words[word]] = 1; counts.push(count); } return
    counts; }"}},{"reduce":{"language":"javascript", "source":"function(values)
    { var result = {}; for (var value in values) { for(var word in values
    [value]) { if (word in result) result[word] += values[value][word]; else
    result[word] = values[value][word]; } } return [result]; }"}}]}
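    function(v) {
      // Split the stored value into lower-case words (regex literal with the global flag).
      var words = v.values[0].data.toLowerCase().match(/\w*/g);
      var counts = [];
      for (var word in words)
        if (words[word] != '') {
          // Emit one {word: 1} object per non-empty match.
          var count = {};
          count[words[word]] = 1;
          counts.push(count);
        }
      return counts;
    }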

  78. Map & Reduce
    count words
    {"inputs":"bucket", "query":[{"map":{"language":"javascript",
    "source":"function(v) { var words = v.values[0].data.toLowerCase().match(/
    \w*/g); var counts = []; for(var word in words) if (words[word] != '')
    { var count = {}; count[words[word]] = 1; counts.push(count); } return
    counts; }"}},{"reduce":{"language":"javascript", "source":"function(values)
    { var result = {}; for (var value in values) { for(var word in values
    [value]) { if (word in result) result[word] += values[value][word]; else
    result[word] = values[value][word]; } } return [result]; }"}}]}
    function(values) {
      var result = {};
      for (var value in values) {
        for (var word in values[value]) {
          if (word in result)
            result[word] += values[value][word];
          else
            result[word] = values[value][word];
        }
      }
      return [result];
    }

  79. Map / Reduce
    Demo

  80. Links
    • Non-enforced
    • Traversable
    Link walking uses M/R behind the scenes
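
A sketch of how links and link walking might be expressed over HTTP. The header and path shapes below follow my recollection of Riak's HTTP interface at the time (a `riaktag` parameter on the Link header, and walk steps written as bucket,tag,keep with `_` as a wildcard); treat them as assumptions and check the Riak documentation before relying on them. Both helper names are hypothetical.

```javascript
// Formats a Link header value pointing at another entry, labelled with a tag.
function linkHeader(bucket, key, tag) {
  return "</riak/" + bucket + "/" + key + '>; riaktag="' + tag + '"';
}

// Builds a link-walking path. Each step is "bucket,tag,keep"; "_" matches anything.
function walkPath(bucket, key, steps) {
  const parts = steps.map(function (s) {
    const keep = s.keep === undefined ? "_" : (s.keep ? "1" : "0");
    return [s.bucket || "_", s.tag || "_", keep].join(",");
  });
  return "/riak/" + bucket + "/" + key + "/" + parts.join("/");
}

console.log(linkHeader("meetup", "bar.html", "next"));          // </riak/meetup/bar.html>; riaktag="next"
console.log(walkPath("meetup", "foo.html", [{ tag: "next" }])); // /riak/meetup/foo.html/_,next,_
```

Nothing checks that the target of a link exists, which is what "non-enforced" refers to.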

  81. Links
    Demo
    Link walking uses M/R behind the scenes

  82. The whole enchilada
    Erlang / OTP
    Riak Key/Value Store
    Riak Core
    Riak Search
    HTTP API Luwak
    Partitioning
    (consistent hashing, hinted handoff)
    Membership management
    leave/join
    Work distribution
    Cluster state
    gossip protocol
    Bitcask InnoDB DETS ETS
    Balanced trees
    File system LRU

  83. Try it
    downloads.basho.com
    brew install riak
    Web admin @ github.com/gmaurice/Riaktive

  84. Resources
    Riak Fast Track @ wiki.basho.com
    #riak @ freenode
    github.com/basho/

  85. Thanks for listening
    Mårten Gustafson
    @martengustafson
    http://marten.gustafson.pp.se/
