Slide 1

Slide 1 text

Overview
Mårten Gustafson
Stockholm Riak Meetup #1
2011-03-31
1 -> Don’t forget to start the demo instance
2 -> Reset Chrome and move to proper space

Slide 2

Slide 2 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine

Slide 4

Slide 4 text

Buckets
Buckets contain things
Things are keys that have values
A bucket is like a namespace (and it’s cheap - could even be free)
Photo: http://www.flickr.com/photos/linneberg/4481309196/

Slide 5

Slide 5 text

Operations
Diagram: a Riak Key/Value Store holding three buckets, each containing items. An HTTP client talks to it with:
PUT /bucket/key
GET /bucket/key
DELETE /bucket/key

Slide 6

Slide 6 text

An entry • lives in a bucket • has a key • has a value

Slide 7

Slide 7 text

An entry
• lives in a bucket
• has a key
• has a value
bucket: arbitrary name

Slide 8

Slide 8 text

An entry
• lives in a bucket
• has a key
• has a value
bucket: arbitrary name
key: arbitrary name

Slide 9

Slide 9 text

An entry
• lives in a bucket
• has a key
• has a value
bucket: arbitrary name
key: arbitrary name
bucket + key: forms the path to the value

Slide 10

Slide 10 text

An entry
• lives in a bucket
• has a key
• has a value
bucket: arbitrary name
key: arbitrary name
bucket + key: forms the path to the value
value: a binary blob and MIME type

Slide 11

Slide 11 text

An entry
• lives in a bucket
• has a key
• has a value
bucket: arbitrary name
key: arbitrary name
bucket + key: forms the path to the value
value: a binary blob and MIME type
= Store anything, yay!

Slide 12

Slide 12 text

Store anything bucket meetup

Slide 13

Slide 13 text

Store anything

bucket: meetup
key: foo.html, mime: text/html, rendered content: foo

Slide 14

Slide 14 text

Store anything

bucket: meetup
key: foo.html, mime: text/html, rendered content: foo
key: bar.html, mime: text/html, rendered content: bar!

Slide 15

Slide 15 text

Store anything
http://127.0.0.1:8088/riak/meetup/foo.html

bucket: meetup
key: foo.html, mime: text/html, rendered content: foo
key: bar.html, mime: text/html, rendered content: bar!
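A minimal sketch of how bucket and key form the path to a value, following the URL shape shown on this slide. The helper names `entryUrl` and `storeRequest` are made up for illustration, and nothing is sent over the network; the function only describes the request an HTTP client would make.

```javascript
// Build the URL for an entry: <base>/riak/<bucket>/<key>, as in
// http://127.0.0.1:8088/riak/meetup/foo.html from the slide.
function entryUrl(base, bucket, key) {
  return base + "/riak/" + encodeURIComponent(bucket) + "/" + encodeURIComponent(key);
}

// Describe (but do not send) the PUT that would store a value with its MIME type.
function storeRequest(base, bucket, key, value, mime) {
  return {
    method: "PUT",
    url: entryUrl(base, bucket, key),
    headers: { "Content-Type": mime },
    body: value
  };
}

var fooRequest = storeRequest("http://127.0.0.1:8088", "meetup", "foo.html", "<h1>foo</h1>", "text/html");
```

The `<h1>foo</h1>` body is an assumed example value; the slide only shows the rendered text.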

Slide 16

Slide 16 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine

Slide 17

Slide 17 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine Database -> Persistence -> Durability -> Storage backends

Slide 18

Slide 18 text

Storage
Riak Key/Value Store
Pluggable back-ends, per bucket.
Bitcask keeps the key set in memory; in the rare cases where the key set does not fit, look to InnoDB.
ETS is Erlang's built-in storage; DETS is ETS on disk.

Slide 19

Slide 19 text

Storage
Riak Key/Value Store
Back-ends: Bitcask, InnoDB, DETS, ETS, Balanced trees, File system, LRU
Pluggable back-ends, per bucket.
Bitcask keeps the key set in memory; in the rare cases where the key set does not fit, look to InnoDB.
ETS is Erlang's built-in storage; DETS is ETS on disk.

Slide 20

Slide 20 text

Storage
Riak Key/Value Store
RAM based, not durable: ETS, Balanced trees, LRU
Disk based, durable: Bitcask, InnoDB, DETS, File system
Pluggable back-ends, per bucket.
Bitcask keeps the key set in memory; in the rare cases where the key set does not fit, look to InnoDB.
ETS is Erlang's built-in storage; DETS is ETS on disk.

Slide 21

Slide 21 text

Storage
Riak Key/Value Store
RAM based, not durable: ETS, Balanced trees, LRU
Disk based, durable: Bitcask (default), InnoDB (common), DETS, File system
Pluggable back-ends, per bucket.
Bitcask keeps the key set in memory; in the rare cases where the key set does not fit, look to InnoDB.
ETS is Erlang's built-in storage; DETS is ETS on disk.

Slide 22

Slide 22 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine Pluggable back-ends, per bucket

Slide 23

Slide 23 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine Decentralized -> Cluster - No master node - No single point of failure

Slide 24

Slide 24 text

The Ring
The “storage space” is partitioned into a ring
Keys and their values have a primary partition

Slide 25

Slide 25 text

The Ring
A ring size of 1024 should accommodate most needs
Once you’ve set your ring size, it’s fixed
The only way to change it is to back up and restore your entire cluster

Slide 26

Slide 26 text

The Ring
ring size = 12 (partitions 1 to 12)
A ring size of 1024 should accommodate most needs
Once you’ve set your ring size, it’s fixed
The only way to change it is to back up and restore your entire cluster

Slide 27

Slide 27 text

Consistent Hashing 1 2 3 4 5 6 7 8 9 10 11 12 Always maps key “x” to partition “y”
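A rough sketch of "always maps key x to partition y" for the 12-partition ring on this slide. It assumes Node.js, hashes with SHA-1, joins bucket and key with a slash, and uses only the first 32 bits of the digest; those choices and the name `partitionFor` are assumptions for illustration, not a description of Riak's internals.

```javascript
const crypto = require("crypto");

// Map a bucket/key pair to a partition number in 1..ringSize.
// The same input always produces the same partition.
function partitionFor(bucket, key, ringSize) {
  const digest = crypto.createHash("sha1").update(bucket + "/" + key).digest();
  const position = digest.readUInt32BE(0);        // a point in 0 .. 2^32 - 1
  const fraction = position / 0x100000000;        // scaled to 0 <= fraction < 1
  return Math.floor(fraction * ringSize) + 1;     // equal-sized slices of the ring
}

var fooPartition = partitionFor("meetup", "foo.html", 12);
```

Because the partition depends only on the key and the ring size, any node can work out where a key lives without asking a master.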

Slide 29

Slide 29 text

Read
“I want [item]” requires us to know: where is [item] on the ring?

Slide 31

Slide 31 text

Cluster One Ring size to rule them all, One Ring size to find them, One Ring size to bring them all and in the cluster bind them...

Slide 32

Slide 32 text

Cluster
node A, node B, node C
ring size = 12
instances = 3
ring size / instances ≈ slices per instance (here 12 / 3 = 4)
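The arithmetic on this slide can be sketched by dealing partitions out to nodes. Round-robin assignment is an assumption made for illustration (the slide only gives the ratio), and `assignPartitions` is a made-up helper name.

```javascript
// Deal partitions 1..ringSize out to the nodes in turn (assumed round-robin).
function assignPartitions(ringSize, nodes) {
  const owners = {};
  nodes.forEach(function (node) { owners[node] = []; });
  for (let partition = 1; partition <= ringSize; partition++) {
    const node = nodes[(partition - 1) % nodes.length];
    owners[node].push(partition);
  }
  return owners;
}

var ring = assignPartitions(12, ["node A", "node B", "node C"]);
// Each of the 3 nodes ends up owning 12 / 3 = 4 slices.
```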

Slide 34

Slide 34 text

Cluster - Read node A node B node C

Slide 35

Slide 35 text

Cluster - Read
node A, node B, node C
Client: “I can haz [item]?”
Receiving node: “Hm, [item] hashes to a slice of the ring owned by node C.”

Slide 36

Slide 36 text

Cluster - Read
node A, node B, node C
Client: “I can haz [item]?”
Receiving node: “Hey C! I need [item]”
Node C: “Okidoki, now where’s he... ah yeah, in my fourth slice”

Slide 37

Slide 37 text

Cluster - Read node A node B node C I can haz ?

Slide 38

Slide 38 text

So what about...

Slide 39

Slide 39 text

...network partitions? node A node B node C

Slide 40

Slide 40 text

...network partitions? node A node B node C X

Slide 41

Slide 41 text

...failed nodes? node A node B node C

Slide 42

Slide 42 text

...failed nodes? node A node B

Slide 43

Slide 43 text

...concurrent writes? node A node B node C client 1 client 2

Slide 44

Slide 44 text

...concurrent writes?
node A, node B, node C; client 1, client 2
“Hey, A! Save [item] for me”
“Hey, C! Save [item] for me”
?

Slide 45

Slide 45 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine

Slide 46

Slide 46 text

Livin’ on the web • concurrent users • unreliable networks • node failures

Slide 47

Slide 47 text

The N to the R to the W to the DW and the RW Buckets have defaults for R, W, DW and RW

Slide 48

Slide 48 text

Node A Node B Node C N_VAL Bucket Client Number of replicas, set per bucket

Slide 49

Slide 49 text

Node A Node B Node C N_VAL Bucket 1 Client Number of replicas, set per bucket

Slide 50

Slide 50 text

Node A Node B Node C N_VAL Bucket 2 Client Number of replicas, set per bucket

Slide 51

Slide 51 text

Node A Node B Node C N_VAL Bucket 3 Client Number of replicas, set per bucket

Slide 52

Slide 52 text

Node A, Node B, Node C
Bucket
Client: GET /bucket/key?R=1
R = Read
Will pose a request to all nodes, answer to client when R nodes have agreed

Slide 54

Slide 54 text

Node A, Node B, Node C
Bucket
Client: GET /bucket/key?R=2
R = Read
Will pose a request to all nodes, answer to client when R nodes have agreed

Slide 56

Slide 56 text

Node A, Node B, Node C
Bucket
Agree?
Client: GET /bucket/key?R=2
R = Read
Will pose a request to all nodes, answer to client when R nodes have agreed
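A toy model of the rule on this slide: replies arrive one by one, and the client gets an answer as soon as R of them agree. This is only a sketch of the idea under that reading; `firstAgreedValue` is a made-up name and the replies are plain strings rather than real node responses.

```javascript
// Walk through replies in arrival order and return the first value
// that has been seen r times; undefined if no value reaches r.
function firstAgreedValue(replies, r) {
  const seen = new Map();
  for (const value of replies) {
    const count = (seen.get(value) || 0) + 1;
    seen.set(value, count);
    if (count >= r) return value;
  }
  return undefined;
}

var withR1 = firstAgreedValue(["Tuesday", "Thursday", "Tuesday"], 1); // first reply wins
var withR2 = firstAgreedValue(["Tuesday", "Thursday", "Tuesday"], 2); // needs two matching replies
```

A lower R answers sooner; a higher R waits for more nodes to say the same thing.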

Slide 58

Slide 58 text

The N to the R to the W to the DW and the RW Buckets have defaults for R, W, DW and RW

Slide 59

Slide 59 text

The N to the R to the W to the DW and the RW Number of copies ie. distribute to N nodes Buckets have defaults for R, W, DW and RW

Slide 60

Slide 60 text

The N to the R to the W to the DW and the RW Number of copies ie. distribute to N nodes Read ie. have R nodes agree Buckets have defaults for R, W, DW and RW

Slide 61

Slide 61 text

The N to the R to the W to the DW and the RW Number of copies ie. distribute to N nodes Read ie. have R nodes agree Write ie. ack’d by W nodes Buckets have defaults for R, W, DW and RW

Slide 62

Slide 62 text

The N to the R to the W to the DW and the RW Durable write ie. persistently written by DW nodes Number of copies ie. distribute to N nodes Read ie. have R nodes agree Write ie. ack’d by W nodes Buckets have defaults for R, W, DW and RW

Slide 63

Slide 63 text

The N to the R to the W to the DW and the RW
N: Number of copies, i.e. distribute to N nodes
R: Read, i.e. have R nodes agree
W: Write, i.e. ack’d by W nodes
DW: Durable write, i.e. persistently written by DW nodes
RW: Read-write, i.e. persistently deleted by RW nodes
Buckets have defaults for R, W, DW and RW

Slide 64

Slide 64 text

The Quorum
quorum = floor([node count] / 2) + 1
quorum = majority
quorum = valid value for r, w, dw, rw
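The quorum formula as a one-liner, assuming the division is rounded down so that the result is always a strict majority. The function name is made up for illustration.

```javascript
// Majority of nodeCount: integer division by two, plus one.
function quorum(nodeCount) {
  return Math.floor(nodeCount / 2) + 1;
}

var quorumOfThree = quorum(3); // 2 of 3 nodes
var quorumOfFive = quorum(5);  // 3 of 5 nodes
```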

Slide 65

Slide 65 text

Bucket properties http://127.0.0.1:8088/riak/meetup/ Buckets have default values Clients can override per request
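A sketch of what changing bucket defaults could look like from a client. It assumes the properties are sent as a JSON document under a `props` key with names such as `n_val` and `allow_mult`; check the Riak HTTP API documentation for your version before relying on that shape. `bucketPropsRequest` is a made-up helper and nothing is actually sent.

```javascript
// Describe (but do not send) a request that sets bucket properties.
function bucketPropsRequest(base, bucket, props) {
  return {
    method: "PUT",
    url: base + "/riak/" + encodeURIComponent(bucket),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ props: props })
  };
}

var meetupProps = bucketPropsRequest("http://127.0.0.1:8088", "meetup", {
  n_val: 3,          // number of replicas
  allow_mult: true   // keep conflicting writes as siblings
});
```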

Slide 66

Slide 66 text

Conflicts

Slide 67

Slide 67 text

When worlds collide Conflicts

Slide 68

Slide 68 text

I don’t care!
Bucket properties:
allow_mult = false -> siblings never returned
last_write_wins = true

Slide 69

Slide 69 text

I do care! allow_mult = true

Slide 70

Slide 70 text

I do care!
• Resolve conflicts in application logic
• Conflicts exposed as siblings beneath a key
• Response is HTTP 300 Multiple Choices
• Served as multipart/mixed

Slide 71

Slide 71 text

Example
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30G==
Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Length: 368

--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Tuesday
--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Thursday
--ZZ3eyjUllBi7GXRRMJsUublFxjn--
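To resolve conflicts in application logic, the client first has to pull the siblings out of the multipart body. The following is a simplified sketch for bodies shaped like the example response, not a complete multipart parser; `parseSiblings` is a made-up name, and picking a winner is left to the application.

```javascript
// Split a multipart/mixed body into its parts and return each part's
// Content-Type and value. Handles \n and \r\n line endings.
function parseSiblings(body, boundary) {
  const delimiter = "--" + boundary;
  return body
    .split(delimiter)
    .slice(1) // anything before the first delimiter is preamble
    .filter(function (part) { return !part.startsWith("--"); }) // closing delimiter
    .map(function (part) {
      // Headers and value are separated by one blank line.
      const match = /^([\s\S]*?)\r?\n\r?\n([\s\S]*)$/.exec(part.trim());
      if (!match) return null;
      const type = /^content-type:\s*(.+)$/im.exec(match[1]);
      return { contentType: type ? type[1].trim() : null, value: match[2] };
    })
    .filter(function (sibling) { return sibling !== null; });
}

var exampleBoundary = "ZZ3eyjUllBi7GXRRMJsUublFxjn";
var exampleBody = [
  "--" + exampleBoundary, "Content-Type: text/plain", "", "Tuesday",
  "--" + exampleBoundary, "Content-Type: text/plain", "", "Thursday",
  "--" + exampleBoundary + "--"
].join("\r\n");
var exampleSiblings = parseSiblings(exampleBody, exampleBoundary);
```

With the siblings in hand, the application decides which value wins (or merges them) and writes the result back.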

Slide 72

Slide 72 text

What is Riak and what’s the agenda? Decentralized key-value store A database ideally suited for web applications A flexible map/reduce engine

Slide 73

Slide 73 text

Map / Reduce
• JavaScript or Erlang
• Exposed in the HTTP API

Slide 74

Slide 74 text

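Map / Reduce
count words

function(v) {
  // Lowercase the stored value and split it into words.
  // match() returns null when nothing matches, hence the fallback.
  var words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
  var counts = [];
  for (var i = 0; i < words.length; i++) {
    var count = {};
    count[words[i]] = 1; // one {word: 1} object per occurrence
    counts.push(count);
  }
  return counts;
}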

Slide 75

Slide 75 text

Map / Reduce
count words

function(values) {
  // Merge the {word: count} objects from the map phase into one total.
  var result = {};
  for (var i = 0; i < values.length; i++) {
    for (var word in values[i]) {
      if (word in result)
        result[word] += values[i][word];
      else
        result[word] = values[i][word];
    }
  }
  return [result];
}
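The two phases can be tried without a cluster. This sketch runs a map step and a reduce step over plain strings in Node.js; the `{ values: [{ data: ... }] }` wrapper imitates the shape the map function on the previous slide reads from, and `runLocally` and the sample documents are made up for illustration.

```javascript
// Map: emit one {word: 1} object per word in the stored value.
var mapWords = function (v) {
  var words = v.values[0].data.toLowerCase().match(/\w+/g) || [];
  return words.map(function (word) {
    var count = {};
    count[word] = 1;
    return count;
  });
};

// Reduce: add up all the per-word counts into a single object.
var reduceCounts = function (values) {
  var result = {};
  values.forEach(function (counts) {
    Object.keys(counts).forEach(function (word) {
      var soFar = Object.prototype.hasOwnProperty.call(result, word) ? result[word] : 0;
      result[word] = soFar + counts[word];
    });
  });
  return [result];
};

// Feed every document through map, then hand all results to reduce.
function runLocally(documents) {
  var mapped = [];
  documents.forEach(function (text) {
    mapped = mapped.concat(mapWords({ values: [{ data: text }] }));
  });
  return reduceCounts(mapped);
}

var localCounts = runLocally(["foo bar", "Foo"]);
```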

Slide 76

Slide 76 text

Map & Reduce
count words

{"inputs": "bucket",
 "query": [
  {"map": {"language": "javascript",
           "source": "function(v) { var words = v.values[0].data.toLowerCase().match(/\\w+/g) || []; var counts = []; for (var word in words) if (words[word] != '') { var count = {}; count[words[word]] = 1; counts.push(count); } return counts; }"}},
  {"reduce": {"language": "javascript",
              "source": "function(values) { var result = {}; for (var value in values) { for (var word in values[value]) { if (word in result) result[word] += values[value][word]; else result[word] = values[value][word]; } } return [result]; }"}}]}

Put this in your POST request and let Riak smoke it
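Hand-writing the job document is error-prone because the function source has to be escaped as a JSON string (a backslash in a regular expression must be doubled, for instance). One way around that, sketched here, is to let `JSON.stringify` do the escaping. `buildJob` and the two small sample functions are made up for illustration, and sending the document is left out.

```javascript
// Build the job document from real functions; toString() yields their
// source and JSON.stringify escapes it correctly.
function buildJob(inputs, mapFn, reduceFn) {
  return JSON.stringify({
    inputs: inputs,
    query: [
      { map: { language: "javascript", source: mapFn.toString() } },
      { reduce: { language: "javascript", source: reduceFn.toString() } }
    ]
  });
}

var jobDocument = buildJob(
  "bucket",
  function (v) { return v.values[0].data.toLowerCase().match(/\w+/g) || []; },
  function (values) { return [values.length]; }
);
```

The sample functions here count words in total rather than per word, just to keep the example short.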

Slide 79

Slide 79 text

Map / Reduce Demo

Slide 80

Slide 80 text

Links • Non-enforced • Traversable Link walking uses M/R behind the scenes

Slide 81

Slide 81 text

Links Demo Link walking uses M/R behind the scenes

Slide 82

Slide 82 text

The whole enchilada
Erlang / OTP
Riak Core
• Partitioning (consistent hashing, hinted handoff)
• Membership management (leave/join)
• Work distribution
• Cluster state (gossip protocol)
Riak Key/Value Store
• Back-ends: Bitcask, InnoDB, DETS, ETS, Balanced trees, File system, LRU
Riak Search
HTTP API
Luwak

Slide 83

Slide 83 text

Try it downloads.basho.com brew install riak Web admin @ github.com/gmaurice/Riaktive

Slide 84

Slide 84 text

Resources Riak Fast Track @ wiki.basho.com #riak @ freenode github.com/basho/

Slide 85

Slide 85 text

Thanks for listening
Mårten Gustafson
@martengustafson
http://marten.gustafson.pp.se/