Slide 1

Slide 1 text

Introduction to Riak montreal.rb • June 16, 2015 Florencia Herra-Vega

Slide 2

Slide 2 text

Overview • 1. Networks and databases and drama • 2. Riak architecture & usage • 3. CRDTs • 4. Distributed data modelling • 5. Etc / Questions

Slide 3

Slide 3 text

What is Riak? A decentralized key-value database with high availability & fault tolerance.

Slide 4

Slide 4 text

What is Riak? A decentralized key-value database with high availability & fault tolerance. The end.

Slide 5

Slide 5 text

What is Riak? A decentralized key-value database with high availability & fault tolerance. What do these things mean??

Slide 6

Slide 6 text

A database! SELECT * FROM articles WHERE author_name = 'flohdot' Relational bliss.

Slide 7

Slide 7 text

Key-value store
A gigantic associative array.
username            flohdot
location            Montreal
favourite_author    Octavia Butler
last_book_read      The Savage Detectives

Slide 8

Slide 8 text

Key-value store
A gigantic associative array.
flohdot_location                Montreal
flohdot_favourite_author        Octavia Butler
flohdot_last_book_read          The Savage Detectives
scifidude99_location            Toronto
scifidude99_favourite_author    Terry Pratchett
scifidude99_last_book_read      Kraken
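The flattened key layout on this slide can be sketched with a plain Ruby Hash standing in for the store. This is only an in-memory illustration: the constant `STORE` and the helper `profile_key` are made up for the example, and nothing here talks to Riak.

```ruby
# A plain Hash standing in for a key-value store (illustration only).
STORE = {}

# Hypothetical helper: build a namespaced key such as "flohdot_location".
def profile_key(username, field)
  "#{username}_#{field}"
end

STORE[profile_key('flohdot', 'location')] = 'Montreal'
STORE[profile_key('flohdot', 'favourite_author')] = 'Octavia Butler'
STORE[profile_key('scifidude99', 'location')] = 'Toronto'

# Lookups are by exact key; there is no WHERE clause to lean on.
puts STORE[profile_key('flohdot', 'location')]

# Finding "everything about flohdot" means scanning every key.
puts STORE.keys.select { |key| key.start_with?('flohdot_') }.inspect
```

The second lookup is the cost of giving up the relational model: the store itself only knows how to fetch one key at a time.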

Slide 9

Slide 9 text

Distributed DBs Because your data… … is too big for one disk? … has too many transactions for one node? … cannot have a single point of failure? … needs backups? … all of the above?

Slide 10

Slide 10 text

Distributed DBs Two approaches: 1. partition 2. replicate

Slide 11

Slide 11 text

Distributed DBs Two approaches: 1. partition 2. replicate (3. BOTH)

Slide 12

Slide 12 text

Networks suck
$ curl -XPOST http://your_database/important_key -d '{ "important_info": "something you care about"}'
500 NOPE
… then…
$ curl http://your_database/important_key
503 MAYBE LATER

Slide 13

Slide 13 text

Networks suck What even happened? Data… (A) took the scenic route through a wormhole? (B) was eaten by monsters, never to be seen again? (C) returned from the underworld with an evil twin?

Slide 14

Slide 14 text

Networks suck
user1 on nodeA at 8:05am
$ curl -XPOST -d '{ "important_info": "some initial data"}' http://your_database/important_key
user2 on nodeB at 8:06am
$ curl -XPOST -d '{ "important_info": "modified data"}' http://your_database/important_key
user1 on nodeA at 8:07am
$ curl http://your_database/important_key
{ "important_info": "some initial data"}
:(

Slide 15

Slide 15 text

Networks suck
user1 on nodeA at 8:05am
$ curl -XPOST -d '{ "important_info": "something user1 cares about"}' http://your_database/important_key
user2 on nodeB at 8:05am
$ curl -XPOST -d '{ "important_info": "total nonsense HAHA"}' http://your_database/important_key
????

Slide 16

Slide 16 text

Hardware sucks … sometimes hardware fails completely!

Slide 17

Slide 17 text

Hardware sucks … sometimes hardware fails completely! … or SOME hardware fails. … or you need to add or replace hardware.

Slide 18

Slide 18 text

Hardware sucks … sometimes hardware fails completely! … or SOME hardware fails. … or you need to add or replace hardware. If you’re using a distributed database system, it needs to be fault/partition tolerant.

Slide 19

Slide 19 text

The CAP theorem partition tolerance availability consistency AP CA CP At any moment in time, a system cannot be consistent, available, AND partition tolerant.

Slide 20

Slide 20 text

The CAP theorem partition tolerance availability consistency AP CA CP X Distributed systems must be partition tolerant.

Slide 21

Slide 21 text

The CAP theorem partition tolerance availability consistency AP CA CP X CP: lock during a write, make sure it propagates, then allow reads.

Slide 22

Slide 22 text

The CAP theorem partition tolerance availability consistency AP CA CP X AP: reads and writes will always succeed, but the data you read might be stale or differ from node to node.

Slide 23

Slide 23 text

The CAP theorem partition tolerance availability consistency AP CA CP X Riak is an AP system with eventual consistency.

Slide 24

Slide 24 text

The CAP theorem partition tolerance availability consistency AP CA CP X Riak is an AP system with tunable eventual consistency.

Slide 25

Slide 25 text

So how do I Riak?

Slide 26

Slide 26 text

KV Buckets & Keys (strings) bucket: “books”, key: Value: JSON, plaintext, image, anything up to ~1-2MB.

Slide 27

Slide 27 text

HTTP API
$ curl -v -X PUT http://localhost:8091/buckets/books/keys/1594483299 \
  -H "Content-Type: application/json" \
  -d '{ "title": "The Brief & Wondrous Life of Oscar Wao", "author": "Junot Diaz" }'
& libraries in Ruby, Python, Java, Erlang and more.
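The same PUT can be assembled from Ruby with nothing but the standard library. The sketch below only builds the request and prints it; the helper name `build_book_put` is made up for the example, and the host and port are copied from the slide, so adjust them for a real cluster.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Build (but do not send) a PUT for one book, keyed by ISBN.
# Host and port are taken from the slide and may differ on your setup.
def build_book_put(isbn, attributes)
  uri = URI("http://localhost:8091/buckets/books/keys/#{isbn}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(attributes)
  [uri, request]
end

uri, request = build_book_put(
  '1594483299',
  { 'title' => 'The Brief & Wondrous Life of Oscar Wao', 'author' => 'Junot Diaz' }
)

puts "#{request.method} #{request.path}"
puts request.body

# To actually send it (requires a running node):
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

In practice the official client libraries wrap this for you; the raw HTTP form is mostly useful for poking at a node by hand.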

Slide 28

Slide 28 text

Choose a backend • Bitcask • LevelDB • Memory

Slide 29

Slide 29 text

The ring riak@node1 $ riak-admin join [email protected] Rinse and repeat.

Slide 30

Slide 30 text

The ring

Slide 31

Slide 31 text

The ring Replication AND partitioning.

Slide 32

Slide 32 text

Partitioning 64 virtual nodes (vnodes) by default…

Slide 33

Slide 33 text

Partitioning 64 virtual nodes (vnodes) by default… … each responsible for 1/64th of the keyspace.

Slide 34

Slide 34 text

Partitioning 64 virtual nodes (vnodes) by default… … each responsible for 1/64th of the keyspace. The keys are hashed with SHA1 (2^160 values)… “Harry Potter & the Chamber of Secrets” => 628e87e7ec52e212a7efbc88aaf7dfbf9e314a23

Slide 35

Slide 35 text

Partitioning 64 virtual nodes (vnodes) by default… … each responsible for 1/64th of the keyspace. The keys are hashed with SHA1 (2^160 values)… … and the hashed value determines which vnode owns the data. N1: 0 to 2^154 - 1; N2: 2^154 to 2*2^154 - 1; …; N64: 63*2^154 to 2^160 - 1
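The range arithmetic on this slide can be sketched in a few lines of Ruby. This is a simplified illustration, not Riak's actual routing code: it hashes only the key string, whereas a real cluster hashes the bucket and key together and has its own rules for which vnode claims a partition. The constant and method names are made up for the example.

```ruby
require 'digest/sha1'

RING_BITS = 160                               # SHA-1 output size in bits
PARTITIONS = 64                               # default ring size from the slide
PARTITION_SIZE = 2**RING_BITS / PARTITIONS    # 2^154 hash values per partition

# Map a 160-bit hash value to a zero-based partition index (0..63).
def partition_for_hash(hash_value)
  hash_value / PARTITION_SIZE
end

# Hash an arbitrary string key and find its partition.
def partition_for_key(key)
  partition_for_hash(Digest::SHA1.hexdigest(key).to_i(16))
end

# The hash value shown on the slide falls in zero-based partition 24.
puts partition_for_hash('628e87e7ec52e212a7efbc88aaf7dfbf9e314a23'.to_i(16))
puts partition_for_key('Harry Potter & the Chamber of Secrets')
```

Because the ring size is a power of two, the partition index is simply the top six bits of the hash.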

Slide 36

Slide 36 text

Vnodes on physical nodes Image from 7 Databases in 7 Weeks

Slide 37

Slide 37 text

Replication N: number of (physical) nodes a write eventually replicates to. W: number of nodes that must be successfully written to before a successful response. R: number of nodes required to read a value successfully. When R + W > N, every read overlaps at least one node that holds the latest successful write.

Slide 38

Slide 38 text

Replication For a cluster with 5 physical nodes, and N = 3. W = 3 R = 1 Slow writes, fast reads. (W and R cannot exceed N.)

Slide 39

Slide 39 text

Replication For a cluster with 5 physical nodes, and N = 3. W = 1 R = 3 Fast writes, reads might have conflicts. (W and R cannot exceed N.)

Slide 40

Slide 40 text

Replication For a cluster with 5 physical nodes, and N = 3. W = 2 R = 2 “quorum” more than half the replicated nodes, or (floor(N/2) + 1)
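The quorum formula and the R + W > N rule are easy to check by hand. The sketch below is just that arithmetic in Ruby; the method names are made up for the example and are not part of any Riak client.

```ruby
# Quorum as defined on the slide: more than half of the N replicas.
def quorum(n)
  n / 2 + 1 # integer division, so this is floor(N/2) + 1
end

# True when every read set must share at least one replica with every write set.
def read_overlaps_write?(n:, r:, w:)
  r + w > n
end

n = 3
puts quorum(n)
puts read_overlaps_write?(n: n, r: quorum(n), w: quorum(n))
puts read_overlaps_write?(n: n, r: 1, w: 1)
```

With N = 3, quorum reads and writes (2 and 2) always overlap, while R = 1 and W = 1 can miss each other entirely, which is where stale reads come from.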

Slide 41

Slide 41 text

Conflict resolution allow_mult & last_write_wins

Slide 42

Slide 42 text

Conflict resolution allow_mult & last_write_wins true

Slide 43

Slide 43 text

Conflict resolution allow_mult = false, last_write_wins = true: timestamp-based resolution; older values discarded.

Slide 44

Slide 44 text

Conflict resolution allow_mult = false, last_write_wins = false: use vector clocks for causal context; keep options.

Slide 45

Slide 45 text

Conflict resolution allow_mult = true (last_write_wins = false): use vector clocks for causal context; keep options; your application resolves conflicts.
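The difference between discarding older values and handing every conflicting value to the application can be sketched in plain Ruby. This is a toy model under a big assumption: it treats every version it is given as concurrent, whereas Riak uses vector clocks to tell concurrent writes from sequential ones. `Version` and both method names are hypothetical.

```ruby
# Hypothetical stand-in for one replica's copy of a value.
Version = Struct.new(:value, :timestamp)

# last_write_wins-style resolution: keep only the newest timestamp.
def resolve_last_write_wins(versions)
  versions.max_by(&:timestamp)
end

# allow_mult-style resolution: keep every distinct value for the application to sort out.
def resolve_keep_siblings(versions)
  versions.map(&:value).uniq
end

conflict = [
  Version.new('something user1 cares about', 1_000),
  Version.new('total nonsense HAHA', 1_001)
]

puts resolve_last_write_wins(conflict).value
puts resolve_keep_siblings(conflict).inspect
```

Timestamp resolution is simple but silently drops user1's write; keeping both values pushes the decision, and the work, onto the application.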

Slide 46

Slide 46 text

CRDTs

Slide 47

Slide 47 text

CRDTs Stepping out of Riak again for a minute.

Slide 48

Slide 48 text

CRDT Conflict-free replicated data-type Guarantee eventual consistency.

Slide 49

Slide 49 text

CALM An eventually consistent system only grows in one direction. “Consistency As Logical Monotonicity” Operations should be: Associative Commutative Idempotent

Slide 50

Slide 50 text

Set Union (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)

Slide 51

Slide 51 text

Set Union (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3) 1 ∪ 2 = 2 ∪ 1

Slide 52

Slide 52 text

Set Union (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3) 1 ∪ 2 = 2 ∪ 1 1 ∪ 1 = 1
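The three properties can be checked directly with Ruby's standard `Set`. The helper name `union_merge` is made up for the example; it is plain set union and nothing Riak-specific.

```ruby
require 'set'

# Merge function for a grow-only set: plain set union.
def union_merge(left, right)
  left | right
end

a = Set['A Wrinkle In Time']
b = Set['Where the Wild Things Are']
c = Set['Kraken']

# Associative: grouping does not matter.
puts union_merge(union_merge(a, b), c) == union_merge(a, union_merge(b, c))
# Commutative: order does not matter.
puts union_merge(a, b) == union_merge(b, a)
# Idempotent: merging the same data twice changes nothing.
puts union_merge(a, a) == a
```

Together these mean replicas can exchange updates in any order, any number of times, and still arrive at the same set.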

Slide 53

Slide 53 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today”

Slide 54

Slide 54 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” “A Wrinkle In Time” N1 N2 N3

Slide 55

Slide 55 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time” } “A Wrinkle In Time” N1 N2 N3

Slide 56

Slide 56 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time” } {“A Wrinkle In Time” } {“A Wrinkle In Time” } “A Wrinkle In Time” N1 N2 N3 replicate! replicate!

Slide 57

Slide 57 text

Set CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time” } {“A Wrinkle In Time” } “Where the Wild Things Are” N1 N2 N3

Slide 58

Slide 58 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time” } “Where the Wild Things Are” N1 N2 N3 replicate!

Slide 59

Slide 59 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time” } “Where the Wild Things Are” N1 N2 N3 replicate! replicate!

Slide 60

Slide 60 text

CRDT in practice listed_2015_08_14 “books favourited (by any user) today” {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time”, “Where the Wild Things Are” } {“A Wrinkle In Time” } “Where the Wild Things Are” N1 N2 N3 replicate! replicate! X

Slide 61

Slide 61 text

Read listed_2015_08_14 Set Union N1 {“A Wrinkle In Time”, “Where the Wild Things Are” } ∪ N2 {“A Wrinkle In Time”, “Where the Wild Things Are” } ∪ N3 {“A Wrinkle In Time” } = {“A Wrinkle In Time”, “Where the Wild Things Are” }
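The walkthrough on these slides (a write lands on N1, reaches N2, never reaches N3, and the read still returns both books) can be replayed with a minimal grow-only set. `GrowOnlySet` is a hypothetical class written for this sketch, not the Riak client API, and it leaves out removals entirely.

```ruby
require 'set'

# Minimal grow-only set replica (illustration only).
class GrowOnlySet
  attr_reader :items

  def initialize(items = [])
    @items = Set.new(items)
  end

  def add(item)
    @items << item
    self
  end

  # Merging is set union, so replicas can sync in any order, any number of times.
  def merge(other)
    GrowOnlySet.new(@items | other.items)
  end
end

n1 = GrowOnlySet.new(['A Wrinkle In Time'])
n2 = GrowOnlySet.new(['A Wrinkle In Time'])
n3 = GrowOnlySet.new(['A Wrinkle In Time'])

n1.add('Where the Wild Things Are') # the write lands on N1
n2 = n2.merge(n1)                   # replication to N2 succeeds
# replication to N3 fails, so n3 stays behind

read_result = n1.merge(n2).merge(n3)
puts read_result.items.to_a.inspect
```

The read is correct even though N3 is stale, because the union of all three replicas contains everything any one of them has seen.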

Slide 62

Slide 62 text

Riak CRDTs Riak Datatypes (Sets, Counters, Maps) An API with familiar abstractions for the underlying math and magic of CRDTs.

Slide 63

Slide 63 text

Riak CRDTs Set CRDT
require 'riak'
client = Riak::Client.new # assumes a node reachable on the client's default host and port
favourite_books = client.bucket('favourite_books')
my_faves_set = Riak::Crdt::Set.new(favourite_books, 'flohdot', 'sets')
my_faves_set.add('Thinking, Fast and Slow')
my_faves_set.add('Oryx and Crake')
my_faves_set.remove('Oryx and Crake')

Slide 64

Slide 64 text

Riak CRDTs Counter Datatype
books_published_day = client.bucket('books_published_day')
today_count = Riak::Crdt::Counter.new(books_published_day, '2015_06_16', 'counters')
today_count.increment
today_count.increment(5)
today_count.decrement(2)
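One common way to build a counter that merges cleanly is a PN-counter: each node tracks its own increments and decrements, and a merge takes the per-node maximum. The sketch below is a toy version in plain Ruby to show why concurrent updates converge; `PNCounter` is a hypothetical class and makes no claim about how Riak implements its counters internally.

```ruby
# Minimal PN-counter sketch (illustration only).
class PNCounter
  attr_reader :increments, :decrements

  def initialize(node_id, increments = {}, decrements = {})
    @node_id = node_id
    @increments = increments.dup
    @decrements = decrements.dup
  end

  def increment(by = 1)
    @increments[@node_id] = @increments.fetch(@node_id, 0) + by
    self
  end

  def decrement(by = 1)
    @decrements[@node_id] = @decrements.fetch(@node_id, 0) + by
    self
  end

  def value
    @increments.values.sum - @decrements.values.sum
  end

  # Per-node maximum is associative, commutative and idempotent.
  def merge(other)
    merged_inc = @increments.merge(other.increments) { |_node, a, b| [a, b].max }
    merged_dec = @decrements.merge(other.decrements) { |_node, a, b| [a, b].max }
    PNCounter.new(@node_id, merged_inc, merged_dec)
  end
end

n1 = PNCounter.new(:n1)
n2 = PNCounter.new(:n2)

n1.increment.increment(5) # same calls as the slide, applied on N1
n2.decrement(2)           # applied concurrently on N2

puts n1.merge(n2).value
puts n2.merge(n1).value
puts n1.merge(n2).merge(n2).value
```

All three merges print the same total, regardless of merge order or repetition, which is the property that lets replicas converge without coordination.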

Slide 65

Slide 65 text

Riak CRDTs Map Datatype
user_profiles = client.bucket('user_profiles')
flohdot = Riak::Crdt::Map.new(user_profiles, '2015_06_16', 'maps')
flohdot.batch do |m|
  m.registers['first_name'] = 'Florencia' # string
  m.flags['pro_user'] = true # boolean
  m.counters['logins'].increment
end

Slide 66

Slide 66 text

Riak CRDTs Map Datatype
user_profiles = client.bucket('user_profiles')
flohdot = Riak::Crdt::Map.new(user_profiles, '2015_06_16', 'maps')
flohdot.batch do |m|
  m.registers['first_name'] = 'Florencia' # string
  m.flags['pro_user'] = true # boolean
  m.counters['logins'].increment
  # yo dawg i herd you like maps so i put some maps in your maps
end

Slide 67

Slide 67 text

Data Modeling

Slide 68

Slide 68 text

Riak data design • Are your objects immutable? • What kind of consistency/accuracy is required? • Do you need manual conflict resolution, or will CRDTs do? • Do you need search?

Slide 69

Slide 69 text

Library data { "title" : "A Game of Thrones", "author" : "George RR Martin", "year" : "1996" … }

Slide 70

Slide 70 text

User-curated lists “Summer book club” “NYT Bestsellers 2015” “Books I couldn’t finish”

Slide 71

Slide 71 text

Leverage namespacing • Finding objects is fast. • No hard limit or performance impact on number of buckets. • Extra namespace: bucket types (multitenancy?)

Slide 72

Slide 72 text

More goodies • Search with Solr • map/reduce • Multi-datacenter replication

Slide 73

Slide 73 text

Sources/Learn more • A Little Riak Book by Eric Redmond *give me your name if you want the e-book • 7 Databases in 7 Weeks by Eric Redmond & Jim R Wilson • Riak docs http://docs.basho.com/riak/latest • Hector Castro @ Big Ruby 2014 https://www.youtube.com/watch?v=-_3Us7Ystyg#aid=P-4heI_bFwo • Peter Bourgon @ Strangeloop 2014 https://www.youtube.com/watch?v=em9zLzM8O7c • Kyle Kingsbury’s Jepsen blog series https://aphyr.com/tags/jepsen

Slide 74

Slide 74 text

Thanks! Get in touch! @flohdot (Twitter, Github, etc) peerio.com • @peerio • github.com/PeerioTechnologies Hiring… soon! & we use Riak.