Intro to Riak at montreal.rb

FHV
June 16, 2015

An introduction to Riak with some background on the problems of distributed databases.

Transcript

  1. Introduction to Riak • montreal.rb • June 16, 2015 • Florencia Herra-Vega

  2. Overview
    1. Networks and databases and drama
    2. Riak architecture & usage
    3. CRDTs
    4. Distributed data modelling
    5. Etc / Questions
  3. What is Riak? A decentralized key-value database with high availability & fault tolerance.
  4. What is Riak? A decentralized key-value database with high availability & fault tolerance. The end.
  5. What is Riak? A decentralized key-value database with high availability & fault tolerance. What do these things mean??
  6. A database!
    SELECT * FROM articles WHERE author_name = 'flohdot'
    Relational bliss.
  7. Key-value store: a gigantic associative array.
    username            flohdot
    location            Montreal
    favourite_author    Octavia Butler
    last_book_read      The Savage Detectives
  8. Key-value store: a gigantic associative array.
    flohdot_location                 Montreal
    flohdot_favourite_author         Octavia Butler
    flohdot_last_book_read           The Savage Detectives
    scifidude99_location             Toronto
    scifidude99_favourite_author     Terry Pratchett
    scifidude99_last_book_read       Kraken
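
The flat, prefixed keyspace on this slide can be mimicked with a plain Ruby Hash. This is only a sketch: `STORE`, `profile_key`, `put` and `get` are hypothetical names made up for illustration, and the underscore-joined naming simply copies the examples shown on the slide.

```ruby
# A key-value store modelled as one big Hash; keys are namespaced by user.
STORE = {}

# Hypothetical helper: builds "<user>_<field>" keys like the ones on the slide.
def profile_key(user, field)
  "#{user}_#{field}"
end

def put(user, field, value)
  STORE[profile_key(user, field)] = value
end

def get(user, field)
  STORE[profile_key(user, field)]
end

put('flohdot', 'location', 'Montreal')
put('flohdot', 'favourite_author', 'Octavia Butler')
put('scifidude99', 'location', 'Toronto')

# Lookups are by exact key only; there is no WHERE clause to lean on.
puts get('flohdot', 'location')
puts get('scifidude99', 'location')
```

The trade-off compared with the SQL slide is that any question other than "give me the value for this exact key" has to be answered by how the keys are designed up front.
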
  9. Distributed DBs
    Because your data…
    … is too big for one disk?
    … has too many transactions for one node?
    … cannot have a single point of failure?
    … needs backups?
    … all of the above?
  10. Distributed DBs Two approaches: 1. partition 2. replicate

  11. Distributed DBs Two approaches: 1. partition 2. replicate (3. BOTH)

  12. Networks suck
    $ curl -XPOST http://your_database/important_key -d '{ "important_info": "something you care about"}'
    500 NOPE
    … then…
    $ curl http://your_database/important_key
    503 MAYBE LATER
  13. Networks suck
    What even happened? Data…
    (A) took the scenic route through a wormhole?
    (B) was eaten by monsters, never to be seen again?
    (C) returned from the underworld with an evil twin?
  14. Networks suck
    user1 on nodeA at 8:05am
    $ curl -XPOST -d '{ "important_info": "some initial data"}' http://your_database/important_key
    user2 on nodeB at 8:06am
    $ curl -XPOST -d '{ "important_info": "modified data"}' http://your_database/important_key
    user1 on nodeA at 8:07am
    $ curl http://your_database/important_key
    { "important_info": "some initial data"} :(
  15. Networks suck
    user1 on nodeA at 8:05am
    $ curl -XPOST -d '{ "important_info": "something user1 cares about"}' http://your_database/important_key
    user2 on nodeB at 8:05am
    $ curl -XPOST -d '{ "important_info": "total nonsense HAHA"}' http://your_database/important_key
    ????
  16. Hardware sucks … sometimes hardware fails completely!

  17. Hardware sucks
    … sometimes hardware fails completely!
    … or SOME hardware fails.
    … or you need to add or replace hardware.
  18. Hardware sucks
    … sometimes hardware fails completely!
    … or SOME hardware fails.
    … or you need to add or replace hardware.
    If you’re using a distributed database system, it needs to be fault/partition tolerant.
  19. The CAP theorem
    [Diagram: partition tolerance, availability, consistency; pairings AP, CA, CP]
    At any moment in time, a system cannot be consistent, available, AND partition tolerant.
  20. The CAP theorem
    [Same diagram, with an X over one pairing]
    Distributed systems must be partition tolerant.
  21. The CAP theorem
    Lock during a write, make sure it propagates, then allow reads.
  22. The CAP theorem
    Reads and writes will always succeed, but the data you get might not be the same.
  23. The CAP theorem
    Riak is an AP system with eventual consistency.
  24. The CAP theorem
    Riak is an AP system with tunable eventual consistency.
  25. So how do I Riak?

  26. KV: Buckets & Keys (strings)
    bucket: “books”, key: <ISBN of a book>
    Value: JSON, plaintext, image, anything up to ~1-2MB.
  27. HTTP API
    $ curl -v -X PUT http://localhost:8091/buckets/books/keys/1594483299 \
        -H "Content-Type: application/json" \
        -d '{ "title": "The Brief & Wondrous Life of Oscar Wao", "author": "Junot Diaz" }'
    & libraries in Ruby, Python, Java, Erlang and more.
  28. Choose a backend • Bitcask • LevelDB • Memory

  29. The ring
    riak@node1 $ riak-admin join node2@10.1.1.1
    Rinse and repeat.

  30. The ring

  31. The ring Replication AND partitioning.

  32. Partitioning 64 virtual nodes (vnodes) by default…

  33. Partitioning
    64 virtual nodes (vnodes) by default…
    … each responsible for 1/64th of the keyspace.
  34. Partitioning
    64 virtual nodes (vnodes) by default…
    … each responsible for 1/64th of the keyspace.
    The keys are hashed with SHA1 (2^160 values)…
    “Harry Potter & the Chamber of Secrets” => 628e87e7ec52e212a7efbc88aaf7dfbf9e314a23
  35. Partitioning
    64 virtual nodes (vnodes) by default…
    … each responsible for 1/64th of the keyspace.
    The keys are hashed with SHA1 (2^160 values)…
    … and the hashed value determines which vnode owns the data.
    N1: 0 to 2^154 - 1
    N2: 2^154 to 2*2^154 - 1
    …
    N64: 63*2^154 to 2^160 - 1
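
The range arithmetic on this slide can be sketched in a few lines of Ruby. The constant and function names below are hypothetical, and the sketch hashes only the bare key string; Riak's real ring hashes more than that, so treat this as an illustration of the arithmetic rather than a reproduction of Riak's behaviour.

```ruby
require 'digest'

RING_SIZE      = 64                    # default number of vnodes, per the slide
KEYSPACE       = 2**160                # SHA-1 produces 160-bit values
PARTITION_SIZE = KEYSPACE / RING_SIZE  # 2**154 hash values per vnode

# Returns the 0-based index of the vnode whose range contains the hash.
def vnode_for_hash(hash_int)
  hash_int / PARTITION_SIZE
end

# Hashes a key string with SHA-1 and finds its vnode (simplified: key only).
def vnode_for(key)
  vnode_for_hash(Digest::SHA1.hexdigest(key).to_i(16))
end

# The example hash shown on the slide, fed straight into the range lookup.
slide_hash = '628e87e7ec52e212a7efbc88aaf7dfbf9e314a23'.to_i(16)
puts vnode_for_hash(slide_hash)   # => 24

puts vnode_for('some key')        # always somewhere in 0..63
```

Index 24 is 0-based, so in the slide's N1…N64 naming the example hash would fall in the range labelled N25.
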
  36. Vnodes on physical nodes (image from 7 Databases in 7 Weeks)
  37. Replication
    N: number of (physical) nodes a write eventually replicates to
    W: number of nodes that must be successfully written to before a successful response
    R: number of nodes required to read a value successfully
    R + W > N
  38. Replication
    For a cluster with 5 physical nodes, and N = 3.
    W = 3, R = 1
    Slow writes, fast reads.
  39. Replication
    For a cluster with 5 physical nodes, and N = 3.
    W = 1, R = 3
    Fast writes, reads might have conflicts.
  40. Replication
    For a cluster with 5 physical nodes, and N = 3.
    W = 2, R = 2
    “quorum”: more than half the replicated nodes, or (floor(N/2) + 1)
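
The quorum formula and the R + W > N rule are simple enough to check by hand, but a small Ruby sketch makes the arithmetic concrete. The function names `quorum` and `overlapping?` are hypothetical and are not part of any Riak client.

```ruby
# Smallest number of replicas that is more than half of n.
def quorum(n)
  n / 2 + 1   # integer division floors for non-negative n
end

# True when every read set must share at least one replica with every write set.
def overlapping?(n:, r:, w:)
  r + w > n
end

n = 3
puts quorum(n)                        # => 2
puts overlapping?(n: n, r: 2, w: 2)   # => true
puts overlapping?(n: n, r: 1, w: 1)   # => false
```

With N = 3, reading from 2 and writing to 2 guarantees at least one replica in common, which is why a read that follows a successful write has a chance of seeing it.
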
  41. Conflict resolution allow_mult & last_write_wins

  42. Conflict resolution allow_mult & last_write_wins true

  43. Conflict resolution
    allow_mult: false, last_write_wins: true
    timestamp-based resolution; older values discarded
  44. Conflict resolution
    allow_mult: false, last_write_wins: false
    use vector clocks for causal context; keep options
  45. Conflict resolution
    allow_mult: true (last_write_wins: false)
    use vector clocks for causal context; keep options; your application resolves conflicts
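
To make "vector clocks for causal context" a little more concrete, here is a generic vector clock comparison in Ruby. It is a sketch of the general technique, not Riak's implementation: the Hash-of-counters representation and the names `descends?`, `concurrent?` and `resolve` are assumptions made for illustration.

```ruby
# A vector clock is modelled as a Hash of actor => counter.

# True when clock a has seen everything clock b has seen.
def descends?(a, b)
  b.all? { |actor, count| a.fetch(actor, 0) >= count }
end

# Two clocks are concurrent when neither descends from the other.
def concurrent?(a, b)
  !descends?(a, b) && !descends?(b, a)
end

# Returns the values to keep: one winner when causality decides,
# both values (siblings) when the writes were concurrent.
def resolve(x, y)
  return [x] if descends?(x[:clock], y[:clock])
  return [y] if descends?(y[:clock], x[:clock])
  [x, y]
end

v1 = { value: 'some initial data', clock: { 'nodeA' => 1 } }
v2 = { value: 'modified data',     clock: { 'nodeA' => 1, 'nodeB' => 1 } }
v3 = { value: 'total nonsense',    clock: { 'nodeA' => 2 } }

p resolve(v1, v2).map { |v| v[:value] }  # v2 descends from v1, so only v2 survives
p resolve(v2, v3).map { |v| v[:value] }  # concurrent: both are kept as siblings
```

When both values survive, something has to pick or merge them, which is the "your application resolves conflicts" part of the slide.
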
  46. CRDTs

  47. CRDTs Stepping out of Riak again for a minute.

  48. CRDT: conflict-free replicated data type. CRDTs guarantee eventual consistency.

  49. CALM: “Consistency As Logical Monotonicity”
    An eventually consistent system only grows in one direction.
    Operations should be:
    • Associative
    • Commutative
    • Idempotent
  50. Set Union
    (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)
  51. Set Union
    (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)
    1 ∪ 2 = 2 ∪ 1
  52. Set Union
    (1 ∪ 2) ∪ 3 = 1 ∪ (2 ∪ 3)    (associative)
    1 ∪ 2 = 2 ∪ 1                (commutative)
    1 ∪ 1 = 1                    (idempotent)
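
The three identities can be checked directly with Ruby's standard `Set`, where `|` is union. The sets `a`, `b` and `c` below are arbitrary examples standing in for the sets the slide calls 1, 2 and 3.

```ruby
require 'set'

a = Set[1, 2]
b = Set[2, 3]
c = Set[3, 4]

associative = ((a | b) | c) == (a | (b | c))  # grouping does not matter
commutative = (a | b) == (b | a)              # order does not matter
idempotent  = (a | a) == a                    # repeating does not matter

puts associative, commutative, idempotent
```

These are the properties that let replicas apply the same updates in different orders, or more than once, and still end up in the same state.
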
  53. CRDT in practice listed_2015_08_14 “books favourited (by any user) today”

  54. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “A Wrinkle In Time”
    Nodes: N1, N2, N3
  55. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “A Wrinkle In Time”
    N1: {“A Wrinkle In Time”}
    N2, N3: nothing yet
  56. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “A Wrinkle In Time”
    N1: {“A Wrinkle In Time”}
    N2: {“A Wrinkle In Time”}
    N3: {“A Wrinkle In Time”}
    replicate! replicate!
  57. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “Where the Wild Things Are”
    N1: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N2: {“A Wrinkle In Time”}
    N3: {“A Wrinkle In Time”}
  58. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “Where the Wild Things Are”
    N1: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N2: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N3: {“A Wrinkle In Time”}
    replicate!
  59. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “Where the Wild Things Are”
    N1: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N2: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N3: {“A Wrinkle In Time”}
    replicate! replicate!
  60. CRDT in practice: listed_2015_08_14, “books favourited (by any user) today”
    Incoming write: “Where the Wild Things Are”
    N1: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N2: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N3: {“A Wrinkle In Time”}
    replicate! replicate! X
  61. Read listed_2015_08_14: Set Union
    N1: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N2: {“A Wrinkle In Time”, “Where the Wild Things Are”}
    N3: {“A Wrinkle In Time”}
    N1 ∪ N2 ∪ N3 = {“A Wrinkle In Time”, “Where the Wild Things Are”}
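
The walk-through on these slides can be replayed with a tiny grow-only set in Ruby. This is a generic sketch, not Riak code: the class name `GSetReplica` is hypothetical, replication is modelled as an explicit `merge` call, and only adds are supported (removal, which Riak sets do offer, needs more machinery than a plain union).

```ruby
require 'set'

# A grow-only set replica: adds are local, merges are set unions.
class GSetReplica
  attr_reader :items

  def initialize
    @items = Set.new
  end

  def add(item)
    @items << item
  end

  # Merging is a union, so it is safe to repeat and to apply in any order.
  def merge(other)
    @items |= other.items
  end
end

n1, n2, n3 = Array.new(3) { GSetReplica.new }

n1.add('A Wrinkle In Time')
[n2, n3].each { |replica| replica.merge(n1) }  # both replications succeed

n1.add('Where the Wild Things Are')
n2.merge(n1)                                   # the replication to n3 is lost

# A read unions whatever the replicas hold.
read = [n1, n2, n3].map(&:items).reduce(:|)
p read.to_a
```

Even though n3 never saw the second book, the union on read returns both titles, matching the last slide of the sequence.
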
  62. Riak CRDTs: Riak Datatypes (Sets, Counters, Maps)
    An API with familiar abstractions for the underlying math and magic of CRDTs.
  63. Riak CRDTs: Set CRDT
    favourite_books = client.bucket('favourite_books')
    my_faves_set = Riak::Crdt::Set.new(favourite_books, 'flohdot', 'sets')
    my_faves_set.add('Thinking, Fast and Slow')
    my_faves_set.add('Oryx and Crake')
    my_faves_set.remove('Oryx and Crake')
  64. Riak CRDTs: Counter Datatype
    books_published_day = client.bucket('books_published_day')
    today_count = Riak::Crdt::Counter.new(books_published_day, '2015_06_16', 'counters')
    today_count.increment
    today_count.increment(5)
    today_count.decrement(2)
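
For intuition about how a counter can converge when different nodes increment and decrement independently, here is a generic PN-counter sketch in plain Ruby. The talk does not describe how Riak implements its counters, so the class `PNCounter` and everything in it is an illustrative assumption, not Riak's code.

```ruby
# Generic PN-counter: each replica tracks its own increments and decrements.
class PNCounter
  attr_reader :incs, :decs

  def initialize(id)
    @id = id
    @incs = Hash.new(0)
    @decs = Hash.new(0)
  end

  def increment(by = 1)
    @incs[@id] += by
  end

  def decrement(by = 1)
    @decs[@id] += by
  end

  def value
    @incs.values.sum - @decs.values.sum
  end

  # Merge takes the per-replica maximum, which is associative,
  # commutative and idempotent.
  def merge(other)
    @incs = merge_max(@incs, other.incs)
    @decs = merge_max(@decs, other.decs)
  end

  private

  def merge_max(mine, theirs)
    merged = mine.merge(theirs) { |_id, a, b| [a, b].max }
    merged.default = 0
    merged
  end
end

a = PNCounter.new('n1')
b = PNCounter.new('n2')
a.increment
a.increment(5)
b.decrement(2)

a.merge(b)
b.merge(a)
puts a.value   # => 4
puts b.value   # => 4
```

The same three operations as on the slide (increment, increment by 5, decrement by 2) are split across two replicas here, and both replicas agree on 4 once they have merged.
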
  65. Riak CRDTs: Map Datatype
    user_profiles = client.bucket('user_profiles')
    flohdot = Riak::Crdt::Map.new(user_profiles, '2015_06_16', 'maps')
    flohdot.batch do |m|
      m.registers['first_name'] = 'Florencia'  # string
      m.flags['pro_user'] = true               # boolean
      m.counters['logins'].increment
    end
  66. Riak CRDTs: Map Datatype
    user_profiles = client.bucket('user_profiles')
    flohdot = Riak::Crdt::Map.new(user_profiles, '2015_06_16', 'maps')
    flohdot.batch do |m|
      m.registers['first_name'] = 'Florencia'  # string
      m.flags['pro_user'] = true               # boolean
      m.counters['logins'].increment
      # yo dawg i herd you like maps so i put some maps in your maps
    end
  67. Data Modeling

  68. Riak data design
    • Are your objects immutable?
    • What kind of consistency/accuracy is required?
    • Do you need manual conflict resolution, or will CRDTs do?
    • Do you need search?
  69. Library data
    { "title" : "A Game of Thrones", "author" : "George RR Martin", "year" : "1996" … }
  70. User-curated lists
    “Summer book club”
    “NYT Bestsellers 2015”
    “Books I couldn’t finish”
  71. Leverage namespacing
    • Finding objects is fast.
    • No hard limit or performance impact on number of buckets.
    • Extra namespace: bucket types (multitenancy?)
  72. More goodies
    • Search with Solr
    • map/reduce
    • Multi-datacenter replication
  73. Sources/Learn more
    • A Little Riak Book by Eric Redmond (*give me your name if you want the e-book)
    • 7 Databases in 7 Weeks by Eric Redmond & Jim R Wilson
    • Riak docs http://docs.basho.com/riak/latest
    • Hector Castro @ Big Ruby 2014 https://www.youtube.com/watch?v=-_3Us7Ystyg#aid=P-4heI_bFwo
    • Peter Bourgon @ Strangeloop 2014 https://www.youtube.com/watch?v=em9zLzM8O7c
    • Kyle Kingsbury’s Jepsen blog series https://aphyr.com/tags/jepsen
  74. Thanks! Get in touch!
    @flohdot (Twitter, Github, etc)
    peerio.com • @peerio • github.com/PeerioTechnologies
    Hiring… soon! & we use Riak.