Not Only SQL

NoSQL overview with Riak and CouchDB details

I was invited by my former employer Qbranch to give a presentation on NoSQL in general and chose to also talk about Riak and CouchDB in particular.

Approx 60 minutes.

Mårten Gustafson

April 15, 2010

Transcript

  1. Not only SQL

    Mårten Gustafson http://marten.gustafson.pp.se/
    Qbranch CODE tech-meet @ 2010-04-14
    Thursday, April 15, 2010
  2. What?

    “NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases” - Wikipedia
  3. What?

    “NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases” - Wikipedia
    Not a single technique
    Not a single type of data
    Not a single type of use case
  4. Why?

    • Non-relational
    • Schema-less
    • “Easily” scalable
    • REST/JSON API = web friendly
  5. What’s out there?

                     Storage type    License             Implemented in
    Amazon Dynamo    Key/Value       n/a                 ?
    Cassandra        Column family   ASL 2.0             Java
    CouchDB          Document        ASL 2.0             Erlang
    Dynomite         Key/Value       BSD/MIT-style       Erlang
    HBase            Column family   ASL 2.0             Java
    MongoDB          Document        AGPL v3.0           C++
    Neo4J            Graph           AGPL v3.0 / Comm    Java
    Riak             Key/Value       ASL 2.0             Erlang
    Redis            Key/Value       BSD/MIT-style       C
    Scalaris         Key/Value       ASL 2.0             Erlang
    Tokyo Cabinet    Key/Value       LGPL                C
    Voldemort        Key/Value       ASL 2.0             Java
  6. Distribution

    • Master / Slave
    • Master / Slave(s)
    • Masterless (Master / Master)
  7. Distribution

    Columns: Masterless, Master/Slave, Hot standby
    Amazon Dynamo: X
    Cassandra: X
    CouchDB: X
    Dynomite: X
    HBase: ?
    MongoDB: X X
    Neo4J*:
    Riak: X
    Redis: X
    Scalaris: X
    Tokyo Cabinet:
    Voldemort: X
    * Neo4J HA coming “soon”
  8. Common factor

    “...of the web...”
    Of the who?!
  9. Of the web

    “...Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated” - http://jacobian.org/writing/of-the-web/
  10. Of the web

    “...CouchDB may succeed, and it may fail; who knows. I’m sure of one thing, though: this is what the software of the future looks like” - http://jacobian.org/writing/of-the-web/
  11. So freakin’ what?!

    All your webish skillz and tools apply...
  12. So freakin’ what?!

    All your webish skillz and tools apply...
    proxies
    load balancers
    caches
    HTTP client libs (etag, if-modified-since, etc)
    language-, platform- and OS-neutral
    MIME / Content-Type
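To illustrate the slide's point that ordinary HTTP tooling carries over, here is a minimal sketch of an ETag-based conditional GET. It is a hypothetical in-memory stand-in, not the Riak or CouchDB API: `store`, `handleGet`, the path, and the ETag value are all assumptions made for illustration.

```javascript
// Hypothetical in-memory "server": maps a path to a body and its ETag.
const store = new Map([
  ['/docs/42', { body: '{"browser":"mozilla"}', etag: '"rev-1"' }],
]);

// Answer a GET, honouring If-None-Match the way HTTP caches expect.
function handleGet(path, headers = {}) {
  const entry = store.get(path);
  if (!entry) return { status: 404, headers: {}, body: '' };
  if (headers['if-none-match'] === entry.etag) {
    // The client's copy is current, so no body is sent.
    return { status: 304, headers: { etag: entry.etag }, body: '' };
  }
  return { status: 200, headers: { etag: entry.etag }, body: entry.body };
}

// The first request fetches the body; the second revalidates with the ETag.
const first = handleGet('/docs/42');
const second = handleGet('/docs/42', { 'if-none-match': first.headers.etag });
console.log(first.status, second.status); // prints "200 304"
```

Any proxy, cache, or client library that understands these headers can sit in front of such a store unchanged.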
  13. These guys can just suck it

    HTTP/REST is integration that works (YMMV)
  14. Buckle up, Dorothy, ’cause Kansas is going bye-bye
  15. I got keys but no locks

  16. Riak

    Decentralized key-value store
    A flexible map/reduce engine
    HTTP/JSON API
    A database ideally suited for Web applications
  17. The Ring

  18. The Ring

    ring size = 12 (slices numbered 1 to 12)
  19. The Ring

    One Ring size to rule them all,
    One Ring size to find them,
    One Ring size to bring them all
    and in the cluster bind them...
  20. Consistent Hashing Store/Save (PUT)

  22. Consistent Hashing

    Read (GET)
    “I want [key]” is answered by: where is [key] on the ring?
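The PUT and GET slides boil down to one idea: hash the key, and the hash says which slice of the ring holds it. Below is a minimal sketch of that idea for a ring of size 12. The FNV-1a hash is swapped in only to keep the example dependency-free and is not claimed to be what Riak uses; `fnv1a`, `partitionFor`, `put`, `get`, and the sample key are hypothetical names.

```javascript
const RING_SIZE = 12; // ring size used on the slides

// Illustrative 32-bit FNV-1a hash (an assumption, chosen for brevity).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Split the 32-bit hash space into RING_SIZE equal slices, numbered 1..12.
function partitionFor(key) {
  const sliceWidth = 2 ** 32 / RING_SIZE;
  return Math.floor(fnv1a(key) / sliceWidth) + 1;
}

// One Map per slice stands in for the storage behind each partition.
const partitions = Array.from({ length: RING_SIZE }, () => new Map());

// PUT and GET use the same hash, so a read looks in the slice the write used.
function put(key, value) {
  partitions[partitionFor(key) - 1].set(key, value);
}
function get(key) {
  return partitions[partitionFor(key) - 1].get(key);
}

put('user:marten', { name: 'Mårten' });
console.log(partitionFor('user:marten'), get('user:marten'));
```

Because the slice is computed from the key alone, no lookup table of keys is needed to answer "where is it on the ring?".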
  24. Cluster

    Instance A, Instance B, Instance C
    ring size = 12
    instances = 3
    ring size / instances ≈ slices per instance (12 / 3 = 4)
  26. Cluster - Read (GET)

    Instance A, Instance B, Instance C
  27. Cluster - Read (GET)

    Instance A, Instance B, Instance C
    Client: I can haz [key]?
    Instance: Hm, [key] lives in a slice of the ring owned by instance C.
  28. Cluster - Read (GET)

    Instance A, Instance B, Instance C
    Client: I can haz [key]?
    Instance: Hey C! I need [key]
    Instance C: Okidoki, now where’s he... ah yeah, in my fourth slice
  29. Cluster - Read (GET)

    Instance A, Instance B, Instance C
    Client: I can haz [key]?
    Instance C: Here ya go
    Instance: Cheers!
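The read walkthrough can be sketched as follows: any instance may receive the GET, work out which slice the key hashes to, and forward the request to the instance that owns that slice. This sketch assumes slices are claimed round-robin across the three instances, which the slides do not state. The hash function and the names `ownerOf`, `put`, `get`, and `kitten.jpg` are likewise hypothetical.

```javascript
const RING_SIZE = 12;
const INSTANCES = ['A', 'B', 'C'];

// Illustrative 32-bit FNV-1a hash (an assumption, not Riak's hash).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Slice number 1..12 for a key.
function partitionFor(key) {
  return Math.floor(fnv1a(key) / (2 ** 32 / RING_SIZE)) + 1;
}

// Assumed claim strategy: slices are dealt out round-robin, so each of the
// three instances ends up owning 12 / 3 = 4 slices.
function ownerOf(partition) {
  return INSTANCES[(partition - 1) % INSTANCES.length];
}

// Each instance stores only the keys that fall into slices it owns.
const data = { A: new Map(), B: new Map(), C: new Map() };

function put(key, value) {
  data[ownerOf(partitionFor(key))].set(key, value);
}

// A GET may arrive at any instance; it is forwarded to the owner if needed.
function get(receivedBy, key) {
  const owner = ownerOf(partitionFor(key));
  return { owner, forwarded: owner !== receivedBy, value: data[owner].get(key) };
}

put('kitten.jpg', 'binary data');
for (const instance of INSTANCES) {
  console.log(instance, get(instance, 'kitten.jpg'));
}
```

Whichever instance is asked, the client gets the same answer; only the `forwarded` flag differs.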
  30. Riak “stuff”

  43. Riak “stuff”

    Bucket: Container/keyspace. Determines number of replicas for its contents.
    Consistent Hashing: Key hashing technique used to distribute keys on the ring.
    Gossiping: Shares state, bucket and ring knowledge in the cluster.
    Hinted Handoff: Covering for a failed “neighbor” node while gone.
    Links: Allows retrieval of “weakly” linked objects.
    Merkle Tree: Data structure for efficient summary about keys. Gossiped.
    Node: One server. Runs vnodes, which claim partitions.
    Partition: One slice (part) of the ring.
    Read Repair: Auto correction of out-of-date objects.
    Replica: Number of copies of the same object in the cluster.
    Ring: The complete “space”, divided into partitions which are claimed by vnodes.
    Vector Clock: Version control technique used for objects.
    Vnode: Runs in a node and claims one partition in the ring.
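Of the terms above, the vector clock is perhaps the least self-explanatory, so here is a minimal sketch of the general technique. It models a clock as a plain object of per-actor counters. This is a generic illustration and not Riak's encoding or API; `increment`, `descends`, `compare`, and the client ids are hypothetical.

```javascript
// A vector clock is modelled as a plain object: { actorId: counter }.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True when clock a has seen every update that clock b has seen.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Decide how two versions of the same object relate to each other.
function compare(a, b) {
  const aOverB = descends(a, b);
  const bOverA = descends(b, a);
  if (aOverB && bOverA) return 'equal';
  if (aOverB) return 'a is newer';
  if (bOverA) return 'b is newer';
  return 'conflict'; // concurrent updates, neither version supersedes the other
}

const v1 = increment({}, 'client-x'); // { 'client-x': 1 }
const v2 = increment(v1, 'client-x'); // { 'client-x': 2 }
const v3 = increment(v1, 'client-y'); // { 'client-x': 1, 'client-y': 1 }
console.log(compare(v2, v1)); // prints "a is newer"
console.log(compare(v2, v3)); // prints "conflict"
```

The "conflict" outcome is the interesting one: two clients both updated version `v1` without seeing each other's write, so neither result can simply replace the other.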
  44. Riak - Takeaways

    • No single point of failure
    • Choose your levels for:
      • availability
      • consistency
      • partition tolerance
  45. But wait, there’s more...

    • Binary data + Content-Type = whatever
      • MP3s, images, text, ...
    • Map/Reduce
      • Local data, parallel
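"Local data, parallel" can be read as: the map step runs on each instance against the objects it already holds, and only the small intermediate results travel to the reduce step. Below is a minimal sketch of that shape, counting objects by Content-Type. It is not Riak's map/reduce API; the data, `mapPhase`, and `reducePhase` are hypothetical, and the per-instance map calls run one after another here although they are independent and could run in parallel.

```javascript
// Hypothetical objects, grouped by the instance that stores them.
const localData = {
  A: [
    { key: 'song', contentType: 'audio/mpeg' },
    { key: 'logo', contentType: 'image/png' },
  ],
  B: [{ key: 'notes', contentType: 'text/plain' }],
  C: [
    { key: 'photo', contentType: 'image/png' },
    { key: 'readme', contentType: 'text/plain' },
  ],
};

// Map phase: each instance only looks at its own objects.
function mapPhase(objects) {
  return objects.map((obj) => [obj.contentType, 1]);
}

// Reduce phase: merge the per-instance results into one set of totals.
function reducePhase(pairs) {
  const totals = {};
  for (const [type, count] of pairs) {
    totals[type] = (totals[type] || 0) + count;
  }
  return totals;
}

const mapped = Object.values(localData).map(mapPhase); // one result list per instance
const totals = reducePhase(mapped.flat());
console.log(totals); // { 'audio/mpeg': 1, 'image/png': 2, 'text/plain': 2 }
```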
  46. This slide intentionally left blank

  47. Document Store Relax

  48. CouchDB

    Document oriented database
    Kick ass replication
    HTTP/JSON API
    Map/reduce view (index) definitions
  49. World view

    One document == JSON
    One document == One record
    Many Documents == One database
    No schema
  50. A document

    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }

    "_id": Key, either you choose it or CouchDB does it for you
    "_rev": Revision number
  51. Views

    Filter
    Collate
    Aggregate

  52. Views

    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }

    +

    function(doc) {
      emit(doc.platform.browser, doc.platform.version);
    }

    =

    {
      "total_rows": 58,
      "offset": 0,
      "rows": [
        {
          "id": "b098445d587b1f347e48e1a79301de02",
          "key": "mozilla",
          "value": "1.9.1.8"
        }
      ]
    }
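The "document + map function = view rows" equation can be tried outside CouchDB with a small simulation. The sketch below is an assumption-laden stand-in: in CouchDB `emit` is a global supplied by the view server, whereas here it is passed in explicitly, and `buildView` and the second document are hypothetical.

```javascript
const docs = [
  {
    _id: 'b098445d587b1f347e48e1a79301de02',
    platform: { browser: 'mozilla', version: '1.9.1.8' },
    timestamp: 1270131033337,
  },
  {
    _id: 'hypothetical-second-doc', // made up for illustration
    platform: { browser: 'chrome', version: '5.0' },
    timestamp: 1270131034000,
  },
];

// Map function in the spirit of the slide: one row per document,
// keyed by browser name with the version as the value.
function mapFn(doc, emit) {
  if (doc.platform) emit(doc.platform.browser, doc.platform.version);
}

// Run the map function over every document and collect rows sorted by key.
function buildView(allDocs, map) {
  const rows = [];
  for (const doc of allDocs) {
    map(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

const view = buildView(docs, mapFn);
console.log(JSON.stringify(view, null, 2));
```

The rows come back ordered by key, which is what makes a view usable as an index.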
  53. Views

    Views are stored as an accessible web resource on disk and incrementally updated as well as replicated with the database
  54. Replication

    Peer to peer
    Online/Offline
    Conflict detection and resolution
    Any number of nodes
    Local
    Remote
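Replication in CouchDB is triggered over the same HTTP/JSON API, by POSTing a source and a target to `/_replicate`. The sketch below only builds those requests and does not send them. The helper `replicationRequest`, the database name, and the host are hypothetical, and two-way replication is shown simply as one request per direction.

```javascript
// Build (but do not send) the HTTP request that asks a CouchDB server
// to replicate from one database to another.
function replicationRequest(source, target) {
  return {
    method: 'POST',
    path: '/_replicate',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ source, target }),
  };
}

// Hypothetical local database name and remote database URL.
const local = 'visits';
const remote = 'http://couch.example.com:5984/visits';

// Local to remote, and remote to local.
const push = replicationRequest(local, remote);
const pull = replicationRequest(remote, local);

console.log(push.body); // prints {"source":"visits","target":"http://couch.example.com:5984/visits"}
console.log(pull.body); // prints {"source":"http://couch.example.com:5984/visits","target":"visits"}
```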
  55. Replication

  60. CouchDB “stuff”

  65. CouchDB “stuff”

    MVCC: Multi version concurrency control. Writers do not block readers. Readers do not block writers.
    Append only: Hence, won’t corrupt its data files.
    Compaction: Append only will cause data files to grow. Compaction to the rescue, in the background - for your pleasure.
    ACID: Awesome, Cool, Impressive, Dope
    BDCRR: Bi-directional, conflict resolving, replication
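The relationship between "append only" and "compaction" is easy to see in a toy model: every write appends a new revision and never touches old ones, so the log grows until compaction keeps only the latest revision of each document. This is a minimal in-memory sketch, not CouchDB's file format. Revisions are plain integers here rather than CouchDB's `N-hash` strings, and `write`, `read`, and `compact` are hypothetical names.

```javascript
// The "data file" is an array that is only ever appended to.
let log = [];

// Find the most recent revision of a document by scanning from the end.
function read(id) {
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].id === id) return log[i];
  }
  return undefined;
}

// A write never modifies earlier entries; it appends a new revision.
function write(id, doc) {
  const previous = read(id);
  const rev = previous ? previous.rev + 1 : 1;
  log.push({ id, rev, doc });
  return rev;
}

// Compaction copies only the latest revision of each document to a new log.
function compact() {
  const latest = new Map();
  for (const entry of log) latest.set(entry.id, entry);
  log = [...latest.values()];
}

write('a', { n: 1 });
write('a', { n: 2 });
write('b', { n: 1 });
console.log(log.length); // prints 3
compact();
console.log(log.length); // prints 2
console.log(read('a').doc); // prints { n: 2 }
```

Because old entries are never modified in place, a reader holding an earlier revision is unaffected by a concurrent write, which is the intuition behind the MVCC entry above.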
  66. CouchDB - Takeaways

    • Kick ass replication
    • Views are fast
    • Can host and serve complete webapps
  67. Outro

    • Test one or more NoSQL thingys
    • Get familiar with Brewer’s CAP theorem
    • Get familiar with the Dynamo paper
  68. Over and out.

    Mårten Gustafson
    @martengustafson
    http://marten.gustafson.pp.se/
    marten.gustafson@gmail.com