Not Only SQL

NoSQL overview with Riak and CouchDB details

I was invited by my former employer Qbranch to give a presentation on NoSQL in general and chose to also talk about Riak and CouchDB in particular.

Approx 60 minutes.

Mårten Gustafson

April 15, 2010

Transcript

  1-2. What?
    “NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases” - Wikipedia
    Not a single technique
    Not a single type of data
    Not a single type of use case
    Thursday, April 15, 2010
  3. What’s out there?
    Name           Storage type  License            Implemented in
    Amazon Dynamo  Key/Value     n/a                ?
    Cassandra      Columnfamily  ASL 2.0            Java
    CouchDB        Document      ASL 2.0            Erlang
    Dynomite       Key/Value     BSD/MIT-style      Erlang
    HBase          Columnfamily  ASL 2.0            Java
    MongoDB        Document      AGPL v3.0          C++
    Neo4J          Graph         AGPL v3.0 / Comm   Java
    Riak           Key/Value     ASL 2.0            Erlang
    Redis          Key/Value     BSD/MIT-style      C
    Scalaris       Key/Value     ASL 2.0            Erlang
    Tokyo Cabinet  Key/Value     LGPL               C
    Voldemort      Key/Value     ASL 2.0            Java
  4. Distribution
    • Master / Slave
    • Master / Slave(s)
    • Masterless (Master / Master)
  5. Distribution
    Masterless | Master/Slave | Hot standby
    Amazon Dynamo X
    Cassandra X
    CouchDB X
    Dynomite X
    HBase ?
    MongoDB X X
    Neo4J*
    Riak X
    Redis X
    Scalaris X
    Tokyo Cabinet
    Voldemort X
    * Neo4J HA coming “soon”
  6. Of the web
    “...Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated” - http://jacobian.org/writing/of-the-web/
  7. Of the web
    “...CouchDB may succeed, and it may fail; who knows. I’m sure of one thing, though: this is what the software of the future looks like” - http://jacobian.org/writing/of-the-web/
  8. So freakin’ what?! All your webish skillz and tools apply...
    proxies
    load balancers
    caches
    HTTP client libs (etag, if-modified-since, etc)
    language-, platform- and OS-neutral
    MIME / Content-Type
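
The etag / if-modified-since point on slide 8 can be made concrete with a tiny conditional GET. This is only a sketch of the HTTP idea, not any particular server: the `resource` object, its ETag value and the `handleGet` function are hypothetical names made up for the example.

```javascript
// Hypothetical in-memory resource; the ETag value is made up for this example.
const resource = {
  etag: '"rev-1"',
  contentType: "application/json",
  body: '{"hello":"world"}',
};

// Answer a GET, honouring If-None-Match the way an HTTP server or cache would.
function handleGet(res, requestHeaders = {}) {
  if (requestHeaders["if-none-match"] === res.etag) {
    // The client's copy is still current: reply 304 and send no body.
    return { status: 304, headers: { etag: res.etag }, body: null };
  }
  return {
    status: 200,
    headers: { etag: res.etag, "content-type": res.contentType },
    body: res.body,
  };
}

const first = handleGet(resource);
const second = handleGet(resource, { "if-none-match": first.headers.etag });
console.log(first.status, second.status); // 200 304
```

Because the check only needs standard headers, any proxy, cache or client library that speaks HTTP can take part without knowing what sits behind the URL.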
  9. These guys can just suck it
    HTTP/REST is integration that works (YMMV)
  10. Riak
    Decentralized key-value store
    A flexible map/reduce engine
    HTTP/JSON API
    A database ideally suited for Web applications
  11. The Ring
    ring size = 12 (slices numbered 1 to 12)
  12. The Ring
    One Ring size to rule them all, One Ring size to find them, One Ring size to bring them all and in the cluster bind them...
  13-14. Consistent Hashing
    Read (GET)
    “I want [key]” is answered by: where is [key] on the ring?
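
A minimal sketch of the "where is it on the ring?" question, assuming a ring of 12 equal slices as on the earlier slide. The FNV-1a hash and the 32-bit hash space are stand-ins chosen to keep the example dependency-free; they are assumptions, not what Riak actually uses, and `partitionFor` is a hypothetical helper name.

```javascript
// Stand-in 32-bit FNV-1a hash (an assumption for brevity, not Riak's hash function).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

const HASH_SPACE = 2 ** 32; // size of the stand-in hash space
const RING_SIZE = 12;       // number of slices on the ring

// Slice (1..ringSize) of the ring that a key falls into.
function partitionFor(key, ringSize = RING_SIZE) {
  const sliceWidth = HASH_SPACE / ringSize;
  return Math.floor(fnv1a(key) / sliceWidth) + 1;
}

console.log(partitionFor("some-key"));
```

The same key always hashes to the same slice, so any participant that knows the ring size can answer the question without asking anyone else.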
  15-16. Cluster
    Instance A, Instance B, Instance C
    ring size = 12
    instances = 3
    ring size / nodes = ~slices per instance
  17. Cluster - Read (GET)
    Instance A, Instance B, Instance C
    “I can haz [key]?”
    “Hm, [key] lives in a slice of the ring owned by instance C.”
  18. Cluster - Read (GET)
    Instance A, Instance B, Instance C
    “I can haz [key]?”
    “Hey C! I need [key]”
    “Okidoki, now where’s he...ah yeah in my fourth slice”
  19. Cluster - Read (GET)
    Instance A, Instance B, Instance C
    “I can haz [key]?”
    “Here ya go”
    “Cheers!”
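
The cluster slides can be sketched as two small steps: hand out the 12 slices to the 3 instances, then let whichever instance receives a GET either serve it or forward it to the owner. The round-robin claim order and the names `claimPartitions` and `routeGet` are assumptions for illustration; the slides do not show how slices are actually assigned.

```javascript
const RING_SIZE = 12;
const INSTANCES = ["A", "B", "C"];

// Hand out slices 1..ringSize round-robin (an assumed claim strategy).
function claimPartitions(ringSize, instances) {
  const owners = new Map();
  for (let p = 1; p <= ringSize; p++) {
    owners.set(p, instances[(p - 1) % instances.length]);
  }
  return owners;
}

// Any instance may receive the GET; it serves locally or forwards to the owner.
function routeGet(receiver, partition, owners) {
  const owner = owners.get(partition);
  return { receiver, owner, forwarded: owner !== receiver };
}

const owners = claimPartitions(RING_SIZE, INSTANCES);

// With round-robin claims, C owns slices 3, 6, 9 and 12 (four slices each).
const slicesOfC = [...owners].filter(([, o]) => o === "C").map(([p]) => p);
console.log(slicesOfC);

// A receives the request, finds that slice 12 belongs to C, and forwards it.
console.log(routeGet("A", 12, owners));
```

With 12 slices and 3 instances each instance ends up with 4 slices, which matches the "ring size / nodes" arithmetic on the slide.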
  20-31. Riak “stuff”
    Bucket: Container/keyspace. Determines number of replicas for its contents.
    Consistent Hashing: Key hashing technique used to distribute keys on the ring.
    Gossiping: Shares state, bucket and ring knowledge in the cluster.
    Hinted Handoff: Covering for a failed “neighbor” node while gone.
    Links: Allows retrieval of “weakly” linked objects.
    Merkle Tree: Data structure for efficient summary about keys. Gossiped.
    Node: One server. Runs vnodes which claim partitions.
    Partition: One slice (part) of the ring.
    Read Repair: Auto correction of out-of-date objects.
    Replica: Number of copies of the same object in the cluster.
    Ring: The complete “space”, divided into partitions which are claimed by vnodes.
    Vector Clock: Version control technique used for objects.
    Vnode: Runs in a node and claims one partition in the ring.
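
Two of the terms above, Vector Clock and Read Repair, fit in one small sketch. It assumes a vector clock is a plain object mapping an actor name to a counter, and it ignores conflicting (concurrent) versions during repair; the function names and the client names are hypothetical, and none of this is Riak's actual implementation.

```javascript
// A vector clock here is a plain object: actor name -> counter.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True if clock a has seen every update that clock b has seen.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

function compare(a, b) {
  const aDescendsB = descends(a, b);
  const bDescendsA = descends(b, a);
  if (aDescendsB && bDescendsA) return "equal";
  if (aDescendsB) return "a-newer";
  if (bDescendsA) return "b-newer";
  return "conflict"; // concurrent updates, neither version wins on its own
}

// Read repair sketch: find the newest copy and overwrite the stale ones.
// Simplification: conflicting copies are left untouched.
function readRepair(replicas) {
  let newest = replicas[0];
  for (const replica of replicas) {
    if (compare(replica.clock, newest.clock) === "a-newer") newest = replica;
  }
  let repaired = 0;
  for (const replica of replicas) {
    if (compare(newest.clock, replica.clock) === "a-newer") {
      replica.value = newest.value;
      replica.clock = { ...newest.clock };
      repaired++;
    }
  }
  return { value: newest.value, repaired };
}

const v1 = increment({}, "clientX");      // { clientX: 1 }
const v2 = increment(v1, "clientX");      // { clientX: 2 }
const sibling = increment(v1, "clientY"); // { clientX: 1, clientY: 1 }

console.log(compare(v2, v1));      // a-newer
console.log(compare(v2, sibling)); // conflict

const replicas = [
  { value: "old", clock: v1 },
  { value: "new", clock: v2 },
  { value: "old", clock: v1 },
];
console.log(readRepair(replicas)); // { value: 'new', repaired: 2 }
```

The point of the clock is that "newer" is decided from causal history rather than wall-clock time, so two writers that never saw each other's update show up as a conflict instead of one silently overwriting the other.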
  32. Riak - Takeaways
    • No single point of failure
    • Choose your levels for:
      • availability
      • consistency
      • partition tolerance
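
One common way Dynamo-style stores expose the "choose your levels" trade-off is through three numbers: how many replicas exist, how many must answer a read, and how many must acknowledge a write. The sketch below only does that arithmetic under a strict-quorum assumption; the parameter names `n`, `r`, `w` and the function `describeQuorum` are illustrative and not taken from the slides.

```javascript
// n = replicas per object, r = replicas that must answer a read,
// w = replicas that must acknowledge a write (illustrative parameter names).
function describeQuorum(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("need 1 <= r <= n and 1 <= w <= n");
  }
  return {
    // If r + w > n, every read set shares at least one replica with the last write set.
    readOverlapsLatestWrite: r + w > n,
    // Replicas that may be down while reads / writes still reach their quorum.
    readsSurviveFailures: n - r,
    writesSurviveFailures: n - w,
  };
}

console.log(describeQuorum(3, 2, 2)); // overlap: true, survives 1 failure each
console.log(describeQuorum(3, 1, 1)); // overlap: false, survives 2 failures each
```

Lower numbers favour availability, higher numbers favour consistency, which is the dial the slide is pointing at.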
  33. But wait, there’s more...
    • Binary data + Content-Type = whatever
      • MP3s, Images, Text, ...
    • Map/Reduce
      • Local data, parallel
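
The "local data, parallel" idea behind Map/Reduce can be sketched with a few in-memory partitions. The map step runs against each partition's own objects and only small partial results are combined at the end. The sample objects, the per-partition pre-reduce and the function names are assumptions for illustration, and the partitions are processed one after another here rather than truly in parallel.

```javascript
// Hypothetical stored objects, grouped by the partition that holds them.
const partitions = [
  [
    { key: "song1", contentType: "audio/mpeg" },
    { key: "note", contentType: "text/plain" },
  ],
  [{ key: "cat", contentType: "image/png" }],
  [{ key: "song2", contentType: "audio/mpeg" }],
];

// Map: one (type, count) pair per stored object.
function mapPhase(object) {
  return [{ type: object.contentType, count: 1 }];
}

// Reduce: sum the counts per type. Works on raw map output and on partial results.
function reducePhase(pairs) {
  const totals = new Map();
  for (const { type, count } of pairs) {
    totals.set(type, (totals.get(type) || 0) + count);
  }
  return [...totals].map(([type, count]) => ({ type, count }));
}

function mapReduce(parts) {
  // Each partition maps and pre-reduces its own local data...
  const partials = parts.map((objects) => reducePhase(objects.flatMap(mapPhase)));
  // ...and only the small partial results are combined centrally.
  return reducePhase(partials.flat());
}

const result = mapReduce(partitions);
console.log(result);
```

Because the reduce step accepts its own output as input, it does not matter how many partitions there are or in which order their partial results arrive.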
  34. World view
    One document == JSON
    One document == One record
    Many Documents == One database
    No schema
  35. A document
    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }
    _id: Key, either you choose it or CouchDB does it for you
    _rev: Revision number
  36. Views
    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }
    +
    function(doc) {
      emit(doc.platform.browser, doc.platform.version);
    }
    =
    {
      "total_rows": 58,
      "offset": 0,
      "rows": [
        {
          "id": "b098445d587b1f347e48e1a79301de02",
          "key": "mozilla",
          "value": "1.9.1.8"
        }
      ]
    }
  37. Views
    Views are stored as an accessible web resource on disk and incrementally updated as well as replicated with the database
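
How a map function turns documents into view rows can be imitated outside CouchDB in a few lines. This is a sketch only: in CouchDB `emit` is provided by the view server, whereas here it is passed in as a second argument, keys are sorted with plain string comparison, and the second document (and its id) is made up for the example.

```javascript
const docs = [
  {
    _id: "b098445d587b1f347e48e1a79301de02",
    platform: { browser: "mozilla", version: "1.9.1.8" },
    timestamp: 1270131033337,
  },
  {
    _id: "hypothetical-second-doc", // made-up document for the example
    platform: { browser: "webkit", version: "533.4" },
    timestamp: 1270131040000,
  },
];

// Map function: browser name as key, browser version as value.
function mapFn(doc, emit) {
  emit(doc.platform.browser, doc.platform.version);
}

// Run the map function over every document and collect the rows sorted by key.
function runView(documents, map) {
  const rows = [];
  for (const doc of documents) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    map(doc, emit);
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

const view = runView(docs, mapFn);
console.log(JSON.stringify(view, null, 2));
```

The real thing differs mainly in that the rows are kept in an on-disk index and only documents changed since the last query are re-mapped, which is the "incrementally updated" part of the slide.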
  38. Replication
    Peer to peer
    Online/Offline
    Conflict detection and resolution
    Any number of nodes
    Local
    Remote
  39-42. CouchDB “stuff”
    MVCC: Multi version concurrency control. Writers do not block readers. Readers do not block writers.
    Append only: Hence, won’t corrupt its data files.
    BDCRR: Bi-directional, conflict resolving, replication.
    Compaction: Append only will cause data files to grow. Compaction to the rescue, in the background - for your pleasure.
    ACID: Awesome, Cool, Impressive, Dope.
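
MVCC, append only and compaction fit together in one toy store. This is a rough sketch under stated assumptions: the class `AppendOnlyStore` is hypothetical, revisions are numbered `generation-suffix` with a log position standing in for the hash CouchDB uses, and everything lives in memory instead of a data file. The 201 and 409 status codes mirror what CouchDB returns over HTTP for a created document and an update conflict.

```javascript
class AppendOnlyStore {
  constructor() {
    this.log = [];           // every revision ever written, oldest first
    this.latest = new Map(); // _id -> newest revision object
  }

  get(id) {
    return this.latest.get(id);
  }

  // A write must name the revision it is based on, otherwise it is a conflict.
  put(doc) {
    const current = this.latest.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      return { status: 409, error: "conflict" };
    }
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    // Old revisions are never modified: the new one is appended to the log.
    const stored = Object.freeze({ ...doc, _rev: `${generation}-${this.log.length}` });
    this.log.push(stored);
    this.latest.set(doc._id, stored);
    return { status: 201, rev: stored._rev };
  }

  // Compaction: drop every revision that is no longer the newest for its _id.
  compact() {
    this.log = this.log.filter((entry) => this.latest.get(entry._id) === entry);
  }
}

const store = new AppendOnlyStore();
const created = store.put({ _id: "a", browser: "mozilla" });
const snapshot = store.get("a"); // a reader holding the first revision
const updated = store.put({ _id: "a", _rev: created.rev, browser: "webkit" });
const stale = store.put({ _id: "a", _rev: created.rev, browser: "opera" });

console.log(created.status, updated.status, stale.status); // 201 201 409
console.log(snapshot.browser); // mozilla (the reader's revision was never touched)

store.compact();
console.log(store.log.length); // 1
```

Since a write only ever appends, a reader holding an older revision keeps seeing a consistent object, and the price is a log that grows until compaction throws the superseded revisions away.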
  43. CouchDB - Takeaways
    • Kick ass replication
    • Views are fast
    • Can host and serve complete webapps
  44. Outro
    • Test one or more NoSQL thingys
    • Get familiar with Brewer’s CAP theorem
    • Get familiar with the Dynamo paper