Not Only SQL

NoSQL overview with Riak and CouchDB details

I was invited by my former employer Qbranch to give a presentation on NoSQL in general and chose to also talk about Riak and CouchDB in particular.

Approx 60 minutes.

Mårten Gustafson

April 15, 2010

Transcript

  1. Not only SQL
    Mårten Gustafson
    http://marten.gustafson.pp.se/
    Qbranch CODE tech-meet @ 2010-04-14
    Thursday, April 15, 2010

  2. What?
    “NoSQL is a movement promoting a loosely
    defined class of non-relational data stores that
    break with a long history of relational
    databases” - Wikipedia

  3. What?
    “NoSQL is a movement promoting a loosely
    defined class of non-relational data stores
    that break with a long history of relational
    databases” - Wikipedia
    Not a single technique
    Not a single type of data
    Not a single type of use case

  4. Why?
    • Non-relational
    • Schema-less
    • “Easily” scalable
    • REST/JSON API = web friendly

  5. What’s out there?
    Name           Storage type   License           Implemented in
    Amazon Dynamo  Key/Value      n/a               ?
    Cassandra      Column family  ASL 2.0           Java
    CouchDB        Document       ASL 2.0           Erlang
    Dynomite       Key/Value      BSD/MIT-style     Erlang
    HBase          Column family  ASL 2.0           Java
    MongoDB        Document       AGPL v3.0         C++
    Neo4J          Graph          AGPL v3.0 / Comm  Java
    Riak           Key/Value      ASL 2.0           Erlang
    Redis          Key/Value      BSD/MIT-style     C
    Scalaris       Key/Value      ASL 2.0           Erlang
    Tokyo Cabinet  Key/Value      LGPL              C
    Voldemort      Key/Value      ASL 2.0           Java

  6. Distribution
    • Master / Slave
    • Master / Slave(s)
    • Masterless (Master / Master)

  7. Distribution
    Masterless Master/Slave Hot standby
    Amazon Dynamo X
    Cassandra X
    CouchDB X
    Dynomite X
    HBase ?
    MongoDB X X
    Neo4J*
    Riak X
    Redis X
    Scalaris X
    Tokyo Cabinet
    Voldemort X
    * Neo4J HA coming “soon”

  8. Common factor
    “...of the web...”
    Of the who?!

  9. Of the web
    “...Django may be built for the Web, but
    CouchDB is built of the Web. I’ve never seen
    software that so completely embraces the
    philosophies behind HTTP. CouchDB
    makes Django look old-school in the same way
    that Django makes ASP look outdated”
    - http://jacobian.org/writing/of-the-web/

  10. Of the web
    “...CouchDB may succeed, and it may fail; who
    knows. I’m sure of one thing, though — this is
    what the software of the future looks like”
    - http://jacobian.org/writing/of-the-web/

  11. So freakin’ what?!
    All your webish skillz and tools apply...

  12. So freakin’ what?!
    All your webish skillz and tools apply...
    proxies
    load balancers
    caches
    HTTP client libs (etag, if-modified-since, etc)
    language-, platform- and OS-neutral
    MIME / Content-Type

  13. These guys can just suck it
    HTTP/REST is integration that works
    (YMMV)

  14. Buckle up, Dorothy, 'cause Kansas is going bye-bye

  15. I got keys but no locks

  16. Riak
    Decentralized key-value store
    A flexible map/reduce engine
    HTTP/JSON API
    A database ideally suited for Web applications
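To make the "HTTP/JSON API" point concrete, here is a minimal sketch of what storing and fetching one object could look like as plain HTTP requests. Nothing is sent over the network; the functions only describe the requests. The host, port 8098, the `/riak/<bucket>/<key>` path layout, and the helper names `objectUrl`, `storeRequest` and `fetchRequest` are assumptions for illustration, so check them against the Riak version you run.

```javascript
// Describe (but do not send) the HTTP requests for one object.
// BASE_URL, bucket and key values are placeholder assumptions.
const BASE_URL = "http://localhost:8098/riak";

// Build the resource URL for a bucket/key pair.
function objectUrl(bucket, key) {
  return `${BASE_URL}/${encodeURIComponent(bucket)}/${encodeURIComponent(key)}`;
}

// Store/Save maps to PUT with a Content-Type describing the body.
function storeRequest(bucket, key, value) {
  return {
    method: "PUT",
    url: objectUrl(bucket, key),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(value),
  };
}

// Read maps to GET on the same URL.
function fetchRequest(bucket, key) {
  return {
    method: "GET",
    url: objectUrl(bucket, key),
    headers: { Accept: "application/json" },
  };
}
```

Because these are ordinary HTTP requests, any HTTP client, proxy or cache can sit in front of them.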

  17. The Ring

  18. The Ring
    ring size = 12 (slices numbered 1 to 12)

  19. The Ring
    One Ring size to rule them all, One Ring size to
    find them, One Ring size to bring them all and in
    the cluster bind them...

  20. Consistent Hashing
    Store/Save (PUT)

  22. Consistent Hashing
    Read (GET)
    “I want [key]” is answered by: where is [key] on the ring?
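The lookup above can be sketched as a pure function: hash the key, then see which slice of the ring the hash lands in. This is a minimal sketch, not Riak's implementation. `fnv1a` is a small stand-in hash picked to keep the example dependency-free (a real store would use a stronger hash over a much larger ring), `RING_SIZE = 12` mirrors the slides, and `partitionFor` is a hypothetical name.

```javascript
// Ring with 12 equally sized slices, as on the slides.
const RING_SIZE = 12;

// FNV-1a, a simple 32-bit hash used as a stand-in (assumption).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime and keep the result as unsigned 32-bit.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a key to a slice number in 1..RING_SIZE.
function partitionFor(key) {
  const position = fnv1a(key);             // point on the 32-bit ring
  const sliceWidth = 2 ** 32 / RING_SIZE;  // width of one slice
  return Math.floor(position / sliceWidth) + 1;
}
```

The same key always lands in the same slice, which is why any instance can answer "where is it?" without asking a central directory.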

  24. Cluster
    Instance A
    Instance B
    Instance C
    ring size = 12
    instances = 3
    ring size / instances = ~slices per instance (12 / 3 = 4)
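As a worked example of the arithmetic above, 12 slices shared by 3 instances gives 4 slices each. The sketch below hands out slices round-robin. That assignment order is an assumption made for illustration (the slides do not say how slices are claimed), and `claimPartitions` is a hypothetical helper.

```javascript
// Assign ring slices 1..ringSize to instances in round-robin order.
function claimPartitions(ringSize, instances) {
  const owners = new Map(instances.map((name) => [name, []]));
  for (let slice = 1; slice <= ringSize; slice++) {
    const owner = instances[(slice - 1) % instances.length];
    owners.get(owner).push(slice);
  }
  return owners;
}

const owners = claimPartitions(12, ["A", "B", "C"]);
// owners.get("A") is [1, 4, 7, 10]: four slices, i.e. 12 / 3.
```

When the ring size is not evenly divisible by the number of instances, some instances end up with one slice more than others, hence the "~" on the slide.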

  26. Cluster - Read (GET)
    Instance A Instance B Instance C

  27. Cluster - Read (GET)
    Instance A Instance B Instance C
    I can haz [key]?
    Hm, [key] lives in a slice of the ring owned by instance C.

  28. Cluster - Read (GET)
    Instance A Instance B Instance C
    Okidoki, now where’s he... ah yeah, in my fourth slice
    I can haz [key]?
    Hey C! I need [key]

  29. Cluster - Read (GET)
    Instance A Instance B Instance C
    Here ya go
    I can haz [key]?
    Cheers!

  30. Riak “stuff”

  43. Riak “stuff”
    Bucket: Container/keyspace. Determines the number of replicas for its contents.
    Consistent Hashing: Key hashing technique used to distribute keys on the ring.
    Gossiping: Shares state, bucket and ring knowledge in the cluster.
    Hinted Handoff: Covering for a failed “neighbor” node while it is gone.
    Links: Allow retrieval of “weakly” linked objects.
    Merkle Tree: Data structure for an efficient summary of keys. Gossiped.
    Node: One server. Runs vnodes, which claim partitions.
    Partition: One slice (part) of the ring.
    Read Repair: Automatic correction of out-of-date objects.
    Replica: Number of copies of the same object in the cluster.
    Ring: The complete “space”, divided into partitions which are claimed by vnodes.
    Vector Clock: Version control technique used for objects.
    Vnode: Runs in a node and claims one partition in the ring.
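The vector clock entry is the least self-explanatory item in the list, so here is a minimal sketch of the idea, assuming a clock is just a map from actor name to counter. The function names `increment`, `descends` and `inConflict` and the actor names are hypothetical; this illustrates the general technique and is not Riak's internal representation.

```javascript
// A vector clock here is a plain object: { actorName: counter }.

// Return a copy of `clock` with `actor`'s counter advanced by one.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True when `a` has seen everything `b` has seen (a descends from b).
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Two versions conflict when neither descends from the other.
function inConflict(a, b) {
  return !descends(a, b) && !descends(b, a);
}

const base = increment({}, "client-x");    // first write
const viaX = increment(base, "client-x");  // client-x updates again
const viaY = increment(base, "client-y");  // client-y updates the same base
// viaX and viaY both descend from base, but conflict with each other.
```

A store can use `descends` to drop an out-of-date copy silently and `inConflict` to detect sibling versions that need to be reconciled.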

  44. Riak - Takeaways
    • No single point of failure
    • Choose your levels for:
      • availability
      • consistency
      • partition tolerance
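One common way Dynamo-style stores expose these levels is through three numbers: N replicas per object, R replicas that must answer a read, and W replicas that must acknowledge a write. The slide does not name these knobs, so treat the parameter names and the `quorumProfile` helper below as assumptions; the sketch only shows the trade-off arithmetic.

```javascript
// n: replicas per object, r: replicas that must answer a read,
// w: replicas that must acknowledge a write.
function quorumProfile(n, r, w) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("r and w must be between 1 and n");
  }
  return {
    // Read and write sets overlap in at least one replica when r + w > n.
    readsSeeLatestWrite: r + w > n,
    // How many replicas may be down while the operation still succeeds.
    readFailuresTolerated: n - r,
    writeFailuresTolerated: n - w,
  };
}
```

For example, `quorumProfile(3, 2, 2)` favours consistency (reads overlap with writes), while `quorumProfile(3, 1, 1)` favours availability at the cost of possibly stale reads.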

  45. But wait, there’s more...
    • Binary data + Content-Type = whatever
      • MP3s, images, text, ...
    • Map/Reduce
      • Local data, parallel
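"Local data, parallel" means the map step runs next to the data on each partition and only the small partial results travel to the reduce step. A minimal sketch of that shape, with made-up sample data and hypothetical function names; a plain loop stands in for the parallel execution across partitions.

```javascript
// Map phase: runs next to the data, one call per partition.
function mapPartition(objects) {
  return objects.map((obj) => [obj.browser, 1]);
}

// Reduce phase: merges the partial results into totals per key.
function reduceCounts(pairs) {
  const totals = {};
  for (const [key, count] of pairs) {
    totals[key] = (totals[key] || 0) + count;
  }
  return totals;
}

// Each inner array stands for the objects held by one partition.
function countBrowsers(partitions) {
  // Sequential stand-in for work that would run on each partition at once.
  const partials = partitions.map(mapPartition);
  return reduceCounts([].concat(...partials));
}

const totals = countBrowsers([
  [{ browser: "mozilla" }, { browser: "opera" }],
  [{ browser: "mozilla" }],
  [{ browser: "safari" }, { browser: "mozilla" }],
]);
// totals is { mozilla: 3, opera: 1, safari: 1 }
```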

  46. This slide intentionally left blank

  47. Document Store
    Relax

  48. CouchDB
    Document-oriented database
    Kick ass replication
    HTTP/JSON API
    Map/reduce view (index) definitions

  49. World view
    One document == JSON
    One document == One record
    Many Documents == One database
    No schema

  50. A document
    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }
    _id: Key, either you choose it or CouchDB does it for you
    _rev: Revision number

  51. Views
    Filter
    Collate
    Aggregate

  52. Views
    {
      "_id": "b098445d587b1f347e48e1a79301de02",
      "_rev": "1-80bfd8302e0f08eec2396c8107cafc19",
      "platform": {
        "browser": "mozilla",
        "version": "1.9.1.8"
      },
      "timestamp": 1270131033337
    }
    +
    function(doc) {
      // Key: browser name, value: browser version
      emit(doc.platform.browser, doc.platform.version);
    }
    =
    {
      "total_rows": 58,
      "offset": 0,
      "rows": [
        {
          "id": "b098445d587b1f347e48e1a79301de02",
          "key": "mozilla",
          "value": "1.9.1.8"
        }
      ]
    }
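To see how a map function turns documents into view rows, the document-plus-function example above can be imitated outside CouchDB. This is a sketch under stated assumptions: `runView` is a hypothetical helper, `emit` is passed in as an argument instead of being a global, rows are sorted with plain string comparison instead of CouchDB's own collation rules, and the second document is made up.

```javascript
// Run a CouchDB-style map function over documents and return rows sorted by key.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // `emit` is handed to the map function and records one row per call.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Simplified collation: order rows by key using string comparison.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

const docs = [
  { _id: "doc-2", platform: { browser: "opera", version: "10.5" } },
  { _id: "doc-1", platform: { browser: "mozilla", version: "1.9.1.8" } },
];

const result = runView(docs, (doc, emit) => {
  emit(doc.platform.browser, doc.platform.version);
});
// result.rows[0] is { id: "doc-1", key: "mozilla", value: "1.9.1.8" }
```

The sorted rows are what make range queries by key cheap: filtering and collating fall out of the ordering.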

  53. Views
    Views are stored
    as an accessible web resource
    on disk
    and incrementally updated
    as well as replicated with the database

  54. Replication
    Peer to peer
    Online/Offline
    Conflict detection and resolution
    Any number of nodes
    Local
    Remote
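Replication is itself triggered over HTTP: you POST a source and a target to the `_replicate` resource of a CouchDB server, and either side can be a local database name or a remote URL. The sketch below only builds that request. `replicationRequest` is a hypothetical helper, and the host names, port and database name are placeholders.

```javascript
// Describe (but do not send) a replication request.
// source/target: a local database name or a full URL to a remote database.
function replicationRequest(couchUrl, source, target) {
  return {
    method: "POST",
    url: `${couchUrl}/_replicate`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, target }),
  };
}

// Push the local "stats" database to a remote peer (placeholder hosts).
const push = replicationRequest(
  "http://localhost:5984",
  "stats",
  "http://backup.example.com:5984/stats"
);
```

Swapping `source` and `target` gives the pull direction, which is all "bi-directional" needs: two one-way replications.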

  55. Replication

  60. CouchDB “stuff”

  65. CouchDB “stuff”
    MVCC: Multi version concurrency control. Writers do not block readers. Readers do not block writers.
    Append only: Hence, won’t corrupt its data files.
    Compaction: Append only will cause data files to grow. Compaction to the rescue, in the background - for your pleasure.
    ACID: Awesome, Cool, Impressive, Dope
    BDCRR: Bi-directional, conflict resolving, replication
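MVCC is what the `_rev` field is for: an update must name the revision it was based on, and an update based on a stale revision is rejected (CouchDB answers with HTTP 409 Conflict) instead of blocking anyone. A minimal in-memory sketch of that rule follows. `TinyStore` is a hypothetical class, and the `-sketch` suffix is a placeholder for the hash part of real revision strings.

```javascript
// In-memory stand-in for a database that checks revisions on update.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  // Store a document; updates must carry the current _rev.
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      throw new Error("conflict: document was updated by someone else");
    }
    // Revisions look like "<generation>-<suffix>"; bump the generation.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const stored = { ...doc, _rev: `${generation}-sketch` };
    this.docs.set(doc._id, stored);
    return stored._rev;
  }

  // Readers never wait: they simply get the latest stored version.
  get(id) {
    return this.docs.get(id);
  }
}
```

Usage: keep the `_rev` returned by `put` (or read via `get`) and send it along with the next update; if someone else got there first, fetch the document again and retry.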

  66. CouchDB - Takeaways
    • Kick ass replication
    • Views are fast
    • Can host and serve complete webapps

  67. Outro
    • Test one or more NoSQL thingys
    • Get familiar with Brewer's CAP theorem
    • Get familiar with the Dynamo paper

  68. Over and out.
    Mårten Gustafson
    @martengustafson
    http://marten.gustafson.pp.se/