Cassandra London UG July 2011 - Riak vs Cassandra

Russell Smith

April 20, 2012

Transcript

  1. Riak
    How does Riak compare to Cassandra?
    Friday, 20 April 12

  2. /usr/bin/whoami
    • Russell Smith
    • Work for UKD1, a consultancy for web-related technology
    • Help with application design, infrastructure, capacity planning, etc.
    • Mainly for the video-games industry & web-startups
    • Twitter: @ukd1

  3. What is Riak?
    • Pronounced ‘ree-ack’
    • A scalable, high-availability, distributed, key-value store
    • Modelled on Amazon’s description of Dynamo, like Cassandra
    • Developed and commercially supported by Basho
    • Written in Erlang
    • Open source - Apache License (2.0)

  4. What isn’t Riak?
    • Schema-enforced - store whatever you want
    • A relational database - no joins or constraint enforcement, as there are no global locks
    • A competitor to in-memory, column-based databases

  5. What versions are available?
    • Riak
    • Riak Search (Riak + distributed full-text indexing / search)
    • Riak Enterprise - commercially licensed; supports extra features for enterprise use (SNMP, data-centre awareness, etc.)
    • Luwak (Riak + an application for storing large files; bundled by default)

  6. Riak’s take on CAP
    • Exposed to the end user, allowing tuning of N, R & W
    • N - number of replicas of each value, set per bucket (default 3)
    • R - number of replicas that must respond for a read to succeed (set per request)
    • W - number of replicas that must acknowledge a write for it to succeed (a number, all, quorum or the bucket default)
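
A minimal Python sketch of the arithmetic behind N, R and W, assuming the usual Dynamo-style rule: every read set overlaps every successful write set when R + W > N. The helpers `quorum` and `read_overlaps_write` are hypothetical, not part of any Riak client. Riak uses sloppy quorums, so during failures a fallback node can stand in for a primary replica and the overlap is no longer guaranteed.

```python
def quorum(n: int) -> int:
    """Majority of n replicas, e.g. 2 of 3 (hypothetical helper)."""
    return n // 2 + 1


def read_overlaps_write(n: int, r: int, w: int) -> bool:
    """True when any R replicas must include at least one of any W replicas."""
    return r + w > n


N = 3                 # replicas per bucket (the default on the slide)
R = W = quorum(N)     # 'quorum' for both reads and writes
print(R, W, read_overlaps_write(N, R, W))  # 2 2 True
print(read_overlaps_write(N, 1, 1))        # False: fast, but a read may miss the latest write
```

With the default N of 3, quorum reads and writes (2 of 3) tolerate one unavailable replica while still overlapping.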

  7. Client libraries
    • PHP, Python, Ruby, Java, Erlang, JavaScript, .NET
    • Community client libraries:
    • C, Clojure, Go, Griffon, Groovy, Haskell, Perl, Scala, Smalltalk

  8. What can you store?
    • Values against keys
    • Keys are organised into buckets
    • Practical value limit of 64 MB
    • For large files, Luwak (built in for versions > 0.13) splits them into smaller blocks
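
A sketch of the general idea of splitting a large value into smaller blocks stored under their own keys, plus a manifest listing them. This is not Luwak's actual storage format; the `<file-key>/<index>` key scheme, the manifest shape and the 1 MiB default block size are assumptions for illustration.

```python
BLOCK_SIZE = 1024 * 1024  # assumption: 1 MiB blocks, far below the practical value limit


def split_into_blocks(file_key: str, data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into block_size chunks keyed '<file_key>/<index>' (hypothetical scheme)."""
    blocks = {}
    for index, start in enumerate(range(0, len(data), block_size)):
        blocks[f"{file_key}/{index}"] = data[start:start + block_size]
    # The manifest is what a client would fetch first to find the blocks again.
    manifest = {"key": file_key, "size": len(data), "blocks": list(blocks)}
    return manifest, blocks


def reassemble(manifest: dict, blocks: dict) -> bytes:
    """Concatenate the blocks in manifest order."""
    return b"".join(blocks[k] for k in manifest["blocks"])


data = bytes(range(256)) * 10  # 2560 bytes of sample data
manifest, blocks = split_into_blocks("videos/intro.mp4", data, block_size=1000)
print(manifest["blocks"])  # ['videos/intro.mp4/0', 'videos/intro.mp4/1', 'videos/intro.mp4/2']
assert reassemble(manifest, blocks) == data
```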

  9. Querying
    • Two main interfaces: HTTP & Protocol Buffers
    • The HTTP API is mostly RESTful: GET, PUT, DELETE
    • Riak stores the key, the value & metadata about the key:
    • Content Type, Charset, Encoding & link data
    • Also: any custom metadata
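
A sketch of the REST calls using only Python's standard library, assuming a node on localhost with the HTTP interface on port 8098 and the pre-1.0 `/riak/<bucket>/<key>` URL scheme; check both against your version's documentation. The helper functions are hypothetical. Custom metadata is sent as `X-Riak-Meta-*` headers, and `r` can be passed per request as a query parameter. The requests are only built here; `urllib.request.urlopen(req)` would send them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a local node with the HTTP interface on port 8098.
BASE_URL = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str) -> str:
    """URL for one object, using the pre-1.0 /riak/<bucket>/<key> scheme."""
    return f"{BASE_URL}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def put_request(bucket: str, key: str, body: bytes, content_type: str, meta=None) -> Request:
    """Store a value; the Content-Type is kept as part of the object's metadata."""
    headers = {"Content-Type": content_type}
    for name, value in (meta or {}).items():
        # Custom metadata travels as X-Riak-Meta-* headers.
        headers[f"X-Riak-Meta-{name}"] = value
    return Request(object_url(bucket, key), data=body, headers=headers, method="PUT")


def get_request(bucket: str, key: str, r=None) -> Request:
    """Fetch a value; r optionally overrides the read quorum for this request."""
    url = object_url(bucket, key)
    if r is not None:
        url += "?" + urlencode({"r": r})
    return Request(url, method="GET")


def delete_request(bucket: str, key: str) -> Request:
    """Delete a value."""
    return Request(object_url(bucket, key), method="DELETE")


req = put_request("users", "ukd1", b'{"name": "Russell"}', "application/json",
                  meta={"twitter": "@ukd1"})
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:8098/riak/users/ukd1
```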

  10. Links
    • Used to store one-way relationships between objects
    • Stored in the object's metadata
    • Link-walking is implemented with MapReduce
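
A sketch of how links look over the HTTP interface, assuming the pre-1.0 format: a `Link` header entry of the form `</riak/<bucket>/<key>>; riaktag="<tag>"`, and link-walking through extra path segments of the form `<bucket>,<tag>,<keep>`, where `_` matches any bucket or tag and `keep` says whether that step's results are returned. Both helpers are hypothetical and only build strings.

```python
def link_header(links):
    """Format (bucket, key, tag) triples as the value of a Link header."""
    return ", ".join(f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links)


def walk_path(bucket, key, steps):
    """Link-walking path: each step is (bucket, tag, keep); '_' matches any bucket or tag."""
    segments = [f"{b},{t},{1 if keep else 0}" for b, t, keep in steps]
    return f"/riak/{bucket}/{key}/" + "/".join(segments)


print(link_header([("people", "bob", "friend")]))
# </riak/people/bob>; riaktag="friend"
print(walk_path("people", "alice", [("people", "friend", False), ("_", "_", True)]))
# /riak/people/alice/people,friend,0/_,_,1  (everything linked from alice's friends)
```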

  11. MapReduce
    • Designed to be fast enough to serve web-page requests
    • Built in
    • Map / Reduce functions are written in JavaScript or Erlang
    • Supports re-reduce (a reduce function may be called again on its own output)
    • Supports streaming MapReduce results
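
Re-reduce means a reduce function may be called again on a list that mixes its own earlier output with new inputs, so its output must have the same shape as its input. The Python below is only a local simulation of that contract (Riak's own phases are written in JavaScript or Erlang); the word-count functions are hypothetical.

```python
from collections import Counter


def map_words(document: str) -> list:
    """Map phase: emit one {word: count} dict per document."""
    return [dict(Counter(document.lower().split()))]


def reduce_counts(values: list) -> list:
    """Reduce phase: merge count dicts. Output has the same shape as input,
    so the result can safely be fed back in (re-reduced)."""
    total = Counter()
    for counts in values:
        total.update(counts)  # adds counts rather than replacing them
    return [dict(total)]


docs = ["riak is a key value store",
        "cassandra is a column store",
        "riak uses vector clocks"]
mapped = [value for doc in docs for value in map_words(doc)]

# Reduce partial batches as they arrive, then reduce the partial results again.
partial = reduce_counts(mapped[:2]) + reduce_counts(mapped[2:])
rereduced = reduce_counts(partial)
assert rereduced == reduce_counts(mapped)
print(rereduced[0]["riak"], rereduced[0]["store"])  # 2 2
```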

  12. Vector clocks
    • Each value is tagged with a vector clock
    • Riak can determine whether two values:
    • Are direct descendants of a single object
    • Share a common parent
    • Are unrelated
    • Cassandra uses timestamps instead - problems can occur when node clocks are out of sync
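
A simplified vector clock in Python: a map from actor to counter, where one clock descends from another if it has seen at least as many events from every actor. Actor names are hypothetical, and real Riak clocks also carry timestamps used for pruning, which this sketch omits. When neither clock descends from the other the values are concurrent; here, concurrent clocks that share an actor are treated as having a common ancestor and the rest as unrelated. A timestamp comparison, by contrast, would just pick one concurrent value as the winner.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of clock with actor's counter bumped (a new write by actor)."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relationship(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "identical"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    # Neither has seen the other: concurrent. A shared actor means shared history.
    if set(a) & set(b):
        return "concurrent, common ancestor"
    return "concurrent, unrelated"


parent = increment({}, "node_a")    # {'node_a': 1}
child = increment(parent, "node_a")  # {'node_a': 2}
left = increment(parent, "node_b")   # {'node_a': 1, 'node_b': 1}
right = increment(parent, "node_c")  # {'node_a': 1, 'node_c': 1}
other = increment({}, "node_d")      # {'node_d': 1}

print(relationship(child, parent))  # a descends from b
print(relationship(left, right))    # concurrent, common ancestor
print(relationship(left, other))    # concurrent, unrelated
```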

  13. Siblings
    • Siblings are conflicting versions of the same object which Riak has not merged
    • They occur only if allow_mult is enabled on a bucket AND there is:
    • A concurrent write with the same vector clock value
    • A write with a stale vector clock
    • A write with no vector clock
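
A toy model of a single key in a bucket with allow_mult enabled, assuming the client sends back the vector clock it last read as its write context. The `Key` class is hypothetical, not Riak's implementation. A write replaces only the siblings its context has already seen and keeps everything else, which reproduces the three cases listed above.

```python
def descends(a: dict, b: dict) -> bool:
    """True if vector clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


class Key:
    """Toy model of one key in a bucket with allow_mult enabled (hypothetical)."""

    def __init__(self):
        self.siblings = []  # list of (vector clock, value) pairs

    def get(self):
        """Return every sibling value and a clock that descends from all of them."""
        merged = {}
        for clock, _ in self.siblings:
            for actor, count in clock.items():
                merged[actor] = max(merged.get(actor, 0), count)
        return [value for _, value in self.siblings], merged

    def put(self, value, context: dict, actor: str):
        """Write value; context is the clock the client last read ({} if none was sent)."""
        new_clock = dict(context)
        new_clock[actor] = new_clock.get(actor, 0) + 1
        # Only siblings the client had already seen are replaced; the rest are kept.
        kept = [(c, v) for c, v in self.siblings if not descends(context, c)]
        self.siblings = kept + [(new_clock, value)]


key = Key()
key.put("v1", {}, "client_a")
_, ctx = key.get()
key.put("from A", ctx, "client_a")
key.put("from B", ctx, "client_b")    # stale clock: B never saw "from A"
values, ctx = key.get()
print(values)                         # ['from A', 'from B']
key.put("resolved", ctx, "client_a")  # writing with the merged clock collapses the siblings
print(key.get()[0])                   # ['resolved']
```

Resolving siblings is the application's job: read all values, merge them, and write back using the merged clock.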

  14. Pre & Post Commit Hooks
    • Pre-commit hooks can:
    • Allow the object to be written
    • Modify the object
    • Fail the update
    • Hooks are per-bucket (stored in the bucket properties)
    • Written in JavaScript (pre-commit only) or Erlang (pre- and post-commit)
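
Riak's hooks themselves are JavaScript or Erlang functions named in the bucket properties; this Python sketch only simulates the pre-commit contract. Each hook receives the object and either returns it (possibly modified) to allow the write, or fails to abort it. `validate_json`, `tag_validated`, `put_with_hooks`, `HookFailed` and the dict-based object shape are all hypothetical.

```python
import json


class HookFailed(Exception):
    """Raised by a hook to reject the write (the 'fail the update' case)."""


def validate_json(obj: dict) -> dict:
    """Hypothetical hook: allow the write unchanged if the value is valid JSON."""
    try:
        json.loads(obj["value"])
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        raise HookFailed(f"value is not JSON: {err}") from err
    return obj


def tag_validated(obj: dict) -> dict:
    """Hypothetical hook: modify the object by adding a metadata entry."""
    return {**obj, "meta": {**obj.get("meta", {}), "validated": "true"}}


# Stand-in for the bucket properties, where a bucket's hooks are recorded.
bucket_props = {"precommit": [validate_json, tag_validated]}


def put_with_hooks(store: dict, key: str, obj: dict, props: dict) -> dict:
    """Run pre-commit hooks in order, then store whatever the last hook returned."""
    for hook in props.get("precommit", []):
        obj = hook(obj)  # may return a modified object or raise HookFailed
    store[key] = obj
    return obj


store = {}
put_with_hooks(store, "k1", {"value": '{"a": 1}'}, bucket_props)
print(store["k1"]["meta"])  # {'validated': 'true'}
try:
    put_with_hooks(store, "k2", {"value": "not json"}, bucket_props)
except HookFailed as err:
    print("rejected:", err)
```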

  15. Admin
    • Super simple:
    • riak-admin join
    • riak-admin leave
    • Backup tools are also provided

  16. Backup / restore
    • riak-admin backup|restore [[node|all]]
    • The alternative for Bitcask is a filesystem backup, which works because Bitcask uses append-only files
    • riak-admin backup is storage-engine agnostic
    • riak-admin only backs up KV data, not search indexes (Riak Search)

  17. Storage engines
    • Ships with two storage engines:
    • Bitcask - the default; best when the keyspace fits in RAM
    • InnoDB - suggested when the keyspace exceeds RAM
    • Also available: Google’s LevelDB, which is BSD licensed, recently integrated and good for large data sets

  18. Riak-Search
    • Full-text search engine built on top of Riak
    • Real-time indexing
    • Uses Lucene analyzers; custom ones can be written in Erlang or Java
    • Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
    • Will ship as part of Riak by default from 1.0
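
To make "lexical range query" and "end-of-word wildcard" concrete, here is a toy single-field inverted index in Python. It only illustrates the query semantics; Riak Search's real query syntax is Lucene-style (for example `stor*` for a wildcard), and `TermIndex` is hypothetical. Boolean operators map onto set operations on the matching document ids.

```python
import bisect


class TermIndex:
    """Toy inverted index over one text field (hypothetical, not Riak Search internals)."""

    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def add(self, doc_id: str, text: str):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def term(self, term: str) -> set:
        """Exact term match."""
        return set(self.postings.get(term, set()))

    def lexical_range(self, low: str, high: str) -> set:
        """Documents containing any term t with low <= t <= high (string order)."""
        terms = sorted(self.postings)
        start = bisect.bisect_left(terms, low)
        end = bisect.bisect_right(terms, high)
        return set().union(*(self.postings[t] for t in terms[start:end]))

    def prefix(self, stem: str) -> set:
        """End-of-word wildcard such as 'stor*': documents with a term starting with stem."""
        return set().union(*(ids for t, ids in self.postings.items() if t.startswith(stem)))


idx = TermIndex()
idx.add("d1", "Riak stores data in buckets")
idx.add("d2", "Cassandra stores data in column families")
idx.add("d3", "Riak Search indexes documents")

print(sorted(idx.prefix("stor")))                      # ['d1', 'd2']
print(sorted(idx.lexical_range("a", "c")))             # ['d1'] (only 'buckets' sorts in range)
print(sorted(idx.term("riak") & idx.prefix("index")))  # ['d3'] (boolean AND)
```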

  19. Riak > Cassandra
    • Extremely simple to add or remove nodes from a cluster
    • No up-front data model definition
    • REST & Protocol Buffers API access
    • Commercial support from the original developers, Basho

  20. Riak = Cassandra
    • No single point of failure
    • Linearly scalable
    • High availability
    • Eventually consistent
    • You can choose your own consistency requirements

  21. Riak < Cassandra
    • CQL, an SQL-like query language
    • Range / cover queries are built in (no need to write MapReduce functions)
    • ‘Enterprise’ features (data-centre / rack awareness) are free & in the open-source build
    • Wide support & training from third-party companies: DataStax, Acunu, Impetus, Onzra
    http://wiki.apache.org/cassandra/ThirdPartySupport
    • Cassandra is seemingly more popular & has a bigger community
    • Riak's fixed number of partitions vs Cassandra's RandomPartitioner (MD5 tokens): Riak's partition count can't be changed later, so plan carefully!
    http://wiki.basho.com/Cluster-Capacity-Planning.html
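
Riak divides its 160-bit SHA-1 hash ring into a fixed number of equal partitions chosen when the cluster is created (the `ring_creation_size` setting, 64 by default), whereas Cassandra's RandomPartitioner places nodes at MD5 tokens that can be moved later. The simplified Python below hashes a `bucket/key` byte string rather than Riak's actual key encoding, and the helper names are hypothetical. It shows why the partition count matters: with a different ring size every partition covers a different range, so the key layout changes.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 hash space


def partition_index(bucket: bytes, key: bytes, ring_size: int) -> int:
    """Which of ring_size equal partitions a key hashes into (simplified hash input)."""
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring size must be a positive power of two")
    digest = hashlib.sha1(bucket + b"/" + key).digest()
    return int.from_bytes(digest, "big") // (RING_TOP // ring_size)


def preference_list(bucket: bytes, key: bytes, ring_size: int, n_val: int = 3) -> list:
    """The key's partition plus the next n_val - 1 partitions around the ring."""
    first = partition_index(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n_val)]


for key in (b"alice", b"bob", b"ukd1"):
    before = partition_index(b"users", key, 64)
    after = partition_index(b"users", key, 128)
    # With 128 partitions each one is half as wide, so every key lands in a new range.
    print(key, before, after, preference_list(b"users", key, 64))
```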

  22. Further reading
    • Basho’s slide decks: http://wiki.basho.com/Slide-Decks.html
    • Commit hooks: http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html
    • Riak / Cassandra: http://wiki.basho.com/Riak-Compared-to-Cassandra.html

  23. Questions?