Cassandra London UG July 2011 - Riak vs Cassandra

Russell Smith

April 20, 2012
Transcript

  1. /usr/bin/whoami
     • Russell Smith
     • Work for UKD1, a consultancy for web-related tech
     • Help with application design, infrastructure, capacity planning, etc.
     • Mainly for the video-games industry & web start-ups
     • Twitter: @ukd1
     Friday, 20 April 12
  2. What is Riak?
     • Pronounced ‘ree-ack’
     • A scalable, highly available, distributed key-value store
     • Modelled on Amazon’s description of Dynamo, like Cassandra
     • Commercially supported / developed by Basho
     • Written in Erlang
     • Open source: Apache License 2.0
  3. What isn’t Riak?
     • Schema-enforced: you can store whatever you want
     • A relational database: no joins or constraint enforcement, as there are no global locks
     • Not intended to compete with in-memory, column-based databases
  4. What versions are available?
     • Riak
     • Riak Search (Riak + distributed full-text indexing / search)
     • Riak Enterprise: commercially licensed, supports extra features for enterprise use (SNMP, data-centre awareness, etc.)
     • Luwak (Riak + an app for storing large files; bundled by default)
  5. Riak’s take on CAP
     • Exposed to the end user, allowing tuning of N, R & W
     • N: number of replicas of each value, set per bucket (default 3)
     • R: number of replicas that must respond for a read to succeed (set per request)
     • W: number of replicas that must acknowledge a write for it to succeed (a number, all, quorum or the bucket default)
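
The N/R/W trade-off comes down to simple arithmetic: if R + W > N, every read set overlaps every write set, so a read touches at least one replica that took the latest write. The sketch below assumes the usual majority definition of quorum (floor(N/2) + 1) and a fixed set of N replicas; it ignores failure handling such as hinted handoff and leaves out the bucket "default" setting. The function names are illustrative, not part of any Riak client.

```python
def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def resolve(setting, n: int) -> int:
    """Turn a per-request R or W setting into a replica count.

    Accepts a plain integer or the symbolic values "all" and "quorum".
    The bucket "default" lookup is left out of this sketch.
    """
    if setting == "all":
        return n
    if setting == "quorum":
        return quorum(n)
    if isinstance(setting, int) and 1 <= setting <= n:
        return setting
    raise ValueError(f"unsupported setting: {setting!r}")


def reads_overlap_writes(n: int, r, w) -> bool:
    """True when every R-replica read must include at least one replica
    that acknowledged the last W-replica write (R + W > N)."""
    return resolve(r, n) + resolve(w, n) > n


# Usage with the default N of 3:
print(quorum(3))                                    # 2
print(reads_overlap_writes(3, "quorum", "quorum"))  # True, since 2 + 2 > 3
print(reads_overlap_writes(3, 1, 1))                # False, since 1 + 1 <= 3
```

Lowering R or W buys latency and availability at the cost of possibly reading stale data, which is the tuning knob the slide refers to.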
  6. Client libraries
     • PHP, Python, Ruby, Java, Erlang, JavaScript, .NET
     • Community client libraries:
         • C, Clojure, Go, Griffon, Groovy, Haskell, Perl, Scala, Smalltalk
  7. What can you store?
     • Values against keys
     • Keys are organised into buckets
     • Practical value size limit of 64 MB
     • For large files, Luwak (built into versions > 0.13) splits them into smaller blocks
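
The idea behind storing large files in chunks is to keep every individual value well under the practical size limit. The sketch below splits a byte string into fixed-size blocks under hypothetical keys such as "name/0" and keeps an ordered manifest for reassembly. This flat layout is a simplification for illustration, not Luwak's actual storage format, and the block size is whatever the caller chooses.

```python
from typing import Dict, List, Tuple


def split_into_blocks(name: str, data: bytes,
                      block_size: int) -> Tuple[Dict[str, bytes], List[str]]:
    """Split data into fixed-size blocks, each stored under its own key.

    Returns (blocks, manifest): blocks maps hypothetical keys like "name/0"
    to that block's bytes; manifest lists the keys in order.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: Dict[str, bytes] = {}
    manifest: List[str] = []
    for index, start in enumerate(range(0, len(data), block_size)):
        key = f"{name}/{index}"
        blocks[key] = data[start:start + block_size]
        manifest.append(key)
    return blocks, manifest


def reassemble(blocks: Dict[str, bytes], manifest: List[str]) -> bytes:
    """Concatenate the blocks in manifest order."""
    return b"".join(blocks[key] for key in manifest)


data = bytes(range(256)) * 10  # 2560 bytes of sample data
blocks, manifest = split_into_blocks("sample", data, 1000)
print(manifest)  # ['sample/0', 'sample/1', 'sample/2']
```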
  8. Querying
     • Two main interfaces: HTTP & Protocol Buffers
     • The HTTP API is mainly REST: GET, PUT, DELETE
     • Riak stores the key, the value & metadata about the key:
         • Content type, charset, encoding & link data
         • Also: any custom metadata
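
To make the REST interface concrete, the sketch below builds (without sending) an HTTP PUT that stores a JSON value with one piece of custom metadata, using only Python's standard library. It assumes the /riak/<bucket>/<key> URL layout, X-Riak-Meta-* headers for custom metadata, and a node on 127.0.0.1:8098 (Riak's usual default HTTP port); check all three against the Riak version you run. `build_put` is a hypothetical helper, not part of any Riak client.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put(host: str, bucket: str, key: str, value: dict, meta: dict) -> Request:
    """Build (but do not send) a PUT storing a JSON value plus custom metadata.

    Assumptions: the /riak/<bucket>/<key> URL layout and X-Riak-Meta-*
    headers for custom metadata.
    """
    headers = {"Content-Type": "application/json"}
    for name, meta_value in meta.items():
        # Custom metadata travels as extra request headers.
        headers[f"X-Riak-Meta-{name}"] = str(meta_value)
    # Percent-encode bucket and key so characters like "/" stay inside one path segment.
    url = f"http://{host}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"
    body = json.dumps(value).encode("utf-8")
    return Request(url, data=body, headers=headers, method="PUT")


req = build_put("127.0.0.1:8098", "users", "alice", {"age": 30}, {"Team": "ops"})
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:8098/riak/users/alice
```

Passing the request to `urllib.request.urlopen` would send it; a GET or DELETE on the same URL reads or removes the object.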
  9. Links
     • Used to store one-way relationships between objects
     • Stored in object metadata
     • Link-walking uses MapReduce
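
Link-walking can be pictured as repeatedly following an object's outgoing links, filtering each step by target bucket and tag. The sketch below simulates that in memory on hypothetical data; in Riak the walk runs as MapReduce phases on the cluster, so this only illustrates the filtering semantics, not the implementation.

```python
from typing import Dict, List, Optional, Tuple

ObjectId = Tuple[str, str]   # (bucket, key)
Link = Tuple[str, str, str]  # (target bucket, target key, tag)

# Hypothetical data: each object's outgoing links, as kept in its metadata.
LINKS: Dict[ObjectId, List[Link]] = {
    ("people", "alice"): [("people", "bob", "friend"),
                          ("people", "carol", "friend"),
                          ("companies", "acme", "employer")],
    ("people", "bob"): [("people", "dave", "friend")],
    ("people", "carol"): [],
    ("people", "dave"): [],
    ("companies", "acme"): [],
}


def walk(start: ObjectId,
         steps: List[Tuple[Optional[str], Optional[str]]]) -> List[ObjectId]:
    """Follow links step by step; each step filters on (bucket, tag), None matching anything."""
    current = [start]
    for bucket_filter, tag_filter in steps:
        following: List[ObjectId] = []
        for obj in current:
            for bucket, key, tag in LINKS.get(obj, []):
                if bucket_filter in (None, bucket) and tag_filter in (None, tag):
                    following.append((bucket, key))
        current = following
    return current


# Friends of friends of alice:
print(walk(("people", "alice"), [("people", "friend"), ("people", "friend")]))
# [('people', 'dave')]
```

Because links are one-way, walking from bob never leads back to alice unless bob's own metadata holds a link to her.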
  10. MapReduce
      • Designed to be used for web-page-speed requests
      • Built in
      • Map / reduce functions are written in JavaScript or Erlang
      • Can do re-reduce
      • Streaming MapReduce
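
Re-reduce means a reduce function may be called again on its own earlier outputs, so its output must be valid input and the final answer must not depend on how the partial results were grouped. Riak's map and reduce functions would be written in JavaScript or Erlang; the Python below only simulates the two phases on a hypothetical word count to show that property.

```python
from collections import Counter
from typing import Dict, List


def map_word_counts(text: str) -> List[Dict[str, int]]:
    """Map phase: one {word: count} dict per input object."""
    return [dict(Counter(text.lower().split()))]


def reduce_sum(partials: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Reduce phase: merge partial counts into a single dict.

    The output has the same shape as the input, so the function can be
    applied again to its own results (re-reduce) without changing the answer.
    """
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return [dict(total)]


docs = ["Riak and Cassandra", "riak search", "Cassandra"]
mapped = [item for doc in docs for item in map_word_counts(doc)]
one_pass = reduce_sum(mapped)
re_reduced = reduce_sum(reduce_sum(mapped[:2]) + reduce_sum(mapped[2:]))
print(one_pass == re_reduced)  # True
```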
  11. Vector clocks
      • Each value is tagged with a vector clock
      • Riak can determine whether values:
          • Are direct descendants of a single object
          • Share a common parent
          • Are unrelated
      • Riak gives every object a vector clock, whereas Cassandra uses timestamps, so problems can occur when clocks are out of sync
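
The three cases on the slide fall out of comparing two vector clocks entry by entry. The sketch below models a clock as a simple map from actor id to counter; it omits the timestamps Riak keeps alongside entries for pruning, and the function names are illustrative.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates made by that actor


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of clock with actor's counter bumped (done on each update)."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every update b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relation(a: VClock, b: VClock) -> str:
    """Classify two clocks as equal, descendant, concurrent or unrelated."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    # Neither descends: check whether they still share some history.
    shared = any(min(count, b.get(actor, 0)) > 0 for actor, count in a.items())
    return "concurrent, common ancestor" if shared else "unrelated"


base = increment({}, "x")     # {'x': 1}
left = increment(base, "x")   # {'x': 2}
right = increment(base, "y")  # {'x': 1, 'y': 1}
print(relation(left, base))   # a descends from b
print(relation(left, right))  # concurrent, common ancestor
```

A wall-clock timestamp can only say which write looks newer; the clocks above can tell that `left` and `right` are concurrent edits of `base`, which is what makes sibling detection possible.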
  12. Siblings
      • Siblings are different versions of the same document that Riak has not merged
      • Only occur if allow_mult is enabled on the bucket, and a write:
          • Is concurrent with another write carrying the same vector clock
          • Carries a stale vector clock
          • Carries no vector clock
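
With allow_mult enabled, a read can return several siblings and the application has to merge them, then write the result back with the vector clock it read. The sketch below shows one common merge strategy, set union, on a hypothetical shopping cart; the strategy is an application choice, not something Riak imposes.

```python
from typing import List, Set


def resolve_siblings(siblings: List[Set[str]]) -> Set[str]:
    """Merge sibling values of a set-like document by union.

    Union never loses an added item, which suits an add-only set; it cannot
    represent removals (a removed item reappears if any sibling still has it).
    """
    merged: Set[str] = set()
    for value in siblings:
        merged |= value
    return merged


# Two clients added different items to the same cart concurrently:
print(sorted(resolve_siblings([{"book", "pen"}, {"book", "mug"}])))  # ['book', 'mug', 'pen']
```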
  13. Pre & post commit hooks
      • Pre-commit hooks can:
          • Allow the object to be written
          • Modify the object
          • Fail the update
      • Hooks are set per bucket (stored in the bucket properties)
      • Written in JavaScript (pre-commit only) or Erlang (pre- and post-commit)
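
A pre-commit hook receives the object about to be written and either passes it on (possibly modified) or rejects the write. The sketch below simulates a chain of such hooks in Python; real hooks are JavaScript or Erlang functions registered in the bucket properties, and the "fail" tuple here only loosely mirrors the shape of an Erlang hook's failure result. Both example hooks are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, Union

Obj = Dict[str, object]
# A hook returns the (possibly modified) object, or ("fail", reason) to reject the write.
HookResult = Union[Obj, Tuple[str, str]]
Hook = Callable[[Obj], HookResult]


def require_email(obj: Obj) -> HookResult:
    """Hypothetical validation hook: reject objects without an email field."""
    if "email" not in obj:
        return ("fail", "email is required")
    return obj


def lowercase_email(obj: Obj) -> HookResult:
    """Hypothetical normalising hook: modify the object before it is stored."""
    updated = dict(obj)
    updated["email"] = str(updated["email"]).lower()
    return updated


def run_precommit(hooks: List[Hook], obj: Obj) -> Tuple[bool, Union[Obj, str]]:
    """Run hooks in order; stop at the first failure."""
    for hook in hooks:
        result = hook(obj)
        if isinstance(result, tuple) and result[0] == "fail":
            return False, result[1]
        obj = result
    return True, obj


hooks = [require_email, lowercase_email]
print(run_precommit(hooks, {"email": "A@Example.com"}))  # (True, {'email': 'a@example.com'})
print(run_precommit(hooks, {"name": "x"}))               # (False, 'email is required')
```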
  14. Admin
      • Super simple:
          • riak-admin join <node-in-cluster>
          • riak-admin leave
      • Backup tools are provided
  15. Backup / restore
      • riak-admin backup|restore <node> <cookie> <output_file> [[node|all]]
      • For Bitcask, a filesystem-level backup is an alternative, since Bitcask uses append-only files
      • riak-admin backup is storage-engine agnostic
      • riak-admin backs up only KV data, not search indexes (Riak Search)
  16. Storage engines
      • Ships with two storage engines:
          • Bitcask: the default; best when the keyspace fits in RAM (it keeps every key in memory)
          • InnoDB: suggested when the keyspace is larger than RAM
      • Also available: Google’s LevelDB; BSD licensed & recently integrated, good for large datasets
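
Why Bitcask wants the keyspace to fit in RAM becomes clearer from its basic design: values are appended to a log, and an in-memory key directory ("keydir") points at the latest copy of each key. The toy below shows only that core idea; it leaves out everything a real Bitcask log entry carries (CRC, timestamp, sizes, the key itself), deletes and merging, and `TinyBitcask` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


class TinyBitcask:
    """Toy append-only store with an in-memory key directory (keydir).

    Every write is appended to the log; the keydir maps each key to the
    (offset, length) of its latest value. Reads are one dict lookup plus one
    slice of the log, but the keydir must hold every key in memory.
    """

    def __init__(self) -> None:
        self.log = bytearray()  # stands in for the append-only data file
        self.keydir: Dict[bytes, Tuple[int, int]] = {}

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self.log)
        self.log += value  # never overwrite: old versions stay in the log
        self.keydir[key] = (offset, len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.log[offset:offset + length])


db = TinyBitcask()
db.put(b"k", b"v1")
db.put(b"k", b"v2")
print(db.get(b"k"))  # b'v2'
```

Because existing bytes are never rewritten, copying the data files gives a usable backup, which is the property the backup slide relies on.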
  17. Riak Search
      • Full-text search engine built on top of Riak
      • Real-time
      • Uses Lucene analyzers; custom ones may be written in Erlang / Java
      • Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
      • Will be part of Riak by default from 1.0
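
Term search, boolean AND and end-of-word wildcards all reduce to lookups in an inverted index (term to document ids). The sketch below uses plain whitespace tokenisation instead of a Lucene analyzer and runs on hypothetical documents; it illustrates the query semantics only, not Riak Search's internals or query syntax.

```python
from typing import Dict, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def lookup(index: Dict[str, Set[str]], pattern: str) -> Set[str]:
    """Exact term match, or an end-of-word wildcard such as 'cass*'."""
    pattern = pattern.lower()
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        hits: Set[str] = set()
        for term, ids in index.items():
            if term.startswith(prefix):
                hits |= ids
        return hits
    return set(index.get(pattern, set()))


def search_and(index: Dict[str, Set[str]], *patterns: str) -> Set[str]:
    """Boolean AND: documents matching every pattern."""
    results = [lookup(index, p) for p in patterns]
    return set.intersection(*results) if results else set()


docs = {"1": "Riak vs Cassandra", "2": "Cassandra query language", "3": "Riak search"}
idx = build_index(docs)
print(sorted(search_and(idx, "riak", "cass*")))  # ['1']
```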
  18. Riak > Cassandra
      • Extremely simple to add or remove nodes from a cluster
      • No up-front data model setup
      • REST & Protobuf API access
      • Commercial support from the original developers, Basho
  19. Riak = Cassandra
      • No single point of failure
      • Linearly scalable
      • High availability
      • Eventually consistent
      • You can choose your own consistency requirements
  20. Riak < Cassandra
      • CQL: an SQL-ish query language
      • Range / cover queries are built in (no need to write MapReduce functions)
      • ‘Enterprise’ features (DC / rack awareness) are free & in the open-source build
      • Wide support / training from third-party commercial vendors: DataStax / Acunu / Impetus / Onzra http://wiki.apache.org/cassandra/ThirdPartySupport
      • Cassandra is seemingly more popular & has a bigger community
      • Riak uses a fixed number of partitions, whereas Cassandra’s RandomPartitioner places data by MD5; Riak’s partition count can’t be reconfigured later if you need to, so plan carefully! http://wiki.basho.com/Cluster-Capacity-Planning.html
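
The planning concern in the last bullet follows from how keys map to partitions: a key's hash is divided into one of a fixed number of equal ranges, so every key's partition index depends on that number. The sketch below assumes a SHA-1 ring (2^160 points) cut into `ring_size` equal partitions, hashes the string "bucket/key", and takes the next N partitions as the preference list; Riak's exact hashing input and ownership rules differ in detail, and both function names are hypothetical.

```python
import hashlib
from typing import List

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def partition_for(bucket: str, key: str, ring_size: int) -> int:
    """Index (0 .. ring_size - 1) of the equal-sized partition a bucket/key hashes into."""
    # This sketch requires a power of two so the ring divides evenly.
    if ring_size <= 0 or ring_size & (ring_size - 1):
        raise ValueError("ring_size must be a power of two")
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")
    return point // (RING_TOP // ring_size)


def preference_list(bucket: str, key: str, ring_size: int, n_val: int) -> List[int]:
    """The owning partition plus the next n_val - 1 partitions around the ring."""
    first = partition_for(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n_val)]


print(preference_list("users", "alice", 64, 3))
```

In this model `ring_size` is chosen once, when the ring is created; picking a different value later would assign keys to different partitions, which is why the slide recommends capacity planning up front.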
  21. Further reading
      • Basho’s slide decks: http://wiki.basho.com/Slide-Decks.html
      • Commit hooks: http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html
      • Riak / Cassandra: http://wiki.basho.com/Riak-Compared-to-Cassandra.html