Slide 1

Riak
How does Riak compare to Cassandra?
Friday, 20 April 12

Slide 2

/usr/bin/whoami
• Russell Smith
• Work for UKD1, a consultancy for web-related tech
• Help with application design, infrastructure, capacity planning, etc.
• Mainly for the video-games industry & web start-ups
• Twitter: @ukd1

Slide 3

What is Riak?
• Pronounced ‘ree-ack’
• A scalable, highly available, distributed key-value store
• Modelled on Amazon’s description of Dynamo, like Cassandra
• Commercially supported and developed by Basho
• Written in Erlang
• Open source: Apache License 2.0

Slide 4

What isn’t Riak?
• Schema-enforced: store what you want
• A relational database: no joins or constraint enforcement, as there are no global locks
• Not intended to compete with in-memory, column-based databases

Slide 5

What versions are available?
• Riak
• Riak Search (Riak + distributed full-text indexing / search)
• Riak Enterprise: commercially licensed, with extra features for enterprise use (SNMP, data-centre awareness, etc.)
• Luwak (Riak + an app for storing large files; bundled by default)

Slide 6

Riak’s take on CAP
• Exposed to the end user, allowing tuning of N, R & W
  • N: number of replicas of each object, set per bucket (default 3)
  • R: number of replicas that must respond for a read to succeed (set per request)
  • W: number of replicas that must acknowledge a write for it to succeed (a number, all, quorum, or the bucket’s default)
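
The usual reasoning for tuning these values is quorum overlap. If R + W > N, every set of replicas answering a read shares at least one replica with the last successful write. Below is a minimal Python sketch of that arithmetic. It assumes "quorum" means a majority of N (N // 2 + 1) and does not model the bucket "default" setting. The function names are hypothetical.

```python
def resolve(setting, n):
    """Turn an R or W setting (an int, 'quorum' or 'all') into a replica count."""
    named = {"quorum": n // 2 + 1, "all": n}
    count = named[setting] if isinstance(setting, str) else setting
    if not 1 <= count <= n:
        raise ValueError(f"{setting!r} is not a valid setting when N={n}")
    return count


def read_overlaps_write(n, r, w):
    """True when any R replicas must include one of the W written (R + W > N)."""
    return resolve(r, n) + resolve(w, n) > n


# Default bucket N=3: 'quorum' means 2 of the 3 replicas.
print(resolve("quorum", 3))
print(read_overlaps_write(3, "quorum", "quorum"))  # 2 + 2 > 3
print(read_overlaps_write(3, 1, 1))                # 1 + 1 <= 3
```

This is the textbook Dynamo argument. Node failures and hinted handoff can still let a read miss the latest write, which is one reason Riak stays eventually consistent.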

Slide 7

Client libraries
• PHP, Python, Ruby, Java, Erlang, JavaScript, .NET
• Community client libraries:
  • C, Clojure, Go, Griffon, Groovy, Haskell, Perl, Scala, Smalltalk

Slide 8

What can you store?
• Values against keys
• Keys are organised into buckets
• Practical value size limit of 64 MB
• For large files, Luwak (built in to releases after 0.13) splits them into smaller blocks
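
Luwak's idea is to break a large value into smaller blocks that are stored as ordinary values. The Python sketch below shows only the basic idea: split the bytes into fixed-size blocks, key each block by its SHA-1 digest, and keep an ordered manifest. Luwak's own block format is not reproduced here. The 1 MB block size and the helper names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # hypothetical 1 MB blocks, well under the 64 MB limit


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (manifest, blocks): the manifest lists block keys in order,
    blocks maps each key (SHA-1 hex digest of the block) to its bytes."""
    manifest, blocks = [], {}
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        key = hashlib.sha1(chunk).hexdigest()
        blocks[key] = chunk  # identical blocks end up stored only once
        manifest.append(key)
    return manifest, blocks


def reassemble(manifest, blocks) -> bytes:
    """Concatenate the blocks in manifest order to rebuild the original value."""
    return b"".join(blocks[key] for key in manifest)


data = b"x" * 2500  # small stand-in for a large file
manifest, blocks = split_into_blocks(data, block_size=1000)
print(len(manifest), len(blocks))
assert reassemble(manifest, blocks) == data
```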

Slide 9

Querying
• Two main interfaces: HTTP & Protocol Buffers
• The HTTP API is mainly REST: GET, PUT, DELETE
• Riak stores the key, the value & metadata about the key:
  • Content type, charset, encoding & link data
  • Also: any custom metadata
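
Over HTTP, storing a value is an ordinary PUT whose headers carry the metadata. Custom metadata travels in X-Riak-Meta-* headers. The Python sketch below only builds the request with the standard library and does not send it. It assumes a node on localhost at the default HTTP port 8098 and the /riak/<bucket>/<key> URL scheme of Riak releases from this period. The bucket, key and metadata names are hypothetical.

```python
import json
import urllib.request

RIAK_URL = "http://localhost:8098"  # assumed node address (default HTTP port)


def build_store_request(bucket, key, value, custom_meta=None):
    """Build (but do not send) an HTTP PUT that stores a JSON value in Riak."""
    headers = {"Content-Type": "application/json; charset=utf-8"}
    for name, meta_value in (custom_meta or {}).items():
        # Custom metadata is sent as X-Riak-Meta-<name> headers.
        headers[f"X-Riak-Meta-{name}"] = meta_value
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        f"{RIAK_URL}/riak/{bucket}/{key}", data=body, headers=headers, method="PUT"
    )


req = build_store_request("talks", "riak-vs-cassandra", {"slides": 23},
                          custom_meta={"Author": "ukd1"})
print(req.get_method(), req.full_url)
# urllib normalises stored header names with str.capitalize().
print(req.get_header("X-riak-meta-author"))
```

Sending it with urllib.request.urlopen(req) would need a running node.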

Slide 10

Links
• Used to store one-way relationships between objects:
  • Stored in the object’s metadata
  • Link-walking uses MapReduce
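
Over HTTP, links are sent in a Link header on the stored object. A link walk is written as extra path segments of the form bucket,tag,keep after the starting key, with _ as a wildcard, and Riak runs the walk as a MapReduce job. The Python sketch below only builds those two strings. It assumes the /riak/... URL scheme of this era, and the bucket, key and tag names are hypothetical.

```python
def link_header(links):
    """Format (bucket, key, tag) triples as a Riak-style Link header value."""
    return ", ".join(f'</riak/{bucket}/{key}>; riaktag="{tag}"'
                     for bucket, key, tag in links)


def link_walk_path(bucket, key, steps):
    """Build a link-walking URL path from (bucket, tag, keep) steps.

    None for bucket or tag becomes the '_' wildcard; keep=True asks for
    the objects found at that step to be returned.
    """
    segments = [f"{step_bucket or '_'},{tag or '_'},{1 if keep else 0}"
                for step_bucket, tag, keep in steps]
    return "/".join([f"/riak/{bucket}/{key}", *segments])


print(link_header([("people", "bob", "friend"), ("people", "carol", "friend")]))
# Friends of alice's friends, keeping only the final step's results.
print(link_walk_path("people", "alice", [("people", "friend", False),
                                          ("people", "friend", True)]))
```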

Slide 11

MapReduce
• Designed for queries fast enough to serve web-page requests
• Built in
• Map / reduce functions are written in JavaScript or Erlang
• Can do re-reduce
• Streaming MapReduce
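
"Can do re-reduce" means a reduce function may be called again on a mix of map output and earlier reduce results. It must therefore return a list of the same shape it accepts, and give the same answer however the inputs are batched. The Python sketch below is a local simulation of that contract only. Real Riak map/reduce functions are written in JavaScript or Erlang, and the word-count data and function names are hypothetical.

```python
from collections import Counter


def map_word_counts(document: str):
    """Map phase: emit a one-element list holding a word -> count mapping."""
    return [Counter(document.lower().split())]


def reduce_sum_counts(values):
    """Reduce phase: merge counters. The output has the same shape as the input
    (a list of Counters), so it can safely be fed back in (re-reduce)."""
    total = Counter()
    for counts in values:
        total.update(counts)
    return [total]


docs = ["riak is a key value store", "cassandra is a column store", "riak riak"]
mapped = [item for doc in docs for item in map_word_counts(doc)]

# One reduce over everything vs. reducing a first batch, then re-reducing.
single_pass = reduce_sum_counts(mapped)
re_reduced = reduce_sum_counts(reduce_sum_counts(mapped[:2]) + mapped[2:])
assert single_pass == re_reduced
print(single_pass[0]["riak"])
```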

Slide 12

Vector clocks
• Each value (each object in Riak) is tagged with a vector clock
• Riak can determine if values:
  • Are direct descendants of a single object
  • Share a common parent
  • Are unrelated
• Cassandra uses timestamps instead; problems can occur when node clocks are out of sync
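
A vector clock can be modelled as a map from actor id to counter. Comparing two clocks shows whether one descends from the other or whether they are concurrent (neither descends the other). The concurrent case is the one that needs resolving. The Python sketch below is minimal: the actor ids are hypothetical, and it omits the timestamps and pruning that Riak's own vector clocks carry.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter bumped (a new write by actor)."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends b"
    if descends(b, a):
        return "b descends a"
    return "concurrent"


parent = increment({}, "node1")     # {'node1': 1}
child = increment(parent, "node1")  # direct descendant of parent
other = increment(parent, "node2")  # written from the same parent elsewhere
print(compare(child, parent))  # a descends b
print(compare(child, other))   # concurrent: common parent, needs resolution
```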

Slide 13

Siblings
• Siblings are different versions of the same document that Riak has not merged
• They occur only if allow_mult is enabled on a bucket AND there is:
  • A concurrent write with the same vector clock value
  • A stale vector clock
  • No vector clock passed
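
The toy single-key store below shows how the three triggers above create siblings. It is a simplified model, not Riak's implementation. A write replaces every stored version whose clock the writer had already seen, and keeps the rest as siblings. With allow_mult off, the toy simply keeps the latest write. The actor names are hypothetical stand-ins for whatever Riak increments in the clock.

```python
def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


class ToyKey:
    """One key in a bucket, holding a list of (vector clock, value) versions."""

    def __init__(self, allow_mult):
        self.allow_mult = allow_mult
        self.versions = []

    def put(self, value, vclock=None, actor="client"):
        base = dict(vclock or {})  # the clock the writer last read, if any
        new_clock = dict(base)
        new_clock[actor] = new_clock.get(actor, 0) + 1
        if not self.allow_mult:
            # Toy behaviour without allow_mult: the newest write wins.
            self.versions = [(new_clock, value)]
            return
        # Versions the writer had seen are replaced; the rest become siblings.
        unseen = [(c, v) for c, v in self.versions if not descends(base, c)]
        self.versions = unseen + [(new_clock, value)]

    def get(self):
        return list(self.versions)


cart = ToyKey(allow_mult=True)
cart.put(["milk"], actor="alice")
seen, _ = cart.get()[0]                                 # both clients read this clock
cart.put(["milk", "eggs"], vclock=seen, actor="alice")  # up to date: replaces ["milk"]
cart.put(["milk", "bread"], vclock=seen, actor="bob")   # now stale: sibling
print(len(cart.get()))
cart.put(["tea"])                                       # no vector clock: another sibling
print(len(cart.get()))
```

Resolving the siblings (for a cart, perhaps by taking the union of items) is left to the application.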

Slide 14

Pre & Post Commit Hooks
• Pre-commit hooks can:
  • Allow the object to be written
  • Modify the object
  • Fail the update
• They are per-bucket (stored in the bucket properties)
• Written in JavaScript (pre-commit only) or Erlang (pre- and post-commit)
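
The three outcomes of a pre-commit hook can be simulated as a chain of functions. Each function returns either the (possibly modified) object or a failure. The Python sketch below models only those semantics. Real hooks are JavaScript or Erlang functions named in the bucket properties, and the object shape, hook names and Fail type here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fail:
    """Returned by a hook to reject the write, with a reason for the client."""
    reason: str


def require_title(obj):
    """Hypothetical validation hook: reject objects without a 'title' field."""
    if "title" not in obj["value"]:
        return Fail("missing title")
    return obj  # allow the write unchanged


def add_default_tags(obj):
    """Hypothetical modifying hook: make sure a 'tags' list is present."""
    value = {**obj["value"], "tags": obj["value"].get("tags", [])}
    return {**obj, "value": value}


def run_precommit(hooks, obj):
    """Run hooks in order; stop at the first Fail, otherwise return the final object."""
    for hook in hooks:
        result = hook(obj)
        if isinstance(result, Fail):
            return result
        obj = result
    return obj


hooks = [require_title, add_default_tags]
print(run_precommit(hooks, {"key": "a", "value": {"title": "Riak"}}))
print(run_precommit(hooks, {"key": "b", "value": {}}))
```

Post-commit hooks run after the write has succeeded, so they cannot modify or reject it.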

Slide 15

Admin
• Super simple:
  • riak-admin join
  • riak-admin leave
• Backup tools are provided

Slide 16

Backup / restore
• riak-admin backup|restore [[node| all]]
• An alternative for Bitcask is a filesystem-level backup, as it uses append-only files
• riak-admin backup is storage-engine agnostic
• riak-admin backup only covers KV data, not search indexes (Riak Search)

Slide 17

Storage engines
• Ships with two storage engines:
  • Bitcask: the default, best when the keyspace fits in RAM
  • InnoDB: suggested when the keyspace is larger than RAM
• Also available: Google’s LevelDB. It’s BSD-licensed and recently integrated; good for large data sets.
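
Bitcask prefers a keyspace that fits in RAM because of how it is built. Values are appended to a log on disk, while an in-memory index (the "keydir") maps every key to the location of its latest value. The toy Python version below illustrates that idea. It uses an in-memory buffer in place of data files and omits Bitcask's CRCs, timestamps, hint files and merging. The append-only log is also why a plain filesystem copy works as a backup.

```python
import io


class ToyBitcask:
    """Append-only value log plus an in-memory keydir (key -> offset, size)."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for Bitcask's data files
        self.keydir = {}         # every key lives in RAM, hence "keyspace < RAM"

    def put(self, key: bytes, value: bytes):
        offset = self.log.seek(0, io.SEEK_END)  # always append, never overwrite
        self.log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key: bytes):
        if key not in self.keydir:
            return None
        offset, size = self.keydir[key]
        self.log.seek(offset)
        return self.log.read(size)  # one seek + one read per lookup


db = ToyBitcask()
db.put(b"user:1", b"alice")
db.put(b"user:1", b"alice v2")  # old bytes stay in the log until a merge
print(db.get(b"user:1"), len(db.log.getvalue()))
```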

Slide 18

Riak Search
• Full-text search engine built on top of Riak
• Realtime
• Uses Lucene analyzers; custom ones may be written in Erlang or Java
• Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
• Will be part of Riak by default from 1.0
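
Term queries, boolean operators and wildcards like these are typically answered from an inverted index that maps each term to the documents containing it. The toy Python index below is only meant to illustrate a term query, a boolean AND and an end-of-word wildcard such as ri*. It is not Riak Search's implementation or query syntax. Its tokeniser is a naive whitespace split rather than a Lucene analyzer, and the class and method names are hypothetical.

```python
from collections import defaultdict


class ToyIndex:
    """Inverted index: term -> set of ids of documents containing that term."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():  # naive stand-in for a Lucene analyzer
            self.postings[term].add(doc_id)

    def search(self, query):
        """Match one term, or every term with a given prefix if the query ends in '*'."""
        query = query.lower()
        if query.endswith("*"):
            prefix = query[:-1]
            return set().union(*(ids for term, ids in self.postings.items()
                                 if term.startswith(prefix)))
        return set(self.postings.get(query, ()))

    def search_all(self, *queries):
        """Boolean AND: documents matching every query."""
        if not queries:
            return set()
        return set.intersection(*(self.search(q) for q in queries))


index = ToyIndex()
index.add(1, "Riak is a key value store")
index.add(2, "Cassandra is a column store")
index.add(3, "Riak Search indexes riak data")
print(index.search("store"), index.search("ri*"), index.search_all("ri*", "store"))
```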

Slide 19

Riak > Cassandra
• Extremely simple to add or remove nodes from a cluster
• No up-front set-up of the data model
• REST & Protocol Buffers API access
• Commercial support from the original developers, Basho

Slide 20

Riak = Cassandra
• No single point of failure
• Linearly scalable
• High availability
• Eventually consistent
• You can choose your own consistency requirements

Slide 21

Riak < Cassandra
• CQL: an SQL-ish query language
• Range / cover queries are built in (no need to write MapReduce functions)
• ‘Enterprise’ features (data-centre / rack awareness) are free & in the open-source build
• Wide support / training from third-party commercial vendors: DataStax / Acunu / Impetus / Onzra http://wiki.apache.org/cassandra/ThirdPartySupport
• Cassandra is seemingly more popular & has a bigger community
• Riak’s fixed number of partitions vs. the MD5 hashing of Cassandra’s RandomPartitioner: in Riak you can’t reconfigure the partition count later if you need to, so plan carefully! http://wiki.basho.com/Cluster-Capacity-Planning.html
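
The planning warning comes from how Riak places keys. Each bucket/key pair is hashed with SHA-1 onto a 160-bit ring, which is split into a fixed, power-of-two number of equal partitions (ring_creation_size, 64 by default) chosen when the cluster is created. The Python sketch below approximates that placement. Riak actually hashes an Erlang-encoded {Bucket, Key} term, so real partition numbers will differ. It also shows why the fixed count matters: nodes beyond the number of partitions have nothing to own. The round-robin claim function and the node names are hypothetical simplifications of Riak's claim algorithm.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 hash space


def partition_for(bucket, key, num_partitions=64):
    """Index of the equal-sized ring partition a bucket/key pair hashes into.

    Simplification: hashes the string 'bucket/key', not Riak's Erlang term.
    num_partitions is assumed to be a power of two.
    """
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") // (RING_TOP // num_partitions)


def claim(num_partitions, nodes):
    """Hypothetical round-robin stand-in for Riak's claim step (partition -> node)."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


print(partition_for("users", "russell"))  # some index in 0..63
nodes = [f"riak{i}@10.0.0.{i}" for i in range(1, 11)]  # ten hypothetical nodes
owners = claim(8, nodes)  # a deliberately tiny ring
print(len(set(owners.values())), "of", len(nodes), "nodes own any data")
```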

Slide 22

Further reading
• Basho’s slide decks: http://wiki.basho.com/Slide-Decks.html
• Commit hooks: http://wiki.basho.com/Pre--and-Post-Commit-Hooks.html
• Riak / Cassandra: http://wiki.basho.com/Riak-Compared-to-Cassandra.html

Slide 23

Questions?