Riak
How does Riak compare to Cassandra?
Friday, 20 April 12
/usr/bin/whoami
• Russell Smith
• Work for UKD1, a consultancy for web-related-tech
• Help with application design, infrastructure, capacity planning, etc
• Mainly for the video-games industry & web-startups
• Twitter: @ukd1
What is Riak?
• Pronounced ‘ree-ack’
• A scalable, high-availability, distributed, key-value store
• Modelled on Amazon’s description of Dynamo, like Cassandra
• Commercially supported / developed by Basho
• Written in Erlang
• Open source - Apache License (2.0)
What isn’t Riak?
• Schema enforced - store what you want
• Relational database - No joins or constraint enforcement as there are no global locks
• A competitor to in-memory, column-based databases
What versions are available?
• Riak
• Riak Search (Riak + distributed full-text indexing / search)
• Riak Enterprise - commercially licensed - supports extra features for
enterprise use (SNMP, data-centre awareness, etc)
• Luwak (Riak + app for storing large files; it’s bundled by default)
Riak’s take on CAP
• Exposed to the end user - allowing tuning of N, R & W
• N - # of replicas kept of each object, set per bucket (default of 3)
• R - # of replicas that must respond for a successful read (per request)
• W - # of replicas that must respond for a successful write (a number, all, quorum or default for the bucket)
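A rough sketch of how these settings interact, assuming the usual Dynamo-style rule that a read is guaranteed to overlap the latest successful write when R + W > N. The function names and the `bucket_default` parameter are illustrative, not part of any Riak client.

```python
def resolve_quorum(value, n, bucket_default="quorum"):
    """Turn an R or W setting (a number, 'all', 'quorum' or 'default') into a replica count."""
    if value == "default":
        value = bucket_default  # assumption: the bucket default is itself one of the other forms
    if value == "all":
        return n
    if value == "quorum":
        return n // 2 + 1  # strict majority of the N replicas
    count = int(value)
    if not 1 <= count <= n:
        raise ValueError(f"value must be between 1 and N={n}, got {count}")
    return count


def read_overlaps_write(n, r, w):
    """True when every set of R replicas shares at least one replica with every set of W."""
    return resolve_quorum(r, n) + resolve_quorum(w, n) > n


if __name__ == "__main__":
    print(resolve_quorum("quorum", 3))
    print(read_overlaps_write(3, "quorum", "quorum"))
    print(read_overlaps_write(3, 1, 1))
```

Lower R and W favour availability and latency; higher values favour consistency.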
What can you store?
• Values against keys
• Keys are organised into buckets
• Practical value limit of 64 MB
• For large files, Luwak (built in > 0.13) splits them into smaller blocks
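To illustrate the general idea of splitting a large value into smaller blocks, here is a minimal sketch. It is not Luwak's actual implementation: the key naming scheme, the manifest layout and the plain `dict` standing in for the key-value store are all assumptions made for the example.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut data into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_large_value(store: dict, key: str, data: bytes, block_size: int = 1024 * 1024):
    """Store each block under its own key, plus a small manifest under the original key."""
    block_keys = []
    for index, block in enumerate(split_into_blocks(data, block_size)):
        block_key = f"{key}:block:{index}"  # hypothetical naming scheme
        store[block_key] = block
        block_keys.append(block_key)
    store[key] = {"length": len(data), "blocks": block_keys}


def load_large_value(store: dict, key: str) -> bytes:
    """Reassemble the value by reading the blocks listed in the manifest, in order."""
    manifest = store[key]
    return b"".join(store[block_key] for block_key in manifest["blocks"])
```

Each stored block stays well under the practical value limit, and only the manifest has to be read to find the rest.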
Querying
• Two main interfaces: HTTP & Protocol Buffers
• HTTP API is mainly REST - GET, PUT, DELETE
• Riak stores the key, value & metadata about the key:
• Content Type, Charset, Encoding & link data
• Also: any custom metadata
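A sketch of what the REST-style requests might look like from Python's standard library. It only builds the requests and does not send them. The base URL (a local node on port 8098), the `/riak/<bucket>/<key>` path and the `X-Riak-Meta-` header prefix for custom metadata are assumptions about the deployment and API version; check them against the Riak release in use.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8098"  # assumption: local node, default HTTP port


def object_url(bucket: str, key: str) -> str:
    """URL of one object; bucket and key are percent-encoded."""
    return f"{BASE_URL}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def build_put(bucket, key, body: bytes, content_type: str, custom_meta=None):
    """PUT stores the value along with its content type and any custom metadata."""
    request = urllib.request.Request(object_url(bucket, key), data=body, method="PUT")
    request.add_header("Content-Type", content_type)
    for name, value in (custom_meta or {}).items():
        request.add_header(f"X-Riak-Meta-{name}", value)
    return request


def build_get(bucket, key):
    """GET fetches the value and its metadata headers."""
    return urllib.request.Request(object_url(bucket, key), method="GET")


def build_delete(bucket, key):
    """DELETE removes the object."""
    return urllib.request.Request(object_url(bucket, key), method="DELETE")
```

Passing any of these to `urllib.request.urlopen` would perform the call against a running node.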
Links
• Used to store one-way relationships between objects
• Stored in object meta-data
• Link-walking uses MapReduce
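As an illustration, links travel as a `Link` header on the object, and a link-walk is expressed as extra path segments on the object's URL. The helpers below sketch both formats as used by the `/riak/...` HTTP interface; the function names are made up for the example, and the exact syntax should be confirmed against the documentation for the Riak version in use.

```python
def link_header(links):
    """Build a Link header value from (bucket, key, tag) triples."""
    return ", ".join(
        f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links
    )


def link_walk_path(bucket, key, steps):
    """Build a link-walking path; each step is (bucket, tag, keep), with '_' as a wildcard."""
    path = f"/riak/{bucket}/{key}"
    for step_bucket, tag, keep in steps:
        path += f"/{step_bucket},{tag},{keep}"
    return path


if __name__ == "__main__":
    print(link_header([("people", "bob", "friend")]))
    print(link_walk_path("people", "alice", [("people", "friend", "_")]))
```

Because links are one-way, a reverse relationship needs its own link on the other object.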
MapReduce
• Designed for queries fast enough to serve within a web-page request
• Built in
• Map / Reduce functions are written in JavaScript or Erlang
• Can do re-reduce
• Streaming MapReduce
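Re-reduce means a reduce function may be fed its own earlier output, so it has to accept partial results as input. The sketch below simulates that in Python (Riak's own functions would be JavaScript or Erlang); the word-count job and the `batch_size` parameter are illustrative assumptions.

```python
from collections import Counter


def map_words(value: str):
    """Map phase: emit one {word: 1} record per word in the value."""
    return [{word: 1} for word in value.split()]


def reduce_counts(partials):
    """Reduce phase: sum counts. Output has the same shape as input, so it can be re-reduced."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return [dict(total)]


def run_job(values, batch_size=2):
    """Reduce the mapped records in batches, then re-reduce the partial results."""
    mapped = [record for value in values for record in map_words(value)]
    partial_results = []
    for start in range(0, len(mapped), batch_size):
        partial_results.extend(reduce_counts(mapped[start:start + batch_size]))
    return reduce_counts(partial_results)[0]
```

The final answer is the same whatever the batch size, which is the property a re-reducible function needs.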
Vector clocks
• Each value is tagged with a vector clock
• Riak can determine whether values:
• Are direct descendants of a single object
• Share a common parent
• Unrelated
• In Riak each object has a vector clock
• Cassandra uses timestamps - problems can occur with out-of-sync clocks
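A minimal sketch of the vector clock comparison, assuming a clock is a mapping from actor name to update counter. Riak's real clocks are opaque to clients, so the representation and function names here are illustrative only.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with this actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"  # neither has seen all of the other's updates


if __name__ == "__main__":
    base = {"x": 1}
    print(compare(increment(base, "x"), base))
    print(compare(increment(base, "x"), increment(base, "y")))
```

No wall-clock time is involved, which is why clock drift between nodes does not affect the outcome.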
Siblings
• Siblings are different versions of the same document which Riak has
not merged
• Occur only if allow_mult is enabled on a bucket AND one of:
• Concurrent writes with the same vector clock value
• Stale vector clock
• No vector clock passed
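The three cases can be sketched with a toy write path. This is a simplified model, not Riak's code: clocks are plain dicts of actor to counter, each sibling keeps its own clock, and with allow_mult disabled the newest write simply replaces whatever is stored. All names are hypothetical.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def write(stored, context, value, actor, allow_mult):
    """Apply one write.

    stored  -- list of (clock, value) pairs currently held for the key
    context -- vector clock the client read before writing, or None if it passed none
    Returns the new list of (clock, value) pairs; more than one entry means siblings.
    """
    context = dict(context or {})
    new_clock = dict(context)
    new_clock[actor] = new_clock.get(actor, 0) + 1
    # Stored values the client had not seen cannot safely be overwritten.
    survivors = [(clock, old) for clock, old in stored if not descends(context, clock)]
    if survivors and not allow_mult:
        survivors = []  # simplification: the latest write wins
    return survivors + [(new_clock, value)]
```

For example, two clients that read the same clock and both write back produce two entries when allow_mult is on: the second write's context does not cover the first write's new clock.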
Pre & Post Commit Hooks
• Pre-commit hooks can:
• Allow the object to be written
• Modify the object
• Fail the update
• They are per-bucket (stored in the properties)
• Written in Javascript (pre-hooks) or Erlang (pre/post-hooks)
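A Python simulation of the hook flow, to show the order of events. Real hooks would be JavaScript or Erlang functions registered in the bucket properties; the `put` function, the `HookFailure` exception and the example hooks are all made up for this sketch.

```python
class HookFailure(Exception):
    """Raised by a pre-commit hook to fail the update."""


def put(store: dict, key: str, obj: dict, pre_hooks=(), post_hooks=()):
    """Run pre-commit hooks, write the object, then run post-commit hooks."""
    for hook in pre_hooks:
        obj = hook(obj)  # a hook returns the object, possibly modified, or raises
    store[key] = obj
    for hook in post_hooks:
        hook(obj)  # runs after the write; its result cannot change the outcome
    return obj


def require_name(obj: dict) -> dict:
    """Example pre-commit hook: fail the update unless a name is present."""
    if "name" not in obj:
        raise HookFailure("object has no name")
    return obj


def mark_checked(obj: dict) -> dict:
    """Example pre-commit hook: modify the object before it is written."""
    return dict(obj, checked=True)
```

If any pre-commit hook fails, the write never happens and the post-commit hooks are not run.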
Admin
• Super simple:
• riak-admin join
• riak-admin leave
• Backup tools are provided
Backup / restore
• riak-admin backup|restore [[node|all]]
• Alternative is a filesystem backup for Bitcask, as it uses append-only files
• riak-admin backup is storage-engine agnostic
• riak-admin backup only covers KV data, not search indexes (Riak-Search)
Storage engines
• Ships with two default storage engines:
• Bitcask - default, best when keyspace < RAM
• InnoDB - suggested when keyspace > RAM
• Also available - Google’s LevelDB. It’s BSD licensed & recently
integrated, good for large sets.
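Why Bitcask suits a keyspace smaller than RAM: it keeps an in-memory index of every key, pointing into append-only data files. The class below is a heavily simplified sketch of that idea (one file, no checksums, no merging, no recovery on restart), not Bitcask's actual format.

```python
import os


class TinyBitcask:
    """Toy append-only store: values go to a log file, keys live in memory."""

    def __init__(self, path: str):
        self.path = path
        self.keydir = {}  # key -> (offset, size); every key is held in RAM
        open(path, "ab").close()  # create the log file if it does not exist

    def put(self, key: str, value: bytes) -> None:
        """Append the value to the log and point the key at the new location."""
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        """Look the key up in memory, then do a single read from the log."""
        offset, size = self.keydir[key]
        with open(self.path, "rb") as log:
            log.seek(offset)
            return log.read(size)
```

Old values are never overwritten in place, which is also what makes a plain filesystem copy a workable backup.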
Riak-Search
• Full-text search engine built on top of Riak
• Realtime
• Uses Lucene Analyzers; custom ones may be written in Erlang / Java
• Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
• Will be part of Riak as default from 1.0
Riak > Cassandra
• Extremely simple to add or remove nodes from a cluster
• No pre-setup of the data model
• REST & Protobuf API access
• Commercial support from the original developers, Basho
Riak = Cassandra
• No single point of failure
• Linearly scalable
• High availability
• Eventually consistent
• You can choose your own consistency requirements
Riak < Cassandra
• CQL; an SQL-ish language
• Range / cover queries are built in (no need to write MapReduce functions)
• ‘Enterprise’ features (dc / rack awareness) are free & in the open-source build
• Wide support / training from 3rd party commercial parties; DataStax / Acunu / Impetus / Onzra
http://wiki.apache.org/cassandra/ThirdPartySupport
• Cassandra is seemingly more popular & has a bigger community
• Riak's fixed partition count vs the MD5 hashing of Cassandra's RandomPartitioner; you can't change the number of partitions later, so plan carefully with Riak!
http://wiki.basho.com/Cluster-Capacity-Planning.html
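A sketch of why the partition count matters. It assumes a 160-bit SHA-1 ring divided into equal partitions, with partitions handed out to nodes round-robin; the way the bucket and key are encoded before hashing, the round-robin assignment and the function names are simplifications for illustration, not Riak's exact scheme.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 produces 160-bit values


def partition_for(bucket: str, key: str, num_partitions: int = 64) -> int:
    """Hash the bucket/key pair onto the ring and return the index of its partition."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // num_partitions)


def assign_owners(num_partitions: int, nodes: list) -> dict:
    """Hand partitions out to nodes round-robin (a simplification of ring claiming)."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


if __name__ == "__main__":
    partition = partition_for("users", "russell")
    three_nodes = assign_owners(64, ["node1", "node2", "node3"])
    four_nodes = assign_owners(64, ["node1", "node2", "node3", "node4"])
    # Adding a node changes who owns the partition, not which partition the key is in.
    print(partition, three_nodes[partition], four_nodes[partition])
```

Adding nodes only moves whole partitions between owners, so the number of partitions caps how far the cluster can spread its data.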