for web-related-tech
• Help with application design, infrastructure, capacity planning, etc.
• Mainly for the video-games industry & web startups
• Twitter: @ukd1
Friday, 20 April 12

distributed, key-value store
• Modelled on Amazon’s description of Dynamo, like Cassandra
• Commercially supported / developed by Basho
• Written in Erlang
• Open source - Apache License (2.0)

want
• Relational database - no joins or constraint enforcement, as there are no global locks
• Not intended to compete with in-memory, column-based databases

+ distributed full-text indexing / search)
• Riak Enterprise - commercially licensed - supports extra features for enterprise use (SNMP, data-centre awareness, etc.)
• Luwak (Riak + app for storing large files; it’s bundled by default)

- allowing tuning of N, R & W
• N - # of nodes, set per bucket (default of 3)
• R - # of nodes required for a read (per request)
• W - # of nodes required for a successful write (a number, all, quorum or default for the bucket)

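The N / R / W settings above can be sketched in a few lines. This is a minimal illustration, not Riak's implementation: the function names `resolve` and `reads_overlap_writes` and the `bucket_default` parameter are hypothetical, and the rule R + W > N is the general Dynamo-style condition for a read to overlap the latest successful write.

```python
def resolve(setting, n, bucket_default="quorum"):
    """Turn an R or W setting (a number, "all", "quorum" or "default")
    into a replica count for a bucket with N replicas.

    bucket_default is a hypothetical stand-in for the bucket's own setting.
    """
    if setting == "default":
        setting = bucket_default
    if setting == "all":
        return n
    if setting == "quorum":
        return n // 2 + 1  # simple majority of the N replicas
    count = int(setting)
    if not 1 <= count <= n:
        raise ValueError(f"expected a value between 1 and {n}, got {count}")
    return count


def reads_overlap_writes(n, r, w):
    """True when every read set must share at least one replica
    with every successful write set (R + W > N)."""
    return resolve(r, n) + resolve(w, n) > n


if __name__ == "__main__":
    n = 3  # the default N from the slide
    for r, w in [(1, 1), ("quorum", "quorum"), (1, "all")]:
        print(f"N={n} R={r} W={w} overlap={reads_overlap_writes(n, r, w)}")
```

Lower R and W favour latency and availability; raising them until R + W > N trades some of that for stronger read-your-writes behaviour.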
are organised into buckets
• Practical value limit of 64 MB
• For large files, Luwak (built in for versions > 0.13) splits them into smaller blocks

HTTP API is mainly REST - GET, PUT, DELETE
• Riak stores the key, value & metadata about the key:
  • Content type, charset, encoding & link data
  • Also: any custom metadata

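As a rough sketch of what those REST calls look like, the snippet below only builds the requests and does not send them. The base URL (a local node on port 8098 with the `/riak/<bucket>/<key>` path), the `X-Riak-Meta-*` and `X-Riak-Vclock` header names, and the helper names are assumptions for illustration; check them against the HTTP API documentation for the Riak version in use.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local node exposing the HTTP interface under /riak
BASE_URL = "http://127.0.0.1:8098/riak"


def object_url(bucket, key):
    """URL of a single object, with bucket and key percent-encoded."""
    return f"{BASE_URL}/{quote(bucket, safe='')}/{quote(key, safe='')}"


def get_request(bucket, key):
    return Request(object_url(bucket, key), method="GET")


def put_request(bucket, key, body, content_type, meta=None, vclock=None):
    """Build a PUT carrying the value plus its metadata as headers."""
    headers = {"Content-Type": content_type}
    for name, value in (meta or {}).items():
        headers[f"X-Riak-Meta-{name}"] = value  # custom metadata
    if vclock is not None:
        headers["X-Riak-Vclock"] = vclock  # clock from an earlier read
    return Request(object_url(bucket, key), data=body,
                   headers=headers, method="PUT")


def delete_request(bucket, key):
    return Request(object_url(bucket, key), method="DELETE")


if __name__ == "__main__":
    req = put_request("slides", "riak-intro", b'{"lang": "erlang"}',
                      "application/json", meta={"Source": "talk"})
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running node.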
clock
• Riak can determine if values:
  • Are direct descendants of a single object
  • Share a common parent
  • Are unrelated
• In Riak each object has a vector clock
• Cassandra uses timestamps - problems can occur with out-of-sync clocks

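The three cases above can be illustrated with a toy vector clock, modelled here as a dict mapping an actor id to an update counter. This is a simplified sketch rather than Riak's internal representation; in particular, telling "common parent" apart from "unrelated" by looking for shared history entries is an assumption made for illustration.

```python
def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relation(a, b):
    """Classify how the value with clock a relates to the value with clock b."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "descendant"   # a is a direct descendant of b
    if descends(b, a):
        return "ancestor"     # b is a direct descendant of a
    # Neither descends from the other: the writes were concurrent.
    shared = [actor for actor in a if min(a[actor], b.get(actor, 0)) > 0]
    return "common parent" if shared else "unrelated"


if __name__ == "__main__":
    print(relation({"x": 2, "y": 1}, {"x": 1}))
    print(relation({"x": 1, "y": 1}, {"x": 1, "z": 1}))
    print(relation({"y": 1}, {"z": 1}))
```

Because the comparison uses per-actor counters instead of wall-clock time, it does not depend on the nodes' clocks being in sync.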
which Riak has not merged
• Occurs only if allow_mult is enabled on a bucket AND one of:
  • Concurrent writes with the same vector clock value
  • Stale vector clock
  • No vector clock passed

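A toy model makes those conditions concrete. `ToyBucket` below is hypothetical and heavily simplified (one in-memory dict, the caller names the actor); it only illustrates when a write replaces the stored value and when it is kept alongside it as a sibling.

```python
def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


class ToyBucket:
    """In-memory stand-in for a bucket; not Riak's implementation."""

    def __init__(self, allow_mult):
        self.allow_mult = allow_mult
        self.objects = {}  # key -> list of (clock, value) pairs

    def put(self, key, value, actor, clock=None):
        base = dict(clock or {})  # no clock passed means empty history
        new_clock = dict(base)
        new_clock[actor] = new_clock.get(actor, 0) + 1
        if self.allow_mult:
            # Keep every stored value the writer had not already seen.
            kept = [(c, v) for c, v in self.objects.get(key, [])
                    if not descends(base, c)]
        else:
            kept = []  # without allow_mult the new write simply wins
        self.objects[key] = kept + [(new_clock, value)]
        return new_clock


if __name__ == "__main__":
    bucket = ToyBucket(allow_mult=True)
    first = bucket.put("k", "v1", actor="a")
    bucket.put("k", "v2", actor="x", clock=first)  # up-to-date clock
    bucket.put("k", "v3", actor="y", clock=first)  # same clock, now stale
    print([value for _, value in bucket.objects["k"]])
```

In the run above, the second and third writes both start from the same clock, so the third cannot replace the second and the two values remain as siblings.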
be written
• Modify the object
• Fail the update
• They are per-bucket (stored in the bucket properties)
• Written in JavaScript (pre-commit hooks) or Erlang (pre- and post-commit hooks)

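Real hooks are written in JavaScript or Erlang, as noted above, so the following Python is only a toy model of the three outcomes a pre-commit hook can produce: let the object through, modify it, or fail the update. Every name here (`UpdateRejected`, `run_pre_commit`, the two example hooks) is hypothetical.

```python
class UpdateRejected(Exception):
    """Raised by a hook to fail the update."""


def require_content_type(obj):
    # Fails the update when the object has no content type.
    if not obj.get("content_type"):
        raise UpdateRejected("content_type is required")
    return obj  # unchanged: the object may be written


def tag_as_checked(obj):
    # Modifies the object by adding a piece of custom metadata.
    changed = dict(obj)
    changed["meta"] = {**obj.get("meta", {}), "checked": "yes"}
    return changed


def run_pre_commit(obj, hooks):
    """Run each hook in order; the first failure aborts the write."""
    for hook in hooks:
        obj = hook(obj)
    return obj


if __name__ == "__main__":
    hooks = [require_content_type, tag_as_checked]  # per-bucket list
    print(run_pre_commit({"content_type": "text/plain", "value": "hi"}, hooks))
    try:
        run_pre_commit({"value": "hi"}, hooks)
    except UpdateRejected as err:
        print("rejected:", err)
```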
all]]
• Alternative is a filesystem backup for Bitcask, as it uses append-only files
• riak-admin backup is storage-engine agnostic
• riak-admin backup only covers KV data, not search indexes (Riak Search)

• Bitcask - default, best when keyspace < RAM
• InnoDB - suggested when keyspace > RAM
• Also available - Google’s LevelDB. It’s BSD licensed & recently integrated, good for large data sets.

• Realtime
• Uses Lucene analyzers; custom ones may be written in Erlang / Java
• Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
• Will be part of Riak by default from 1.0

nodes from a cluster
• No pre-setup of the data model
• REST & Protocol Buffers API access
• Commercial support from the original developers, Basho

/ cover queries are built in (no need to write MapReduce functions)
• ‘Enterprise’ features (DC / rack awareness) are free & in the open-source build
• Wide support / training from third-party commercial vendors: DataStax / Acunu / Impetus / Onzra http://wiki.apache.org/cassandra/ThirdPartySupport
• Cassandra is seemingly more popular & has a bigger community
• Riak’s fixed partitions vs Cassandra’s MD5-based RandomPartitioner: Riak’s partition count can’t be reconfigured later, so plan capacity carefully! http://wiki.basho.com/Cluster-Capacity-Planning.html

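To see why the partition count matters, here is a small sketch of mapping a key onto a ring that is split into a fixed number of equal partitions. It assumes a 160-bit SHA-1 ring and a count of 64, and it hashes a plain "bucket/key" string; Riak hashes its own internal encoding of the bucket and key, so real partition numbers will differ. The function names are hypothetical.

```python
import hashlib

RING_BITS = 160  # assumption: SHA-1 sized ring


def partition_for(bucket, key, partitions):
    """Index of the equal-sized ring partition that owns bucket/key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")  # 0 <= position < 2**160
    return position * partitions // 2 ** RING_BITS


def preference_list(bucket, key, partitions, n=3):
    """The owning partition plus the next n - 1 around the ring."""
    first = partition_for(bucket, key, partitions)
    return [(first + i) % partitions for i in range(n)]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]
    moved = sum(partition_for("users", k, 64) != partition_for("users", k, 128)
                for k in keys)
    print(preference_list("users", "user-1", 64))
    print(f"{moved} of {len(keys)} keys map to a different index with 128 partitions")
```

Changing the partition count changes where almost every key lands, which is one way to see why the count is chosen up front.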