Basho
• 120+ employees; offices in SF, MA, London, Japan
• Founded in 2008, open sourced Riak in 2009
• Sponsors of the Riak open source database (Apache 2)
• Sells Enterprise features (multi-DC replication), support, training
• Riak CS (S3-compatible storage) released in March 2012
Tuesday, April 2, 13
Riak CS: Now Open Source (Apache 2)
• Cloud storage software backed by Riak
• S3 API
• Formerly closed-source
• Per-tenant reporting
• Pluggable authentication
• Detailed stats
• DTrace support
• Multi-datacenter replication (Enterprise)
• Preliminary integration with CloudStack
you can’t outsource these properties
• operationally simple
• horizontally scalable
• globally distributed
• highly available
• no SPOFs
• fault tolerant
Riak
• Key-Value store (plus extras)
• Distributed, horizontally scalable
• Eventually consistent
• Fault-tolerant
• Highly-available
• Inspired by Amazon’s Dynamo
Distributed & Horizontally Scalable
• Default configuration is in a cluster
• Load and data are spread evenly via consistent hashing
• Scalable: add more nodes to get more X
Fault-Tolerant
• Symmetry: all nodes participate equally
• Decentralized: no central control, no SPOF
• All data is replicated 3x by default
• Cluster transparently survives node failure and network partitions
Highly-Available
• Any node can serve client requests
• Fallbacks (sloppy quorums) are used when nodes are down
• Always accepts write requests
• Accepts read requests as long as R of the N replicas are reachable
• Per-request quorums
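To make the read rule concrete, here is a minimal sketch of an R-of-N quorum read. Everything in it is an illustrative assumption rather than Riak's API: the function name `quorum_read`, integer versions standing in for vector clocks, and `None` marking an unreachable replica.

```python
def quorum_read(replies, r):
    """Return the newest value if at least r replicas answered.

    replies: one entry per replica, either (version, value) or None when
    the replica is unreachable. Integer versions are a simplification.
    """
    answered = [reply for reply in replies if reply is not None]
    if len(answered) < r:
        raise RuntimeError(f"only {len(answered)} replies, need {r}")
    # Pick the reply with the highest version number.
    version, value = max(answered, key=lambda reply: reply[0])
    return value


# N = 3 replicas, one is down; r is chosen per request.
replies = [(2, "new"), None, (1, "old")]
print(quorum_read(replies, r=2))
```

Because `r` is an argument, each request can pick its own trade-off between availability (small `r`) and freshness (large `r`).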
Inspired by Amazon’s Dynamo
• Masterless, peer-coordinated replication
• Consistent hashing
• Eventually consistent
• Quorum reads and writes
• Anti-entropy: read repair, hinted handoff
[Diagram: five Riak nodes and five Riak CS instances (each exposing the S3 API and the Reporting API); a large object is being stored]
• 1. user uploads an object
• 4. Riak replicates and stores chunks
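The upload walkthrough ends with the object being stored as chunks. A rough sketch of that splitting step follows. The 1 MiB default block size, the `blocks/<uuid>/<n>` key layout, and the manifest fields are assumptions for illustration, not Riak CS's actual storage format.

```python
import uuid


def split_into_blocks(data, block_size=1024 * 1024):
    """Split an uploaded object into fixed-size blocks plus a manifest.

    Returns (manifest, blocks) where blocks maps a hypothetical key
    to the bytes of one chunk.
    """
    object_id = str(uuid.uuid4()).upper()
    blocks = {}
    for index, offset in enumerate(range(0, len(data), block_size)):
        key = f"blocks/{object_id}/{index}"
        blocks[key] = data[offset:offset + block_size]
    manifest = {
        "id": object_id,
        "size": len(data),
        "block_keys": list(blocks),  # insertion order == block order
    }
    return manifest, blocks


manifest, blocks = split_into_blocks(b"x" * 2500, block_size=1000)
print(len(blocks), "blocks for", manifest["size"], "bytes")
```

Each block would then be written to Riak as an ordinary key-value pair, so replication and placement come from the underlying store.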
Consistent Hashing
• Invented by Danny Lewin and others @ MIT/Akamai
• Minimizes remapping of keys when number of hash slots changes
• Originally applied to CDNs, used in Dynamo for replica placement
• Enables incremental scalability, even spread
• Minimizes hot spots
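The remapping claim can be checked with a small experiment. This is a generic consistent-hash ring, not Riak's exact partition scheme; the `Ring` class, the `ring_hash` helper, and the choice of 64 points per node are assumptions made for the sketch.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a point on the ring (SHA-1 as a big integer)."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, points_per_node=64):
        # Each node claims several points so load spreads evenly.
        self.points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.hashes = [h for h, _ in self.points]

    def owner(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.hashes, ring_hash(key)) % len(self.points)
        return self.points[i][1]


keys = [f"blocks/{n}" for n in range(1000)]
before = Ring(["a", "b", "c", "d"])
after = Ring(["a", "b", "c", "d", "e"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node e")
```

Only keys that now belong to the new node change owner; with a plain `hash(key) % node_count` scheme, most keys would move instead.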
Vector Clocks
• Introduced independently by Fidge and Mattern in 1988
• Extends Lamport’s timestamps (1978)
• Each value in Dynamo is tagged with a vector clock
• Allows detection of stale values, logical siblings
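A small sketch of how a vector clock separates stale values from siblings. Clocks are modeled as plain dicts from actor name to counter; the function names `increment`, `descends`, and `compare` are chosen for this example only.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Describe clock a relative to clock b."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "newer"
    if descends(b, a):
        return "stale"
    return "siblings"  # concurrent updates, neither saw the other


v1 = increment({}, "x")   # first write, coordinated by x
v2 = increment(v1, "x")   # x updates again
v3 = increment(v1, "y")   # y updates v1 without seeing v2
print(compare(v1, v2), compare(v2, v3))
```

Here `v1` is stale relative to `v2`, while `v2` and `v3` are siblings that the application (or a resolution rule) has to reconcile.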
Read Repair
• Update stale versions opportunistically on reads (instead of writes)
• Pushes system toward consistency, after returning value to client
• Reflects focus on a cheap, always-available write path
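As a rough illustration of read repair, the sketch below models each replica as a dict of `key -> (version, value)`. Integer versions replace vector clocks, and the repair runs inline before returning; both are simplifications, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, fix the rest.

    A real system would answer the client first and repair afterwards;
    this sketch does both in one call for brevity.
    """
    found = [(replica, replica.get(key)) for replica in replicas]
    present = [entry for _, entry in found if entry is not None]
    if not present:
        return None
    newest = max(present, key=lambda entry: entry[0])
    for replica, entry in found:
        if entry != newest:
            replica[key] = newest  # overwrite stale or missing copies
    return newest[1]


replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
print(read_with_repair(replicas, "k"))
print(replicas)
```

The write path stays cheap because stale copies are only fixed when somebody actually reads the key.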
Hinted Handoff
• Any node can accept writes for other nodes if they’re down
• All messages include a destination
• Data accepted by a node other than the destination is handed off when the destination recovers
• As long as a single node is alive, the cluster can accept a write
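The hint mechanism can be sketched in a few lines. The `Node` class and the `write` and `hand_off` functions are hypothetical names for this example; real handoff is driven by the cluster's own failure detection rather than an explicit call.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (destination name, key, value) held for others


def write(destination, fallback, key, value):
    """Store on the destination, or leave a hint on the fallback."""
    if destination.up:
        destination.data[key] = value
    else:
        fallback.hints.append((destination.name, key, value))


def hand_off(fallback, cluster):
    """Deliver hints whose destination is back; keep the rest."""
    still_waiting = []
    for dest_name, key, value in fallback.hints:
        dest = cluster[dest_name]
        if dest.up:
            dest.data[key] = value
        else:
            still_waiting.append((dest_name, key, value))
    fallback.hints = still_waiting


a, b = Node("a"), Node("b")
cluster = {"a": a, "b": b}
a.up = False
write(a, b, "blocks/1", b"chunk")   # a is down, b keeps a hint
a.up = True
hand_off(b, cluster)                # data returns to the recovered node
print(a.data, b.hints)
```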
Anti-Entropy
• Replicas maintain a Merkle tree of keys and their versions/hashes
• Trees are periodically exchanged with peer vnodes
• Merkle tree enables cheap comparison
• Only values with different hashes are exchanged
• Pushes system toward consistency
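A two-level hash tree is enough to show why the comparison is cheap. This sketch assumes string keys and values, four buckets, and SHA-256; none of those choices come from the slides, and a production tree would be deeper and updated incrementally.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bucket_of(key, buckets):
    return int(digest(key), 16) % buckets


def build_tree(store, buckets=4):
    """Return (root_hash, leaf_hashes) for a dict of key -> value."""
    leaves = []
    for b in range(buckets):
        items = sorted(
            (k, v) for k, v in store.items() if bucket_of(k, buckets) == b
        )
        leaves.append(digest(repr(items)))
    return digest("".join(leaves)), leaves


def differing_keys(mine, theirs, buckets=4):
    """Compare roots first, then only descend into mismatched buckets."""
    my_root, my_leaves = build_tree(mine, buckets)
    their_root, their_leaves = build_tree(theirs, buckets)
    if my_root == their_root:
        return set()  # one hash comparison, nothing to exchange
    result = set()
    for b in range(buckets):
        if my_leaves[b] != their_leaves[b]:
            for key in set(mine) | set(theirs):
                if bucket_of(key, buckets) == b and mine.get(key) != theirs.get(key):
                    result.add(key)
    return result


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = dict(replica_a, k2="stale")
print(differing_keys(replica_a, replica_b))
```

When the replicas already agree, the exchange stops at the root hash; otherwise only the keys in mismatched buckets need to be looked at.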
Gossip Protocol
• Decentralized approach to managing global state
• Trades off atomicity of state changes for a decentralized approach
• Volume of gossip can overwhelm networks without care
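A toy simulation shows the trade-off: no node ever sees an atomic, global update, yet all nodes converge after a few rounds of pairwise exchange. The versioned-dict state, the push-pull exchange, and the function names are assumptions for this sketch and do not describe Riak's ring gossip in detail.

```python
import random


def merge(mine, theirs):
    """Keep the higher-versioned entry for every key."""
    merged = dict(mine)
    for key, (version, value) in theirs.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def gossip_round(states, rng):
    """Every node exchanges state with one randomly chosen peer."""
    names = list(states)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = merge(states[name], states[peer])
        states[name] = merged
        states[peer] = dict(merged)


def rounds_to_converge(states, rng, max_rounds=100):
    """Run rounds until all nodes hold identical state, or give up."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(states, rng)
        first = next(iter(states.values()))
        if all(state == first for state in states.values()):
            return round_number
    return None


# Eight nodes, each starting out knowing only its own status.
states = {f"node{i}": {f"node{i}": (1, "up")} for i in range(8)}
print("converged after", rounds_to_converge(states, random.Random(0)), "rounds")
```

Each round costs one message exchange per node, which is why the amount of state carried per message matters as the cluster grows.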
Hinted Handoff
• Node fails
• Requests go to fallback
• Node comes back
• “Handoff” - data returns to recovered node
• Normal operations resume
hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
Anatomy of a Request
get(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”)
[Diagram: client, Riak, Get Handler (FSM)]
hash(“blocks/6307C89A-710A-42CD-9FFB-2A6B39F983EA”) == 10, 11, 12
Coming Soon:
Riak CS 1.4 (Q2)
• Swift API
• Keystone Integration
• S3 Features: COPY Object, Object Versioning
Riak CS 1.5 (Q3)
• Server side encryption
• More S3 features
• Enhanced CloudStack and OpenStack integration
RICON East - May 13-14, NYC
• A distributed systems conference for developers
• Speakers from Comcast, State Farm, UC Berkeley, Harvard, and many more
• Use discount code SVCloud20 for 20% off tickets
• http://ricon.io/east.html
thanks!/questions?
• download riakcs: http://docs.basho.com/riakcs/latest/riakcs-downloads/
• hack riakcs: http://github.com/basho/riak_cs
• work at basho: http://bashojobs.theresumator.com
• follow basho on twitter: http://twitter.com/basho