Riak & Riak CS
Enterprise Grade NoSQL Distributed Data Store and Cloud Storage Solutions
Slide 2
Slide 2 text
AGENDA
• Riak Overview
• Riak Architecture
• Use Cases
• Riak EDS
• Riak CS
• Q & A
Slide 3
Slide 3 text
About Basho
Our Mission is to Be The Leader in Distributed Systems
• Founded January 2008
• 115+ employees
• Headquartered in Cambridge, with regional
offices in San Francisco, Washington DC,
London and Tokyo
• Makers of Riak, a popular distributed key-value store
• Thousands of Users Worldwide including
over 20% of the Fortune 50
• 30,000+ downloads per month now up from
19,500 in Dec 2011
• Strategic Partners include Citrix, IDC Frontier,
Yahoo! Japan, and Microsoft
Slide 4
Slide 4 text
Riak
Say “NO” to
• Master-Slave architecture
• Application sharding
• Active-Passive
Say “Yes” to
• Distributed model
• Active-Active with write scalability
Slide 5
Slide 5 text
Riak
Riak is a distributed NoSQL key-value store.
Simple operations
• Get
• Put
• Delete
Slide 6
Slide 6 text
Key-Value Data Model
• Keys are grouped into buckets.
• All data (objects) are referenced by
keys
• Object is composed of metadata and
value
Object/key Operations
[Diagram: a bucket containing three KEY/VALUE pairs]
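The data model above can be sketched with a small in-memory stand-in. This is only an illustration, not Riak's implementation: `ToyStore`, `RiakObject` and the `content_type` metadata field are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RiakObject:
    """An object is a value plus metadata (field names are illustrative)."""
    value: bytes
    metadata: dict = field(default_factory=dict)


class ToyStore:
    """In-memory stand-in: buckets group keys, and keys reference objects."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key -> RiakObject}

    def put(self, bucket, key, value, **metadata):
        self._buckets.setdefault(bucket, {})[key] = RiakObject(value, metadata)

    def get(self, bucket, key):
        # Returns None when the bucket or key does not exist.
        return self._buckets.get(bucket, {}).get(key)

    def delete(self, bucket, key):
        self._buckets.get(bucket, {}).pop(key, None)


store = ToyStore()
store.put("product", "iphone", b'{"name": "iPhone"}',
          content_type="application/json")
obj = store.get("product", "iphone")
store.delete("product", "iphone")
```

The three methods mirror the Get, Put and Delete operations; everything else about a real cluster (replication, hashing, persistence) is left out.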
Slide 7
Slide 7 text
Masterless & Highly Available
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums
Slide 8
Slide 8 text
Consistent Hashing & The Ring
• 160-bit integer keyspace
• Divided into fixed number of
evenly-sized partitions
• Partitions are claimed by nodes in
the cluster
• Replicas go to the N partitions
following the key
[Diagram: ring of 32 partitions claimed by node 0 through node 3, with N=3; ring positions labeled 0, 2^160/4 and 2^160/2; hash(“product/iphone”) marks the key's position]
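A minimal sketch of the ring, assuming the figures from this slide (32 partitions, N=3, four nodes). Two simplifications are assumptions of the example: the key is hashed as the plain string "bucket/key" with SHA-1, and partitions are dealt to nodes round-robin. Riak's actual hashing input and claim algorithm may differ.

```python
import hashlib

RING_SIZE = 2 ** 160        # 160-bit integer keyspace
NUM_PARTITIONS = 32         # fixed number of evenly sized partitions
N_VAL = 3                   # replicas per key
NODES = ["node 0", "node 1", "node 2", "node 3"]
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS


def owner(partition):
    """Assumed claim pattern: partitions are dealt to nodes round-robin."""
    return NODES[partition % len(NODES)]


def key_position(bucket, key):
    """Hash bucket/key onto the ring (SHA-1 yields exactly 160 bits)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(bucket, key, n=N_VAL):
    """Return the N partitions following the key, with their owners."""
    first = key_position(bucket, key) // PARTITION_SIZE + 1
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner(p)) for p in partitions]


plist = preference_list("product", "iphone")
```

With round-robin claims over four nodes, any three consecutive partitions belong to three different nodes, so the three replicas land on distinct machines.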
Slide 9
Slide 9 text
Failure Scenario
• Node fails
• Requests go to fallback nodes
[Diagram: the ring with node 0 through node 3; eight partitions are marked X for the failed node, and requests for hash(“product/iphone”) go to fallback nodes]
Slide 10
Slide 10 text
Hinted Handoff
• Node comes back
• “Handoff” - data returns to
recovered node
• Normal operations resume
[Diagram: the ring with node 0 through node 3 all available again; hash(“product/iphone”) is served by its original nodes]
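The failure and handoff slides can be sketched together as a toy simulation. Everything here is an assumption made for illustration: the round-robin ownership, the `Cluster` class, and the rule that a fallback is the next available node found by walking on around the ring. It is not Riak's actual fallback selection or handoff protocol.

```python
NODES = ["node 0", "node 1", "node 2", "node 3"]
NUM_PARTITIONS = 32
N_VAL = 3


def owner(partition):
    """Assumed claim pattern: partitions are dealt to nodes round-robin."""
    return NODES[partition % len(NODES)]


def plan_replicas(first_partition, down):
    """Map each target partition to the node that holds its replica for now."""
    targets = [(first_partition + i) % NUM_PARTITIONS for i in range(N_VAL)]
    following = [(targets[-1] + 1 + i) % NUM_PARTITIONS
                 for i in range(NUM_PARTITIONS - N_VAL)]
    # Candidate fallbacks: owners of later partitions that are still up.
    fallbacks = (owner(p) for p in following if owner(p) not in down)
    plan = {}
    for p in targets:
        primary = owner(p)
        plan[p] = primary if primary not in down else next(fallbacks)
    return plan


class Cluster:
    def __init__(self):
        self.down = set()
        self.data = {n: {} for n in NODES}    # node -> {(partition, key): value}
        self.hints = {n: [] for n in NODES}   # fallback -> [(owner, partition, key)]

    def put(self, first_partition, key, value):
        for partition, node in plan_replicas(first_partition, self.down).items():
            self.data[node][(partition, key)] = value
            if node != owner(partition):
                # The fallback remembers which node the data really belongs to.
                self.hints[node].append((owner(partition), partition, key))

    def recover(self, node):
        """Node comes back: fallbacks hand its data over, then drop their copy."""
        self.down.discard(node)
        for holder in NODES:
            keep = []
            for intended, partition, key in self.hints[holder]:
                if intended != node:
                    keep.append((intended, partition, key))
                    continue
                value = self.data[holder].pop((partition, key), None)
                if value is not None:
                    self.data[node][(partition, key)] = value
            self.hints[holder] = keep


cluster = Cluster()
cluster.down.add("node 1")
cluster.put(0, "product/iphone", b"v1")   # node 3 stands in for node 1
cluster.recover("node 1")                 # the hinted data returns to node 1
```

If every node is down, `plan_replicas` raises `StopIteration`; a real system would report the write as failed instead.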
Slide 11
Slide 11 text
Riak’s core capability
Scalable
Add commodity hardware to get more
[throughput | processing | storage]
Slide 12
Slide 12 text
Riak’s core capability
Fault Tolerant
All nodes participate equally (no SPOF)
All data is replicated (n=3 by default)
Cluster transparently survives node failure & network partition
Slide 13
Slide 13 text
Tunable Consistency
• n_val - number of replicas to store; bucket-level
setting. Defaults to “3”.
• w - number of replica acks required for a successful
write. Defaults to “2”.
• r - number of replica acks required for a successful
read; request-level setting. Defaults to “2”.
• pr, pw & dw
• Tweak consistency vs. availability
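The trade-off between these settings can be made concrete with a little arithmetic. The helper names below are hypothetical, and the model is simplified: it ignores fallback nodes, so it shows the general quorum rule rather than a guarantee about Riak's behavior.

```python
def quorum_met(acks, required):
    """A request succeeds once enough replicas have acknowledged it."""
    return acks >= required


def quorums_overlap(n_val, r, w):
    """True when every read quorum shares a replica with every write quorum."""
    return r + w > n_val


def tolerated_failures(n_val, required):
    """How many replicas may be unreachable while the request still succeeds."""
    return n_val - required


defaults = {"n_val": 3, "r": 2, "w": 2}
overlap_with_defaults = quorums_overlap(**defaults)   # 2 + 2 > 3
overlap_fast = quorums_overlap(n_val=3, r=1, w=1)     # 1 + 1 > 3 is false
```

Lowering r or w lets a request succeed with more replicas unreachable, at the cost of reads that may miss the latest write.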
Slide 14
Slide 14 text
Two APIs
HTTP (just like the web)
Protocol Buffers (thank you, Google)
Client Libraries
Ruby, Node.js, Java, Python, Perl, Erlang, PHP, C, Scala, Haskell,
Lisp, .NET, Play, and more (supported by either Basho or the
community).
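As a rough sketch of the HTTP interface, the snippet below builds (but does not send) requests for the three basic operations using only the Python standard library. The host and port in `BASE_URL` and the helper names are assumptions for the example; the `/buckets/<bucket>/keys/<key>` path and the `r` query parameter follow Riak's HTTP API. The Protocol Buffers interface is not shown.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8098"   # assumed local node; adjust for your cluster


def object_url(bucket, key, **params):
    """Build the URL for one object, with optional per-request parameters."""
    url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    if params:
        url += "?" + "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    return url


def put_request(bucket, key, body, content_type="application/json", **params):
    return Request(object_url(bucket, key, **params), data=body,
                   headers={"Content-Type": content_type}, method="PUT")


def get_request(bucket, key, **params):
    return Request(object_url(bucket, key, **params), method="GET")


def delete_request(bucket, key):
    return Request(object_url(bucket, key), method="DELETE")


put = put_request("product", "iphone", b'{"name": "iPhone"}', w=2)
get = get_request("product", "iphone", r=2)
delete = delete_request("product", "iphone")
```

Passing any of these to `urllib.request.urlopen` would send it to a running node.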
Slide 15
Slide 15 text
Riak Backend
• Riak has a pluggable backend architecture
• Bitcask and LevelDB are the backends most used in
production, depending on the use case
• All writes are appends to a file
• This provides crash safety and fast writes
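The append-only idea can be illustrated with a toy store: every write is appended to one file, and an in-memory index remembers where the latest value for each key lives. `AppendOnlyStore` is a hypothetical name and the file layout is invented for the example; it is not Bitcask's or LevelDB's on-disk format.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy log-structured store: append values, index them in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}             # key -> (offset, length) of the latest value
        open(path, "ab").close()    # create the file if it does not exist

    def put(self, key, value):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(value)          # existing bytes are never overwritten
            f.flush()
            os.fsync(f.fileno())    # force the write to disk before returning
        self.index[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyStore(os.path.join(tmp, "data.log"))
    log.put("iphone", b"v1")
    log.put("iphone", b"version-2")
    latest = log.get("iphone")
    log_size = os.path.getsize(log.path)
```

Because old bytes are never rewritten in place, a crash in the middle of a write cannot damage earlier records. A real backend also stores keys in the file so the index can be rebuilt after a restart, and it reclaims space from superseded values; both are omitted here.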
Slide 16
Slide 16 text
Accessing Data in Riak
Object/Key Operations: Retrieving Single Objects
• Support for retrieving the object associated with a particular bucket/key pair
• Support for retrieving all of the keys associated with a particular bucket
Riak Search: Collecting, Parsing, and Storing Data
• Distributed, full-text search engine with an easy-to-use query
language, a Solr-like HTTP interface and an Apache Lucene-style
query syntax
• Support for a wide variety of MIME types, including JSON, plain
text, XML and Erlang
• Ideal for indexing JSON documents, as indexes are built
automatically from a schema.
Secondary Indexes (Riak 2i): Reverse Lookups on Stored Data
• Provides the ability, at write time, to tag an object stored in Riak
with one or more values (key/value metadata), which can then
be queried
• Useful for finding data that is based on terms other than an
object’s bucket/key pair, or for adding metadata values to a
binary object or opaque blob
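A minimal in-memory sketch of the tagging idea: index entries are supplied at write time and can later be queried to find keys. `IndexedBucket` and the sample data are hypothetical; the `_bin` and `_int` suffixes follow Riak's naming convention for binary and integer index fields.

```python
from collections import defaultdict


class IndexedBucket:
    """Toy bucket that records secondary index entries at write time."""

    def __init__(self):
        self.objects = {}                 # key -> opaque value
        self.index = defaultdict(set)     # (field, value) -> set of keys

    def put(self, key, value, indexes=None):
        self.objects[key] = value
        for field_name, field_value in (indexes or {}).items():
            self.index[(field_name, field_value)].add(key)

    def query(self, field_name, field_value):
        """Exact-match lookup: which keys were tagged with this value?"""
        return sorted(self.index.get((field_name, field_value), set()))


products = IndexedBucket()
products.put("iphone", b"<binary blob>",
             {"category_bin": "phone", "year_int": 2013})
products.put("ipad", b"<binary blob>",
             {"category_bin": "tablet", "year_int": 2013})
products.put("nexus", b"<binary blob>", {"category_bin": "phone"})

phones = products.query("category_bin", "phone")
from_2013 = products.query("year_int", 2013)
```

The stored values stay opaque; only the tags are searched. This toy version does not remove stale index entries when an object is rewritten with different tags.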
MapReduce: Processing a Large Dataset
• Provides the general ability to analyze and aggregate data in
phases with data locality
• Supports JavaScript, and Erlang for better performance
• Riak Search and 2i query results can be used as an input to MapReduce
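The phased, data-local processing can be sketched in plain Python as a word count. The function names and sample documents are made up for the example, and Riak itself runs map and reduce functions written in JavaScript or Erlang, not Python.

```python
def map_word_counts(document):
    """Map phase: turn one stored document into partial word counts."""
    counts = {}
    for word in document.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def reduce_counts(partials):
    """Reduce phase: merge any number of partial counts into one."""
    total = {}
    for partial in partials:
        for word, count in partial.items():
            total[word] = total.get(word, 0) + count
    return total


def run_mapreduce(data_by_node):
    """Map and pre-reduce next to the data, then combine across nodes."""
    per_node = []
    for node, documents in data_by_node.items():
        local = reduce_counts(map_word_counts(d) for d in documents)
        per_node.append(local)        # only small summaries leave the node
    return reduce_counts(per_node)


data = {
    "node 0": ["Riak is distributed", "Riak scales"],
    "node 1": ["Riak is available"],
}
result = run_mapreduce(data)
```

The point of the phases is that each node summarizes its own documents first, so only compact partial results need to cross the network.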
What Riak Isn’t
• NOT Relational
• No fixed schema
• NOT Right for Every Project
• Large Objects (Riak CS is a good fit here)
• Dynamic Queries (SQL)
Slide 20
Slide 20 text
Modeling Applications in Riak
Slide 21
Slide 21 text
Ideal Riak Scenarios
When to Use
• When you have enough data to require >1 physical machine (preferably >4)
• When availability is the top requirement
• When your data can be modeled as keys and values
Popular Use Cases
• Ad Networks
• Digital Media
• On-Line Games
• Social Networks
• Social Analysis
• Cloud Operators
• Messaging Services
• Product Catalogs
• Document Management
• Health Care Information Management
Slide 22
Slide 22 text
Riak Production Users: Growing and Growing
Mobile, Retail & Social
Cloud Computing & Advertising
Security and Others
Gaming, Payments and Others
Slide 23
Slide 23 text
Web / Mobile App Growth
Case Study for Top Rated Apple App Store App
• #4 most popular Apple App
Store Social Networking App at
EOY behind Facebook, Skype
and Twitter
• Truly Viral Growth: Scaled 10x
between Thanksgiving and New
Year’s Day
• Required scaling across multiple
IaaS / hosting providers
• Surpassed one billion operations
per day
Slide 24
Slide 24 text
Mobile-to-Mobile Content Store
Bump – Low Latency and Always Available
• 800 million pieces of
structural data in Riak,
including Photos, Chats, and
Contact Cards.
• 10 million active users
• 77 million downloads to date
• Switched to Riak in
August 2011
• #7 Most Downloaded
iPhone App
Slide 25
Slide 25 text
Enstratus
• It is a cloud infrastructure management solution
for deploying and managing enterprise-class
applications
• Moved from MySQL to Riak. Reasons:
• Write Scalability
• Resilience to failure across multiple
datacenters
• Stores machine and state information, and data
supporting analytics and audit control.
• George Reese gave an excellent talk at RICON
last year:
http://vimeo.com/54887751
“As I’ve looked at a number of problem domains
from customers and our own systems, you see this
pattern where a relational database has been used
just because it’s the default… and the reality is that
more of the world is eventually consistent than not”,
said George Reese, CTO of enStratus
Slide 26
Slide 26 text
Riak MDC- (EDS)
[Diagram: cloud, mobile and social applications feeding Data Centers #1, #2 and #3, which are linked by Multi-Data Center Replication]
1. Applications, Users and Machines Generate Data
2. Riak Stores and Manages Data Efficiently and Effectively
• Clusters are local to regional
users, which reduces latency
• Replication is uni-directional;
remote clusters can be set up
to replicate data back to a
primary cluster, thus
synchronizing bi-directionally.
• Easily deploy in many regional
zones
• Write-everywhere solution
• Easy to scale: additional data
centers can be added easily
Slide 27
Slide 27 text
Full-sync Replication
Slide 28
Slide 28 text
Real-time Replication
Slide 29
Slide 29 text
Product Information Repository
Slide 30
Slide 30 text
Multi-Device Session Store
Case Study Showcases Seamless User Experience
The Global Session Store Manages a Seamless User
Session throughout a Customer’s multi-mode
experience, from Web to device
[Diagram: data centers in Philadelphia and Denver]
Slide 31
Slide 31 text
Backups
• Bitcask and LevelDB are both log-structured stores; cp, rsync,
tar and custom backup tools will work
• FS-level snapshots of the data directory can be taken while the node is
running
• Backups aren't yet perfected; future releases will have
more efficient, specialized backup methods for each backend
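As one example of a "custom backup tool", the sketch below archives a backend data directory with Python's standard `tarfile` module, which is equivalent to running tar by hand. The function name is hypothetical, and the demo uses a throwaway temporary directory with a made-up file name, since the real data directory location depends on your installation.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def archive_data_dir(data_dir, archive_path):
    """Write a gzip-compressed tar archive of data_dir and return its path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(data_dir, arcname=Path(data_dir).name)
    return archive_path


# Demo against a temporary directory standing in for a backend data directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "bitcask")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "1.bitcask.data"), "wb") as f:
        f.write(b"example bytes")
    archive = archive_data_dir(data_dir, os.path.join(tmp, "backup.tar.gz"))
    with tarfile.open(archive) as tar:
        archived_names = tar.getnames()
```

The archive is written outside the directory being copied so that it does not end up inside its own backup.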
Slide 32
Slide 32 text
Stats and Monitoring (1)
• Riak exposes data about current operating status (counters,
histograms, etc.) via the HTTP /stats endpoint or ‘riak-admin status’
• Anything that speaks HTTP can be plugged into Riak
• Plugins exist for most OSS monitoring tools (munin, cacti,
nagios, graphite, statsd)
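A small sketch of a monitoring hook built on the HTTP /stats endpoint. The base URL is an assumption, `fetch_stats` and `pick` are hypothetical helpers, and the sample payload is made up for illustration; check the output of your own node for the stat names it actually reports.

```python
import json
from urllib.request import urlopen


def fetch_stats(base_url="http://localhost:8098"):
    """Fetch and decode the JSON document served at /stats (needs a live node)."""
    with urlopen(f"{base_url}/stats") as response:
        return json.loads(response.read().decode("utf-8"))


def pick(stats, names):
    """Keep only the stats a dashboard or alert cares about."""
    return {name: stats[name] for name in names if name in stats}


# Made-up payload so the example runs without a cluster.
sample = json.loads('{"node_gets": 120, "node_puts": 45, "ring_num_partitions": 32}')
selected = pick(sample, ["node_gets", "node_puts", "not_reported"])
```

Names that the node does not report are skipped, so one list of names can be reused across versions.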
Slide 33
Slide 33 text
Stats and Monitoring (2)
• ‘riaknostic’ is a suite of diagnostic checks that can be used to debug your
cluster before it’s in production; it checks for common misconfigurations
• Riak Control is a full-fledged management GUI that Basho develops and
maintains.
Slide 34
Slide 34 text
Riak 1.3 – GA Now
• Active Anti Entropy
• Replication enhancements for MDC
• IPv6 support
• New Look for Riak Control
Slide 35
Slide 35 text
What is Riak CS?
Key features:
• Multi-Tenant support
• User Authentication and Authorization
• Amazon S3 API-compatibility
• Per-Tenant visibility
• Provisioning, Metering, Billing and Reporting
• Multipart upload up to 5 TB
Slide 36
Slide 36 text
Riak CS in Action
Slide 37
Slide 37 text
Riak CS Use Cases
• Storage for Cloud Computing
• S3 Without AWS
• Cloud Drive (General Content Storage)
• Backup-as-a-Service
• Archival and Preservation
• Integration with Workflow
[Diagram labels: Reporting, Large Objects, AuthZ, Multi-Tenancy]
Slide 38
Slide 38 text
Multi-Datacenter Replication
• Multi-site storage replication
• Data locality
• Availability in disaster scenarios
• Active backup
Slide 39
Slide 39 text
Multi-Datacenter Replication
How It Works
• Global information for users, buckets and manifests is streamed in real time
• Objects are replicated in full-sync or real-time mode
• If a client requests an object from a site but not all of the blocks that constitute that object have been replicated to that site, the missing blocks are requested and streamed from the “origin” cluster
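The missing-block behavior can be sketched as a read-through lookup. The `Site` class, its fields and the block identifiers are hypothetical; the sketch only assumes what the slide states, that the manifest is already present at the requesting site and that absent blocks are fetched from the origin. It does not keep the fetched block locally, because the slide does not say whether Riak CS does.

```python
class Site:
    """Toy model of one Riak CS site holding manifests and object blocks."""

    def __init__(self, name, origin=None):
        self.name = name
        self.origin = origin      # the "origin" cluster, if this is a replica
        self.manifests = {}       # object key -> ordered list of block ids
        self.blocks = {}          # block id -> bytes
        self.remote_fetches = 0   # how many blocks came from the origin

    def get_object(self, key):
        parts = []
        for block_id in self.manifests[key]:
            if block_id in self.blocks:
                parts.append(self.blocks[block_id])
            else:
                # Block not replicated yet: request it from the origin.
                self.remote_fetches += 1
                parts.append(self.origin.blocks[block_id])
        return b"".join(parts)


origin = Site("origin")
origin.manifests["video.mp4"] = ["b0", "b1", "b2"]
origin.blocks = {"b0": b"AAA", "b1": b"BBB", "b2": b"CCC"}

replica = Site("replica", origin=origin)
replica.manifests["video.mp4"] = ["b0", "b1", "b2"]   # manifest already streamed
replica.blocks = {"b0": b"AAA"}                       # only one block has arrived

content = replica.get_object("video.mp4")
```

The client sees the complete object either way; only the source of each block differs.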
Slide 40
Slide 40 text
Riak CS Roadmap
• Swift API
• Keystone Integration
• S3 Features
• COPY Object
• Object Versioning
• Cloud Stack Integration
Slide 41
Slide 41 text
Riak 1.4 (Roadmap)
• Dynamic Ring Sizing
• 2i Pagination
• Performance and Scaling Improvements
Slide 42
Slide 42 text
Riak EDS or CS
Does data unavailability cost thousands of dollars per minute?
Riak EDS (Enterprise Data Store)
Do you want to build a cloud storage service for your business?
Riak CS (Cloud Storage)
Slide 43
Slide 43 text
Basho’s Product Family
Distributed Data Technology is Our Passion
Riak: Open Source Distributed Database
• Always-available, scalable, low-cost NoSQL database
• Over 35,000 downloads per month
• Thousands of users worldwide
• Available since Sept 2009
• Version 1.0 unveiled September 2011
• Subsequent versions released alongside Riak EDS
Riak EnterpriseDS: Commercial Distributed Database
• Adds multi-data center replication, monitoring and 24x7 support
• Requires commercial contract and secure download
• Version 1.1 launched with Riak Control in Feb 2012
• Version 1.2 launched in August 2012
• Version 1.3 launched in February 2013
Riak CS: Distributed Cloud Storage Platform
• Expands with multi-tenancy, large object support, metering and Amazon S3 API
• Launched on March 27, 2012
• Used by multiple global cloud operators
• Software released to open source on March 20th
Slide 44
Slide 44 text
Resources
Basho Docs
http://docs.basho.com/
Riak Fast Track
http://docs.basho.com/riak/1.1.4/tutorials/fast-track/
http://docs.basho.com/riakcs/latest/riakcs-tutorials/fast-track/
Basho Blog
http://basho.com/blog/technical/