• @basho
• basho.com
• github.com/basho
• docs.basho.com
Us
Slide 4
Slide 4 text
What's in store?
• At a High Level
• For Developers
• Under the Hood
• When and Why
• Some Use Cases
• Commercial Extensions
• 1.4 and Roadmap
Slide 5
Slide 5 text
At a High Level
Slide 6
Slide 6 text
Riak
• Dynamo-inspired key/value store
• With some extras: search, MapReduce, secondary
indexes (2i), links, pre- and post-commit hooks,
pluggable backends, HTTP and binary interfaces
• Written in Erlang with C/C++
• Open source under the Apache 2 License
Riak’s Design Goals (2)
• Design informed by Brewer’s CAP Theorem
and Amazon’s Dynamo Paper
• Riak is tuned to offer availability above all else
Slide 9
Slide 9 text
For Developers
Slide 10
Slide 10 text
Riak is a database that stores keys
against values. Keys are grouped
into a higher-level namespace
called buckets.
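A minimal sketch of the bucket/key model described above, using a toy in-memory class rather than Riak itself. The class name `TinyStore` and its methods are hypothetical and exist only for illustration; the point is that the same key can live in two buckets without colliding.

```python
from collections import defaultdict


class TinyStore:
    """Toy in-memory stand-in for a bucketed key/value store (not Riak)."""

    def __init__(self):
        # bucket name -> {key -> value}
        self._buckets = defaultdict(dict)

    def put(self, bucket, key, value):
        self._buckets[bucket][key] = value

    def get(self, bucket, key):
        # Returns None when the key is absent from the bucket.
        return self._buckets[bucket].get(key)


store = TinyStore()
store.put("sessions", "user-42", b"cart=3")
store.put("ads", "user-42", b"<banner/>")  # same key, different bucket
```

Lookups always name both parts: `store.get("sessions", "user-42")` and `store.get("ads", "user-42")` return different values.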
Slide 11
Slide 11 text
Riak doesn’t care what you store. It
will accept any data type; things are
stored on disk as binaries.
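Since values end up as binaries, the application decides how to serialize them. The sketch below shows one way a client might turn a few Python types into bytes plus a content-type label; the helper `to_binary` and the choice of content types are assumptions for illustration, not part of any Riak client.

```python
import json


def to_binary(value):
    """Return (content_type, payload_bytes) for a few common value types."""
    if isinstance(value, bytes):
        # Already binary (image, archive, ...): pass through untouched.
        return "application/octet-stream", value
    if isinstance(value, str):
        return "text/plain; charset=utf-8", value.encode("utf-8")
    # Anything else is assumed to be JSON-serializable.
    return "application/json", json.dumps(value, sort_keys=True).encode("utf-8")


content_type, payload = to_binary({"user": "ann", "visits": 3})
```

The store never needs to interpret `payload`; only the reader has to know how to decode it.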
Slide 16
Slide 16 text
Examples
Application Type   Key                   Value
Session            User/Session ID       Session Data
Advertising        Campaign ID           Ad Content
Logs               Date                  Log File
Sensor             Date, Date/Time       Sensor Updates
User Data          Login, eMail, UUID    User Attributes
Content            Title, Integer        Text, JSON/XML/HTTP document, images, etc.
Slide 17
Slide 17 text
Two APIs
1. HTTP (just like the web)
2. Protocol Buffers (thank you, Google)
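To make the HTTP option concrete, here is a sketch that builds (but does not send) a PUT request using only the Python standard library. The host, the port 8098, and the `/buckets/<bucket>/keys/<key>` path layout are assumptions based on Riak's HTTP interface of this era and should be checked against docs.basho.com; `build_put` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote


def build_put(host, bucket, key, doc, port=8098):
    """Build an HTTP PUT that would store `doc` as JSON under bucket/key."""
    # Path layout and default port are assumptions; verify against the docs.
    url = "http://%s:%d/buckets/%s/keys/%s" % (
        host, port, quote(bucket, safe=""), quote(key, safe=""))
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


req = build_put("127.0.0.1", "sessions", "user-42", {"cart": 3})
# Sending it would be: urllib.request.urlopen(req)  (needs a running node)
```

The Protocol Buffers interface carries the same operations over a binary protocol, which is why most client libraries offer both.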
Client Libraries
Ruby, Node.js, Java, Python, Perl,
OCaml, Erlang, PHP, C, Squeak,
Smalltalk, Pharo, Clojure, Scala,
Haskell, Lisp, Go, .NET, Play, and
more (supported by either Basho or
the community).
Slide 20
Slide 20 text
Under the Hood
Slide 21
Slide 21 text
Consistent Hashing and Replicas
Handoff and Rebalancing
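A simplified sketch of consistent hashing with replicas: a key is hashed onto a fixed ring of partitions, and the replicas go to the next few partitions around the ring. The ring size of 64, the replica count of 3, the round-robin ownership, and the way bucket and key are combined before hashing are all assumptions for illustration; Riak's actual hashing and preference-list rules differ in detail.

```python
import hashlib

RING_SIZE = 64       # number of partitions on the ring (assumed)
N_VAL = 3            # replicas per key (assumed)
KEYSPACE = 2 ** 160  # size of the SHA-1 output space


def partition_for(bucket, key):
    """Map bucket/key to a partition index in [0, RING_SIZE)."""
    digest = hashlib.sha1(("%s/%s" % (bucket, key)).encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * RING_SIZE // KEYSPACE


def preference_list(bucket, key, owners):
    """Return N_VAL consecutive (partition, node) pairs; owners[i] owns partition i."""
    first = partition_for(bucket, key)
    return [(p % RING_SIZE, owners[p % RING_SIZE])
            for p in range(first, first + N_VAL)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = [nodes[i % len(nodes)] for i in range(RING_SIZE)]  # round-robin claim
plist = preference_list("sessions", "user-42", owners)
```

Because neighbouring partitions belong to different nodes here, the three replicas land on three distinct machines. Adding a node only changes entries in `owners`; keys keep hashing to the same partitions, which is what keeps handoff and rebalancing incremental.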
Slide 22
Slide 22 text
Masterless; deployed as a
cluster of nodes
Slide 34
Slide 34 text
Active Anti-Entropy
• Automatic, self-healing property
• Repairs divergent, missing, or corrupt replicas
caused by hardware failure, bad disks, data
corruption, and other failure modes
• Useful for large clusters and long-term storage
• Uses hash tree exchange
• Minimal performance impact
• More on our blog
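The hash tree exchange idea can be sketched with a two-level tree: keys are grouped into segments, each segment gets a hash, and a root hash covers all segments. Two replicas compare roots first and descend only into segments that differ. The segment count, the hash function, and all function names below are assumptions; Riak's real trees are deeper and persisted on disk.

```python
import hashlib

SEGMENTS = 8  # number of leaf segments (assumed, tiny for illustration)


def _h(data):
    return hashlib.sha1(data).hexdigest()


def segment_of(key):
    return int(_h(key.encode("utf-8")), 16) % SEGMENTS


def build_tree(replica):
    """replica: dict of str key -> bytes value. Returns (root, segment_hashes)."""
    parts = [[] for _ in range(SEGMENTS)]
    for key in sorted(replica):
        entry = key.encode("utf-8") + b"\0" + _h(replica[key]).encode("ascii")
        parts[segment_of(key)].append(entry)
    seg_hashes = [_h(b"\n".join(p)) for p in parts]
    return _h("".join(seg_hashes).encode("ascii")), seg_hashes


def divergent_keys(a, b):
    """Keys whose values differ or are missing, found by comparing trees."""
    root_a, segs_a = build_tree(a)
    root_b, segs_b = build_tree(b)
    if root_a == root_b:
        return []  # replicas agree; nothing else is compared
    bad = {i for i in range(SEGMENTS) if segs_a[i] != segs_b[i]}
    candidates = {k for k in set(a) | set(b) if segment_of(k) in bad}
    return sorted(k for k in candidates if a.get(k) != b.get(k))


replica_a = {"k1": b"x", "k2": b"y", "k3": b"z"}
replica_b = {"k1": b"x", "k2": b"stale"}  # k2 diverged, k3 is missing
to_repair = divergent_keys(replica_a, replica_b)
```

When replicas match, only the root hashes are compared, which is why the background cost stays small.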
Slide 35
Slide 35 text
Riak 1.4
Slide 36
Slide 36 text
Eventually Consistent Counters
• First publicly available, distributed data type in
Riak
• PN Counters are capable of being both
incremented (P) and decremented (N)
• Provide automatic conflict resolution after a
network partition
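A PN counter can be sketched as two grow-only maps, one for increments (P) and one for decrements (N), keyed by the actor that made the change. Merging takes the per-actor maximum, so replicas converge no matter how often or in what order they exchange state. This is a generic illustration of the data type with hypothetical names, not Riak's implementation.

```python
class PNCounter:
    """State-based PN counter: value = sum(P) - sum(N)."""

    def __init__(self, actor):
        self.actor = actor
        self.p = {}  # actor -> total increments made by that actor
        self.n = {}  # actor -> total decrements made by that actor

    def increment(self, amount=1):
        self.p[self.actor] = self.p.get(self.actor, 0) + amount

    def decrement(self, amount=1):
        self.n[self.actor] = self.n.get(self.actor, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Per-actor maximum is commutative, associative, and idempotent.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for actor, count in theirs.items():
                mine[actor] = max(mine.get(actor, 0), count)


# Two replicas accept writes independently during a partition...
a, b = PNCounter("node-a"), PNCounter("node-b")
a.increment(5)
b.increment(2)
b.decrement(1)
# ...and converge once they can exchange state again.
a.merge(b)
b.merge(a)
```

After the merge both replicas report the same value, and merging again changes nothing.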
Slide 37
Slide 37 text
Secondary Indexing Improvements
• 2i query results are now sorted, and a client can
request only the first “n” results
• Pagination lets a follow-up query resume where the
first “n” results left off, so a client can page
through an index in order
• Can also view the start value, continuation value,
end value, and min/max
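The pagination behaviour can be sketched against an in-memory sorted index: return at most `max_results` entries plus an opaque continuation token, and accept that token to resume. The function `query_range` and the token encoding are made up for illustration; to my knowledge the corresponding request options in Riak 1.4 are named `max_results` and `continuation`, but check the 2i documentation before relying on that.

```python
import base64
import bisect


def query_range(index, start, end, max_results, continuation=None):
    """index: sorted list of (index_value, key) string pairs.

    Returns (keys, continuation_token); the token is None on the last page.
    """
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    if continuation is None:
        lo = bisect.bisect_left(index, (start, ""))
    else:
        # Resume strictly after the last entry of the previous page.
        value, key = base64.b64decode(continuation).decode("utf-8").split("\0", 1)
        lo = bisect.bisect_right(index, (value, key))
    page = []
    for value, key in index[lo:]:
        if value > end or len(page) == max_results:
            break
        page.append((value, key))
    nxt = lo + len(page)
    more = (len(page) == max_results and nxt < len(index)
            and index[nxt][0] <= end)
    token = None
    if more:
        token = base64.b64encode("\0".join(page[-1]).encode("utf-8")).decode("ascii")
    return [key for _, key in page], token


index = sorted([("a@x", "u1"), ("b@x", "u2"), ("c@x", "u3"),
                ("d@x", "u4"), ("e@x", "u5")])
page1, token = query_range(index, "a@x", "d@x", 2)
page2, token2 = query_range(index, "a@x", "d@x", 2, token)
```

Because results are sorted, the token only has to remember the last entry returned; the client treats it as opaque and simply passes it back.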
Slide 38
Slide 38 text
Staging in Riak Control
Slide 39
Slide 39 text
Other Features
• Progress bar for Handoff
• Reduced object storage overhead – best for
small objects
• Updated PB properties
• Overload Protection for vnode processes
• Cascading real-time writes for Riak Enterprise
multi-datacenter replication
Slide 40
Slide 40 text
Riak: When and Why
Slide 41
Slide 41 text
When Might Riak Make Sense
When you have enough data to require >1
physical machine (preferably >5)
When availability is more important than
consistency (think “critical data” on “big
data”)
When your data can be modeled as keys and
values; don’t be afraid to denormalize
Slide 42
Slide 42 text
User Case Studies
Slide 43
Slide 43 text
• Cloud infrastructure management
• Machine, customer, and API data
• “Design for failure” architecture
“Enstratius relies on Riak to ensure that our cloud
infrastructure management platform scales seamlessly, without
interruption and performance bottlenecks, while meeting and
exceeding internal requirements for high availability and data
durability.”
Slide 44
Slide 44 text
• Scaling writes in MySQL became a bottleneck
• Master/slave replication made master nodes
a single point of failure
• Multi-site replication
vimeo.com/bashotech
Slide 45
Slide 45 text
ricon.io/archive/ricon2012.html
• Re-platform of e-commerce platform
Slide 46
Slide 46 text
Social Authentication
• Social commerce
marketplace
• Uses Riak to store all
registered accounts and
tokens for Facebook/
Twitter logins
• Looking to move more
data over due to
operational simplicity
Slide 47
Slide 47 text
Session Storage
• First Basho customer in
2009
• Every hit to a Mochi web
property results in at
least one read, and possibly
a write, to Riak
• Unavailability or high
latency = lost ad revenue
Slide 48
Slide 48 text
Ad Serving
• OpenX served ~4T ads in
2012
• Started with CouchDB
and Cassandra for
various parts of
infrastructure
• Now consolidating on
Riak and Riak Core
Slide 49
Slide 49 text
Riak for All Storage: Voxer
Slide 50
Slide 50 text
Voxer: Post Growth
• ~60 Nodes total in prod
• 100s of TBs of data (>1TB daily)
• ~400k Concurrent Users
• Billions of daily Requests
Slide 51
Slide 51 text
Riak: Hybrid Solutions
• Riak with Postgres
• Riak with Elasticsearch
• Riak with Hadoop
• Secondary analytics clusters
Slide 52
Slide 52 text
Try Us On…
• Amazon AMIs
• Engine Yard
• Microsoft Azure VM Depot
• SoftLayer
Slide 53
Slide 53 text
Commercial Software
Slide 54
Slide 54 text
Riak Enterprise
• Multi-datacenter replication
• Real-time or full sync
• 24/7 support
Slide 55
Slide 55 text
Replication in Riak 1.4
• Faster, with more connections between clusters
• Easier setup and configuration
• Better per-connection statistics
• Supports SSL, NAT, and full sync scheduling
Slide 56
Slide 56 text
Riak CS (cloud storage)
• Large object support
• S3-compatible API
• Multi-tenancy
• Reporting on usage
• Now open source
• Riak CS 1.4 coming soon…
Slide 57
Slide 57 text
Roadmap Stuff...
Slide 58
Slide 58 text
Future Work
• Tight Solr integration
• Greater consistency
• Faster data transfer between clusters
• Dynamic Ring resizing
• Lots of other good stuff; check GitHub
Slide 59
Slide 59 text
RICON.io
A distributed systems
conference
RICON25Web for 25% off