Us
• @basho
• basho.com
• github.com/basho
• docs.basho.com
Slide 4
Slide 4 text
What's in Store?
• At a High Level
• For Developers
• Under the Hood
• When and Why
• Some Use Cases
• Commercial Extensions
• 1.3 and Roadmap
Slide 5
Slide 5 text
At a High Level
Slide 6
Slide 6 text
Riak
• Dynamo-inspired key/value store
• With some extras: search, MapReduce, secondary indexes (2i),
links, pre- and post-commit hooks, pluggable
backends, HTTP and binary interfaces
• Written in Erlang with C/C++
• Open source under the Apache 2 License
Riak’s Design Goals (2)
• Design Informed by Brewer’s CAP Theorem
and Amazon’s Dynamo Paper
• Riak is tuned to offer availability above all else
Slide 9
Slide 9 text
For Developers
Slide 10
Slide 10 text
Riak is a database that stores keys
against values. Keys are grouped
into a higher-level namespace
called buckets.
Slide 11
Slide 11 text
Riak doesn’t care what you store.
It will accept any data type; things
are stored on disk as binaries.
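The bucket/key namespace and the "values are just binaries" idea can be sketched with a toy in-memory model. This is not a Riak client: `ToyStore` and its `put`/`get` methods are hypothetical names made up for illustration, and the only point is that the application chooses the encoding while the store sees opaque bytes.

```python
import json


class ToyStore:
    """In-memory stand-in for a bucket/key namespace (hypothetical, not a Riak client)."""

    def __init__(self):
        self._data = {}  # maps (bucket, key) -> bytes

    def put(self, bucket, key, value):
        # The store does not interpret the value; it only accepts raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("values are stored as opaque bytes")
        self._data[(bucket, key)] = value

    def get(self, bucket, key):
        return self._data[(bucket, key)]


store = ToyStore()

# The application picks the encoding (JSON here); the store only sees bytes.
profile = json.dumps({"email": "alice@example.com"}).encode("utf-8")
store.put("users", "alice", profile)

# The same key can live in a different bucket without colliding.
store.put("images", "alice", b"\x89PNG...")

user = json.loads(store.get("users", "alice").decode("utf-8"))
```

Because the key is scoped by its bucket, `("users", "alice")` and `("images", "alice")` are two separate objects.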
Slide 16
Slide 16 text
Examples
Application Type | Key                | Value
Session          | User/Session ID    | Session Data
Advertising      | Campaign ID        | Ad Content
Logs             | Date               | Log File
Sensor           | Date, Date/Time    | Sensor Updates
User Data        | Login, eMail, UUID | User Attributes
Content          | Title, Integer     | Text, JSON/XML/HTTP document, images, etc.
Slide 17
Slide 17 text
Two APIs
1. HTTP (just like the web)
2. Protocol Buffers (thank you, Google)
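To make the HTTP option concrete, here is a minimal sketch that builds (but does not send) a request to store an object. It assumes the `/buckets/<bucket>/keys/<key>` URL layout and port 8098; check both against docs.basho.com for your version. The helper names `object_url` and `build_put`, the host, and the sample bucket and key are illustrative assumptions.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(host, bucket, key, port=8098):
    # Assumed URL layout: /buckets/<bucket>/keys/<key>; port 8098 is an assumption.
    # Bucket and key are percent-encoded so characters like "/" stay inside one segment.
    return "http://{}:{}/buckets/{}/keys/{}".format(
        host, port, quote(bucket, safe=""), quote(key, safe="")
    )


def build_put(host, bucket, key, body, content_type):
    # Build a PUT request object; nothing is sent over the network here.
    return Request(
        object_url(host, bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


req = build_put("127.0.0.1", "sessions", "user/42", b'{"cart": []}', "application/json")
```

Passing `req` to `urllib.request.urlopen` would send it to a running node; that step is left out so the sketch runs without a cluster.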
Client Libraries
Ruby, Node.js, Java, Python, Perl,
OCaml, Erlang, PHP, C, Squeak,
Smalltalk, Pharo, Clojure, Scala,
Haskell, Lisp, Go, .NET, Play, and
more (supported by either Basho or
the community).
Slide 20
Slide 20 text
Under the Hood
Slide 21
Slide 21 text
Consistent Hashing and Replicas
Handoff and Rebalancing
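Consistent hashing with replicas can be sketched in a few lines. This is a simplified illustration, not Riak's actual ring code: the partition count (64), the replica count (3), the `bucket/key` hashing input, and the round-robin `owner` assignment are all assumptions chosen to keep the example small.

```python
import hashlib

RING_SIZE = 2 ** 160   # size of the SHA-1 output space
NUM_PARTITIONS = 64    # assumed number of equal-sized ring partitions
N_VAL = 3              # assumed number of replicas per object


def partition_for(bucket, key):
    # Hash bucket and key onto the ring, then find which partition the point falls in.
    digest = hashlib.sha1("{}/{}".format(bucket, key).encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position * NUM_PARTITIONS // RING_SIZE


def preference_list(bucket, key, n=N_VAL):
    # Replicas go to n consecutive partitions, wrapping around the ring.
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(n)]


def owner(partition, nodes):
    # Simplified claim: partitions are dealt out to nodes round-robin.
    return nodes[partition % len(nodes)]


nodes = ["node1", "node2", "node3", "node4"]
plist = preference_list("users", "alice")
replica_nodes = [owner(p, nodes) for p in plist]
```

Every node can compute the same preference list from the key alone, which is what lets a masterless cluster route requests without a central coordinator. Adding a node changes only who owns some partitions, not which partition a key hashes to.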
Slide 22
Slide 22 text
Masterless; deployed as a
cluster of nodes
Slide 34
Slide 34 text
Riak 1.3
Slide 35
Slide 35 text
Active Anti-Entropy
• Automatic, self-healing property
• Repairs divergent, missing or corrupt replicas
caused by hardware failure, bad disks, data
corruption, and other failure modes
• Useful for large clusters, long-term storage
• Uses hash tree exchange
• Minimal performance impact
• More on our blog
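The idea behind hash tree exchange can be sketched as a two-level tree: keys are grouped into segments, each segment gets a hash, and a root hash covers all segments. Two replicas compare roots first and only drill into segments that differ. This is an illustrative sketch, not Riak's implementation; the segment count, the use of SHA-1, and the function names are assumptions.

```python
import hashlib

SEGMENTS = 16  # assumed number of leaf segments


def _h(data):
    return hashlib.sha1(data).digest()


def segment_of(key):
    # Assign each key to a segment based on the first byte of its hash.
    return _h(key.encode("utf-8"))[0] % SEGMENTS


def build_tree(replica):
    """replica maps key (str) -> value (bytes). Returns (root_hash, segment_hashes)."""
    leaves = [[] for _ in range(SEGMENTS)]
    for key in sorted(replica):
        leaves[segment_of(key)].append(key.encode("utf-8") + b"\0" + _h(replica[key]))
    segment_hashes = [_h(b"".join(items)) for items in leaves]
    return _h(b"".join(segment_hashes)), segment_hashes


def divergent_keys(a, b):
    root_a, segs_a = build_tree(a)
    root_b, segs_b = build_tree(b)
    if root_a == root_b:
        return set()  # roots match: nothing to compare further
    # Only segments whose hashes differ need a key-by-key comparison.
    suspect = {i for i in range(SEGMENTS) if segs_a[i] != segs_b[i]}
    candidates = {k for k in set(a) | set(b) if segment_of(k) in suspect}
    return {k for k in candidates if a.get(k) != b.get(k)}


replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = {"k1": b"v1", "k2": b"stale"}  # k2 diverged, k3 missing
diff = divergent_keys(replica_a, replica_b)
```

When replicas agree, the comparison stops at the root, which is why the exchange is cheap in the common case.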
Slide 36
Slide 36 text
New Look for Riak Control
Slide 37
Slide 37 text
Other Features
• Expanded IPv6 support (protocol buffers and
handoff interfaces)
• MapReduce backpressure
• Optionally send log messages to syslog
Slide 38
Slide 38 text
Riak: when and why
Slide 39
Slide 39 text
When Might Riak Make Sense
When you have enough data to require >1
physical machine (preferably >5)
When availability is more important than
consistency (think “critical data” on “big
data”)
When your data can be modeled as keys and
values; don’t be afraid to denormalize
Slide 40
Slide 40 text
• Cloud infrastructure management
• Machine, customer and API data
• “Design for failure” architecture
“Enstratius relies on Riak to ensure that our cloud
infrastructure management platform scales seamlessly, without
interruption and performance bottlenecks, while meeting and
exceeding internal requirements for high availability and data
durability.”
Slide 41
Slide 41 text
• Scaling writes in MySQL became a bottleneck
• Master/slave replication made master nodes
a single point of failure
• Multi-site replication
More at vimeo.com/bashotech
Slide 42
Slide 42 text
More at ricon2012.com
• Re-platform of e-commerce platform
Slide 43
Slide 43 text
Session Storage
• First Basho customer in
2009
• Every hit to a Mochi web
property results in at
least one read, and possibly
a write, to Riak
• Unavailability or high
latency = lost ad revenue
Slide 44
Slide 44 text
Ad Serving
• OpenX served ~4T ads in
2012
• Started with CouchDB
and Cassandra for
various parts of
infrastructure
• Now consolidating on
Riak and Riak Core
Slide 45
Slide 45 text
Riak for All Storage: Voxer
Slide 46
Slide 46 text
Voxer: Post Growth
• ~60 Nodes total in prod
• 100s of TBs of data (>1TB daily)
• ~400k Concurrent Users
• Billions of daily Requests
Slide 47
Slide 47 text
Riak: Hybrid Solutions
• Riak with Postgres
• Riak with Elasticsearch
• Riak with Hadoop
• Secondary analytics clusters
Slide 48
Slide 48 text
Try Us On…
• Amazon AMIs
• Engine Yard
• Microsoft Azure VM Depot
• Riakon.com
Slide 49
Slide 49 text
Commercial Software
Slide 50
Slide 50 text
Riak Enterprise
• Multi-datacenter replication
• Real-time or full sync
• 24/7 support
Slide 51
Slide 51 text
Advanced Multi-Datacenter
• Faster, with more connections between clusters
• Easier setup and configuration
• Better per-connection statistics
• Already used in production by many Riak
Enterprise customers
Slide 52
Slide 52 text
Riak Cloud Storage
• Large object support
• S3-compatible API
• Multi-tenancy
• Reporting on usage
• Now open source
Slide 53
Slide 53 text
Roadmap Stuff...
Slide 54
Slide 54 text
Future Work
• CRDTs
• Tight Solr integration
• Greater consistency
• Lots of other good stuff; check GitHub