What's in store?
• At a High Level
• For Developers
• Under the Hood
• When and Why
• Some Use Cases
• Commercial Extensions
• Latest Release and 1.3
Slide 4
Slide 4 text
At a High Level
Slide 5
Slide 5 text
• Built on principles from Amazon's Dynamo paper
• Key/value data model
• with some extras: search, MapReduce, 2i,
links, pre- and post-commit hooks, pluggable
backends, HTTP and binary interfaces
• Written in Erlang with C/C++
• Open source under Apache 2 License
Riak
Retail / eCommerce Use Cases
• Shopping cart functionality
• Must be highly available
• High latency is perceived as unavailability
• Withstands node failure, network partition,
datacenter failure
• Many of the same architectural principles that
power Amazon’s shopping cart
Slide 8
Slide 8 text
Retail / eCommerce Use Cases
• Product Catalog
• Tens of thousands of inventory items or more
• Content agnostic: images, video, text, JSON/XML/
HTML documents
• Add and serve product data even under failure
conditions
• Scale out without sharding
Slide 9
Slide 9 text
Retail / eCommerce Use Cases
• API Platforms
• Expose data as a platform to internal and external
clients, developers, and partners/affiliates
• Flexible, schemaless design
• RESTful HTTP API, protocol buffers and many
client libraries
• Throughput and capacity scale linearly with
growth
Slide 10
Slide 10 text
Retail / eCommerce Use Cases
• Mobile Applications
• Riak powers top consumer mobile apps including
Bump and Voxer
• Fast, small object storage
• Designed for concurrency to meet mobile client
request patterns
Slide 11
Slide 11 text
For Developers
Slide 12
Slide 12 text
Riak is a database that stores values
against keys. Keys are grouped
into higher-level namespaces
called buckets.
Slide 13
Slide 13 text
Riak doesn’t care what you store.
It will accept any data type; things
are stored on disk as binaries.
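The bucket/key/value model described above can be pictured with a small in-memory sketch. This is only an illustration, not Riak's implementation: the names `ToyStore` and `StoredObject` are hypothetical, and the content type is carried alongside the bytes on the assumption that a client wants to know how to decode what it reads back.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An opaque value plus the content type the client declared for it."""
    value: bytes
    content_type: str = "application/octet-stream"


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in: buckets are namespaces of keys."""
    buckets: dict = field(default_factory=dict)

    def put(self, bucket: str, key: str, value: bytes,
            content_type: str = "application/octet-stream") -> None:
        # The store never inspects the bytes; any data type is accepted.
        self.buckets.setdefault(bucket, {})[key] = StoredObject(value, content_type)

    def get(self, bucket: str, key: str) -> StoredObject:
        return self.buckets[bucket][key]


store = ToyStore()
store.put("users", "alice", b'{"email": "alice@example.com"}', "application/json")
store.put("images", "logo", b"\x89PNG\r\n", "image/png")
```

The same key can exist in two buckets without conflict, because the bucket is part of the address.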
Slide 18
Slide 18 text
Examples
Type | Key | Value
Item in Product Inventory | Product Name, SKU or ID | JSON, XML or Text, HTML doc
Product Advertising | Campaign ID | Ad Content
User Profile | Login, Email, UUID | User attributes (often, JSON doc)
Image or Video Content | Content Name, ID or Integer | Image or video file format
Session Information | User/Session ID | Session Data
Slide 19
Slide 19 text
Two APIs
1. HTTP (just like the web)
2. Protocol Buffers (thank you, Google)
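As a rough sketch of the HTTP interface, the snippet below builds (but does not send) requests for storing, fetching, and deleting an object using Python's standard library. The `/buckets/<bucket>/keys/<key>` path and port 8098 follow Riak's HTTP API defaults; the host, the bucket and key names, and the helper function names are assumptions made for illustration.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed local node; 8098 is Riak's default HTTP port.
BASE_URL = "http://127.0.0.1:8098"


def object_url(bucket: str, key: str) -> str:
    """URL of one object; bucket and key are percent-encoded."""
    return f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def build_put(bucket: str, key: str, body: bytes, content_type: str) -> Request:
    """Prepare a PUT that stores `body` under bucket/key."""
    return Request(object_url(bucket, key), data=body,
                   headers={"Content-Type": content_type}, method="PUT")


def build_get(bucket: str, key: str) -> Request:
    """Prepare a GET that fetches the object at bucket/key."""
    return Request(object_url(bucket, key), method="GET")


def build_delete(bucket: str, key: str) -> Request:
    """Prepare a DELETE that removes the object at bucket/key."""
    return Request(object_url(bucket, key), method="DELETE")
```

Against a running node, each prepared request could be passed to `urllib.request.urlopen`. The Protocol Buffers interface exposes the same operations over a binary protocol and is normally used through a client library.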
Slide 20
Slide 20 text
Querying
GET/PUT/DELETE
MapReduce: Filtering product info by tag,
counting items, extracting links
Full-Text Search: Searching product info or
descriptions
Secondary Indexes (2i): Tagging products with
categories, promotion identifiers, etc.
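To make the secondary-index case concrete, here is a minimal sketch of how a product might be tagged and then looked up over HTTP. It assumes the 2i conventions of `x-riak-index-<field>_bin` headers on writes and `/buckets/<bucket>/index/<field>_bin/<value>` for exact-match queries; the field names, values, and helper functions are hypothetical.

```python
from urllib.parse import quote


def index_headers(tags: dict) -> dict:
    """Build headers that tag an object with string (_bin) indexes on write."""
    return {f"x-riak-index-{name}_bin": value for name, value in tags.items()}


def index_query_path(bucket: str, field: str, value: str) -> str:
    """Path for an exact-match lookup on a string (_bin) index."""
    return (f"/buckets/{quote(bucket, safe='')}"
            f"/index/{field}_bin/{quote(value, safe='')}")


# Tag a product with a category and a promotion identifier...
headers = index_headers({"category": "shoes", "promo": "spring-sale"})
# ...then ask for the keys of every product in that category.
path = index_query_path("products", "category", "shoes")
```

The headers would accompany the PUT that stores the product, and a GET on the query path returns the matching keys rather than the objects themselves.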
Slide 21
Slide 21 text
Client Libraries
Ruby, Node.js, Java, Python, Perl,
OCaml, Erlang, PHP, C, Squeak,
Smalltalk, Pharo, Clojure, Scala,
Haskell, Lisp, Go, .NET, Play, and
more (supported by either Basho or
the community).
Slide 22
Slide 22 text
Under the Hood
Slide 23
Slide 23 text
Hard problems in databases:
Single points of failure.
ALL NODES ARE DECLARED EQUAL.
[Diagram: read and write requests directed at the nodes of the cluster]
Slide 29
Slide 29 text
Hard problems in databases:
Where to put the data.
Slide 30
Slide 30 text
Sharding in Relational Systems…
[Diagram: five shards split by key range: A-D, E-K, L-P, Q-T, U-Z]
Slide 31
Slide 31 text
It Hurts.
• Hot spots
• Unevenly spread data and request patterns
• Resharding is operationally intensive,
often manual
Slide 32
Slide 32 text
Don’t Shard.
Riak’s Consistent Hashing
• Evenly spreads data around the cluster
• Automatically rebalances data when machines
are added
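The idea behind consistent hashing can be sketched in a few lines. This is a simplified illustration rather than Riak's actual code: it hashes a plain `bucket/key` string with SHA-1, divides the 160-bit hash space into a fixed number of equal partitions, and hands partitions to nodes round-robin. The ring size of 64, the replica count of 3, the node names, and all function names are assumptions, and the sketch does not model how a real cluster reassigns partitions when a machine joins.

```python
import hashlib

RING_SIZE = 64        # number of partitions (assumed for illustration)
N_VAL = 3             # replicas per key (assumed for illustration)
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Hash bucket/key onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    # Scale the position into [0, ring_size).
    return position * ring_size // KEYSPACE


def assign_partitions(nodes: list, ring_size: int = RING_SIZE) -> dict:
    """Give each node an equal share of partitions, interleaved around the ring."""
    return {p: nodes[p % len(nodes)] for p in range(ring_size)}


def preference_list(bucket: str, key: str, owners: dict,
                    n_val: int = N_VAL) -> list:
    """Nodes holding the key: owners of its partition and the next ones clockwise."""
    ring_size = len(owners)
    first = partition_for(bucket, key, ring_size)
    return [owners[(first + i) % ring_size] for i in range(n_val)]


nodes = ["node1", "node2", "node3", "node4"]
owners = assign_partitions(nodes)
replicas = preference_list("products", "sku-123", owners)
```

Because a good hash spreads keys uniformly over the ring and every node owns the same number of partitions, no key range becomes a hot spot the way an alphabetical shard can.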
Slide 44
Slide 44 text
Riak: when and why
Slide 45
Slide 45 text
When Might Riak Make Sense?
When you have enough data to require >1
physical machine (preferably >5)
When availability is more important than
consistency (think “critical data” on “big
data”)
When your data can be modeled as keys and
values; don’t be afraid to denormalize
Slide 46
Slide 46 text
• Case study on Basho.com
• Millions of users
• Highly available, event-based shopping
experience
• “Riak is one of those things that just works
and doesn’t need our attention on a day-to-day
basis, saving both time and money.”
Slide 47
Slide 47 text
http://vimeo.com/54384814
Slide 48
Slide 48 text
Ad Serving
• OpenX will serve ~4T ads
in 2012
• Started with CouchDB
and Cassandra for
various parts of
infrastructure
• Now consolidating on
Riak and Riak Core
• Video on Ricon2012.com
Slide 49
Slide 49 text
Mobile Apps
• Bump – easy to share contact info, photos, other
objects
• Picked Riak for operational ease of use
• “It does what it’s supposed to do; nodes can go down but
Riak will still work. It’s great to be able to deal with node
failures the next day instead of at 3am.”
Slide 50
Slide 50 text
• Copious – eCommerce
marketplace
• Uses Riak to store all
registered accounts and
tokens for social media
login
• Hundreds of thousands of
keys
Slide 51
Slide 51 text
Application Essentials…
• Session storage
• Log files
• User data
Slide 52
Slide 52 text
Riak: Hybrid Solutions
• Riak with Postgres
• Riak with Elasticsearch
• Riak with Hadoop
• Secondary analytics clusters
Slide 53
Slide 53 text
Try Us On…
• Amazon AMIs
• EngineYard beta (more details next week)
• Microsoft Azure VM Depot
• Riakon.com
Slide 54
Slide 54 text
Buy Some Software...
Slide 55
Slide 55 text
Riak Enterprise
• Multi-datacenter replication
• Real-time or full sync
Slide 56
Slide 56 text
Use Cases
• Data locality to serve clients and partners at low
latency anywhere in the world
• Failover to other sites in the event of data center
failure
• Full sync and real-time sync can be configured
unidirectionally or bidirectionally
Slide 57
Slide 57 text
Riak Cloud Storage
• Large object support
• S3-compatible API
• Multi-tenancy
• Reporting on usage