Introduction to Riak
...or why you need Riak ;)
Basho Technologies
Chris Molozian ([email protected])
Tuesday, 19 February 13
What is Riak?
• Key-Value Store + Extras
• Distributed, horizontally scalable
• Fault-tolerant
• Highly-available
• Built for the Web
• Inspired by Amazon’s Dynamo
Key-Value
• Simple operations - GET, PUT, DELETE
• Value is opaque (mostly), with metadata
• Extras
• Secondary Indexes (2i)
• Links
• Full-text search (optional)
• Map/Reduce
K/V Data Model
• All Riak objects are referenced by keys
• Keys are grouped into buckets
(only a logical partitioning scheme!)
• Simple operations: GET, PUT, DELETE
• Object is composed of metadata and value
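The data model above can be sketched in a few lines of Python. This is only an in-memory illustration, not a Riak client: the `ToyStore` and `RiakObject` names and the `users` bucket are hypothetical, chosen to show that an object is an opaque value plus metadata, addressed by a bucket/key pair.

```python
from dataclasses import dataclass, field


@dataclass
class RiakObject:
    """Hypothetical stand-in for a Riak object: an opaque value plus metadata."""
    value: bytes
    metadata: dict = field(default_factory=dict)


class ToyStore:
    """In-memory model of the bucket/key namespace; not a real Riak client."""

    def __init__(self):
        self._data = {}  # maps (bucket, key) -> RiakObject

    def put(self, bucket, key, value, **metadata):
        self._data[(bucket, key)] = RiakObject(value, metadata)

    def get(self, bucket, key):
        return self._data.get((bucket, key))  # None when the key is absent

    def delete(self, bucket, key):
        self._data.pop((bucket, key), None)


store = ToyStore()
store.put("users", "cmolozian",
          b'{"firstname": "Chris", "lastname": "Molozian"}',
          content_type="application/json")
obj = store.get("users", "cmolozian")
```

The store never inspects `value`; only the metadata (here a content type) says how the bytes should be interpreted.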
[Diagram: a bucket holding several key/value pairs, e.g. key cmolozian with value {firstname: “Chris”, lastname: “Molozian”}. Values can be JSON, XML, YAML, binary, etc.]
Distributed &
Horizontally Scalable
• Default configuration is optimized for a
cluster
• Query load and data are spread evenly
• Add more nodes and get more:
• ops/second
• storage capacity
• compute power (for Map/Reduce)
Fault Tolerant (1)
• All nodes participate equally -
no single point of failure (SPOF)
• All data is replicated
• Cluster transparently survives...
• node failure
• network partitions
• Built on Erlang/OTP (designed for fault tolerance)
Fault Tolerant (2)
• Voxer uses Riak extensively
Voxer is a Walkie Talkie application for smartphones.
Messages stream live as you talk and your friends join you
live or listen later.
• Fault tolerance, in the real world:
Inspired by Amazon
Dynamo
• Masterless, peer-coordinated replication
• Consistent hashing
• Eventually consistent
• Quorum reads and writes
• Anti-Entropy - Read Repair & Hinted Handoff
Consistent Hashing
• 160-bit integer keyspace
• divided into a fixed number
of evenly-sized partitions
• partitions are claimed by
nodes in the cluster
• replicas go to the N
partitions following the
key
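The bullets above can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-1 is used because it yields a 160-bit value, the `bucket/key` byte encoding is made up for the example, and partitions are claimed round-robin by four hypothetical nodes. Real cluster claim logic is more involved.

```python
import hashlib

RING_SIZE = 2 ** 160                 # 160-bit integer keyspace
NUM_PARTITIONS = 32                  # fixed number of equal partitions
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS
NODES = ["node 0", "node 1", "node 2", "node 3"]
N_VAL = 3                            # replicas per key


def owner(partition):
    """Assumption: partitions are claimed round-robin by the nodes."""
    return NODES[partition % len(NODES)]


def key_position(bucket, key):
    """Hash a bucket/key pair onto the ring (the encoding is illustrative)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(bucket, key, n=N_VAL):
    """Return the N partitions following the key, with their owning nodes."""
    first = key_position(bucket, key) // PARTITION_SIZE
    partitions = [(first + i) % NUM_PARTITIONS for i in range(1, n + 1)]
    return [(p, owner(p)) for p in partitions]


plist = preference_list("users", "user_id")
```

Because neighbouring partitions belong to different nodes in this layout, the three replicas of a key land on three distinct machines.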
[Diagram: ring of 32 partitions claimed by node 0, node 1, node 2 and node 3; ring positions marked 0, 2^160/4 and 2^160/2; hash(“user_id”) placed on the ring with N=3]
Highly-Available
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums
Request Quorums
• Every request contacts all replicas of a key
• N - number of replicas (default 3)
• R - read quorum
• W - write quorum
Quorum:
The number of replicas that must respond to a read or write request
before it is considered successful (default 2).
Calculated as floor(n_val / 2) + 1
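As a worked example of the quorum arithmetic, here is a minimal sketch. The function names are hypothetical. The `r + w > n_val` check is the general overlap rule for strict quorum systems, included for illustration rather than taken from the slide.

```python
def quorum(n_val: int) -> int:
    """Majority of n_val replicas, using integer division."""
    return n_val // 2 + 1


def is_successful(responses: int, required: int) -> bool:
    """A request succeeds once enough replicas have responded."""
    return responses >= required


def read_overlaps_write(n_val: int, r: int, w: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return r + w > n_val


default_quorum = quorum(3)
```

With the default of three replicas the quorum is 2, so a request still succeeds when one replica is slow or unavailable.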
Disaster Scenario
• Node fails
• Requests go to fallback
• Node comes back
• “Handoff” - data returns
to the recovered node
• Normal operations
resume
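The scenario above can be walked through with a toy model. This is a sketch only: the `Node` class, the `hints` dictionary and the two-node setup are hypothetical simplifications and do not mirror Riak's internals.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value owned by this node
        self.hints = {}   # owner name -> {key: value} held on its behalf


def write(key, value, primary, fallback):
    """Store on the primary if it is up, otherwise on the fallback with a hint."""
    if primary.up:
        primary.data[key] = value
    else:
        fallback.hints.setdefault(primary.name, {})[key] = value


def handoff(fallback, recovered):
    """Return hinted data to the recovered node and drop the hints."""
    recovered.data.update(fallback.hints.pop(recovered.name, {}))


a, b = Node("node 1"), Node("node 2")
a.up = False                   # node fails
write("user_id", "v1", a, b)   # request goes to the fallback
a.up = True                    # node comes back
handoff(b, a)                  # data returns to the recovered node
```

After the handoff the fallback holds nothing on behalf of the recovered node, and normal operations resume.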
[Diagram: ring for hash(“user_id”) with the failed node's partitions marked X]
Built for the Web
• HTTP is the default (but not the only) interface
• HTTP REST API (via Webmachine)
• HTTP Specification Compliant -
Reverse Proxy Caches, Load Balancers,
Web Servers
• Suitable for many web applications
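To illustrate the HTTP interface, the sketch below only builds requests and does not send them, so it runs without a cluster. It assumes a local node on `localhost:8098` and the `/buckets/<bucket>/keys/<key>` URL layout, with `r` and `w` passed as query parameters for per-request quorums. The helper names are hypothetical, so check the details against your Riak version.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8098"  # assumption: local node, default HTTP port


def object_url(bucket, key, **params):
    """Build the URL for one object, with optional query parameters."""
    url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


def put_request(bucket, key, doc, w=None):
    """Prepare a PUT of a JSON document, optionally with a write quorum."""
    params = {"w": w} if w is not None else {}
    return Request(object_url(bucket, key, **params),
                   data=json.dumps(doc).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="PUT")


def get_request(bucket, key, r=None):
    """Prepare a GET, optionally with a read quorum."""
    params = {"r": r} if r is not None else {}
    return Request(object_url(bucket, key, **params), method="GET")


req = put_request("users", "cmolozian", {"firstname": "Chris"}, w=2)
# urllib.request.urlopen(req) would send it; that needs a running Riak node.
```

Since these are plain HTTP verbs on plain URLs, standard proxies, caches and load balancers can sit in front of the cluster.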
Other Extras
• Pre/Post commit hooks
• Multiple Storage Engines
• Bitcask
• LevelDB
• Memory
• Multi
Which Storage Engine?
• Bitcask - bounded data (like reference data)
e.g. financial instruments
• LevelDB - unbounded data or advanced query
• Memory - highly transient data
• Multi - No reason not to use it!
(approx 2x number of open file handles)
Application Design
• No intrinsic schema
• Your application defines:
• Structure
• Semantics
• Your application resolves conflicts (or uses
Last Write Wins)
Conflict Resolution
• Concurrent actors modifying the same data
cause data divergence.
• Riak provides two solutions to manage this:
• Last Write Wins
Naive approach but works for some use cases
• Vector Clocks
Retain “sibling” copies of data for merging
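The difference between the two approaches can be sketched as below. The sibling structure (a dict with `timestamp` and `value`) and the set-union merge are assumptions made for illustration. A real merge function depends entirely on the application's data.

```python
def last_write_wins(siblings):
    """Keep the sibling with the newest timestamp; other updates are dropped."""
    return max(siblings, key=lambda s: s["timestamp"])["value"]


def merge_siblings(siblings):
    """Application-defined merge: here, the union of all sibling sets."""
    merged = set()
    for sibling in siblings:
        merged |= sibling["value"]
    return merged


# Two actors updated the same shopping list concurrently.
siblings = [
    {"timestamp": 100, "value": {"milk", "eggs"}},
    {"timestamp": 105, "value": {"milk", "bread"}},
]
lww_result = last_write_wins(siblings)
merged_result = merge_siblings(siblings)
```

Last Write Wins silently loses "eggs" here, while the merge keeps every item. That is the trade-off between the two options.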
Vector Clocks
• Every node has an ID
• Send last-seen vector clock in every “put” or
“delete” request
• Riak tracks history of updates
• Auto-resolves stale versions
• Lets you handle conflicts
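A minimal sketch of how vector clocks separate stale versions from genuine conflicts. The clocks are plain dicts of node ID to counter, and the function names are hypothetical. This shows the general technique, not Riak's internal representation.

```python
def increment(clock, node_id):
    """Return a copy of the clock with node_id's counter advanced by one."""
    updated = dict(clock)
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"   # b is stale and can be discarded
    if descends(b, a):
        return "b is newer"   # a is stale and can be discarded
    return "concurrent"       # neither descends: keep both as siblings


base = increment({}, "node 0")
left = increment(base, "node 1")    # one client updates via node 1
right = increment(base, "node 2")   # another updates via node 2
```

`left` supersedes `base` and can replace it automatically, while `left` and `right` are concurrent and must be kept as siblings for the application to resolve.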
Key-Value
• Content-Types
• Denormalize
• Meaningful or “application-specific” keys
• Composite keys (e.g. Ranking List)
___
• Time-boxing
• References (value is a key or list of keys)
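Composite and time-boxed keys are easiest to see with an example. The separators, prefixes and hourly granularity below are hypothetical choices. The point is only that the application can compute a key it needs without querying for it.

```python
from datetime import datetime, timezone


def composite_key(*parts):
    """Join several meaningful parts into one key, e.g. for a ranking list."""
    return ":".join(str(part) for part in parts)


def timeboxed_key(prefix, when, fmt="%Y-%m-%dT%H"):
    """Name a key after the time window it covers (hourly by default)."""
    return f"{prefix}_{when.strftime(fmt)}"


ranking_key = composite_key("ranking", "chess", "2013-02")
login_key = timeboxed_key(
    "logins", datetime(2013, 2, 19, 10, 30, tzinfo=timezone.utc))
```

A reader who knows the naming scheme can fetch the 10:00 login bucket directly with a single GET.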
Full-text Search
• Designed for searching prose
• Lucene/Solr-like query interface
• Automatically indexes k/v pairs
• Input to Map/Reduce
• Customizable index schemas
Secondary Indexes (2i)
• Defined as metadata
• Two index types: _int (integer) and _bin (binary/string)
• Two query types: equal and range
• Input to Map/Reduce
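The two query types can be illustrated with a small in-memory index. `ToyIndex` is a hypothetical stand-in, not Riak's implementation, and the inclusive range bounds are an assumption of this sketch.

```python
class ToyIndex:
    """Maps an index field to (value, object key) entries tagged as metadata."""

    def __init__(self):
        self._entries = {}  # field -> list of (value, key)

    def add(self, key, field, value):
        self._entries.setdefault(field, []).append((value, key))

    def equal(self, field, value):
        """Keys whose indexed value matches exactly."""
        return sorted(k for v, k in self._entries.get(field, []) if v == value)

    def range(self, field, low, high):
        """Keys whose indexed value lies between low and high (inclusive)."""
        return sorted(k for v, k in self._entries.get(field, [])
                      if low <= v <= high)


index = ToyIndex()
index.add("alice", "age_int", 30)
index.add("bob", "age_int", 42)
index.add("carol", "age_int", 30)
same_age = index.equal("age_int", 30)
thirties_up = index.range("age_int", 30, 45)
```

Both queries return keys rather than values, which is why their results work well as input to a Map/Reduce job.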
Map/Reduce (1)
• Typically, to work with data, the application pulls it from the
database
• Costly: requires copying data into the app
• Map/Reduce moves the data processing to the data
Compute operations are sent to the database
• Advantages:
Scales more efficiently
Takes advantage of compute power on the db server
Map/Reduce (2)
• For more involved queries
• Specify the input keys
• Process data in “map” and “reduce”
functions
• JavaScript or Erlang
• Not designed for real-time processing
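The shape of a Map/Reduce job (input keys, a map phase, a reduce phase) can be sketched as follows. Riak runs these phases inside the cluster with JavaScript or Erlang functions. The Python below only imitates the data flow on a local dict, and every name in it is hypothetical.

```python
def run_map_reduce(store, input_keys, map_fn, reduce_fn):
    """Apply map_fn to each input object, then reduce all mapped results."""
    mapped = []
    for key in input_keys:
        mapped.extend(map_fn(key, store[key]))  # each map call returns a list
    return reduce_fn(mapped)


def map_word_count(key, value):
    """Map phase: emit the number of words in one object."""
    return [len(value.split())]


def reduce_sum(values):
    """Reduce phase: collapse a list of counts into a single total."""
    return [sum(values)]


store = {
    "post1": "riak is a key value store",
    "post2": "built for the web",
    "post3": "not part of this query",
}
total = run_map_reduce(store, ["post1", "post2"], map_word_count, reduce_sum)
```

Only the keys listed as input are processed, and only the small reduced result travels back to the caller.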
Client Libraries
• HTTP REST or optimized binary interface (Protocol Buffers, PB)
• Official Basho-supported:
• Community: C#, C/C++, Haskell, Clojure, Scala,
Go, PHP and many others
Riak Cloud Storage
• Released March 27, 2012
• S3 Protocol-compatible cloud storage
• Built on Riak
• Fault tolerant, distributed, highly-available
• Multi-tenancy, per-tenant billing, etc.
• Perfect for building your own private data
storage cloud
Riak CS
[Diagram: a large object is split into 1mb chunks; several Riak CS instances, each exposing an S3 API and a Reporting API, sit in front of five Riak nodes]
Riak Use Cases
• Reliability, flexibility, scalability
• Session Data
• Serving Advertising
• Log and Sensor Data
• Content Addressable Storage (CAS)
• Private Cloud [S3 API] - Riak CS
• Wherever low latency increases revenue
Basho Technologies
• Founded in 2008 by a group of engineers and
executives from Akamai Technologies, Inc.
• Design large scale distributed systems
• Develop Riak, open-source distributed
database
• Specialize in storing critical information, with
data integrity
• Offices in US, Europe (London) and Japan
Basho EMEA
Questions?
Chris Molozian, [email protected]
Want to know more?
We will come and give a Riak tech talk at your
organisation or group:
bit.ly/RiakTechTalk