Slide 1

Slide 1 text

Introduction to Riak ...or why you need Riak ;) Basho Technologies Chris Molozian ([email protected]) 1 Tuesday, 19 February 13

Slide 2

Slide 2 text

What is Riak?
• Key-Value Store + Extras
• Distributed, horizontally scalable
• Fault-tolerant
• Highly-available
• Built for the Web
• Inspired by Amazon’s Dynamo

Slide 3

Slide 3 text

Key-Value
• Simple operations - GET, PUT, DELETE
• Value is opaque (mostly), with metadata
• Extras
  • Secondary Indexes (2i)
  • Links
  • Full-text search (optional)
  • Map/Reduce

Slide 4

Slide 4 text

K/V Data Model
• All Riak objects are referenced by keys
• Keys are grouped into buckets (only a logical partitioning scheme!)
• Simple operations: GET, PUT, DELETE
• An object is composed of metadata and a value

Slide 5

Slide 5 text

[Diagram: a bucket containing key/value pairs. Example key: cmolozian. Example value: {firstname: “Chris”, lastname: “Molozian”}. Values can be JSON, XML, YAML, binary, etc.]

Slide 6

Slide 6 text

Distributed & Horizontally Scalable
• Default configuration is optimized for a cluster
• Query load and data are spread evenly
• Add more nodes and get more:
  • ops/second
  • storage capacity
  • compute power (for Map/Reduce)

Slide 7

Slide 7 text

Fault Tolerant (1)
• All nodes participate equally - no single point of failure (SPOF)
• All data is replicated
• Cluster transparently survives...
  • node failure
  • network partitions
• Built on Erlang/OTP (designed for fault tolerance)

Slide 8

Slide 8 text

Fault Tolerant (2)
• Voxer uses Riak extensively
  Voxer is a walkie-talkie application for smartphones. Messages stream live as you talk, and your friends join you live or listen later.
• Fault tolerance, in the real world:

Slide 9

Slide 9 text

Inspired by Amazon Dynamo
• Masterless, peer-coordinated replication
• Consistent hashing
• Eventually consistent
• Quorum reads and writes
• Anti-entropy: read repair and hinted handoff

Slide 10

Slide 10 text

Consistent Hashing
• 160-bit integer keyspace
• divided into a fixed number of evenly-sized partitions
• partitions are claimed by nodes in the cluster
• replicas go to the N partitions following the key
[Diagram: a ring of 32 partitions claimed by node 0, node 1, node 2 and node 3, with positions 0, 2^160/4 and 2^160/2 marked; hash(“user_id”) is placed on the ring with N=3.]
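
The ring on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-1 stands in for the 160-bit hash, partitions are claimed round-robin by four nodes, the first replica goes to the partition the hash falls in, and the bucket name "users" is invented. Riak's real hashing of bucket/key pairs and its claim algorithm differ in detail.

```python
import hashlib

RING_SIZE = 2 ** 160      # 160-bit integer keyspace
NUM_PARTITIONS = 32       # fixed partition count, as in the slide's diagram
NODES = ["node 0", "node 1", "node 2", "node 3"]


def partition_for(bucket, key):
    """Index of the partition whose range contains the hash of bucket/key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def owner(partition):
    """Assumed round-robin claim: partition i belongs to node i mod 4."""
    return NODES[partition % len(NODES)]


def preference_list(bucket, key, n=3):
    """The n consecutive partitions (and their owners) that hold the replicas."""
    first = partition_for(bucket, key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner(p)) for p in partitions]
```

Calling preference_list("users", "user_id") returns three consecutive partitions; with the round-robin claim assumed here they always belong to three different nodes.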

Slide 11

Slide 11 text

Highly-Available
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums

Slide 12

Slide 12 text

Request Quorums
• Every request contacts all replicas of a key
• N - number of replicas (default 3)
• R - read quorum
• W - write quorum
Quorum: the number of replicas that must respond to a read or write request before it is considered successful. The default is 2, calculated as n_val / 2 + 1 using integer division.
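
The quorum arithmetic can be checked with a small Python sketch. The function names are invented for illustration; only the n_val / 2 + 1 formula comes from the slide.

```python
def quorum(n_val):
    """Default quorum: a majority of the n_val replicas (integer division)."""
    return n_val // 2 + 1


def request_succeeds(responses, required):
    """A read or write succeeds once enough replicas have responded."""
    return responses >= required
```

With the default n_val of 3, quorum(3) is 2, so a request still succeeds when one of the three replicas fails to respond.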

Slide 13

Slide 13 text

Disaster Scenario
• Node fails
• Requests go to fallback
• Node comes back
• “Handoff” - data returns to recovered node
• Normal operations resume
[Diagram: the ring with the failed node's partitions marked X; hash(“user_id”).]
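
The sequence on this slide can be mimicked with a toy model in Python. The Node class and the put and handoff helpers are hypothetical and exist only to show the idea of parking writes on a fallback and returning them later; they are not Riak's implementation.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}     # key -> value for data this node owns
        self.hints = {}    # intended owner name -> {key: value} held on its behalf


def put(primary, fallback, key, value):
    """Write to the primary if it is up, otherwise park the write on the fallback."""
    if primary.up:
        primary.data[key] = value
    else:
        fallback.hints.setdefault(primary.name, {})[key] = value


def handoff(fallback, recovered):
    """Return parked writes to the recovered node and drop the hints."""
    recovered.data.update(fallback.hints.pop(recovered.name, {}))
```

A write made while the primary is down lands in the fallback's hints; once the primary is back, handoff(fallback, primary) moves it to its proper home.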

Slide 14

Slide 14 text

Built for the Web
• HTTP is the default (but not the only) interface
• HTTP REST API (via Webmachine)
• HTTP specification compliant - works with reverse proxy caches, load balancers, web servers
• Suitable for many web applications
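
As a rough illustration of the HTTP interface, the Python sketch below builds GET, PUT and DELETE requests with the standard library without sending them. The host, the port 8098 and the /buckets/<bucket>/keys/<key> URL layout are assumptions not stated on the slide, so check them against the documentation for your Riak version.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8098"  # assumption: a local node on the usual HTTP port


def object_url(bucket, key):
    """Assumed URL layout for a single object."""
    return f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def build_get(bucket, key):
    return Request(object_url(bucket, key), method="GET")


def build_put(bucket, key, value):
    """Store a JSON document; the Content-Type travels with the object as metadata."""
    body = json.dumps(value).encode("utf-8")
    return Request(object_url(bucket, key), data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def build_delete(bucket, key):
    return Request(object_url(bucket, key), method="DELETE")
```

Passing one of these requests to urllib.request.urlopen would send it, which requires a running node at BASE_URL.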

Slide 15

Slide 15 text

Other Extras
• Pre/Post commit hooks
• Multiple Storage Engines
  • Bitcask
  • LevelDB
  • Memory
  • Multi

Slide 16

Slide 16 text

Which Storage Engine?
• Bitcask - bounded data (like reference data), e.g. financial instruments
• LevelDB - unbounded data or advanced query
• Memory - highly transient data
• Multi - No reason not to use it! (approx. 2x the number of open file handles)

Slide 17

Slide 17 text

Application Design
• No intrinsic schema
• Your application defines:
  • Structure
  • Semantics
• Your application resolves conflicts (or uses Last Write Wins)

Slide 18

Slide 18 text

Conflict Resolution
• Concurrent actors modifying the same data cause data divergence.
• Riak provides two solutions to manage this:
  • Last Write Wins
    Naive approach but works for some use cases
  • Vector Clocks
    Retain “sibling” copies of data for merging
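
The difference between the two approaches can be sketched in Python. The sibling representation below, a list of (timestamp, set) pairs, and both function names are invented for illustration; a real client library exposes siblings in its own way.

```python
def last_write_wins(siblings):
    """Keep only the value with the newest timestamp; concurrent updates are lost."""
    return max(siblings, key=lambda sibling: sibling[0])[1]


def merge_siblings(siblings):
    """Application-level merge: union of the set-valued siblings."""
    merged = set()
    for _, value in siblings:
        merged |= value
    return merged
```

For two concurrent shopping-cart updates such as (1, {"milk"}) and (2, {"eggs"}), Last Write Wins keeps only {"eggs"}, while the merge keeps both items.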

Slide 19

Slide 19 text

Vector Clocks
• Every node has an ID
• Send the last-seen vector clock in every “put” or “delete” request
• Riak tracks the history of updates
• Auto-resolves stale versions
• Lets you handle conflicts
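
A minimal vector clock can be modelled in Python as a dictionary from actor ID to update count. This is a sketch of the general technique, not Riak's internal encoding, and the function names are assumptions.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two versions: one is stale, they are equal, or they conflict."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"
```

A "concurrent" result is the case where neither version is stale, so both would be kept as siblings for the application to merge.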

Slide 20

Slide 20 text

Modeling Tools
• Key-Value
• Links
• Full-text search
• Secondary Indexes (2i)
• Map/Reduce

Slide 21

Slide 21 text

Key-Value
• Content-Types
• Denormalize
• Meaningful or “application specific” keys
• Composite keys (e.g. Ranking List)
• Time-boxing
• References (value is a key or list of keys)
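
Composite and time-boxed keys are easy to picture with a short Python sketch. The underscore separator, the hourly granularity and the example names are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def composite_key(*parts):
    """Join several meaningful parts into one key, e.g. for a ranking list."""
    return "_".join(str(part) for part in parts)


def time_boxed_key(prefix, moment, fmt="%Y%m%d%H"):
    """Group writes into a time window by embedding the window in the key."""
    return f"{prefix}_{moment.strftime(fmt)}"
```

Because the application can recompute such keys, it can fetch an object directly with a GET instead of running a query.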

Slide 23

Slide 23 text

Full-text Search
• Designed for searching prose
• Lucene/Solr-like query interface
• Automatically indexes k/v pairs
• Input to Map/Reduce
• Customizable index schemas

Slide 24

Slide 24 text

Secondary Indexes (2i)
• Defined as metadata
• Two index types: integer (_int) and string (_bin)
• Two query types: equal and range
• Input to Map/Reduce
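
The two query types can be illustrated with an in-memory stand-in written in Python. The IndexedBucket class is hypothetical and scans every object, which is not how Riak stores or queries indexes; the index names age_int and email_bin are invented, following Riak's convention of an _int suffix for integers and a _bin suffix for strings.

```python
class IndexedBucket:
    def __init__(self):
        self.objects = {}   # key -> (value, {index_name: index_value})

    def put(self, key, value, indexes=None):
        """Store a value together with its index entries (kept as metadata)."""
        self.objects[key] = (value, dict(indexes or {}))

    def query_equal(self, index, wanted):
        """Keys whose index entry equals the wanted value."""
        return sorted(key for key, (_, idx) in self.objects.items()
                      if idx.get(index) == wanted)

    def query_range(self, index, low, high):
        """Keys whose index entry lies in the inclusive range [low, high]."""
        return sorted(key for key, (_, idx) in self.objects.items()
                      if index in idx and low <= idx[index] <= high)
```

Both queries return keys rather than values, which is what makes them usable as the input to a Map/Reduce job.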

Slide 25

Slide 25 text

Map/Reduce (1)
• Typically, to interact with data, we pull it from a database
  • Costly: requires copying data into the app
• Map/Reduce moves the data processing to the data
  Compute operations are sent to the database
• Advantages:
  • Scales more efficiently
  • Takes advantage of compute power on the database server

Slide 26

Slide 26 text

Map/Reduce (2)
• For more involved queries
• Specify the input keys
• Process data in “map” and “reduce” functions
• JavaScript or Erlang
• Not designed for real-time processing
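
The shape of a job (input keys, a map function, a reduce function) can be sketched in plain Python. Riak runs these phases in JavaScript or Erlang on the nodes holding the data; the word-count job, the dictionary used as a store and the function names below are only illustrative assumptions.

```python
from collections import Counter


def map_phase(value):
    """Map: emit a partial word count for a single stored object."""
    return [Counter(value.lower().split())]


def reduce_phase(partials):
    """Reduce: combine partial counts into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return [total]


def map_reduce(store, input_keys):
    """Run the map function over each input key, then reduce the results."""
    mapped = []
    for key in input_keys:
        mapped.extend(map_phase(store[key]))
    return reduce_phase(mapped)[0]
```

In a real cluster the map step runs next to each object, so only the small partial results travel over the network to the reduce step.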

Slide 27

Slide 27 text

Client Libraries
• HTTP REST or optimized binary interface (Protocol Buffers, PB)
• Officially supported by Basho:
• Community: C#, C/C++, Haskell, Clojure, Scala, Go, PHP and many others

Slide 28

Slide 28 text


Slide 29

Slide 29 text

Riak Cloud Storage
• Released March 27, 2012
• S3 protocol-compatible cloud storage
• Built on Riak
• Fault tolerant, distributed, highly-available
• Multi-tenancy, multi-billing, etc.
• Perfect for building your own private data storage cloud

Slide 30

Slide 30 text

Riak CS
[Diagram: a large object is split into 1 MB blocks spread across five Riak nodes; each node is paired with a Riak CS instance exposing an S3 API and a Reporting API.]

Slide 31

Slide 31 text

Riak Use Cases
• Reliability, flexibility, scalability
• Session Data
• Serving Advertising
• Log and Sensor Data
• Content Addressable Storage (CAS)
• Private Cloud [S3 API] - Riak CS
• Wherever low latency increases revenue

Slide 32

Slide 32 text

Basho Technologies
• Founded in 2008 by a group of engineers and executives from Akamai Technologies, Inc.
• Design large-scale distributed systems
• Develop Riak, an open-source distributed database
• Specialize in storing critical information, with data integrity
• Offices in the US, Europe (London) and Japan

Slide 33

Slide 33 text

Basho EMEA

Slide 34

Slide 34 text

Questions? Chris Molozian, [email protected]

Slide 35

Slide 35 text

Want to know more? We will come and give a Riak tech talk at your organisation or group: bit.ly/RiakTechTalk