
Introduction to Riak

Chris Molozian's intro to Riak talk given at the Munich Riak meet-up on February 18th 2013.

Basho Technologies

February 18, 2013

Transcript

  1. Introduction to Riak ...or why you need Riak ;) Basho

    Technologies Chris Molozian ([email protected]) 1 Tuesday, 19 February 13
  2. What is Riak? • Key-Value Store + Extras • Distributed,

    horizontally scalable • Fault-tolerant • Highly-available • Built for the Web • Inspired by Amazon’s Dynamo
  3. Key-Value • Simple operations - GET, PUT, DELETE • Value

    is opaque (mostly), with metadata • Extras • Secondary Indexes (2i) • Links • Full-text search (optional) • Map/Reduce
  4. K/V Data Model • All “Riak Object”(s) are referenced by

    keys • Keys are grouped into buckets (only a logical partitioning scheme!) • Simple operations: GET, PUT, DELETE • Object is composed of metadata and value
  5. K/V diagram: a bucket groups key/value pairs • Example: key cmolozian

    maps to value {firstname: “Chris”, lastname: “Molozian”} • Values can be JSON, XML, YAML, binary, etc.
  6. Distributed & Horizontally Scalable • Default Configuration is optimized for

    a cluster • Query load and data are spread evenly • Add more nodes and get more: • ops/second • storage capacity • compute power (for Map/Reduce)
  7. Fault Tolerant (1) • All nodes participate equally - no

    single point of failure (SPOF) • All data is replicated • Cluster transparently survives... • node failure • network partitions • Built on Erlang/OTP (designed for FT)
  8. Fault Tolerant (2) • Voxer uses Riak extensively • Voxer is

    a Walkie Talkie application for smartphones. Messages stream live as you talk and your friends join you live or listen later. • Fault tolerance, in the real world:
  9. Inspired by Amazon Dynamo • Masterless, peer-coordinated replication • Consistent

    hashing • Eventually consistent • Quorum reads and writes • Anti-Entropy - Read Repair & Hinted Handoff
  10. Consistent Hashing • 160-bit integer keyspace • divided into fixed

    number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key • (Ring diagram: 32 partitions claimed by node 0 to node 3; ring positions 0, 2^160/4 and 2^160/2 marked; hash(“user_id”) with N=3)
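
The ring described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Riak's actual implementation: the SHA-1 hash of the bare key, the round-robin claim of partitions by nodes, and the names `partition_owner` and `preference_list` are all assumptions made for the example.

```python
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 160                      # 160-bit integer keyspace
NUM_PARTITIONS = 32                       # fixed partition count, as in the slide's diagram
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS
NODES = ["node 0", "node 1", "node 2", "node 3"]


def partition_owner(partition: int) -> str:
    """Assumption: nodes claim partitions round-robin around the ring."""
    return NODES[partition % len(NODES)]


def preference_list(key: str, n: int = 3) -> List[Tuple[int, str]]:
    """Return the n partitions that follow the key's ring position, with owners."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")          # integer in [0, 2**160)
    first = (position // PARTITION_SIZE + 1) % NUM_PARTITIONS
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, partition_owner(p)) for p in partitions]
```

With four nodes claiming partitions round-robin, any three consecutive partitions belong to three different nodes, which is why the replicas of one key end up on distinct machines in this sketch.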
  11. Highly-Available • Any node can serve client requests • Fallbacks

    are used when nodes are down • Always accepts read and write requests • Per-request quorums
  12. Request Quorums • Every request contacts all replicas of the key

    • N - number of replicas (default 3) • R - read quorum • W - write quorum • Quorum: the number of replicas that must respond to a read or write request before it is considered successful. Calculated as n_val / 2 + 1 with integer division (default 2 for N = 3)
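
The quorum arithmetic on this slide works out as follows. The helper names `quorum` and `request_succeeded` are hypothetical and exist only for this sketch.

```python
def quorum(n_val: int) -> int:
    """Default quorum: a majority of the n_val replicas (integer division)."""
    return n_val // 2 + 1


def request_succeeded(responses: int, required: int) -> bool:
    """A read (R) or write (W) succeeds once enough replicas have responded."""
    return responses >= required
```

For the default of three replicas, `quorum(3)` is 2, so a request still succeeds when one of the three replicas does not answer.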
  13. Disaster Scenario • Node fails • Requests go to fallback

    • Node comes back • “Handoff” - data returns to recovered node • Normal operations resume • (Ring diagram: failed partitions marked X, hash(“user_id”))
  14. Built for the Web • HTTP is default (but not

    only) interface • HTTP REST API (via Webmachine) • HTTP Specification Compliant - Reverse Proxy Caches, Load Balancers, Web Servers • Suitable for many web applications
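
As a rough sketch of what the HTTP interface looks like from a client, the snippet below only builds the requests and does not send them. The host `localhost:8098`, the `/riak/<bucket>/<key>` path layout, and the helper names are assumptions for illustration; check them against the Riak version in use.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8098"   # assumed local node address


def object_url(bucket: str, key: str) -> str:
    """URL of one Riak object, with bucket and key percent-encoded."""
    return f"{BASE_URL}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def get_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="GET")


def put_request(bucket: str, key: str, value: dict) -> Request:
    body = json.dumps(value).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(object_url(bucket, key), data=body, headers=headers, method="PUT")


def delete_request(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="DELETE")
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a running node.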
  15. Other Extras • Pre/Post commit hooks • Multiple Storage Engines

    • Bitcask • LevelDB • Memory • Multi
  16. Which Storage Engine? • Bitcask - bounded data (like reference

    data), e.g. financial instruments • LevelDB - unbounded data or advanced query • Memory - highly transient data • Multi - No reason not to use it! (approx 2x number of open file handles)
  17. Application Design • No intrinsic schema • Your application defines:

    • Structure • Semantics • Your application resolves conflicts (or uses Last Write Wins)
  18. Conflict Resolution • Concurrent actors modifying the same data cause

    data divergence. • Riak provides two solutions to manage this: • Last Write Wins: naive approach but works for some use cases • Vector Clocks: retain “sibling” copies of data for merging
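
The difference between the two strategies can be shown with plain Python data. This is only a sketch: the `(timestamp, value)` sibling shape, the set-valued "shopping cart" objects, and both function names are assumptions, not Riak client API.

```python
from typing import Iterable, Set, Tuple

Sibling = Tuple[float, Set[str]]   # (write timestamp, value), a hypothetical shape


def last_write_wins(siblings: Iterable[Sibling]) -> Set[str]:
    """Keep only the value with the latest timestamp; other writes are lost."""
    return max(siblings, key=lambda sibling: sibling[0])[1]


def merge_siblings(siblings: Iterable[Sibling]) -> Set[str]:
    """Application-level merge: here, the union of all sibling values."""
    merged: Set[str] = set()
    for _, value in siblings:
        merged |= value
    return merged
```

Given two concurrent writes of `{"milk"}` and `{"eggs"}`, Last Write Wins keeps only the later one, while the merge keeps both items.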
  19. Vector Clocks • Every node has an ID • Send

    last-seen vector clock in every “put” or “delete” request • Riak tracks history of updates • Auto-resolves stale versions • Lets you handle conflicts
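
How a vector clock separates a stale version from a genuine conflict can be sketched as below. The dictionary representation and the function names are assumptions for illustration; Riak's real vector clocks are opaque to clients.

```python
from typing import Dict

VClock = Dict[str, int]   # node ID -> number of updates seen from that node


def increment(clock: VClock, node_id: str) -> VClock:
    """Return a copy of the clock with node_id's counter advanced by one."""
    updated = dict(clock)
    updated[node_id] = updated.get(node_id, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"      # b is stale and can be dropped automatically
    if descends(b, a):
        return "b is newer"
    return "siblings"            # concurrent updates: the application must merge
```

Two writers that both start from the same clock and update through different nodes produce clocks where neither descends from the other, which is the "siblings" case.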
  20. Modeling Tools • Key-Value • Links • Full-text search •

    Secondary Indexes (2i) • Map/Reduce
  21. Key-Value • Content-Types • Denormalize • Meaningful or “application specific”

    keys • Composite keys (e.g. Ranking List) <engine>_<locale>_<keyword>_<url> • Time-boxing • References (value is a key or list of keys)
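
Composite and time-boxed keys are just strings the application assembles. The sketch below assumes percent-encoding of each part (so an underscore inside a part cannot be mistaken for the separator) and an hourly time box; both choices, and the function names, are illustrative only.

```python
from datetime import datetime
from urllib.parse import quote


def ranking_key(engine: str, locale: str, keyword: str, url: str) -> str:
    """Build an <engine>_<locale>_<keyword>_<url> composite key."""
    parts = [engine, locale, keyword, url]
    # quote() leaves "_" untouched, so escape it explicitly inside each part.
    encoded = [quote(part, safe="").replace("_", "%5F") for part in parts]
    return "_".join(encoded)


def timeboxed_key(prefix: str, moment: datetime) -> str:
    """Group all data for one hour under a single predictable key."""
    return f"{prefix}_{moment:%Y-%m-%dT%H}"
```

Because such keys are computable from data the application already has, an object can be fetched with a direct GET instead of a query.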
  22. Links • Lightweight relationships, like <a> • Includes a “tag”

    • Built-in traversal operation (“walking”) GET /riak/b/k/[bucket],[tag],[keep] • Limited in number (part of metadata)
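
A link-walking path in the `/riak/b/k/[bucket],[tag],[keep]` form can be assembled like this. The helper `link_walk_path` is hypothetical, and the use of `_` as a wildcard and `1`/`0` for the keep flag is stated here as an assumption to verify against the Riak HTTP API documentation.

```python
from typing import List, Optional, Tuple
from urllib.parse import quote

# One walk step: (bucket, tag, keep); None is rendered as the "_" wildcard.
Step = Tuple[Optional[str], Optional[str], Optional[bool]]


def link_walk_path(bucket: str, key: str, steps: List[Step]) -> str:
    """Build the request path for walking links from one object."""

    def part(value: Optional[str]) -> str:
        return "_" if value is None else quote(value, safe="")

    def keep_flag(keep: Optional[bool]) -> str:
        if keep is None:
            return "_"
        return "1" if keep else "0"

    segments = [f"{part(b)},{part(t)},{keep_flag(k)}" for b, t, k in steps]
    start = ["/riak", quote(bucket, safe=""), quote(key, safe="")]
    return "/".join(start + segments)
```

Each extra step in the list appends one more `[bucket],[tag],[keep]` segment, so multi-hop walks are expressed by lengthening the path.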
  23. Full-text Search • Designed for searching prose • Lucene/Solr-like query

    interface • Automatically indexes k/v pairs • Input to Map/Reduce • Customizable index schemas
  24. Secondary Indexes (2i) • Defined as metadata • Two index

    types: _int and _bin (string) • Two query types: equal and range • Input to Map/Reduce
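
The two query types can be illustrated with an in-memory stand-in. `IndexedBucket` is a hypothetical class for this sketch, not a Riak client API; it handles integer indexes only and treats range bounds as inclusive by assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class IndexedBucket:
    """Toy model: index entries are metadata stored alongside each key."""

    def __init__(self) -> None:
        # index name -> list of (indexed value, object key)
        self._entries: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def put(self, key: str, indexes: Dict[str, int]) -> None:
        for name, value in indexes.items():
            self._entries[name].append((value, key))

    def query_equal(self, index: str, value: int) -> List[str]:
        """Equality query: keys whose index value matches exactly."""
        return sorted(k for v, k in self._entries.get(index, []) if v == value)

    def query_range(self, index: str, low: int, high: int) -> List[str]:
        """Range query: keys whose index value lies in [low, high]."""
        return sorted(k for v, k in self._entries.get(index, []) if low <= v <= high)
```

Both queries return keys rather than values, which matches the slide's point that 2i results can serve as input to Map/Reduce.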
  25. Map/Reduce (1) • Typically to interact with data, we pull

    it from a database • Costly, requires copying data into the app • Map/Reduce moves the data processing to the data: compute operations are sent to the database • Advantages: scales more efficiently and takes advantage of compute power on the db server
  26. Map/Reduce (2) • For more involved queries • Specify the

    input keys • Process data in “map” and “reduce” functions • JavaScript or Erlang • Not designed for real-time processing
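
The shape of such a job (input keys, a map phase, a reduce phase) can be sketched in Python over an in-memory dictionary. Real Riak phases are written in JavaScript or Erlang and run on the nodes holding the data; everything named here is a stand-in for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_reduce(store: Dict[str, str],
               input_keys: Iterable[str],
               map_fn: Callable[[str, str], List[Pair]],
               reduce_fn: Callable[[List[Pair]], Dict[str, int]]) -> Dict[str, int]:
    """Run map_fn on each input object, then reduce_fn on all mapped results."""
    mapped: List[Pair] = []
    for key in input_keys:
        if key in store:                      # missing keys are skipped
            mapped.extend(map_fn(key, store[key]))
    return reduce_fn(mapped)


def count_words_map(key: str, value: str) -> List[Pair]:
    """Map phase: emit (word, 1) for every word in the object's value."""
    return [(word, 1) for word in value.split()]


def sum_counts_reduce(pairs: List[Pair]) -> Dict[str, int]:
    """Reduce phase: add up the counts emitted for each word."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

Only the keys listed as input are touched, which mirrors the slide's "specify the input keys" step.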
  27. Client Libraries • HTTP REST or optimized binary interface (PB) • Official

    Basho supported: • Community: C#, C/C++, Haskell, Clojure, Scala, Go, PHP and many others
  28. Riak Cloud Storage • Released March 27, 2012 • S3

    Protocol-compatible cloud storage • Built on Riak • Fault tolerant, distributed, highly-available • Multi-tenancy, Multi billing, etc... • Perfect for building your own private data storage cloud
  29. Riak CS Large Object • (Diagram: five Riak CS instances, each exposing

    a Reporting API and an S3 API, in front of five Riak nodes; the large object is shown as four 1 MB blocks)
  30. Riak Use Cases • Reliability, flexibility, scalability • Session Data

    • Serving Advertising • Log and Sensor Data • Content Addressable Storage (CAS) • Private Cloud [S3 API] - Riak CS • Wherever low latency increases revenue
  31. Basho Technologies • Founded in 2008 by a group of

    engineers and executives from Akamai Technologies, Inc. • Design large scale distributed systems • Develop Riak, open-source distributed database • Specialize in storing critical information, with data integrity • Offices in US, Europe (London) and Japan
  32. Want to know more? We will come and give a

    Riak tech talk at your organisation or group: bit.ly/RiakTechTalk