
Riak Overload

Mark Phillips
September 13, 2012

A high-level look at Riak, complete with use cases, computer science, and what's coming in future releases.

Transcript

  1. What’s in store? • At a High Level • For Developers • Under the Hood • When and Why • Riak and NoSQL • Etc.
  2. Riak • Dynamo-inspired key/value store • with some extras: search, MapReduce, 2i, links, pre- and post-commit hooks, pluggable backends, HTTP and binary interfaces • Written in Erlang with C/C++ • Open source under Apache 2 License
  3. Riak History • Started internally at Basho in 2007 • Deployed in production the same year • Used as the data store for Basho’s SaaS • Open sourced in August 2009; Basho “pivots” • Hit v1.0 in September 2011 • Now being used by 1000s in production • Basho sells commercial extensions to Riak
  4. Riak’s Design Goals (1) • High-availability • Low-latency • Horizontal Scalability • Fault Tolerance • Ops Friendliness • Predictability
  5. Riak’s Design Goals (2) • Design Informed by Brewer’s CAP Theorem and Amazon’s Dynamo Paper • Riak is tuned to offer availability above all else • Developers can tune for consistency (more on this later)
  6. Riak is a database that stores keys against values. Keys are grouped into a higher-level namespace called buckets.
  7. Riak doesn’t care what you store. It will accept any data type; things are stored on disk as binaries.
  8. Two APIs: 1. HTTP (just like the web) 2. Protocol Buffers (thank you, Google)
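
    As a sketch of the HTTP interface mentioned above, here is what a store and a fetch might look like from Python using the third-party requests library. The host, port (8098 is Riak’s default HTTP port), bucket, and key names are illustrative assumptions, not part of the talk.

      # Hedged sketch: store and read a key over Riak's HTTP interface.
      import requests

      BASE = "http://127.0.0.1:8098/riak"   # assumed local node, default HTTP port

      # PUT a JSON value under bucket "meetups", key "nycdevops"
      resp = requests.put(
          BASE + "/meetups/nycdevops",
          data='{"city": "NYC", "topic": "Riak"}',
          headers={"Content-Type": "application/json"},
      )
      print(resp.status_code)               # typically 204 No Content on success

      # GET it back; Riak returns the bytes exactly as they were stored
      resp = requests.get(BASE + "/meetups/nycdevops")
      print(resp.status_code, resp.text)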
  9. Tunable Consistency • n_val - number of replicas to store; bucket-level setting. Defaults to 3. • w - number of replicas required for a successful write; request-level setting. Defaults to 2. • r - number of replica acks required for a successful read; request-level setting. Defaults to 2. • Tweak consistency vs. availability
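
    A hedged illustration of those knobs over the same HTTP interface: n_val is set as a bucket property, while w and r ride along on individual requests as query parameters. Endpoint shapes follow Riak’s documented HTTP API; the bucket and values are examples only.

      import requests

      BASE = "http://127.0.0.1:8098/riak"

      # n_val is a bucket-level property (default 3)
      requests.put(
          BASE + "/meetups",
          data='{"props": {"n_val": 3}}',
          headers={"Content-Type": "application/json"},
      )

      # w and r are per-request: how many replicas must acknowledge the operation
      requests.put(BASE + "/meetups/nycdevops?w=2", data="hello",
                   headers={"Content-Type": "text/plain"})
      requests.get(BASE + "/meetups/nycdevops?r=2")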
  10. Client Libraries Ruby, Node.js, Java, Python, Perl, OCaml, Erlang, PHP, C, Squeak, Smalltalk, Pharo, Clojure, Scala, Haskell, Lisp, Go, .NET, Play, and more (supported by either Basho or the community).
  11. Consistent Hashing and Replicas • Virtual Nodes • Vector Clocks • Gossiping • Append-only Stores • Handoff and Rebalancing • Erlang/OTP
  12. Consistent Hashing • 160-bit integer keyspace • divided into fixed number of evenly-sized partitions (ring diagram: 32 partitions, keyspace marks at 0, 2^160/4, 2^160/2)
  13. Consistent Hashing • 160-bit integer keyspace • divided into fixed number of evenly-sized partitions • partitions are claimed by nodes in the cluster (ring diagram: 32 partitions claimed by node 0 through node 3)
  14. Consistent Hashing • 160-bit integer keyspace • divided into fixed number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key (ring diagram: node 0 through node 3)
  15. Consistent Hashing • 160-bit integer keyspace • divided into fixed number of evenly-sized partitions • partitions are claimed by nodes in the cluster • replicas go to the N partitions following the key (ring diagram: hash(“meetups/nycdevops”) placed on the ring with N=3)
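
    To make slides 12–15 concrete, here is a toy Python model of the ring: a 160-bit SHA-1 keyspace cut into a fixed number of equal partitions, partitions claimed round-robin by nodes, and replicas placed on the N partitions following the key. The partition count, node names, and round-robin claim are simplifying assumptions, not Riak’s actual claim algorithm.

      import hashlib

      RING_SIZE  = 2 ** 160                  # SHA-1 gives a 160-bit integer keyspace
      PARTITIONS = 32
      NODES      = ["node0", "node1", "node2", "node3"]
      PART_WIDTH = RING_SIZE // PARTITIONS

      # simplistic claim: partition i belongs to node i mod number-of-nodes
      owner = {i: NODES[i % len(NODES)] for i in range(PARTITIONS)}

      def preference_list(bucket, key, n=3):
          """The n partitions (and their owners) responsible for bucket/key."""
          h = int(hashlib.sha1((bucket + "/" + key).encode()).hexdigest(), 16)
          first = h // PART_WIDTH             # partition the key hashes into
          picks = [(first + i + 1) % PARTITIONS for i in range(n)]  # N following partitions
          return [(p, owner[p]) for p in picks]

      print(preference_list("meetups", "nycdevops", n=3))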
  16. Disaster Scenario • node fails (ring diagram: failed node’s partitions marked X)
  17. Disaster Scenario • node fails • requests go to fallback (ring diagram: hash(“meetups/nycdevops”) routed around the failed partitions)
  18. Disaster Scenario • node fails • requests go to fallback • node comes back
  19. Disaster Scenario • node fails • requests go to fallback • node comes back • “Handoff” - data returns to recovered node
  20. Disaster Scenario • node fails • requests go to fallback • node comes back • “Handoff” - data returns to recovered node • normal operations resume
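
    Continuing the toy ring above, the failure story of slides 16–20 can be sketched as: when a preferred node is down, requests walk further around the ring to fallback partitions on healthy nodes, and handoff ships the data back once the node recovers. This only illustrates the routing idea; Riak’s real fallback vnodes and handoff machinery are more involved.

      def preference_list_with_fallbacks(bucket, key, down, n=3):
          """Skip partitions whose owner is down, walking the ring for fallbacks."""
          h = int(hashlib.sha1((bucket + "/" + key).encode()).hexdigest(), 16)
          start = (h // PART_WIDTH + 1) % PARTITIONS
          picked = []
          for i in range(PARTITIONS):         # walk at most once around the ring
              p = (start + i) % PARTITIONS
              if owner[p] not in down:        # healthy node: use it
                  picked.append((p, owner[p]))
              if len(picked) == n:
                  break
          return picked

      # with node2 down, its partitions are skipped and fallbacks fill in
      print(preference_list_with_fallbacks("meetups", "nycdevops", down={"node2"}))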
  21. Virtual Nodes • Each physical machine runs a certain number of Vnodes • Unit of addressing, concurrency in Riak • Storage not tied to physical assets • Enables dynamic rebalancing of data when cluster topology changes
  22. Vector Clocks • Data structure used to reason about causality at the object level • Provides happened-before relationship between events • Each object in Riak has a vector clock* • Trade off space, speed, complexity for safety
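
    A minimal vector-clock sketch (plain dicts of actor → counter) shows the happened-before reasoning this slide refers to; Riak’s own vclock encoding and pruning are more involved, and the actor names here are invented.

      def descends(a, b):
          """True if clock a has seen everything in clock b (b happened before, or equals, a)."""
          return all(a.get(actor, 0) >= count for actor, count in b.items())

      def merge(a, b):
          """Pairwise max: the smallest clock that descends from both inputs."""
          return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}

      v1 = {"client_x": 1}
      v2 = {"client_x": 1, "client_y": 1}
      print(descends(v2, v1))                 # True: v2 contains v1's history
      print(descends(v1, v2))                 # False
      print(descends(v1, {"client_z": 1}))    # False both ways => concurrent siblings
      print(merge(v1, {"client_z": 1}))       # {'client_x': 1, 'client_z': 1}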
  23. Handoff and Rebalancing • When cluster topology changes, data must be rebalanced • Handoff and rebalancing happen in the background; no manual intervention required* • Trade off speed of convergence vs. effects on cluster performance
  24. Gossip Protocol • Nodes “gossip” their view of cluster state • Enables nodes to store minimal cluster state • Can lead to network chattiness; in OTP, all nodes are fully-connected
  25. Append-only Stores • Riak has a pluggable backend architecture • Bitcask and LevelDB are used the most in production, depending on use case • All writes are appends to a file • This provides crash safety and fast writes • Tradeoff - periodic, background compaction is required
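
    A Bitcask-flavored sketch of the append-only idea: every write appends a record to the data file and an in-memory “keydir” remembers the latest offset per key, so a read is one seek. The record format, file name, and class are invented for illustration; real Bitcask adds CRCs, timestamps, hint files, and the background merge/compaction the slide mentions.

      import os, struct

      class AppendOnlyStore:
          def __init__(self, path="data.log"):
              self.f = open(path, "a+b")
              self.keydir = {}                # key -> (value offset, value size)

          def put(self, key, value):
              self.f.seek(0, os.SEEK_END)
              offset = self.f.tell()
              header = struct.pack(">II", len(key), len(value))
              self.f.write(header + key + value)   # append-only: never rewrite in place
              self.f.flush()
              self.keydir[key] = (offset + len(header) + len(key), len(value))

          def get(self, key):
              offset, size = self.keydir[key]
              self.f.seek(offset)
              return self.f.read(size)

      store = AppendOnlyStore()
      store.put(b"nycdevops", b"riak meetup")
      store.put(b"nycdevops", b"riak meetup, updated")  # old record is garbage until compaction
      print(store.get(b"nycdevops"))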
  26. Erlang/OTP • Shared-nothing, immutable, message-passing, functional, concurrent • Distributed systems primitives in core language • OTP (Open Telecom Platform) • Ericsson AXD-301: 99.9999999% uptime (~31 ms downtime/year)
  27. When Might Riak Make Sense • When you have enough data to require >1 physical machine (preferably >4) • When availability is more important than consistency (think “critical data” vs. “big data”) • When your data can be modeled as keys and values; don’t be afraid to denormalize
  28. User/MetaData Store • User profile storage for xfinityTV Mobile app • Storage of metadata on content providers and licensing • Strict latency requirements
  29. Session Storage • First Basho customer in 2009 • Every hit to a Mochi web property results in at least one read, maybe a write to Riak • Unavailability or high latency = lost ad revenue
  30. Ad Serving • OpenX will serve ~4T ads in 2012 • Started with CouchDB and Cassandra for various parts of infrastructure • Now consolidating on Riak and Riak Core
  31. Voxer: Initial Stats • 11 Riak nodes (switched from CouchDB) • 100s of GBs • ~20k Peak Concurrent Users • ~4MM Daily Requests
  32. Voxer: Post Growth • ~60 Nodes total in prod • 100s of TBs of data (>1TB daily) • ~400k Concurrent Users • Billions of daily Requests
  33. Choosing a NoSQL Database • At small scale, everything works • NoSQL DBs trade off traditional features to better support new and emerging use cases • Knowledge of the underlying system is essential • A lot of NoSQL Marketing is still bullshit
  34. NoSQL by Data Model • Key/Value - Riak, Redis, Voldemort, Cassandra* • Document - MongoDB, CouchDB • Column(esque) - Hbase* • Graph - Neo4J
  35. NoSQL by Distribution • Masterless - Riak, Voldemort, Cassandra • Master/Slave - MongoDB, Hbase*, CouchDB, Redis*
  36. New in Riak 1.2 • LevelDB Improvements • FreeBSD Support • New Cluster Admin Tools • Folsom for Stats • KV and Search Repair work • Much much more
  37. What needs fixing in Riak? • Active AE • Object Compactness • Rack Awareness • Ring Sizing
  38. Future Work • Active Anti-Entropy • Bonafide Data Types • Deeper Solr Integration • Consistency • Lots of other hotness
  39. http://ricon2012.com When and where? Wednesday, October 10 through Thursday, October 11 at the W Hotel in downtown San Francisco.