Riak Overview - includes 1.3 features

Riak architecture, operations, APIs, data model and querying, plus features from the latest 1.3 release.

Basho Technologies

March 25, 2013

Transcript

  1. What's in store? •  At a High Level •  For Developers •  Under the Hood •  When and Why •  Some Use Cases •  Commercial Extensions •  1.3 and Roadmap
  2. Riak •  Dynamo-inspired key/value store •  with some extras: search, MapReduce, 2i, links, pre- and post-commit hooks, pluggable backends, HTTP and binary interfaces •  Written in Erlang with C/C++ •  Open source under Apache 2 License
  3. Riak’s Design Goals (1) •  High-availability •  Low-latency •  Horizontal Scalability •  Fault Tolerance •  Ops Friendliness •  Predictability
  4. Riak’s Design Goals (2) •  Design Informed by Brewer’s CAP Theorem and Amazon’s Dynamo Paper •  Riak is tuned to offer availability above all else
  5. Riak is a database that stores keys against values. Keys are grouped into a higher-level namespace called buckets.
  6. Riak doesn’t care what you store. It will accept any data type; things are stored on disk as binaries.
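Slides 5 and 6 can be illustrated with a toy in-memory model. This is a sketch only and not Riak's client API: the `ToyStore` class and its `put`/`get` methods are hypothetical names invented for illustration. It shows keys namespaced by bucket, and values held as opaque bytes next to a content type that the application chooses.

```python
import json


class ToyStore:
    """Hypothetical in-memory stand-in for a bucket/key/value store."""

    def __init__(self):
        # (bucket, key) -> (content_type, raw bytes)
        self._data = {}

    def put(self, bucket, key, value, content_type="application/octet-stream"):
        # The store never inspects the value; callers encode it to bytes first.
        if not isinstance(value, bytes):
            raise TypeError("values are stored as bytes; encode before writing")
        self._data[(bucket, key)] = (content_type, value)

    def get(self, bucket, key):
        return self._data[(bucket, key)]


store = ToyStore()
# The same key can live in two buckets without colliding.
store.put("sessions", "alice", json.dumps({"cart": [1, 2]}).encode("utf-8"), "application/json")
store.put("avatars", "alice", b"\x89PNG\r\n", "image/png")

# Decoding is the application's job, guided by the content type it stored.
content_type, raw = store.get("sessions", "alice")
session = json.loads(raw.decode("utf-8"))
```

Because the store treats every value as bytes, JSON, images, and log files all go through the same write path; only the reader needs to know how to interpret them.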
  7. Examples (Application Type: Key → Value) •  Session: User/Session ID → Session Data •  Advertising: Campaign ID → Ad Content •  Logs: Date → Log File •  Sensor: Date, Date/Time → Sensor Updates •  User Data: Login, eMail, UUID → User Attributes •  Content: Title, Integer → Text, JSON/XML/HTTP document, images, etc.
  8. Client Libraries •  Ruby, Node.js, Java, Python, Perl, OCaml, Erlang, PHP, C, Squeak, Smalltalk, Pharo, Clojure, Scala, Haskell, Lisp, Go, .NET, Play, and more (supported by either Basho or the community).
  9. Active Anti-Entropy •  Automatic, self-healing property •  Repairs divergent, missing or corrupt replicas caused by hardware failure, bad disks, data corruption, and other failure modes •  Useful for large clusters, long term storage •  Uses hash tree exchange •  Minimal performance impact •  More on our blog
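The "hash tree exchange" bullet on slide 9 can be sketched as follows. This is an illustrative toy and not Riak's implementation: the two-level tree, the `SEGMENTS` fan-out, and the function names `build_tree` and `divergent_keys` are all assumptions made for the example. The idea it shows is that two replicas compare a single root hash first, and only descend into segments whose hashes differ.

```python
import hashlib

SEGMENTS = 4  # assumed small fan-out, chosen only for illustration


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_tree(replica: dict) -> dict:
    """Build a two-level hash tree over a replica mapping key -> bytes value."""
    segments = [dict() for _ in range(SEGMENTS)]
    for key, value in replica.items():
        # Assign each key to a segment by hashing the key itself.
        index = int(_h(key.encode("utf-8")), 16) % SEGMENTS
        segments[index][key] = _h(value)
    # Each leaf summarizes one segment; the root summarizes all leaves.
    leaves = [_h(repr(sorted(seg.items())).encode("utf-8")) for seg in segments]
    root = _h("".join(leaves).encode("utf-8"))
    return {"root": root, "leaves": leaves, "segments": segments}


def divergent_keys(a: dict, b: dict) -> set:
    """Return keys that differ or are missing, comparing hashes top-down."""
    tree_a, tree_b = build_tree(a), build_tree(b)
    if tree_a["root"] == tree_b["root"]:
        return set()  # replicas agree; nothing else is compared
    diff = set()
    for i in range(SEGMENTS):
        if tree_a["leaves"][i] == tree_b["leaves"][i]:
            continue  # matching segment, skip its keys entirely
        seg_a, seg_b = tree_a["segments"][i], tree_b["segments"][i]
        for key in seg_a.keys() | seg_b.keys():
            if seg_a.get(key) != seg_b.get(key):
                diff.add(key)
    return diff


replica_a = {"k1": b"v1", "k2": b"v2", "k3": b"v3"}
replica_b = {"k1": b"v1", "k2": b"stale"}  # k2 diverged, k3 is missing
to_repair = divergent_keys(replica_a, replica_b)
```

Here `to_repair` contains the keys a repair process would need to re-sync. When replicas already agree, the comparison stops at the root, which is one way such a scheme can keep its overhead low.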
  10. Other Features •  Expanded IPv6 support (protocol buffers and handoff interfaces) •  MapReduce backpressure •  Optionally send log messages to syslog
  11. When Might Riak Make Sense •  When you have enough data to require >1 physical machine (preferably >5) •  When availability is more important than consistency (think “critical data” on “big data”) •  When your data can be modeled as keys and values; don’t be afraid to denormalize
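The "don't be afraid to denormalize" advice on slide 11 might look like this in practice. This is a minimal sketch with made-up data: the `users` and `orders` records and the `denormalize` helper are hypothetical, and nothing here is specific to Riak. It folds related rows into one document so that a single key lookup returns everything a page needs.

```python
import json

# Hypothetical relational-style rows.
users = {"u1": {"name": "Ada"}}
orders = [
    {"id": "o1", "user_id": "u1", "total": 30},
    {"id": "o2", "user_id": "u1", "total": 12},
]


def denormalize(user_id):
    """Fold a user's orders into one document, replacing a join with one lookup."""
    doc = dict(users[user_id])
    doc["orders"] = [
        {"id": o["id"], "total": o["total"]} for o in orders if o["user_id"] == user_id
    ]
    return doc


# Key and value as they might be written to a key/value store.
key = "u1"
value = json.dumps(denormalize("u1")).encode("utf-8")
```

The trade-off is that the embedded orders are duplicated data: the application has to rewrite the user document whenever an order changes.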
  12. •  Cloud infrastructure management •  Machine, customer and API data •  “Design for failure” architecture •  “Enstratius relies on Riak to ensure that our cloud infrastructure management platform scales seamlessly, without interruption and performance bottlenecks, while meeting and exceeding internal requirements for high availability and data durability.”
  13. •  Scaling writes in MySQL became a bottleneck •  Master/slave replication made master nodes a single point of failure •  Multi-site replication ← vimeo.com/bashotech
  14. Session Storage •  First Basho customer in 2009 •  Every hit to a Mochi web property results in at least one read, and maybe a write, to Riak •  Unavailability or high latency = lost ad revenue
  15. Ad Serving •  OpenX served ~4T ads in 2012 •  Started with CouchDB and Cassandra for various parts of infrastructure •  Now consolidating on Riak and Riak Core
  16. Voxer: Post Growth •  ~60 Nodes total in prod •  100s of TBs of data (>1TB daily) •  ~400k Concurrent Users •  Billions of daily Requests
  17. Riak: Hybrid Solutions •  Riak with Postgres •  Riak with Elasticsearch •  Riak with Hadoop •  Secondary analytics clusters
  18. Try Us On… •  Amazon AMIs •  Engine Yard •  Microsoft Azure VM Depot •  Riakon.com
  19. Advanced Multi-Datacenter •  Faster, with more connections between clusters •  Easier setup and configuration •  Better per-connection statistics •  Already used in production by many Riak Enterprise customers
  20. Riak Cloud Storage •  Large object support •  S3-compatible API •  Multi-tenancy •  Reporting on usage •  Now open source
  21. Future Work •  CRDTs •  Tight Solr integration •  Greater consistency •  Lots of other good stuff, check GitHub