Riak - Your Next (or Current) Favorite Database

A high level overview of Riak, how to use it, what to use it for, and what's in 1.3.

Mark Phillips

March 05, 2013

Transcript

  1. BASHO

    Founded 2008 by group of ex-Akamai, Mitre, Apple
    120 Employees; >50% Dev; Distributed Company
    Sponsors of Riak, the Apache 2.0-licensed project
    Basho sells add-ons to Riak -> Riak EDS and Riak CS
    We generate recurring revenue and are hiring. :)
    Wednesday, March 6, 13
  2. RIAK

    Distributed, masterless, highly-available key/value store
    Deployed as cluster of nodes (>=5)
    Any node serves requests
    Built-in replication
    Automatic Failure
    Durable
    High-Availability
  3. WHEN TO USE RIAK

    When you have:
    - critical data
    - that should always be available
    - and can be modeled as keys and values*
    (* Hint: at scale, almost everything looks like a k/v store. Don’t be afraid to denormalize.)
  4. RIAK USE CASES

    Metadata
    Users/Profiles
    Object Storage
    Session Storage
    Sensor Data
    Logging Systems
    Record Systems
    Notification Systems
  5. Riak Object

    {“KEY” : “VALUE”}
    Values are stored against keys
    Key/Value + metadata = object
    Fundamental unit of replication
    Any data type will work. Encoded as binaries on disk.
    Soft limit of ~4MB on object size
  6. Buckets

    <<BUCKET>>/<<KEY>>
    Virtual Namespace
    Bucket + Key = object address
    Buckets have properties
    All objects in buckets inherit properties
    No relationships between buckets
  7. INTERFACES

    HTTP API - Via a little piece of magic called Webmachine
    Largely-faithful REST implementation
    Protocol Buffers API - Thanks, Google!
    Compact, binary protocol
  8. CLIENT LIBS

    Python, Ruby, PHP, OCaml, Java, Perl, Erlang, Node.js, C/C++, Haskell, Clojure, Scala, Go, Dart, .NET
    And more.
    Supported by either Basho or our community.
  9. CRUD

    PUT /buckets/bucket/keys/key       // User-defined key
    POST /buckets/bucket/keys          // Riak-defined key
    GET /buckets/bucket/keys/key
    DELETE /buckets/bucket/keys/key
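The four CRUD verbs map onto plain HTTP requests. The following is a minimal Python sketch that only builds the requests and does not send them. The node address, the `users` bucket, the `alice` key, and the helper names are illustrative assumptions, and it assumes a POST targets the bucket's `/keys` path so that Riak can generate the key.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical node address; 8098 is Riak's default HTTP port.
BASE = "http://127.0.0.1:8098"


def object_url(bucket, key=None):
    """Return the URL of a bucket's key space, or of one object inside it."""
    url = f"{BASE}/buckets/{quote(bucket, safe='')}/keys"
    return url if key is None else f"{url}/{quote(key, safe='')}"


def build_request(method, bucket, key=None, body=None):
    """Build (but do not send) the HTTP request for one CRUD operation."""
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(object_url(bucket, key), data=body, headers=headers, method=method)


store = build_request("PUT", "users", "alice", b'{"name": "Alice"}')  # user-defined key
create = build_request("POST", "users", body=b'{"name": "Bob"}')      # Riak-defined key
fetch = build_request("GET", "users", "alice")
remove = build_request("DELETE", "users", "alice")
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node.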
  10. MapReduce

    Distributed processing system using Riak Pipe
    Efficient for targeted queries over known key range
    Write jobs in Erlang or JS. (Erlang more performant)
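A MapReduce job over a known set of keys can be described as a JSON document and POSTed to a node's `/mapred` endpoint. The sketch below only builds that document. The `readings` bucket and the sensor keys are made-up examples, and it assumes the stored values are JSON numbers so the built-in JavaScript functions `Riak.mapValuesJson` and `Riak.reduceSum` can be used.

```python
import json


def mapreduce_job(inputs):
    """Describe a two-phase job: decode each value as JSON, then sum the results."""
    return {
        "inputs": inputs,  # explicit [bucket, key] pairs: a known key range
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }


# Hypothetical bucket and keys, used only for illustration.
job = mapreduce_job([["readings", "sensor-1"], ["readings", "sensor-2"]])
payload = json.dumps(job)  # request body for POST /mapred
```

Listing the keys explicitly, rather than naming a whole bucket, is what keeps the job targeted.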
  11. Secondary Indexing (2i)

    riak_object
    X-Riak-Index-email_bin: “[email protected]”
    X-Riak-Index-value_int: “42”
    Tag objects with custom metadata on PUT
    Exact match and range queries
    No multi-index queries yet
    Pagination *should* be in 1.4
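Over HTTP, 2i tags travel as `X-Riak-Index-*` headers on the PUT, and queries go to an `/index/` path under the bucket. The sketch below builds both without sending anything. The node address, bucket, key, and the `user@example.com` value are placeholders chosen for illustration.

```python
from urllib.request import Request

BASE = "http://127.0.0.1:8098"  # hypothetical node address


def tagged_put(bucket, key, body, indexes):
    """Build a PUT whose X-Riak-Index-* headers tag the object for 2i."""
    headers = {"Content-Type": "application/json"}
    for field, value in indexes.items():
        # Field names end in _bin (binary/string) or _int (integer).
        headers[f"X-Riak-Index-{field}"] = str(value)
    url = f"{BASE}/buckets/{bucket}/keys/{key}"
    return Request(url, data=body, headers=headers, method="PUT")


def index_query_url(bucket, field, start, end=None):
    """Exact-match URL when only start is given; range URL when end is given."""
    url = f"{BASE}/buckets/{bucket}/index/{field}/{start}"
    return url if end is None else f"{url}/{end}"


req = tagged_put("users", "alice", b'{"name": "Alice"}',
                 {"email_bin": "user@example.com", "value_int": 42})
exact = index_query_url("users", "email_bin", "user@example.com")
ranged = index_query_url("users", "value_int", 40, 50)
```

Each query names a single index field, which matches the "no multi-index queries yet" limitation.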
  12. Riak Search

    Store and index documents (JSON, text, XML, etc)
    Current Riak Search supports subset of Solr API
    Next iteration (Yokozuna; in beta) will implement distributed Solr on Riak. It will be sexy.
    Looking for beta testers to help harden Yokozuna
  13. ARCHITECTURE

    The scalability and ease of operation goals inform architectural decisions. These come with tradeoffs.
    Consistent Hashing
    Virtual Nodes
    Append-only storage
    Handoff/Rebalancing
    Vector Clocks
    Active Anti-Entropy*
  14. Consistent Hashing

    Location of data in the Riak ring is determined based on hash of bucket + key.
    Provides even distribution of storage and query load
    Trades off advantages gained from locality, e.g. range queries and aggregates
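The idea of hashing bucket + key onto a ring can be sketched in a few lines of Python. This is a simplification under stated assumptions: it hashes the string `bucket/key` with SHA-1 and divides the 160-bit space into 64 equal partitions, whereas Riak's own encoding of the bucket/key pair and its partition ownership rules differ in detail.

```python
import hashlib

RING_SIZE = 2 ** 160   # size of the SHA-1 output space
NUM_PARTITIONS = 64    # assumed partition count for this sketch


def ring_position(bucket, key):
    """Hash bucket + key to an integer position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket, key, num_partitions=NUM_PARTITIONS):
    """Index of the equal-sized ring slice that the hash falls into."""
    return ring_position(bucket, key) // (RING_SIZE // num_partitions)


# Neighbouring keys land on unrelated partitions: load spreads evenly,
# but a range scan over key1..key9 has to touch many partitions.
placement = {f"key{i}": partition_for("users", f"key{i}") for i in range(1, 10)}
```

The same property that evens out load is what gives up locality for range queries and aggregates.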
  15. Virtual Nodes

    Unit of addressing and concurrency in Riak
    Each physical host manages many vnodes
    Partition count / physical machines = vnodes/machine*
    Decouples physical assets from data distribution. This provides:
    - simplicity in cluster sizing
    - failure isolation
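A quick worked example of the vnodes-per-machine arithmetic, assuming a ring of 64 partitions and a five-node cluster. The round-robin assignment here is a deliberate simplification and not Riak's actual claim algorithm; the node names are made up.

```python
from collections import Counter


def assign_partitions(partition_count, nodes):
    """Hand out partitions to nodes round-robin (simplified ownership model)."""
    return {p: nodes[p % len(nodes)] for p in range(partition_count)}


nodes = [f"node{i}" for i in range(1, 6)]   # hypothetical five-node cluster
ownership = assign_partitions(64, nodes)
per_node = Counter(ownership.values())
# 64 / 5 = 12.8, so four nodes run 13 vnodes and one runs 12.
```

Because the partition count is fixed while the node list changes, adding a sixth machine only changes who owns which partitions, not how keys map to partitions.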
  16. Handoff/Rebalancing

    Mechanisms for data rebalancing
    When nodes join/leave cluster, handoff and rebalancing manage the data shuffling dynamically
    Trades off speed of convergence vs. effects on cluster performance
    - causes disk & network load
  17. Vector Clocks

    Data structure that provides “happened-before” relationship between events
    VCs used to rectify object consistency at READ time.
    Lots of knobs to turn; well-documented
    Trades off space, speed, and complexity for safety
    - will store all sibling objects until resolved
    - can lead to object size issues
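The “happened-before” relationship can be illustrated with a generic vector clock, modelled here as a dict from actor to counter. This is a textbook sketch, not Riak's internal representation; the actor names and function names are assumptions made for the example.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two clocks as equal, after, before, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # neither saw the other's write: siblings


def merge(a, b):
    """Element-wise maximum, used once siblings have been resolved."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


base = increment({}, "client_a")
left = increment(base, "client_a")    # client_a updates again
right = increment(base, "client_b")   # client_b updates the same base
relation = compare(left, right)       # "concurrent"
resolved = merge(left, right)
```

The `concurrent` case is the one where both versions must be kept as siblings until a reader resolves them, which is where the space cost comes from.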
  18. Append-Only Storage

    Riak provides a pluggable backend interface. (Write your own; we’ll probably hire you...)
    Bitcask, LevelDB are most-heavily used. Both are append-only
    Provides crash safety and speed.
    Trade off: periodic compaction/merge ops
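The append-only trade-off can be seen in a toy in-memory store: writes only ever append to a log, an index points at the latest record per key, and a periodic merge drops dead records. This is loosely modelled on the log-plus-index idea and is not the Bitcask or LevelDB implementation; the class and method names are invented for the sketch.

```python
class AppendOnlyStore:
    """Toy append-only key/value store with an in-memory index."""

    def __init__(self):
        self.log = []     # (key, value) records; a None value is a tombstone
        self.index = {}   # key -> position of its latest live record

    def put(self, key, value):
        self.log.append((key, value))        # never overwrite in place
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        self.log.append((key, None))         # record the delete as a tombstone
        self.index.pop(key, None)

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def merge(self):
        """Compaction: rewrite the log keeping only the latest live records."""
        ordered = sorted(self.index.items(), key=lambda item: item[1])
        live = [(key, self.log[pos][1]) for key, pos in ordered]
        self.log = live
        self.index = {key: i for i, (key, _) in enumerate(live)}


store = AppendOnlyStore()
store.put("a", 1)
store.put("a", 2)      # the old record stays in the log until merge
store.put("b", 3)
store.delete("b")
size_before = len(store.log)
store.merge()
size_after = len(store.log)
```

Until `merge` runs, overwritten and deleted records keep occupying space, which is the cost the slide calls out.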
  19. RIAK 1.3 (AKA “new hotness”)

    Active Anti Entropy
    MapReduce Improvements
    IPv6 Support
    Riaknostic included by default
    Riak Control improvements
    Much more
    Full release notes: https://github.com/basho/riak/blob/1.3/RELEASE-NOTES.md
  20. FUTURE WORK* (1.4 and beyond)

    (* all code subject to ship early, late, or not at all)
    Dynamic Ring Size
    Yokozuna
    CRDTs/Data Types
    Riak Object Consistency
    2i Improvements
    Riak Pipe work
    Much more
  21. Multi-tenant cloud storage software for public and private clouds.

    Designed to provide simple, available, distributed cloud storage at any scale.
    S3-API compatible and supports per-tenant reporting for billing and metering use cases. Additional APIs on the way.
    Stores files of arbitrary size. Under the hood stores 1MB chunks alongside a manifest.
    Stateless proxy (CS) does chunking. Riak does distribution, storage, etc.
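The chunks-plus-manifest layout can be sketched as follows. Only the 1MB chunk size comes from the slide; the chunk key scheme, the manifest fields, and the function names are hypothetical and do not reflect the real Riak CS manifest format.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as stated on the slide


def chunk_object(name, data, chunk_size=CHUNK_SIZE):
    """Split a file into fixed-size chunks and describe them in a manifest."""
    chunks = {}
    chunk_keys = []
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        chunk_key = f"{name}:{seq}"                      # hypothetical key scheme
        chunks[chunk_key] = data[offset:offset + chunk_size]
        chunk_keys.append(chunk_key)
    manifest = {                                         # hypothetical fields
        "name": name,
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "chunks": chunk_keys,
    }
    return manifest, chunks


def reassemble(manifest, chunks):
    """Read the chunks back in manifest order to rebuild the file."""
    return b"".join(chunks[key] for key in manifest["chunks"])


payload = bytes(2_500_000)                   # about 2.4MB of zero bytes
manifest, chunks = chunk_object("video.bin", payload)
```

Each chunk stays well under Riak's soft object-size limit, so the storage layer only ever sees small values.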
  22. Extends Riak's capabilities with:

    - multi-datacenter replication
    - SNMP Configuration
    - JMX-Monitoring
    - 24x7 support from Basho Engineers
    One cluster acts as a "source cluster". The source cluster replicates its data to one or more "sink clusters" using either real-time or full sync.
    Data transfer is unidirectional (source -> sink). Bidirectional synchronization can be achieved by configuring a pair of connections between clusters.
  23. RIAK COMMUNITY

    Mailing List - 1300 developers
    IRC - 200+ people every day yelling about software
    GitHub - 1000s of watchers; 200+ contributors to all projects
    Meetups - 10 Countries, 23 Cities, 3700+ Members
    Deployments - 1000s in production.
  24. May 13-14th in New York City

    ricon.io/east.html
    Talks, hacking, parties
    Dedicated to the future of Riak and distributed systems in production
    REGISTER NOW! https://ricon-east-2013.eventbrite.com/?discount=lovevnodes
  25. GETTING STARTED

    Downloads - http://docs.basho.com/riak/latest/downloads/
    Docs - docs.basho.com
    Riak Source Code - github.com/basho/riak
    All Basho Source Code - github.com/basho/
    Riak Mailing List - http://bit.ly/FjChC
    Email me - [email protected]