Riak - An Intro With Windows Azure

This slide deck is the part of the talk centered on the Riak architecture and related material. It doesn't currently include the Azure sample commands or related elements, since those were the live part of the presentation. I'll likely add them in the future.

Adron Hall

March 08, 2013

Transcript

  1. & Thursday, March 7, 13

  2. #WHOIS Adron Hall | @adron | Coder, Messenger, Recon
  6. Distributed, masterless, highly-available key/value store

  7. DESIGN GOALS
    • Horizontal Scalability
    • Fault-Tolerance
    • Low-latency
    • Ops Friendliness
    • Predictability
    • High-Availability
  8. When to use Riak...

  9. RIAK USE CASES
    • Metadata
    • Users/Profiles
    • Object Storage
    • Session Storage
    • Sensor Data
    • Logging Systems
    • Record Systems
    • Notification Systems
  10. IN PRODUCTION AT And 1000s more...

  11. DATA MODEL

  12. {“Key”:“Value”}
    • Values are stored against keys
    • Key/Value + Metadata = Object
    • Fundamental Unit of Replication
    • Any Datatype will work
    • Recorded to disk in binary format
  13. <<BUCKET>>/<<KEY>>
    • Virtual Namespace
    • Bucket + Key = Object Address
    • Buckets have properties
    • Objects in bucket inherit properties
    • No relationships between buckets
  14. DATA ACCESS

  15. INTERFACES
    • HTTP API - Via a little piece of magic called Webmachine. Largely-faithful REST implementation
    • Protocol Buffers API - Thanks, Google! Compact, binary protocol
  16. CLIENT LIBS
    Python, Ruby, PHP, OCaml, Java, Perl, Erlang, Node.js, C/C++, Haskell, Clojure, Scala, Go, Dart, .NET, and more. Supported by either Basho or our community.
  17. RIAK GIVES YOU [FOUR] WAYS TO STORE, RETRIEVE, AND QUERY DATA
  18. CRUD
    // PUT
    PUT /buckets/bucket/keys/key       // User-defined key
    POST /buckets/bucket/keys          // Riak-defined key
    // GET
    GET /buckets/bucket/keys/key
    // DELETE
    DELETE /buckets/bucket/keys/key
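As a rough sketch of the CRUD routes on this slide, the following builds (but does not send) the four requests with Python's standard library. The host `127.0.0.1`, port `8098`, the bucket `users`, and the helper name `riak_request` are assumptions for illustration. A POST to the keys collection without a key asks Riak to generate one.

```python
import urllib.request
from urllib.parse import quote

BASE = "http://127.0.0.1:8098"  # assumed local node; adjust for your cluster


def riak_request(method, bucket, key=None, body=None):
    """Build (not send) a request for /buckets/<bucket>/keys[/<key>]."""
    path = f"/buckets/{quote(bucket, safe='')}/keys"
    if key is not None:
        path += f"/{quote(key, safe='')}"
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


put = riak_request("PUT", "users", "adron", '{"name": "Adron"}')  # user-defined key
post = riak_request("POST", "users", body='{"name": "Anon"}')     # Riak-defined key
get = riak_request("GET", "users", "adron")
delete = riak_request("DELETE", "users", "adron")

for r in (put, post, get, delete):
    print(r.get_method(), r.full_url)
```

Passing any of these to `urllib.request.urlopen` would send it to a running node.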
  19. MapReduce
    • Distributed processing system using Riak Pipe
    • Efficient for targeted queries over known key range
    • Write jobs in Erlang or JS. (Erlang more performant)
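To make the "targeted queries over known key range" point concrete, here is a minimal sketch that builds the JSON body of a MapReduce job over an explicit list of keys. The overall shape (`inputs` plus a `query` list of phases) and the built-in `Riak.reduceSum` reflect my understanding of Riak's HTTP MapReduce interface and should be checked against the docs; the bucket, keys, and the helper name `build_mapreduce_job` are hypothetical.

```python
import json


def build_mapreduce_job(bucket, keys):
    """Return the JSON body for a job that counts the given objects."""
    return json.dumps({
        # Explicit bucket/key pairs: the job only touches known keys.
        "inputs": [[bucket, key] for key in keys],
        "query": [
            # Map phase: emit 1 for every object found.
            {"map": {"language": "javascript",
                     "source": "function(v) { return [1]; }"}},
            # Reduce phase: sum the emitted values.
            {"reduce": {"language": "javascript",
                        "name": "Riak.reduceSum"}},
        ],
    })


job = build_mapreduce_job("logs", ["2013-03-06", "2013-03-07"])
print(job)
```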
  20. Secondary Indexing (2i)
    • riak_object: X-Riak-Index-email_bin = “mark@basho.com”
    • riak_object: X-Riak-Index-value_int = “42”
    • Tag objects with custom metadata on PUT...
    • Exact match and range queries...
    • No multi-index queries yet...
    • Pagination is on its way...
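A small sketch of how the 2i tagging and querying on this slide might look from a client: one helper turns index values into `X-Riak-Index-*` headers for a PUT, the other builds an exact-match or range query path. The helper names and the `users` bucket are hypothetical, and the `/buckets/<bucket>/index/<index>/...` path layout is my understanding of the HTTP API rather than something stated in the deck.

```python
from urllib.parse import quote


def index_headers(indexes):
    """Turn {'email_bin': ..., 'value_int': ...} into X-Riak-Index-* headers."""
    return {f"X-Riak-Index-{name}": str(value) for name, value in indexes.items()}


def index_query_path(bucket, index, start, end=None):
    """Exact match when only start is given; range query when end is given."""
    parts = ["buckets", bucket, "index", index, str(start)]
    if end is not None:
        parts.append(str(end))
    return "/" + "/".join(quote(p, safe="@") for p in parts)


headers = index_headers({"email_bin": "mark@basho.com", "value_int": 42})
exact = index_query_path("users", "email_bin", "mark@basho.com")
ranged = index_query_path("users", "value_int", 40, 50)
print(headers, exact, ranged, sep="\n")
```

The `_bin` and `_int` suffixes come from the slide: the first marks a binary (string) index, the second an integer index.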
  21. Riak Search
    • Store and index documents (JSON, text, XML, etc)
    • Current Riak Search supports a subset of the Solr API
    • Next iteration (Yokozuna; in beta) will implement distributed Solr on Riak. It will be sexy.
    • Looking for beta testers to help harden Yokozuna
  22. ARCHITECTURE
    The scalability and ease of operation goals inform architectural decisions. These come with tradeoffs.
    • Consistent Hashing
    • Virtual Nodes
    • Append-only storage
    • Handoff/Rebalancing
    • Vector Clocks
    • Active Anti-Entropy*
  23. Consistent Hashing
    • Location of data in the Riak ring is determined based on hash of bucket + key.
    • Provides even distribution of storage and query load
    • Trades off advantages gained from locality - e.g. Range queries and aggregates
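The idea of placing data by hashing bucket + key can be sketched as follows. This is a simplification under stated assumptions, not Riak's actual code: it hashes a plain `bucket/key` string with SHA-1 onto a 2^160 ring, assumes 64 equal partitions, and returns the partition whose range contains the hash. Riak's real encoding of the bucket/key pair and its preference-list logic differ.

```python
import hashlib

RING_SIZE = 2 ** 160   # size of the SHA-1 output space
NUM_PARTITIONS = 64    # assumed partition count for this sketch


def partition_for(bucket, key, num_partitions=NUM_PARTITIONS):
    """Return the index of the partition whose hash range contains bucket/key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")          # point on the ring
    return position // (RING_SIZE // num_partitions)  # equal-sized ranges


# Neighbouring keys scatter across the ring, which spreads load evenly
# but is also why range queries lose the benefit of locality.
for key in ("user1", "user2", "user3"):
    print(key, partition_for("users", key))
```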
  24. Consistent Hashing

  25. Virtual Nodes
    • Unit of addressing and concurrency in Riak
    • Each physical host manages many vnodes
    • Partition count / physical machines = vnodes/machine*
    • Decouples physical assets from data distribution. This provides:
      - simplicity in cluster sizing
      - failure isolation
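The vnodes-per-machine arithmetic can be worked through with a tiny helper. It assumes partitions are spread as evenly as possible, which is an idealization: the actual assignment Riak makes may differ, and the function name is hypothetical.

```python
def vnodes_per_machine(partition_count, machines):
    """Spread partitions as evenly as possible across machines."""
    base, extra = divmod(partition_count, machines)
    # The first `extra` machines take one additional vnode each.
    return [base + 1 if i < extra else base for i in range(machines)]


print(vnodes_per_machine(64, 4))  # [16, 16, 16, 16]
print(vnodes_per_machine(64, 5))  # [13, 13, 13, 13, 12]
```

With 64 partitions, growing from 4 to 5 machines drops each machine from 16 vnodes to 12 or 13 without changing the partition count itself.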
  26. Handoff/Rebalancing
    • Mechanisms for data rebalancing
    • When nodes join/leave cluster, handoff and rebalancing manage the data shuffling dynamically
    • Trades off speed of convergence vs. effects on cluster performance - causes disk & network load
  27. Vector Clocks
    • VCs used to rectify object consistency at READ time.
    • Lots of knobs to turn; well-documented
    • Trades off space, speed, and complexity for safety
      - will store all sibling objects until resolved
      - can lead to object size issues
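A generic vector clock comparison, sketched to show where siblings come from. This is the textbook technique, not Riak's implementation; clocks are plain dicts of actor to counter, and the function names are chosen for illustration.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two versions: one supersedes the other, or they conflict."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "siblings"  # concurrent writes: both values are kept until resolved


base = increment({}, "x")      # first write, by actor x
a = increment(base, "x")       # x updates again
b = increment(base, "y")       # y updates the same base concurrently
print(compare(a, base))
print(compare(a, b))
```

The last comparison is the case the slide warns about: neither clock descends from the other, so both values are stored as siblings.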
  28. Append-Only Storage
    • Riak provides a pluggable backend interface. (Write your own; we’ll probably hire you...)
    • Bitcask, LevelDB are most-heavily used. Both are append-only
    • Provides crash safety and speed.
    • Trade off: periodic compaction/merge ops
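The append-only tradeoff can be illustrated with a toy in-memory store: writes only ever append, an index points at the latest record per key, and a periodic merge reclaims the space held by stale records. This is a sketch of the general idea only; the class and method names are hypothetical and it does not reflect the actual file formats of Bitcask or LevelDB.

```python
class AppendOnlyStore:
    def __init__(self):
        self.log = []     # stands in for the on-disk file of (key, value) records
        self.keydir = {}  # key -> position of the latest record in the log

    def put(self, key, value):
        self.log.append((key, value))       # never overwrite in place
        self.keydir[key] = len(self.log) - 1

    def get(self, key):
        return self.log[self.keydir[key]][1]

    def delete(self, key):
        self.log.append((key, None))        # tombstone record
        self.keydir.pop(key, None)

    def merge(self):
        """Compaction: rewrite the log keeping only live, latest records."""
        live = [self.log[pos] for pos in sorted(self.keydir.values())]
        self.log = live
        self.keydir = {key: pos for pos, (key, _) in enumerate(live)}


store = AppendOnlyStore()
store.put("a", 1)
store.put("a", 2)
store.put("b", 3)
store.delete("b")
before = len(store.log)
store.merge()
print(before, len(store.log), store.get("a"))
```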
  29. RIAK 1.3 (AKA “new hotness”)
    • Active Anti Entropy
    • MapReduce Improvements
    • IPv6 Support
    • Riaknostic included by default
    • Riak Control improvements
    • Much more
    Full release notes: https://github.com/basho/riak/blob/1.3/RELEASE-NOTES.md
  30. FUTURE WORK* (1.4 and beyond) (* all code subject to ship early, late, or not at all)
    • Dynamic Ring Size
    • Yokozuna
    • CRDTs/Data Types
    • Riak Object Consistency
    • 2i Improvements
    • Riak Pipe work
    • Much more
  31. Multi-tenant cloud storage software for public and private clouds. Designed to provide simple, available, distributed cloud storage at any scale.
    • S3-API compatible and supports per-tenant reporting for billing and metering use cases. Additional APIs on the way.
    • Stores files of arbitrary size. Under the hood stores 1MB chunks alongside a manifest.
    • Stateless proxy (CS) does chunking. Riak does distribution, storage, etc.
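The "1MB chunks plus a manifest" description can be sketched as below. Only the 1MB chunk size comes from the slide; the chunk key naming, manifest fields, and function names are assumptions made for illustration and are not the real storage layout.

```python
CHUNK_SIZE = 1024 * 1024  # 1MB, per the slide


def chunk_object(name, data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and describe them in a manifest."""
    chunks = {}
    manifest = {"name": name, "size": len(data), "chunk_keys": []}
    for offset in range(0, len(data), chunk_size):
        key = f"{name}:{offset // chunk_size}"   # hypothetical key scheme
        chunks[key] = data[offset:offset + chunk_size]
        manifest["chunk_keys"].append(key)
    return manifest, chunks


def reassemble(manifest, chunks):
    """Rebuild the original bytes by reading chunks in manifest order."""
    return b"".join(chunks[key] for key in manifest["chunk_keys"])


data = bytes(range(256)) * 10240            # 2.5MB of sample bytes
manifest, chunks = chunk_object("video.mp4", data)
print(len(manifest["chunk_keys"]), reassemble(manifest, chunks) == data)
```

In this picture the proxy would do the splitting and each chunk and the manifest would be stored as ordinary key/value objects.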
  32. Extends Riak's capabilities with:
      - multi-datacenter replication
      - SNMP Configuration
      - JMX-Monitoring
      - 24x7 support from Basho Engineers
    One cluster acts as a "source cluster". The source cluster replicates its data to one or more "sink clusters" using either real-time or full sync.
    Data transfer is unidirectional (source -> sink). Bidirectional synchronization can be achieved by configuring a pair of connections between clusters.
  33. RIAK COMMUNITY
    • Mailing List - 1300 developers
    • IRC - 200+ people every day yelling about software
    • GitHub - 1000s of watchers; 200+ contributors to all projects
    • Meetups - 10 Countries, 23 Cities, 3700+ Members & growing fast!
    • Deployments - 1000s in production.
  34. May 13-14th in New York City
    ricon.io/east.html
    Talks, hacking, parties
    Dedicated to the future of Riak and distributed systems in production
    REGISTER NOW! https://ricon-east-2013.eventbrite.com/?discount=lovevnodes
  35. GETTING STARTED
    • Downloads - http://docs.basho.com/riak/latest/downloads/
    • Docs - http://docs.basho.com
    • Riak Source Code - github.com/basho/riak
    • All Basho Source Code - github.com/basho/
    • Riak Mailing List - http://bit.ly/riak-list
    • Email or Tweet me @adron or adron@basho.com
  36. Let’s Talk UI & CLI - Demo Things
  37. #WHOIS Adron Hall | @adron | Coder, Messenger, Recon