Slide 1

Thursday, March 7, 2013

Slide 2

#WHOIS Adron Hall | @adron | Coder, Messenger, Recon

Slide 3

Slide 4

Slide 5

Slide 6

Distributed, masterless, highly-available key/value store

Slide 7

DESIGN GOALS
- Horizontal Scalability
- Fault-Tolerance
- Low-latency
- Ops Friendliness
- Predictability
- High-Availability

Slide 8

When to use Riak...

Slide 9

RIAK USE CASES
- Metadata
- Users/Profiles
- Object Storage
- Session Storage
- Sensor Data
- Logging Systems
- Record Systems
- Notification Systems

Slide 10

IN PRODUCTION AT
[Logos of production users]
And 1000s more...

Slide 11

DATA MODEL

Slide 12

{"Key": "Value"}
• Values are stored against keys
• Key/Value + Metadata = Object
• Fundamental unit of replication
• Any data type will work
• Recorded to disk in binary format
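The "Key/Value + Metadata = Object" idea can be pictured with a minimal Python sketch. It assumes nothing about any real Riak client, and `SketchObject`, `json_object`, and `raw_object` are hypothetical names. Riak treats the value as opaque binary, so structured data is serialized to bytes and the content type travels in the metadata.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SketchObject:
    """Hypothetical model of a Riak object: key + binary value + metadata."""
    key: str
    value: bytes                                  # opaque bytes as far as storage is concerned
    metadata: dict = field(default_factory=dict)  # e.g. the content type


def json_object(key: str, data) -> SketchObject:
    # Any data type works if it can be serialized; here, JSON encoded as UTF-8.
    payload = json.dumps(data).encode("utf-8")
    return SketchObject(key, payload, {"content-type": "application/json"})


def raw_object(key: str, blob: bytes, content_type: str) -> SketchObject:
    # Data that is already binary (an image, a protobuf, ...) is stored as-is.
    return SketchObject(key, blob, {"content-type": content_type})


profile = json_object("adron", {"name": "Adron", "langs": ["erlang", "js"]})
avatar = raw_object("adron.png", b"\x89PNG...", "image/png")
```

The key and metadata are kept together with the value, which is consistent with the object being the unit that gets replicated.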

Slide 13

<bucket>/<key>
• Virtual namespace
• Bucket + Key = Object address
• Buckets have properties
• Objects in a bucket inherit its properties
• No relationships between buckets
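Here is a small Python sketch of buckets as namespaces that carry properties. `n_val` (replica count, default 3) and `allow_mult` (keep siblings, default false) are real Riak bucket properties. The `Bucket` class and the defaults dict are illustrative assumptions and are not read from a cluster. Every object addressed as `users/<key>` would use `users.props()`, and the sketch has no notion of links between buckets.

```python
from dataclasses import dataclass, field

# Illustrative defaults: n_val and allow_mult are real Riak bucket properties,
# but this dict is an assumption, not read from a running cluster.
DEFAULT_BUCKET_PROPS = {"n_val": 3, "allow_mult": False}


@dataclass
class Bucket:
    """Hypothetical bucket: a named namespace carrying properties."""
    name: str
    overrides: dict = field(default_factory=dict)

    def props(self) -> dict:
        # Bucket-level settings override the defaults; objects inherit the result.
        return {**DEFAULT_BUCKET_PROPS, **self.overrides}

    def address(self, key: str) -> str:
        # Bucket + key is the full address of an object.
        return f"{self.name}/{key}"


users = Bucket("users", {"n_val": 5})
sessions = Bucket("sessions")
print(users.address("adron"))  # users/adron
print(users.props())           # {'n_val': 5, 'allow_mult': False}
print(sessions.props())        # {'n_val': 3, 'allow_mult': False}
```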

Slide 14

DATA ACCESS

Slide 15

INTERFACES
HTTP API - Via a little piece of magic called Webmachine
  Largely-faithful REST implementation
Protocol Buffers API - Thanks, Google!
  Compact, binary protocol

Slide 16

CLIENT LIBS
Python, Ruby, PHP, OCaml, Java, Perl, Erlang, Node.js, C/C++, Haskell, Clojure, Scala, Go, Dart, .NET, and more.
Supported by either Basho or our community.

Slide 17

RIAK GIVES YOU [FOUR] WAYS TO STORE, RETRIEVE, AND QUERY DATA

Slide 18

CRUD
// PUT
PUT    /buckets/bucket/keys/key    // User-defined key
POST   /buckets/bucket/keys        // Riak-defined key
// GET
GET    /buckets/bucket/keys/key
// DELETE
DELETE /buckets/bucket/keys/key
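A minimal Python sketch that maps each CRUD operation onto the method and path listed above. `crud_request` is a hypothetical helper that only builds the request line and does not send anything. By default Riak's HTTP interface listens on port 8098, so these paths would be appended to something like `http://localhost:8098`.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Method used for each operation when the client supplies the key.
KEYED_METHODS = {"create": "PUT", "read": "GET", "update": "PUT", "delete": "DELETE"}


def crud_request(op: str, bucket: str, key: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for a CRUD operation on Riak's HTTP API."""
    b = quote(bucket, safe="")
    if key is None:
        if op != "create":
            raise ValueError(f"{op!r} requires a key")
        return "POST", f"/buckets/{b}/keys"  # no key given: Riak generates one
    return KEYED_METHODS[op], f"/buckets/{b}/keys/{quote(key, safe='')}"


print(crud_request("create", "users", "adron"))  # ('PUT', '/buckets/users/keys/adron')
print(crud_request("create", "users"))           # ('POST', '/buckets/users/keys')
print(crud_request("delete", "users", "adron"))  # ('DELETE', '/buckets/users/keys/adron')
```

Keys and bucket names are percent-encoded so that a key such as `a b` cannot break the path.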

Slide 19

MapReduce
- Distributed processing system using Riak Pipe
- Efficient for targeted queries over a known key range
- Write jobs in Erlang or JS (Erlang is more performant)
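Here is a Python sketch of a JavaScript MapReduce job body of the kind Riak's HTTP interface accepts at `POST /mapred`. The inputs are explicit `[bucket, key]` pairs, which is the "targeted query over a known key range" case. Passing a whole bucket name as input is also possible but much more expensive. `Riak.reduceSum` is one of Riak's built-in JavaScript reduce functions. The `sensors` bucket and its `reading` field are hypothetical.

```python
import json


def mapreduce_job(pairs, map_source: str) -> str:
    """Build a JSON body for Riak's HTTP MapReduce endpoint (POST /mapred)."""
    job = {
        # Targeted input: explicit [bucket, key] pairs instead of a full bucket scan.
        "inputs": [[bucket, key] for bucket, key in pairs],
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
            # Built-in JavaScript reduce that sums the mapped numbers.
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }
    return json.dumps(job)


# Hypothetical sensor objects whose JSON value has a numeric "reading" field.
body = mapreduce_job(
    [("sensors", "s1"), ("sensors", "s2")],
    "function(v) { return [JSON.parse(v.values[0].data).reading]; }",
)
```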

Slide 20

Secondary Indexing (2i)
riak_object: X-Riak-Index-email_bin = "[email protected]"
riak_object: X-Riak-Index-value_int = "42"
- Tag objects with custom metadata on PUT...
- Exact match and range queries...
- No multi-index queries yet...
- Pagination is on its way...
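A Python sketch of the two halves of 2i over HTTP. On the write side, an `X-Riak-Index-<field>_<bin|int>` header goes on the PUT. On the read side, a GET against a single index either matches one value or covers a range. The helper names are hypothetical. Each query targets exactly one index, which matches the "no multi-index queries" note above.

```python
from urllib.parse import quote


def index_header(field: str, value):
    """(header name, header value) that tags an object with a 2i entry on PUT."""
    # The suffix selects the index type: _int for integers, _bin for strings.
    suffix = "_int" if isinstance(value, int) else "_bin"
    return f"X-Riak-Index-{field}{suffix}", str(value)


def exact_match_path(bucket: str, index: str, value) -> str:
    """GET path for an exact-match query against one index."""
    return f"/buckets/{quote(bucket, safe='')}/index/{index}/{quote(str(value), safe='')}"


def range_path(bucket: str, index: str, start, end) -> str:
    """GET path for a range query bounded by start and end."""
    return (f"/buckets/{quote(bucket, safe='')}/index/{index}/"
            f"{quote(str(start), safe='')}/{quote(str(end), safe='')}")


print(index_header("email", "someone@example.com"))  # ('X-Riak-Index-email_bin', 'someone@example.com')
print(index_header("value", 42))                      # ('X-Riak-Index-value_int', '42')
print(range_path("users", "value_int", 10, 50))       # /buckets/users/index/value_int/10/50
```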

Slide 21

Riak Search
- Store and index documents (JSON, text, XML, etc.)
- Current Riak Search supports a subset of the Solr API
- The next iteration (Yokozuna, in beta) will implement distributed Solr on Riak. It will be sexy.
- Looking for beta testers to help harden Yokozuna

Slide 22

ARCHITECTURE
The scalability and ease-of-operation goals inform architectural decisions. These come with tradeoffs.
- Consistent Hashing
- Virtual Nodes
- Append-only storage
- Handoff/Rebalancing
- Vector Clocks
- Active Anti-Entropy*

Slide 23

Consistent Hashing
Location of data in the Riak ring is determined by a hash of bucket + key.
- Provides even distribution of storage and query load
- Trades off advantages gained from locality, e.g. range queries and aggregates
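A simplified Python sketch of placement on the ring. Riak uses a 160-bit SHA-1 hash space split into a power-of-two number of equal partitions (64 here). One assumption differs from Riak: the sketch hashes the plain string `"bucket/key"`, whereas Riak encodes the bucket/key pair differently before hashing. The preference list is the owning partition plus the next `n_val - 1` partitions around the ring.

```python
import hashlib

RING_SIZE = 2 ** 160  # size of the SHA-1 hash space


def partition_for(bucket: str, key: str, num_partitions: int = 64) -> int:
    """Index of the ring partition that bucket + key hashes into."""
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions must be a power of two")
    # Assumption: hash the string "bucket/key"; Riak encodes the pair differently.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    point = int.from_bytes(digest, "big")          # position on the ring
    return point // (RING_SIZE // num_partitions)  # equal-sized partitions


def preference_list(bucket: str, key: str, num_partitions: int = 64, n_val: int = 3) -> list:
    """Owning partition plus the next n_val - 1 partitions, wrapping around."""
    first = partition_for(bucket, key, num_partitions)
    return [(first + i) % num_partitions for i in range(n_val)]


# Keys spread roughly evenly, but neighbouring keys land far apart,
# which is why range queries lose locality.
print([partition_for("users", f"user{i}") for i in range(5)])
```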

Slide 24

Consistent Hashing

Slide 25

Virtual Nodes
- Unit of addressing and concurrency in Riak
- Each physical host manages many vnodes
- Partition count / physical machines = vnodes per machine*
- Decouples physical assets from data distribution. This provides:
  - simplicity in cluster sizing
  - failure isolation
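Here is the "partition count / physical machines" arithmetic as a Python sketch. 64 partitions on 5 machines is 12.8 vnodes per machine, so in practice some machines hold 13 and some hold 12. Round-robin assignment is a simplifying assumption here and is not Riak's actual claim algorithm.

```python
from collections import Counter


def claim_round_robin(num_partitions: int, machines: list) -> list:
    """Owner of each partition, assigned round-robin (a simplification)."""
    return [machines[p % len(machines)] for p in range(num_partitions)]


owners = claim_round_robin(64, ["m1", "m2", "m3", "m4", "m5"])
per_machine = Counter(owners)
print(dict(per_machine))  # {'m1': 13, 'm2': 13, 'm3': 13, 'm4': 13, 'm5': 12}

# Failure isolation: losing m3 affects only the partitions m3 owned.
affected = [p for p, m in enumerate(owners) if m == "m3"]
print(len(affected), "of", len(owners))  # 13 of 64
```

Resizing the cluster changes only which machine owns each vnode. The number of partitions and the key-to-partition mapping stay fixed.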

Slide 26

Handoff/Rebalancing
- Mechanisms for data rebalancing
- When nodes join or leave the cluster, handoff and rebalancing manage the data shuffling dynamically
- Trades off speed of convergence vs. effects on cluster performance
  - causes disk & network load
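A Python sketch of why a join moves only part of the data. The new node takes partitions one at a time from the most-loaded nodes until it holds a fair share, and every moved partition is a handoff that costs disk and network. The greedy policy and the `join_node` helper are assumptions for illustration and are not Riak's claim or handoff implementation.

```python
from collections import Counter


def join_node(owners: list, new_node: str):
    """Hand partitions to new_node from the most-loaded nodes until it has a fair share.

    owners[p] is the node owning partition p. Returns (new_owners, moved),
    where moved lists the partitions that would be handed off.
    """
    owners = list(owners)
    nodes = sorted(set(owners) | {new_node})
    target = len(owners) // len(nodes)  # fair share, rounded down
    load = Counter(owners)
    moved = []
    while load[new_node] < target:
        donor = max((n for n in nodes if n != new_node), key=lambda n: load[n])
        p = next(i for i, o in enumerate(owners) if o == donor)
        owners[p] = new_node
        load[donor] -= 1
        load[new_node] += 1
        moved.append(p)
    return owners, moved


before = [f"n{p % 4 + 1}" for p in range(64)]  # 4 nodes, 16 partitions each
after, moved = join_node(before, "n5")
print(len(moved), "of", len(before), "partitions handed off")  # 12 of 64 partitions handed off
```

Moving all 12 partitions at once converges fastest but puts the most load on the cluster. Throttling the handoffs spreads the same work over a longer period, which is the tradeoff described on the slide.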

Slide 27

Vector Clocks
- VCs are used to reconcile object versions at READ time
- Lots of knobs to turn; well-documented
- Trades off space, speed, and complexity for safety
  - will store all sibling objects until resolved
  - can lead to object size issues
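A minimal Python vector-clock sketch. A clock maps an actor to a counter. A version is discarded on read only if another version's clock descends from it, and concurrent versions survive as siblings until someone writes a merged value. The function names and the "actor" labels are hypothetical and do not come from Riak's code.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of clock with actor's counter bumped (a new write)."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def merge(a: dict, b: dict) -> dict:
    """Pointwise maximum: a clock that descends from both a and b."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def resolve_on_read(versions: list) -> list:
    """Keep only versions no other version supersedes; the rest are siblings.

    versions is a list of (clock, value) pairs.
    """
    survivors = []
    for clock, value in versions:
        superseded = any(descends(other, clock) and other != clock for other, _ in versions)
        already_kept = any(kept == clock for kept, _ in survivors)
        if not superseded and not already_kept:
            survivors.append((clock, value))
    return survivors


base = increment({}, "client_a")       # {'client_a': 1}
left = increment(base, "client_a")     # a's later write
right = increment(base, "client_b")    # concurrent write by b
siblings = resolve_on_read([(base, "v0"), (left, "v1"), (right, "v2")])
print([value for _, value in siblings])  # ['v1', 'v2']
```

Until a client writes a value whose clock descends from `merge(left, right)`, both siblings are kept. That is where the space and object-size costs on the slide come from.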

Slide 28

Append-Only Storage
- Riak provides a pluggable backend interface. (Write your own; we'll probably hire you...)
- Bitcask and LevelDB are the most heavily used; both are append-only
- Provides crash safety and speed
- Tradeoff: periodic compaction/merge ops
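A toy Python sketch in the spirit of Bitcask. Writes and deletes only ever append to a log, an in-memory key directory points at each key's latest record, and a merge step rewrites only the live records. It is an in-memory illustration under simplified assumptions. Real Bitcask keeps files on disk with more per-entry bookkeeping, and LevelDB uses a different log-structured design with its own compaction.

```python
class AppendOnlyStore:
    """Toy Bitcask-style store: an append-only log plus an in-memory key directory."""

    TOMBSTONE = object()  # marker record for deletes

    def __init__(self):
        self.log = []      # stands in for the on-disk data file
        self.keydir = {}   # key -> offset of that key's latest record

    def put(self, key, value):
        self.log.append((key, value))  # never overwrite in place
        self.keydir[key] = len(self.log) - 1

    def get(self, key):
        offset = self.keydir.get(key)
        return None if offset is None else self.log[offset][1]

    def delete(self, key):
        self.log.append((key, self.TOMBSTONE))  # deletes are appends too
        self.keydir.pop(key, None)

    def merge(self):
        """Compaction: rewrite only the live records, reclaiming dead space."""
        live = [(k, self.log[off][1]) for k, off in self.keydir.items()]
        self.log, self.keydir = [], {}
        for k, v in live:
            self.put(k, v)


store = AppendOnlyStore()
store.put("a", 1)
store.put("a", 2)   # the old value stays in the log until a merge
store.put("b", 3)
store.delete("b")
print(len(store.log), store.get("a"))  # 4 2
store.merge()
print(len(store.log), store.get("a"))  # 1 2
```

Appends make crash recovery simple, because existing records are never half-overwritten. The cost is the periodic merge.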

Slide 29

RIAK 1.3 (AKA "new hotness")
- Active Anti-Entropy
- MapReduce Improvements
- IPv6 Support
- Riaknostic included by default
- Riak Control improvements
- Much more
Full release notes: https://github.com/basho/riak/blob/1.3/RELEASE-NOTES.md

Slide 30

FUTURE WORK* (1.4 and beyond)
(* all code subject to ship early, late, or not at all)
- Dynamic Ring Size
- Yokozuna
- CRDTs/Data Types
- Riak Object Consistency
- 2i Improvements
- Riak Pipe work
- Much more

Slide 31

Multi-tenant cloud storage software for public and private clouds.
Designed to provide simple, available, distributed cloud storage at any scale.
- S3-API compatible, and supports per-tenant reporting for billing and metering use cases. Additional APIs on the way.
- Stores files of arbitrary size. Under the hood, it stores 1 MB chunks alongside a manifest.
- A stateless proxy (CS) does the chunking; Riak does distribution, storage, etc.
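Here is a Python sketch of the "1 MB chunks alongside a manifest" idea. The proxy splits a file into fixed-size chunks, records their keys in a manifest, and rebuilds the file by reading the chunks in manifest order. The chunk-key scheme and the manifest fields are hypothetical and do not describe Riak CS's real format.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as on the slide


def chunk_file(name: str, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file into fixed-size chunks plus a manifest describing them."""
    chunks = {}
    for i, start in enumerate(range(0, len(data), chunk_size)):
        # Hypothetical chunk key scheme, for illustration only.
        chunks[f"{name}/chunk/{i}"] = data[start:start + chunk_size]
    manifest = {  # hypothetical manifest fields
        "name": name,
        "size": len(data),
        "chunk_size": chunk_size,
        "chunk_keys": list(chunks),
    }
    return manifest, chunks


def reassemble(manifest: dict, chunks: dict) -> bytes:
    """Read chunks back in manifest order and rebuild the file."""
    return b"".join(chunks[k] for k in manifest["chunk_keys"])


data = b"x" * (2 * CHUNK_SIZE + 10)
manifest, chunks = chunk_file("photo.jpg", data)
print(len(manifest["chunk_keys"]))  # 3
```

Each chunk is small enough to be stored and replicated as an ordinary Riak object, so files of arbitrary size can be stored without needing arbitrarily large objects.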

Slide 32

Extends Riak's capabilities with:
- multi-datacenter replication
- SNMP configuration
- JMX monitoring
- 24x7 support from Basho engineers
One cluster acts as a "source cluster". The source cluster replicates its data to one or more "sink clusters" using either real-time or full sync.
Data transfer is unidirectional (source -> sink). Bidirectional synchronization can be achieved by configuring a pair of connections between clusters.

Slide 33

RIAK COMMUNITY
Mailing List - 1300 developers
IRC - 200+ people every day yelling about software
GitHub - 1000s of watchers; 200+ contributors to all projects
Meetups - 10 countries, 23 cities, 3700+ members & growing fast!
Deployments - 1000s in production

Slide 34

May 13-14 in New York City
ricon.io/east.html
Talks, hacking, parties
Dedicated to the future of Riak and distributed systems in production
REGISTER NOW! https://ricon-east-2013.eventbrite.com/?discount=lovevnodes

Slide 35

GETTING STARTED
Downloads - http://docs.basho.com/riak/latest/downloads/
Docs - http://docs.basho.com
Riak Source Code - github.com/basho/riak
All Basho Source Code - github.com/basho/
Riak Mailing List - http://bit.ly/riak-list
Email or Tweet me: @adron or [email protected]

Slide 36

Let’s Talk UI & CLI - Demo Things

Slide 37

#WHOIS Adron Hall | @adron | Coder, Messenger, Recon