Slide 1

basho
Core Concepts: Introduction to Riak
AKQA, 24th July 2013

Slide 2

WHO AM I?
Joel Jacobson
Technical Evangelist, Basho Technologies
@joeljacobson

Slide 3

Distributed computing is HARD.

Slide 4

PROBLEMS?
• Concurrency and latency at scale
• Data consistency
• Uptime/failover
• Multi-tenancy
• SLAs

Slide 5

WHAT IS RIAK?
• Key-value store + extras
• Distributed and horizontally scalable
• Fault-tolerant
• Highly available
• Built for the web

Slide 6

INSPIRED BY AMAZON DYNAMO
• White paper released describing the database system behind Amazon's shopping cart
• Masterless, peer-coordinated replication
• Dynamo-inspired data stores: Riak, Cassandra, Voldemort, etc.
• Consistent hashing - no manual sharding :-)
• Eventually consistent

Slide 7

RIAK KEY-VALUE STORE
• Simple operations: GET, PUT, DELETE (see the example below)
• Value is opaque, with metadata
• Extras, e.g.:
  • Secondary Indexes (2i)
  • MapReduce
  • Full-text search
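
A minimal sketch of the basic key-value operations over Riak's HTTP API, using Python's requests library. The node address (localhost:8098, the default HTTP port), the bucket "users" and the key "joel" are assumptions for illustration.

    import requests

    BASE = "http://localhost:8098/buckets/users/keys"

    # PUT: store an opaque value (here JSON) under a key
    requests.put(BASE + "/joel",
                 data='{"name": "Joel", "role": "evangelist"}',
                 headers={"Content-Type": "application/json"})

    # GET: fetch the value back
    resp = requests.get(BASE + "/joel")
    print(resp.status_code, resp.text)

    # DELETE: remove the key
    requests.delete(BASE + "/joel")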

Slide 8

HORIZONTALLY SCALABLE
• Near-linear scalability
• Query load and data are spread evenly
• Add more nodes and get more:
  • ops/second
  • storage capacity
  • compute power (for MapReduce)

Slide 9

FAULT TOLERANT
• All nodes participate equally - no single point of failure (SPOF)
• All data is replicated
• Clusters self-heal - handoff, Active Anti-Entropy
• The cluster transparently survives:
  • node failure
  • network partitions
• Built on Erlang/OTP (designed for fault tolerance)

Slide 10

HIGHLY AVAILABLE
• Any node can serve client requests
• Fallbacks are used when nodes are down
• Always accepts read and write requests
• Per-request quorums

Slide 11

QUORUMS - N/R/W
• Tunable down to the bucket level (see the example below)
• n_val = 3 by default
• w / r = 2 by default
• w = 1: quicker response time, but a read could be inconsistent in the short term
• w = all: slower response, increased data consistency
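
A minimal sketch of tuning N/R/W over the HTTP API with Python's requests library, assuming a local node on port 8098; the bucket name "carts" and key "user42" are illustrative.

    import requests

    # Set quorum defaults at the bucket level via bucket properties
    requests.put("http://localhost:8098/buckets/carts/props",
                 json={"props": {"n_val": 3, "r": 2, "w": 2}})

    # Override the quorum per request with a query parameter,
    # e.g. a fast read satisfied by a single replica
    resp = requests.get("http://localhost:8098/buckets/carts/keys/user42",
                        params={"r": 1})
    print(resp.status_code)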

Slide 12

CAP THEOREM
• C = Consistency
• A = Availability
• P = Partition Tolerance
• The CAP theorem states that a distributed shared-data system can support at most 2 of these 3 properties
(Diagram: clients talking to three DB nodes across a network/data partition)

Slide 13

THE RING
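
Riak places data on the ring with consistent hashing: each bucket/key pair is hashed into a 160-bit space that is split into a fixed number of partitions claimed by the nodes. The sketch below illustrates the idea only (it is not Riak's code); the partition count of 64 mirrors the default ring_creation_size.

    import hashlib

    RING_SIZE = 2 ** 160          # SHA-1 hash space
    NUM_PARTITIONS = 64           # default ring_creation_size
    PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS

    def partition_for(bucket: bytes, key: bytes) -> int:
        """Hash bucket/key onto the ring and return the owning partition index."""
        point = int.from_bytes(hashlib.sha1(bucket + key).digest(), "big")
        return point // PARTITION_SIZE

    print(partition_for(b"users", b"joel"))   # stable partition index for this key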

Slide 14

REPLICATION
• Replicated to 3 nodes by default (n_val = 3, which is configurable)

Slide 15

DISASTER SCENARIO
• Node fails
• Requests go to a fallback node
• Node comes back
• Handoff - data returned to the recovered node
• Normal operations resume automatically

Slide 16

DISASTER SCENARIO
(Diagram: the same failure and handoff steps as the previous slide, illustrated with hash("user_id") mapped onto the ring)

Slide 17

ACTIVE ANTI-ENTROPY
• Automatically repairs inconsistencies in data
• New in Riak 1.3.0; uses Merkle trees to compare data across partitions and periodically ensure consistency (sketch below)
• Runs as a background process
• Can also be run as a manual process
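
An illustrative sketch of the idea behind AAE (not Riak's implementation): hash a partition's keys into buckets, exchange only the bucket hashes between replicas, and drill into a bucket only when its hash differs, so divergent keys are found without shipping all the data.

    import hashlib
    from collections import defaultdict

    def bucketed_hashes(data, buckets=16):
        """Group keys into hash buckets and hash each bucket's contents."""
        groups = defaultdict(list)
        for key, value in data.items():
            b = int(hashlib.sha1(key.encode()).hexdigest(), 16) % buckets
            groups[b].append((key, hashlib.sha1(value.encode()).hexdigest()))
        bucket_hashes = {b: hashlib.sha1(repr(sorted(items)).encode()).hexdigest()
                         for b, items in groups.items()}
        return bucket_hashes, groups

    def divergent_keys(replica_a, replica_b):
        """Compare bucket hashes first; inspect only buckets that differ."""
        hashes_a, groups_a = bucketed_hashes(replica_a)
        hashes_b, groups_b = bucketed_hashes(replica_b)
        diff = []
        for b in set(hashes_a) | set(hashes_b):
            if hashes_a.get(b) != hashes_b.get(b):
                keys_a, keys_b = dict(groups_a.get(b, [])), dict(groups_b.get(b, []))
                diff += [k for k in set(keys_a) | set(keys_b)
                         if keys_a.get(k) != keys_b.get(k)]
        return diff

    print(divergent_keys({"k1": "v1", "k2": "v2"},
                         {"k1": "v1", "k2": "stale"}))   # -> ['k2']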

Slide 18

CONFLICT RESOLUTION
• Network partitions and concurrent actors modifying the same data cause data divergence
• Riak provides two ways to manage this, set at the bucket level (example below):
  • Last Write Wins - an approach suited to some use cases
  • Vector Clocks - retain "sibling" copies of the data for merging
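
A minimal sketch of choosing a strategy per bucket via bucket properties over the HTTP API, assuming a local node on port 8098; the bucket names are illustrative.

    import requests

    # Keep siblings and let the application merge them (vector-clock based)
    requests.put("http://localhost:8098/buckets/sessions/props",
                 json={"props": {"allow_mult": True}})

    # Or let the most recent write win (no siblings are kept)
    requests.put("http://localhost:8098/buckets/page_views/props",
                 json={"props": {"last_write_wins": True, "allow_mult": False}})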

Slide 19

VECTOR CLOCKS
• Every node has an ID
• Send the last-seen vector clock with every "put" request
• Can be viewed as a 'commit history', e.g. as in Git
• Lets you decide how to resolve conflicts (sketch below)
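
An illustrative sketch of vector clock comparison (not Riak's internal encoding): each actor ID maps to a counter, and one clock descends from another if it is at least as new for every actor; clocks where neither descends are concurrent and yield siblings.

    def descends(a: dict, b: dict) -> bool:
        """True if clock `a` has seen everything clock `b` has."""
        return all(a.get(actor, 0) >= count for actor, count in b.items())

    v1 = {"node_a": 2, "node_b": 1}
    v2 = {"node_a": 2, "node_b": 2}   # node_b updated after seeing v1
    v3 = {"node_a": 3, "node_b": 1}   # node_a updated concurrently

    print(descends(v2, v1))                    # True: v2 supersedes v1
    print(descends(v2, v3), descends(v3, v2))  # False, False: concurrent -> siblings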

Slide 20

SIBLING CREATION
(Diagram: an object updated with diverging vector clocks [{a,3}] and [{a,2},{b,1}], producing siblings on the ring)
• Siblings can be created by (resolution example below):
  • Simultaneous writes (based on the same object version)
  • Network partitions
  • Writes to an existing key without submitting a vector clock
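
A minimal sketch of reading and resolving siblings over the HTTP API, assuming allow_mult=true on the bucket and a local node on port 8098; the bucket, key and merge logic are illustrative, and parsing of the multipart sibling body is elided (a client library normally does this for you).

    import requests

    url = "http://localhost:8098/buckets/sessions/keys/user42"

    resp = requests.get(url, headers={"Accept": "multipart/mixed"})
    if resp.status_code == 300:                  # siblings exist
        vclock = resp.headers["X-Riak-Vclock"]

        # ...parse resp.content (multipart/mixed) and merge the sibling
        # values with application logic; here we just pretend we did...
        merged = b'{"cart": ["item-1", "item-2"]}'

        # Writing back with the vector clock from the read resolves the siblings
        requests.put(url, data=merged,
                     headers={"X-Riak-Vclock": vclock,
                              "Content-Type": "application/json"})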

Slide 21

STORAGE BACKENDS
• Bitcask
• LevelDB
• Memory
• Multi

Slide 22

BITCASK
• A fast, append-only key-value store (sketch below)
• In-memory key lookup table (keydir); data on disk
• Closed files are immutable
• Merging cleans up old data
• Developed by Basho Technologies
• Suitable for bounded data (the keydir must fit in memory), e.g. reference data
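
An illustrative sketch of the Bitcask model (not Bitcask itself): values are appended to a log file on disk while an in-memory table maps each key to the offset and size of its latest value. The file path and demo keys are placeholders.

    import os

    class TinyBitcask:
        def __init__(self, path="data.log"):
            self.path = path
            self.keydir = {}                         # key -> (offset, size)
            open(path, "ab").close()                 # make sure the log exists

        def put(self, key, value):
            offset = os.path.getsize(self.path)
            with open(self.path, "ab") as f:         # append-only write
                f.write(value)
            self.keydir[key] = (offset, len(value))  # newest entry wins

        def get(self, key):
            offset, size = self.keydir[key]
            with open(self.path, "rb") as f:
                f.seek(offset)
                return f.read(size)

    db = TinyBitcask()
    db.put(b"country:SE", b"Sweden")
    db.put(b"country:SE", b"Sweden (updated)")       # old bytes linger until a merge
    print(db.get(b"country:SE"))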

Slide 23

LEVELDB
• Key-value storage engine developed by Google
• Append-only; suited to very large data sets
• Multiple levels of SSTable-like data structures
• Allows for more advanced querying, e.g. Secondary Indexes (2i) - example below
• Includes compression (the Snappy algorithm)
• Suitable for unbounded data or advanced querying
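
A minimal sketch of Secondary Indexes (2i) over the HTTP API, which need a 2i-capable backend such as LevelDB; it assumes a local node on port 8098, and the bucket, key and index names are illustrative.

    import requests

    # Store an object and tag it with a secondary index entry via a header
    requests.put("http://localhost:8098/buckets/tweets/keys/t1001",
                 data='{"text": "hello riak"}',
                 headers={"Content-Type": "application/json",
                          "x-riak-index-username_bin": "joeljacobson"})

    # Query the index: which keys were written by this user?
    resp = requests.get(
        "http://localhost:8098/buckets/tweets/index/username_bin/joeljacobson")
    print(resp.json())        # e.g. {"keys": ["t1001"]}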

Slide 24

MEMORY
• Data is never persisted to disk
• Typically used for "test" databases (unit tests, etc.)
• Definable memory limits per vnode
• Configurable object expiry
• Useful for highly transient data

Slide 25

MULTI
• Configure multiple storage engines for different types of data
• Configure the "default" storage engine
• Choose the storage engine on a per-bucket basis (example below)
• No reason not to use it
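
A minimal sketch of pointing a bucket at one of the engines defined under the multi backend, via the "backend" bucket property over the HTTP API. It assumes a local node on port 8098; the name "memory_mult" is a placeholder for whatever backend name the node's multi_backend configuration defines.

    import requests

    requests.put("http://localhost:8098/buckets/sessions/props",
                 json={"props": {"backend": "memory_mult"}})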

Slide 26

CLIENT APIS
• Riak supports two main client interfaces:
  • REST-based HTTP interface
    • Easy to use from the command line and simple scripts
    • Useful with an intermediate caching layer, e.g. Varnish
  • Protocol Buffers (example below)
    • Optimized binary encoding standard developed by Google
    • More performant than the HTTP interface
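
A minimal sketch using the official Python client over Protocol Buffers (pip install riak); constructor options vary between client versions, so treat the connection settings as an example for a local node on the default PB port 8087, with an illustrative bucket and key.

    import riak

    client = riak.RiakClient(protocol="pbc", host="127.0.0.1", pb_port=8087)

    bucket = client.bucket("users")
    obj = bucket.new("joel", data={"name": "Joel", "role": "evangelist"})
    obj.store()

    print(bucket.get("joel").data)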

Slide 27

CLIENT LIBRARIES
• Client libraries supported by Basho (listed on the slide)
• Community-supported languages and frameworks:
  • C/C++, Clojure, Common Lisp, Dart, Django, Go, Grails, Griffon, Groovy, Erlang, Haskell, Java, .NET, Node.js, OCaml, Perl, PHP, Play, Python, Racket, Ruby, Scala, Smalltalk

Slide 28

• Uses Riak as the datastore for all back-end systems supporting Angry Birds
• Game-state storage, ID/login, payments, push notifications, analytics, advertisements
• 9 clusters in use with over 100 nodes
• 263 million monthly active users

Slide 29

• Spine2 project - storing patient data (80 million+)
• 500 complex messages per second
• 20,000 integrated endpoints
• 0 data loss
• 99.9% availability SLA

Slide 30

• Push-to-talk application
• Billions of requests daily
• > 50 dedicated servers
• Everything stored in Riak
• https://github.com/mranney/node_riak

Slide 31

MULTI-DATACENTER REPLICATION (MDC)
• Allows data to be replicated between clusters in different data centers; can handle larger latencies
• Two synchronization modes that can be used together: real-time and full sync
• Set up as uni-directional or bi-directional replication
• Can be used for global load balancing, business continuity and backups

Slide 32

RIAK CS
• Built on top of Riak and supports MDC
• S3-compatible object storage (example below)
• Supports multi-tenancy
• Per-tenant usage data and statistics on network I/O
• Supports objects of arbitrary content type, up to 5 TB each
• Often used to build private cloud storage
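
Because Riak CS speaks the S3 protocol, existing S3 tooling can talk to it. Below is a sketch using the classic boto S3 API; the host, port and credentials are placeholders, and the exact connection settings (root host, SSL, calling format) depend on how the CS cluster is configured.

    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    # Placeholder endpoint and credentials for a Riak CS installation
    conn = S3Connection(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        host="riak-cs.example.com", port=8080,
                        is_secure=False,
                        calling_format=OrdinaryCallingFormat())

    bucket = conn.create_bucket("demo")
    key = bucket.new_key("hello.txt")
    key.set_contents_from_string("hello from Riak CS")
    print(key.get_contents_as_string())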

Slide 33

PLAY AROUND WITH RIAK?
• https://github.com/joeljacobson/riak-dev-cluster
• https://github.com/joeljacobson/vagrant-riak-cluster

Slide 34

THANK YOU
[email protected]
basho