
Riak Overview and Intro to 1.4

Originally delivered at the Baltimore Riak Meetup on 7/18/2013, this talk is an introduction to Riak followed by some details on the new 1.4 release.

Mark Phillips

July 22, 2013

Transcript

  1. About Basho About Riak Riak Data Access, APIs, and Languages

    Querying Riak 1.4 Selected Use Cases Getting Started and becoming a Riak Fanboy ROUGH AGENDA Monday, July 22, 13
  2. Founded late 2007 by group of ex-Akamai, Mitre, Apple

    130 employees; >60% dev; distributed company. Sponsors of Riak, the Apache 2.0-licensed project. Basho sells Riak Enterprise (Riak / Riak CS + Multi DC Repl). We generate recurring revenue and are hiring* :)
  3. Written by Basho to satisfy internal use case

    Apache 2.0-licensed. First OSS release August 2009; 1.0 in Sept 2011. Mostly written in Erlang with some C/C++. Dynamo-inspired.
  4. RIAK: Distributed, masterless, highly-available key/value store

    Any node serves requests. Deployed as cluster of nodes (>=5). Automatic failover. Durable. No SPOF. Built-in replication (n=3). Dynamic data repartitioning.
  5. RIAK DESIGN GOALS

    High availability. Low latency (and durable!). Horizontal scalability. Fault tolerance. Ops-friendly. Predictability.
  6. {“KEY” : “VALUE”}

    Values are stored against keys. Key/Value + metadata = object. Fundamental unit of replication. Any data type will work. Encoded as binaries on disk. Soft limit of ~4MB on object size. Riak CS for larger values.
  7. <<BUCKET>>/<<KEY>>

    Virtual namespace. Bucket + Key = object address. Buckets have properties. All objects in buckets inherit properties. No relationships between buckets.
  8. INTERFACES

    HTTP API - Via a little piece of magic called Webmachine. Largely-faithful REST implementation. Protocol Buffers API - Thanks, Google! Compact, binary protocol.
  9. CLIENT LIBS

    Python, Ruby, PHP, OCaml, Java, Perl, Erlang, Node.js, C/C++, Haskell, Clojure, Scala, Go, Dart, .NET, and more. Supported by either Basho or our community.
  10. CRUD

    // PUT
    PUT /buckets/bucket/keys/key       // User-defined key
    POST /buckets/bucket/keys          // Riak-defined key
    // GET
    GET /buckets/bucket/keys/key
    // DELETE
    DELETE /buckets/bucket/keys/key
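
The CRUD routes on this slide can be sketched as a few request builders in Python. This is a minimal illustration, not an official client: the helper names (`object_url`, `store`, `fetch`, `delete`) are hypothetical, the base URL assumes a local node on Riak's default HTTP port 8098, and the sketch assumes that a POST to the bucket's key collection (no key in the path) is how Riak assigns a key. Nothing is sent over the network until a request is passed to `urllib.request.urlopen`.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local node listening on Riak's default HTTP port.
BASE_URL = "http://127.0.0.1:8098"


def object_url(bucket, key=None):
    """Return the URL of a bucket's key collection, or of one object."""
    url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys"
    return url if key is None else f"{url}/{quote(key, safe='')}"


def store(bucket, value, key=None, content_type="text/plain"):
    """PUT under a user-defined key; POST with no key so Riak picks one."""
    method = "POST" if key is None else "PUT"
    return Request(
        object_url(bucket, key),
        data=value.encode("utf-8"),
        headers={"Content-Type": content_type},
        method=method,
    )


def fetch(bucket, key):
    """Build a GET request for one object."""
    return Request(object_url(bucket, key), method="GET")


def delete(bucket, key):
    """Build a DELETE request for one object."""
    return Request(object_url(bucket, key), method="DELETE")
```

To run a request against a live node, pass it to `urllib.request.urlopen`, for example `urlopen(store("users", "hello", key="mark"))`.
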
  11. MapReduce

    Distributed processing system using Riak Pipe. Efficient for targeted queries over known key range. Write jobs in Erlang.
  12. Riak Search

    Store and index documents (JSON, text, XML, etc). Current Riak Search supports subset of Solr API. Next iteration (Yokozuna; in beta) will implement distributed Solr on Riak. It will be sexy. Looking for beta testers.
  13. Secondary Indexing (2i)

    riak_object with X-Riak-Index-email_bin: “[email protected]”; riak_object with X-Riak-Index-value_int: “42”. Tag objects with custom metadata on PUT. Exact match and range queries. No multi-index queries yet. Pagination and index-term return added in 1.4.
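
A rough Python sketch of how tagging and querying might look over HTTP follows. The `X-Riak-Index-*` header names come from the slide; the helper names, the bucket and key, and the `user@example.com` value are hypothetical. The `/buckets/<bucket>/index/<field>/...` path and the `max_results` and `return_terms` query parameters reflect my understanding of the 1.4 HTTP API, so check them against the docs before relying on them. The functions only build requests and URLs; nothing is sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a local node listening on Riak's default HTTP port.
BASE_URL = "http://127.0.0.1:8098"


def tagged_put(bucket, key, value, indexes):
    """Build a PUT that tags the object with secondary-index entries.

    `indexes` maps index fields (e.g. "email_bin") to their values.
    """
    headers = {"Content-Type": "text/plain"}
    for field, term in indexes.items():
        headers[f"X-Riak-Index-{field}"] = str(term)
    url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"
    return Request(url, data=value.encode("utf-8"), headers=headers, method="PUT")


def index_query_url(bucket, field, start, end=None,
                    max_results=None, return_terms=False):
    """Build an exact-match URL (start only) or a range URL (start and end)."""
    parts = [bucket, "index", field, str(start)]
    if end is not None:
        parts.append(str(end))
    url = f"{BASE_URL}/buckets/" + "/".join(quote(p, safe="") for p in parts)
    params = {}
    if max_results is not None:
        params["max_results"] = max_results    # pagination (1.4)
    if return_terms:
        params["return_terms"] = "true"        # index-term return (1.4)
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `index_query_url("users", "value_int", 1, 42, max_results=10, return_terms=True)` builds a paginated range query that also asks for the matched index values.
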
  14. WHEN TO USE RIAK

    When you have:
    - critical data
    - that should always be available
    - and can be modeled as keys and values*

    (* Hint: at scale, almost everything looks like a k/v store. Don’t be afraid to denormalize.)
  15. RIAK USE CASES

    Metadata, Users/Profiles, Object Storage, Sessions, Sensor Data, Logging Systems, Record Systems, Notification Systems.
  16. 2i Enhancements

    - Pagination and streaming results now possible
    - Results now sorted by index values and then keys
    - Matched index value is returned on ranges upon request
  17. Riak Control Enhancements

    - Riak Control is the Basho-supported, OSS GUI
    - Staged clustering changes from 1.2 now in 1.4 Control
    - Standalone Node Management added for single-node ops
  18. Client API Enhancements

    - Client-specified timeouts added
    - Protocol Buffers supports all bucket props
    - Streaming list-buckets
    - PB interface now binds to multiple interfaces and ports
  19. Data Types - Counters

    - PN Counter is now available; goes up and down :)
    - Accessible via newly-added PB and HTTP endpoints
    - A type of CRDT (first of many)
    - Like buttons, upvotes, etc.
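
To make "goes up and down" concrete, here is a textbook PN-counter in Python. It illustrates the general CRDT idea only and is not Riak's implementation: the class, its method names, and the actor IDs are all hypothetical. Each actor keeps its own increment and decrement totals, and merging replicas takes the per-actor maximum, so merges can happen in any order and any number of times.

```python
class PNCounter:
    """Textbook PN-counter: one grow-only tally for increments (P)
    and one for decrements (N), each tracked per actor."""

    def __init__(self):
        self.p = {}  # actor -> total increments made by that actor
        self.n = {}  # actor -> total decrements made by that actor

    def increment(self, actor, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.p[actor] = self.p.get(actor, 0) + amount

    def decrement(self, actor, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.n[actor] = self.n.get(actor, 0) + amount

    def value(self):
        """Current count: all increments minus all decrements."""
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Return a new counter holding the per-actor maximum of both
        replicas; commutative, associative, and idempotent."""
        merged = PNCounter()
        for actor in self.p.keys() | other.p.keys():
            merged.p[actor] = max(self.p.get(actor, 0), other.p.get(actor, 0))
        for actor in self.n.keys() | other.n.keys():
            merged.n[actor] = max(self.n.get(actor, 0), other.n.get(actor, 0))
        return merged
```

Two replicas of a like button could each accept writes independently and later call `merge` in either direction to arrive at the same total.
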
  20. Object Storage Compactness

    - New binary format
    - Reduces storage overhead (especially for small objects)
    - Default in 1.4; must enable if upgrading
  21. Additional 1.4 Hotness

    - Improvements to ‘riak-admin transfers’
    - Lager upgraded to 2.0
    - ‘riak attach’ modified to use ‘-remsh’
    - More than 170 bugs and issues resolved
  22. VOXER

    Using Riak for all operational storage and serving of data. Super-useful communication platform for people and businesses.
  23. INITIAL STATS

    11 Riak nodes. ~500GB dataset. ~20k peak concurrent users. ~4MM daily requests.
  24. GROWTH STATS*

    100 nodes. ~1TB data incoming/day. 400k concurrent users. 2 billion requests/day. Grew from 11 to 80 nodes in ~30 days.
  25. Moved from Cassandra to Riak

    100s Nodes, several clusters. Add storage, serving. Custom C backend. Billions requests/day. https://vimeo.com/53480727 Impression counting. Using Riak Enterprise for MDC.
  26. S3-API compatible and supports per-tenant reporting for billing and metering use cases.

    Additional APIs on the way. Multi-tenant cloud storage software for public and private clouds. Designed to provide simple, available, distributed cloud storage at any scale. Stores files of arbitrary size. Under the hood stores 1MB chunks alongside a manifest. Stateless proxy (CS) does chunking. Riak does distribution, storage, etc.
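
The "1MB chunks alongside a manifest" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Riak CS actually lays out its data: the function names, the chunk-key scheme (`name:index`), and the manifest fields are all hypothetical. Only the 1MB chunk size comes from the slide.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1MB, the chunk size given on the slide


def chunk_with_manifest(name, data, chunk_size=CHUNK_SIZE):
    """Split `data` into fixed-size chunks and describe them in a manifest.

    Returns (manifest, chunks), where `chunks` maps chunk keys to bytes.
    The key scheme and manifest fields are made up for illustration.
    """
    chunks = {}
    order = []
    for offset in range(0, len(data), chunk_size):
        chunk_key = f"{name}:{len(order)}"
        chunks[chunk_key] = data[offset:offset + chunk_size]
        order.append(chunk_key)
    manifest = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": order,  # ordered list of chunk keys
    }
    return manifest, chunks


def reassemble(manifest, chunks):
    """Rebuild the original bytes by reading chunks in manifest order."""
    data = b"".join(chunks[key] for key in manifest["chunks"])
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch while reassembling")
    return data
```

In this picture the proxy would write each chunk and the manifest as ordinary key/value objects, which keeps every stored value well under the ~4MB soft limit mentioned earlier in the talk.
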
  27. Data transfer is unidirectional (source -> sink). Bidirectional synchronization can be achieved by configuring a pair of connections between clusters.

    Extends Riak's capabilities with:
    - multi-datacenter replication
    - SNMP Configuration
    - JMX-Monitoring
    - 24x7 support from Basho Engineers

    One cluster acts as a "source cluster". The source cluster replicates its data to one or more "sink clusters" using either real-time or full sync.
  28. MULTI DC REPL

    Cluster-to-cluster replication over N data centers. “Source” cluster talks to one or more “sink” clusters. Hot failover between source and sink. Two forms of replication:
    - Full sync - periodic exchanges of deltas (via merkle)
    - Real time - bi directional repl between more than one
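
The full-sync idea of exchanging only deltas via hash trees can be illustrated with a deliberately small, two-level sketch in Python. It is not Riak's replication protocol: the segment count, the hashing scheme, and the function names are assumptions made for illustration. Keys are hashed into segments, each segment gets a hash, and a root hash covers all segments; clusters whose roots match need no further work, and otherwise only keys in differing segments are compared.

```python
import hashlib


def _h(data):
    """Hex SHA-256 digest of a bytes value."""
    return hashlib.sha256(data).hexdigest()


def _segment(key, segments):
    """Map a key to a segment index in a stable way."""
    return int(_h(key.encode("utf-8")), 16) % segments


def build_tree(store, segments=16):
    """Return (root_hash, segment_hashes) for a dict of str -> bytes."""
    buckets = [[] for _ in range(segments)]
    for key in sorted(store):
        buckets[_segment(key, segments)].append(f"{key}:{_h(store[key])}")
    leaves = [_h("|".join(entries).encode("utf-8")) for entries in buckets]
    root = _h("".join(leaves).encode("utf-8"))
    return root, leaves


def keys_to_sync(source, sink, segments=16):
    """Keys the source must send so the sink matches it.

    Keys that exist only on the sink are ignored in this sketch.
    """
    src_root, src_leaves = build_tree(source, segments)
    snk_root, snk_leaves = build_tree(sink, segments)
    if src_root == snk_root:
        return set()  # identical contents: nothing to exchange
    differing = {i for i in range(segments) if src_leaves[i] != snk_leaves[i]}
    return {
        key for key, value in source.items()
        if _segment(key, segments) in differing and sink.get(key) != value
    }
```

A real Merkle tree has more levels so that large keyspaces can be narrowed down in a few round trips, but the principle of comparing hashes before comparing data is the same.
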
  29. RIAK COMMUNITY

    Mailing List - 1500 developers
    IRC - 200+ people every day yelling about software
    GitHub - 1000s of watchers; 300+ contributors to all projects
    Meetups - 10 Countries, 23 Cities, 3700+ Members
    Deployments - 1000s in production
  30. GETTING STARTED

    Docs - docs.basho.com
    Riak Source Code - github.com/basho/riak
    All Basho Source Code - github.com/basho/
    Riak Mailing List - http://bit.ly/FjChC
    Email me - [email protected]
    Downloads - http://docs.basho.com/riak/latest/downloads/
  31. October 29-30 in San Francisco

    Talks, hacking, parties. Dedicated to the future of Riak and distributed systems in production. REGISTER NOW! http://ricon-west-2013.eventbrite.com/ ricon.io/west.html