
Relational to NoSQL Riak techtalk


Silicon Valley Riak meetup talk presentation

Pavan Venkatesh

February 13, 2013

Transcript

  1. AGENDA
    • Relational database architecture: issues & pain points
    • Riak overview and how it addresses these pain points
    • Riak architecture
    • Use cases
    • Riak EDS & CS
    • Q & A
  2. About Basho: Our Mission is to Be The Leader in Distributed Systems
    • Founded January 2008
    • More than 100 employees
    • Headquartered in Cambridge, with regional offices in San Francisco, Washington DC, London and Tokyo
    • Makers of Riak, a popular distributed key-value store
    • Thousands of users worldwide, including over 20% of the Fortune 50
    • Strategic partners include Citrix, IDC Frontier, Yahoo! Japan, and Microsoft
  3. Key-Value Data Model
    • Keys are grouped into buckets
    • All data (objects) are referenced by keys
    • An object is composed of metadata and a value
    [Diagram: object/key operations on a bucket holding KEY/VALUE pairs]
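A minimal sketch of the data model on this slide, assuming a plain in-memory dictionary rather than Riak's client API. The names `StoredObject` and `KVStore` are hypothetical and exist only for illustration: buckets group keys, keys reference objects, and each object carries metadata plus a value.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object is a value plus metadata (hypothetical illustration)."""
    value: bytes
    metadata: dict = field(default_factory=dict)


class KVStore:
    """Toy in-memory store: buckets group keys, keys reference objects."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: StoredObject}

    def put(self, bucket, key, value, **metadata):
        self._buckets.setdefault(bucket, {})[key] = StoredObject(value, metadata)

    def get(self, bucket, key):
        return self._buckets[bucket][key]

    def delete(self, bucket, key):
        del self._buckets[bucket][key]

    def keys(self, bucket):
        return sorted(self._buckets.get(bucket, {}))


store = KVStore()
store.put("product", "iphone", b'{"price": 199}', content_type="application/json")
store.put("product", "ipad", b'{"price": 499}', content_type="application/json")
```

Every operation needs both the bucket and the key, which is why access by anything other than the key calls for the search, index and MapReduce features covered later in the deck.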
  4. Master-Slave Architecture
    [Diagram: one master shipping logs to three slaves]
    • Single point of failure!
    • No write scalability!
  5. Sharding Architecture
    [Diagram: clients writing to Shard 1 through Shard 6, each shard with slaves; shards provide write scalability, slaves provide read scalability]
    • Cannot dynamically scale!
    • Hard coded and high maintenance!
  6. Masterless & Highly Available
    • Any node can serve client requests
    • Fallbacks are used when nodes are down
    • Always accepts read and write requests
    • Per-request quorums
  7. Consistent Hashing & The Ring
    • 160-bit integer keyspace
    • Divided into a fixed number of evenly-sized partitions
    • Partitions are claimed by nodes in the cluster
    • Replicas go to the N partitions following the key
    [Diagram: ring of 32 partitions with N=3, claimed by node 0 through node 3; ring positions marked 0, 2^160/4 and 2^160/2; hash(“product/iphone”) mapped onto the ring]
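The placement rule on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not Riak's implementation: the key is hashed as the string `bucket/key` with SHA-1 (which yields 160 bits), and partitions are assumed to be claimed round-robin by four nodes. `ring_position`, `owner` and `preference_list` are hypothetical names.

```python
import hashlib

RING_TOP = 2 ** 160          # size of the 160-bit keyspace
NUM_PARTITIONS = 32          # fixed partition count used on the slide
N_VAL = 3                    # replicas per object
NODES = ["node 0", "node 1", "node 2", "node 3"]
PARTITION_SIZE = RING_TOP // NUM_PARTITIONS


def ring_position(bucket, key):
    """Hash bucket/key to an integer in [0, 2**160) using SHA-1."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big")


def owner(partition):
    """Assumption: partitions are handed out to nodes round-robin."""
    return NODES[partition % len(NODES)]


def preference_list(bucket, key, n=N_VAL):
    """Return the n partitions following the key, with their owning nodes."""
    first = ring_position(bucket, key) // PARTITION_SIZE
    partitions = [(first + 1 + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner(p)) for p in partitions]


replicas = preference_list("product", "iphone")
```

Because the partition count is fixed and adjacent partitions belong to different nodes in this sketch, the three replicas of any key land on three distinct nodes.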
  8. Failure Scenario
    • Node fails
    • Requests go to fallback nodes
    [Diagram: ring with hash(“product/iphone”) and node 0 through node 3; the failed node's partitions are marked X]
  9. Hinted Handoff
    • Node comes back
    • “Handoff”: data returns to the recovered node
    • Normal operations resume
    [Diagram: ring with hash(“product/iphone”) and node 0 through node 3, all nodes available]
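The failure and handoff sequence from the last two slides can be sketched as a toy simulation. This is an assumption-laden illustration, not Riak's mechanism: nodes are plain dictionaries, the fallback is simply the first live node that is not already a primary for the key, and the `Cluster` class and its methods are hypothetical names.

```python
class Cluster:
    """Toy cluster: fallbacks keep 'hints' naming the node the data is meant for."""

    def __init__(self, names):
        self.names = list(names)
        self.data = {n: {} for n in names}    # node -> {key: value}
        self.hints = {n: {} for n in names}   # fallback -> {(owner, key): value}
        self.down = set()

    def _fallback_for(self, primaries):
        # Pick the first live node that is not already a primary for this key.
        for name in self.names:
            if name not in self.down and name not in primaries:
                return name
        raise RuntimeError("no live fallback node available")

    def put(self, key, value, primaries):
        for owner in primaries:
            if owner not in self.down:
                self.data[owner][key] = value
            else:
                fallback = self._fallback_for(primaries)
                self.hints[fallback][(owner, key)] = value

    def recover(self, node):
        """Node comes back: fallbacks hand its data over, then drop the hints."""
        self.down.discard(node)
        for hinted in self.hints.values():
            for owner, key in [h for h in hinted if h[0] == node]:
                self.data[node][key] = hinted.pop((owner, key))


cluster = Cluster(["node 0", "node 1", "node 2", "node 3"])
cluster.down.add("node 1")                      # node fails
cluster.put("product/iphone", "v1", primaries=["node 0", "node 1", "node 2"])
held_by_fallback = dict(cluster.hints["node 3"])  # write was accepted by a fallback
cluster.recover("node 1")                       # handoff returns the data
```

The write succeeds while `node 1` is down because `node 3` stands in for it, and the data moves to its intended owner once that node returns.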
  10. Riak’s Core Capability: Fault Tolerant
    • All nodes participate equally (no SPOF)
    • All data is replicated (n=3 by default)
    • Cluster transparently survives node failure & network partition
  11. Vector Clocks
    • Riak uses vector clocks for versioning
    • Clients resolve conflicts or choose a last-write-wins policy (settable per-bucket)
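A minimal sketch of how vector clocks tell an ordinary update apart from a conflict. It shows the general technique only and says nothing about Riak's internal encoding; clocks are assumed to be dictionaries mapping an actor name to a counter, and `increment`, `descends` and `compare` are hypothetical helper names.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two versions of an object by their vector clocks."""
    a_over_b, b_over_a = descends(a, b), descends(b, a)
    if a_over_b and b_over_a:
        return "equal"
    if a_over_b:
        return "a is newer"
    if b_over_a:
        return "b is newer"
    return "conflict"  # concurrent writes: a client or a policy must resolve


base = increment({}, "client_x")          # first write
edit_y = increment(base, "client_y")      # client_y updates the base version
edit_z = increment(base, "client_z")      # client_z updates the same base version
```

Here `compare(edit_y, base)` reports a plain update, while `compare(edit_y, edit_z)` reports a conflict, which is the case the slide leaves to the client or to a last-write-wins policy.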
  12. Two APIs
    • HTTP (just like the web)
    • Protocol Buffers (thank you, Google)
    Client Libraries
    • Ruby, Node.js, Java, Python, Perl, OCaml, Erlang, PHP, C, Squeak, Smalltalk, Pharo, Clojure, Scala, Haskell, Lisp, Go, .NET, Play, and more (supported by either Basho or the community)
  13. Riak Backend
    • Riak has a pluggable backend architecture
    • Bitcask and LevelDB are the backends used most in production, depending on the use case
    • All writes are appends to a file
    • This provides crash safety and fast writes
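The "all writes are appends" idea can be sketched as a tiny log-structured store. This is a simplified illustration, not Bitcask or LevelDB: it assumes a single data file with length-prefixed records and an in-memory index, and it leaves out checksums, compaction and deletes. `AppendOnlyStore` is a hypothetical name.

```python
import os
import struct
import tempfile


class AppendOnlyStore:
    """Toy log-structured store: records are only ever appended to one file."""

    HEADER = struct.Struct(">II")  # key length, value length

    def __init__(self, path):
        self.index = {}  # key -> (offset of value, value length)
        self._file = open(path, "ab+")
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log from the start; later records win over earlier ones.
        self._file.seek(0)
        offset = 0
        while True:
            header = self._file.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            klen, vlen = self.HEADER.unpack(header)
            key = self._file.read(klen)
            value_offset = offset + self.HEADER.size + klen
            self._file.seek(vlen, os.SEEK_CUR)
            self.index[key] = (value_offset, vlen)
            offset = value_offset + vlen

    def put(self, key, value):
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self._file.flush()
        self.index[key] = (offset + self.HEADER.size + len(key), len(value))

    def get(self, key):
        offset, length = self.index[key]
        self._file.seek(offset)
        return self._file.read(length)

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put(b"product/iphone", b"v1")
store.put(b"product/iphone", b"v2")   # an update is just another append
store.close()

reopened = AppendOnlyStore(path)      # the index is rebuilt by scanning the log
latest = reopened.get(b"product/iphone")
```

Existing bytes are never overwritten, so an interrupted write can only damage the tail of the file, and sequential appends avoid random disk seeks on the write path.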
  14. Tunable Consistency
    • n_val: number of replicas to store; bucket-level setting. Defaults to “3”.
    • w: number of replicas required for a successful write. Defaults to “2”.
    • r: number of replica acks required for a successful read; request-level setting. Defaults to “2”.
    • pr, pw & dw
    • Tweak consistency vs. availability
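The trade-off behind these settings can be worked through with a little arithmetic. The sketch below assumes a strict quorum, where only the n primary replicas answer; with fallback nodes in play the overlap is not guaranteed, which is where settings such as pr and pw come in. `describe_quorum` is a hypothetical helper.

```python
def describe_quorum(n, r, w):
    """Summarize what a choice of n_val, r and w implies under a strict quorum."""
    return {
        # If r + w > n, every read set shares at least one replica with
        # every write set, so a read can see the latest acknowledged write.
        "read_write_sets_overlap": r + w > n,
        # Replicas that may be unavailable while requests still succeed.
        "replica_failures_tolerated_for_reads": n - r,
        "replica_failures_tolerated_for_writes": n - w,
    }


defaults = describe_quorum(n=3, r=2, w=2)    # the defaults listed on the slide
fast_reads = describe_quorum(n=3, r=1, w=1)  # favors availability and latency
```

With the defaults, reads and writes each tolerate one unavailable replica and their replica sets overlap; dropping r and w to 1 tolerates two unavailable replicas but gives up the overlap.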
  15. Accessing Data in Riak
    Object/Key Operations: Retrieving Single Objects
    • Support for retrieving the object associated with a particular bucket/key
    • Support for retrieving all of the keys associated with a particular bucket
    Riak Search: Collecting, Parsing, and Storing Data
    • Distributed, full-text search engine with an easy-to-use query language, a Solr-like HTTP interface and an Apache Lucene-style query syntax
    • Support for a wide variety of MIME types, including JSON, plain text, XML and Erlang
    • Ideal for indexing JSON documents, as indexes are built automatically from a schema
    Secondary Indexes (Riak 2i): Seeking Reverse Lookups on Data Stored
    • Provides the ability, at write time, to tag an object stored in Riak with one or more values (key/value metadata), which can then be queried
    • Useful for finding data based on terms other than an object’s bucket/key pair, or for adding metadata values to a binary object or opaque blob
    MapReduce: Processing a Large Dataset
    • Provides the general ability to analyze and aggregate data in phases with data locality
    • Supports JavaScript, and Erlang for better performance
    Riak Search and 2i query results can be used as an input to MapReduce
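The secondary-index idea, tagging an object at write time and looking keys up by tag later, can be sketched in memory. This is an illustration of the concept rather than Riak's 2i API; `IndexedBucket` is a hypothetical class, and the index name `category_bin` only imitates the naming style used for string-valued indexes.

```python
from collections import defaultdict


class IndexedBucket:
    """Toy bucket with exact-match secondary indexes kept next to the data."""

    def __init__(self):
        self.objects = {}                 # key -> value
        self.tags = {}                    # key -> set of (index name, term)
        self.index = defaultdict(set)     # (index name, term) -> keys

    def put(self, key, value, indexes=()):
        # Drop index entries left over from a previous version of the object.
        for entry in self.tags.get(key, ()):
            self.index[entry].discard(key)
        self.objects[key] = value
        self.tags[key] = set(indexes)
        for entry in self.tags[key]:
            self.index[entry].add(key)

    def query(self, name, term):
        """Reverse lookup: which keys were tagged with name == term?"""
        return sorted(self.index.get((name, term), ()))


bucket = IndexedBucket()
bucket.put("iphone", {"price": 199}, indexes=[("category_bin", "phone")])
bucket.put("nexus", {"price": 299}, indexes=[("category_bin", "phone")])
bucket.put("ipad", {"price": 499}, indexes=[("category_bin", "tablet")])

phone_keys = bucket.query("category_bin", "phone")
# Index results feed a map step (extract the price) and a reduce step (sum).
total_price = sum(bucket.objects[key]["price"] for key in phone_keys)
```

The last two lines mirror the slide's closing point: the keys returned by an index query become the input of a map phase and a reduce phase.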
  16. What Riak Isn’t
    • Not relational
    • No fixed schema
    • No transactional support
    • Not right for every project
    • Large objects (Riak CS is a good fit here)
    • Dynamic queries (SQL)
  17. Ideal Riak Scenarios
    When to Use
    • When you have enough data to require >1 physical machine (preferably >4)
    • When availability is the top requirement
    • When your data can be modeled as keys and values
    Popular Use Cases
    • Ad Networks
    • Digital Media
    • On-Line Games
    • Social Networks
    • Social Analysis
    • Cloud Operators
    • Messaging Services
    • Product Catalogs
    • Document Management
    • Health Care Information Management
  18. enStratus
    • A cloud infrastructure management solution for deploying and managing enterprise-class applications
    • Moved from MySQL to Riak. Reasons:
    • Write scalability
    • Resilience to failure across multiple datacenters
    • Stores machine and state information, and data supporting analytics and audit control
    • George Reese gave an excellent talk during Ricon last year: http://vimeo.com/54887751
    “As I’ve looked at a number of problem domains from customers and our own systems, you see this pattern where a relational database has been used just because it’s the default… and the reality is that more of the world is eventually consistent than not”, said George Reese, CTO of enStratus
  19. Mad Mimi
    • Mad Mimi is an email marketing service that allows users to create, send, and track email campaigns in a fresh, novel way without using templates
    • Their data was growing fast, and they had to choose between sharding MySQL and switching to a distributed database
    • They use Riak to track email statistics on up to 20 million emails per day
    • Based on the success they’ve had with Riak, they hope to move all of their email tracking statistics to Riak and eliminate MySQL entirely
    “Riak is built to be a distributed data store... it’s a great tool and it does just what it says on the box, requiring the least amount of operational effort compared to alternatives”, said Marc Heiligers, CTO of Mad Mimi
  20. Web / Mobile App Growth: Case Study for Top Rated Apple App Store App
    • #4 most popular Apple App Store Social Networking App at EOY, behind Facebook, Skype and Twitter
    • Truly viral growth: scaled 10x between Thanksgiving and New Year's Day
    • Required scaling across multiple IaaS / hosting providers
    • Surpassed one billion operations per day
  21. Mobile-to-Mobile Content Store: Bump – Low Latency and Always Available
    • 800 million pieces of structural data in Riak, including Photos, Chats, and Contact Cards
    • 10 million active users
    • 77 million downloads to date
    • Switched to Riak in August 2011
    • #7 Most Downloaded iPhone App
  22. Active-Passive Architecture
    [Diagram: an active master with three slaves, plus a passive master]
    • Wastage of resources!
    • No write or read scalability!
    • Failover takes time!
  23. Active-Active Architecture
    [Diagram: two masters, each with three slaves]
    • No conflict detection/resolution!
    • Difficult to manage!
    • Operational complexity during failover!
  24. Riak MDC (EDS): Multi-Data Center Replication
    [Diagram: (1) applications, users and machines (cloud, mobile, social) generate data; (2) Riak stores and manages data efficiently and effectively across Data Center #1, Data Center #2 and Data Center #3]
    • Clusters are local to regional users to solve latency
    • Replication is uni-directional; remote clusters can be set up to replicate data back to a primary cluster, thus synchronizing bi-directionally
    • Easily deploy in many regional zones
    • Write-everywhere solution
    • Easy to scale; additional data centers can easily be added
  25. Multi-Device Session Store: Case Study Showcases Seamless User Experience
    • The Global Session Store manages a seamless user session throughout a customer’s multi-mode experience, from Web to device
    [Diagram: Philadelphia Data Center, Denver]
  26. Basho’s Product Family: Distributed Data Technology is Our Passion
    Open Source Distributed Database
    • Always-available, scalable, low-cost NoSQL database
    • Over 35,000 downloads per month
    • Thousands of users worldwide
    • Available since Sept 2009
    • Version 1.0 unveiled September 2011
    EnterpriseDS: Commercial Distributed Database
    • Adds multi-data center replication, monitoring & 24x7 support
    • Requires commercial contract and secure download
    • Version 1.1 launched with Riak Control in Feb 2012
    • Version 1.2 launched in August 2012
    Distributed Cloud Storage Platform
    • Expands with multi-tenancy, large object support, metering and Amazon S3 API
    • Requires commercial contract
    • Launched on March 27, 2012
    • Used by multiple global cloud operators
  27. What is Riak CS?
    Key features:
    • Multi-tenant support
    • User authentication and authorization
    • Amazon S3 API-compatibility
    • Per-tenant visibility
    • Provisioning, metering, billing and reporting
    • Support for objects up to 5 GB in size
  28. Riak CS Use Cases
    [Diagram labels: Reporting, Large Objects, AuthZ, Multi-Tenancy]
    • Storage for Cloud Computing
    • S3 Without AWS
    • Cloud Drive (General Content Storage)
    • Backup-as-a-Service
    • Archival and Preservation
    • Integration with Workflow
  29. Riak EDS or CS
    • Does data unavailability cost thousands of $/minute? Riak EDS (Enterprise Data Store)
    • Do you want to build a cloud storage service for your business? Riak CS (Cloud Storage)
  30. Backups
    • Bitcask and LevelDB are both log-structured stores; cp, rsync, tar and custom backup tools will work
    • FS-level snapshots of the data directory can be taken while the node is running
    • Backups aren't yet perfected; future releases will have more efficient, specialized backup methods for each backend
  31. Stats and Monitoring (1)
    • Riak exposes data about current operating status (counters, histograms, etc.) via the HTTP /stats endpoint or ‘riak-admin status’
    • Anything that speaks HTTP can be plugged into Riak
    • Plugins exist for most OSS monitoring tools (munin, cacti, nagios, graphite, statsd)
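Since the stats are served over HTTP, a monitoring hook can be as small as the sketch below. It assumes a node listening on `127.0.0.1:8098` and a JSON response body; `fetch_stats` and `pick_numeric` are hypothetical helpers, and the sample payload is made up for illustration rather than captured from a real node.

```python
import json
import urllib.request


def fetch_stats(base_url="http://127.0.0.1:8098"):
    """Fetch the node's stats document (assumes a reachable node and JSON output)."""
    with urllib.request.urlopen(f"{base_url}/stats", timeout=5) as response:
        return json.load(response)


def pick_numeric(stats, wanted):
    """Keep only the requested stats that are numbers, ready to ship to a grapher."""
    return {
        name: stats[name]
        for name in wanted
        if isinstance(stats.get(name), (int, float))
    }


# Illustrative payload only; on a live node this would come from fetch_stats().
sample = json.loads(
    '{"node_gets": 120, "node_puts": 45, "nodename": "riak@127.0.0.1"}'
)
selected = pick_numeric(sample, ["node_gets", "node_puts", "nodename"])
```

Filtering to numeric values keeps identifiers such as the node name out of the metrics that would be forwarded to a tool like graphite or statsd.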
  32. Stats and Monitoring (2)
    • ‘riaknostic’ is a suite of diagnostic checks that can be used to debug your cluster before it’s in production; it checks for common misconfigurations
    • Riak Control is a full-fledged management GUI that Basho develops and maintains
  33. Riak 1.3 – RC Now
    • Active Anti-Entropy
    • Replication enhancements for MDC
    • IPv6 support
    • New look for Riak Control
  34. Take Away
    Say “NO” to
    • Master-Slave architecture
    • Application sharding
    • Active-Passive with read scalability
    Say “Yes” to
    • Distributed model
    • Always available
    • Active-Active with write scalability
  35. Resources
    • Basho docs: http://docs.basho.com/
    • Riak fast track: http://docs.basho.com/riak/latest/tutorials/fast-track/
    • Schema Design for NoSQL data in Riak, a talk by one of our engineers, Sean Cribbs: http://glennas.wordpress.com/2011/03/12/schema-design-for-nosql-data-in-riak-sean-cribbs-of-basho/
    • Basho Blog: http://basho.com/blog/technical/