Suuchi - Application Layer Sharding

Talk presented at the Indian Linux User Group - Chennai (ILUGC) December '16 and DigitalOcean January '17 meetups

References - https://github.com/ashwanthkumar/suuchi-sharding-talk

Ashwanth Kumar

December 10, 2016

Transcript

  1. rise of KV stores: distributed, replicated, fault-tolerant (optionally sorted)

    2006 BigTable from Google; 2007 Dynamo from Amazon; 2008 Cassandra from Facebook; 2009 VoldemortDB from LinkedIn
  2. components - membership: static or dynamic

    Concerns: fault tolerance in case of node/process failure; with static membership, scaling up/down needs downtime of the system
  3. components - request routing: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web

    [Diagram: node 1, node 2, node 3 and node 4 placed by consistent hashing; slides 4 and 5 repeat this build]
  6. components - request routing (same slide)

    [Diagram: node 1 has left; node 2, node 3 and node 4 remain]
  7. components - request routing

    - Peer-to-peer system; no single point of contact
    - Each node handles or forwards requests transparently
    - Uses a pluggable partitioner scheme
    - Can be customized as weighted distribution / Rendezvous Hash etc.
  8. getting started

    - Define a gRPC service using proto2 (or proto3)
    - Generate the stubs in Java / Scala
    - Implement the services
    - Connect them together using Suuchi's Server abstraction
  9. suuchi @indix

    - HTML Archive System
      - Handles 1000+ rps; write-heavy system
      - Stores ~120TB of HTML pages indexed by URL and timestamp
    - Stats (as Monoids) Storage System*
      - All we wanted was approximate aggregates in real time
    - Real-time scheduler for our crawlers*
      - Finds out which 20 URLs to crawl now out of 3+ billion URLs
      - Helps the crawler crawl 20+ million URLs every day
  10. idea behind suuchi: membership, request routing / sharding

    2011 Gizzard from Twitter; 2015 RingPop from Uber; 2016 Slicer from Google; 2016 Suuchi
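Slide 2 contrasts static and dynamic membership. As a rough illustration (not Suuchi's implementation), the sketch below compares a fixed node list with a hypothetical heartbeat-based membership in which nodes join at runtime and are dropped after missing heartbeats for `timeout` seconds. The class names, the heartbeat scheme and the 5-second timeout are assumptions made for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StaticMembership:
    """Fixed node list: adding or removing a node means restarting with a new list."""
    nodes: tuple

    def alive(self, now=None):
        # A static list never changes, even if one of its nodes has crashed.
        return list(self.nodes)


@dataclass
class HeartbeatMembership:
    """Hypothetical dynamic membership: nodes join at runtime and are
    dropped when their last heartbeat is older than `timeout` seconds."""
    timeout: float = 5.0
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node, now=None):
        # Joining and staying alive are the same operation here.
        self.last_seen[node] = time.monotonic() if now is None else now

    def leave(self, node):
        # Graceful departure, e.g. when scaling down.
        self.last_seen.pop(node, None)

    def alive(self, now=None):
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)


if __name__ == "__main__":
    static = StaticMembership(("node1", "node2", "node3"))
    print(static.alive())  # node1 stays listed even if it has crashed

    dynamic = HeartbeatMembership(timeout=5.0)
    for n in ("node1", "node2", "node3"):
        dynamic.heartbeat(n, now=0.0)
    dynamic.heartbeat("node2", now=4.0)
    dynamic.heartbeat("node3", now=4.0)
    dynamic.heartbeat("node4", now=4.0)  # scale up without downtime
    print(dynamic.alive(now=7.0))  # node1 missed its heartbeats and is dropped
```

A real system would receive heartbeats over the network (for example via gossip); here they are plain method calls with explicit timestamps so the logic can be run locally.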
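Slides 3-6 show request routing with consistent hashing and the effect of node 1 leaving. A minimal sketch, assuming an MD5-based ring with 100 virtual nodes per physical node (both arbitrary choices, not taken from Suuchi), shows the property the slides illustrate: when a node is removed, only the keys it owned move to another node.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _position(value: str) -> int:
    # Stable 64-bit ring position (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.vnodes = vnodes  # virtual nodes per physical node, smooths the load
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node 1", "node 2", "node 3", "node 4"])
    keys = [f"key-{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.remove("node 1")  # the last build slide: node 1 is gone
    after = {k: ring.node_for(k) for k in keys}
    moved = [k for k in keys if before[k] != after[k]]
    # Only keys previously owned by node 1 change owner.
    assert all(before[k] == "node 1" for k in moved)
    print(f"{len(moved)} of {len(keys)} keys moved")
```

With a plain `hash(key) % N` scheme, dropping from four nodes to three would instead reassign most keys, which is the hot-spot and reshuffling problem consistent hashing avoids.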
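Slide 7 says every node either handles a request or forwards it, using a pluggable partitioner such as a weighted distribution or Rendezvous Hash. The sketch below is a hypothetical, in-process model of that idea: `RendezvousPartitioner` implements (optionally weighted) highest-random-weight hashing, and `Node` stores the keys it owns and forwards the rest to their owner. These names, the SHA-256 scoring and the direct method call standing in for an RPC are assumptions for illustration, not Suuchi's API.

```python
import hashlib
import math
from typing import Dict, List, Optional


def _unit_hash(node: str, key: str) -> float:
    # Map (node, key) to a float strictly inside (0, 1) using 52 hash bits.
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big") >> 12
    return (h + 0.5) / 2**52


class RendezvousPartitioner:
    """Highest-random-weight hashing with optional per-node weights (default 1.0)."""

    def __init__(self, weights: Optional[Dict[str, float]] = None):
        self.weights = weights or {}

    def owner(self, key: str, nodes: List[str]) -> str:
        def score(node: str) -> float:
            w = self.weights.get(node, 1.0)
            # w / -ln(u): a node wins with probability proportional to its weight.
            return w / -math.log(_unit_hash(node, key))
        return max(nodes, key=score)


class Node:
    """Hypothetical peer: serves keys it owns and forwards the rest."""

    def __init__(self, name: str, cluster: Dict[str, "Node"], partitioner):
        self.name = name
        self.cluster = cluster  # name -> Node, shared by all peers
        self.partitioner = partitioner  # any object with owner(key, nodes)
        self.store: Dict[str, str] = {}

    def _owner(self, key: str) -> str:
        return self.partitioner.owner(key, sorted(self.cluster))

    def put(self, key: str, value: str) -> str:
        owner = self._owner(key)
        if owner != self.name:
            # Forward transparently; a real system would make an RPC here.
            return self.cluster[owner].put(key, value)
        self.store[key] = value
        return self.name  # name of the node that stored the value

    def get(self, key: str) -> Optional[str]:
        owner = self._owner(key)
        if owner != self.name:
            return self.cluster[owner].get(key)
        return self.store.get(key)


if __name__ == "__main__":
    partitioner = RendezvousPartitioner()
    cluster: Dict[str, Node] = {}
    for name in ["node 1", "node 2", "node 3", "node 4"]:
        cluster[name] = Node(name, cluster, partitioner)
    stored_on = cluster["node 1"].put("user:42", "hello")  # any node can take the request
    print(stored_on, cluster["node 3"].get("user:42"))
```

Because every peer computes the owner with the same partitioner over the same member list, a request is forwarded at most once, so any node can act as the point of contact. Swapping in a different partitioner only requires another object with an `owner(key, nodes)` method.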
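Slide 9 mentions storing stats as monoids. The slide does not say which aggregates are kept, so this sketch assumes a simple, hypothetical count/sum/min/max record. The point is the structure: `combine` is associative and `Stats()` is its identity, so each shard can pre-aggregate locally and partial results can be merged in any grouping. Approximate structures such as HyperLogLog sketches merge the same way.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    @staticmethod
    def of(value):
        # Lift a single observation into the monoid.
        return Stats(1, value, value, value)

    def combine(self, other):
        # Associative, and Stats() is the identity element, so Stats forms a monoid.
        return Stats(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    # Each shard folds its own observations, starting from the identity.
    shard_a = reduce(Stats.combine, map(Stats.of, [3, 9, 4]), Stats())
    shard_b = reduce(Stats.combine, map(Stats.of, [7, 1]), Stats())
    merged = shard_a.combine(shard_b)
    print(merged.count, merged.total, merged.minimum, merged.maximum)  # 5 24.0 1 9
```

Since the merge order does not matter, a query can fan out to the shards that hold a key range and fold whatever partial results come back.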