Suuchi - Application Layer Sharding

Talk presented at the Indian Linux User Group - Chennai (ILUGC) December '16 and Digital Ocean January '17 meetups

References - https://github.com/ashwanthkumar/suuchi-sharding-talk

Ashwanth Kumar

December 10, 2016

Transcript

  1. ashwanth kumar @_ashwanthkumar, principal engineer. suuchi - toolkit for application layer sharding
  2. how we scale systems? from simple to unmanageable beasts - an opinionated view
  3. Service

  4. Service

  5. Service Service

  6. Service Service Service

  7. Service Service Service

  8. Service Service Service

  9. Service Service Service

  10. rise of KV stores - distributed, replicated, fault-tolerant (optionally sorted): 2006 BigTable from Google, 2007 Dynamo from Amazon, 2008 Cassandra from Facebook, 2009 VoldemortDB from LinkedIn
  11. Distributed (NoSQL) Datastores Service Service Service

  12. Distributed (NoSQL) Datastores Service Service Service Network Latency

  13. data locality for low latency / data intensive applications

  14. Service Service Service Co-locate data to improve performance

  15. Sharded and replicated to improve throughput Service Service Service

  16. Service Service Service - deal with complex distributed system problems at the application layer
  17. suuchi github.com/ashwanthkumar/suuchi toolkit for application layer sharding

  18. components - transport: uses HTTP/2 with streaming
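
Since the stack is gRPC (see slide 25), the transport is HTTP/2 with streamed requests and responses multiplexed over a single connection. As a rough sketch of what a streaming handler looks like in grpc-java style from Scala (the ScanGrpc / ScanRequest / ScanResponse names are hypothetical, not from the talk's repo):

```scala
import io.grpc.stub.StreamObserver

// Hypothetical generated base class (ScanGrpc.ScanImplBase) and messages from a
// .proto file; gRPC multiplexes these streams over a single HTTP/2 connection.
class ScanServiceImpl extends ScanGrpc.ScanImplBase {
  override def scan(request: ScanRequest,
                    responseObserver: StreamObserver[ScanResponse]): Unit = {
    // Server-streaming: push many responses for one request, then complete.
    (1 to 3).foreach { i =>
      // A real handler would stream rows or pages from storage here.
      responseObserver.onNext(ScanResponse.newBuilder().setChunk(s"chunk-$i").build())
    }
    responseObserver.onCompleted()
  }
}
```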

  19. components - membership: static or dynamic. dynamic membership gives fault tolerance in case of node/process failure; with static membership, scaling up/down needs downtime of the system
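
To make the static vs. dynamic split concrete: static membership is essentially a fixed list of node addresses known at startup. The sketch below is illustrative only; the case class and values are not Suuchi's actual classes or configuration.

```scala
// Illustrative static membership: the cluster is a fixed list of node addresses
// known at startup. Changing it (scaling up/down) means editing the config and
// restarting, whereas dynamic membership detects node/process joins and failures
// at runtime.
final case class MemberAddress(host: String, port: Int)

val staticMembers: List[MemberAddress] = List(
  MemberAddress("node1.internal", 5051),
  MemberAddress("node2.internal", 5051),
  MemberAddress("node3.internal", 5051)
)
```
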
  20. components - request routing - consistent hashing, from "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web" (ring diagram: node 1, node 2, node 3, node 4)
  21. components - request routing (consistent hashing ring diagram, continued)
  22. components - request routing (consistent hashing ring diagram, continued)
  23. components - request routing (consistent hashing ring diagram with node 1 removed: node 2, node 3, node 4)
  24. components - request routing
      - Peer to Peer system - no single point of contact
      - Each node handles or forwards requests transparently
      - Uses pluggable partitioner scheme
      - Can be customized as weighted distribution / Rendezvous Hash etc. (see the consistent-hash sketch below)
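
To show the routing idea, here is a minimal consistent-hash ring sketch in Scala. This is a generic illustration of the technique, not Suuchi's actual partitioner: each node is placed at several virtual points on a ring, and a key is routed to the first node clockwise from its hash.

```scala
import scala.collection.immutable.TreeMap
import scala.util.hashing.MurmurHash3

// Minimal consistent-hash ring: each node gets `replicas` virtual points on the
// ring; a key is owned by the first node at or after its hash, wrapping around.
class ConsistentHashRing(nodes: Seq[String], replicas: Int = 100) {
  private val ring: TreeMap[Int, String] = TreeMap(
    (for {
      node <- nodes
      i    <- 0 until replicas
    } yield MurmurHash3.stringHash(s"$node#$i") -> node): _*
  )

  // Route a key to its owning node: first ring position >= hash(key),
  // wrapping around to the smallest position if none exists.
  def nodeFor(key: String): String = {
    val h  = MurmurHash3.stringHash(key)
    val it = ring.iteratorFrom(h)
    if (it.hasNext) it.next()._2 else ring.head._2
  }
}
```

With this, `new ConsistentHashRing(Seq("node1", "node2", "node3", "node4")).nodeFor("user-42")` picks the owning node, and when a node leaves the ring (as on slide 23) only the keys that hashed to its positions move to their clockwise neighbours.
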
  25. getting started
      - define a gRPC service using proto2 (or proto3)
      - generate the stubs in java / scala
      - implement the services
      - connect them together using Suuchi's Server abstraction
  26. let’s see some code (see the references for the actual links)

  27. suuchi.proto

  28. suuchi.proto

  29. implement proto service
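
The actual implementation is in the reference repo; as a stand-in, here is a minimal sketch of the "implement proto service" step, assuming a hypothetical suuchi.proto that defines a simple key-value service (KVGrpc, PutRequest, GetRequest and friends are illustrative names, not the real ones):

```scala
import io.grpc.stub.StreamObserver
import java.util.concurrent.ConcurrentHashMap

// Hypothetical generated classes (KVGrpc.KVImplBase, PutRequest, ...) from a
// simple key-value suuchi.proto; the real definitions live in the talk's repo.
class KVServiceImpl extends KVGrpc.KVImplBase {
  private val store = new ConcurrentHashMap[String, String]()

  override def put(req: PutRequest, observer: StreamObserver[PutResponse]): Unit = {
    store.put(req.getKey, req.getValue)
    observer.onNext(PutResponse.newBuilder().setSuccess(true).build())
    observer.onCompleted()
  }

  override def get(req: GetRequest, observer: StreamObserver[GetResponse]): Unit = {
    val value = Option(store.get(req.getKey)).getOrElse("")
    observer.onNext(GetResponse.newBuilder().setValue(value).build())
    observer.onCompleted()
  }
}
```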

  30. connect using Server abstraction
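
Rather than guess at Suuchi's own Server API (see the reference repo for the real thing), the sketch below shows the plain grpc-java wiring that such an abstraction sits on top of, reusing the hypothetical KVServiceImpl from the previous sketch:

```scala
import io.grpc.ServerBuilder

object KVServer {
  def main(args: Array[String]): Unit = {
    // Plain gRPC server wiring; Suuchi's Server abstraction wraps this kind of
    // setup and layers membership and request routing (forwarding each request
    // to the node that owns the key) on top of it.
    val server = ServerBuilder
      .forPort(5051)
      .addService(new KVServiceImpl())
      .build()
      .start()
    server.awaitTermination()
  }
}
```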

  31. suuchi @indix
      - HTML Archive System - handles 1000+ rps; write-heavy system; stores ~120TB of url- and timestamp-indexed HTML pages
      - Stats (as Monoids) Storage System* - all we wanted was approximate aggregates in real time
      - Real-time scheduler for our crawlers* - finds out which 20 urls to crawl now out of 3+ billion urls; helps the crawler crawl 20+ million urls every day
  32. idea behind suuchi - membership, request routing / sharding: 2011 Gizzard from Twitter, 2015 RingPop from Uber, 2016 Slicer from Google, 2016 Suuchi
  33. questions? references available at github.com/ashwanthkumar/suuchi-sharding-talk