Why we built a distributed system - DSConf 2018

Why we built a distributed system - DSConf 2018

D90acaa01cb59a2b8b7e986958953eee?s=128

Ashwanth Kumar

April 21, 2018
Tweet

Transcript

  1. 11.

    BigTable, 2006 Dynamo, 2007 Cassandra, 2008 Voldemort, 2009 rise of

    KV Stores distributed, replicated, fault-tolerant, sorted*
  2. 17.

    boils down to... Distributed Data Store + CoProcessors (Bigtable /

    HBase) …run arbitrary code “next” to each shard
  3. 18.

    Distributed Data Store + CoProcessors (Bigtable / HBase) - Business

    logic upgrade is painful - CoProcessors are not services, more an afterthought - Failure semantics are not well established - More applications means multiple coproc or single bloated coproc - Noisy neighbours / Impedance due to a shared datastore
  4. 20.

    In-house Vs Off-the-shelf In-house Off-the-shelf Features Subset Superset Moving parts

    Fully Controllable Community Controlled Ownership Implicit Acquired / Cultural Upfront cost High Low Expertise Hired / Retained / Nurtured Community
  5. 22.

    पांग ப Communication key=”foo” key=”bar” key=”baz” Request Routing Sync /

    Async Replication Replication Data Sharding Cluster Membership Primitives in a Distributed System
  6. 24.

    Suuchi Provides support for ... - underlying communication channel -

    routing queries to appropriate member - detecting your cluster members - replicating your data based on your strategy - local state via embedded KV store per node (optionally) github.com/ashwanthkumar/suuchi
  7. 26.

    Sharding / Routing + Consistent Hash Ring - Your own

    sharding technique? node 2 node 1 node 3 node 4 Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
  8. 27.

    Sharding / Routing + Consistent Hash Ring - Your own

    sharding technique? node 2 node 1 node 3 node 4 Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
  9. 28.

    Sharding / Routing + Consistent Hash Ring - Your own

    sharding technique? node 2 node 1 node 3 node 4 Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
  10. 29.

    Sharding / Routing + Consistent Hash Ring - Your own

    sharding technique? Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web node 2 node 3 node 4
  11. 30.
  12. 31.

    Replication Provides high availability for write heavy systems at the

    cost of consistency sync async* every request is successful only if all the replicas succeeded
  13. 32.
  14. 33.

    Suuchi @ Indix • HTML Archive ◦ Handles 1000+ tps

    - write heavy system ◦ Stores 120 TB of url & timestamp indexed HTML pages • Stats (as Monoids) Aggregation System ◦ Approximate real-time aggregates ◦ Timeline & windowed queries • Real time scheduler for our Crawlers ◦ Prioritising which next batch of urls to crawl ◦ Helps crawl 20+ million urls per day
  15. 34.

    Ringpop from Uber, 2015 Gizzard from Twitter, 2011 Slicer from

    Google, 2016 Suuchi, 2016 Idea behind Suuchi Membership, Request Routing, Sharding etc.
  16. 35.