Why we built a distributed system - DSConf, Pune 2018

Sriram
April 21, 2018

In the first edition of DSConf, we spoke about why we built an in-house distributed system and how Suuchi, a toolkit for building such systems, evolved.

Transcript

  1. Why we built a distributed system DSConf 2018

  2. Sriram Ramachandrasekaran Principal Engineer, Indix https://github.com/brewkode

  3. 1B+ Products 67K+ Brands 2.5B+ Offers 6K+ Categories

  4. Crawl Parse Dedup Classify Extract Match Index Data Pipeline @ Indix

  5. Crawl Parse Dedup Classify Extract Match Index Data Pipeline @ Indix

  6. Desirable Properties • Handle Scale - order of TBs • Fault Tolerant • Operability

  7. Traditionally... • Tiered architecture • Scale individual tiers • Until...

  8. Traditionally... • Tiered architecture • Scale individual tiers ◦ Web Tier ◦ Service Tier • Until...

  9. Traditionally... • Tiered architecture • Scale individual tiers ◦ Web Tier ◦ Service Tier • Until...

  10. Essentially, we are looking to Scale data systems

  11. BigTable, 2006 Dynamo, 2007 Cassandra, 2008 Voldemort, 2009 rise of KV Stores distributed, replicated, fault-tolerant, sorted*

  12. Service Service Service Distributed Data Store

  13. Service Service Service Distributed Data Store Latency

  14. Distributed Service

  15. Distributed Service Data locality kills latency Increases Application Complexity

  16. Just having a distributed store isn’t enough! We need something more...

  17. boils down to... Distributed Data Store + CoProcessors (Bigtable / HBase) …run arbitrary code “next” to each shard

  18. Distributed Data Store + CoProcessors (Bigtable / HBase) - Business logic upgrades are painful - CoProcessors are not services, more an afterthought - Failure semantics are not well established - More applications means multiple coprocs or a single bloated coproc - Noisy neighbours / Impedance due to a shared datastore

  19. Applications need to OWN Scaling

  20. In-house Vs Off-the-shelf • Features: Subset (in-house) vs Superset (off-the-shelf) • Moving parts: Fully Controllable vs Community Controlled • Ownership: Implicit vs Acquired / Cultural • Upfront cost: High vs Low • Expertise: Hired / Retained / Nurtured vs Community

  21. Ashwanth Kumar Principal Engineer, Indix https://github.com/ashwanthkumar

  22. Primitives in a Distributed System • Communication • Request Routing (key=”foo”, key=”bar”, key=”baz”) • Sync / Async Replication • Data Sharding • Cluster Membership

  23. Introducing Suuchi DIY kit for building distributed systems github.com/ashwanthkumar/suuchi

  24. Suuchi Provides support for ... - underlying communication channel - routing queries to appropriate member - detecting your cluster members - replicating your data based on your strategy - local state via embedded KV store per node (optionally) github.com/ashwanthkumar/suuchi

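As a rough sketch of how the pieces listed on this slide can compose inside a single node, here is some illustrative Scala; the trait and class names are placeholders, not Suuchi's actual API.

```scala
// Illustrative composition of the primitives above; names are placeholders,
// not Suuchi's real API. The router is assumed to be kept up to date by
// cluster membership detection.
trait Router { def replicasFor(key: Array[Byte]): Seq[String] }  // sharding + replication strategy
trait Store {                                                    // embedded per-node KV state
  def put(key: Array[Byte], value: Array[Byte]): Unit
  def get(key: Array[Byte]): Option[Array[Byte]]
}

class Node(self: String, router: Router, store: Store) {
  // Apply the write locally if this node is a replica for the key,
  // and forward it to the remaining replicas over the communication channel.
  def write(key: Array[Byte], value: Array[Byte]): Unit = {
    val replicas = router.replicasFor(key)
    if (replicas.contains(self)) store.put(key, value)
    replicas.filterNot(_ == self).foreach(node => forward(node, key, value))
  }

  // Transport elided here; the talk mentions HTTP/2 with streaming underneath.
  private def forward(node: String, key: Array[Byte], value: Array[Byte]): Unit = ()
}
```
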
  25. Communication + HandleOrForward + Scatter Gather uses http/2 with streaming
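
To make the two patterns named on this slide concrete, the sketch below shows handle-or-forward and scatter-gather as plain functions; the constructor parameters (ring lookup, member list, RPC call) are assumptions standing in for whatever the real transport provides, not Suuchi's API.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Sketch of the two routing modes above; signatures are assumptions, not Suuchi's API.
class Communication(self: String,
                    ownerOf: Array[Byte] => String,                      // ring lookup for a key
                    members: () => Seq[String],                          // current cluster members
                    call: (String, Array[Byte]) => Future[Array[Byte]])  // RPC to another node, elided
                   (implicit ec: ExecutionContext) {

  // HandleOrForward: serve the request locally if this node owns the key,
  // otherwise proxy it to the owning node.
  def handleOrForward(key: Array[Byte],
                      serveLocally: Array[Byte] => Future[Array[Byte]]): Future[Array[Byte]] =
    if (ownerOf(key) == self) serveLocally(key) else call(ownerOf(key), key)

  // Scatter-Gather: fan a request out to every member and fold the replies together.
  def scatterGather(request: Array[Byte],
                    merge: Seq[Array[Byte]] => Array[Byte]): Future[Array[Byte]] =
    Future.sequence(members().map(m => call(m, request))).map(merge)
}
```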

  26. Sharding / Routing + Consistent Hash Ring - Your own sharding technique? node 1 node 2 node 3 node 4 (Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web)

  27. Sharding / Routing + Consistent Hash Ring - Your own sharding technique? node 1 node 2 node 3 node 4 (Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web)

  28. Sharding / Routing + Consistent Hash Ring - Your own sharding technique? node 1 node 2 node 3 node 4 (Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web)

  29. Sharding / Routing + Consistent Hash Ring - Your own sharding technique? node 2 node 3 node 4 (Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web)

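A bare-bones consistent hash ring, in the spirit of the Karger et al. paper cited on these slides, might look like the sketch below; the hash function and virtual-node count are arbitrary illustrative choices, not necessarily what Suuchi uses.

```scala
import java.util.zip.CRC32
import scala.collection.immutable.TreeMap

// Bare-bones consistent hash ring for illustration.
class ConsistentHashRing(nodes: Seq[String], vnodes: Int = 100) {
  private def hash(s: String): Long = {
    val crc = new CRC32
    crc.update(s.getBytes("UTF-8"))
    crc.getValue
  }

  // Place each physical node at `vnodes` positions on the ring to spread load evenly.
  private val ring: TreeMap[Long, String] =
    TreeMap(nodes.flatMap(n => (0 until vnodes).map(i => hash(s"$n#$i") -> n)): _*)

  // A key is owned by the first node clockwise from its hash; wrap around past the end.
  def nodeFor(key: String): String = {
    val clockwise = ring.iteratorFrom(hash(key))
    if (clockwise.hasNext) clockwise.next()._2 else ring.head._2
  }
}

// Removing a node (as on the last slide, where node 1 leaves the ring) only remaps
// the keys that hashed to its positions; keys owned by the other nodes stay put.
// val ring = new ConsistentHashRing(Seq("node 1", "node 2", "node 3", "node 4"))
// ring.nodeFor("foo")
```
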
  30. Membership: static / dynamic - fault tolerance in case of node/process failure - scaling up/down needs downtime of the system

  31. Replication - Provides high availability for write heavy systems at the cost of consistency - sync / async* - every request is successful only if all the replicas succeeded

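To make the sync/async distinction concrete, here is an illustrative sketch (not Suuchi's code): with sync replication a write is acknowledged only after every replica accepts it, which is what the slide describes; with async the acknowledgement happens immediately and replicas catch up in the background. The `send` parameter is an assumed stand-in for the RPC to a replica.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Illustrative sync vs async replication; `send` stands in for the RPC to a replica.
class Replicator(replicas: Seq[String],
                 send: (String, Array[Byte], Array[Byte]) => Future[Boolean])
                (implicit ec: ExecutionContext) {

  // Sync: the request succeeds only if all the replicas succeeded.
  def writeSync(key: Array[Byte], value: Array[Byte],
                timeout: FiniteDuration = 1.second): Boolean = {
    val acks = Future.sequence(replicas.map(r => send(r, key, value)))
    Await.result(acks, timeout).forall(identity)
  }

  // Async: fire the replica writes and acknowledge immediately; higher availability,
  // but a replica write can still fail after the client has been told "ok".
  def writeAsync(key: Array[Byte], value: Array[Byte]): Boolean = {
    replicas.foreach(r => send(r, key, value))
    true
  }
}
```
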
  32. Storage + KeyValue + RocksDB - Your own abstraction? embedded KV store from FB for server workloads

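A minimal wrapper over RocksDB's Java binding (RocksJava) shows what the embedded per-node store can look like; the KeyValueStore trait here is a placeholder abstraction, not Suuchi's actual storage interface.

```scala
import org.rocksdb.{Options, RocksDB}

// Minimal RocksDB-backed embedded KV store via RocksJava; the trait is a
// placeholder abstraction for illustration.
trait KeyValueStore {
  def put(key: Array[Byte], value: Array[Byte]): Unit
  def get(key: Array[Byte]): Option[Array[Byte]]
  def close(): Unit
}

class RocksDbStore(path: String) extends KeyValueStore {
  RocksDB.loadLibrary()                                    // load the native library
  private val db = RocksDB.open(new Options().setCreateIfMissing(true), path)

  def put(key: Array[Byte], value: Array[Byte]): Unit = db.put(key, value)
  def get(key: Array[Byte]): Option[Array[Byte]] = Option(db.get(key))  // null when the key is absent
  def close(): Unit = db.close()
}
```
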
  33. Suuchi @ Indix • HTML Archive ◦ Handles 1000+ tps - write heavy system ◦ Stores 120 TB of url & timestamp indexed HTML pages • Stats (as Monoids) Aggregation System ◦ Approximate real-time aggregates ◦ Timeline & windowed queries • Real time scheduler for our Crawlers ◦ Prioritising the next batch of urls to crawl ◦ Helps crawl 20+ million urls per day

  34. Idea behind Suuchi - Membership, Request Routing, Sharding etc. Gizzard from Twitter, 2011 • Ringpop from Uber, 2015 • Slicer from Google, 2016 • Suuchi, 2016

  35. Slides designed by www.swathiravichandran.com