Why we built a distributed system - DSConf 2018

Ashwanth Kumar

April 21, 2018

Transcript

  1. Why we built a distributed system DSConf 2018

  2. Sriram Ramachandrasekaran Principal Engineer, Indix https://github.com/brewkode

  3. 1B+ Products 67K+ Brands 2.5B+ Offers 6K+ Categories

  4. Data Pipeline @ Indix: Crawl → Parse → Dedup → Classify → Extract → Match → Index

  6. Desirable Properties • Handle Scale - order of TBs • Fault Tolerant • Operability

  7. Traditionally... • Tiered architecture • Scale individual tiers • Until...

  8. Traditionally... • Tiered architecture • Scale individual tiers ◦ Web Tier ◦ Service Tier • Until...

  10. Essentially, we are looking to Scale data systems

  11. The rise of KV stores: BigTable (2006), Dynamo (2007), Cassandra (2008), Voldemort (2009). Distributed, replicated, fault-tolerant, sorted*

  12. Diagram: Service, Service, Service on top of a shared Distributed Data Store

  13. The same diagram, with Latency called out between the services and the data store

  14. Distributed Service

  15. Distributed Service: data locality kills latency, but increases application complexity

  16. Just having a distributed store isn’t enough! We need something more...

  17. It boils down to... Distributed Data Store + CoProcessors (Bigtable / HBase): run arbitrary code “next” to each shard

  18. Distributed Data Store + CoProcessors (Bigtable / HBase)
      - Business logic upgrades are painful
      - CoProcessors are not services, more an afterthought
      - Failure semantics are not well established
      - More applications mean multiple coprocs or a single bloated coproc
      - Noisy neighbours / impedance due to a shared datastore

  19. Applications need to OWN Scaling

  20. In-house vs Off-the-shelf
                        In-house                       Off-the-shelf
      Features          Subset                         Superset
      Moving parts      Fully Controllable             Community Controlled
      Ownership         Implicit                       Acquired / Cultural
      Upfront cost      High                           Low
      Expertise         Hired / Retained / Nurtured    Community

  21. Ashwanth Kumar Principal Engineer, Indix https://github.com/ashwanthkumar

  22. Primitives in a Distributed System: Communication, Request Routing, Sync / Async Replication, Data Sharding, Cluster Membership. (Diagram: requests for key=”foo”, key=”bar”, key=”baz” being routed and replicated across the cluster.)

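To make these primitives concrete, here is a minimal sketch of them as plain Scala interfaces. The names below are illustrative only and are not Suuchi's actual classes.

```scala
// Hypothetical interfaces for the primitives named on the slide above.
case class Node(host: String, port: Int)

trait Membership {
  def members: Seq[Node]                      // cluster membership: who is currently in the cluster
}

trait Router {
  def owner(key: Array[Byte]): Node           // request routing: which node owns a key
}

trait ReplicationStrategy {
  def replicas(key: Array[Byte]): Seq[Node]   // sharding + replication: where copies of a key live
}

trait Transport {
  def call(node: Node, request: Array[Byte]): Array[Byte]   // node-to-node communication
}
```
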
  23. Introducing Suuchi: a DIY kit for building distributed systems. github.com/ashwanthkumar/suuchi

  24. Suuchi provides support for...
      - underlying communication channel
      - routing queries to the appropriate member
      - detecting your cluster members
      - replicating your data based on your strategy
      - local state via an embedded KV store per node (optionally)
      github.com/ashwanthkumar/suuchi

  25. Communication: + HandleOrForward + Scatter Gather; uses HTTP/2 with streaming

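As a rough illustration of those two patterns (not Suuchi's real API, which layers them on an HTTP/2 streaming channel), the sketch below reuses the toy Node, Router, Membership and Transport types from the earlier sketch.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Handle-or-forward: serve the request locally if this node owns the key,
// otherwise proxy it to the owning node.
class HandleOrForward(self: Node, router: Router, transport: Transport) {
  def serve(key: Array[Byte], request: Array[Byte])
           (handleLocally: Array[Byte] => Array[Byte]): Array[Byte] = {
    val owner = router.owner(key)
    if (owner == self) handleLocally(request)
    else transport.call(owner, request)
  }
}

// Scatter-gather: fan a request out to every cluster member and fold the
// responses into a single answer.
class ScatterGather(membership: Membership, transport: Transport)
                   (implicit ec: ExecutionContext) {
  def query[T](request: Array[Byte])(parse: Array[Byte] => T)
              (combine: (T, T) => T): Future[T] =
    Future.traverse(membership.members) { node =>
      Future(parse(transport.call(node, request)))
    }.map(_.reduce(combine))
}
```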

  26. Sharding / Routing: + Consistent Hash Ring - Your own sharding technique? (Ring diagram: node 1, node 2, node 3, node 4. Reference: "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web")

  29. Sharding / Routing: the same ring, now showing only node 2, node 3 and node 4 (same reference)

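A bare-bones version of the consistent hash ring these slides describe is sketched below; the hash function and virtual-node count are arbitrary illustrative choices, not what Suuchi ships with.

```scala
import java.util.zip.CRC32
import scala.collection.immutable.TreeMap

class ConsistentHashRing(nodes: Seq[String], vnodes: Int = 100) {
  private def hash(s: String): Long = {
    val crc = new CRC32()
    crc.update(s.getBytes("UTF-8"))
    crc.getValue
  }

  // Place each physical node on the ring `vnodes` times to even out the load.
  private val ring: TreeMap[Long, String] =
    TreeMap(nodes.flatMap(n => (0 until vnodes).map(i => hash(s"$n#$i") -> n)): _*)

  // A key is owned by the first node clockwise from the key's hash, wrapping around.
  def nodeFor(key: String): String = {
    val it = ring.iteratorFrom(hash(key))
    if (it.hasNext) it.next()._2 else ring.head._2
  }
}

// Removing node 1, as on the last slide, only remaps the keys node 1 owned:
// new ConsistentHashRing(Seq("node 2", "node 3", "node 4")).nodeFor("foo")
```
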
  30. Membership: static vs dynamic. Trade-offs: fault tolerance in case of node/process failure; scaling up/down needs downtime of the system

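One common reading of that trade-off is sketched below: static membership is a fixed list, so scaling up or down means a config change and a restart, while dynamic membership defers to a liveness source (gossip, heartbeats) and copes with nodes joining or failing at runtime. The types reuse the illustrative Membership and Node from the earlier sketch and are not Suuchi's own.

```scala
// Static membership: the cluster is whatever the configuration says.
class StaticMembership(configured: Seq[Node]) extends Membership {
  def members: Seq[Node] = configured   // changing the cluster size requires a restart
}

// Dynamic membership: backed by a failure detector / gossip layer, so nodes
// can join, leave or die without taking the system down.
class DynamicMembership(alive: () => Seq[Node]) extends Membership {
  def members: Seq[Node] = alive()
}
```
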
  31. Replication: sync / async*. Provides high availability for write-heavy systems at the cost of consistency. With sync replication, a request is successful only if all the replicas succeeded

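A minimal sketch of that synchronous rule, with illustrative types rather than Suuchi's own: every replica must accept the write before the request is acknowledged.

```scala
import scala.util.Try

// One replica of the data; in practice write() would go over the network.
trait Replica {
  def write(key: Array[Byte], value: Array[Byte]): Try[Unit]
}

class SyncReplicator(replicas: Seq[Replica]) {
  // All-or-nothing acknowledgement: a single failed replica fails the request.
  def put(key: Array[Byte], value: Array[Byte]): Boolean =
    replicas.map(_.write(key, value)).forall(_.isSuccess)
}

// An async variant would acknowledge after the local write and replicate in
// the background, trading consistency for availability.
```
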
  32. Storage: + KeyValue + RocksDB - Your own abstraction? (RocksDB: embedded KV store from FB for server workloads)

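For reference, this is roughly what an embedded per-node store looks like when driven directly through RocksDB's Java API (the rocksdbjni artifact); the path and keys are placeholders.

```scala
import org.rocksdb.{Options, RocksDB}

object EmbeddedStoreDemo extends App {
  RocksDB.loadLibrary()                                  // load the native library
  val options = new Options().setCreateIfMissing(true)
  val db = RocksDB.open(options, "/tmp/rocksdb-demo")    // one embedded store per node

  db.put("foo".getBytes("UTF-8"), "bar".getBytes("UTF-8"))
  val value = Option(db.get("foo".getBytes("UTF-8"))).map(new String(_, "UTF-8"))
  println(value)                                         // Some(bar)

  db.close()
}
```
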
  33. Suuchi @ Indix
      • HTML Archive
        ◦ Handles 1000+ tps - write heavy system
        ◦ Stores 120 TB of url & timestamp indexed HTML pages
      • Stats (as Monoids) Aggregation System
        ◦ Approximate real-time aggregates
        ◦ Timeline & windowed queries
      • Real time scheduler for our Crawlers
        ◦ Prioritising which batch of urls to crawl next
        ◦ Helps crawl 20+ million urls per day

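The "stats as monoids" idea is that each node's partial aggregate only needs an identity value and an associative merge, so partial results gathered from every shard can be combined in any order. A toy example, not the actual Indix pipeline:

```scala
trait Monoid[T] {
  def zero: T
  def merge(a: T, b: T): T
}

// A simple running statistic: count and sum are enough to recover the mean.
case class Stats(count: Long, sum: Double) {
  def mean: Double = if (count == 0) 0.0 else sum / count
}

object StatsMonoid extends Monoid[Stats] {
  val zero = Stats(0L, 0.0)
  def merge(a: Stats, b: Stats): Stats = Stats(a.count + b.count, a.sum + b.sum)
}

object ScatterGatherStats extends App {
  // Pretend these partials came back from three different shards.
  val partials = Seq(Stats(10, 120.0), Stats(4, 36.0), Stats(6, 84.0))
  val total = partials.foldLeft(StatsMonoid.zero)(StatsMonoid.merge)
  println(total.mean)   // 12.0
}
```
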
  34. The idea behind Suuchi (Membership, Request Routing, Sharding etc.): Ringpop from Uber, 2015; Gizzard from Twitter, 2011; Slicer from Google, 2016; Suuchi, 2016

  35. Thank you