Slide 1

Why we built a distributed system
DSConf 2018

Slide 2

Sriram Ramachandrasekaran
Principal Engineer, Indix
https://github.com/brewkode

Slide 3

1B+ Products | 67K+ Brands | 2.5B+ Offers | 6K+ Categories

Slide 4

Data Pipeline @ Indix: Crawl → Parse → Dedup → Classify → Extract → Match → Index

Slide 6

Desirable Properties
● Handle Scale - order of TBs
● Fault Tolerance
● Operability

Slide 7

Traditionally...
● Tiered architecture
● Scale individual tiers
● Until...

Slide 8

Traditionally...
● Tiered architecture
● Scale individual tiers
  ○ Web Tier
  ○ Service Tier
● Until...

Slide 10

Essentially, we are looking to scale data systems

Slide 11

Rise of KV Stores - distributed, replicated, fault-tolerant, sorted*
● BigTable, 2006
● Dynamo, 2007
● Cassandra, 2008
● Voldemort, 2009

Slide 12

[Diagram: three Services on top of a shared Distributed Data Store]

Slide 13

[Diagram: three Services on top of a shared Distributed Data Store, with Latency marked on the store access]

Slide 14

Distributed Service

Slide 15

Distributed Service
● Data locality kills latency
● Increases Application Complexity

Slide 16

Just having a distributed store isn’t enough! We need something more...

Slide 17

boils down to... Distributed Data Store + CoProcessors (Bigtable / HBase)
…run arbitrary code “next” to each shard

Slide 18

Distributed Data Store + CoProcessors (Bigtable / HBase)
- Business logic upgrades are painful
- CoProcessors are not services, more an afterthought
- Failure semantics are not well established
- More applications means multiple coprocessors or a single bloated coprocessor
- Noisy neighbours / Impedance due to a shared datastore

Slide 19

Applications need to OWN Scaling

Slide 20

In-house vs Off-the-shelf

               In-house                       Off-the-shelf
Features       Subset                         Superset
Moving parts   Fully Controllable             Community Controlled
Ownership      Implicit                       Acquired / Cultural
Upfront cost   High                           Low
Expertise      Hired / Retained / Nurtured    Community

Slide 21

Ashwanth Kumar
Principal Engineer, Indix
https://github.com/ashwanthkumar

Slide 22

Primitives in a Distributed System
● Communication
● Request Routing
● Data Sharding (key=”foo”, key=”bar”, key=”baz”)
● Replication (Sync / Async)
● Cluster Membership

Slide 23

Introducing Suuchi
DIY kit for building distributed systems
github.com/ashwanthkumar/suuchi

Slide 24

Suuchi provides support for...
- underlying communication channel
- routing queries to the appropriate member
- detecting your cluster members
- replicating your data based on your strategy
- local state via an embedded KV store per node (optionally)
github.com/ashwanthkumar/suuchi

Slide 25

Communication
+ HandleOrForward
+ Scatter-Gather
uses HTTP/2 with streaming
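
The deck stops at naming the patterns, so here is a minimal Scala sketch of the HandleOrForward idea: a node serves a request itself when it owns the key, and forwards it to the owner otherwise (Scatter-Gather is the same trick fanned out to all nodes). The Node and HandleOrForward names and the ownerOf function are illustrative stand-ins, not Suuchi's actual API.

```scala
// Hypothetical sketch of handle-or-forward routing, not Suuchi's real API.
case class Node(host: String, port: Int)

class HandleOrForward(self: Node, ownerOf: String => Node) {
  // Stand-ins for the node-local store and the HTTP/2 RPC client.
  private val local = scala.collection.mutable.Map.empty[String, String]

  def put(key: String, value: String): Unit = {
    val owner = ownerOf(key)
    if (owner == self) local(key) = value // we own this shard: handle it
    else forward(owner, key, value)       // otherwise: at most one extra hop
  }

  private def forward(owner: Node, key: String, value: String): Unit =
    println(s"forwarding $key to ${owner.host}:${owner.port}") // real RPC here
}

object RoutingDemo extends App {
  val n1 = Node("10.0.0.1", 5051)
  val n2 = Node("10.0.0.2", 5051)
  // Toy ownership function; a real system would ask the hash ring (next slide).
  val ownerOf = (key: String) => if (Math.floorMod(key.hashCode, 2) == 0) n1 else n2
  new HandleOrForward(n1, ownerOf).put("foo", "1")
}
```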

Slide 26

Sharding / Routing
+ Consistent Hash Ring
- Your own sharding technique?
[Diagram: ring of node 1, node 2, node 3, node 4]
Reference: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web

Slide 29

Sharding / Routing
+ Consistent Hash Ring
- Your own sharding technique?
[Diagram: the ring with node 2, node 3, node 4, after node 1 leaves]
Reference: Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
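
Since the slides only gesture at the ring diagram, below is a minimal Scala sketch of a consistent hash ring with virtual nodes; the class shape is illustrative, and Suuchi's own implementation may differ in detail. The property the last slide illustrates: when node 1 leaves, only the keys that hashed next to node 1 move; the rest of the keyspace stays put.

```scala
import java.util.TreeMap
import scala.util.hashing.MurmurHash3

// Illustrative consistent hash ring: nodes and keys hash onto one circle,
// and a key belongs to the first node clockwise from its hash.
class ConsistentHashRing(vnodes: Int = 100) {
  // Sorted map of hash -> node; the sorted order is the "ring".
  private val ring = new TreeMap[Int, String]()

  def add(node: String): Unit =
    (1 to vnodes).foreach(i => ring.put(MurmurHash3.stringHash(s"$node#$i"), node))

  def remove(node: String): Unit =
    (1 to vnodes).foreach(i => ring.remove(MurmurHash3.stringHash(s"$node#$i")))

  def nodeFor(key: String): String = {
    val e = ring.ceilingEntry(MurmurHash3.stringHash(key)) // first node clockwise
    (if (e != null) e else ring.firstEntry).getValue       // wrap around the ring
  }
}

object RingDemo extends App {
  val ring = new ConsistentHashRing()
  Seq("node1", "node2", "node3", "node4").foreach(ring.add)
  val before = ring.nodeFor("foo")
  ring.remove("node1")            // node 1 leaves, as on the slide
  val after = ring.nodeFor("foo") // unchanged unless node1 owned "foo"
  println(s"$before -> $after")
}
```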

Slide 30

Membership
● static: scaling up/down needs downtime of the system
● dynamic: fault tolerance in case of node/process failure
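
A small sketch to make the contrast concrete (the types below are illustrative, not Suuchi's API): with static membership the node list is frozen at startup, so resizing means a restart; with dynamic membership a failure detector updates the list at runtime.

```scala
case class Member(host: String, port: Int)

trait Membership {
  def members: Set[Member]
}

// Static: the list comes from config and never changes, so scaling
// up/down means taking the system down and restarting it.
class StaticMembership(configured: Set[Member]) extends Membership {
  def members: Set[Member] = configured
}

// Dynamic: joins and failures (detected via heartbeats or gossip) mutate
// the list at runtime, which is what buys fault tolerance on node failure.
class DynamicMembership extends Membership {
  @volatile private var current = Set.empty[Member]
  def members: Set[Member] = current
  def onJoin(m: Member): Unit = current += m    // called by the detector
  def onFailure(m: Member): Unit = current -= m // keeps routing tables fresh
}
```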

Slide 31

Replication
● sync: every request is successful only if all the replicas succeeded
● async*: provides high availability for write-heavy systems at the cost of consistency
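
A sketch of the two modes, assuming a writeTo function that performs the RPC to one replica (illustrative, not Suuchi's API): the sync path succeeds only when every replica write succeeded, while the async path acknowledges early and lets the remaining replicas catch up.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

class Replicator(replicas: Seq[String],
                 writeTo: (String, String, String) => Future[Unit]) {

  // Sync: fail the request unless ALL replicas acknowledged the write.
  def syncPut(key: String, value: String): Try[Unit] =
    Try(Await.result(Future.sequence(replicas.map(writeTo(_, key, value))), 5.seconds))
      .map(_ => ())

  // Async: acknowledge after the first replica; the rest are fire-and-forget.
  // Higher availability under write-heavy load, weaker consistency.
  def asyncPut(key: String, value: String): Future[Unit] = {
    val ack = writeTo(replicas.head, key, value)
    replicas.tail.foreach(writeTo(_, key, value))
    ack
  }
}
```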

Slide 32

Storage
+ KeyValue
+ RocksDB (embedded KV store from FB for server workloads)
- Your own abstraction?
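
On the JVM, embedding RocksDB per node means using its Java binding (org.rocksdb). A minimal usage example; the path and keys here are made up for the demo:

```scala
import org.rocksdb.{Options, RocksDB}

object RocksDemo extends App {
  RocksDB.loadLibrary() // load the native library once per process

  val options = new Options().setCreateIfMissing(true)
  val db = RocksDB.open(options, "/tmp/suuchi-demo-db") // one store per node

  db.put("foo".getBytes, "bar".getBytes)
  val value = db.get("foo".getBytes) // returns null when the key is absent
  println(new String(value))         // prints: bar

  db.close()
  options.close()
}
```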

Slide 33

Suuchi @ Indix
● HTML Archive
  ○ Handles 1000+ tps - write heavy system
  ○ Stores 120 TB of url & timestamp indexed HTML pages
● Stats (as Monoids) Aggregation System
  ○ Approximate real-time aggregates
  ○ Timeline & windowed queries
● Real-time scheduler for our Crawlers
  ○ Prioritising the next batch of urls to crawl
  ○ Helps crawl 20+ million urls per day
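
The deck doesn't expand on "Stats (as Monoids)", so here is a minimal sketch of why monoids fit a sharded aggregation system: an associative combine with an identity lets every node fold its shard independently and lets partial results merge in any order. The Monoid trait, Stats type, and demo values below are made up for illustration.

```scala
trait Monoid[T] {
  def zero: T                 // identity element
  def combine(a: T, b: T): T  // must be associative
}

// Count + sum is enough for running averages over a timeline window.
case class Stats(count: Long, sum: Double)

object StatsMonoid extends Monoid[Stats] {
  val zero = Stats(0, 0.0)
  def combine(a: Stats, b: Stats) = Stats(a.count + b.count, a.sum + b.sum)
}

object AggDemo extends App {
  // Partial aggregates computed independently on two shards...
  val shard1 = Stats(3, 30.0)
  val shard2 = Stats(2, 40.0)
  // ...merge into one global answer, regardless of grouping or order.
  val total = StatsMonoid.combine(shard1, shard2)
  println(total.sum / total.count) // average = 14.0
}
```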

Slide 34

Idea behind Suuchi - Membership, Request Routing, Sharding etc.
● Gizzard from Twitter, 2011
● Ringpop from Uber, 2015
● Slicer from Google, 2016
● Suuchi, 2016

Slide 35

Thank you