[Diagram: multiple services talking to a Distributed Data Store, with latency in between]
Slide 14
Slide 14 text
Distributed Service
Slide 15
Slide 15 text
Distributed Service
Data locality kills latency
Increases Application Complexity
Slide 16
Slide 16 text
Just having a distributed store isn’t enough!
We need something more...
Slide 17
Slide 17 text
boils down to...
Distributed Data Store + CoProcessors (Bigtable / HBase)
…run arbitrary code “next” to each shard
Slide 18
Slide 18 text
Distributed Data Store + CoProcessors
(Bigtable / HBase)
- Upgrading business logic is painful
- CoProcessors are not services; they are more of an afterthought
- Failure semantics are not well established
- More applications mean either multiple coprocessors or a single bloated coprocessor
- Noisy neighbours / impedance due to a shared datastore
Slide 19
Slide 19 text
Applications need to OWN scaling
Slide 20
Slide 20 text
In-house vs Off-the-shelf

                 In-house                       Off-the-shelf
Features         Subset                         Superset
Moving parts     Fully Controllable             Community Controlled
Ownership        Implicit                       Acquired / Cultural
Upfront cost     High                           Low
Expertise        Hired / Retained / Nurtured    Community
Slide 21
Slide 21 text
Ashwanth Kumar
Principal Engineer, Indix
https://github.com/ashwanthkumar
Slide 22
Slide 22 text
Primitives in a Distributed System
[Diagram: keys key=”foo”, key=”bar”, key=”baz” routed across cluster nodes]
- Communication
- Request Routing
- Sync / Async Replication
- Data Sharding
- Cluster Membership
Slide 23
Slide 23 text
Introducing Suuchi
DIY kit for building distributed systems
github.com/ashwanthkumar/suuchi
Slide 24
Slide 24 text
Suuchi
Provides support for ...
- the underlying communication channel
- routing queries to the appropriate member
- detecting your cluster members
- replicating your data based on your strategy
- local state via an embedded KV store per node (optional)
github.com/ashwanthkumar/suuchi
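The primitives in this list map onto a handful of small interfaces. Below is a minimal Scala sketch of what such building blocks could look like; the trait and method names are illustrative assumptions, not Suuchi's actual API.

    // Illustrative building blocks only; the names are assumptions, not Suuchi's API.
    trait MemberDetector {
      def members: Set[String]                                // the current cluster members
    }

    trait Router {
      def nodeFor(key: Array[Byte]): String                   // the member that owns a key
    }

    trait ReplicationStrategy {
      def replicasFor(key: Array[Byte], n: Int): Seq[String]  // pick n replicas for a key
    }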
Slide 25
Slide 25 text
Communication
+ HandleOrForward
+ Scatter Gather
uses HTTP/2 with streaming
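HandleOrForward serves a request locally when the current node owns the key and proxies it to the owning node otherwise; Scatter Gather fans a request out to many nodes and merges their responses. A rough Scala sketch of the handle-or-forward decision, using hypothetical names rather than Suuchi's real classes:

    // Hypothetical sketch: handle the request on this node if it owns the key,
    // otherwise forward it to the owner over the HTTP/2 channel.
    object HandleOrForwardSketch {
      def serve(self: String, ownerOf: String => String)(key: String, value: String): Unit = {
        val owner = ownerOf(key)
        if (owner == self)
          println(s"$self: handling key '$key' locally")             // local read/write path
        else
          println(s"$self: forwarding '$key' to $owner over HTTP/2") // proxy to the owner
      }
    }

    // usage: ownership is decided by whatever router/ring is plugged in
    // HandleOrForwardSketch.serve("node1", k => if (k.hashCode % 2 == 0) "node1" else "node2")("foo", "v1")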
Slide 26
Slide 26 text
Sharding / Routing
+ Consistent Hash Ring
- Your own sharding technique?
[Diagram: consistent hash ring with node 1, node 2, node 3, node 4]
Reference: "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web"
Slide 29
Slide 29 text
Sharding / Routing
+ Consistent Hash Ring
- Your own sharding technique?
[Diagram: consistent hash ring with node 2, node 3, node 4]
Reference: "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web"
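A consistent hash ring places each node on a hash ring (usually as several virtual points) and assigns every key to the first node clockwise from the key's hash, so adding or removing a node only remaps the keys that node owned. A self-contained Scala sketch of the idea, not Suuchi's implementation:

    import scala.collection.immutable.TreeMap
    import scala.util.hashing.MurmurHash3

    // Minimal consistent hash ring: each node gets `virtualNodes` points on the ring,
    // and a key belongs to the first point clockwise from the key's hash.
    class ConsistentHashRing(nodes: Seq[String], virtualNodes: Int = 100) {
      private val ring: TreeMap[Int, String] = TreeMap(
        (for {
          node <- nodes
          i    <- 0 until virtualNodes
        } yield MurmurHash3.stringHash(s"$node#$i") -> node): _*
      )

      def nodeFor(key: String): String = {
        val h = MurmurHash3.stringHash(key)
        val clockwise = ring.rangeFrom(h)                          // points at or after the key
        (if (clockwise.nonEmpty) clockwise.head else ring.head)._2 // wrap around the ring
      }
    }

    // usage: when "node 1" leaves, only the keys it owned move to its clockwise neighbour
    // val ring = new ConsistentHashRing(Seq("node 1", "node 2", "node 3", "node 4"))
    // ring.nodeFor("key=foo")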
Slide 30
Slide 30 text
Membership
static vs dynamic
- dynamic: fault tolerance in case of node/process failure
- static: scaling up/down needs downtime of the system
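As a rough illustration of the trade-off, the sketch below contrasts a fixed node list with runtime discovery; the names are hypothetical, not Suuchi's API.

    // Static membership: the node list is fixed at startup, so scaling up/down
    // means a restart. Dynamic membership discovers members at runtime (gossip,
    // a registry, ...), so failed nodes drop out and new nodes join without downtime.
    trait Membership {
      def members: Set[String]
    }

    class StaticMembership(nodes: Set[String]) extends Membership {
      def members: Set[String] = nodes
    }

    class DynamicMembership(discover: () => Set[String]) extends Membership {
      def members: Set[String] = discover()
    }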
Slide 31
Slide 31 text
Replication
Provides high availability for write-heavy systems at the cost of consistency
sync vs async*
- sync: every request is successful only if all the replicas succeeded
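In other words, a synchronous write is acknowledged only after every replica has accepted it, while an asynchronous write is acknowledged immediately and propagated in the background. A minimal Scala sketch of the synchronous case, with illustrative names only:

    import scala.util.Try

    // Synchronous replication: the write succeeds only if every replica accepted it.
    // An async strategy would acknowledge first and replicate in the background.
    object SyncReplication {
      def write(replicas: Seq[String], writeTo: String => Try[Unit]): Boolean =
        replicas.map(writeTo).forall(_.isSuccess)   // all replicas must succeed
    }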
Slide 32
Slide 32 text
Storage
+ KeyValue
+ RocksDB
- Your own abstraction?
RocksDB is an embedded KV store from Facebook, built for server workloads
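A minimal Scala sketch of such a storage abstraction with an embedded RocksDB backend (assuming the org.rocksdb:rocksdbjni dependency); the trait illustrates the "your own abstraction" point and is not Suuchi's exact interface.

    import org.rocksdb.RocksDB

    // Pluggable key-value abstraction with a RocksDB-backed implementation.
    trait KVStore {
      def get(key: Array[Byte]): Option[Array[Byte]]
      def put(key: Array[Byte], value: Array[Byte]): Unit
    }

    class RocksDbStore(path: String) extends KVStore {
      RocksDB.loadLibrary()                   // load the native library once per JVM
      private val db = RocksDB.open(path)     // opens (or creates) the store at `path`

      def get(key: Array[Byte]): Option[Array[Byte]] = Option(db.get(key))   // null -> None
      def put(key: Array[Byte], value: Array[Byte]): Unit = db.put(key, value)
      def close(): Unit = db.close()
    }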
Slide 33
Slide 33 text
Suuchi @ Indix
● HTML Archive
○ Handles 1000+ TPS (write-heavy system)
○ Stores 120 TB of URL- and timestamp-indexed HTML pages
● Stats (as Monoids) Aggregation System
○ Approximate real-time aggregates
○ Timeline & windowed queries
● Real-time scheduler for our Crawlers
○ Prioritises the next batch of URLs to crawl
○ Helps crawl 20+ million URLs per day
Slide 34
Slide 34 text
Ringpop from Uber, 2015
Gizzard from Twitter, 2011
Slicer from Google, 2016
Suuchi, 2016
Idea behind Suuchi: Membership, Request Routing, Sharding, etc.