Suuchi - Application Layer Sharding

Ashwanth Kumar (@_ashwanthkumar), Principal Engineer
Suuchi: toolkit for application layer sharding

From simple to unmanageable beasts
How do we scale systems? An opinionated view

[Diagram: a single service growing into two, then three services]

Rise of KV stores: distributed, replicated, fault-tolerant (optionally sorted)
- 2006: BigTable from Google
- 2007: Dynamo from Amazon
- 2008: Cassandra from Facebook
- 2009: VoldemortDB from LinkedIn

[Diagram: three services backed by distributed (NoSQL) datastores]

[Diagram: the same picture, annotated with network latency]

data locality for low latency / data intensive applications

Co-locate data to improve performance
[Diagram: three services]

Sharded and replicated to improve throughput
[Diagram: three services]

Deal with complex distributed system problems at the application layer
[Diagram: three services]

suuchi
github.com/ashwanthkumar/suuchi
Toolkit for application layer sharding

components - transport
- Uses HTTP/2 with streaming

components - membership
- static: scaling up/down needs downtime of the system
- dynamic: fault tolerance in case of node/process failure

components - request routing
Consistent hashing, from "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web"
[Diagram: hash ring with node 2, node 1, node 3 and node 4]
[Diagram: the same ring with node 1 gone, leaving node 2, node 3 and node 4]
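The two ring diagrams can be made concrete with a small sketch. This is a minimal illustration of consistent hashing, not Suuchi's actual implementation: the class name `ConsistentHashRing`, the `virtualNodes` parameter and the choice of MD5 are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Illustrative consistent-hash ring (hypothetical; not Suuchi's real class).
class ConsistentHashRing {
    // Ring position -> node name, kept sorted so we can walk "clockwise".
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // assumed tuning knob: ring points per node

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Map a string to a ring position using the first 8 bytes of its MD5 digest.
    static long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // The owner is the first ring point at or after the key's position,
    // wrapping around to the start of the ring if there is none.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With node 1 to node 4 on the ring, calling `removeNode("node1")` only reassigns the keys that node 1 owned; every other key keeps its owner, which is the property the second diagram illustrates.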

components - request routing
- Peer to Peer system
- No single point of contact
- Each node handles or forwards requests transparently
- Uses pluggable partitioner scheme
- Can be customized as weighted distribution / Rendezvous Hash etc.
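The handle-or-forward behaviour with a pluggable partitioner might look roughly like the sketch below. The `Partitioner` interface, `RendezvousPartitioner` and `RoutingNode` are hypothetical names invented for this example, and Suuchi's real API may differ; the rendezvous (highest-random-weight) scoring uses MD5 purely as an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical plug-in point: any scheme that maps a key to one member node.
interface Partitioner {
    String ownerOf(String key, List<String> nodes);
}

// Rendezvous hashing: score every (node, key) pair and pick the highest score.
class RendezvousPartitioner implements Partitioner {
    static long score(String node, String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest((node + "|" + key).getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public String ownerOf(String key, List<String> nodes) {
        String best = null;
        long bestScore = 0;
        for (String node : nodes) {
            long s = score(node, key);
            // Ties are broken by node name so every peer reaches the same answer.
            if (best == null || s > bestScore
                    || (s == bestScore && node.compareTo(best) < 0)) {
                best = node;
                bestScore = s;
            }
        }
        return best;
    }
}

// A peer that serves a key itself when it is the owner, else names the peer to forward to.
class RoutingNode {
    private final String self;
    private final List<String> members;
    private final Partitioner partitioner;

    RoutingNode(String self, List<String> members, Partitioner partitioner) {
        this.self = self;
        this.members = members;
        this.partitioner = partitioner;
    }

    String route(String key) {
        String owner = partitioner.ownerOf(key, members);
        return owner.equals(self)
                ? "handled by " + self
                : "forwarded from " + self + " to " + owner;
    }
}
```

Because every peer computes the same owner from the same membership list, a client can send a request to any node, which matches the "no single point of contact" bullet. A weighted scheme would plug in as another `Partitioner` implementation.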

getting started
- Define a gRPC service using proto2 (or proto3)
- Generate the stubs in Java / Scala
- Implement the services
- Connect them together using Suuchi's Server abstraction

Let’s see some code (see references for actual links)

suuchi.proto

implement proto service

connect using Server abstraction

suuchi @indix
- HTML Archive System
  - Handles 1000+ rps, write heavy system
  - Stores ~120TB of url and timestamp indexed HTML pages
- Stats (as Monoids) Storage System*
  - All we wanted was approximate aggregates in real time
- Real-time scheduler for our crawlers*
  - Finds out which 20 urls to crawl now out of 3+ billion urls
  - Helps the crawler crawl 20+ million urls every day
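To illustrate what "Stats (as Monoids)" can mean in practice, here is a minimal sketch of an aggregate that merges associatively and has an identity element. The `Stats` class and its fields (count, sum, min, max) are assumptions for the example, not the schema used at Indix, and this version is exact rather than approximate.

```java
import java.util.List;

// Hypothetical stats value forming a monoid: EMPTY is the identity, plus() is associative.
final class Stats {
    static final Stats EMPTY = new Stats(0, 0, Long.MAX_VALUE, Long.MIN_VALUE);

    final long count;
    final long sum;
    final long min;
    final long max;

    Stats(long count, long sum, long min, long max) {
        this.count = count;
        this.sum = sum;
        this.min = min;
        this.max = max;
    }

    // Lift a single observation into a Stats value.
    static Stats of(long value) {
        return new Stats(1, value, value, value);
    }

    // Merge two partial aggregates; the grouping of merges does not change the result.
    Stats plus(Stats other) {
        return new Stats(
                count + other.count,
                sum + other.sum,
                Math.min(min, other.min),
                Math.max(max, other.max));
    }

    // Fold any number of partial aggregates, e.g. one per shard.
    static Stats sumAll(List<Stats> parts) {
        Stats acc = EMPTY;
        for (Stats s : parts) {
            acc = acc.plus(s);
        }
        return acc;
    }
}
```

Since the merge is associative, each shard can keep its own partial `Stats` and the partials can be combined in any grouping when a query arrives.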

Idea behind suuchi: membership, request routing / sharding
- 2011: Gizzard from Twitter
- 2015: RingPop from Uber
- 2016: Slicer from Google
- 2016: Suuchi

Questions?
References available at github.com/ashwanthkumar/suuchi-sharding-talk
