suuchi - distributed data systems toolkit

Slide 1

suuchi - distributed data systems toolkit
bit.ly/suuchi-toolkit

Slide 2

blank
yep, it’s intentional

Slide 3

ashwanth kumar
@_ashwanthkumar
ashwanthkumar.in
principal engineer

Slide 4

data system paradigms
things you always knew but never heard

Slide 5

- data shipping
- function shipping

Slide 6

sum of numbers on data shipping paradigm

Slide 7

data shipping paradigm
select col from table;

Slide 8

data shipping paradigm
select col from table;
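To make the data shipping paradigm concrete, here is a minimal Python sketch. The three-node `NODES` layout and the `fetch_column` helper are hypothetical stand-ins for a real datastore; the point is only that every row travels to the caller before the sum is computed.

```python
# Hypothetical cluster: each list stands in for one node's slice of `col`.
NODES = {
    "node-1": [1, 2, 3],
    "node-2": [4, 5],
    "node-3": [6, 7, 8, 9],
}


def fetch_column(node):
    """Simulate `select col from table;` on one node: all rows cross the network."""
    return list(NODES[node])


def sum_with_data_shipping():
    """Pull the raw rows from every node and aggregate on the caller."""
    total = 0
    rows_transferred = 0
    for node in NODES:
        rows = fetch_column(node)       # raw data is shipped to the caller
        rows_transferred += len(rows)
        total += sum(rows)              # aggregation happens on the caller
    return total, rows_transferred


total, rows_transferred = sum_with_data_shipping()
print(total, rows_transferred)
```

The number of values on the wire grows with the size of the table, which is what the function shipping paradigm avoids.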

Slide 10

sum of numbers on function shipping paradigm

Slide 11

function shipping paradigm
select sum(col) from table;

Slide 13

function shipping paradigm
select sum(col) from table;
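A minimal Python sketch of the function shipping paradigm for the same sum. The `NODES` layout and the `run_on_node` helper are hypothetical; in a real system the function would be sent to the process that owns the data, and only its result would come back.

```python
# Hypothetical cluster: each list stands in for one node's slice of `col`.
NODES = {
    "node-1": [1, 2, 3],
    "node-2": [4, 5],
    "node-3": [6, 7, 8, 9],
}


def run_on_node(node, fn):
    """Simulate shipping `fn` to a node: only fn's result crosses the network."""
    return fn(NODES[node])


def sum_with_function_shipping():
    """Compute a partial sum next to the data, then combine the partials."""
    partials = [run_on_node(node, sum) for node in NODES]
    return sum(partials), len(partials)


total, values_transferred = sum_with_function_shipping()
print(total, values_transferred)
```

Here one value per node is transferred regardless of how many rows each node holds.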

Slide 14

data locality
for low latency / data intensive applications

Slide 15

data shipping
- traditional 3 tier applications
- state is maintained outside the app
- usually the dbs become the bottleneck
- resort to pre-computes for performance, increasing complexity

function shipping
- data locality
- low latency
- high performance
- low network transfer
- modern big-data compute systems
  - Hadoop MR
  - Spark
  - Storm

Slide 16

recursive reduction
aggregations in distributed systems

Slide 17

recursive reduction
select sum(col) from table;

Slide 19

recursive reduction
select sum(col) from table;

Slide 20

recursive reduction
- sum / multiplication / custom aggregation
- (sorted) top-K elements
- operations on a graph
  - eg. link reach on twitter graph
- any operation that is both associative and commutative
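A minimal Python sketch of recursive reduction, assuming each node has already produced a partial result. The `recursive_reduce` and `merge_top_k` helpers and the sample values are hypothetical; the sketch relies on the combine operation being associative and commutative, as the list above requires, so partials can be merged in any grouping.

```python
import heapq
from functools import reduce


def recursive_reduce(partials, combine, fan_in=2):
    """Combine partial results level by level, `fan_in` at a time, until one is left."""
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    while len(level) > 1:
        level = [
            reduce(combine, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


def merge_top_k(k):
    """Build a combine function that keeps the k largest elements of two partials."""
    def combine(left, right):
        return heapq.nlargest(k, left + right)
    return combine


# Hypothetical per-node values.
node_values = [[5, 1, 9], [7, 3], [8, 2, 6], [4]]

# Sum: every node reduces locally, then the partial sums are reduced again.
node_sums = [sum(values) for values in node_values]
total = recursive_reduce(node_sums, lambda a, b: a + b)

# Sorted top-K: every node keeps its own top 2, then the partials are merged.
node_top2 = [heapq.nlargest(2, values) for values in node_values]
top2 = recursive_reduce(node_top2, merge_top_k(2))

print(total, top2)
```

The same helper works for any aggregation whose partial results can be merged pairwise, which is why sums and top-K fit the pattern.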

Slide 21

real world examples
Modelled after Bigtable
Built for key based lookup
Later added CoProc
- Limited capability
- External dependencies like Apache Phoenix

Slide 22

real world aggregations
- Hive / Hadoop MR / Spark / Storm
- Provide a generic computing framework with UDF support
- Datastores provide Hadoop integrations
- Optimized for batch processing but hardly for serving online content
- Lot of operational overhead
- And still no data locality :(

Slide 23

suuchi
toolkit for building distributed function shipping applications
github.com/ashwanthkumar/suuchi

Slide 24

blocks

Slide 25

components - transport
uses http/2 with streaming

Slide 26

components - membership
static
dynamic
fault tolerance in case of node/process failure
scaling up/down needs downtime of the system

Slide 27

components - dynamic membership
raft
swim
consistency vs availability

Slide 28

components - request routing
Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
[diagram: node 1, node 2, node 3, node 4]

Slide 32

components - request routing
Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
[diagram: node 2, node 3, node 4]
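The paper cited on these slides introduced consistent hashing. Below is a minimal Python ring sketch, not suuchi's actual implementation: the `HashRing` class, the SHA-256 based hash, and the 64 virtual nodes per node are all assumptions made for illustration. It shows the property the last slide hints at by dropping node 1: only the keys owned by the removed node are re-routed.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop every point that belongs to this node."""
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        """Route a key to the first point at or after its position, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


nodes = ["node 1", "node 2", "node 3", "node 4"]
ring = HashRing(nodes)
keys = [f"key-{i}" for i in range(1000)]

before = {key: ring.node_for(key) for key in keys}
ring.remove("node 1")
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys changed owner")
```

Keys that were routed to node 2, node 3, or node 4 keep their owner, so removing a node does not reshuffle the whole key space.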

Slide 33

components - replication
[diagram: Req, Res]

Slide 34

components - store
store agnostic ops
- Get
- Put / Remove
WIP
- Scan
- Batch
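One way to picture store agnostic ops is an interface that any backing store can implement. The sketch below is a hypothetical Python rendering, not suuchi's actual API: the `Store` and `InMemoryStore` names, the byte-string keys and values, and the prefix-based `scan` are assumptions. Batch is left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple


class Store(ABC):
    """Hypothetical store-agnostic interface covering Get, Put, Remove and Scan."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def remove(self, key: bytes) -> None: ...

    @abstractmethod
    def scan(self, prefix: bytes = b"") -> Iterator[Tuple[bytes, bytes]]: ...


class InMemoryStore(Store):
    """Dictionary-backed implementation, useful for tests."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def remove(self, key: bytes) -> None:
        self._data.pop(key, None)  # removing a missing key is a no-op

    def scan(self, prefix: bytes = b"") -> Iterator[Tuple[bytes, bytes]]:
        # Yield matching entries in key order.
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


store = InMemoryStore()
store.put(b"user:1", b"a")
store.put(b"user:2", b"b")
store.put(b"item:1", b"c")
store.remove(b"user:2")
print(list(store.scan(b"user:")))
```

Code written against the interface does not need to know whether the data lives in memory or in an embedded store on disk.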

Slide 35

components - store
Embedded
Fast persistent KV store
Optimized for SSDs
suuchi-rocksdb
VersionedStore
ShardedStore
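The slide only names ShardedStore, so the following is a guess at the general idea rather than a description of suuchi's class: assuming "sharded" means partitioning keys across several underlying stores inside one process, a sketch could look like this. `ShardedDictStore`, the CRC32-based shard choice, and the plain dictionaries standing in for embedded stores are all hypothetical.

```python
import zlib
from typing import Dict, List, Optional


class ShardedDictStore:
    """Hypothetical wrapper that spreads keys over several underlying stores."""

    def __init__(self, num_shards: int = 4) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Plain dicts stand in for independent embedded stores.
        self._shards: List[Dict[bytes, bytes]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: bytes) -> Dict[bytes, bytes]:
        # A stable hash keeps each key on the same shard across calls.
        return self._shards[zlib.crc32(key) % len(self._shards)]

    def get(self, key: bytes) -> Optional[bytes]:
        return self._shard_for(key).get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._shard_for(key)[key] = value

    def remove(self, key: bytes) -> None:
        self._shard_for(key).pop(key, None)

    def shard_sizes(self) -> List[int]:
        return [len(shard) for shard in self._shards]


sharded = ShardedDictStore(num_shards=4)
for i in range(100):
    sharded.put(f"key-{i}".encode(), str(i).encode())
print(sharded.shard_sizes())
```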

Slide 36

how?
- define a gRPC service using proto2 (or proto3)
- generate the stubs in java / scala
- implement the services
- connect them together using Suuchi
  - Server

Slide 37

see some code
see notes section for links

Slide 38

suuchi @indix
- Used to build Finder
- Internal HTML archive store
- Handles 1000+ rps
- write heavy system
- Stores 120TB of data across 10 nodes

Slide 39

questions?
references available at github.com/ashwanthkumar/suuchi-talk