Slide 1

Slide 1 text

A distributed systems challenge: Stripe CTF
Bogdan Gâza
Big Data #5

Slide 2

Slide 2 text

$whoami

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Level 0
The mysterious program

Slide 5

Slide 5 text


Slide 6

Slide 6 text

Different hash: open addressing
[Diagram: keys v and x mapped to table slots by hash(v) and hash(x)]
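Open addressing can be sketched as follows. This is a minimal illustration, not the code used in the talk: it assumes linear probing as the probing strategy, and the class name `OpenAddressingTable` and its methods are hypothetical. It does not resize or support deletion.

```python
class OpenAddressingTable:
    """Hash table with linear probing (one flavour of open addressing)."""

    def __init__(self, capacity=8):
        # Each slot holds one (key, value) pair, or None when empty.
        self.slots = [None] * capacity

    def _probe(self, key):
        # Start at hash(key) and walk forward, wrapping around, until we
        # find either the key itself or an empty slot.
        start = hash(key) % len(self.slots)
        for step in range(len(self.slots)):
            index = (start + step) % len(self.slots)
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                return index
        raise RuntimeError("table is full")

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key, default=None):
        entry = self.slots[self._probe(key)]
        return default if entry is None else entry[1]
```

When two keys hash to the same slot, the second one simply moves to the next free slot, so all entries live in a single flat array with no per-bucket lists.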

Slide 7

Slide 7 text

Bloom filters
N hash functions: k1, k2, k3, ..., kn
Probabilistic: false positives are possible, false negatives are not
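A Bloom filter with several hash functions can be sketched as below. This is an illustrative sketch under stated assumptions: the hash functions are derived by salting SHA-1 with an index, and the names `BloomFilter`, `add`, and `might_contain` are hypothetical rather than taken from the talk.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

An added item always sets all of its bits, which is why a lookup can never miss it. An item that was never added can still find all of its bits set by other items, which is where false positives come from.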

Slide 8

Slide 8 text

Level 1
Gitcoins

Slide 9

Slide 9 text

Level 1
Gitcoins

Slide 10

Slide 10 text


Slide 11

Slide 11 text

SHA1: ridiculously parallel
bash: 400 Hash/s
Stripe Go miners: 1.9 MHash/s
GPU: 1-2 GHash/s
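The brute-force search behind these hash rates can be sketched as follows. This is a simplified illustration, not the level's actual miner: it assumes the goal is a SHA-1 hex digest that sorts below a difficulty string, and it hashes a plain payload plus a nonce rather than a real Git commit object. The function name `mine` and its parameters are hypothetical.

```python
import hashlib
from itertools import count


def mine(payload: bytes, difficulty_hex: str, max_tries=None):
    """Return (nonce, digest) whose SHA-1 hex digest sorts below difficulty_hex.

    Returns None if max_tries is exhausted first.
    """
    tries = count() if max_tries is None else range(max_tries)
    for nonce in tries:
        digest = hashlib.sha1(payload + str(nonce).encode("ascii")).hexdigest()
        if digest < difficulty_hex:
            return nonce, digest
    return None
```

Each nonce is tested independently of every other, so the search splits cleanly across cores or GPU threads by giving each worker its own nonce range.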

Slide 12

Slide 12 text

Level 2
DDoS Defense

Slide 13

Slide 13 text

[Diagram: a proxy in front of three nodes]

Slide 14

Slide 14 text

Solution
Load balancing algorithm
4 requests per IP
Requests less than 25 ms apart
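One possible reading of these two thresholds is a per-IP filter in the proxy. The slide does not say how the limits were combined, so the sketch below is an assumption: it rejects a request when the IP has already used its 4 requests, or when the request arrives less than 25 ms after the previous one from the same IP. The class name `IpFilter` and its fields are hypothetical.

```python
class IpFilter:
    def __init__(self, max_requests=4, min_gap_s=0.025):
        self.max_requests = max_requests  # assumed cap per IP
        self.min_gap_s = min_gap_s        # assumed minimum spacing, in seconds
        self.counts = {}
        self.last_seen = {}

    def allow(self, ip, now):
        """Record a request from ip at time now (seconds); True if it may pass."""
        previous = self.last_seen.get(ip)
        self.last_seen[ip] = now
        seen = self.counts.get(ip, 0) + 1
        self.counts[ip] = seen
        too_many = seen > self.max_requests
        too_fast = previous is not None and (now - previous) < self.min_gap_s
        return not (too_many or too_fast)
```

Requests that pass the filter would then be handed to the load balancing algorithm, which picks one of the backend nodes.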

Slide 15

Slide 15 text

Level 3
Instant code search

Slide 16

Slide 16 text

[Diagram: load balancer (LB) in front of three indexers]
4 minutes to index
4 nodes with 500 MB of RAM
Scala
Latency-based scoring
To pass the level: < 0.15 s per query
Around 100M words
Arbitrary substring

Slide 17

Slide 17 text

Problem
Twitter stack: Finagle / TwitterServer

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Solution
1st approach: inverted index vs substring search
Trie vs substring search
marisa / patricia Trie / radix tree / suffix tree vs substring search
DAWG vs substring search
Sharding
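A word-level inverted index only answers whole-word queries, so substring search needs an index over suffixes. The sketch below is a deliberately naive illustration of that idea, not the structure used in the talk: it stores every suffix of every distinct word in a sorted list and binary-searches for the query as a prefix. The function names `build_suffix_index` and `search` are hypothetical.

```python
from bisect import bisect_left


def build_suffix_index(words):
    """Sorted list of (suffix, word) pairs for every suffix of every word."""
    return sorted(
        (word[i:], word) for word in set(words) for i in range(len(word))
    )


def search(index, needle):
    """Return the set of words that contain needle as a substring."""
    # (needle,) sorts just before any (suffix, word) whose suffix starts
    # with needle, so bisect_left lands on the first candidate.
    start = bisect_left(index, (needle,))
    matches = set()
    for suffix, word in index[start:]:
        if not suffix.startswith(needle):
            break
        matches.add(word)
    return matches
```

For example, `search(build_suffix_index(["search", "sharding", "arch"]), "arch")` returns `{"search", "arch"}`. Storing every suffix as its own string costs far more memory than the words themselves. That cost is presumably why compact structures such as tries, suffix trees, and DAWGs, plus sharding across nodes, are on the list when each node has only 500 MB of RAM.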

Slide 20

Slide 20 text

Level 4
SQLCluster

Slide 21

Slide 21 text

[Diagram: SQLite nodes connected over the network]
Unreliable network!
octopus simulates: netsplit / lagsplit / SPOF

Slide 22

Slide 22 text

Consensus
Reliability in the presence of faulty processes
Examples:
who can commit to the DB
who is the leader
state machine replication
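The "who can commit" and "who is the leader" questions are typically settled by a majority quorum. The sketch below illustrates only that counting rule, under the assumption of a fixed, known cluster size. It is not a consensus protocol, and the function names `has_quorum` and `elect_leader` are hypothetical.

```python
def has_quorum(acks, cluster_size):
    """True when strictly more than half of the cluster has acknowledged."""
    return acks > cluster_size // 2


def elect_leader(votes, cluster_size):
    """Return the candidate holding a majority of votes, or None.

    votes maps each voting node to the candidate it voted for.
    """
    tally = {}
    for candidate in votes.values():
        tally[candidate] = tally.get(candidate, 0) + 1
    for candidate, total in tally.items():
        if has_quorum(total, cluster_size):
            return candidate
    return None
```

Because any two majorities of the same cluster overlap in at least one node, two sides of a netsplit cannot both reach a quorum, so at most one side can commit or elect a leader.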

Slide 23

Slide 23 text

Consensus
ZAB - High-performance broadcast for primary-backup systems - 2011
Raft - Understandable consensus algorithm - 2013
Paxos - The Part-Time Parliament - Leslie Lamport '90

Slide 25

Slide 25 text


Slide 26

Slide 26 text


Slide 27

Slide 27 text


Slide 28

Slide 28 text

Thanks!