A distributed systems challenge:
Stripe CTF
Bogdan Gâza
Big Data #5 Bogdan Gâza
$whoami
Level 0
The mysterious program
Different hash: open addressing
Diagram: values v and x stored in the slots given by hash(v) and hash(x)
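Open addressing keeps every value inside the table itself and resolves collisions by probing for another slot. The following is a minimal sketch, assuming linear probing and a fixed capacity with no deletion or resizing; the class name `OpenAddressingSet` is hypothetical and is not taken from the level's code.

```python
class OpenAddressingSet:
    """Fixed-capacity hash set using open addressing with linear probing.

    Assumption for this sketch: None is never stored, because None marks
    an empty slot.
    """

    def __init__(self, capacity=8):
        self.slots = [None] * capacity

    def _probe(self, value):
        # Start at hash(value) and walk forward, wrapping around, until we
        # find the value itself or an empty slot. Returns None if the table
        # is full and the value is absent.
        start = hash(value) % len(self.slots)
        for step in range(len(self.slots)):
            index = (start + step) % len(self.slots)
            if self.slots[index] is None or self.slots[index] == value:
                return index
        return None

    def add(self, value):
        index = self._probe(value)
        if index is None:
            raise RuntimeError("table is full")
        self.slots[index] = value

    def __contains__(self, value):
        index = self._probe(value)
        return index is not None and self.slots[index] == value
```

With a capacity of 8, the integers 8 and 16 both hash to slot 0, so the second one lands in the next free slot.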
Bloom filters
N hash functions
k1, k2, k3, ..., kn
probabilistic
false positives / no false negatives
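The properties listed for Bloom filters can be sketched as follows. This is an illustrative version, not the talk's solution: the class name `BloomFilter`, the default sizes, and the choice to derive the N hash functions by salting SHA-1 with an index are all assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumption: simulate N independent hash functions by prefixing
        # the item with the function index before hashing.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))
```

An item that was added always sets all of its bits, which is why a negative answer is reliable while a positive answer is only probable.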
Level 1
Gitcoins
SHA-1: ridiculously parallel
bash: 400 Hash/s
Stripe Go miners: 1.9 MHash/s
GPU: 1-2 GHash/s
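Mining parallelizes so well because every candidate can be hashed independently. The sketch below assumes the goal is a Git commit object whose SHA-1 hex digest sorts below a difficulty string; the function names, the nonce format, and the `start`/`stride` partitioning are hypothetical. Git's object hashing (SHA-1 over a `commit <length>\0` header followed by the body) is standard.

```python
import hashlib
from itertools import count


def git_object_sha1(body: bytes) -> str:
    # Git hashes "commit <byte length>\0" followed by the commit body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def mine(base_body: bytes, difficulty: str, start: int = 0, stride: int = 1):
    """Try nonces start, start+stride, ... until the digest sorts below
    `difficulty` (lexicographic comparison of hex strings)."""
    for nonce in count(start, stride):
        body = base_body + str(nonce).encode("ascii") + b"\n"
        digest = git_object_sha1(body)
        if digest < difficulty:
            return nonce, digest
```

To use several workers, give worker i out of n the arguments `start=i, stride=n`, so that no two workers ever test the same nonce.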
Level 2
DDoS Defense
Proxy
Node
Node
Node
Load balancing algorithm
4 req / ip
requests < 25 ms apart
Solution
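One possible reading of the two thresholds above is a per-IP filter in the proxy: block an address once it sends more than 4 requests, or sends two requests less than 25 ms apart. The sketch below assumes that interpretation; the class name `Shield` and its method are hypothetical and do not come from the level's code.

```python
class Shield:
    """Per-IP request filter based on a request count and a minimum gap."""

    def __init__(self, max_requests=4, min_gap_ms=25.0):
        self.max_requests = max_requests
        self.min_gap_ms = min_gap_ms
        self.counts = {}     # ip -> number of requests seen
        self.last_seen = {}  # ip -> timestamp of the previous request (ms)
        self.blocked = set()

    def allow(self, ip, now_ms):
        if ip in self.blocked:
            return False
        last = self.last_seen.get(ip)
        self.last_seen[ip] = now_ms
        self.counts[ip] = self.counts.get(ip, 0) + 1
        too_fast = last is not None and now_ms - last < self.min_gap_ms
        too_many = self.counts[ip] > self.max_requests
        if too_fast or too_many:
            # Assumption: once flagged, an IP stays blocked.
            self.blocked.add(ip)
            return False
        return True
```

Requests that pass the filter would then be forwarded to one of the backend nodes.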
Level 3
Instant code search
LB
Indexer
Indexer
Indexer
4 minutes to index
4 nodes with 500MB of RAM
Scala
Latency based scoring
To pass the level: < 0.15 s / query
Around 100M words
Arbitrary substring
Twitter stack: Finagle / TwitterServer
Problem
1st approach: inverted index vs substring search
Solution
Trie vs substring search
MARISA / Patricia trie / radix tree / suffix tree vs substring search
DAWG vs substring search
Sharding
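A word-level inverted index only answers whole-word queries, while the level asks for arbitrary substrings. As a rough illustration of the suffix-based structures listed above, here is a sorted suffix list, a simpler relative of the suffix tree. It is a sketch and not the talk's solution; the function names are hypothetical, and storing every suffix would not fit in 500 MB for around 100M words without compression or sharding.

```python
from bisect import bisect_left


def build_suffix_index(words):
    # Every suffix of every distinct word, paired with its source word,
    # sorted so that suffixes sharing a prefix sit next to each other.
    return sorted(
        (word[i:], word) for word in set(words) for i in range(len(word))
    )


def search(entries, needle):
    """Return all words that contain `needle` as a substring."""
    # A substring of a word is a prefix of one of its suffixes, so all
    # hits form one contiguous run that starts at the bisect position.
    matches = set()
    for i in range(bisect_left(entries, (needle,)), len(entries)):
        suffix, word = entries[i]
        if not suffix.startswith(needle):
            break
        matches.add(word)
    return matches
```

Sharding then amounts to building one such index per node over a slice of the corpus and merging the result sets for each query.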
Consensus
Reliability in the presence of faulty processes
examples:
who can commit to the DB
who is the leader
state machine replication
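State machine replication rests on two ideas: an entry counts as committed once a majority of nodes has acknowledged it, and replicas that apply the same committed log in the same order end in the same state. The sketch below illustrates only those two ideas. It is not an implementation of any particular protocol, it omits leader election and failure handling, and all names in it are hypothetical.

```python
def majority(cluster_size):
    # Smallest group size such that any two such groups overlap.
    return cluster_size // 2 + 1


def is_committed(acks, cluster_size):
    # An entry is safe once a majority has acknowledged it.
    return acks >= majority(cluster_size)


class Replica:
    """Deterministic key-value state machine driven by a shared log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, log):
        # Apply only the entries not seen yet, strictly in log order.
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)
```

A replica that falls behind catches up by applying the missing suffix of the log, after which its state matches the others.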
ZAB - High-performance broadcast for primary-backup systems - 2011
Raft - Understandable consensus algorithm - 2013
Paxos - The Part-Time Parliament - Leslie Lamport '90
Consensus