Slide 1

Slide 1 text

LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY @iconara

Slide 2

Slide 2 text

LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY BIG DATA @iconara

Slide 3

Slide 3 text

speakerdeck.com/u/iconara (real time!)

Slide 4

Slide 4 text

Theo / @iconara

Slide 5

Slide 5 text

chief architect at BURT

Slide 6

Slide 6 text

let’s make online advertising a great experience

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

MAKING THIS

Slide 9

Slide 9 text

INTO THIS

Slide 10

Slide 10 text

HOW HARD CAN IT BE?

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

30K REQUESTS PER SECOND more than a billion requests per day, over 1 TB of raw data

Slide 13

Slide 13 text

ONE VISIT CAN CHANGE UP TO 100K COUNTERS hundreds of millions of individual counters per day, plus counting uniques and visitor histories

Slide 14

Slide 14 text

IN REAL TIME or near real time, if you want to be pedantic

Slide 15

Slide 15 text

HOW HARD CAN IT BE?

Slide 16

Slide 16 text

START WITH TWO OF EVERYTHING going from one to two is the hardest; solve the scaling problem up front

Slide 17

Slide 17 text

START WITH THREE OF EVERYTHING (NOT TWO) you’ll solve the scaling problem, and need less overcapacity
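Why three rather than two: assuming load is spread evenly and the cluster has to keep serving after losing any single node, the n - 1 survivors must absorb all of the traffic, so each node needs 1 / (n - 1) of its normal load as spare capacity. A minimal Ruby sketch of that arithmetic (the helper name is made up for illustration):

```ruby
# Assumes load is spread evenly and the cluster must survive losing one node.
# The n - 1 survivors then carry everything, so each needs an extra
# 1 / (n - 1) of its normal load as spare capacity.
def required_overcapacity(nodes)
  raise ArgumentError, "need at least two nodes to survive a failure" if nodes < 2

  1.0 / (nodes - 1)
end

[2, 3, 4, 6].each do |n|
  puts format("%d nodes: %.0f%% spare capacity", n, required_overcapacity(n) * 100)
end
# 2 nodes: 100% spare capacity
# 3 nodes: 50% spare capacity
# 4 nodes: 33% spare capacity
# 6 nodes: 20% spare capacity
```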

Slide 18

Slide 18 text

GIVE A LOT OF THOUGHT TO KEYS AND IDS and think about your queries first

Slide 19

Slide 19 text

MEIHO0 JME57Z (a timestamp, then something random): monotonically increasing, sorts nicely

Slide 20

Slide 20 text

JME57Z MEIHO0 (something random, then a timestamp): uniformly distributed, works nicely with sharding
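The two orderings can be sketched in Ruby, assuming (as the slides suggest) that each half is six base-36 characters: a timestamp encoded with `Time.now.to_i.to_s(36).upcase`, and a random value drawn here from Ruby's `Random`. The helper names are hypothetical, not taken from BURT's code:

```ruby
# Hypothetical helpers. Each half is six base-36 characters (0-9, A-Z).
def timestamp_part(time)
  time.to_i.to_s(36).upcase.rjust(6, "0")
end

def random_part(rng)
  rng.rand(36**6).to_s(36).upcase.rjust(6, "0")
end

# Timestamp first: IDs created later sort after earlier ones, but all
# IDs created around the same time share a prefix.
def sortable_id(time = Time.now, rng = Random.new)
  timestamp_part(time) + random_part(rng)
end

# Random first: IDs are spread uniformly over the key space, so range-
# or prefix-based shards each receive a similar share of new keys.
def shardable_id(time = Time.now, rng = Random.new)
  random_part(rng) + timestamp_part(time)
end

t = Time.at(1_354_633_200)
puts sortable_id(t)   # starts with MEIHO0
puts shardable_id(t)  # ends with MEIHO0
```

Which order to pick depends on the queries: time-ordered keys suit range scans by time, while random-first keys keep every new write from landing on the same hot shard.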

Slide 21

Slide 21 text

CONSISTENCY IS OVERRATED don’t fear R + W < N
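The R, W and N on the slide are the quorum parameters of Dynamo-style stores: N replicas per key, W acknowledgements per write, R replicas consulted per read. Reads are only guaranteed to see the latest acknowledged write when every read set overlaps every write set, which happens exactly when R + W > N. A brute-force Ruby check of that rule (a hypothetical helper, not a client for any particular database):

```ruby
# N replicas hold each key, a write waits for W acknowledgements and a read
# asks R replicas. Brute-force check: does every possible read set share at
# least one replica with every possible write set?
def always_overlap?(n:, r:, w:)
  replicas = (0...n).to_a
  replicas.combination(w).all? do |write_set|
    replicas.combination(r).all? { |read_set| !(write_set & read_set).empty? }
  end
end

puts always_overlap?(n: 3, r: 2, w: 2)  # true:  R + W > N, reads see the latest write
puts always_overlap?(n: 3, r: 1, w: 1)  # false: R + W < N, reads may be stale
```

Choosing R + W ≤ N gives up that guarantee in exchange for lower latency and higher availability; reads may briefly return stale values.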

Slide 22

Slide 22 text

PRECOMPUTE ALL THE THINGS your users most likely don’t know what they want, so why let them do ad hoc queries?
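One way to read this slide, sketched under stated assumptions: instead of storing raw events and answering ad hoc queries later, increment every counter a query might ask for as each event arrives. In this hypothetical Ruby sketch, each event bumps one counter per subset of its dimensions (the dimension names are invented), so a query is a single hash lookup:

```ruby
# Hypothetical sketch of precomputed counters: at ingest time, every event
# increments one counter per subset of its dimensions, so any query over
# those dimensions becomes a single lookup. With k dimensions an event
# touches 2**k counters, which illustrates how one visit can fan out into
# many counter updates.
class PrecomputedCounters
  def initialize
    @counts = Hash.new(0)
  end

  def record(event)
    dims = event.to_a.sort
    (0..dims.size).each do |k|
      dims.combination(k) { |subset| @counts[subset] += 1 }
    end
  end

  # No scanning: the answer was computed when the events arrived.
  def count(filter = {})
    @counts[filter.to_a.sort]
  end
end

counters = PrecomputedCounters.new
counters.record(site: "news", country: "se", browser: "firefox")
counters.record(site: "news", country: "se", browser: "chrome")
counters.record(site: "blog", country: "de", browser: "chrome")
puts counters.count                                  # 3
puts counters.count(country: "se")                   # 2
puts counters.count(site: "news", browser: "chrome") # 1
```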

Slide 23

Slide 23 text

SEPARATE PROCESSING FROM STORAGE that way you can scale each independently

Slide 24

Slide 24 text

PLAN HOW TO GET RID OF YOUR DATA deleting stuff is harder than you might think

Slide 25

Slide 25 text

NoDB keep things streaming

Slide 26

Slide 26 text

DIVIDE THE LOAD big data systems are all about routing and partitioning

Slide 27

Slide 27 text

RANDOM when you have no interdependencies between things, it’s easy to scale out
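A minimal sketch of random routing, assuming messages are fully independent so that any worker can handle any of them (worker names are placeholders):

```ruby
# Hypothetical sketch: with no interdependencies, any worker can take any
# message, so picking one at random spreads the load evenly.
class RandomRouter
  def initialize(workers, rng = Random.new)
    @workers = workers
    @rng = rng
  end

  def route(_message)
    @workers[@rng.rand(@workers.size)]
  end
end

router = RandomRouter.new(%w[worker-a worker-b worker-c], Random.new(42))
p 9_000.times.map { |i| router.route(i) }.tally
```

Adding capacity is then just adding a name to the worker list; nothing has to be moved.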

Slide 28

Slide 28 text

CONSISTENT when there are interdependencies, you need to route using some property of the objects, but make sure you get a uniform distribution
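A sketch of routing on a property of the object, assuming the property is a visitor ID and that there is a fixed number of partitions. Hashing the key first (MD5 here, chosen only because it spreads keys evenly) keeps the distribution uniform even when the IDs themselves are sequential. The class name is hypothetical:

```ruby
require "digest/md5"

# Hypothetical sketch: route on a property of the object (here a visitor ID)
# so that everything belonging to one visitor ends up on the same partition.
# MD5 is only used to spread the keys evenly, not for security.
class KeyRouter
  def initialize(partitions)
    @partitions = partitions
  end

  def route(key)
    Digest::MD5.hexdigest(key.to_s).to_i(16) % @partitions
  end
end

router = KeyRouter.new(12)
puts router.route("visitor-42") == router.route("visitor-42") # true: same key, same partition
p 12_000.times.map { |i| router.route("visitor-#{i}") }.tally.values.minmax
```

This is plain modulo hashing: changing the number of partitions moves most keys. A consistent-hashing ring, or a fixed partition count that is reassigned to nodes as the cluster grows, avoids reshuffling everything.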

Slide 29

Slide 29 text

NUMEROLOGY

Slide 30

Slide 30 text

12

Slide 31

Slide 31 text

2 | 12, 3 | 12, 4 | 12, 6 | 12

Slide 32

Slide 32 text

8 | 24, 5 | 60

Slide 33

Slide 33 text

A DIVERSION ABOUT COUNTING TO 60 the reason why there are 60 seconds to a minute, and 360 degrees to a circle

Slide 34

Slide 34 text

3 SEGMENTS ON EACH OF FOUR FINGERS = 12

Slide 35

Slide 35 text

3 SEGMENTS ON EACH OF FOUR FINGERS = 12 FIVE FINGERS ON THE OTHER HAND = 60

Slide 36

Slide 36 text

12, 60, 120, 360 superior highly composite numbers
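A quick Ruby check of why these numbers are convenient: each divisor is a way to split the number into equal whole parts, and the superior highly composite numbers have far more divisors than nearby powers of two. The helper is a naive sketch for small numbers:

```ruby
# Every divisor of a number is a way to split it into equal whole parts.
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

# Superior highly composite numbers next to nearby powers of two.
[[12, 16], [60, 64], [120, 128], [360, 256]].each do |hcn, pow2|
  puts "#{hcn}: #{divisors(hcn).size} divisors, #{pow2}: #{divisors(pow2).size} divisors"
end
# 12: 6 divisors, 16: 5 divisors
# 60: 12 divisors, 64: 7 divisors
# 120: 16 divisors, 128: 8 divisors
# 360: 24 divisors, 256: 9 divisors
```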

Slide 48

Slide 48 text

use multiples of 12 to scale without always having to double
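A sketch of what this buys, assuming data is split into a fixed number of partitions that are dealt round-robin to nodes (the setup and helper names are hypothetical): with 12 partitions the cluster stays balanced at 1, 2, 3, 4, 6 or 12 nodes, while 16 partitions only balance on powers of two.

```ruby
# Hypothetical setup: data is split into a fixed number of partitions,
# and partitions are dealt round-robin to the nodes in the cluster.
def partitions_per_node(partitions, nodes)
  (0...partitions).group_by { |partition| partition % nodes }.values.map(&:size)
end

# Cluster sizes that get exactly the same number of partitions each.
def even_cluster_sizes(partitions)
  (1..partitions).select { |nodes| (partitions % nodes).zero? }
end

p even_cluster_sizes(12)      # [1, 2, 3, 4, 6, 12]
p even_cluster_sizes(16)      # [1, 2, 4, 8, 16]
p partitions_per_node(12, 3)  # [4, 4, 4]
p partitions_per_node(16, 3)  # [6, 5, 5]
```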

Slide 49

Slide 49 text

BLAH BLAH BLAH use multiples of 12 to scale without always having to double

Slide 50

Slide 50 text

log₂(36⁶) ≈ 31

Slide 51

Slide 51 text

$ (ASCII code 36)

Slide 52

Slide 52 text

log₂(36⁶) ≈ 31

Slide 53

Slide 53 text

log₂(36⁶) ≈ 31 six characters of 0-9, A-Z can represent 31 bits, which is kind of almost very close to four bytes
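A quick check of the arithmetic in Ruby, using only core methods:

```ruby
# Six characters drawn from 0-9 and A-Z give 36**6 distinct values.
values = 36**6
puts values             # 2176782336
puts Math.log2(values)  # a little over 31
puts values > 2**31     # true: a bit more than 31 bits...
puts values < 2**32     # true: ...but still short of four bytes (32 bits)
```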

Slide 54

Slide 54 text

MEIHO0

Slide 55

Slide 55 text

MEIHO0 a timestamp: Time.now.to_i.to_s(36).upcase
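The slide's one-liner encodes the current Unix time in upper-case base 36. A small Ruby sketch (the helper names are hypothetical) that also decodes the example ID and shows how long six characters last:

```ruby
# Encode a Unix timestamp as upper-case base 36, as on the slide, and decode it.
def encode_timestamp(time)
  time.to_i.to_s(36).upcase
end

def decode_timestamp(id)
  Time.at(id.to_i(36)).utc # String#to_i(36) accepts upper-case letters
end

puts "MEIHO0".to_i(36)            # 1354633200
puts decode_timestamp("MEIHO0")   # 2012-12-04 15:00:00 UTC
puts Time.at(36**6 - 1).utc.year  # 2038: six characters suffice until then
```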

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

YOU CAN’T SCALE TO REAL TIME and don’t trust code that doesn’t run continuously

Slide 58

Slide 58 text

DO YOU REALLY NEED A BACKUP? if you’ve got 3x replication over multiple availability zones, is that backup really worth it?

Slide 59

Slide 59 text

PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT when thousands of things happen every second, new, weird and unforeseen things happen all the time; your tests can only cover the foreseeable

Slide 60

Slide 60 text

GÖTEBORG, DISTRIBUTED @gbgdistr

Slide 61

Slide 61 text

KTHXBAI @iconara github.com/iconara architecturalatrocities.com burtcorp.com