Slide 1

Slide 1 text

LEARNING TO BUILD DISTRIBUTED SYSTEMS THE HARD WAY
@iconara

Slide 2

Slide 2 text

speakerdeck.com/u/iconara (real time!)

Slide 3

Slide 3 text

Theo / @iconara

Slide 4

Slide 4 text

Chief Architect at

Slide 5

Slide 5 text

let’s make online advertising a great experience

Slide 6

Slide 6 text

MAKING THIS

Slide 7

Slide 7 text

INTO THIS

Slide 8

Slide 8 text

HOW HARD CAN IT BE?

Slide 9

Slide 9 text

TRACKING AD IMPRESSIONS
track page views and all their ads
track visibility and send updates on changes
track events, track activity, sync cookies, and track visits

Slide 10

Slide 10 text

track page views and all their ads
track visibility and send updates on changes
track events, track activity, sync cookies, and track visits
(diagram states: LOADED, VISIBLE, HIDDEN, VISIBLE, LOADED)

Slide 11

Slide 11 text

ASSEMBLING SESSIONS
assemble ad impressions, page views and visits, to be able to calculate things like total visible duration
mix in demographics, revenue, and third-party data

Slide 12

Slide 12 text

assemble ad impressions, page views and visits, to be able to calculate things like total visible duration
mix in demographics, revenue, and third-party data
WAS LOADED → BECAME ACTIVE → BECAME VISIBLE → WAS HIDDEN → BECAME VISIBLE AGAIN → A CLICK!
{
  "user_id": "M9L6R5TD0YXK",
  "session_id": "MAI3QAGNAIYT",
  "timestamp": 1347896675038,
  "placement_name": "example",
  "category": "frontpage",
  "embed_url": "http://example.com/",
  "visible_duration": 1340,
  "browser": "Chrome",
  "device_type": "computer",
  "click": true,
  "ad_dimensions": "980x300"
}
3rd PARTY DATA & OTHER GOODIES
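A minimal sketch of one piece of session assembly: computing total visible duration from an impression's state changes. The event hashes (`type`, `at` in milliseconds) and the method name are hypothetical, not the real tracking format; the numbers are picked to reproduce the `visible_duration` of 1340 in the record above.

```ruby
# Hypothetical event format: { type: :loaded | :visible | :hidden | :click, at: milliseconds }
def visible_duration(events)
  total = 0
  visible_since = nil
  events.sort_by { |e| e[:at] }.each do |event|
    case event[:type]
    when :visible
      # Repeated "visible" events while already visible don't restart the clock
      visible_since ||= event[:at]
    when :hidden
      if visible_since
        total += event[:at] - visible_since
        visible_since = nil
      end
    end
  end
  # Assumption: an ad still visible when the stream ends is counted up to the last event seen
  total += events.map { |e| e[:at] }.max - visible_since if visible_since
  total
end

events = [
  { type: :loaded,  at: 0 },
  { type: :visible, at: 100 },
  { type: :hidden,  at: 900 },
  { type: :visible, at: 1_500 },
  { type: :click,   at: 2_040 }
]
puts visible_duration(events)   # => 1340
```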

Slide 13

Slide 13 text

ANALYTICS
precompute metrics, count uniques, build visitor histories for attribution

Slide 14

Slide 14 text

precompute metrics, count uniques, build visitor histories for attribution

Slide 15

Slide 15 text

HOW HARD CAN IT BE?

Slide 16

Slide 16 text

25K REQUESTS PER SECOND
~1 billion requests per day, 1 TB of raw data
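A back-of-the-envelope check on these figures, assuming decimal units and traffic spread over a whole day: ~1 billion requests per day averages roughly 11.6K requests per second, so 25K per second is presumably a peak rate, and 1 TB over a billion requests is about 1 KB per request.

```ruby
# Figures from the slide; decimal units are an assumption
requests_per_day = 1_000_000_000
bytes_per_day    = 1_000_000_000_000   # 1 TB
seconds_per_day  = 24 * 60 * 60        # 86_400

average_rate      = requests_per_day.fdiv(seconds_per_day)
bytes_per_request = bytes_per_day / requests_per_day

puts average_rate.round          # => 11574
puts bytes_per_request           # => 1000
puts 25_000 * seconds_per_day    # => 2160000000 (what 25K/s sustained all day would be)
```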

Slide 17

Slide 17 text

ONE VISIT CAN CHANGE UP TO 100K COUNTERS
hundreds of millions of individual counters per day, plus counting uniques and visitor histories
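One way to see how a single visit can touch that many counters (an assumption for illustration, not a description of how these counters are actually organized): if a counter is kept for every combination of a record's dimensions, a record with d dimensions updates 2^d counters, and at 17 dimensions that is already 131,072.

```ruby
# Hypothetical record with three dimensions taken from the session record
record = { "category" => "frontpage", "browser" => "Chrome", "device_type" => "computer" }

# One counter key per subset of the dimensions; the empty subset is the grand total
def counter_keys(record)
  dims = record.keys.sort
  (0..dims.size).flat_map do |n|
    dims.combination(n).map do |subset|
      subset.empty? ? "total" : subset.map { |d| "#{d}=#{record[d]}" }.join("/")
    end
  end
end

puts counter_keys(record).size   # => 8 (2**3)
puts 2**17                       # => 131072
```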

Slide 18

Slide 18 text

IN REAL TIME
or near real time, if you want to be pedantic

Slide 19

Slide 19 text

START WITH TWO OF EVERYTHING
going from one to two is the hardest step

Slide 20

Slide 20 text

GIVE A LOT OF THOUGHT TO YOUR KEYS AND IDS
it will save you lots of pain

Slide 21

Slide 21 text

MANLO0 JME57Z
a timestamp + something random: monotonically increasing, sorts nicely
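A sketch of the timestamp-first layout, assuming six base-36 characters of Unix time followed by six random ones (the helper names are hypothetical). Zero-padding the timestamp to a fixed width matters: with equal lengths, and digits sorting before letters in ASCII, string order matches numeric order.

```ruby
require "securerandom"

ALPHABET = ("0".."9").to_a + ("A".."Z").to_a

# Six random base-36 characters
def random_part
  Array.new(6) { ALPHABET[SecureRandom.random_number(36)] }.join
end

# Unix time in base 36, padded to six characters so ids compare correctly as strings
def timestamp_part(time)
  time.to_i.to_s(36).upcase.rjust(6, "0")
end

# Timestamp first: ids sort in creation order
def time_first_id(time = Time.now)
  timestamp_part(time) + random_part
end

t = Time.now
ids = [t, t + 60, t + 3_600].map { |x| time_first_id(x) }
p ids
p ids.sort == ids   # => true
```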

Slide 22

Slide 22 text

JME57Z MANLO0
something random + a timestamp: uniformly distributed, works nicely with sharding
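A sketch contrasting the two layouts under a hypothetical sharding scheme that picks a shard from an id's first character. With the random part first, ids spread over every shard; with the timestamp first, an hour's worth of ids almost certainly shares a first character and lands on a single shard.

```ruby
require "securerandom"

ALPHABET = ("0".."9").to_a + ("A".."Z").to_a

def random_part
  Array.new(6) { ALPHABET[SecureRandom.random_number(36)] }.join
end

def timestamp_part(time)
  time.to_i.to_s(36).upcase.rjust(6, "0")
end

# Random first: the leading character is uniformly distributed
def random_first_id(time = Time.now)
  random_part + timestamp_part(time)
end

# Hypothetical scheme: shard by the id's first character (36 characters, 12 shards, 3 each)
def shard_for(id, shards)
  ALPHABET.index(id[0]) % shards
end

now = Time.now
random_first = Array.new(3_600) { random_first_id(now) }
time_first   = Array.new(3_600) { |i| timestamp_part(now + i) + random_part }

p random_first.map { |id| shard_for(id, 12) }.tally.size   # spread over all 12 shards
p time_first.map { |id| shard_for(id, 12) }.tally.size     # an hour of ids on one shard
```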

Slide 23

Slide 23 text

PUT BUFFERS BETWEEN LAYERS
queues can even out peaks, let you scale layers independently, and let you restart services without losing data
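An in-process sketch of the idea using Ruby's built-in `SizedQueue`. In a real deployment the buffer would typically be an external, durable queue so that restarting a service doesn't lose data; an in-memory queue like this one doesn't give you that, it only shows the decoupling and the back-pressure.

```ruby
# Bounded buffer between a producing layer and a consuming layer.
# push blocks when the buffer is full (back-pressure); pop blocks when it is empty.
buffer = SizedQueue.new(1_000)

producer = Thread.new do
  10_000.times { |i| buffer.push({ "request_id" => i }) }
  buffer.push(:done)   # sentinel marking the end of the stream
end

processed = 0
consumer = Thread.new do
  while (message = buffer.pop) != :done
    processed += 1     # stand-in for real processing of message
  end
end

[producer, consumer].each(&:join)
puts processed   # => 10000
```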

Slide 24

Slide 24 text

SEPARATE PROCESSING FROM STORAGE
that way you can scale each independently

Slide 25

Slide 25 text

PLAN HOW TO GET RID OF YOUR DATA
deleting stuff is harder than you might think
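One common way to plan for deletion (an assumption here, not necessarily what the talk had in mind) is to bucket data by time from the start, so expiring old data means dropping whole buckets instead of hunting down individual records. The `DailyBuckets` class is hypothetical.

```ruby
require "date"

# Hypothetical store: one bucket of records per day
class DailyBuckets
  def initialize
    @buckets = Hash.new { |hash, day| hash[day] = [] }
  end

  def add(record, day)
    @buckets[day] << record
  end

  # Expiry drops whole days; no per-record scan or delete
  def expire_before(cutoff)
    @buckets.delete_if { |day, _records| day < cutoff }
  end

  def days
    @buckets.keys.sort
  end
end

store = DailyBuckets.new
start = Date.new(2012, 9, 1)
(0...30).each { |i| store.add({ "n" => i }, start + i) }
store.expire_before(start + 23)   # keep the last 7 days
p store.days.size                 # => 7
```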

Slide 26

Slide 26 text

NoDB
keep things streaming

Slide 27

Slide 27 text

STREAM PARTITIONING

Slide 28

Slide 28 text

RANDOMLY
when there are no interdependencies between things, it’s easy to scale out (or round robin, it’s basically the same)
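A minimal sketch of both strategies; the class and method names are hypothetical. Round robin hands items to workers in turn, random picks a worker per item; over many items the spread is about the same, and random needs no shared position to coordinate.

```ruby
# Round robin: any worker can take any item, so just take turns
class RoundRobin
  def initialize(workers)
    @workers = workers
    @next = 0
  end

  def route(_item)
    worker = @workers[@next]
    @next = (@next + 1) % @workers.size
    worker
  end
end

# Random: statistically the same spread, with no state to share
def route_randomly(workers, _item)
  workers.sample
end

router = RoundRobin.new(%w[a b c])
p Array.new(6) { |i| router.route(i) }   # => ["a", "b", "c", "a", "b", "c"]
p route_randomly(%w[a b c], 42)          # one of "a", "b", "c"
```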

Slide 29

Slide 29 text

CONSISTENTLY
when there are interdependencies, you need to route using some property of the objects, but make sure you get a uniform distribution
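A sketch of routing by a property of the object, here the `session_id` from the session record, so everything belonging to one session ends up on the same partition. CRC32 from Ruby's `zlib` is an assumed choice of stable hash (Ruby's `String#hash` is seeded per process, so it can't be used across machines); the loop at the end checks that the distribution is roughly uniform.

```ruby
require "zlib"

# Same key always maps to the same partition
def route_by_key(key, partitions)
  Zlib.crc32(key) % partitions
end

events = [
  { "session_id" => "MAI3QAGNAIYT", "type" => "loaded" },
  { "session_id" => "MAI3QAGNAIYT", "type" => "visible" },
  { "session_id" => "JME57ZMANLO0", "type" => "loaded" }
]
events.each { |e| puts "#{e["session_id"]} -> #{route_by_key(e["session_id"], 12)}" }

# Uniformity check: count how many of 100K synthetic keys land on each partition
counts = Array.new(12, 0)
100_000.times { |i| counts[route_by_key("session-#{i}", 12)] += 1 }
p counts.minmax   # both ends should be close to 100_000 / 12 ≈ 8333
```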

Slide 30

Slide 30 text

NUMEROLOGY

Slide 31

Slide 31 text

12

Slide 32

Slide 32 text

2 | 12, 3 | 12, 4 | 12, 6 | 12 (each divides 12 evenly)

Slide 33

Slide 33 text

8 | 24, 5 | 60

Slide 34

Slide 34 text

12, 60, 120, 360
superior highly composite numbers
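To see why these numbers are convenient, count their divisors: every divisor of the partition count is a number of nodes the partitions can be spread over evenly. The sketch also checks the weaker "highly composite" property (more divisors than any smaller positive integer); the "superior" variant on the slide is a stricter condition that isn't checked here.

```ruby
# Every divisor is a cluster size that gets an even share of partitions
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

# Highly composite: strictly more divisors than every smaller positive integer
def highly_composite?(n)
  count = divisors(n).size
  (1...n).all? { |m| divisors(m).size < count }
end

[12, 60, 120, 360].each do |n|
  puts "#{n}: #{divisors(n).size} divisors, highly composite: #{highly_composite?(n)}"
end
# 12: 6 divisors, highly composite: true
# 60: 12 divisors, highly composite: true
# 120: 16 divisors, highly composite: true
# 360: 24 divisors, highly composite: true
```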

Slide 42

Slide 42 text

for maximal flexibility, partition with multiples of 12
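A sketch of what that flexibility buys, assuming partitions are assigned to nodes by partition number modulo node count (the assignment scheme is an assumption): with 12 partitions the cluster can run on 1, 2, 3, 4, 6 or 12 nodes with every node owning the same number of partitions, and 5 nodes is the first size that leaves the load uneven.

```ruby
PARTITIONS = 12

# How many partitions each node owns under modulo assignment
def partitions_per_node(nodes)
  (0...PARTITIONS).group_by { |partition| partition % nodes }.transform_values(&:size).values
end

[1, 2, 3, 4, 5, 6, 12].each do |nodes|
  counts = partitions_per_node(nodes)
  note = counts.uniq.size == 1 ? "" : " (uneven)"
  puts "#{nodes} nodes: #{counts.inspect}#{note}"
end
```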

Slide 44

Slide 44 text

A SHORT DIVERSION ABOUT COUNTING TO 60
the reason why there are 60 seconds in a minute and 360 degrees in a circle

Slide 45

Slide 45 text

3 SEGMENTS ON EACH OF FOUR FINGERS = 12

Slide 46

Slide 46 text

3 SEGMENTS ON EACH OF FOUR FINGERS = 12
× FIVE FINGERS ON THE OTHER HAND = 60

Slide 47

Slide 47 text

log2(36^6) ≈ 31

Slide 48

Slide 48 text

$ (ASCII code 36)

Slide 50

Slide 50 text

log2(36^6) ≈ 31
six characters of 0-9 and A-Z can represent 31 bits, which is kind of almost very close to four bytes
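Checking the arithmetic: six base-36 characters give 36^6 combinations, a little more than 2^31 and well short of 2^32, so "almost four bytes" means 31 bits rather than 32. A side effect worth knowing: a Unix timestamp in seconds fits in six base-36 characters only until it passes 36^6 - 1, which happens in late 2038.

```ruby
combinations = 36**6             # six characters, each one of 0-9 or A-Z
puts combinations                # => 2176782336
puts Math.log2(combinations)     # about 31.02
puts combinations > 2**31        # => true
puts combinations < 2**32        # => true
puts Time.at(combinations - 1).utc.year   # => 2038
```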

Slide 51

Slide 51 text

MANLO0

Slide 52

Slide 52 text

MANLO0
a timestamp: Time.now.to_i.to_s(36).upcase
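A round-trip sketch around the slide's one-liner (the method names are hypothetical). Decoding uses `String#to_i` with a base, which accepts upper- or lowercase digits; decoding the example id "MANLO0" shows it is a Unix timestamp from September 2012.

```ruby
# Encode Unix time (seconds) as uppercase base 36, as on the slide
def encode_timestamp(time = Time.now)
  time.to_i.to_s(36).upcase
end

# Decode back to a Time
def decode_timestamp(id)
  Time.at(id.to_i(36)).utc
end

p "MANLO0".to_i(36)            # => 1348153200
p decode_timestamp("MANLO0")   # => 2012-09-20 15:00:00 UTC
p encode_timestamp             # the current time, six characters until late 2038
```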

Slide 53

Slide 53 text

DO YOU REALLY NEED A BACKUP?
if you have 3x replication across multiple availability zones, is that backup really worth it?

Slide 54

Slide 54 text

PRODUCTION IS THE ONLY REAL TEST ENVIRONMENT
when thousands of things happen every second, new, weird, and unforeseen things happen all the time; your tests can only cover the foreseeable

Slide 55

Slide 55 text

KTHXBAI
@iconara
github.com/iconara
architecturalatrocities.com
burtcorp.com

Slide 56

Slide 56 text

COME TO SWEDEN IN MARCH AND TALK ABOUT BIG DATA
scandevconf.se/2013/call-for-proposals

Slide 57

Slide 57 text

IDEMPOTENCE

Slide 58

Slide 58 text

f(f(x)) = f(x)
doing something again doesn’t change the outcome
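A tiny illustration of the definition, with hypothetical functions: normalizing a string is idempotent (applying it twice equals applying it once), while incrementing is not.

```ruby
# Idempotent: a second application changes nothing
def normalize(s)
  s.strip.downcase
end

x = "  Frontpage "
p normalize(x)                              # => "frontpage"
p normalize(normalize(x)) == normalize(x)   # => true

# Not idempotent: every application changes the result
def increment(n)
  n + 1
end

p increment(increment(1)) == increment(1)   # => false
```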

Slide 59

Slide 59 text

IDEMPOTENCE
if you don’t have to worry about things accidentally happening twice, everything becomes much simpler

Slide 60

Slide 60 text

COUNTING UNIQUES
when adding to a set, it doesn’t matter how many times you do it; the end result is the same
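A minimal sketch with Ruby's `Set`: duplicate deliveries of the same user id (retries, replays) leave the unique count unchanged, and replaying the whole stream changes nothing. The first id comes from the session record; the second is made up.

```ruby
require "set"

# The same user id delivered several times, e.g. after retries
deliveries = %w[M9L6R5TD0YXK JME57ZMANLO0 M9L6R5TD0YXK M9L6R5TD0YXK]

uniques = Set.new
deliveries.each { |user_id| uniques << user_id }
p uniques.size   # => 2

# Replaying the whole stream is harmless
deliveries.each { |user_id| uniques << user_id }
p uniques.size   # => 2
```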

Slide 61

Slide 61 text

INC X VS SET X
increments are not idempotent, and very scary; try to avoid non-idempotent operations wherever you can
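A sketch of why increments are scary, using a hypothetical message shape: when the same message is delivered twice, INC double counts, while SET (writing the full value for a key) gives the same result no matter how many times it is applied. SET only works when the writer knows the complete value, for example a freshly recomputed total.

```ruby
# The same message delivered twice, e.g. after a retry
messages = [{ "id" => 1, "visits" => 5 }, { "id" => 1, "visits" => 5 }]

# INC: every delivery adds, so the duplicate is counted twice
inc_counter = 0
messages.each { |m| inc_counter += m["visits"] }
p inc_counter                 # => 10

# SET: every delivery overwrites the value for its key, so the duplicate is harmless
set_counters = {}
messages.each { |m| set_counters[m["id"]] = m["visits"] }
p set_counters.values.sum     # => 5
```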

Slide 62

Slide 62 text

KTHXBAI
@iconara
github.com/iconara
architecturalatrocities.com
burtcorp.com