Slide 1

Slide 1 text

Derek Nelson | Software Engineer Rolling with

Slide 2

Slide 2 text

Pixel “fires”

Slide 3

Slide 3 text

Pixel “fires” Serve ad?

Slide 4

Slide 4 text

Pixel “fires” Serve ad? Ad served

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Dash data

Slide 7

Slide 7 text

Pixel impressions, clicks, impressions…

Slide 8

Slide 8 text

7/2011 ~50GB/day

Slide 9

Slide 9 text

7/2011 ~50GB/day 4/2013 ~5TB/day ~10 billion events per day

Slide 10

Slide 10 text

Pixel impressions, clicks, impressions…

Slide 11

Slide 11 text

? Pixel impressions, clicks, impressions…

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Overkill Driver issues ✗

Slide 14

Slide 14 text

Overkill Driver issues ✗ ✗ ByteOrderedPartitioner ☹ 2B column limit

Slide 15

Slide 15 text

Overkill Driver issues ✗ ✗ ByteOrderedPartitioner ☹ 2B column limit ✔

Slide 16

Slide 16 text

Pixel impressions, clicks, impressions… 6–8 c1.xlarge nodes on AWS, HBase 0.92

Slide 17

Slide 17 text

… { main:impressions: 45595, main:clicks: 71, main:cost: 60.80, main:conversions: 25 }

Slide 18

Slide 18 text

LiquidAds/Products  

Slide 19

Slide 19 text

LiquidAds/Products  

Slide 20

Slide 20 text

main:image_path main:description main:price

Slide 21

Slide 21 text

Similarity: 0.27 Similarity: 0.73 product0 product1 product2

Slide 22

Slide 22 text

Less Regions… is more better

Slide 23

Slide 23 text

Less Regions… is more better 2 GB 2 GB 2 GB 2 GB Anything bigger causes compaction issues for us
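
The sizing rule on this slide implies a simple count: a table of a given size ends up with roughly size / 2 GB regions. A minimal sketch, assuming the cap is applied through HBase's `hbase.hregion.max.filesize` setting (the slide does not say how it was enforced) and using a made-up 1 TB table; `region_count` is a hypothetical helper, and real region counts vary with split timing.

```python
import math

GB = 1024 ** 3
MAX_REGION_BYTES = 2 * GB  # the 2 GB ceiling from the slide

def region_count(table_bytes, max_region_bytes=MAX_REGION_BYTES):
    """Approximate number of regions once every region is at or under the cap."""
    return max(1, math.ceil(table_bytes / max_region_bytes))

# Hypothetical table size, for illustration only.
regions_for_1tb = region_count(1024 * GB)  # 512
```

Raising the cap lowers the region count, which is the trade-off the slide points at: fewer regions, but larger files to compact.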

Slide 24

Slide 24 text

Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data

Slide 25

Slide 25 text

Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data less_granular_data

Slide 26

Slide 26 text

Denormalize all the things Key inversion for global rollups org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data less_granular_data
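
The rollup from granular_data to less_granular_data can be sketched as a map step that coarsens each row key and a reduce step that sums the counters. This is plain Python standing in for TableMapper/TableReducer, not HBase code. The `campaign:YYYYMMDDHH` key layout is an assumption, and so is reading "key inversion" as swapping the key parts so that one date prefix covers every campaign; all function names and sample rows are hypothetical.

```python
from collections import defaultdict

def map_hourly_to_daily(row_key, counters):
    """Map step: drop the hour from an assumed 'campaign:YYYYMMDDHH' key."""
    campaign, stamp = row_key.split(":")
    return f"{campaign}:{stamp[:8]}", counters

def invert_key(row_key):
    """Swap 'campaign:date' to 'date:campaign' so a date-prefix scan spans all campaigns."""
    campaign, date = row_key.split(":")
    return f"{date}:{campaign}"

def reduce_sum(mapped):
    """Reduce step: sum counters column by column for each output key."""
    out = defaultdict(lambda: defaultdict(int))
    for key, counters in mapped:
        for column, value in counters.items():
            out[key][column] += value
    return {key: dict(cols) for key, cols in out.items()}

# Hypothetical granular rows, for illustration only.
granular_data = {
    "c1:2013040100": {"main:impressions": 10, "main:clicks": 1},
    "c1:2013040101": {"main:impressions": 5, "main:clicks": 0},
    "c2:2013040100": {"main:impressions": 7, "main:clicks": 2},
}
less_granular_data = reduce_sum(
    map_hourly_to_daily(key, counters) for key, counters in granular_data.items()
)
global_rollup_keys = sorted(invert_key(key) for key in less_granular_data)
```

Because the rollup is precomputed and stored, a read for a day's totals is a single row lookup instead of a scan over hourly rows.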

Slide 27

Slide 27 text

Don’t run MR on HBase (motorcycle pic)

Slide 28

Slide 28 text

Don’t run MR on HBase (motorcycle pic) 8 cores + 7GB RAM - HBase - HDFS = Not a lot of room for MapReduce

Slide 29

Slide 29 text

Pixel impressions, clicks, impressions…

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

SequenceFile in MapReduce cluster’s HDFS Reuse old TableMapper, TableReducer code on SequenceFiles local to MapReduce cluster!

Slide 32

Slide 32 text

Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints
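
With several Thrift endpoints in front of the cluster, a client can fall through to the next one when a call fails. A minimal sketch of that failover loop; `call_with_failover`, the endpoint names, and the `fake_get` stand-in are hypothetical, and no real Thrift client is involved.

```python
import random

def call_with_failover(endpoints, request, shuffle=True):
    """Try `request(endpoint)` on each endpoint until one succeeds.

    Re-raises the last connection error if every endpoint fails.
    """
    candidates = list(endpoints)
    if not candidates:
        raise ValueError("no endpoints configured")
    if shuffle:
        random.shuffle(candidates)  # spread load across endpoints
    last_error = None
    for endpoint in candidates:
        try:
            return request(endpoint)
        except (ConnectionError, TimeoutError) as err:
            last_error = err  # remember the failure and try the next endpoint
    raise last_error

# Stand-in for a Thrift call: the first hypothetical endpoint is down.
def fake_get(endpoint):
    if endpoint == "thrift-1:9090":
        raise ConnectionError(f"{endpoint} unreachable")
    return f"row from {endpoint}"

result = call_with_failover(
    ["thrift-1:9090", "thrift-2:9090"], fake_get, shuffle=False
)
```

Here `result` comes from the second endpoint, since the first one raises.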

Slide 33

Slide 33 text

Disaster recovery (crater pic w/ Shit) Well, shit.

Slide 34

Slide 34 text

Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore --region=preferably_another_one!

Slide 35

Slide 35 text

Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore --region=preferably_another_one! Daily backups taken from a separate cluster using emr.hbase.backup.Main --hbase_master=some-other-quorum!

Slide 36

Slide 36 text

Pixel impressions, clicks, impressions… < 1s Storm

Slide 37

Slide 37 text

Pixel impressions, clicks, impressions… Storm Everything is a Counter {DEFERRED_LOG_FLUSH => 'true'}
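
If every metric is a counter, a stream processor can fold events into per-cell deltas and send one increment per cell rather than one per event. A sketch of that batching in plain Python; the event shape, `CounterBatcher`, and the `sink` callback are hypothetical, and the slides do not say that the Storm topology worked this way.

```python
from collections import Counter

class CounterBatcher:
    """Accumulate (row_key, column) deltas and emit them in batches."""

    def __init__(self, sink, max_pending=1000):
        self.sink = sink                # callable(row_key, column, delta)
        self.max_pending = max_pending  # flush once this many distinct cells are buffered
        self.pending = Counter()

    def add(self, row_key, column, delta=1):
        self.pending[(row_key, column)] += delta
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self):
        """Send one increment per distinct cell, then clear the buffer."""
        for (row_key, column), delta in self.pending.items():
            self.sink(row_key, column, delta)
        self.pending.clear()

# Hypothetical usage: collect increments in a list instead of writing to HBase.
sent = []
batcher = CounterBatcher(lambda row, col, delta: sent.append((row, col, delta)))
for event in ["impression", "impression", "click"]:
    batcher.add("campaign1:20130401", f"main:{event}s")
batcher.flush()
```

Buffered deltas are lost if the process dies before a flush, a throughput-for-durability trade similar in spirit to the deferred log flush setting on the slide.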

Slide 38

Slide 38 text

Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 Day 0

Slide 39

Slide 39 text

Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Day 0 Day 1

Slide 40

Slide 40 text

Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Take a bitwise OR with AggregationClient#sum and a custom CI… Day 0 Day 1

Slide 41

Slide 41 text

Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Take a bitwise OR with AggregationClient#sum and a custom CI… Day 0 Day 1 4 total uniques
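
The arithmetic on this slide can be checked client-side: OR the per-day bitmaps, then count the set bits. The sketch below uses the two 16-bit rows from the slide. It runs in plain Python rather than through AggregationClient and a custom ColumnInterpreter, and `bit_index` shows one assumed way to map a hashed user ID to a bit position (the slides do not give the actual scheme).

```python
def bit_index(user_hash, nbits):
    """Map a hex user hash to a bit position (assumed scheme: value mod nbits)."""
    return int(user_hash, 16) % nbits

def parse_bits(text):
    """Turn a '0 0 1 0 ...' string into an integer bitmap."""
    return int(text.replace(" ", ""), 2)

def count_uniques(daily_bitmaps):
    """OR the per-day bitmaps together and count the set bits."""
    union = 0
    for bitmap in daily_bitmaps:
        union |= bitmap
    return bin(union).count("1")

day0 = parse_bits("0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0")
day1 = parse_bits("0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0")
total = count_uniques([day0, day1])  # 4, matching the slide
```

Each day alone has three set bits, but two positions overlap, so the union has four. Two users that hash to the same bit are counted once, so a bitmap this small undercounts; a real bitmap would need far more bits than the 16 shown.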

Slide 42

Slide 42 text

? Thanks! We’re hiring! Derek Nelson