Slide 1
Slide 1 text
Derek Nelson | Software Engineer Rolling with
Slide 2
Slide 2 text
Pixel “fires”
Slide 3
Slide 3 text
Pixel “fires” Serve ad?
Slide 4
Slide 4 text
Pixel “fires” Serve ad? Ad served
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
Dash data
Slide 7
Slide 7 text
Pixel impressions, clicks, impressions…
Slide 8
Slide 8 text
7/2011 ~50GB/day
Slide 9
Slide 9 text
7/2011 ~50GB/day 4/2013 ~5TB/day ~10 billion events per day
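As a back-of-the-envelope check on the figures above, the following sketch works out the growth factor, average event rate, and average event size. It assumes decimal units (1 TB = 10^12 bytes) and treats the "~" values as exact; neither assumption comes from the slides.

```python
# Back-of-the-envelope arithmetic on the slide's figures.
# Assumptions (not from the slides): decimal units, "~" values treated as exact.
GB = 10**9
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

volume_2011 = 50 * GB        # 7/2011: ~50 GB/day
volume_2013 = 5 * TB         # 4/2013: ~5 TB/day
events_2013 = 10 * 10**9     # ~10 billion events per day

growth_factor = volume_2013 / volume_2011
events_per_second = events_2013 / SECONDS_PER_DAY
bytes_per_event = volume_2013 / events_2013

print(f"growth: {growth_factor:.0f}x")
print(f"average rate: {events_per_second:,.0f} events/s")
print(f"average size: {bytes_per_event:.0f} bytes/event")
```

Under these assumptions the daily volume grew about 100x, and the 2013 load averages roughly 116,000 events per second at about 500 bytes per event.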
Slide 10
Slide 10 text
Pixel impressions, clicks, impressions…
Slide 11
Slide 11 text
? Pixel impressions, clicks, impressions…
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
Overkill Driver issues ✗
Slide 14
Slide 14 text
Overkill Driver issues ✗ ✗ ByteOrderedPartitioner :( 2B column limit
Slide 15
Slide 15 text
Overkill Driver issues ✗ ✗ ByteOrderedPartitioner :( 2B column limit ✔
Slide 16
Slide 16 text
Pixel impressions, clicks, impressions… 6 – 8 nodes c1.xlarges AWS HBase 0.92
Slide 17
Slide 17 text
… { main:impressions: 45595, main:clicks: 71, main:cost: 60.80, main:conversions: 25 }
Slide 18
Slide 18 text
LiquidAds/Products
Slide 19
Slide 19 text
LiquidAds/Products
Slide 20
Slide 20 text
main:image_path main:description main:price
Slide 21
Slide 21 text
Similarity: 0.27 Similarity: 0.73 product0 product1 product2
Slide 22
Slide 22 text
Less Regions… is more better
Slide 23
Slide 23 text
Less Regions… is more better 2 GB 2 GB 2 GB 2 GB Anything bigger causes compaction issues for us
Slide 24
Slide 24 text
Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data
Slide 25
Slide 25 text
Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data less_granular_data
Slide 26
Slide 26 text
Denormalize all the things Key inversion for global rollups org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data less_granular_data
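The slides name the Java classes TableMapper and TableReducer but show no code. The sketch below models the same idea in plain Python: a map step re-keys each granular row under a less granular key, and a reduce step sums the counters. The `campaign|date|hour` key layout and all sample values are hypothetical. "Key inversion" is interpreted here as writing a second copy of each row with the date first, so that one prefix scan covers every campaign; that reading is an assumption, not something the slides confirm.

```python
from collections import defaultdict

# Hypothetical granular rows: key "campaign|date|hour" -> counters.
granular_data = {
    "c1|2013-04-01|00": {"impressions": 100, "clicks": 2},
    "c1|2013-04-01|01": {"impressions": 150, "clicks": 1},
    "c2|2013-04-01|00": {"impressions": 80, "clicks": 4},
}

def map_row(key, counters):
    """Map step: emit the row under a less granular key (hour dropped)
    and under an inverted key (date first) for global rollups."""
    campaign, date, _hour = key.split("|")
    yield f"{campaign}|{date}", counters
    yield f"{date}|{campaign}", counters

def reduce_rows(pairs):
    """Reduce step: sum counters that share the same output key."""
    out = defaultdict(lambda: defaultdict(int))
    for key, counters in pairs:
        for name, value in counters.items():
            out[key][name] += value
    return {key: dict(counters) for key, counters in out.items()}

def rollup(rows):
    pairs = (pair for key, counters in rows.items()
             for pair in map_row(key, counters))
    return reduce_rows(pairs)

def global_total(table, date):
    """Emulate a prefix scan over the inverted keys for one date."""
    total = defaultdict(int)
    for key in sorted(table):
        if key.startswith(date + "|"):
            for name, value in table[key].items():
                total[name] += value
    return dict(total)

less_granular_data = rollup(granular_data)
print(less_granular_data["c1|2013-04-01"])
print(global_total(less_granular_data, "2013-04-01"))
```

Because the rollup is precomputed and stored, reads never have to aggregate the granular rows at query time, which is the point of denormalizing.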
Slide 27
Slide 27 text
Don’t run MR on HBase (motorcycle pic)
Slide 28
Slide 28 text
Don’t run MR on HBase (motorcycle pic) 8 cores + 7GB RAM - HBase - HDFS = Not a lot of room for MapReduce
Slide 29
Slide 29 text
Pixel impressions, clicks, impressions…
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
SequenceFile in MapReduce cluster’s HDFS Reuse old TableMapper, TableReducer code on SequenceFiles local to MapReduce cluster!
Slide 32
Slide 32 text
Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints
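The slide does not say how clients use the redundant endpoints. One plausible use is client-side failover, sketched below. The `send` callable, the `EndpointDown` exception, and the host names are all hypothetical stand-ins; no real Thrift client API is shown.

```python
import random

class EndpointDown(Exception):
    """Raised by the (hypothetical) transport when an endpoint is unreachable."""

def call_with_failover(endpoints, request, send):
    """Try the endpoints in random order and return the first successful reply."""
    last_error = None
    for endpoint in random.sample(endpoints, len(endpoints)):
        try:
            return send(endpoint, request)
        except EndpointDown as err:
            last_error = err  # remember the failure and try the next endpoint
    raise RuntimeError("all endpoints failed") from last_error

# Fake transport for demonstration: only one endpoint is up.
def fake_send(endpoint, request):
    if endpoint != "thrift-3:9090":
        raise EndpointDown(endpoint)
    return f"{endpoint} handled {request}"

endpoints = ["thrift-1:9090", "thrift-2:9090", "thrift-3:9090", "thrift-4:9090"]
reply = call_with_failover(endpoints, "get row", fake_send)
print(reply)
```

Shuffling the order spreads load across healthy endpoints instead of always hitting the first one in the list.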
Slide 33
Slide 33 text
Disaster recovery (crater pic w/ Shit) Well, shit.
Slide 34
Slide 34 text
Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore –region=preferrably_another_one!
Slide 35
Slide 35 text
Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore –region=preferrably_another_one! Daily backups taken from a separate cluster using emr.hbase.backup.Main –hbase_master=some-other-quorum!
Slide 36
Slide 36 text
Pixel impressions, clicks, impressions… < 1s Storm
Slide 37
Slide 37 text
Pixel impressions, clicks, impressions… Storm Everything is a Counter {DEFERRED_LOG_FLUSH => 'true'}
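To illustrate "everything is a counter", the sketch below folds a stream of events into per-row counters using the `main:impressions` and `main:clicks` column names from the earlier slide. `CounterTable` is an in-memory stand-in, not an HBase client, and the event format and row key are hypothetical.

```python
from collections import defaultdict

class CounterTable:
    """In-memory stand-in for a table of counters (not an HBase client)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(int))

    def increment(self, row_key, column, amount=1):
        """Add `amount` to one counter cell and return the new value."""
        self._rows[row_key][column] += amount
        return self._rows[row_key][column]

    def get(self, row_key):
        return dict(self._rows[row_key])

def apply_event(table, event):
    """Translate one event into a single counter increment."""
    column = f"main:{event['type']}s"  # "impression" -> "main:impressions"
    table.increment(event["row_key"], column)

# Hypothetical event stream.
events = [
    {"row_key": "campaign-1", "type": "impression"},
    {"row_key": "campaign-1", "type": "click"},
    {"row_key": "campaign-1", "type": "impression"},
]

table = CounterTable()
for event in events:
    apply_event(table, event)

print(table.get("campaign-1"))
```

Since each event becomes an independent increment, the stream processor never has to read a row before updating it.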
Slide 38
Slide 38 text
Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 Day 0
Slide 39
Slide 39 text
Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Day 0 Day 1
Slide 40
Slide 40 text
Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Take a bitwise OR with AggregationClient#sum and a custom CI… Day 0 Day 1
Slide 41
Slide 41 text
Counting uniques across arbitrary date ranges 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e Take a bitwise OR with AggregationClient#sum and a custom CI… Day 0 Day 1 4 total uniques
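The bitmap arithmetic on these slides can be reproduced in a few lines of plain Python. This sketch models only the OR-and-count step, using the Day 0 and Day 1 bit patterns shown above. It does not use AggregationClient or a coprocessor, and it leaves out how an ID is mapped to a bit position, which the slides do not state.

```python
def parse_bitmap(bits):
    """Turn a string like "0 0 1 0" into an int (leftmost bit most significant)."""
    return int(bits.replace(" ", ""), 2)

def count_uniques(daily_bitmaps):
    """OR the per-day bitmaps together and count the set bits."""
    combined = 0
    for bitmap in daily_bitmaps:
        combined |= bitmap
    return bin(combined).count("1")

# Bit patterns copied from the slides.
day0 = parse_bitmap("0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0")
day1 = parse_bitmap("0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0")

print(format(day0 | day1, "016b"))
print(count_uniques([day0, day1]))
```

Each day has three bits set, but two of them overlap, so the combined bitmap has four set bits, matching the "4 total uniques" on the slide. Because OR is associative and commutative, any set of days can be combined in any order, which is what makes arbitrary date ranges cheap.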
Slide 42
Slide 42 text
?
[email protected]
Thanks! We’re hiring!
[email protected]
Derek Nelson