Rolling with HBase

How HBase is used as a key component of AdRoll's data infrastructure

Derek Nelson

April 12, 2013

Transcript

  1. Row keys: <advertiser, 2013-04-01, campaign> <advertiser, 2013-04-02, campaign> <advertiser, 2013-04-03, campaign> <advertiser, 2013-04-05, campaign> …

    Row <advertiser, 2013-04-04, campaign>: { main:impressions: 45595, main:clicks: 71, main:cost: 60.80, main:conversions: 25 }
  2. Less Regions… is more better

    Region sizes shown: 2 GB, 2 GB, 2 GB, 2 GB. Anything bigger causes compaction issues for us.
  3. Denormalize all the things

    Key inversion for global rollups: <advertiser, day> to <day, advertiser>. org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer}: granular_data to less_granular_data.
  4. Don’t run MR on HBase (motorcycle pic)

    8 cores + 7 GB RAM - HBase - HDFS = Not a lot of room for MapReduce
  5. <ImmutableBytesWritable, Result> pairs (repeated across the slide) in a SequenceFile in MapReduce cluster’s HDFS

    Reuse old TableMapper, TableReducer code on SequenceFiles local to MapReduce cluster!
  6. Disaster recovery (crater pic w/ Shit)

    Well, shit.

    > hbase restore –region=preferrably_another_one!
  7. Disaster recovery (crater pic w/ Shit)

    Well, shit.

    > hbase restore –region=preferrably_another_one!

    Daily backups taken from a separate cluster using emr.hbase.backup.Main –hbase_master=some-other-quorum!
  8. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0

    Day 1: dc0c4b561ca23d844401eddeba869bc3, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
  9. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0

    Day 1: dc0c4b561ca23d844401eddeba869bc3, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0

    Take a bitwise OR with AggregationClient#sum and a custom CI…
  10. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0

    Day 1: dc0c4b561ca23d844401eddeba869bc3, cce492688e30ea1eeaaa637df7e44eed, 7815696ecbf1c96e6894b779456d330e

    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0

    Take a bitwise OR with AggregationClient#sum and a custom CI…

    Day 0 OR Day 1: 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0

    4 total uniques
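
The row-key layout on slide 1 (advertiser, then day, then campaign) suggests that one advertiser's rows for a date range sit next to each other in sorted order, so a start/stop-row scan can fetch them. The following is a minimal Python sketch of that idea, not AdRoll's code: the NUL separator, the `row_key` and `scan` helpers, and the advertiser and campaign names are all assumptions made for illustration, and a sorted list stands in for an HBase table.

```python
from bisect import bisect_left

def row_key(advertiser: str, day: str, campaign: str) -> bytes:
    """Build a composite key. The NUL separator is an assumption;
    the slides do not show how the parts are encoded."""
    return b"\x00".join(p.encode("utf-8") for p in (advertiser, day, campaign))

def scan(sorted_keys, start: bytes, stop: bytes):
    """Return keys in [start, stop), mimicking a start/stop-row scan
    over lexicographically sorted rows."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

# Hypothetical data: two advertisers, five days, two campaigns each.
days = ["2013-04-01", "2013-04-02", "2013-04-03", "2013-04-04", "2013-04-05"]
keys = sorted(
    row_key(adv, day, camp)
    for adv in ("acme", "zeta")
    for day in days
    for camp in ("spring", "summer")
)

# All of acme's rows from 2013-04-02 through 2013-04-04 (stop row is exclusive).
hits = scan(keys, row_key("acme", "2013-04-02", ""), row_key("acme", "2013-04-05", ""))
```

Because ISO dates sort lexicographically in calendar order, the date range maps directly onto a contiguous key range.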
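
Slide 3's key inversion can be illustrated with a small map/reduce-shaped sketch. This is plain Python standing in for the TableMapper and TableReducer classes named on the slide, under the assumption that the mapper re-keys each granular row from advertiser-first to day-first and the reducer sums the metrics. The function names and sample data are hypothetical.

```python
from collections import defaultdict

def map_row(key, metrics):
    """Mapper stand-in: invert <advertiser, day, campaign> to <day, advertiser>,
    dropping the campaign so the output is less granular."""
    advertiser, day, _campaign = key
    yield (day, advertiser), metrics

def reduce_rows(pairs):
    """Reducer stand-in: sum every metric for each inverted key."""
    totals = defaultdict(lambda: defaultdict(int))
    for key, metrics in pairs:
        for name, value in metrics.items():
            totals[key][name] += value
    return {key: dict(metrics) for key, metrics in totals.items()}

# Hypothetical granular rows keyed by (advertiser, day, campaign).
granular_data = {
    ("acme", "2013-04-04", "spring"): {"impressions": 100, "clicks": 2},
    ("acme", "2013-04-04", "summer"): {"impressions": 50, "clicks": 1},
    ("zeta", "2013-04-04", "launch"): {"impressions": 10, "clicks": 0},
    ("zeta", "2013-04-05", "launch"): {"impressions": 20, "clicks": 3},
}

less_granular_data = reduce_rows(
    pair for key, metrics in granular_data.items() for pair in map_row(key, metrics)
)

# With day-first keys, a global rollup for one day covers adjacent keys.
day_total = sum(
    metrics["impressions"]
    for (day, _advertiser), metrics in less_granular_data.items()
    if day == "2013-04-04"
)
```

With day as the leading key component, all advertisers for a given day sort together, which is what makes the global rollup a single contiguous read.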
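
The unique-counting idea on slides 8 to 10 can be sketched in a few lines: each day stores a bitmap, the bitmaps for a date range are combined with a bitwise OR, and the set bits are counted. The sketch below is an assumption-laden illustration in Python rather than the coprocessor-based approach the slides mention. In particular, mapping an ID to a bit position by hashing modulo the bitmap width is a guess (the slides do not show how bits are assigned), and `bit_for`, `day_bitmap`, and `count_uniques` are hypothetical names.

```python
import hashlib

NUM_BITS = 16  # matches the 16-bit bitmaps on the slides; real bitmaps would be far larger

def bit_for(user_id: str) -> int:
    """Assumed mapping from an ID to a bit position (hash modulo width)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BITS

def day_bitmap(user_ids) -> int:
    """Set one bit per ID seen that day; repeat visits set the same bit."""
    bitmap = 0
    for user_id in user_ids:
        bitmap |= 1 << bit_for(user_id)
    return bitmap

def count_uniques(bitmaps) -> int:
    """OR the per-day bitmaps together and count the set bits."""
    combined = 0
    for bitmap in bitmaps:
        combined |= bitmap
    return bin(combined).count("1")

# The two per-day bitmaps from the slides, written as binary strings.
day0 = int("0010001000001000", 2)
day1 = int("0011000000001000", 2)
total = count_uniques([day0, day1])
```

Each day has three set bits, two of which are shared, so the OR has four set bits, in line with the "4 total uniques" on slide 10. With a hash-based mapping like the one assumed here, two different IDs can land on the same bit, so the count is a lower bound unless the bitmap is wide enough or bit positions are assigned without collisions.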