Rolling with HBase

How HBase is used as a key component of AdRoll's data infrastructure

Derek Nelson

April 12, 2013

Transcript

  1. Derek Nelson | Software Engineer Rolling with HBase

  2. Pixel “fires”

  3. Pixel “fires” Serve ad?

  4. Pixel “fires” Serve ad? Ad served

  5. None
  6. Dash data

  7. Pixel impressions, clicks, impressions…

  8. 7/2011 ~50GB/day

  9. 7/2011 ~50GB/day 4/2013 ~5TB/day ~10 billion events per day

  10. Pixel impressions, clicks, impressions…

  11. ? Pixel impressions, clicks, impressions…

  12. None
  13. Overkill Driver issues ✗

  14. Overkill Driver issues ✗ ✗ ByteOrderedPartitioner L 2B column limit

  15. Overkill Driver issues ✗ ✗ ByteOrderedPartitioner L 2B column limit ✔
  16. Pixel impressions, clicks, impressions… 6 – 8 nodes c1.xlarges AWS

    HBase 0.92
  17. <advertiser, 2013-04-01, campaign> <advertiser, 2013-04-02, campaign> <advertiser, 2013-04-03, campaign> <advertiser, 2013-04-04, campaign> <advertiser, 2013-04-05, campaign> …

    <advertiser, 2013-04-04, campaign> { main:impressions: 45595, main:clicks: 71, main:cost: 60.80, main:conversions: 25 }
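
    The key layout on this slide suggests that one advertiser's days sit next to each other in sort order, so a date range becomes a single contiguous scan. Below is a minimal in-memory sketch of that idea, not AdRoll's code: the `\x00` separator, the `row_key` and `scan` helpers, and the advertiser and campaign names are all assumptions made for illustration. Only the 2013-04-04 cell values come from the slide.

```python
from bisect import bisect_left

SEP = b"\x00"  # assumed field separator; the slides do not show the real key encoding


def row_key(advertiser, day, campaign):
    """Join the fields so byte order follows (advertiser, day, campaign)."""
    return SEP.join(part.encode("utf-8") for part in (advertiser, day, campaign))


def scan(rows, start, stop):
    """Yield (key, cells) with start <= key < stop in byte order, like a range scan."""
    keys = sorted(rows)
    for key in keys[bisect_left(keys, start):]:
        if key >= stop:
            break
        yield key, rows[key]


# In-memory stand-in for the table. Names and most values are placeholders.
table = {
    row_key("acme", "2013-04-03", "spring"): {"main:clicks": 10},
    row_key("acme", "2013-04-04", "spring"): {
        "main:impressions": 45595,
        "main:clicks": 71,
        "main:cost": 60.80,
        "main:conversions": 25,
    },
    row_key("acme", "2013-04-05", "spring"): {"main:clicks": 12},
    row_key("zenith", "2013-04-04", "launch"): {"main:clicks": 99},
}

# All of acme's rows for 2013-04-03 and 2013-04-04 (the stop key is exclusive).
start = row_key("acme", "2013-04-03", "")
stop = row_key("acme", "2013-04-05", "")
clicks = sum(cells["main:clicks"] for _, cells in scan(table, start, stop))
```

    Because ISO dates sort lexicographically, the date range maps directly onto a key range and rows for other advertisers are never touched.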
  18. LiquidAds/Products  

  19. LiquidAds/Products  

  20. <advertiser, product> main:image_path main:description main:price

  21. Similarity: 0.27 Similarity: 0.73 <product0, product1, algorithm> product0 product1 product2

    <product0, product1, algorithm>
  22. Less Regions… is more better

  23. Less Regions… is more better 2 GB 2 GB 2 GB 2 GB Anything bigger causes compaction issues for us
  24. Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data  

  25. Denormalize all the things org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data   less_granular_data  

  26. Denormalize all the things Key inversion for global rollups <advertiser, day> <day, advertiser> org.apache.hadoop.hbase.mapreduce.Table{Mapper, Reducer} granular_data less_granular_data
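
    Key inversion here appears to mean writing a second, denormalized copy of each row with the key fields swapped, so that a rollup across all advertisers for one day reads one contiguous key range. A small sketch of that idea follows. Tuples stand in for byte keys, and `invert_key`, `global_rollup`, and the sample rows are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def invert_key(key):
    """Turn <advertiser, day> into <day, advertiser>."""
    advertiser, day = key
    return (day, advertiser)


def global_rollup(inverted_rows, day):
    """Sum every counter for one day across all advertisers.

    With <day, advertiser> keys, the rows for one day are adjacent in sort
    order, so the loop can stop at the first key past that day.
    """
    totals = defaultdict(int)
    for (row_day, _advertiser), counters in sorted(inverted_rows.items()):
        if row_day < day:
            continue
        if row_day > day:
            break
        for name, value in counters.items():
            totals[name] += value
    return dict(totals)


# Placeholder rows keyed by <advertiser, day>.
granular = {
    ("acme", "2013-04-04"): {"impressions": 45595, "clicks": 71},
    ("zenith", "2013-04-04"): {"impressions": 1000, "clicks": 9},
    ("acme", "2013-04-05"): {"impressions": 500, "clicks": 3},
}

# The denormalized copy keyed by <day, advertiser>.
inverted = {invert_key(key): counters for key, counters in granular.items()}
day_totals = global_rollup(inverted, "2013-04-04")
```

    The cost is a second write per row, which is the trade the "Denormalize all the things" heading points at.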
  27. Don’t run MR on HBase (motorcycle pic)

  28. Don’t run MR on HBase (motorcycle pic) 8 cores + 7GB RAM - HBase - HDFS = Not a lot of room for MapReduce
  29. Pixel impressions, clicks, impressions…

  30. None
  31. <ImmutableBytesWritable, Result> <ImmutableBytesWritable, Result> <ImmutableBytesWritable, Result> <ImmutableBytesWritable, Result> <ImmutableBytesWritable, Result>

    <ImmutableBytesWritable, Result> <ImmutableBytesWritable, Result> SequenceFile in MapReduce cluster’s HDFS Reuse old TableMapper, TableReducer code on SequenceFiles local to MapReduce cluster!
  32. Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints Redundant Thrift endpoints
  33. Disaster recovery (crater pic w/ Shit) Well, shit.

  34. Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore –region=preferably_another_one!

  35. Disaster recovery (crater pic w/ Shit) Well, shit. > hbase restore –region=preferably_another_one! Daily backups taken from a separate cluster using emr.hbase.backup.Main –hbase_master=some-other-quorum!
  36. Pixel impressions, clicks, impressions… < 1s Storm

  37. Pixel impressions, clicks, impressions… Storm Everything is a Counter {DEFERRED_LOG_FLUSH => 'true'}
  38. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
  39. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
    Day 1: dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
  40. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
    Day 1: dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
    Take a bitwise OR with AggregationClient#sum and a custom CI…
  41. Counting uniques across arbitrary date ranges

    Day 0: 8277e0910d750195b448797616e091ad cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 0 bitmap: 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
    Day 1: dc0c4b561ca23d844401eddeba869bc3 cce492688e30ea1eeaaa637df7e44eed 7815696ecbf1c96e6894b779456d330e
    Day 1 bitmap: 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
    Take a bitwise OR with AggregationClient#sum and a custom CI…
    OR result: 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0
    4 total uniques
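
    The counting step on these slides can be sketched in a few lines: each day stores one bit per user seen, days are combined with a bitwise OR, and the number of set bits is the unique count for the range. This is only an illustration of the bitmap arithmetic, not the AggregationClient coprocessor path the slide mentions. How a user hash is mapped to a bit position is not shown in the talk, so the first-seen index in `UniquesIndex` is an assumption, as are the class and function names.

```python
from functools import reduce
from operator import or_


class UniquesIndex:
    """Assigns each user id a bit position (assumed scheme: first-seen order)."""

    def __init__(self):
        self._positions = {}

    def position(self, user_id):
        # Reuse the existing position, or hand out the next free one.
        return self._positions.setdefault(user_id, len(self._positions))

    def bitmap(self, user_ids):
        """Build one day's bitmap as an int with one bit set per user."""
        bits = 0
        for user_id in user_ids:
            bits |= 1 << self.position(user_id)
        return bits


def count_uniques(daily_bitmaps):
    """OR the daily bitmaps together and count the set bits."""
    combined = reduce(or_, daily_bitmaps, 0)
    return bin(combined).count("1")


index = UniquesIndex()
day0 = index.bitmap([
    "8277e0910d750195b448797616e091ad",
    "cce492688e30ea1eeaaa637df7e44eed",
    "7815696ecbf1c96e6894b779456d330e",
])
day1 = index.bitmap([
    "dc0c4b561ca23d844401eddeba869bc3",
    "cce492688e30ea1eeaaa637df7e44eed",
    "7815696ecbf1c96e6894b779456d330e",
])
total = count_uniques([day0, day1])

# The same arithmetic applied to the bit patterns printed on the slides.
slide_day0 = int("0010001000001000", 2)
slide_day1 = int("0011000000001000", 2)
slide_total = count_uniques([slide_day0, slide_day1])
```

    Because OR is associative and idempotent, any set of days can be combined in any order, which is what makes arbitrary date ranges cheap to answer.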
  42. ? Thanks! Derek Nelson derek@adroll.com We’re hiring! jobs.engineers@adroll.com