
Blackbird Collections: In-situ Stream Processing in HBase

Blackbird is a large-scale object store built at Rocket Fuel. It stores 100+ TB of data and provides real-time access to 10 billion+ objects in 2-3 milliseconds, at a rate of 1 million+ requests per second. In this talk (an update from HBaseCon 2014), we will describe Blackbird's comprehensive collections API and give examples of how it can be used to model collections like sets and maps, and aggregates on these collections like counters. We will also illustrate the flexibility and power of the API by modeling custom collection types that are unique to the Rocket Fuel context.


Ishan Chhabra

May 07, 2015


Transcript

  1. Problem: Concurrency Bug. Event 1 reads the existing 100 elements and writes back 101. Concurrently, Event 2 reads the same 100 elements and writes back its own 101, so one of the two updates is lost.
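A minimal sketch of this lost update, using a plain Python dict in place of the HBase row (the row key and element values are made up):

```python
# Two concurrent events each read the existing 100 elements, append one,
# and write the whole collection back. Both reads happen before either
# write, which is the interleaving that loses an update.
store = {"row": list(range(100))}

snap1 = list(store["row"])         # Event 1 reads 100 elements
snap2 = list(store["row"])         # Event 2 reads the same 100 elements
store["row"] = snap1 + ["e1"]      # Event 1 writes back 101 elements
store["row"] = snap2 + ["e2"]      # Event 2 writes back 101 elements too

# Only 101 elements survive instead of 102: Event 1's write was clobbered.
print(len(store["row"]), "e1" in store["row"])  # 101 False
```

Append-only collections, introduced below, sidestep this interleaving entirely: neither event needs to read before writing.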
  2. Problem: Asymptotic network usage. Every read-modify-write replicates the full collection between data centers (DC 1 to DC 2): Write 1 ships 101 elements, Write 2 ships 48 elements, regardless of how little actually changed.
  3. Append-Only Collections: be sympathetic to HBase internals. Lists are the trivial example, but what about sets, maps, counters, etc., and domain-specific collections?
  4. Working example: SegmentSet. Keep only the latest entry for each segment, and at most 1000 of the most recently updated segments.
  5. Blackbird Collections: Logical Model. A collection of entries that elements can only be added to; a series of functions is applied during reads to enforce its properties.
  6. Blackbird Collections: Logical Model. For every collection: define the structure, and define the series of functions f1, f2, f3 … to apply during reads.
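A minimal sketch of this logical model, using the SegmentSet from the working example as the read pipeline. The class and function names here are illustrative, not Blackbird's actual API:

```python
# Logical model: writes only ever append; a series of functions f1, f2, ...
# is applied lazily at read time to enforce the collection's properties.
class AppendOnlyCollection:
    def __init__(self, *read_functions):
        self._entries = []
        self._read_functions = read_functions

    def add(self, entry):
        self._entries.append(entry)        # no read-modify-write

    def read(self):
        view = list(self._entries)
        for f in self._read_functions:     # f1, f2, f3 ...
            view = f(view)
        return view

# SegmentSet properties: keep only the latest entry per segment, and at
# most N of the most recently updated segments.
def latest_per_segment(entries):
    latest = {}
    for seg, ts in entries:                # later appends win
        latest[seg] = ts
    return list(latest.items())

def limit(n):
    def f(entries):
        return sorted(entries, key=lambda e: e[1], reverse=True)[:n]
    return f

segments = AppendOnlyCollection(latest_per_segment, limit(2))
segments.add(("sports", 1))
segments.add(("news", 2))
segments.add(("sports", 3))                # updates the "sports" segment
segments.add(("autos", 4))
print(segments.read())                     # [('autos', 4), ('sports', 3)]
```

Because `add` never reads existing data, the concurrency bug and the cross-datacenter replication cost from the earlier slides both disappear: each write ships only its own entry.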
  7. Step 1: Writes append to separate columns. The logical collection is stored as a combined column (views:4587, 100 entries) plus one column per write (views:2398, 1 entry; views:6798, 2 entries; views:2983, 1 entry).
  8. Step 2: Apply the functions during reads. The combined column's 100 entries and the appended columns' 1 + 2 + 1 entries are merged into 104 entries, and the pipeline f1, f2, … collapses them to the final 92-entry view.
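Under the assumption that each column simply holds a list of entries, this read path might look like the following sketch. The segment names, timestamps, and the deduplicating f1 are made up, and the slide's exact 104 → 92 reduction depends on the real data:

```python
# Step 1 left the data as one combined column plus small per-write columns.
combined = [("seg%d" % i, i) for i in range(100)]   # combined: 100 entries
appended = [[("seg1", 200)],                        # write 1: 1 entry
            [("seg2", 201), ("seg3", 202)],         # write 2: 2 entries
            [("seg4", 203)]]                        # write 3: 1 entry

# Step 2: merge all the columns, then apply the functions f1, f2, ...
merged = list(combined)
for column in appended:
    merged.extend(column)
print(len(merged))              # 104 entries before the pipeline

def f1(entries):
    # Illustrative function: keep only the latest entry per segment.
    latest = {}
    for seg, ts in entries:
        latest[seg] = ts
    return list(latest.items())

view = f1(merged)
print(len(view))                # 100 entries: the 4 duplicates collapsed
```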
  9. Step 3: Normalization. There are 2 kinds of runs: nightly and weekly. The nightly run only looks at a subset of the data (the data changed that day); the weekly run looks at all the data.
  10. Step 3: Normalization. Heavily optimized: < 1h for the nightly run and 2-3h for the weekly run (~50 TB of data). Made fast by MapReduce over snapshots and bulkloads, with no impact on live read performance.
  11. Blackbird Collections: Updated Logical Model. A collection of entries that elements can only be added to. Apply a series of functions during reads, a series during daily normalization, and a series during weekly normalization.
  12. Another Example: Transient Counters. Be able to increment/decrement counts; remove entries if timestamp + time-to-live < current time; keep only the latest 1000 entries.
  13. Another Example: Transient Counters. During read: aggregate(), expire(). Daily normalization: aggregate(), expire(), limit_to_1000(). Weekly normalization: aggregate(), expire(), limit_to_1000().
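A sketch of those three functions. The function names mirror the slide, but the entry layout (key, delta, timestamp) and the function bodies are assumptions; this example runs the full three-function pipeline used during normalization:

```python
TTL = 60  # illustrative time-to-live, in the same units as the timestamps

def aggregate(entries):
    # Sum the increment/decrement deltas per key, keeping the latest
    # timestamp seen for that key.
    totals = {}
    for key, delta, ts in entries:
        count, last_ts = totals.get(key, (0, 0))
        totals[key] = (count + delta, max(last_ts, ts))
    return [(k, c, ts) for k, (c, ts) in totals.items()]

def expire(entries, now):
    # Remove entries whose timestamp + time-to-live has passed.
    return [(k, c, ts) for k, c, ts in entries if ts + TTL >= now]

def limit_to_1000(entries):
    # Keep only the 1000 most recently updated entries.
    return sorted(entries, key=lambda e: e[2], reverse=True)[:1000]

entries = [("ad1", 1, 10), ("ad1", 1, 20), ("ad2", 1, 15), ("ad1", -1, 90)]
now = 100
view = limit_to_1000(expire(aggregate(entries), now))
print(view)  # [('ad1', 1, 90)] -- ad2 expired: 15 + 60 < 100
```

The same functions compose into each pipeline: reads apply aggregate() and expire() on the fly, while the daily and weekly normalization runs additionally apply limit_to_1000() and rewrite the compacted result.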