Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building a search engine - in less than 15 minutes

Building a search engine - in less than 15 minutes

In this talk we dive into two popular compression algorithms in search engines.

Ward Bekker

July 12, 2012
Tweet

More Decks by Ward Bekker

Other Decks in Technology

Transcript

  1. Me • Ward Bekker • TTY.nl • Product owner •

    @wardbekker • https://github.com/wardbekker maandag 16 juli 12
  2. Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,

    00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
  3. Elias ɣ Encoding n => .. => length + 1

    value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
  4. Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits

    • 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
  5. Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =

    10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
  6. Delta Gap Compression • [3, 7, 8, 9, 12, 13,

    14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
  7. Combined • 64-bit Erlang • [3, 7, 8, 9, 12,

    13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12