Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
3
130
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
Tweet
Share
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
72
Other Decks in Technology
See All in Technology
元インフラエンジニアに成る / Human Resources to Human Relations
bobtani
2
710
巨大なテーブルのテーブル定義を無停止で安全に誰でも変更できるようにする / Table-definitions-for-huge-tables-can-be-modified-by-anyone-safely-and-non-disruptively
freee
1
720
"好き"との生活/Regularly update profile with GitHub Actions
judeeeee
0
150
TransitGatewayの基礎
toru_kubota
0
230
〜小さく始めて大きく育てる〜データ分析基盤の開発から活用まで
kniino
0
1.9k
NgRx Signal Store
rainerhahnekamp
0
100
HEXA OSINT CTF V3 作戦会議
meow_noisy
0
110
SIEMを用いて、セキュリティログ分析の可視化と分析を実現し、PDCAサイクルを回してみた
coconala_engineer
0
200
WebアプリケーションにおけるPDOの使い方入門 / phpcon odawara 2024
meihei3
2
420
SREとその組織類型
tatsuo48
8
1.5k
Databricksを活用してDELISH KITCHENのレシピレコメンドを開発した話
furu8
0
250
少数チームで挑む: SwiftUI, TCA, KMPを用いた 新規動画配信アプリ 「ABEMA Live」の開発について
tomu28
0
520
Featured
See All Featured
Build The Right Thing And Hit Your Dates
maggiecrowley
23
2k
Put a Button on it: Removing Barriers to Going Fast.
kastner
58
3k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
272
13k
How To Stay Up To Date on Web Technology
chriscoyier
781
250k
Atom: Resistance is Futile
akmur
258
25k
Bootstrapping a Software Product
garrettdimon
PRO
301
110k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
153
14k
Keith and Marios Guide to Fast Websites
keithpitt
408
22k
Making Projects Easy
brettharned
107
5.5k
Building Flexible Design Systems
yeseniaperezcruz
318
37k
Six Lessons from altMBA
skipperchong
19
3k
Fantastic passwords and where to find them - at NoRuKo
philnash
36
2.5k
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12