Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
3
140
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
Tweet
Share
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
83
Other Decks in Technology
See All in Technology
RSCの時代にReactとフレームワークの境界を探る
uhyo
9
2.9k
2025年になってもまだMySQLが好き
yoku0825
8
3.9k
役割は変わっても、変わらないもの 〜スクラムマスターからEMへの転身で学んだ信頼構築の本質〜 / How to build trust
shinop
0
160
個人CLAUDE.md紹介と設定から学んだこと/introduce-my-claude-md
shibayu36
0
190
開発者を支える Internal Developer Portal のイマとコレカラ / To-day and To-morrow of Internal Developer Portals: Supporting Developers
aoto
PRO
1
140
まだ間に合う! StrandsとBedrock AgentCoreでAIエージェント構築に入門しよう
minorun365
PRO
11
870
エラーとアクセシビリティ
schktjm
0
890
ヘブンバーンズレッドにおける、世界観を活かしたミニゲーム企画の作り方
gree_tech
PRO
0
510
フィンテック養成勉強会#56
finengine
0
120
Function Body Macros で、SwiftUI の View に Accessibility Identifier を自動付与する/Function Body Macros: Autogenerate accessibility identifiers for SwiftUI Views
miichan
2
170
Obsidian応用活用術
onikun94
0
370
サンドボックス技術でAI利活用を促進する
koh_naga
0
180
Featured
See All Featured
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
How to Think Like a Performance Engineer
csswizardry
26
1.9k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
Typedesign – Prime Four
hannesfritz
42
2.8k
Being A Developer After 40
akosma
90
590k
How to Ace a Technical Interview
jacobian
279
23k
Docker and Python
trallard
45
3.5k
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12