Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
In this talk we dive into two popular compression algorithms in search engines.
Transcript
Me
• Ward Bekker
• TTY.nl
• Product owner
• @wardbekker
• https://github.com/wardbekker
Monday 16 July 2012
Building a search engine in less than 15 minutes
Documents
Index
Querying
Result
Tuples: InvertedIndex:
Querying
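A rough sketch of the documents, tuples, inverted index, and querying steps named on the slides above. The talk's own code is Erlang and does not survive in this transcript, so the sample documents and the names `tokenize`, `build_index`, and `query` are hypothetical, and Python is used only for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (naive whitespace tokenizer)."""
    return text.lower().split()


def build_index(documents):
    """Map each term to the sorted list of ids of documents containing it."""
    # Step 1: unique (term, document id) tuples, sorted by term then id.
    tuples = sorted({(term, doc_id)
                     for doc_id, text in documents.items()
                     for term in tokenize(text)})
    # Step 2: group the tuples by term to get one postings list per term.
    index = defaultdict(list)
    for term, doc_id in tuples:
        index[term].append(doc_id)
    return dict(index)


def query(index, *terms):
    """Return ids of documents that contain every query term (AND query)."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "Erlang is a functional language",
    2: "Search engines use an inverted index",
    3: "An inverted index maps terms to documents",
}
index = build_index(documents)
print(index["inverted"])                  # [2, 3]
print(query(index, "inverted", "terms"))  # [3]
```

The postings lists are sorted lists of integers, which is the kind of data the compression slides that follow operate on.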
Elias γ Encoding
• 1, 2, 3
• 00000000 00000001, 00000000 00000010, 00000000 00000011
• 6 bytes = 48 bits
Elias γ Encoding
n => binary => length prefix (one 0 per bit after the first) + value
1 => 1 => 1
2 => 10 => 0 10
3 => 11 => 0 11
4 => 100 => 00 100
5 => 101 => 00 101
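The table above can be reproduced with a few lines of code. This is a minimal sketch in Python rather than the Erlang used in the talk; the function name `elias_gamma` is an assumption, and bit strings are used instead of real bit arrays to keep the output readable.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]                 # binary value, e.g. 5 -> "101"
    prefix = "0" * (len(binary) - 1)    # one zero per bit after the leading 1
    return prefix + binary


for n in range(1, 6):
    print(n, "=>", bin(n)[2:], "=>", elias_gamma(n))
# 1 => 1 => 1
# 2 => 10 => 010
# 3 => 11 => 011
# 4 => 100 => 00100
# 5 => 101 => 00101
```

Because the number of leading zeros announces how many bits follow the first 1, the codes can be concatenated without separators.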
Elias γ Encoding
• 0000000000000001, 0000000000000010, 0000000000000011
• 48 bits
• 1 010 011
• 7 bits
• compression ratio: 7 / 48 ≈ 0.15
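To see why the 7-bit stream needs no separators, here is a hedged sketch of a decoder in Python (not the talk's Erlang). The name `gamma_decode` is an assumption, and the function expects a well-formed stream with no padding bits at the end.

```python
def gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes given as a bit string."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":       # count the zeros of the length prefix
            zeros += 1
            i += 1
        # The value occupies zeros + 1 bits, starting at the first 1.
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


stream = "1" + "010" + "011"          # the codes for 1, 2, 3 from the slide
print(gamma_decode(stream))           # [1, 2, 3]
print(round(len(stream) / 48, 2))     # 0.15
```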
Elias γ Encoding
• erts_debug:size([1,2,3,4,5]) =>
• 10 words = 10 * 8 bytes = 80 bytes = 640 bits
• erlang:byte_size(binary:encode_unsigned(2#10100110010000101)).
• 3 bytes = 24 bits
• compression ratio: 24 / 640 ≈ 0.04
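The byte count on this slide can be checked outside Erlang as well. The sketch below is Python, so it only mirrors the arithmetic: `int.to_bytes` stands in for `binary:encode_unsigned`, and the 640-bit figure for the list is taken from the slide rather than measured.

```python
bits = "10100110010000101"            # gamma codes of 1..5, as on the slide
n_bytes = (len(bits) + 7) // 8        # round up to whole bytes
packed = int(bits, 2).to_bytes(n_bytes, "big")

list_bits = 10 * 8 * 8                # 10 words of 8 bytes each, from the slide
ratio = len(packed) * 8 / list_bits

print(len(bits), len(packed), len(packed) * 8)   # 17 3 24
print(f"{ratio:.4f}")                             # 0.0375, about 0.04
```

Going through an integer works here because the stream starts with a 1 bit. A stream that starts with 0 would lose its leading zeros this way, so a real encoder would more likely pad at the end of the stream instead.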
Delta Gap Compression
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
• 3 - 1 - 3 - 3 - 2 - 5 (alternating runs over positions 0 to 16: 3 absent, 1 present, 3 absent, 3 present, 2 absent, 5 present)
• compression ratio: 6 / 9 ≈ 0.67
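The sequence 3 - 1 - 3 - 3 - 2 - 5 on this slide matches the alternating run lengths of absent and present ids over positions 0 to 16. The Python sketch below computes it under that reading; the name `run_lengths` and the `universe_size` parameter are assumptions, not taken from the talk's Erlang code.

```python
def run_lengths(postings, universe_size):
    """Alternating run lengths of absent/present ids over 0..universe_size-1.

    The first run counts absent ids, so it is 0 when id 0 is present.
    """
    present = set(postings)
    runs = []
    current = False          # start by counting a run of absent ids
    length = 0
    for doc_id in range(universe_size):
        if (doc_id in present) == current:
            length += 1      # same state: extend the current run
        else:
            runs.append(length)
            current = not current
            length = 1
    runs.append(length)      # close the final run
    return runs


postings = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = run_lengths(postings, 17)
print(runs)                                  # [3, 1, 3, 3, 2, 5]
print(round(len(runs) / len(postings), 2))   # 0.67
```

For comparison, the differences between consecutive ids in the same list are [3, 4, 1, 1, 3, 1, 1, 1, 1], which is longer but never contains a zero.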
Combined
• 64-bit Erlang
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• [3, 1, 3, 3, 2, 5]
• 011 1 011 011 010 00101 (18 bits, stored in 3 bytes)
• 1152 vs 24 bits
• compression ratio: 0.021
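The combined step can be sketched by gamma-encoding each of the six run lengths and packing the result into bytes. This is Python rather than the talk's Erlang, the helper name is an assumption, and the 1152-bit figure is derived as 2 words per list element on a 64-bit system (9 elements * 2 words * 64 bits). Encoding all six values gives 18 bits.

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer, as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


postings = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = [3, 1, 3, 3, 2, 5]             # run lengths from the previous slide

bits = "".join(elias_gamma(r) for r in runs)
n_bytes = (len(bits) + 7) // 8
# Pad on the right so the leading zeros of the stream are preserved.
packed = int(bits.ljust(n_bytes * 8, "0"), 2).to_bytes(n_bytes, "big")

list_bits = len(postings) * 2 * 64    # 2 words per list cell, 64-bit words
ratio = len(packed) * 8 / list_bits

print(bits)                           # 011101101101000101
print(len(bits), len(packed) * 8)     # 18 24
print(f"{ratio:.3f}")                 # 0.021
```

A decoder would also need to know how many values to read, or where the stream ends, because the padding bits are all zeros and carry no terminating 1.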
More information
• https://github.com/wardbekker/compression/
• Modern Information Retrieval
• http://www.mir2ed.org/
• @wardbekker