Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
In this talk we dive into two compression algorithms popular in search engines: Elias γ encoding and delta gap compression.
Transcript
Me
• Ward Bekker
• TTY.nl
• Product owner
• @wardbekker
• https://github.com/wardbekker
Monday 16 July 2012
Building a search engine in less than 15 minutes
Documents
Index
Querying
Result
Tuples:
InvertedIndex:
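The Documents, Index, Querying and Result slides are diagrams in the original deck. A minimal sketch of the indexing step, assuming "tuples" means one (term, document id) pair per distinct term in a document and that documents are simply split on whitespace; the `build_inverted_index` name and the sample documents are hypothetical:

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of ids of the documents containing it."""
    # Step 1: emit one (term, doc_id) tuple per distinct term in each document,
    # then sort them by term first and document id second.
    tuples = sorted(
        {(term, doc_id) for doc_id, text in docs.items() for term in text.lower().split()}
    )
    # Step 2: group the sorted tuples by term into postings lists.
    index = defaultdict(list)
    for term, doc_id in tuples:
        index[term].append(doc_id)
    return dict(index)


docs = {
    1: "erlang search engine",
    2: "search engine compression",
    3: "erlang compression",
}
index = build_inverted_index(docs)
print(index["search"])       # [1, 2]
print(index["compression"])  # [2, 3]
```

Sorting the tuples before grouping is what keeps every postings list in ascending document-id order, the shape the compression slides work on.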
Querying
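A sketch of the querying step, assuming an AND query (every term must match) answered by intersecting sorted postings lists with a two-pointer merge; the index contents and the `intersect` and `query_and` names are hypothetical:

```python
def intersect(p1, p2):
    """Intersect two ascending postings lists in O(len(p1) + len(p2))."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def query_and(index, terms):
    """Return the ids of documents containing every term (empty if any term is unknown)."""
    postings = [index.get(term, []) for term in terms]
    if not postings:
        return []
    # Start from the shortest list so intermediate results stay small.
    postings.sort(key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


index = {"erlang": [1, 3], "search": [1, 2], "engine": [1, 2], "compression": [2, 3]}
print(query_and(index, ["erlang", "search"]))  # [1]
print(query_and(index, ["search", "engine"]))  # [1, 2]
```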
Elias γ Encoding
• 1, 2, 3
• 00000000 00000001, 00000000 00000010, 00000000 00000011
• 6 bytes = 48 bits
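The bit patterns on this slide suggest a baseline of fixed-width 16-bit unsigned integers. Under that assumption, Python's standard `struct` module reproduces the 6 bytes:

```python
import struct

values = [1, 2, 3]
# ">" = big-endian byte order, "H" = unsigned 16-bit integer, one per value.
packed = struct.pack(">%dH" % len(values), *values)
print(packed.hex(" "))  # 00 01 00 02 00 03
print(len(packed) * 8)  # 48
```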
Elias γ Encoding
n => binary => γ code ((length - 1) zeros + value)
1 => 1 => 1
2 => 10 => 0 10
3 => 11 => 0 11
4 => 100 => 00 100
5 => 101 => 00 101
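A minimal sketch of the rule in this table. For readability it builds the code as a string of "0" and "1" characters rather than as packed bits; the `elias_gamma` name is our own:

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary  # (length - 1) zeros, then the value


for n in range(1, 6):
    print(n, bin(n)[2:], elias_gamma(n))
# 1 1 1
# 2 10 010
# 3 11 011
# 4 100 00100
# 5 101 00101
```

The zeros tell a decoder how many bits the value has, so no separators are needed between consecutive codes.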
Elias γ Encoding
• 0000000000000001, 0000000000000010, 0000000000000011
• 48 bits
• 1 010 011
• 7 bits
• compression ratio: 7 / 48 ≈ 0.15
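A sketch of encoding the whole list and decoding it again, assuming well-formed input (a truncated bit string would raise an `IndexError`). The string representation and the function names are illustrative, not taken from the talk's repository:

```python
def elias_gamma(n):
    """Elias gamma code of n >= 1: (length - 1) zeros, then n in binary."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def encode(values):
    """Concatenate the gamma codes; they are prefix-free, so no separators are needed."""
    return "".join(elias_gamma(n) for n in values)


def decode(bits):
    """Split a concatenation of gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zero prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))  # read zeros + 1 value bits
        i += zeros + 1
    return values


bits = encode([1, 2, 3])
print(bits, len(bits))           # 1010011 7
print(round(len(bits) / 48, 2))  # 0.15
print(decode(bits))              # [1, 2, 3]
```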
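Elias γ Encoding
• erts_debug:size([1,2,3,4,5]) => 10 words (5 list cells of 2 words each)
• 10 * 8 bytes = 80 bytes = 640 bits (8-byte words)
• erlang:byte_size(binary:encode_unsigned(2#10100110010000101)) => 3 bytes = 24 bits
• 10100110010000101 = 1 010 011 00100 00101, the γ codes of 1 to 5
• compression ratio: 24 / 640 ≈ 0.04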
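The same measurement sketched in Python, as an analogue of the Erlang shell expressions rather than the talk's own code: concatenate the γ codes of 1 to 5, treat the bits as one unsigned integer and store it in the minimum number of whole bytes, which is what `binary:encode_unsigned/1` does. This only round-trips because the first code starts with a 1 bit; leading zero bits would be lost.

```python
def elias_gamma(n):
    """Elias gamma code of n >= 1: (length - 1) zeros, then n in binary."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


bits = "".join(elias_gamma(n) for n in [1, 2, 3, 4, 5])
print(bits, len(bits))  # 10100110010000101 17

# Minimum whole bytes for the value, like binary:encode_unsigned/1.
value = int(bits, 2)
packed = value.to_bytes((value.bit_length() + 7) // 8, "big")
print(len(packed) * 8)        # 24
print(len(packed) * 8 / 640)  # 0.0375 (the slide rounds to 0.04)
```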
Delta Gap Compression
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
• 3 - 1 - 3 - 3 - 2 - 5 (alternating run lengths over 0..16: 3 absent, 1 present, 3 absent, 3 present, 2 absent, 5 present)
• compression ratio: 6 / 9 ≈ 0.67
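The numbers on this slide are run lengths over the bitmap of ids 0 to 16, alternating between absent and present ids. This differs from the plain differences between consecutive ids ([3, 4, 1, 1, 3, 1, 1, 1, 1]) that "gap encoding" often refers to. A sketch of the slide's scheme and its inverse, assuming a sorted list of distinct non-negative ids; the function names are hypothetical:

```python
def run_lengths(doc_ids):
    """Alternating run lengths over 0..max(doc_ids): absent, present, absent, present, ..."""
    runs, expected = [], 0
    for doc_id in doc_ids:
        if doc_id > expected:
            # Skipped ids form a new absent run, followed by a new present run.
            runs += [doc_id - expected, 1]
        elif runs:
            # Consecutive id: extend the current present run.
            runs[-1] += 1
        else:
            # The list starts at id 0, so the leading absent run is empty.
            runs += [0, 1]
        expected = doc_id + 1
    return runs


def from_run_lengths(runs):
    """Expand alternating absent/present run lengths back into document ids."""
    doc_ids, position = [], 0
    for i, length in enumerate(runs):
        if i % 2 == 1:  # odd positions hold present runs
            doc_ids.extend(range(position, position + length))
        position += length
    return doc_ids


doc_ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = run_lengths(doc_ids)
print(runs)                                # [3, 1, 3, 3, 2, 5]
print(round(len(runs) / len(doc_ids), 2))  # 0.67
```

Long runs of consecutive ids collapse into a single number, which is why this works well on dense postings lists.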
Combined
• 64-bit Erlang
• [3, 7, 8, 9, 12, 13, 14, 15, 16] (9 list cells = 18 words = 144 bytes = 1152 bits)
• [3, 1, 3, 3, 2, 5]
• 011 1 011 011 010 00101 (18 bits, rounded up to 3 bytes = 24 bits)
• 1152 vs 24 bits
• compression ratio: 24 / 1152 ≈ 0.021
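A self-contained sketch of the combined pipeline: turn the sorted ids into alternating absent/present run lengths, then γ-encode each run. It assumes sorted, distinct ids greater than 0 (γ has no code for 0) and estimates the list size as 2 words of 8 bytes per list cell, as on 64-bit Erlang; the function names are hypothetical. Concatenating all six codes, including 010 for the run of 2, gives 18 bits, which round up to 3 bytes:

```python
def elias_gamma(n):
    """Elias gamma code of n >= 1: (length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def run_lengths(doc_ids):
    """Alternating absent/present run lengths over 0..max(doc_ids)."""
    runs, expected = [], 0
    for doc_id in doc_ids:
        if doc_id > expected:
            runs += [doc_id - expected, 1]  # absent run, then a new present run
        elif runs:
            runs[-1] += 1                   # extend the current present run
        else:
            runs += [0, 1]                  # id 0: empty leading absent run
        expected = doc_id + 1
    return runs


doc_ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = run_lengths(doc_ids)
bits = "".join(elias_gamma(r) for r in runs)
print(runs)             # [3, 1, 3, 3, 2, 5]
print(bits, len(bits))  # 011101101101000101 18

# Baseline estimate: one list cell = 2 words of 8 bytes (64-bit Erlang).
list_bits = len(doc_ids) * 2 * 8 * 8
packed_bits = (len(bits) + 7) // 8 * 8  # round up to whole bytes
print(list_bits, packed_bits)              # 1152 24
print(round(packed_bits / list_bits, 3))  # 0.021
```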
More information
• https://github.com/wardbekker/compression/
• Modern Information Retrieval
• http://www.mir2ed.org/
• @wardbekker