Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
In this talk we dive into two popular compression algorithms in search engines.
Transcript
Me • Ward Bekker • TTY.nl • Product owner • @wardbekker • https://github.com/wardbekker
Monday 16 July 2012
Building a search engine in less than 15 minutes
Documents
Index
Querying
Result
Tuples: InvertedIndex:
Querying
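The slides above move from documents to an inverted index and then to querying it. A minimal sketch of that flow follows, written in Python rather than the Erlang used in the talk. The whitespace tokenizer, the sample documents, and the function names `build_inverted_index` and `query` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def query(index, terms):
    """Return the IDs of documents that contain every query term (AND semantics)."""
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Hypothetical sample documents, keyed by document ID.
documents = {
    1: "erlang is fun",
    2: "search engines are fun",
    3: "erlang search engine",
}
index = build_inverted_index(documents)

if __name__ == "__main__":
    print(index["erlang"])
    print(query(index, ["erlang", "search"]))
```

The sorted lists of document IDs per term are the posting lists that the compression slides below operate on.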
Elias γ Encoding • 1, 2, 3 as 16-bit integers • 00000000 00000001, 00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits
Elias γ Encoding
n => binary => length prefix + value
1 => 1 => 1
2 => 10 => 0 10
3 => 11 => 0 11
4 => 100 => 00 100
5 => 101 => 00 101
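The table can be reproduced with a few lines of code. This is a minimal sketch in Python (the talk itself uses Erlang). It represents codes as strings of '0' and '1' for readability, and the names `elias_gamma` and `encode_all` are illustrative.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for integers >= 1")
    binary = bin(n)[2:]                # binary digits without the '0b' prefix
    prefix = "0" * (len(binary) - 1)   # one zero for every bit after the leading 1
    return prefix + binary


def encode_all(numbers):
    """Concatenate the gamma codes of a sequence of positive integers."""
    return "".join(elias_gamma(n) for n in numbers)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, bin(n)[2:], elias_gamma(n))
```

The zero prefix tells a decoder how many bits follow the leading 1, which is why the codes can be concatenated without separators.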
Elias γ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits • 1 010 011 • 7 bits • compression ratio: 7 / 48 ≈ 0.15
Elias γ Encoding • erts_debug:size([1,2,3,4,5]) => 10 words = 10 * 8 bytes = 80 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)) => 3 bytes = 24 bits • compression ratio: 24 / 640 ≈ 0.04
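The size comparison on this slide can be checked outside the Erlang shell. The sketch below is in Python and mirrors the two Erlang calls only approximately. `pack_bits` is a hypothetical helper that, like `binary:encode_unsigned`, stores the bit string as a minimal big-endian unsigned integer. The list size assumes two 64-bit words per list element, which matches the 10 words the slide reports for 5 elements.

```python
WORD_BITS = 64  # assumption: 64-bit Erlang VM, as stated later in the talk


def pack_bits(bits):
    """Pack a '0'/'1' string into the fewest bytes that hold it as an unsigned integer."""
    value = int(bits, 2)
    n_bytes = (value.bit_length() + 7) // 8 or 1  # at least one byte, even for zero
    return value.to_bytes(n_bytes, "big")


def list_size_bits(n_elements):
    """Approximate heap size of an Erlang list of small integers: 2 words per element."""
    return 2 * n_elements * WORD_BITS


# Gamma codes of 1, 2, 3, 4, 5 concatenated: 1 010 011 00100 00101
packed = pack_bits("10100110010000101")
ratio = (len(packed) * 8) / list_size_bits(5)

if __name__ == "__main__":
    print(len(packed), "bytes,", "ratio", ratio)
```

Packing as an unsigned integer drops leading zero bits, so this shortcut is only lossless when the bit string starts with a 1, as it does here.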
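Delta Gap Compression • [3, 7, 8, 9, 12, 13, 14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 (alternating run lengths: 3 absent, 1 present, 3 absent, 3 present, 2 absent, 5 present) • 6 values instead of 9 • compression ratio: 6 / 9 ≈ 0.67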
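The sequence 3 - 1 - 3 - 3 - 2 - 5 can be read as alternating run lengths over positions 0 to 16: a run of absent IDs, then a run of present IDs, and so on. The sketch below, in Python, is one way to compute and reverse it under that reading. The function names `run_lengths` and `expand` are illustrative, and the input is assumed to be a sorted list of distinct non-negative integers.

```python
def run_lengths(doc_ids):
    """Encode sorted, distinct IDs as alternating run lengths: absent, present, absent, ..."""
    runs = []
    position = 0  # first position not yet covered by a run
    i = 0
    while i < len(doc_ids):
        start = doc_ids[i]
        j = i
        # Extend j to the end of the block of consecutive IDs that begins at i.
        while j + 1 < len(doc_ids) and doc_ids[j + 1] == doc_ids[j] + 1:
            j += 1
        runs.append(start - position)        # length of the absent run
        runs.append(doc_ids[j] - start + 1)  # length of the present run
        position = doc_ids[j] + 1
        i = j + 1
    return runs


def expand(runs):
    """Rebuild the ID list from alternating absent/present run lengths."""
    ids = []
    position = 0
    for k, length in enumerate(runs):
        if k % 2 == 1:  # odd positions in the sequence are present runs
            ids.extend(range(position, position + length))
        position += length
    return ids


if __name__ == "__main__":
    print(run_lengths([3, 7, 8, 9, 12, 13, 14, 15, 16]))
```

If the list starts at ID 0, the first absent run has length 0, which a code defined only for positive integers cannot represent without an extra convention such as adding 1 to every run.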
Combined • 64-bit Erlang • [3, 7, 8, 9, 12, 13, 14, 15, 16] • [3, 1, 3, 3, 2, 5] • 011 1 011 011 010 00101 • 1152 vs 24 bits (18 code bits padded to 3 bytes) • compression ratio: 24 / 1152 ≈ 0.021
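Chaining the two steps gives the combined scheme: convert the ID list to run lengths, then gamma-code each run length. The Python sketch below is an illustration, not the talk's Erlang implementation. The names `compress` and `compressed_bits` are illustrative, and the input is assumed to be sorted, distinct, and not starting at 0, so that every run length is at least 1.

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def run_lengths(doc_ids):
    """Alternating absent/present run lengths for sorted, distinct IDs."""
    runs = []
    position = 0  # first position not yet covered by a run
    for doc_id in doc_ids:
        if runs and doc_id == position:
            runs[-1] += 1                    # ID continues the current present run
        else:
            runs.append(doc_id - position)   # absent run before this ID
            runs.append(1)                   # new present run of length 1
        position = doc_id + 1
    return runs


def compress(doc_ids):
    """Run-length encode the IDs, then concatenate the gamma codes of the runs."""
    return "".join(elias_gamma(r) for r in run_lengths(doc_ids))


def compressed_bits(doc_ids):
    """Size of the code in bits after padding to whole bytes."""
    return ((len(compress(doc_ids)) + 7) // 8) * 8


if __name__ == "__main__":
    ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
    print(run_lengths(ids))
    print(compress(ids), compressed_bits(ids), "bits")
```

Keeping the code as a bit string and padding it to whole bytes preserves leading zero bits, which matters here because the code for a run length of 3 starts with a 0.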
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/ • @wardbekker