Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
In this talk we dive into two popular compression algorithms in search engines.
Transcript
Me
• Ward Bekker
• TTY.nl
• Product owner
• @wardbekker
• https://github.com/wardbekker
Monday, 16 July 2012

Building a search engine in less than 15 minutes
Documents

Index

Querying

Result

Tuples: InvertedIndex:

Querying
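The slides for this part are image-only, so the transcript does not show how the index is actually built. As a rough sketch of the documents → tuples → inverted index → query flow named in the slide titles, the Python below assumes whitespace tokenization, lowercasing, and AND-style queries. The sample documents and the names `tokenize`, `to_tuples`, `build_inverted_index`, and `query` are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def to_tuples(documents):
    """Turn {doc_id: text} into sorted, de-duplicated (term, doc_id) tuples."""
    return sorted({(term, doc_id)
                   for doc_id, text in documents.items()
                   for term in tokenize(text)})


def build_inverted_index(tuples):
    """Group (term, doc_id) tuples into {term: [doc_id, ...]}.

    The tuples arrive sorted, so each posting list ends up sorted too.
    """
    index = defaultdict(list)
    for term, doc_id in tuples:
        index[term].append(doc_id)
    return dict(index)


def query(index, text):
    """AND query: ids of the documents that contain every query term."""
    postings = [set(index.get(term, ())) for term in tokenize(text)]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical sample documents, for illustration only.
documents = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dog"}
index = build_inverted_index(to_tuples(documents))
print(index["quick"])
print(query(index, "quick dog"))
```

The sorted lists of document ids stored per term are the integer sequences that the compression techniques in the rest of the talk operate on.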
Elias ɣ Encoding
• 1, 2, 3
• 00000000 00000001, 00000000 00000010, 00000000 00000011
• 6 bytes = 48 bits
Elias ɣ Encoding
n => .. => length + value
1 => 1 => 1
2 => 10 => 0 10
3 => 11 => 0 11
4 => 100 => 00 100
5 => 101 => 00 101
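The table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the codes are handled as strings of "0" and "1" characters for readability rather than as packed bits; the function name `elias_gamma` is an illustrative choice.

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1'.

    The code is the binary form of n, preceded by one '0' for every
    binary digit after the first (the "length" part of the table).
    """
    if n < 1:
        raise ValueError("Elias gamma is only defined for integers >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary  # e.g. '00' + '101'


for n in range(1, 6):
    print(n, "=>", bin(n)[2:], "=>", elias_gamma(n))
```

Small numbers get short codes, which is why the scheme suits sequences dominated by small values.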
Elias ɣ Encoding
• 0000000000000001, 0000000000000010, 0000000000000011
• 48 bits
• 1 010 011
• 7 bits
• compression ratio: 0.15
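The 7-bit string needs no separators because each code announces its own length: the number of leading zeros tells the reader how many further bits belong to the value. A decoder sketch under the same string-of-bits assumption follows; the name `elias_gamma_decode` is hypothetical, and the function expects the exact bit string without any byte padding.

```python
def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes given as a '0'/'1' string."""
    values = []
    i = 0
    while i < len(bits):
        # Count the leading zeros: that many bits follow the first '1'.
        zeros = 0
        while i < len(bits) and bits[i] == "0":
            zeros += 1
            i += 1
        end = i + zeros + 1
        if end > len(bits):
            raise ValueError("bit string ends in the middle of a code")
        values.append(int(bits[i:end], 2))
        i = end
    return values


decoded = elias_gamma_decode("1010011")
print(decoded)
print(len("1010011") / 48)  # bits used versus three 16-bit integers
```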
Elias ɣ Encoding
• erts_debug:size([1,2,3,4,5]) => 10 words = 10 * 8 bytes = 640 bits
• erlang:byte_size(binary:encode_unsigned(2#10100110010000101)).
• 3 bytes = 24 bits
• compression ratio: 0.04
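The arithmetic on this slide can be checked outside Erlang. The sketch below is a Python approximation, not the author's code: `int.to_bytes` stands in for `binary:encode_unsigned`, on the assumption that both give the shortest big-endian byte representation, and the 640-bit list size is copied from the slide rather than measured. The helper name `packed_size_bytes` is hypothetical.

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def packed_size_bytes(bits):
    """Bytes needed to store the bit string as one unsigned integer.

    Caveat: leading '0' bits vanish when the string is read as a number,
    so this only round-trips for streams that start with a '1'.
    """
    value = int(bits, 2)
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    return len(value.to_bytes(n_bytes, "big"))


bits = "".join(elias_gamma(n) for n in [1, 2, 3, 4, 5])
packed_bits = packed_size_bytes(bits) * 8
LIST_BITS = 10 * 8 * 8  # 10 words of 8 bytes, the figure given on the slide
ratio = packed_bits / LIST_BITS
print(bits, packed_bits, f"{ratio:.4f}")
```

The 17 encoded bits round up to 3 bytes, so the ratio is 24 / 640 = 0.0375, which the slide rounds to 0.04.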
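Delta Gap Compression
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
• 3 - 1 - 3 - 3 - 2 - 5 (lengths of the alternating runs of absent and present positions: 0-2 absent, 3 present, 4-6 absent, 7-9 present, 10-11 absent, 12-16 present)
• compression ratio: 0.67 (6 numbers instead of 9)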
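One reading that reproduces the numbers on this slide exactly is to walk positions 0 through 16 and record the lengths of alternating runs of absent and present ids. The Python sketch below implements that reading; it is an interpretation of the slide rather than the author's code, and the names `run_lengths` and `from_run_lengths` are hypothetical.

```python
def run_lengths(doc_ids):
    """Lengths of alternating absent/present runs over 0..max(doc_ids).

    The first run always counts absent positions, so a list that contains
    position 0 yields a leading run of length 0.
    """
    if not doc_ids:
        return []
    present = set(doc_ids)
    runs = []
    current = False  # False: counting absent positions, True: present ones
    length = 0
    for pos in range(max(doc_ids) + 1):
        if (pos in present) == current:
            length += 1
        else:
            runs.append(length)
            current = not current
            length = 1
    runs.append(length)
    return runs


def from_run_lengths(runs):
    """Inverse of run_lengths: rebuild the sorted list of ids."""
    ids, pos, present = [], 0, False
    for length in runs:
        if present:
            ids.extend(range(pos, pos + length))
        pos += length
        present = not present
    return ids


doc_ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = run_lengths(doc_ids)
print(runs, len(runs) / len(doc_ids))
```

The scheme pays off when ids cluster into consecutive ranges; for widely scattered ids there are about two runs per id, so the list gets longer instead of shorter.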
Combined
• 64-bit Erlang
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• [3, 1, 3, 3, 2, 5]
• 011 1 011 011 010 00101
• 1152 vs 24 bits (18 bits, padded to 3 bytes)
• compression ratio: 0.021
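Both steps can be chained in a short Python sketch: compute the run lengths, Elias gamma encode each one, and round the result up to whole bytes. The assumptions are that bits are held as a string of "0" and "1" characters, that position 0 is never in the list (a run of length 0 has no gamma code), and that the 1152-bit list size is taken from the slide rather than measured. The names `run_lengths`, `elias_gamma`, and `compress` are hypothetical.

```python
from itertools import groupby


def elias_gamma(n):
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def run_lengths(doc_ids):
    """Lengths of alternating absent/present runs over 0..max(doc_ids)."""
    present = set(doc_ids)
    if 0 in present:
        raise ValueError("sketch assumes position 0 is absent")
    flags = [pos in present for pos in range(max(doc_ids) + 1)]
    return [len(list(group)) for _, group in groupby(flags)]


def compress(doc_ids):
    """Return the gamma-coded run lengths and the size in whole bytes, in bits."""
    bits = "".join(elias_gamma(run) for run in run_lengths(doc_ids))
    padded_bits = -(-len(bits) // 8) * 8  # round up to a multiple of 8
    return bits, padded_bits


doc_ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
bits, padded_bits = compress(doc_ids)
LIST_BITS = 1152  # size of the Erlang list, the figure given on the slide
print(bits, len(bits), padded_bits, f"{padded_bits / LIST_BITS:.4f}")
```

Under these assumptions the six runs encode to 3 + 1 + 3 + 3 + 3 + 5 = 18 bits, which round up to 3 bytes. Because the stream starts with a 0 bit, it has to be stored left-aligned as raw bytes; reading it as an unsigned integer would drop the leading zero.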
More information
• https://github.com/wardbekker/compression/
• Modern Information Retrieval
• http://www.mir2ed.org/
• @wardbekker