Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
Technology
In this talk we dive into two popular compression algorithms in search engines.
Transcript
Me
• Ward Bekker
• TTY.nl
• Product owner
• @wardbekker
• https://github.com/wardbekker
Monday, 16 July 2012
Building a search engine in less than 15 minutes
Documents
Index
Querying
Result
Tuples: InvertedIndex:
Querying
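The slides for documents, index, and querying were images, so only their titles survive in the transcript. As a rough illustration of the idea, here is a minimal inverted index sketch in Python (not the talk's Erlang). The sample documents, the whitespace tokenizer, and the AND-style `query` function are all assumptions made for this example, not the speaker's implementation.

```python
from collections import defaultdict

# Hypothetical sample documents; the talk's own documents are not in the transcript.
documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog",
}

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():   # naive whitespace tokenizer (assumption)
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def query(index, *terms):
    """Return the ids of documents that contain every query term."""
    if not terms:
        return []
    result = set(index.get(terms[0].lower(), []))
    for term in terms[1:]:
        result &= set(index.get(term.lower(), []))
    return sorted(result)

index = build_inverted_index(documents)
print(index["quick"])                 # [1, 3]
print(query(index, "quick", "dog"))   # [3]
```

The sorted lists of document ids (posting lists) are what the compression techniques in the remaining slides operate on.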
Elias ɣ Encoding
• 1, 2, 3
• 00000000 00000001, 00000000 00000010, 00000000 00000011
• 6 bytes = 48 bits
Elias ɣ Encoding
n => binary => length + value
1 => 1 => 1
2 => 10 => 0 10
3 => 11 => 0 11
4 => 100 => 00 100
5 => 101 => 00 101
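The table above follows the usual Elias gamma rule: write the number in binary and prefix it with one zero for every bit after the first. A minimal sketch in Python (not the talk's Erlang); the function name `elias_gamma` is chosen for this example.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for integers >= 1")
    binary = bin(n)[2:]                       # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary   # length prefix, then the value

for n in range(1, 6):
    print(n, "=>", bin(n)[2:], "=>", elias_gamma(n))
```

Small numbers get short codes, which is why the scheme suits lists dominated by small values.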
Elias ɣ Encoding
• 0000000000000001, 0000000000000010, 0000000000000011
• 48 bits
• 1 010 011
• 7 bits
• compression ratio: 0.15
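The 7-bit stream needs no separators because the run of leading zeros tells the reader how many bits the next value occupies. The decoder below is a sketch in Python (not the talk's Erlang) written for this example; decoding is not shown in the transcript itself.

```python
def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes into a list of integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while i < len(bits) and bits[i] == "0":   # count the length prefix
            zeros += 1
            i += 1
        if i + zeros + 1 > len(bits):
            raise ValueError("truncated Elias gamma stream")
        values.append(int(bits[i:i + zeros + 1], 2))  # read zeros + 1 value bits
        i += zeros + 1
    return values

stream = "1" + "010" + "011"
print(elias_gamma_decode(stream))                 # [1, 2, 3]
print(len(stream), round(len(stream) / 48, 2))    # 7 0.15
```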
Elias ɣ Encoding
• erts_debug:size([1,2,3,4,5]) =>
• 10 words = 10 * 8 bytes = 640 bits
• erlang:byte_size(binary:encode_unsigned(2#10100110010000101)).
• 3 bytes = 24 bits
• compression ratio: 0.04
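The same arithmetic can be reproduced outside Erlang. The sketch below uses Python as a stand-in: it concatenates the gamma codes for 1 to 5, packs them into whole bytes, and compares against the 640 bits the slide reports for the Erlang list. The 640-bit figure is taken from the slide, not measured here.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

values = [1, 2, 3, 4, 5]
bits = "".join(elias_gamma(n) for n in values)

# Pack the bit string into the smallest number of whole bytes.
packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")

list_bits = 640  # 10 words * 8 bytes * 8 bits, as reported on the slide
ratio = round(len(packed) * 8 / list_bits, 2)

print(bits)          # 10100110010000101
print(len(packed))   # 3
print(ratio)         # 0.04
```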
Delta Gap Compression
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
• 3 - 1 - 3 - 3 - 2 - 5 (lengths of the alternating runs of absent and present numbers)
• compression ratio: 0.67 (6 numbers instead of 9)
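The numbers 3 - 1 - 3 - 3 - 2 - 5 match the lengths of alternating runs over the range 0 to 16: three absent (0 to 2), one present (3), three absent (4 to 6), and so on. A sketch of that reading in Python (not the talk's Erlang); the function name `run_lengths` is chosen for this example, and it assumes a non-empty list that does not contain 0.

```python
def run_lengths(postings):
    """Lengths of alternating absent/present runs over 0..max(postings).

    The first run counts absent numbers. Assumes a non-empty list without 0,
    so that no run has length zero.
    """
    present = set(postings)
    runs = []
    state = False   # the first run counts absent numbers
    length = 0
    for n in range(max(postings) + 1):
        if (n in present) == state:
            length += 1
        else:
            runs.append(length)       # close the current run
            state = not state
            length = 1
    runs.append(length)               # close the final run
    return runs

postings = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = run_lengths(postings)
print(runs)                                   # [3, 1, 3, 3, 2, 5]
print(round(len(runs) / len(postings), 2))    # 0.67
```

The gain depends on the data: long consecutive stretches collapse into a single number, while scattered ids produce more runs than there were entries.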
Combined
• 64-bit Erlang
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• [3, 1, 3, 3, 2, 5]
• 011101101101000101
• 1152 vs 24 bits
• compression ratio: 0.021
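Chaining the two steps means gamma-encoding each run length and concatenating the codes. The Python sketch below (a stand-in for the talk's Erlang) does that for [3, 1, 3, 3, 2, 5]; encoding all six values gives an 18-bit string that occupies 3 whole bytes. The 1152-bit baseline assumes 2 words of 8 bytes per list element on 64-bit Erlang, consistent with the 10 words reported for a 5-element list.

```python
def elias_gamma(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

runs = [3, 1, 3, 3, 2, 5]
codes = [elias_gamma(n) for n in runs]
bits = "".join(codes)

# Stored as an unsigned integer, leading zero bits are dropped, so a decoder
# would also need the original bit length to restore them.
value = int(bits, 2)
packed_bytes = (value.bit_length() + 7) // 8

list_bits = 9 * 2 * 8 * 8   # 9 elements * 2 words * 8 bytes * 8 bits (assumption)
ratio = round(packed_bytes * 8 / list_bits, 3)

print(codes)          # ['011', '1', '011', '011', '010', '00101']
print(bits)           # 011101101101000101
print(packed_bytes)   # 3
print(ratio)          # 0.021
```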
More information
• https://github.com/wardbekker/compression/
• Modern Information Retrieval
• http://www.mir2ed.org/
• @wardbekker