Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
3
140
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
Tweet
Share
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
84
Other Decks in Technology
See All in Technology
かわいい身体と声を持つ そういうものに私はなりたい
yoshimura_datam
0
390
AWS監視を「もっと楽する」ために
uechishingo
0
360
書籍執筆での生成AIの活用
sat
PRO
1
200
Kaggleコンペティション「MABe Challenge - Social Action Recognition in Mice」振り返り
yu4u
1
740
「全社導入」は結果。1人の熱狂が組織に伝播したmikanのn8n活用
sota_mikami
0
450
Agentic Coding 実践ワークショップ
watany
38
26k
エンジニアとして長く走るために気づいた2つのこと_大賀愛一郎
nanaism
1
240
いよいよ仕事を奪われそうな波が来たぜ
kazzpapa3
2
170
漸進的過負荷の原則
sansantech
PRO
3
350
AWSと暗号技術
nrinetcom
PRO
1
160
【Oracle Cloud ウェビナー】ランサムウェアが突く「侵入の隙」とバックアップの「死角」 ~ 過去の教訓に学ぶ — 侵入前提の防御とデータ保護 ~
oracle4engineer
PRO
2
210
AI アクセラレータチップ AWS Trainium/Inferentia に 今こそ入門
yoshimi0227
1
320
Featured
See All Featured
Building an army of robots
kneath
306
46k
Designing for Performance
lara
610
70k
The Cult of Friendly URLs
andyhume
79
6.8k
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
61
49k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
800
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
140
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
100
From π to Pie charts
rasagy
0
120
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
100
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.6k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
310
Why You Should Never Use an ORM
jnunemaker
PRO
61
9.7k
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12