Building a search engine - in less than 15 minutes
Ward Bekker
July 12, 2012
In this talk we dive into two compression algorithms that are popular in search engines: Elias γ encoding and delta gap compression.
Transcript
Me
• Ward Bekker
• TTY.nl
• Product owner
• @wardbekker
• https://github.com/wardbekker
Monday, 16 July 2012
Building a search engine in less than 15 minutes
Documents
Index
Querying
Result
Tuples: InvertedIndex:
Querying
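The contents of the Documents, Index, Tuples and Querying slides did not survive in this transcript. As a rough illustration of the flow they name, here is a minimal sketch in Python (not the talk's Erlang). The sample documents and the names `build_inverted_index` and `query` are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import defaultdict

# Hypothetical sample documents (doc id -> text); not taken from the talk.
docs = {
    1: "erlang is fun",
    2: "search engines are fun",
    3: "erlang search",
}

def build_inverted_index(documents):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():   # naive whitespace tokenizer
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query(index, *terms):
    """Return the ids of documents that contain every query term (AND)."""
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

index = build_inverted_index(docs)
print(index["erlang"])                    # [1, 3]
print(query(index, "erlang", "search"))   # [3]
```

The sorted doc id lists stored per term are the kind of integer sequences that the compression slides that follow operate on.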
Elias γ Encoding
• 1, 2, 3
• 00000000 00000001, 00000000 00000010, 00000000 00000011
• 6 bytes = 48 bits
Elias γ Encoding
n => binary => zero prefix (bit length - 1 zeros), then the binary value
• 1 => 1 => 1
• 2 => 10 => 0 10
• 3 => 11 => 0 11
• 4 => 100 => 00 100
• 5 => 101 => 00 101
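The table can be reproduced with a few lines of code. This is a minimal sketch in Python (not the talk's Erlang); the function name `elias_gamma_encode` is an assumption, and codes are returned as strings of "0" and "1" for readability rather than as packed bits.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is only defined for n >= 1")
    binary = bin(n)[2:]                       # e.g. 5 -> "101"
    # Prefix with (bit length - 1) zeros so a decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary

for n in range(1, 6):
    print(n, elias_gamma_encode(n))
# 1 1
# 2 010
# 3 011
# 4 00100
# 5 00101
```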
Elias γ Encoding
• 0000000000000001, 0000000000000010, 0000000000000011
• 48 bits
• 1 010 011
• 7 bits
• compression ratio: 0.15
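Because each code announces its own length through the zero prefix, the 7-bit string can be split back into the original numbers without separators. A minimal decoding sketch in Python follows; the name `elias_gamma_decode` is an assumption, and the input is assumed to be a well-formed concatenation of codes with no padding bits.

```python
def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes into a list of integers."""
    values, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":               # count the zero prefix
            zeros += 1
            pos += 1
        # The next zeros + 1 bits hold the value itself.
        values.append(int(bits[pos:pos + zeros + 1], 2))
        pos += zeros + 1
    return values

encoded = "1010011"                           # 1 010 011 from the slide
print(elias_gamma_decode(encoded))            # [1, 2, 3]
print(round(len(encoded) / 48, 2))            # 0.15 (7 bits vs 48 bits)
```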
Elias γ Encoding
• erts_debug:size([1,2,3,4,5]) => 10 words = 10 * 8 bytes = 640 bits
• erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). => 3 bytes = 24 bits
• compression ratio: 0.04
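The same arithmetic can be checked outside the Erlang shell. The sketch below, in Python, packs the 17 code bits for 1 to 5 into whole bytes, roughly what `binary:encode_unsigned` does on the slide, and compares against the 640 bits the slide reports for the Erlang list. The 640 figure is taken from the slide, not measured here, and `elias_gamma_encode` is a hypothetical helper.

```python
def elias_gamma_encode(n):
    """Elias gamma code of a positive integer, as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

bits = "".join(elias_gamma_encode(n) for n in [1, 2, 3, 4, 5])
print(bits)                                   # 10100110010000101

# Pack the bit string into the smallest whole number of bytes.
packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")
print(len(packed))                            # 3 bytes = 24 bits

list_bits = 10 * 8 * 8                        # 10 words of 8 bytes, per the slide
print(round(len(packed) * 8 / list_bits, 2))  # 0.04
```

Packing through an integer drops the boundary between padding and leading zero bits, so a real decoder would also need the bit length stored alongside the bytes.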
Delta Gap Compression
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
• 3 - 1 - 3 - 3 - 2 - 5 (alternating runs of absent and present IDs, counted from 0)
• compression ratio: 0.67
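Reading the slide's numbers against the number line, 3 - 1 - 3 - 3 - 2 - 5 is a sequence of alternating run lengths: 3 absent IDs (0 to 2), 1 present (3), 3 absent (4 to 6), 3 present (7 to 9), 2 absent (10, 11), 5 present (12 to 16). A minimal sketch of that interpretation in Python follows; the names `to_runs` and `from_runs` are assumptions, and the input is assumed to be sorted and free of duplicates.

```python
def to_runs(doc_ids):
    """Turn sorted, unique doc ids into alternating absent/present run lengths."""
    runs, next_id, i = [], 0, 0               # next_id: first id not yet covered
    while i < len(doc_ids):
        runs.append(doc_ids[i] - next_id)     # absent run before this id
        j = i
        while j + 1 < len(doc_ids) and doc_ids[j + 1] == doc_ids[j] + 1:
            j += 1                            # extend the run of consecutive ids
        runs.append(j - i + 1)                # present run
        next_id = doc_ids[j] + 1
        i = j + 1
    return runs

def from_runs(runs):
    """Rebuild the doc id list from alternating absent/present run lengths."""
    doc_ids, current, present = [], 0, False
    for length in runs:
        if present:
            doc_ids.extend(range(current, current + length))
        current += length
        present = not present
    return doc_ids

ids = [3, 7, 8, 9, 12, 13, 14, 15, 16]
runs = to_runs(ids)
print(runs)                                   # [3, 1, 3, 3, 2, 5]
print(from_runs(runs) == ids)                 # True
print(round(len(runs) / len(ids), 2))         # 0.67 (6 numbers instead of 9)
```

If the list starts at ID 0, the first absent run is 0, which a later Elias γ step cannot encode directly because γ codes start at 1; that case would need an offset or a flag.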
Combined
• 64-bit Erlang
• [3, 7, 8, 9, 12, 13, 14, 15, 16]
• [3, 1, 3, 3, 2, 5]
• 011101101101000101
• 1152 vs 24 bits
• compression ratio: 0.021
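The two steps chain naturally: the run lengths are small positive integers, which is where Elias γ codes are shortest. The Python sketch below encodes the run list [3, 1, 3, 3, 2, 5] and compares the byte-aligned result with the 1152 bits of the original list (9 elements at 2 words of 64 bits each on 64-bit Erlang). `elias_gamma_encode` is a hypothetical helper, not the talk's code.

```python
def elias_gamma_encode(n):
    """Elias gamma code of a positive integer, as a bit string."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

runs = [3, 1, 3, 3, 2, 5]                     # run lengths from the slide
bits = "".join(elias_gamma_encode(n) for n in runs)
print(bits)                                   # 011101101101000101
print(len(bits))                              # 18

packed_bits = (len(bits) + 7) // 8 * 8        # round up to whole bytes: 24
list_bits = 9 * 2 * 64                        # 1152 bits for the 9-element list
print(round(packed_bits / list_bits, 3))      # 0.021
```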
More information
• https://github.com/wardbekker/compression/
• Modern Information Retrieval
• http://www.mir2ed.org/
• @wardbekker