Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
3
140
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
Tweet
Share
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
83
Other Decks in Technology
See All in Technology
Kotlinで学ぶ 代数的データ型
ysknsid25
5
1.1k
Model Mondays S2E01: Advanced Reasoning
nitya
0
330
IIWレポートからみるID業界で話題のMCP
fujie
0
160
(新URLに移行しました)FASTと向き合うことで見えた、大規模アジャイルの難しさと楽しさ
wooootack
0
690
Rubyで作る論理回路シミュレータの設計の話 - Kashiwa.rb #12
kozy4324
1
210
OpenTelemetry Collector internals
ymotongpoo
5
530
Digitization部 紹介資料
sansan33
PRO
1
4.2k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
38k
kotlin-lsp を Emacs で使えるようにしてみた / use kotlin-lsp in Emacs
nabeo
0
150
宇宙パトロール ルル子から考える LT設計のコツ
masakiokuda
2
100
上長や社内ステークホルダーに対する解像度を上げて、より良い補完関係を築く方法 / How-to-increase-resolution-and-build-better-complementary-relationships-with-your-bosses-and-internal-stakeholders
madoxten
13
7.6k
AIエージェントの継続的改善のためオブザーバビリティ
pharma_x_tech
6
1.1k
Featured
See All Featured
Into the Great Unknown - MozCon
thekraken
39
1.8k
Facilitating Awesome Meetings
lara
54
6.4k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3k
How to Ace a Technical Interview
jacobian
276
23k
The Power of CSS Pseudo Elements
geoffreycrofte
77
5.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
161
15k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
4
130
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
[RailsConf 2023] Rails as a piece of cake
palkan
55
5.6k
Navigating Team Friction
lara
186
15k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
16
920
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12