$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
3
140
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
Tweet
Share
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
84
Other Decks in Technology
See All in Technology
2025-12-18_AI駆動開発推進プロジェクト運営について / AIDD-Promotion project management
yayoi_dd
0
110
日本Rubyの会: これまでとこれから
snoozer05
PRO
3
150
30分であなたをOmniのファンにしてみせます~分析画面のクリック操作をそのままコード化できるAI-ReadyなBIツール~
sagara
0
180
[デモです] NotebookLM で作ったスライドの例
kongmingstrap
0
160
「図面」から「法則」へ 〜メタ視点で読み解く現代のソフトウェアアーキテクチャ〜
scova0731
0
350
業務のトイルをバスターせよ 〜AI時代の生存戦略〜
staka121
PRO
2
220
MLflowダイエット大作戦
lycorptech_jp
PRO
1
140
AI 駆動開発勉強会 フロントエンド支部 #1 w/あずもば
1ftseabass
PRO
0
400
寫了幾年 Code,然後呢?軟體工程師必須重新認識的 DevOps
cheng_wei_chen
1
1.5k
ハッカソンから社内プロダクトへ AIエージェント「ko☆shi」開発で学んだ4つの重要要素
sonoda_mj
4
260
AWS CLIの新しい認証情報設定方法aws loginコマンドの実態
wkm2
7
760
Database イノベーショントークを振り返る/reinvent-2025-database-innovation-talk-recap
emiki
0
230
Featured
See All Featured
How Software Deployment tools have changed in the past 20 years
geshan
0
29k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Skip the Path - Find Your Career Trail
mkilby
0
22
Building a Scalable Design System with Sketch
lauravandoore
463
34k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
46
2.6k
Building Applications with DynamoDB
mza
96
6.8k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
Become a Pro
speakerdeck
PRO
31
5.7k
Building an army of robots
kneath
306
46k
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
0
160
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12