Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a search engine - in less than 15 minutes
Search
Ward Bekker
July 12, 2012
Technology
140
3
Share
Building a search engine - in less than 15 minutes
In this talk we dive into two popular compression algorithms in search engines.
Ward Bekker
July 12, 2012
More Decks by Ward Bekker
See All by Ward Bekker
Automated testing with Erlang - these go to eleven -
wardbekker
3
270
Erltricity
wardbekker
1
85
Other Decks in Technology
See All in Technology
Hooks, Filters & Now Context: Why MCPs Are the “Hooks” of the AI Era
miriamschwab
0
140
数案件を同時に進行するためのコンテキスト整理術
sutetotanuki
1
200
LLM とプロンプトエンジニアリング/チューターを定義する / LLMs and Prompt Engineering, and Defining Tutors
ks91
PRO
0
330
Strands Agents × Amazon Bedrock AgentCoreで パーソナルAIエージェントを作ろう
yokomachi
2
270
組織的なAI活用を阻む 最大のハードルは コンテキストデザインだった
ixbox
6
1.7k
ふりかえりを 「あそび」にしたら、 学習が勝手に進んだ / Playful Retros Drive Learning
katoaz
0
450
すごいぞManaged Kubernetes
harukasakihara
1
390
DevOpsDays Tokyo 2026 見えない開発現場を、見える投資に変える
rojoudotcom
2
170
Claude Teamプランの選定と、できること/できないこと
rfdnxbro
1
2.1k
Oracle Cloud Infrastructure(OCI):Onboarding Session(はじめてのOCI/Oracle Supportご利⽤ガイド)
oracle4engineer
PRO
2
17k
終盤で崩壊させないAI駆動開発
j5ik2o
0
500
"SQLは書けません"から始まる データドリブン
kubell_hr
1
270
Featured
See All Featured
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
670
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
440
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Context Engineering - Making Every Token Count
addyosmani
9
810
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
1
2.5k
Chasing Engaging Ingredients in Design
codingconduct
0
160
How STYLIGHT went responsive
nonsquared
100
6k
So, you think you're a good person
axbom
PRO
2
2k
How to Think Like a Performance Engineer
csswizardry
28
2.5k
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
330
The Spectacular Lies of Maps
axbom
PRO
1
680
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
240
Transcript
Me • Ward Bekker • TTY.nl • Product owner •
@wardbekker • https://github.com/wardbekker maandag 16 juli 12
Building a search engine in less than 15 minutes maandag
16 juli 12
Documents maandag 16 juli 12
Index maandag 16 juli 12
Querying maandag 16 juli 12
Result maandag 16 juli 12
maandag 16 juli 12
maandag 16 juli 12
Tuples: InvertedIndex: maandag 16 juli 12
Querying maandag 16 juli 12
maandag 16 juli 12
Elias ɣ Encoding • 1, 2, 3 • 00000000 00000001,
00000000 00000010, 00000000 00000011 • 6 bytes = 48 bits maandag 16 juli 12
Elias ɣ Encoding n => .. => length + 1
value 1 => 1 => 1 2 => 10 => 0 10 3 => 11 => 0 11 4 => 100 => 00 100 5 => 101 => 00 101 maandag 16 juli 12
Elias ɣ Encoding • 0000000000000001, 0000000000000010, 0000000000000011 • 48 bits
• 1 010 011 • 7 bits • compression ratio: 0.15 maandag 16 juli 12
Elias ɣ Encoding • erts_debug:size([1,2,3,4,5]) => • 10 words =
10 * 8 bytes = 640 bits • erlang:byte_size(binary:encode_unsigned(2#10100110010000101)). • 3 bytes = 24 bits • compression ratio: 0.04 maandag 16 juli 12
Delta Gap Compression • [3, 7, 8, 9, 12, 13,
14, 15, 16] • 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 • 3 - 1 - 3 - 3 - 2 - 5 • compression ratio: 0.67 maandag 16 juli 12
Combined • 64-bit Erlang • [3, 7, 8, 9, 12,
13, 14, 15, 16] • [3, 1, 3 , 3, 2, 5] • 011101101100101 • 1152 vs 16 bits • compression ratio: 0.014 maandag 16 juli 12
More information • https://github.com/wardbekker/compression/ • Modern Information Retrieval • http://www.mir2ed.org/
• @wardbekker maandag 16 juli 12