How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: Kafka (historic, current, compaction), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
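The slides mention Kafka log compaction for the stream of ES documents. A minimal sketch of what compaction means for such a stream follows, assuming each record is keyed by a document id and that a value of None is a delete marker (tombstone). The function name and the sample keys are hypothetical, and real Kafka compacts in the background rather than in one pass.

```python
def compact(log):
    """Simulate log compaction: keep only the latest record per key.

    `log` is a list of (key, value) pairs in offset order.
    A value of None is treated as a tombstone, so the key is dropped.
    """
    # Remember the offset of the last record seen for every key.
    last_offset = {key: offset for offset, (key, _) in enumerate(log)}
    # Keep a record only if it is the newest one for its key and not a tombstone.
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]


if __name__ == "__main__":
    history = [
        ("track:1", {"title": "old"}),
        ("track:2", {"title": "other"}),
        ("track:1", {"title": "new"}),
        ("track:2", None),  # delete marker
    ]
    print(compact(history))
```

Under these assumptions, replaying a compacted topic from the beginning yields one current document per key, which is what makes a full reindex from Kafka feasible.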
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
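The first three points above map onto Kafka topic settings. The sketch below builds such a topic description as plain data. `cleanup.policy` and `compression.type` are real Kafka topic configs, but the choice of lz4, the partition count, the replication factor, and the topic name are assumptions for illustration and are not taken from the talk.

```python
def es_document_topic_spec(name, partitions=64, replication_factor=3):
    """Describe a Kafka topic meant to hold ES documents keyed by document id.

    The defaults are illustrative assumptions, not values from the slides.
    """
    return {
        "name": name,
        # "Use enough partitions": partitions bound consumer parallelism.
        "num_partitions": partitions,
        "replication_factor": replication_factor,
        "configs": {
            # "Enable compaction": keep the latest record per key.
            "cleanup.policy": "compact",
            # "Use fast compression": lz4 is one fast codec (assumed here).
            "compression.type": "lz4",
        },
    }


if __name__ == "__main__":
    print(es_document_topic_spec("es-documents"))
```

The resulting dictionary can be passed to whatever admin tooling is in use; the fourth point (SSDs and 10GBit networking) is a hardware choice and has no topic-level setting.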
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
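The four reindex settings can be sketched as an Elasticsearch index-creation body. The setting names are standard Elasticsearch index settings; the concrete values (zero replicas and a disabled refresh during the bulk load) are common bulk-loading choices assumed here, since the slides do not state the values used. The function name is hypothetical.

```python
import json


def bulk_load_index_body(shards):
    """Build an index-creation body tuned for a bulk reindex (assumed values)."""
    return {
        "settings": {
            "index": {
                # 1. Shards: fixed at creation time.
                "number_of_shards": shards,
                # 2. Replication: assume no replicas while loading.
                "number_of_replicas": 0,
                # 3. Async translog: fsync in the background, not per request.
                "translog": {"durability": "async"},
                # 4. Refresh interval: "-1" disables periodic refresh.
                "refresh_interval": "-1",
            }
        }
    }


if __name__ == "__main__":
    # The body would be sent with PUT /<index-name> when creating the index.
    print(json.dumps(bulk_load_index_body(shards=30), indent=2))
```

These values trade durability and search visibility for write throughput, so they only suit an index that is not yet serving traffic.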
Finish reindexing
1. Merge into one segment***
2. Set # replicas
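The two finishing steps correspond to the Elasticsearch force merge and update settings APIs. The sketch below only builds the requests as (method, path, body) tuples so that it runs without a cluster. The helper name, the index name, and the replica count are hypothetical.

```python
def finish_reindex_requests(index, replicas):
    """Return the REST calls that finish a bulk reindex, in order."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count so copies are built from merged shards.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


if __name__ == "__main__":
    for method, path, body in finish_reindex_requests("tracks-v2", replicas=2):
        print(method, path, body)
```

Merging before adding replicas means the replicas copy already merged segments instead of repeating the merge work on every node.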
Throughput ≈ 600K OP/s ≈ 30 Mins
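As a quick check of the throughput figure, the arithmetic can be written out, assuming the 1B documents from the talk title and a sustained rate of 600K operations per second with one operation per document. The function name is hypothetical.

```python
def reindex_minutes(documents, ops_per_second):
    """Estimate the duration of a reindex in minutes at a constant rate."""
    seconds = documents / ops_per_second
    return seconds / 60


if __name__ == "__main__":
    minutes = reindex_minutes(1_000_000_000, 600_000)
    print(f"{minutes:.1f} minutes")
```

At that rate the load takes a little under 28 minutes, which is consistent with the roughly 30 minutes quoted on the slide.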
4X faster for 95%
≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU