How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
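
The sizing rule on the cluster slide (data size * replication = 120% * total memory) can be turned into a small calculation. This is a minimal sketch under assumptions: "replication" is read here as the total number of copies of the data, and the function name, parameter names, and example figures are hypothetical, not taken from the talk.

```python
def required_total_memory_gb(data_size_gb: float, copies: int, ratio: float = 1.2) -> float:
    """Solve data_size * copies = ratio * total_memory for total_memory.

    `copies` is assumed to mean primaries plus replicas; `ratio` is the
    120% factor from the slide.
    """
    return data_size_gb * copies / ratio


# Illustrative numbers only: 1200 GB of data kept in 2 copies.
total = required_total_memory_gb(1200, 2)
per_node = total / 30  # 30 nodes per cluster, as on the slide
print(f"total memory: {total:.0f} GB, per node: {per_node:.1f} GB")
```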
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: Kafka (historic, current, compaction), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
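
Steps 2 and 3 of the indexing pipeline (build ES documents, load into ES) usually end in a call to the Elasticsearch `_bulk` endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds that request body. The function name, the index name, and the `id` field are hypothetical, and the typeless action format is an assumption (older Elasticsearch releases may also require a `_type`).

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document is assumed to carry its identifier in an "id" field,
    which becomes the `_id` of the indexed document.
    """
    lines = []
    for doc in docs:
        source = {k: v for k, v in doc.items() if k != "id"}
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc["id"])}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


body = build_bulk_body("tracks-v2", [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}])
print(body, end="")
```

Using the document identifier as `_id` makes a reload idempotent: sending the same document twice overwrites it instead of creating a duplicate.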
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
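
The first three Kafka recommendations map onto topic-level settings. The sketch below is illustrative only: `cleanup.policy` and `compression.type` are real Kafka topic configs, but the chosen codec and partition count are assumptions, not values from the talk. The `compact` function is a simplified model of what compaction converges to (the latest value per key, with `None` values acting as delete markers), not Kafka's actual cleaner.

```python
from typing import Dict, Iterable, Optional, Tuple

# Topic-level settings; the values are assumptions for illustration.
TOPIC_CONFIG = {
    "cleanup.policy": "compact",   # 1. enable compaction
    "compression.type": "lz4",     # 2. a fast codec (assumed choice)
}
NUM_PARTITIONS = 64                # 3. hypothetical; enough for parallel consumers


def compact(log: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Model the end state of a compacted topic: last value per key wins."""
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: the key is eventually dropped
        else:
            latest[key] = value
    return latest


log = [("track:1", "v1"), ("track:2", "v1"), ("track:1", "v2"), ("track:2", None)]
print(compact(log))
```

With compaction, replaying the topic from the beginning yields the current version of every document, which is what makes a full reindex from Kafka possible.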
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
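
The four reindex settings correspond to standard Elasticsearch index settings. The slide does not give values, so the ones below are assumptions commonly used for bulk loading: no replicas, asynchronous translog flushing, and refresh disabled. The function name is hypothetical; the sketch only builds the request body for index creation.

```python
import json


def reindex_index_body(shards: int) -> dict:
    """Index-creation body tuned for a bulk load (values are assumptions)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. shards (fixed at creation)
                "number_of_replicas": 0,              # 2. no replicas while loading
                "translog": {"durability": "async"},  # 3. async translog
                "refresh_interval": "-1",             # 4. disable periodic refresh
            }
        }
    }


print(json.dumps(reindex_index_body(shards=30), indent=2))
```

A body like this would be sent with the create-index request (`PUT /<index>`). Async translog and zero replicas trade durability for speed, which is acceptable only because the data can be replayed from its source.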
Finish reindexing
1. Merge into one segment***
2. Set number of replicas
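
The two finishing steps map onto the Elasticsearch force merge API and the update index settings API. The sketch below lists the requests as plain tuples rather than sending them; the function name, index name, and replica count are hypothetical.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Requests to run once loading is done: (method, path, JSON body)."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count so the merged segments get copied.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


for method, path, body in finish_reindex_requests("tracks-v2", replicas=1):
    print(method, path, body)
```

Merging before adding replicas means the replicas copy already-merged segments instead of each repeating the merge.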
Throughput ≈ 600K op/s ≈ 30 min
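
The two throughput figures are consistent with the 1B documents in the title, as a quick check shows. This assumes one operation per document and a steady rate; the function name is hypothetical.

```python
def reindex_duration_minutes(doc_count: int, ops_per_second: float) -> float:
    """Time to index doc_count documents at a steady rate, in minutes."""
    return doc_count / ops_per_second / 60


minutes = reindex_duration_minutes(1_000_000_000, 600_000)
print(round(minutes, 1))  # 27.8
```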
4X faster for 95%, ≈ 40 ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU