How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes: data size × replication = 120% × total memory
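The sizing rule on this slide can be turned into a quick back-of-the-envelope check. The following is a minimal sketch, not the speakers' tooling. It assumes "replication" means the total number of copies of each shard (primary plus replicas), which the slide does not state. The helper names and the 64 GB per node are hypothetical.

```python
# Hypothetical helpers for the slide's rule of thumb:
#   data size * replication = 120% * total memory
# Assumption: "replication" = total copies per shard (primary + replicas).


def memory_ratio(data_size_gb: float, copies: int, nodes: int, mem_per_node_gb: float) -> float:
    """Return (data size * copies) / total cluster memory."""
    total_memory_gb = nodes * mem_per_node_gb
    return data_size_gb * copies / total_memory_gb


def data_size_for_ratio(ratio: float, copies: int, nodes: int, mem_per_node_gb: float) -> float:
    """Invert the rule: how much raw data a cluster holds at a given ratio."""
    return ratio * nodes * mem_per_node_gb / copies


if __name__ == "__main__":
    # Illustrative only: 30 nodes as on the slide, 64 GB RAM each (assumed), 2 copies.
    size = data_size_for_ratio(1.2, copies=2, nodes=30, mem_per_node_gb=64)
    print(f"raw data at 120%: {size:.0f} GB")
    print(f"ratio check: {memory_ratio(size, 2, 30, 64):.2f}")
```

With these made-up inputs, the cluster would hold about 1152 GB of raw data at the 120% mark.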
Multiple clusters per use case (Cluster 1, Cluster 2, Cluster 3)
Problems? Lead time of features and bug fixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: Kafka (historic), Kafka (current), Kafka + compaction, indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
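Step 3 (load into ES) typically goes through Elasticsearch's `_bulk` API, which accepts newline-delimited JSON: one action line, then one source line per document. The talk does not show its shipper code, so this is only a hypothetical sketch of building such a body. The index name, the `id` field, and `build_bulk_body` are assumptions. Batching, retries, and the HTTP request itself are left out.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping]) -> str:
    """Serialise documents into an NDJSON body for the _bulk endpoint.

    Hypothetical convention: every document carries its ES _id in an "id" field.
    """
    lines = []
    for doc in docs:
        # Action/metadata line: "index" creates the document or overwrites one with the same _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        # Source line: the document body, minus the "id" field used as _id.
        lines.append(json.dumps({k: v for k, v in doc.items() if k != "id"}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"id": "1", "title": "first track"}, {"id": "2", "title": "second track"}]
    # Would be sent as POST /_bulk with Content-Type: application/x-ndjson
    print(build_bulk_body("tracks-v2", docs), end="")
```

Because the `index` action overwrites by `_id`, replaying the same documents is idempotent. That property suits a pipeline that may re-ship data from Kafka.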
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 Gbit network
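Compaction presumably fits here because ES documents are keyed by document id. A compacted topic eventually keeps only the latest version of each document, so reading it from the start replays a full snapshot for a reindex. Keying by id also means Kafka's default partitioner sends every version of a document to the same partition, which keeps those versions in order. The sketch below models that end state in plain Python. The topic settings use real Kafka config names, but the values and the partition count are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple

# Illustrative topic settings for the checklist above. "cleanup.policy" and
# "compression.type" are standard Kafka topic configs; the chosen values and
# the partition count are assumptions, not the talk's configuration.
TOPIC_CONFIG = {
    "cleanup.policy": "compact",  # 1. retain only the latest record per key
    "compression.type": "lz4",    # 2. a fast codec (snappy is another common pick)
}
NUM_PARTITIONS = 64               # 3. "enough" depends on the number of consumers

# (document id, serialized ES document); None stands in for a Kafka tombstone.
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> Dict[str, str]:
    """Return the state log compaction converges to: the last value per key.

    Real compaction runs in the background and may keep older records around
    for a while; this only models the end result a full re-read relies on.
    """
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: the document was deleted
        else:
            latest[key] = value    # a newer version replaces the older one
    return latest


if __name__ == "__main__":
    log = [("t1", "v1"), ("t2", "a"), ("t1", "v2"), ("t2", None)]
    print(compact(log))  # {'t1': 'v2'}
```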
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async translog
4. Refresh interval
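The four reindex settings map onto standard Elasticsearch index settings. This sketch builds a create-index body with them. The setting names are real. The values (zero replicas during the load, refresh disabled) are common bulk-load choices assumed for illustration, not the configuration from the talk, and `reindex_settings` is a hypothetical helper.

```python
import json


def reindex_settings(shards: int) -> dict:
    """Hypothetical create-index body tuned for a one-off bulk load."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. fixed at index creation time
                "number_of_replicas": 0,              # 2. assumed: no replicas while loading
                "translog": {"durability": "async"},  # 3. fsync the translog periodically, not per request
                "refresh_interval": "-1",             # 4. disable refresh until the load is done
            }
        }
    }


if __name__ == "__main__":
    # Body for PUT /<new-index>; the shard count here is illustrative.
    print(json.dumps(reindex_settings(shards=30), indent=2))
```

The trade-off is durability and search visibility in exchange for indexing speed. That is acceptable here only because the data can be replayed from Kafka if the load fails.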
Finish reindexing
1. Merge into one segment***
2. Set number of replicas
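The two closing steps correspond to the `_forcemerge` API (with `max_num_segments`) and an update to the index `_settings`. This sketch only describes the requests as (method, path, body) tuples and does not send them. `finish_reindex_requests` is hypothetical. Restoring `refresh_interval` and translog durability to their defaults is an assumption, since the slide names only the merge and the replica step.

```python
import json


def finish_reindex_requests(index: str, replicas: int) -> list:
    """Hypothetical: the finishing steps as (method, path, body) tuples."""
    return [
        # 1. Merge each shard down to one segment while there are no replicas yet.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Add replicas; new replicas are recovered by copying the primaries' merged segments.
        ("PUT", f"/{index}/_settings", {
            "index": {
                "number_of_replicas": replicas,
                # Assumed: restore the defaults that were loosened for the bulk load.
                "refresh_interval": "1s",
                "translog": {"durability": "request"},
            }
        }),
    ]


if __name__ == "__main__":
    for method, path, body in finish_reindex_requests("tracks-v2", replicas=2):
        print(method, path, json.dumps(body) if body else "")
```

Merging before adding replicas means the expensive merge runs once per shard instead of once per copy.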
Throughput ≈ 600K ops/s, ≈ 30 min
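The two figures on this slide agree with each other, assuming one indexing operation per document and a constant rate for the whole run:

```latex
t \approx \frac{10^{9}\ \text{documents}}{6 \times 10^{5}\ \text{documents/s}} \approx 1.67 \times 10^{3}\ \text{s} \approx 28\ \text{min}
```

This rounds to the slide's ≈ 30 minutes and stays within the one-hour target from the title.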
4x faster at the 95th percentile, ≈ 40 ms at the 50th percentile
4x reindexing in 1 sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU