How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case
Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
Kafka
[Diagram: Kafka (historic + current, compaction), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
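The first three Kafka recommendations above map onto topic-level settings. The following is a minimal sketch of what such a topic definition might look like, not the configuration used in the talk. The topic name `es-documents`, the partition count of 64, and the choice of `lz4` as the "fast" codec are all assumptions for illustration; `cleanup.policy` and `compression.type` are standard Kafka topic configs. The fourth point (SSDs and 10GBit networking) is a hardware choice and is not represented here.

```python
# Sketch only: topic name, partition count, and codec are hypothetical.
ALLOWED_CODECS = {"lz4", "snappy", "zstd", "gzip", "uncompressed", "producer"}


def es_document_topic(name="es-documents", partitions=64, codec="lz4"):
    """Describe a compacted Kafka topic meant to hold ES documents."""
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    if codec not in ALLOWED_CODECS:
        raise ValueError("unknown compression.type: " + codec)
    return {
        "name": name,
        "partitions": partitions,
        "configs": {
            # Compaction keeps the latest record per key (per document id).
            "cleanup.policy": "compact",
            # lz4 is assumed here as a fast codec; the slides name none.
            "compression.type": codec,
        },
    }


def to_kafka_topics_args(topic):
    """Render the description as arguments for the kafka-topics tool."""
    args = ["--create", "--topic", topic["name"],
            "--partitions", str(topic["partitions"])]
    for key, value in sorted(topic["configs"].items()):
        args += ["--config", key + "=" + value]
    return args


if __name__ == "__main__":
    print(" ".join(to_kafka_topics_args(es_document_topic())))
```

With compaction enabled and the document id as the record key, the topic retains the most recent version of each document, so a new cluster could be filled by replaying the topic from the beginning.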
ES cluster lifecycle
Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
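The four reindex settings above correspond to standard Elasticsearch index settings. Here is a hedged sketch of an index-creation body for a bulk load. The slides do not give concrete values, so the shard count, the use of zero replicas during the load, and the disabled refresh (`"-1"`) are assumptions reflecting common bulk-loading practice rather than the speakers' exact configuration.

```python
import json


def bulk_load_index_body(shards=30, replicas=0, refresh_interval="-1"):
    """Build a create-index body tuned for a one-off bulk load.

    All default values are illustrative assumptions, not from the talk.
    """
    if shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    if replicas < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return {
        "settings": {
            "index": {
                # 1. Shards: fixed at creation time.
                "number_of_shards": shards,
                # 2. Replication: assumed off while loading.
                "number_of_replicas": replicas,
                # 3. Async translog: fsync in the background instead of
                #    on every request.
                "translog": {"durability": "async"},
                # 4. Refresh interval: "-1" disables periodic refresh.
                "refresh_interval": refresh_interval,
            }
        }
    }


if __name__ == "__main__":
    # The body would be sent with PUT /<index-name> when creating the index.
    print(json.dumps(bulk_load_index_body(), indent=2))
```

Async translog durability trades safety for speed: recently acknowledged writes can be lost on a crash, which is tolerable when the source data can be replayed.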
Finish reindexing
1. Merge into one segment***
2. Set # replicas
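The two finishing steps above can be expressed as two Elasticsearch REST calls: a force merge with `max_num_segments=1`, then a settings update for `number_of_replicas`. The sketch below only builds the requests and does not send them. The index name `tracks-v2` and the replica count of 2 are hypothetical.

```python
def finish_reindex_requests(index, replicas):
    """Return (method, path, body) tuples for the two finishing steps."""
    if not index:
        raise ValueError("index name must not be empty")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", "/" + index + "/_forcemerge?max_num_segments=1", None),
        # 2. Set the replica count; replicas then copy the merged segments.
        ("PUT", "/" + index + "/_settings",
         {"index": {"number_of_replicas": replicas}}),
    ]


if __name__ == "__main__":
    # Hypothetical index name and replica count.
    for method, path, body in finish_reindex_requests("tracks-v2", 2):
        print(method, path, body)
```

Running the merge before raising the replica count means the merge work happens once on the primaries rather than again on every replica.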
Throughput ≈ 600K ops/s
Full reindex ≈ 30 mins
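As a quick sanity check, the two figures above are consistent with the 1B documents in the talk title. This assumes exactly one indexing operation per document and a sustained rate, neither of which the slides state.

```python
def reindex_minutes(doc_count, ops_per_second):
    """Minutes needed to index doc_count documents at a steady rate."""
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    return doc_count / ops_per_second / 60


if __name__ == "__main__":
    minutes = reindex_minutes(1_000_000_000, 600_000)
    print(f"{minutes:.1f} minutes")  # prints "27.8 minutes"
```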
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU