How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
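The sizing rule on the slide can be turned into a small calculation. This is only a sketch: it reads "replication" as the total number of copies of the data, and the function names and the figures in the usage note are hypothetical, not taken from the talk.

```python
def total_memory_gb(data_size_gb, copies, ratio=1.2):
    """Solve data_size * copies = ratio * total_memory for total_memory.

    ratio=1.2 is the 120% from the slide; `copies` is an assumed reading
    of "replication" as the total number of copies of the data.
    """
    return data_size_gb * copies / ratio


def memory_per_node_gb(data_size_gb, copies, nodes=30, ratio=1.2):
    """Spread the total memory evenly over the nodes (30 per cluster on the slide)."""
    return total_memory_gb(data_size_gb, copies, ratio) / nodes
```

For example, a hypothetical 1,200 GB of primary data kept in 3 copies gives `total_memory_gb(1200, 3)`, roughly 3,000 GB across the cluster, or about 100 GB per node on 30 nodes.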
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: Kafka (historic, current, compaction), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
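The first three points above map onto Kafka topic settings. The sketch below only builds the settings as plain data, so it runs without a broker. `cleanup.policy` and `compression.type` are real Kafka topic configs; the choice of `lz4` as the "fast" codec, the topic name, and the partition and replication counts are assumptions, not values from the talk.

```python
def es_document_topic_config(compression="lz4"):
    """Topic-level configs for a topic that stores built ES documents.

    With cleanup.policy=compact Kafka keeps at least the latest record
    per key, so keying records by document ID lets a replay of the topic
    rebuild the current state of every document.
    """
    return {
        "cleanup.policy": "compact",
        "compression.type": compression,  # assumed codec, the slide only says "fast"
    }


def es_document_topic(name, partitions, replication_factor):
    """Describe the topic as plain data; all arguments are hypothetical."""
    return {
        "name": name,
        "partitions": partitions,
        "replication_factor": replication_factor,
        "config": es_document_topic_config(),
    }


topic = es_document_topic("search-documents", partitions=64, replication_factor=3)
```

The partition count caps how many consumers in one group can read the topic in parallel, which is presumably why the slide asks for "enough" of them.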
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
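The four settings above correspond to standard Elasticsearch index settings. The sketch below builds a create-index request body as a dictionary. The setting names are real, but the concrete values (zero replicas and a disabled refresh during the load, and the shard count in the usage line) are assumptions about what such a bulk-load profile commonly looks like, not the values used in the talk.

```python
import json


def bulk_load_index_body(shards):
    """Create-index body tuned for a one-off bulk load (assumed values)."""
    return {
        "settings": {
            "index.number_of_shards": shards,        # 1. Shards
            "index.number_of_replicas": 0,           # 2. Replication: assumed off during load
            "index.translog.durability": "async",    # 3. Async translog
            "index.refresh_interval": "-1",          # 4. Refresh interval: assumed disabled
        }
    }


# Hypothetical shard count; the body would be sent with PUT /<index-name>.
body = bulk_load_index_body(shards=30)
print(json.dumps(body, indent=2))
```

With an async translog, operations acknowledged shortly before a crash can be lost. That trade-off is usually acceptable when the whole index can be rebuilt from the source.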
Finish reindexing
1. Merge into one segment***
2. Set # replicas
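The two finishing steps above can be expressed as Elasticsearch REST calls. The sketch below only assembles the requests as (method, path, body) tuples, so it runs without a cluster. `_forcemerge` with `max_num_segments=1` and the `number_of_replicas` setting are real Elasticsearch APIs; the index name and replica count are hypothetical.

```python
def finish_reindex_requests(index, replicas):
    """Requests to run, in order, once the bulk load has finished."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count so the merged segments are copied out.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


for method, path, body in finish_reindex_requests("tracks_v2", replicas=2):
    print(method, path, body)
```

Merging before adding replicas means each replica is built by copying the already merged segments, rather than every copy running the merge itself.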
Throughput ≈ 600K OP/s ≈ 30 Mins
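The two figures above are consistent with the 1B documents in the title, as a quick check shows. The sketch assumes one indexing operation per document, which the slide does not state.

```python
def reindex_minutes(documents, ops_per_second):
    """Minutes needed to index `documents` at a steady operation rate."""
    return documents / ops_per_second / 60


# 1B documents at about 600K operations per second
print(round(reindex_minutes(1_000_000_000, 600_000), 1))  # 27.8
```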
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]