Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to reindex 1B documents in 1 hour?
Search
Qaiser Abbasi
December 13, 2018
Business
0
770
How to reindex 1B documents in 1 hour?
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Qaiser Abbasi
December 13, 2018
Tweet
Share
More Decks by Qaiser Abbasi
See All by Qaiser Abbasi
Java User Group Frankfurt – CDI BeanTesting
qabbasi
0
64
Testing-Darwinismus
qabbasi
0
61
Other Decks in Business
See All in Business
株式会社トリビュー|エンジニア向け会社説明資料
tribeau
0
6.7k
宣言やガイドを示したってよくならない!スクラムチームが回るようにするためにはきっかけが必要だ!
abe2014
0
140
FY2025.6 Impact Report EN
mercari_inc
0
7k
AIプロダクト時代のPdMに必要なコンテキストエンジニアリング
kayato
0
110
20251003-GENDA経営戦略チーム-Value-Upの全体像
geshi0820
0
1.6k
ソニックガーデン経営組織論(2025/10版)
kuranuki
1
2.2k
20251012_社内でのMCT活動
ponponmikankan
1
610
FABRIC TOKYO会社紹介資料 / We are hiring(2025年10月07日更新)
yuichirom
36
350k
malna-recruiting-pitch
malna
0
10k
No Company - Company Deck 2025
nocompany
1
700
We are Wunderbar, Culture Deck Full
wunderbar
0
1.4k
アッテル会社紹介資料/culture deck
attelu
10
15k
Featured
See All Featured
Unsuck your backbone
ammeep
671
58k
A Modern Web Designer's Workflow
chriscoyier
697
190k
Visualization
eitanlees
149
16k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
The Straight Up "How To Draw Better" Workshop
denniskardys
238
140k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
The Illustrated Children's Guide to Kubernetes
chrisshort
49
51k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
230
22k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Transcript
Rene Treffer, Qaiser Abbasi How to reindex 1B documents in
1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
Clusters of 30 nodes data size * replication = 120%
* total memory
Cluster 2 Cluster 3 Cluster 1 Cluster 2 Cluster 3
Cluster 1 Multiple clusters per use-case
Problems?
Lead time of features and bugfixes Problems?
Indexing
Indexing 1. Extract
Indexing 1. Extract 2. Build ES documents
Indexing 1. Extract 2. Build ES documents 3. Load into
ES
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates Kafka Kafka +
Kafka historic current compaction Cluster 1 Cluster 2 shipper 1
shipper 2 indexer
Kafka for ES documents 1. Enable compaction 2. Use fast
compression 3. Use enough partitions 4. Use SSDs + 10GBit
ES cluster lifecycle Reindex Live Maintenance
Reindex settings 1. Shards 2. Replication settings 3. Async Translog
4. Refresh Interval
Finish reindexing 1. Merge into one segment*** 2. Set #
replicas
Throughput ≈ 600K OP/s ≈ 30 Mins
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary • Solved initial problem • Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]