How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi
How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
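The sizing rule on this slide can be worked through with concrete numbers. The sketch below assumes 64 GB of memory per node and reads "replication" as the number of copies of each document (2 means one primary plus one replica). Neither value is stated in the talk; only the 30 nodes and the 120% ratio come from the slide.

```python
# Worked example of: data size * replication = 120% * total memory.
# ASSUMPTIONS (not from the talk): 64 GB of memory per node, and
# "replication" meaning the number of copies of each document.

def max_primary_data_gb(nodes, memory_per_node_gb, replication, ratio_percent=120):
    """Return the primary data size (GB) that satisfies the sizing rule."""
    total_memory_gb = nodes * memory_per_node_gb
    # Total data held by the cluster, replicas included.
    replicated_data_gb = total_memory_gb * ratio_percent / 100
    return replicated_data_gb / replication

primary_gb = max_primary_data_gb(nodes=30, memory_per_node_gb=64, replication=2)
print(f"primary data budget: {primary_gb:.0f} GB")
```

Under these assumed values the cluster holds 2304 GB including replicas, which leaves a budget of 1152 GB of primary data.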
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: indexer; Kafka (historic, current, compaction); shipper 1, shipper 2; Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 GBit
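The first three items of the checklist above map onto Kafka topic settings. The sketch below shows one way they might look; `cleanup.policy` and `compression.type` are real topic-level configuration keys, but the topic name, the choice of lz4 as the "fast" codec, and the partition count are assumptions for illustration, not values from the talk.

```python
# Sketch of topic settings matching the checklist above.
# ASSUMPTIONS: the topic name, lz4 as the "fast" codec, and 64 partitions
# are illustrative; the talk does not state them.

def es_document_topic(name="es-documents", partitions=64):
    """Describe a compacted, compressed topic for prebuilt ES documents."""
    return {
        "name": name,
        "partitions": partitions,
        "config": {
            # Log compaction keeps at least the latest record per key.
            "cleanup.policy": "compact",
            # Topic-level compression codec.
            "compression.type": "lz4",
        },
    }

def render_config(topic):
    """Render the topic config as sorted key=value lines."""
    return [f"{key}={value}" for key, value in sorted(topic["config"].items())]

topic = es_document_topic()
for line in render_config(topic):
    print(line)
```

With compaction enabled and records keyed by document ID (an assumption about the setup), the topic would retain the latest version of every document, so replaying it from the start can rebuild a full index.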
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
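The four settings above correspond to Elasticsearch index settings. The following is a minimal sketch of an index-creation body tuned for bulk loading. The setting names are standard Elasticsearch settings, but every value (30 shards, zero replicas, refresh disabled) is an assumption, since the slides list the knobs without giving the values used.

```python
import json

# Sketch of an index-creation body for the bulk-load phase.
# ASSUMPTIONS: all values below are illustrative, not taken from the talk.

def reindex_settings(shards=30):
    """Build settings that favour indexing throughput over durability."""
    return {
        "settings": {
            "index.number_of_shards": shards,
            # No replicas while loading; they are added afterwards.
            "index.number_of_replicas": 0,
            # Fsync the translog in the background instead of per request.
            "index.translog.durability": "async",
            # "-1" disables periodic refresh during the load.
            "index.refresh_interval": "-1",
        }
    }

body = reindex_settings()
print(json.dumps(body, indent=2))
```

These settings trade durability and search visibility for write speed, which is acceptable only while the cluster is in the reindex phase and not serving traffic.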
Finish reindexing
1. Merge into one segment***
2. Set # replicas
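The two finishing steps above can be expressed as Elasticsearch REST calls: a force merge with `max_num_segments=1`, then a settings update for the replica count. The sketch below only builds the requests and does not send them; the index name and replica count are hypothetical.

```python
# Sketch of the two finishing requests, in order.
# ASSUMPTIONS: the index name "tracks-v2" and replicas=1 are hypothetical.

def finish_reindex_requests(index, replicas):
    """Return (method, path, body) tuples for the finishing steps."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count so the merged segments get copied.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]

for method, path, body in finish_reindex_requests("tracks-v2", replicas=1):
    print(method, path, body)
```

Merging before adding replicas means each replica copies the already-merged segments instead of merging again on its own node.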
Throughput ≈ 600K ops/s, full reindex in ≈ 30 minutes
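The throughput figure above can be checked against the 1B documents in the talk title. This sketch assumes that one operation indexes one document and that throughput stays constant for the whole run.

```python
# Back-of-the-envelope check of the reindex duration.
# ASSUMPTIONS: one operation indexes one document; throughput is constant.

def reindex_minutes(documents, ops_per_second):
    """Return the time in minutes to index all documents."""
    return documents / ops_per_second / 60

minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")
```

The result is a little under 28 minutes, which is consistent with the roughly 30 minutes quoted on the slide.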
4X faster for 95%
≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU