How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
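The sizing rule above can be checked with a little arithmetic. This is a minimal sketch, not the speakers' tooling: it assumes "replication" means the total number of copies of each document, and the node RAM and data size are hypothetical numbers chosen only to reproduce the 120% figure.

```python
def memory_ratio(data_size_gb: float, copies: int, nodes: int, ram_per_node_gb: float) -> float:
    """Return (data size * copies) divided by the total memory of the cluster."""
    total_memory_gb = nodes * ram_per_node_gb
    return data_size_gb * copies / total_memory_gb


# Hypothetical cluster: 30 nodes with 64 GB RAM each, 1152 GB of data, 2 copies.
ratio = memory_ratio(data_size_gb=1152, copies=2, nodes=30, ram_per_node_gb=64)
print(f"{ratio:.0%}")  # prints 120%
```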
Multiple clusters per use-case (Cluster 1, Cluster 2, Cluster 3)
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
Kafka
Kafka + compaction
[Diagram labels: historic, current, indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
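The first three points above map onto Kafka topic settings. The sketch below only builds a settings dictionary and does not talk to a broker. `cleanup.policy` and `compression.type` are real Kafka topic configs; the choice of `lz4` as the "fast" codec, the partition count, the replication factor, and the idea of keying records by document id are assumptions, not details from the slides. SSDs and 10GBit networking are hardware choices and have no setting here.

```python
def es_document_topic_config(partitions: int, replication_factor: int = 3) -> dict:
    """Build a (hypothetical) topic definition for storing ES documents in Kafka."""
    return {
        "num_partitions": partitions,
        "replication_factor": replication_factor,
        "config": {
            # Compaction keeps only the latest record per key.
            "cleanup.policy": "compact",
            # Assumed codec; the slides only say "fast compression".
            "compression.type": "lz4",
        },
    }


def record_key(doc_id: str) -> bytes:
    """Key each record by document id so compaction keeps the newest version (assumption)."""
    return doc_id.encode("utf-8")


topic = es_document_topic_config(partitions=64)  # 64 is an illustrative value
```

With compaction and a per-document key, replaying the topic from the beginning yields the latest version of every document, which is what makes a full reindex from Kafka possible.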
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
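The four settings above correspond to Elasticsearch index settings. The following is a sketch of an index-creation body for a bulk load, under assumptions: the slides do not give concrete values, so zero replicas and a disabled refresh (`"-1"`) during the load are common bulk-indexing choices rather than the speakers' confirmed configuration, and the shard count is illustrative.

```python
import json


def bulk_load_index_body(shards: int) -> dict:
    """Build an index-creation body tuned for a one-off bulk load (assumed values)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                # Assumption: no replicas while loading; they are added afterwards.
                "number_of_replicas": 0,
                # Fsync the translog in the background instead of on every request.
                "translog": {"durability": "async"},
                # Assumption: "-1" disables periodic refresh during the load.
                "refresh_interval": "-1",
            }
        }
    }


body = bulk_load_index_body(shards=30)  # 30 is an illustrative value
print(json.dumps(body, indent=2))
```

The body would be sent with the index-creation request (`PUT /<index>`); the index name is left out here because the slides do not name one.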
Finish reindexing
1. Merge into one segment***
2. Set # replicas
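The two finishing steps above can be expressed as two Elasticsearch REST calls: a force merge with `max_num_segments=1` and a settings update for `number_of_replicas`. The sketch below only builds the requests as data and sends nothing; the index name and replica count are hypothetical.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Return the (method, path, body) requests that finish a bulk reindex."""
    return [
        # Step 1: merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # Step 2: set the number of replicas.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


# "tracks_v2" and 1 replica are illustrative values, not from the talk.
requests_to_send = finish_reindex_requests("tracks_v2", replicas=1)
for method, path, payload in requests_to_send:
    print(method, path, payload)
```

Merging before adding replicas means the replicas copy already-merged segments, so the merge work is done once rather than on every copy.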
Throughput ≈ 600K op/s, ≈ 30 min
4x faster for 95%, ≈ 40 ms for 50%
4x reindexing in 1 sprint
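The throughput and duration figures above are consistent with the 1B documents in the title. A quick check, assuming one indexing operation per document:

```python
def reindex_minutes(documents: int, ops_per_second: float) -> float:
    """Return the time in minutes to index `documents` at a steady rate."""
    return documents / ops_per_second / 60


minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} min")  # prints 27.8 min
```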
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]