How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: indexer, Kafka (historic, current, compaction), shipper 1, shipper 2, Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
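
The first three items of the Kafka checklist map onto topic settings, so a short sketch may help. It only builds a topic description as a plain dictionary and does not talk to a broker. `cleanup.policy` and `compression.type` are standard Kafka topic-level configs; the topic name, the partition count, the replication factor, and the choice of `lz4` are assumptions for illustration, not values from the talk.

```python
import json


def build_topic_spec(name, num_partitions, replication_factor, compression="lz4"):
    """Describe a compacted Kafka topic intended to hold ES documents."""
    return {
        "name": name,
        # 3. Enough partitions so several shippers can read in parallel.
        "num_partitions": num_partitions,
        "replication_factor": replication_factor,
        "configs": {
            # 1. Compaction keeps only the latest record per key (document id).
            "cleanup.policy": "compact",
            # 2. A fast codec; lz4 is an assumed choice, not from the slides.
            "compression.type": compression,
        },
    }


# Hypothetical values, chosen only to make the sketch concrete.
spec = build_topic_spec("es-documents", num_partitions=64, replication_factor=3)
print(json.dumps(spec, indent=2))
```

Item 4 (SSDs + 10GBit) is a hardware choice and has no counterpart in the topic settings.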
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
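
As a rough sketch of what these four settings could look like in an index-creation request body: the setting names (`number_of_shards`, `number_of_replicas`, `translog.durability`, `refresh_interval`) are real Elasticsearch index settings, but the slides do not give values. Zero replicas and a disabled refresh during the bulk load are common practice and are assumed here; the shard count is a hypothetical parameter.

```python
import json


def reindex_settings(num_shards):
    """Build an index-creation body tuned for bulk loading (assumed values)."""
    return {
        "settings": {
            "index": {
                # 1. Shards: fixed at index creation time.
                "number_of_shards": num_shards,
                # 2. Replication: assumed 0 while loading, raised afterwards.
                "number_of_replicas": 0,
                # 3. Async translog: fsync in the background, not per request.
                "translog": {"durability": "async"},
                # 4. Refresh interval: "-1" disables periodic refreshes.
                "refresh_interval": "-1",
            }
        }
    }


body = reindex_settings(num_shards=30)  # 30 is a hypothetical shard count
print(json.dumps(body, indent=2))
```

The body would be sent when creating the new index, before the shippers start loading documents.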
Finish reindexing
1. Merge into one segment***
2. Set # replicas
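
The two finishing steps correspond to two Elasticsearch REST calls, sketched below as plain (method, path, body) tuples without sending anything. `_forcemerge` with `max_num_segments=1` and the `number_of_replicas` setting are real Elasticsearch APIs; the index name and the replica count are hypothetical, and the talk does not say how these calls were issued.

```python
def finish_reindex_requests(index, replicas):
    """Return the REST calls that close out a bulk reindex, in order."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count to the value wanted for live traffic.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


# Hypothetical index name and replica count.
requests_to_send = finish_reindex_requests("tracks-v2", replicas=2)
for method, path, payload in requests_to_send:
    print(method, path, payload)
```

Merging before adding replicas means the replicas copy the already merged segments instead of repeating the merge work.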
Throughput ≈ 600K OP/s ≈ 30 Mins
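
A quick check of the arithmetic behind this slide, assuming one operation indexes one document and taking the 1B document count from the talk title:

```python
def reindex_minutes(num_docs, ops_per_second):
    """Minutes needed to index num_docs at a steady ops_per_second."""
    return num_docs / ops_per_second / 60


minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")  # prints "27.8 minutes"
```

That is consistent with the roughly 30 minutes quoted on the slide.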
4X faster for 95%
≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
rene.treffer@soundcloud.com
qaiser.abbasi@soundcloud.com