How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi
How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
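The sizing rule on this slide can be turned into a small calculation. The sketch below is only an illustration: it assumes "replication" means the total number of copies of the data and that sizes are in GB, neither of which the slides state, and the function names and input numbers are hypothetical.

```python
def total_memory_needed(data_size_gb: float, replication: float) -> float:
    """Solve data_size * replication = 1.2 * total_memory for total_memory."""
    return data_size_gb * replication / 1.2


def memory_per_node(data_size_gb: float, replication: float, nodes: int = 30) -> float:
    """Spread the total evenly over the nodes (30 per cluster in the talk)."""
    return total_memory_needed(data_size_gb, replication) / nodes


if __name__ == "__main__":
    # Hypothetical input: 1,800 GB of data kept in 2 copies.
    print(f"total memory: {total_memory_needed(1800, 2):.0f} GB")
    print(f"per node:     {memory_per_node(1800, 2):.0f} GB")
```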
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
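Steps 2 and 3 above can be sketched as building a request body for the Elasticsearch bulk API, which takes newline-delimited JSON: one action line followed by one document line per record. This is a minimal sketch, not the pipeline from the talk. The record fields (`id`, `title`, `plays`) and the index name are hypothetical, and older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from typing import Dict, Iterable, List


def build_document(record: Dict) -> Dict:
    """Step 2: shape an extracted record into a search document.

    The fields used here are hypothetical placeholders.
    """
    return {"title": record["title"], "plays": record.get("plays", 0)}


def build_bulk_body(index: str, records: Iterable[Dict]) -> str:
    """Step 3: serialize records as NDJSON for POST /_bulk."""
    lines: List[str] = []
    for record in records:
        # Action line: index this document under a stable id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(record["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(build_document(record)))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    extracted = [{"id": 1, "title": "First track", "plays": 10},
                 {"id": 2, "title": "Second track"}]
    print(build_bulk_body("tracks", extracted), end="")
```

Using a stable `_id` makes a replayed record overwrite the earlier version instead of creating a duplicate.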
Kafka + compaction
[Diagram: Kafka holding historic and current data; shipper 1, shipper 2, indexer; Cluster 1, Cluster 2]
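A compacted Kafka topic keeps at least the latest record for every key, so replaying the topic from the beginning yields the current state of every document. The sketch below simulates that behaviour in plain Python to show why a compacted topic can serve as the source for a full reindex. It is a model of the semantics only, not Kafka code and not the setup from the talk; the keys and values are made up.

```python
from typing import Dict, Iterable, Optional, Tuple


def compact(log: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Return the state a consumer ends up with after replaying a compacted log.

    Each entry is (key, value). A value of None models a tombstone,
    which removes the key.
    """
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # tombstone: the document was deleted
        else:
            latest[key] = value    # a newer version replaces the older one
    return latest


if __name__ == "__main__":
    log = [
        ("track:1", "v1"),
        ("track:2", "v1"),
        ("track:1", "v2"),   # update
        ("track:2", None),   # delete
    ]
    print(compact(log))
```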
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
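The first three points map onto Kafka topic settings. As a rough sketch, the helper below collects them in one place: `cleanup.policy` and `compression.type` are real topic-level configuration keys, while the choice of `lz4` as the "fast" codec, the partition count, and the function name are assumptions made for illustration.

```python
ALLOWED_COMPRESSION = {"lz4", "snappy", "zstd", "gzip", "uncompressed", "producer"}


def es_document_topic_config(partitions: int, compression: str = "lz4") -> dict:
    """Describe a topic meant to hold the latest version of each ES document."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"unknown compression type: {compression}")
    return {
        "num_partitions": partitions,          # 3. enough partitions for parallel consumers
        "config": {
            "cleanup.policy": "compact",       # 1. keep the latest record per key
            "compression.type": compression,   # 2. a fast codec (assumed: lz4)
        },
    }


if __name__ == "__main__":
    # Hypothetical partition count.
    print(es_document_topic_config(partitions=64))
```

The partition count bounds how many consumers in one group can read in parallel, which is why it matters for reindex throughput.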
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
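These four items correspond to Elasticsearch index settings. The sketch below builds a create-index request body using the setting names Elasticsearch documents (`number_of_shards`, `number_of_replicas`, `translog.durability`, `refresh_interval`). The values are assumptions: the slides do not give them, and zero replicas with refresh disabled is simply a common choice for bulk loading.

```python
import json


def bulk_load_index_body(shards: int) -> dict:
    """Request body for PUT /<index> tuned for a one-off bulk load (assumed values)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. fixed at index creation
                "number_of_replicas": 0,              # 2. add replicas after loading
                "translog": {"durability": "async"},  # 3. fsync the translog in the background
                "refresh_interval": "-1",             # 4. no periodic refresh while loading
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical shard count.
    print(json.dumps(bulk_load_index_body(shards=30), indent=2))
```

An async translog trades durability for speed, which is acceptable here only because the documents can be replayed from their source if a node fails mid-load.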
Finish reindexing
1. Merge into one segment***
2. Set # replicas
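The two finishing steps can be expressed as two Elasticsearch REST calls: a force merge with `max_num_segments=1`, then a settings update that raises `number_of_replicas`. The sketch below only builds the requests as (method, path, body) tuples so it runs without a cluster. The index name and replica count are hypothetical, and the talk's own tooling is not shown in the slides.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Requests to send, in order, once bulk loading has finished."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Set the replica count for serving live traffic.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


if __name__ == "__main__":
    for method, path, body in finish_reindex_requests("tracks-v2", replicas=1):
        print(method, path, body)
```

Merging before adding replicas means the replicas copy the already merged segments rather than each repeating the merge.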
Throughput ≈ 600K op/s, so a full reindex takes ≈ 30 minutes
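As a quick check, the two figures on this slide are consistent with the 1B documents in the title. The calculation below assumes one operation per document and a constant rate, which the slides do not state explicitly.

```python
def reindex_minutes(documents: int, ops_per_second: float) -> float:
    """Time to index all documents at a constant rate, in minutes."""
    return documents / ops_per_second / 60


if __name__ == "__main__":
    minutes = reindex_minutes(1_000_000_000, 600_000)
    print(f"{minutes:.1f} minutes")  # just under 28 minutes, i.e. roughly 30
```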
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]