How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
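The sizing rule on this slide can be turned around to estimate how much primary data fits in a cluster. The following is only a sketch: it assumes "replication" means the total number of copies of each document (primary plus replicas), and the node count and memory per node in the usage line are hypothetical values, not figures from the talk.

```python
def max_primary_data_gb(total_memory_gb: float, copies: int, percent: int = 120) -> float:
    """Solve data_size * copies = (percent / 100) * total_memory for data_size.

    `copies` is assumed to be the total number of copies per document.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return percent * total_memory_gb / 100 / copies


# Hypothetical cluster: 30 nodes with 64 GB of memory each, 2 copies per document.
budget_gb = max_primary_data_gb(total_memory_gb=30 * 64, copies=2)
print(f"primary data budget: {budget_gb} GB")
```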
Multiple clusters per use-case
[Diagram: Cluster 1, Cluster 2, Cluster 3]
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: Kafka, Kafka + compaction (historic, current), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
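The first three points above map onto Kafka topic settings. The sketch below builds such a topic definition as a plain dictionary. `cleanup.policy` and `compression.type` are standard Kafka topic-level configs; the choice of `lz4` as the "fast" codec, the partition count, and the helper name are assumptions for illustration, not values from the talk.

```python
def es_document_topic_config(partitions: int, compression: str = "lz4") -> dict:
    """Describe a compacted, compressed topic for Elasticsearch documents."""
    allowed = {"lz4", "snappy", "zstd", "gzip"}
    if compression not in allowed:
        raise ValueError(f"unsupported compression type: {compression}")
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return {
        "num_partitions": partitions,
        "config": {
            # Compaction keeps the latest record per key, so the topic
            # holds the current version of each document.
            "cleanup.policy": "compact",
            "compression.type": compression,
        },
    }


# Hypothetical partition count; more partitions allow more parallel consumers.
topic = es_document_topic_config(partitions=64)
print(topic)
```

The dictionary can be handed to whichever Kafka admin client is in use; point 4 (SSDs and 10GBit networking) is a hardware choice and has no topic setting.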
ES cluster lifecycle
Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
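The four settings named above correspond to Elasticsearch index settings. A minimal sketch of an index-creation body for a bulk load might look as follows. The concrete values are assumptions rather than the speakers' configuration: zero replicas during the load, refresh disabled with `-1`, and a hypothetical shard count.

```python
import json


def reindex_index_body(shards: int) -> dict:
    """Build a create-index body tuned for bulk loading (assumed values)."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                # Assumption: no replicas while loading; they are added afterwards.
                "number_of_replicas": 0,
                # Fsync the translog in the background instead of per request.
                "translog": {"durability": "async"},
                # "-1" disables periodic refresh during the load.
                "refresh_interval": "-1",
            }
        }
    }


# Hypothetical shard count; the body would be sent with PUT /<index-name>.
print(json.dumps(reindex_index_body(shards=30), indent=2))
```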
Finish reindexing
1. Merge into one segment***
2. Set # replicas
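The two finishing steps can be expressed as Elasticsearch REST calls: a force merge down to one segment, then a settings update that raises the replica count. The sketch below only builds the requests and does not send them. The index name and replica count are hypothetical, and the helper is illustrative rather than the speakers' tooling.

```python
import json


def finish_reindex_requests(index: str, replicas: int) -> list:
    """Return (method, path, body) tuples for the two finishing steps."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return [
        # Step 1: merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # Step 2: set the number of replicas for live traffic.
        ("PUT", f"/{index}/_settings",
         json.dumps({"index": {"number_of_replicas": replicas}})),
    ]


# Hypothetical index name and replica count.
for method, path, body in finish_reindex_requests("tracks-v2", replicas=2):
    print(method, path, body or "")
```

Merging before adding replicas means the replicas copy the already-merged segments instead of each repeating the merge.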
Throughput ≈ 600K OP/s
≈ 30 Mins
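The two numbers on this slide are consistent with the 1B documents in the talk title. A quick check, assuming one indexing operation per document:

```python
def reindex_minutes(documents: int, ops_per_second: int) -> float:
    """Time to index `documents` at a steady rate, one operation per document."""
    return documents / ops_per_second / 60


minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")  # prints "27.8 minutes"
```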
4X faster for 95%
≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU