Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to reindex 1B documents in 1 hour?
Search
Qaiser Abbasi
December 13, 2018
Business
0
760
How to reindex 1B documents in 1 hour?
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Qaiser Abbasi
December 13, 2018
Tweet
Share
More Decks by Qaiser Abbasi
See All by Qaiser Abbasi
Java User Group Frankfurt – CDI BeanTesting
qabbasi
0
64
Testing-Darwinismus
qabbasi
0
60
Other Decks in Business
See All in Business
KENCOPA Company Dec
kurodaaaaaa
0
160
「なんとなく使いにくい」を論理的に説明する方法 〜プロダクトエンジニアとしてUXを議論できる第一歩〜
mkitahara01985
0
330
数字で見る松岡会計事務所
wf714201
0
370
Sales Marker Culture Book(English)
salesmarker
PRO
2
6k
merpay-Overview
mercari_inc
8
180k
仕事の傍ら4歳児のパパでも、アプリをリリースできた理由
kent_code3
1
220
enechain company deck
enechain
PRO
9
130k
エンジニア職/新卒向け会社紹介資料(テックファーム株式会社)
techfirm
1
4.6k
アークエル株式会社 会社説明資料
aakel
1
6k
キャッチアップ会社紹介
catchup
2
57k
Sapeet Recruiting materials
sapeet
0
5.2k
Agentic AI による新時代の IBP (Intelligent Business Planning)
mickey_kubo
1
130
Featured
See All Featured
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
How to train your dragon (web standard)
notwaldorf
96
6.2k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
9
770
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
BBQ
matthewcrist
89
9.8k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
Git: the NoSQL Database
bkeepers
PRO
431
65k
How to Ace a Technical Interview
jacobian
279
23k
The Language of Interfaces
destraynor
160
25k
Transcript
Rene Treffer, Qaiser Abbasi How to reindex 1B documents in
1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
Clusters of 30 nodes data size * replication = 120%
* total memory
Cluster 2 Cluster 3 Cluster 1 Cluster 2 Cluster 3
Cluster 1 Multiple clusters per use-case
Problems?
Lead time of features and bugfixes Problems?
Indexing
Indexing 1. Extract
Indexing 1. Extract 2. Build ES documents
Indexing 1. Extract 2. Build ES documents 3. Load into
ES
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates Kafka Kafka +
Kafka historic current compaction Cluster 1 Cluster 2 shipper 1
shipper 2 indexer
Kafka for ES documents 1. Enable compaction 2. Use fast
compression 3. Use enough partitions 4. Use SSDs + 10GBit
ES cluster lifecycle Reindex Live Maintenance
Reindex settings 1. Shards 2. Replication settings 3. Async Translog
4. Refresh Interval
Finish reindexing 1. Merge into one segment*** 2. Set #
replicas
Throughput ≈ 600K OP/s ≈ 30 Mins
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary • Solved initial problem • Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]