How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
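The sizing rule on this slide can be turned around to estimate how much primary data a cluster holds. The sketch below is only an illustration: it assumes "replication" means the total number of copies of each document, and the 64 GB per node in the usage note is a hypothetical figure, not one from the talk.

```python
def max_primary_data_gb(nodes: int, memory_per_node_gb: float,
                        copies: int, ratio: float = 1.2) -> float:
    """Solve data_size * copies = ratio * total_memory for data_size.

    `copies` is assumed to be the total number of copies of each document
    (primary plus replicas); `ratio` is the 120% from the slide.
    """
    if nodes < 1 or copies < 1:
        raise ValueError("nodes and copies must be at least 1")
    total_memory_gb = nodes * memory_per_node_gb
    return ratio * total_memory_gb / copies
```

With 30 nodes, a hypothetical 64 GB of memory per node and 2 copies, `max_primary_data_gb(30, 64, 2)` comes out at roughly 1152 GB of primary data.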
Multiple clusters per use-case
Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: Kafka (historic, current, compaction), indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
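The diagram labels suggest that finished ES documents are kept in a compacted Kafka topic and shipped from there into each cluster. Assuming that reading, the sketch below models what a consumer ends up with after replaying a compacted log: the latest document per key, with deleted keys gone. It is a plain-Python model of the end state, not Kafka's actual compaction process, and the keys and values are made up.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (document key, document body or None)


def replay_compacted(log: List[Record]) -> Dict[str, str]:
    """Return the latest value per key; a None value acts as a delete marker."""
    latest: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)   # tombstone: the document was deleted
        else:
            latest[key] = value     # a newer version replaces the older one
    return latest
```

Replaying `[("track:1", "v1"), ("track:1", "v2"), ("track:2", "v1"), ("track:2", None)]` leaves only `{"track:1": "v2"}`, which is why a full reindex from such a topic only has to load one version of each document.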
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
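The first three points map onto Kafka topic settings. As a rough sketch, the helper below collects them into a topic description; `cleanup.policy` and `compression.type` are real topic-level configuration keys, while the choice of `lz4` as the "fast" codec, the helper name and the dictionary layout are assumptions for illustration. The fourth point (SSDs and 10GBit networking) is a hardware choice and has no config entry.

```python
FAST_CODECS = {"lz4", "snappy", "zstd"}  # assumed reading of "fast compression"


def es_document_topic(partitions: int, compression: str = "lz4") -> dict:
    """Describe a compacted, compressed topic for ES documents (hypothetical helper)."""
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    if compression not in FAST_CODECS:
        raise ValueError(f"unexpected codec: {compression}")
    return {
        "num_partitions": partitions,          # 3. enough partitions for parallel shippers
        "config": {
            "cleanup.policy": "compact",       # 1. keep only the latest document per key
            "compression.type": compression,   # 2. fast compression
        },
    }
```

The partition count is left as a parameter because the talk does not state one; it bounds how many consumers in one group can read in parallel.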
ES cluster lifecycle
Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
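The slide names the settings but not their values. The sketch below builds an index-creation body using the corresponding Elasticsearch index settings; the concrete values (zero replicas and a disabled refresh while loading) are common bulk-loading choices and are assumptions here, not figures from the talk.

```python
import json


def reindex_index_body(shards: int) -> dict:
    """Index-creation body tuned for bulk loading (values are assumptions)."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    return {
        "settings": {
            "index.number_of_shards": shards,      # 1. shards (fixed at creation)
            "index.number_of_replicas": 0,         # 2. no replicas while loading
            "index.translog.durability": "async",  # 3. async translog
            "index.refresh_interval": "-1",        # 4. refresh disabled during load
        }
    }


def reindex_index_json(shards: int) -> str:
    """Serialize the body, e.g. for a PUT /<index> request."""
    return json.dumps(reindex_index_body(shards), sort_keys=True)
```

Async translog durability trades crash safety for throughput, which is acceptable when the source of truth can be replayed.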
Finish reindexing
1. Merge into one segment***
2. Set # replicas
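These two steps correspond to the Elasticsearch force merge API and an index settings update. The sketch below only assembles the requests as data, without sending them; the helper name and the tuple layout are hypothetical, and the replica count is a parameter because the slide does not give one.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Requests to run once loading is done, in order."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return [
        # 1. merge each shard down to a single segment
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. raise the replica count so the merged segments get copied
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]
```

Merging before adding replicas means the replicas copy already-merged segments instead of repeating the merge work themselves.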
Throughput ≈ 600K ops/s
Full reindex ≈ 30 minutes
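The two figures are consistent with the 1B documents in the title. A quick check, assuming one indexing operation per document:

```python
def reindex_minutes(documents: int, ops_per_second: float) -> float:
    """Minutes needed to index `documents` at a steady `ops_per_second`."""
    if ops_per_second <= 0:
        raise ValueError("throughput must be positive")
    return documents / ops_per_second / 60


minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")
```

At 600K operations per second, one billion documents take a little under 28 minutes, in line with the roughly 30 minutes on the slide.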
4X faster for 95%; ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU