How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at the Elasticsearch Berlin meetup (www.meetup.com/Elasticsearch-Berlin/)
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size × replication = 120% × total memory
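The sizing rule on this slide compares the replicated data volume with the RAM of the whole cluster. A minimal sketch of that ratio follows; reading "replication" as the total number of copies of each shard, and the 64 GB per node and 1152 GB data size, are hypothetical values chosen only to reproduce the 120% figure.

```python
def memory_ratio(data_size_gb: float, copies: int, nodes: int, mem_per_node_gb: float) -> float:
    """Return (data size * copies) / total cluster memory.

    Assumption: `copies` counts the primary plus all replicas.
    """
    total_memory_gb = nodes * mem_per_node_gb
    return data_size_gb * copies / total_memory_gb

# Hypothetical: 30 nodes with 64 GB each, 1152 GB of data stored twice.
ratio = memory_ratio(1152, copies=2, nodes=30, mem_per_node_gb=64)  # 1.2, i.e. 120%
```

With these assumed numbers, 2304 GB of shard data sits on 1920 GB of total memory, so the data set cannot be held entirely in RAM.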
Multiple clusters per use case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bug fixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer; Kafka (historic, current, + compaction); shipper 1 and shipper 2; Cluster 1 and Cluster 2]
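Steps 2 and 3 of the indexing flow can be sketched as turning extracted rows into an Elasticsearch `_bulk` request body (newline-delimited JSON: one action line, then one source line per document). This is an illustration under assumptions, not the talk's implementation: extraction is stood in for by an in-memory list, and `build_document`, its fields, and the index name `search-v2` are hypothetical.

```python
import json
from typing import Iterable

def build_document(row: dict) -> dict:
    # Hypothetical mapping from an extracted row to an ES document body.
    return {"title": row["title"].strip(), "kind": row["kind"]}

def to_bulk_body(index: str, rows: Iterable[dict]) -> str:
    lines = []
    for row in rows:
        # Action line: index (create or overwrite) the document with this id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(build_document(row)))
    # The _bulk endpoint expects newline-delimited JSON ending in a newline.
    return "\n".join(lines) + "\n"

rows = [{"id": 1, "title": " Example ", "kind": "track"}]
body = to_bulk_body("search-v2", rows)
```

Because each action carries an explicit `_id`, re-sending the same document overwrites it rather than duplicating it, which keeps loading idempotent when a batch has to be replayed.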
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 Gbit network
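Log compaction is what makes a Kafka topic usable as a source for full reindexes: only the latest record per key survives. The sketch below simulates that rule in memory, keyed by document id. The topic settings shown are real Kafka topic configs, but `lz4` is only an assumed choice of "fast compression", and dropping tombstones (`None` values) immediately is a simplification; Kafka keeps them for `delete.retention.ms` first.

```python
from typing import Optional, Sequence

# Assumed topic-level settings: "compact" enables log compaction;
# lz4 is one possible fast codec (the slide does not name one).
TOPIC_CONFIG = {"cleanup.policy": "compact", "compression.type": "lz4"}

Record = tuple[str, Optional[str]]  # (document id, serialized document or None)

def compact(log: Sequence[Record]) -> list[Record]:
    """Keep only the last record per key, in offset order; drop tombstones."""
    last_offset: dict[str, int] = {}
    for offset, (key, _) in enumerate(log):
        last_offset[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]

log = [("track:1", "v1"), ("track:2", "v1"), ("track:1", "v2"), ("track:2", None)]
survivors = compact(log)  # [("track:1", "v2")]
```

Replaying a compacted topic therefore reads roughly one record per live document instead of the full update history, which is presumably why compaction heads the list.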
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async translog
4. Refresh interval
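The four reindex settings map onto index-level Elasticsearch settings that can be sent as the body of `PUT /<index>` when the new index is created. The slide does not give values, so the ones below (no replicas, async translog, refresh disabled) are assumptions about a typical bulk-load configuration, and the shard count is a parameter.

```python
import json

def reindex_settings(shards: int) -> dict:
    """Index settings tuned for a bulk load (assumed values, not from the talk)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,               # write each document only once while loading
                "translog": {"durability": "async"},   # fsync the translog in the background
                "refresh_interval": "-1",              # no periodic refresh during the load
            }
        }
    }

body = json.dumps(reindex_settings(shards=30))
```

The trade-off is durability: with async translog and no replicas, a node failure during the load can lose recent writes, which is tolerable when the whole index can be rebuilt from Kafka.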
Finish reindexing
1. Merge into one segment***
2. Set number of replicas
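The finishing steps correspond to two Elasticsearch calls: a force merge down to one segment, then a settings update that adds replicas. The sketch only builds the requests as (method, path, body) tuples; the index name, replica count, and restoring `refresh_interval` to `1s` and translog durability to `request` are assumptions.

```python
from typing import Optional

Request = tuple[str, str, Optional[dict]]

def finish_requests(index: str, replicas: int) -> list[Request]:
    return [
        # 1. Merge each shard into a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Add replicas and (assumed) restore live-traffic defaults.
        ("PUT", f"/{index}/_settings", {
            "index": {
                "number_of_replicas": replicas,
                "refresh_interval": "1s",
                "translog": {"durability": "request"},
            }
        }),
    ]

reqs = finish_requests("search-v2", replicas=2)
```

Merging before adding replicas means replicas presumably copy the already merged segments, instead of each replica repeating the merge work.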
Throughput ≈ 600K op/s, ≈ 30 min
Latency: 4× faster at the 95th percentile, ≈ 40 ms at the 50th percentile
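A quick arithmetic check shows the throughput and duration figures are consistent with the title: sustaining 600K operations per second for 30 minutes covers just over one billion operations.

```python
def ops_in(ops_per_second: int, minutes: int) -> int:
    """Total operations at a constant rate over the given number of minutes."""
    return ops_per_second * minutes * 60

def minutes_for(total_ops: int, ops_per_second: int) -> float:
    """Minutes needed to process total_ops at a constant rate."""
    return total_ops / ops_per_second / 60

total = ops_in(600_000, 30)                      # 1_080_000_000
needed = minutes_for(1_000_000_000, 600_000)     # about 27.8 minutes
```

At that rate, one billion documents take under half an hour, well inside the one-hour target.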
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU