How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at the Elasticsearch Berlin meetup (www.meetup.com/Elasticsearch-Berlin/)
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes: data size × replication = 120% × total memory
Multiple clusters per use case (Cluster 1, Cluster 2, Cluster 3)
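The sizing rule above is simple arithmetic and can be checked directly. This is a minimal sketch, assuming "replication" means the total number of copies of each shard (primary plus replicas) and "total memory" is the RAM summed across all nodes; the function name and the example numbers are hypothetical, not figures from the talk.

```python
def memory_ratio(data_size_gb: float, copies: int, nodes: int, mem_per_node_gb: float) -> float:
    """Return (data size x number of copies) / total cluster memory.

    Assumption: `copies` counts the primary plus all replicas.
    """
    total_memory_gb = nodes * mem_per_node_gb
    return data_size_gb * copies / total_memory_gb


if __name__ == "__main__":
    # Hypothetical cluster: 30 nodes with 64 GB RAM each, 1152 GB of primary data, 2 copies.
    ratio = memory_ratio(data_size_gb=1152, copies=2, nodes=30, mem_per_node_gb=64)
    print(f"{ratio:.0%} of total memory")
```

With these made-up inputs the replicated data comes to 120% of the cluster's memory, matching the ratio on the slide.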
Problems? Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer; Kafka (historic, current, with compaction); shipper 1 and shipper 2; Cluster 1 and Cluster 2]
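Steps 2 and 3 can be sketched as serializing documents into the newline-delimited body that the Elasticsearch `_bulk` endpoint accepts: one action line, then the document source, with a trailing newline. This is a minimal illustration, not the talk's indexer; the `to_bulk_ndjson` helper, the `tracks` index name and the assumption that every document carries an `id` field are hypothetical.

```python
import json
from typing import Iterable, Mapping


def to_bulk_ndjson(index: str, docs: Iterable[Mapping]) -> str:
    """Serialize documents into an Elasticsearch _bulk request body (NDJSON).

    Each document becomes an `index` action line followed by its source line.
    Assumption: every document has an "id" field used as the Elasticsearch _id.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": str(doc["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = to_bulk_ndjson("tracks", [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}])
    print(body, end="")
```

The resulting string would be sent as the body of a `POST /_bulk` request with `Content-Type: application/x-ndjson`; in practice documents are sent in bounded batches rather than in one request.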
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 Gbit networking
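Compaction is what lets a Kafka topic serve as a replayable source of ES documents: with the document ID as the message key, the broker eventually keeps only the latest value per key, so reading the topic from the beginning yields one current version of every document. The sketch below models that behaviour in plain Python and is a simplification (real Kafka compacts only closed segments and keeps tombstones for `delete.retention.ms` before dropping them). On a real topic, compaction is enabled with `cleanup.policy=compact`, and a fast codec would be chosen via `compression.type`; `lz4` would be one such choice, but the slide does not name a codec.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value): key is the document ID; value None is a tombstone (delete marker).
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> List[Record]:
    """Simplified model of Kafka log compaction.

    Keeps only the latest record per key, ordered by the offset of that last
    write, and drops keys whose latest record is a tombstone.
    """
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        (offset, key, value) for key, (offset, value) in latest.items() if value is not None
    )
    return [(key, value) for _, key, value in survivors]


if __name__ == "__main__":
    log = [("t1", "v1"), ("t2", "v1"), ("t1", "v2"), ("t3", "v1"), ("t2", None)]
    print(compact(log))
```

Replaying the compacted log gives exactly one document per live ID, which is what a full reindex needs.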
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async translog
4. Refresh interval
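The four reindex settings correspond to standard Elasticsearch index settings. Below is a sketch of a create-index body for the bulk-load phase, assuming a freshly created target index; the shard count, zero replicas and disabled refresh are illustrative assumptions, since the slides name the settings but not their values.

```python
import json


def bulk_load_settings(shards: int) -> dict:
    """Create-index body for the reindex phase (values are illustrative assumptions)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # static: fixed when the index is created
                "number_of_replicas": 0,              # index each document once, add copies later
                "translog": {"durability": "async"},  # fsync the translog periodically, not per request
                "refresh_interval": "-1",             # disable periodic refresh during the load
            }
        }
    }


if __name__ == "__main__":
    # Body for a hypothetical `PUT /tracks_v2` request.
    print(json.dumps(bulk_load_settings(shards=30), indent=2))
```

Dropping replicas and refreshes avoids repeating the indexing work on every copy and producing many small segments while the load runs. Async translog gives up durability of the last few seconds of writes in exchange for throughput, a trade-off that seems reasonable when the documents can be replayed from Kafka.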
Finish reindexing
1. Merge into one segment***
2. Set number of replicas
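Finishing a reindex can be expressed as two REST calls: a force merge down to one segment per shard, followed by a settings update that turns replication back on. The sketch only builds the requests (method, path, body) rather than calling a client; the index name, the replica count and restoring `refresh_interval` to the Elasticsearch default of `1s` are assumptions.

```python
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[dict]]


def finish_reindex_requests(index: str, replicas: int, refresh_interval: str = "1s") -> List[Request]:
    """Build the requests that finish a bulk reindex (hypothetical helper)."""
    return [
        # 1. Merge each shard down to a single segment; only sensible once bulk writes have stopped.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Set the replica count; restoring the refresh interval is an added assumption.
        ("PUT", f"/{index}/_settings", {
            "index": {
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
            }
        }),
    ]


if __name__ == "__main__":
    for method, path, body in finish_reindex_requests("tracks_v2", replicas=2):
        print(method, path, body)
```

Merging before adding replicas means the new replicas recover by copying the already-merged segments from the primaries instead of each copy doing its own merging.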
Throughput ≈ 600K ops/s, ≈ 30 min
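The 30-minute figure follows from the throughput, assuming roughly one bulk operation per document:

```latex
\[
t \approx \frac{10^{9}\ \text{documents}}{6 \times 10^{5}\ \text{ops/s}} \approx 1667\ \text{s} \approx 28\ \text{min}
\]
```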
4× faster at the 95th percentile, ≈ 40 ms at the 50th percentile
4× reindexing in 1 sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU