How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud, powered by Elasticsearch
Typical search document
Clusters of 30 nodes: data size × replication = 120% of total memory
Multiple clusters per use case (diagram: Cluster 1, Cluster 2, Cluster 3)
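The sizing rule of thumb on the cluster slide (data size × replication ≈ 120% of total memory) can be turned around to estimate how much raw data one cluster holds. This is a minimal sketch: the node memory and the number of copies per shard are hypothetical values, and "replication" is read here as the total number of copies of each shard.

```python
# Sketch of the sizing rule from the slide: data size * copies ~= 1.2 * total memory.
# All concrete numbers below are hypothetical, not taken from the talk.

RATIO = 1.2  # 120%, from the slide


def max_data_size_gib(nodes: int, memory_per_node_gib: float, copies: int,
                      ratio: float = RATIO) -> float:
    """Raw data size at which data * copies reaches `ratio` of the cluster's RAM."""
    total_memory_gib = nodes * memory_per_node_gib
    return ratio * total_memory_gib / copies


if __name__ == "__main__":
    # Hypothetical: 30 nodes with 64 GiB each, every shard stored twice.
    size = max_data_size_gib(nodes=30, memory_per_node_gib=64, copies=2)
    print(f"{size:.0f} GiB")  # 1.2 * 1920 / 2 = 1152
```

Keeping the replicated data close to total memory means most of the index can stay in the page cache, which is presumably why the ratio is expressed against memory rather than disk.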
Problems? Lead time of features and bug fixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
(Diagram: indexer; Kafka historic, Kafka current, Kafka + compaction; shipper 1, shipper 2; Cluster 1, Cluster 2)
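The talk does not show how the shippers write into Elasticsearch. A common way to do step 3 is the `_bulk` endpoint, whose body is newline-delimited JSON: one action line per document, followed by the document source. The sketch below covers steps 2 and 3 under that assumption; the row schema, the field names and the index name `tracks-v2` are hypothetical.

```python
import json
from typing import Iterable, Mapping


def build_document(row: Mapping) -> dict:
    """Step 2 (hypothetical schema): turn an extracted row into a search document."""
    return {"id": row["id"], "title": row["title"].strip(), "user": row["username"]}


def to_bulk_ndjson(index: str, docs: Iterable[Mapping]) -> str:
    """Step 3: serialize documents as an Elasticsearch _bulk request body.

    Each document becomes an action line followed by its source line; every
    line, including the last, is terminated by a newline.
    """
    lines = []
    for doc in docs:
        source = dict(doc)
        doc_id = source.pop("id")  # explicit id, so re-sending a doc overwrites it
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [{"id": 1, "title": " Song ", "username": "alice"}]  # made-up step 1 output
    body = to_bulk_ndjson("tracks-v2", (build_document(r) for r in rows))
    print(body, end="")
```

Using explicit document ids with the `index` action makes the load idempotent: replaying the same documents (for example after a shipper restart) replaces them instead of creating duplicates.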
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 Gbit network
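A sketch of the first three points: `cleanup.policy` and `compression.type` are real Kafka topic-level settings, but the codec (`lz4`) and the partition count are assumptions, not values from the talk. The `compact` function is a pure-Python model of what log compaction keeps, which is what makes a compacted topic usable as a replayable copy of every current ES document.

```python
from typing import Iterable, Optional, Tuple

# Topic-level settings for the slide's first two points. The keys are real
# Kafka topic configs; the chosen values are assumptions.
TOPIC_CONFIG = {
    "cleanup.policy": "compact",  # retain the latest record per key instead of expiring by time
    "compression.type": "lz4",    # a fast codec; "snappy" or "zstd" are alternatives
}
NUM_PARTITIONS = 64  # hypothetical; size it so enough consumers can read in parallel


def compact(log: Iterable[Tuple[str, Optional[dict]]]) -> dict:
    """Model what a fully compacted topic keeps: the last value per key.

    A None value is a tombstone and deletes the key. Real compaction runs in the
    background, so a live topic may still hold some older records.
    """
    latest: dict = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)
        else:
            latest[key] = value
    return latest


if __name__ == "__main__":
    log = [
        ("track:1", {"title": "a"}),
        ("track:2", {"title": "b"}),
        ("track:1", {"title": "a2"}),  # update supersedes the first record
        ("track:2", None),             # tombstone: track 2 was deleted
    ]
    print(compact(log))  # {'track:1': {'title': 'a2'}}
```

With compaction, reading the topic from the beginning yields roughly one record per live document, so filling a fresh cluster does not require re-running extraction.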
ES cluster lifecycle: Reindex → Live → Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async translog
4. Refresh interval
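The four reindex settings map onto standard Elasticsearch index settings. The sketch below builds a `PUT /<index>` body for the bulk-load phase; the setting names are real, while the shard count and the choice of zero replicas during the load are assumptions about how the slide's points were applied.

```python
import json


def reindex_index_body(shards: int) -> dict:
    """Body for `PUT /<index>` while bulk loading (sketch; values are assumptions)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,            # 1. fixed when the index is created
                "number_of_replicas": 0,               # 2. load primaries only
                "translog": {"durability": "async"},   # 3. fsync on an interval, not per request
                "refresh_interval": "-1",              # 4. no refreshes during the load
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(reindex_index_body(shards=30), indent=2))  # 30 is hypothetical
```

An async translog can lose the most recent operations if a node crashes, which seems an acceptable trade-off here because the documents can be replayed from Kafka.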
Finish reindexing
1. Merge into one segment***
2. Set # of replicas
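The two finishing steps correspond to the force merge API and an index settings update. This is a minimal sketch that only builds the requests (method, path, JSON body) rather than sending them; the index name and replica count are hypothetical.

```python
from typing import List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Requests to run once the bulk load is done (sketch, not the talk's code)."""
    return [
        # 1. Merge every shard down to one segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", {}),
        # 2. Add replicas afterwards; they are then built from the merged segments.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


if __name__ == "__main__":
    for method, path, body in finish_reindex_requests("tracks-v2", replicas=1):
        print(method, path, body)
```

The order matters: merging first means new replicas copy the already merged segment files from the primaries instead of each replica repeating the merge work.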
Throughput ≈ 600K ops/s, reindex time ≈ 30 minutes
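As a quick sanity check, the throughput figure is consistent with both the ≈ 30 minutes on the slide and the one-hour goal in the title. The sketch assumes one indexing operation per document; retries or live updates during the reindex would add operations.

```python
DOCUMENTS = 1_000_000_000  # 1B documents, from the talk title
OPS_PER_SECOND = 600_000   # ~600K operations per second, from the slide


def reindex_minutes(documents: float, ops_per_second: float) -> float:
    """Wall-clock minutes, assuming one indexing operation per document."""
    return documents / ops_per_second / 60


if __name__ == "__main__":
    print(f"{reindex_minutes(DOCUMENTS, OPS_PER_SECOND):.1f} min")  # 27.8 min
```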
Query latency: 4× faster at the 95th percentile, ≈ 40 ms at the 50th percentile
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU