How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case (Cluster 1, Cluster 2, Cluster 3)
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer, Kafka (historic + current, compaction), shipper 1, shipper 2, Cluster 1, Cluster 2]
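The slides only name the pieces (a compacted Kafka topic holding historic and current documents, with shippers loading clusters). As a minimal sketch of why compaction makes a full replay usable for reindexing, the toy model below keeps only the latest record per key. The function name, the record shape, and the sample ids are hypothetical, and treating `None` as a delete marker is an assumption based on how Kafka tombstones work in general, not something shown in the talk.

```python
from typing import Dict, Iterable, Optional, Tuple

# (document id, ES document) where None stands in for a tombstone (delete).
Record = Tuple[str, Optional[dict]]


def replay_compacted(log: Iterable[Record]) -> Dict[str, dict]:
    """Model what a reader ends up with after replaying a compacted topic:
    one document per key, namely the most recent one."""
    latest: Dict[str, dict] = {}
    for doc_id, document in log:
        if document is None:
            # A tombstone removes the key entirely.
            latest.pop(doc_id, None)
        else:
            # A newer record for the same key replaces the older one.
            latest[doc_id] = document
    return latest


# Hypothetical sample data: two updates for track-1, a delete for track-2.
log = [
    ("track-1", {"title": "Intro"}),
    ("track-2", {"title": "Outro"}),
    ("track-1", {"title": "Intro (remastered)"}),
    ("track-2", None),
]
print(replay_compacted(log))
```

Because only the newest version of each document survives, a shipper that reads the topic from the beginning can fill a fresh cluster without going back to the source systems.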
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
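The first three points map onto standard Kafka topic settings. The sketch below builds such a topic definition as a plain dictionary. `cleanup.policy` and `compression.type` are real Kafka topic configuration keys, but the helper name, the choice of `lz4` as the "fast" codec, and the partition and replication numbers are assumptions, since the slides do not give concrete values.

```python
def es_document_topic(partitions: int, replication_factor: int = 3) -> dict:
    """Describe a Kafka topic intended to hold ES documents.

    The numbers passed in are illustrative; the talk does not state them.
    """
    if partitions < 1:
        raise ValueError("a topic needs at least one partition")
    return {
        "num_partitions": partitions,
        "replication_factor": replication_factor,
        "config": {
            # 1. Enable compaction: keep the latest record per key.
            "cleanup.policy": "compact",
            # 2. Use fast compression (lz4 is an assumed choice).
            "compression.type": "lz4",
        },
    }


# 3. Use enough partitions: 64 is a made-up figure for illustration.
topic = es_document_topic(partitions=64)
print(topic["config"])
```

The partition count bounds how many consumers can read in parallel, so it would typically be chosen with the number of shipper instances in mind.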
ES cluster lifecycle
Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
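The four knobs above correspond to real Elasticsearch index settings. The sketch below assembles an index-creation body for the bulk-load phase. The setting names are Elasticsearch's own, but the values are assumptions: the slides do not say how many shards were used, whether replicas were set to zero, or whether refresh was disabled outright.

```python
import json


def bulk_load_index_body(shards: int) -> dict:
    """Build an index-creation body tuned for a one-off bulk load.

    Values are assumed, write-friendly defaults, not the speakers' numbers.
    """
    return {
        "settings": {
            "index": {
                # 1. Shards: fixed at index creation time.
                "number_of_shards": shards,
                # 2. Replication: no replicas while loading (assumption).
                "number_of_replicas": 0,
                # 3. Async translog: fsync in the background, not per request.
                "translog": {"durability": "async"},
                # 4. Refresh interval: "-1" disables periodic refresh.
                "refresh_interval": "-1",
            }
        }
    }


# 30 shards is a made-up value, chosen only to match the 30-node clusters.
body = bulk_load_index_body(shards=30)
print(json.dumps(body, indent=2))
```

A body like this would be sent with `PUT /<index>` before the shippers start loading.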
Finish reindexing
1. Merge into one segment***
2. Set # replicas
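These two steps can be expressed as two Elasticsearch REST calls. The sketch below only builds the requests and does not send them. `_forcemerge` with `max_num_segments=1` and the `number_of_replicas` setting are real Elasticsearch features, while the helper name, the index name, and the replica count are hypothetical.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (HTTP method, path, JSON body)


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Return the requests that finalize a freshly loaded index, in order."""
    return [
        # 1. Merge every shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Set the number of replicas for serving traffic.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


# "tracks-v2" and 2 replicas are illustrative values only.
for method, path, payload in finish_reindex_requests("tracks-v2", replicas=2):
    print(method, path, payload)
```

Merging before adding replicas means the replicas copy the already merged segments instead of each repeating the merge work.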
Throughput ≈ 600K op/s
1B documents in ≈ 30 min
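As a quick check that the two figures agree, the arithmetic can be written out. The document count of 1B comes from the talk title, and the function name is hypothetical.

```python
def reindex_minutes(documents: int, ops_per_second: float) -> float:
    """Time to index `documents` at a steady `ops_per_second`, in minutes."""
    return documents / ops_per_second / 60


minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")  # prints "27.8 minutes"
```

At a sustained 600K operations per second, one billion documents take a little under 28 minutes, which matches the rounded figure of about 30 minutes.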
4X faster for 95%
≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]