Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to reindex 1B documents in 1 hour?
Search
Qaiser Abbasi
December 13, 2018
Business
0
720
How to reindex 1B documents in 1 hour?
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Qaiser Abbasi
December 13, 2018
Tweet
Share
More Decks by Qaiser Abbasi
See All by Qaiser Abbasi
Java User Group Frankfurt – CDI BeanTesting
qabbasi
0
57
Testing-Darwinismus
qabbasi
0
51
Other Decks in Business
See All in Business
職員給与等実態調査のDX
tokyo_metropolitan_gov_digital_hr
0
280
合議で決めたいわけではないけれど、 集合知で助けてほしい。_pmconf_2024
tomosooon
1
5.1k
デジタルツールを活用した収用委員会運営プロジェクト
tokyo_metropolitan_gov_digital_hr
0
260
重厚長大なものづくり企業におけるプロダクトマネジメントの挑戦と苦悩 / pmconf2024
tkchy
0
4.9k
株式会社miibo|採用デック
natsumidnx
0
140
タケウチグループRecruit
takeuchigroup
0
2k
概要
_connect
0
680
VISASQ: ABOUT US
eikohashiba
15
470k
20241211_CMCNagoya_9
hideki_ojima
0
400
Ampersand Company Profile
cuebicventures
PRO
0
480
ドローンを活用した汚泥焼却炉内点検のDX
tokyo_metropolitan_gov_digital_hr
0
320
【エンジニア職】中途採用向け会社説明資料(テックファーム株式会社)
techfirm
0
4.2k
Featured
See All Featured
Building Flexible Design Systems
yeseniaperezcruz
327
38k
RailsConf 2023
tenderlove
29
940
It's Worth the Effort
3n
183
28k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
32
2.7k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
111
49k
Reflections from 52 weeks, 52 projects
jeffersonlam
347
20k
Agile that works and the tools we love
rasmusluckow
328
21k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
229
52k
What's in a price? How to price your products and services
michaelherold
243
12k
Measuring & Analyzing Core Web Vitals
bluesmoon
4
170
We Have a Design System, Now What?
morganepeng
51
7.3k
Docker and Python
trallard
42
3.1k
Transcript
Rene Treffer, Qaiser Abbasi How to reindex 1B documents in
1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
Clusters of 30 nodes data size * replication = 120%
* total memory
Cluster 2 Cluster 3 Cluster 1 Cluster 2 Cluster 3
Cluster 1 Multiple clusters per use-case
Problems?
Lead time of features and bugfixes Problems?
Indexing
Indexing 1. Extract
Indexing 1. Extract 2. Build ES documents
Indexing 1. Extract 2. Build ES documents 3. Load into
ES
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates Kafka Kafka +
Kafka historic current compaction Cluster 1 Cluster 2 shipper 1
shipper 2 indexer
Kafka for ES documents 1. Enable compaction 2. Use fast
compression 3. Use enough partitions 4. Use SSDs + 10GBit
ES cluster lifecycle Reindex Live Maintenance
Reindex settings 1. Shards 2. Replication settings 3. Async Translog
4. Refresh Interval
Finish reindexing 1. Merge into one segment*** 2. Set #
replicas
Throughput ≈ 600K OP/s ≈ 30 Mins
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary • Solved initial problem • Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]