How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
How to reindex 1B documents in 1 hour?
Rene Treffer, Qaiser Abbasi
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
Multiple clusters per use-case
[Diagram: Cluster 1, Cluster 2, Cluster 3]
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer; Kafka (historic, current, Kafka + compaction); shipper 1, shipper 2; Cluster 1, Cluster 2]
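The "Build ES documents" and "Load into ES" steps could look roughly like the sketch below. It only builds the newline-delimited JSON payload that the Elasticsearch `_bulk` endpoint accepts and sends nothing. The index name `tracks-v2`, the row fields, and the helper names are hypothetical and not taken from the talk.

```python
import json
from typing import Iterable, Iterator


def build_es_document(row: dict) -> dict:
    """Step 2: turn an extracted row into an ES document (fields are hypothetical)."""
    return {"title": row["title"], "plays": row["plays"]}


def to_bulk_lines(index: str, rows: Iterable[dict]) -> Iterator[str]:
    """Step 3: emit one action line plus one source line per document."""
    for row in rows:
        yield json.dumps({"index": {"_index": index, "_id": str(row["id"])}})
        yield json.dumps(build_es_document(row))


def bulk_body(index: str, rows: Iterable[dict]) -> str:
    # The _bulk endpoint expects newline-delimited JSON ending with a newline.
    return "\n".join(to_bulk_lines(index, rows)) + "\n"


# Hypothetical extracted rows (step 1 is assumed to have produced these).
rows = [{"id": 1, "title": "a", "plays": 10}, {"id": 2, "title": "b", "plays": 3}]
body = bulk_body("tracks-v2", rows)
```

Using the document id as `_id` means that replaying the same row overwrites the earlier version instead of creating a duplicate.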
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
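A small simulation may help show why compaction suits a topic of ES documents: when records are keyed by document id, only the latest version of each document needs to survive. The sketch below is not Kafka itself, just the keep-latest-per-key idea in plain Python. The `topic_config` keys are real Kafka topic settings, but `lz4` as the "fast compression" is an assumption, and the record keys and values are made up.

```python
from typing import Iterable, Optional

Record = tuple[str, Optional[str]]  # (document id, serialized document or None)


def compact(log: Iterable[Record]) -> dict[str, str]:
    """Keep only the latest value per key, as a compacted topic eventually does."""
    latest: dict[str, str] = {}
    for key, value in log:
        if value is None:
            latest.pop(key, None)  # a None value acts as a delete marker (tombstone)
        else:
            latest[key] = value
    return latest


# Topic-level settings matching points 1 and 2; lz4 is an assumed choice.
topic_config = {"cleanup.policy": "compact", "compression.type": "lz4"}

# Hypothetical log: doc-1 is updated once, doc-2 is created and then deleted.
log = [("doc-1", "v1"), ("doc-2", "v1"), ("doc-1", "v2"), ("doc-2", None)]
state = compact(log)
```

Replaying a compacted topic from the beginning therefore yields the current state of every document, which is what a full reindex needs.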
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
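As an illustration, the four settings above map onto index settings that can be passed when creating the target index. The slides do not give concrete values, so every value below (30 shards, zero replicas, refresh disabled) is an assumption. Only the setting names are standard Elasticsearch index settings.

```python
import json


def reindex_settings(shards: int) -> dict:
    """Create-index body tuned for bulk loading; all values are assumptions."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,            # 1. Shards
                "number_of_replicas": 0,               # 2. no replicas while loading
                "translog": {"durability": "async"},   # 3. fsync in the background
                "refresh_interval": "-1",              # 4. disable periodic refresh
            }
        }
    }


# Hypothetical shard count; the body would be sent with PUT /<index>.
body = json.dumps(reindex_settings(shards=30))
```

These settings trade durability and search visibility for write speed, which is only acceptable while the cluster is in its reindex phase and not yet serving traffic.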
Finish reindexing
1. Merge into one segment***
2. Set # replicas
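The two finishing steps correspond to the Elasticsearch force merge API and an index settings update. The sketch below only describes the requests as data and sends nothing. The index name and the replica count are placeholders, and `finish_requests` is a hypothetical helper.

```python
def finish_requests(index: str, replicas: int) -> list[tuple[str, str, dict]]:
    """Return (method, path, body) for each finishing step, in order."""
    return [
        # 1. Merge every shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", {}),
        # 2. Raise the replica count once the merged segments are in place.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


# Hypothetical index name and replica count.
steps = finish_requests("tracks-v2", replicas=2)
```

Merging before adding replicas means the replicas copy already-merged segments instead of each repeating the merge work.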
Throughput ≈ 600K OP/s ≈ 30 Mins
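The two figures on this slide are consistent with the 1B documents in the title, assuming one operation indexes one document:

```python
docs = 1_000_000_000      # 1B documents, from the talk title
ops_per_second = 600_000  # throughput from the slide

seconds = docs / ops_per_second
minutes = seconds / 60
print(f"{minutes:.1f} minutes")  # a little under 28 minutes
```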
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU