How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi
How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
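The sizing rule on this slide can be read as a formula for how many copies of the data a cluster holds. The following is a minimal sketch, assuming that "replication" means the total number of copies of the data; the node count comes from the slide, while the memory per node and the data size are hypothetical values chosen only for illustration.

```python
def copies_for_memory_target(total_memory_gb: float,
                             data_size_gb: float,
                             ratio: float = 1.2) -> float:
    """Solve data_size * replication = ratio * total_memory for replication."""
    if data_size_gb <= 0:
        raise ValueError("data_size_gb must be positive")
    return ratio * total_memory_gb / data_size_gb


# Hypothetical inputs: 30 nodes (from the slide) with 64 GB of memory each,
# and 576 GB of primary data. Neither number is from the talk.
total_memory_gb = 30 * 64
copies = copies_for_memory_target(total_memory_gb, 576.0)
print(f"copies of the data: {copies:.1f}")
```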
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer, Kafka (historic + current, with compaction), shipper 1, shipper 2, Cluster 1, Cluster 2]
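The "Load into ES" step is commonly done through the Elasticsearch bulk API, which takes newline-delimited JSON: one action line followed by one document line. The sketch below is an assumption about how a shipper could assemble such a body, not the speakers' implementation; the function name, index name, and `id` field are hypothetical, and older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from typing import Any, Iterable, Mapping


def build_bulk_body(index: str,
                    docs: Iterable[Mapping[str, Any]],
                    id_field: str = "id") -> str:
    """Build an NDJSON body for the _bulk endpoint (index actions only)."""
    lines = []
    for doc in docs:
        # Action line: which index and document ID the next line belongs to.
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(dict(doc)))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents; the field names are illustrative only.
body = build_bulk_body("tracks-v2", [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
])
```

Using the document ID as `_id` makes a replayed message overwrite the same document instead of creating a duplicate.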
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
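The first three points map onto Kafka topic settings. A minimal sketch, assuming `lz4` as the "fast" codec and a hypothetical partition count (the talk names neither); `cleanup.policy` and `compression.type` are standard Kafka topic configs. The `compact` function only illustrates what compaction means for a topic keyed by document ID: the latest value per key survives, and a null value deletes the key.

```python
from typing import Any, Dict, List, Optional, Tuple

FAST_CODECS = {"lz4", "snappy", "zstd"}  # assumption: what counts as "fast"


def es_document_topic_config(partitions: int,
                             compression: str = "lz4") -> Dict[str, Any]:
    """Topic settings for a compacted topic that stores ES documents."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if compression not in FAST_CODECS:
        raise ValueError(f"unexpected codec: {compression}")
    return {
        "num_partitions": partitions,
        "configs": {
            "cleanup.policy": "compact",      # keep latest record per key
            "compression.type": compression,  # cheap to compress/decompress
        },
    }


def compact(log: List[Tuple[str, Optional[dict]]]) -> List[Tuple[str, dict]]:
    """Simplified model of log compaction: last write per key wins."""
    latest: Dict[str, dict] = {}
    for key, value in log:
        latest.pop(key, None)    # drop the older record for this key
        if value is not None:    # None models a tombstone (delete)
            latest[key] = value
    return list(latest.items())


topic = es_document_topic_config(partitions=64)  # 64 is a hypothetical value
compacted = compact([
    ("doc-1", {"title": "old"}),
    ("doc-2", {"title": "other"}),
    ("doc-1", {"title": "new"}),
    ("doc-2", None),
])
```

With compaction enabled, replaying the topic from the beginning yields roughly one record per live document, which is what makes a full reindex from Kafka practical.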
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
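The four settings named on this slide correspond to standard Elasticsearch index settings. The values below are assumptions, since the slide lists only the setting names: zero replicas and a disabled refresh (`-1`) are common choices during a bulk load, and the shard count is a hypothetical parameter.

```python
import json
from typing import Any, Dict


def reindex_index_settings(shards: int) -> Dict[str, Any]:
    """Body for creating an index tuned for bulk loading (assumed values)."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. Shards
                "number_of_replicas": 0,              # 2. Replication settings
                "translog": {"durability": "async"},  # 3. Async translog
                "refresh_interval": "-1",             # 4. Refresh interval off
            }
        }
    }


# 30 shards is a hypothetical value (one per node in a 30-node cluster).
create_body = json.dumps(reindex_index_settings(shards=30), indent=2)
```

The resulting JSON would be sent as the body of the create-index request (`PUT /<index>`).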
Finish reindexing
1. Merge into one segment***
2. Set # replicas
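Both finishing steps have direct equivalents in the Elasticsearch REST API: the force-merge endpoint with `max_num_segments=1` and an update of `number_of_replicas`. The sketch below only builds the requests in order and does not send them; the function name, index name, and replica count are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Request = Tuple[str, str, Optional[Dict[str, Any]]]


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Return (method, path, body) tuples for the two finishing steps."""
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Raise the replica count so the merged segments are copied.
        ("PUT", f"/{index}/_settings",
         {"index": {"number_of_replicas": replicas}}),
    ]


# Hypothetical index name and replica count.
steps = finish_reindex_requests("tracks-v2", replicas=2)
```

Merging before adding replicas means the replicas copy already-merged segments rather than repeating the merge work on every copy. If the refresh interval and translog durability were changed for the load, they would presumably be restored at this point as well, although the slide does not say so.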
Throughput ≈ 600K OP/s ≈ 30 Mins
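The two figures on this slide are consistent with the 1B documents in the title. A quick check, assuming one indexing operation per document:

```python
def reindex_minutes(documents: int, ops_per_second: float) -> float:
    """Time to index `documents` at a steady rate, in minutes."""
    if ops_per_second <= 0:
        raise ValueError("ops_per_second must be positive")
    return documents / ops_per_second / 60


# 1B documents (from the title) at 600K operations per second (from the slide).
minutes = reindex_minutes(1_000_000_000, 600_000)
print(f"{minutes:.1f} minutes")  # a little under 28 minutes
```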
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU