How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at the Elasticsearch Berlin meetup (www.meetup.com/Elasticsearch-Berlin/)
Transcript
Rene Treffer, Qaiser Abbasi: How to reindex 1B documents in 1 hour?
Search @ SoundCloud, powered by Elasticsearch
Typical search document
Clusters of 30 nodes: data size × replication = 120% × total memory
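The sizing rule above relates three quantities, so a quick calculation shows how much raw data a cluster can hold at that ratio. This is a sketch with hypothetical inputs: the slide gives only the 30-node count and the 120% ratio, so the 64 GB of memory per node is an assumption, and "replication" is read here as the total number of copies per shard (primary plus replicas), also an assumption.

```python
# Hypothetical cluster figures for illustration only; the talk states the
# 30-node size and the 120% ratio, not per-node memory or data size.
NODES = 30
MEMORY_PER_NODE_GB = 64   # assumed
COPIES = 2                # assumed: 1 primary + 1 replica per shard


def memory_ratio(data_size_gb: float, copies: int, total_memory_gb: float) -> float:
    """Return (data size * number of copies) / total cluster memory."""
    return data_size_gb * copies / total_memory_gb


def max_data_for_ratio(ratio: float, copies: int, total_memory_gb: float) -> float:
    """Largest unreplicated data size whose replicated footprint is `ratio` x memory."""
    return ratio * total_memory_gb / copies


total_memory_gb = NODES * MEMORY_PER_NODE_GB
data_gb = max_data_for_ratio(1.2, COPIES, total_memory_gb)
print(f"total memory: {total_memory_gb} GB")
print(f"data size at 120%: {data_gb:.0f} GB")
print(f"ratio check: {memory_ratio(data_gb, COPIES, total_memory_gb):.2f}")
```

With these assumed numbers the cluster has 1920 GB of memory, so about 1152 GB of unreplicated data would put the replicated footprint at 120% of memory.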
Multiple clusters per use case (Cluster 1, Cluster 2, Cluster 3)
Problems? Lead time of features and bug fixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram: indexer, Kafka (historic, current, compaction), shipper 1 and shipper 2, Cluster 1 and Cluster 2]
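The extract, build and load steps above can be sketched as a small pipeline that ends in a request body for the Elasticsearch `_bulk` API (one action line plus one source line per document, newline-terminated). This is an illustration under assumptions: the extract step is stubbed with in-memory rows, the field names and the index name `tracks-v2` are hypothetical, the Kafka hop between building and loading is left out, and the action lines omit `_type`, which assumes Elasticsearch 7 or later.

```python
import json
from typing import Iterable, Mapping


def build_document(row: Mapping) -> dict:
    """Step 2: turn an extracted row into an ES document (hypothetical fields)."""
    return {"title": row["title"], "plays": int(row["plays"])}


def to_bulk_ndjson(index: str, rows: Iterable[Mapping]) -> str:
    """Step 3: serialise documents as a newline-delimited _bulk request body.

    Each document becomes an action line followed by its source line,
    and the whole body must end with a newline.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row["id"])}}))
        lines.append(json.dumps(build_document(row)))
    return "\n".join(lines) + "\n"


# Step 1 (extract) stubbed with in-memory rows; the schema is an assumption.
extracted = [
    {"id": 1, "title": "first track", "plays": "10"},
    {"id": 2, "title": "second track", "plays": "25"},
]
body = to_bulk_ndjson("tracks-v2", extracted)
print(body, end="")
```

Using the source record's id as `_id` makes each load idempotent: replaying the same rows overwrites documents instead of duplicating them.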
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10 Gbit networking
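Compaction matters for a reindex because a compacted topic keeps the latest record for each key. If the ES document id is used as the message key (an assumption, not stated on the slide), a full replay of the topic yields one current version per document. The sketch below simulates that rule in memory. It simplifies real Kafka, which compacts in the background and keeps tombstones for a retention period before dropping them.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (key, value); a value of None acts as a tombstone that deletes the key.
Record = Tuple[str, Optional[str]]


def compact(log: Iterable[Record]) -> List[Record]:
    """Keep only the latest record per key, in the order those records were written.

    Keys whose latest record is a tombstone are dropped entirely.
    """
    latest: Dict[str, Tuple[int, Optional[str]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


log = [
    ("track:1", '{"title": "a"}'),
    ("track:2", '{"title": "b"}'),
    ("track:1", '{"title": "a2"}'),  # newer version of track:1
    ("track:2", None),               # tombstone: track:2 was deleted
    ("track:3", '{"title": "c"}'),
]
print(compact(log))
```

Replaying the compacted log gives the indexer two documents (the updated `track:1` and `track:3`) instead of five records.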
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async translog
4. Refresh interval
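The four settings above map onto real Elasticsearch index settings (`number_of_shards`, `number_of_replicas`, `translog.durability`, `refresh_interval`). The sketch builds a body that could be sent with `PUT /<index>` when creating the target index. The values are assumptions, since the slide names the knobs but not the numbers: 30 shards (one per node), zero replicas while loading, asynchronous translog fsync and periodic refresh disabled.

```python
import json


def reindex_settings(shards: int) -> dict:
    """Index-creation body tuned for bulk loading (values are illustrative assumptions)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. fixed when the index is created
                "number_of_replicas": 0,              # 2. no replica writes during the load
                "translog": {"durability": "async"},  # 3. fsync the translog in the background
                "refresh_interval": "-1",             # 4. disable periodic refresh
            }
        }
    }


body = reindex_settings(shards=30)
print(json.dumps(body, indent=2))
```

Async durability trades the guarantee that an acknowledged write survives a crash for throughput, which is plausibly acceptable here because the data can be replayed from Kafka.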
Finish reindexing
1. Merge into one segment***
2. Set the number of replicas
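The two finishing steps correspond to the force merge API and an index settings update. The sketch only assembles the requests as (method, path, body) tuples and does not send them. The index name `tracks-v2` and the replica count are hypothetical. Merging before adding replicas presumably means the merge runs once on the primaries, and the new replicas then copy the already-merged segment files.

```python
from typing import List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, path, JSON body)


def finish_reindex(index: str, replicas: int) -> List[Request]:
    """Return the finishing requests in the order the slide lists them."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", {}),
        # 2. Raise the replica count for live traffic.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


for method, path, body in finish_reindex("tracks-v2", replicas=2):
    print(method, path, body)
```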
Throughput ≈ 600K ops/s, ≈ 30 min
Latency: 4× faster at the 95th percentile, ≈ 40 ms at the 50th percentile
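The throughput figure and the ≈ 30 minute duration are consistent. Assuming one indexing operation per document, 1B documents at 600K operations per second take just under half an hour:

```python
DOCUMENTS = 1_000_000_000
OPS_PER_SECOND = 600_000  # from the slide; assumes one operation per document

seconds = DOCUMENTS / OPS_PER_SECOND
minutes = seconds / 60
print(f"{seconds:.0f} s ≈ {minutes:.1f} min")
```

This comes to roughly 1667 s, or about 27.8 minutes, which rounds to the "≈ 30 mins" on the slide.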
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]