Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
How to reindex 1B documents in 1 hour?
Search
Qaiser Abbasi
December 13, 2018
Business
0
750
How to reindex 1B documents in 1 hour?
Talk given at
www.meetup.com/Elasticsearch-Berlin/
Qaiser Abbasi
December 13, 2018
Tweet
Share
More Decks by Qaiser Abbasi
See All by Qaiser Abbasi
Java User Group Frankfurt – CDI BeanTesting
qabbasi
0
63
Testing-Darwinismus
qabbasi
0
58
Other Decks in Business
See All in Business
社会の中のわたしの技術 ─ 自分の地図の描き方 #wttjp
yotii23
0
310
第9回 情シス転職ミートアップ - わたしのミッションとLayerXに決めた理由
shimosyan
0
340
息苦しい目標設定に、さよならを。 〜挑戦するチームへ導く「成長観点」と「給与観点」の使い分け〜
mkitahara01985
3
300
OpenBridge株式会社 会社紹介資料 / We are hiring
openbridge
0
270
AIUX is Agentic UX
kan
0
280
『Policy Fund』採択団体 政策提言集/Policy Fund Report
polipoli
0
460
株式会社kubellパートナー 会社説明資料 (MINAGINE事業版)
kubell_partner
2
340
BoostDraft 会社紹介資料
boostdraft
0
430
【DearOne】Dear Newest Member
hrm
2
10k
tokyo_dbt_meetup_#14_意志ある羅針盤たれ<データサイド>
t_yamaguchi
3
580
インキュデータ会社紹介資料
okitsu
3
42k
国内ランサムウェア3事例から学ぶ中小病院におけるサイバーセキュリティ対策 / Cybersecurity Learned from Cases
henryofficial
0
220
Featured
See All Featured
How STYLIGHT went responsive
nonsquared
100
5.6k
Designing for humans not robots
tammielis
253
25k
What’s in a name? Adding method to the madness
productmarketing
PRO
23
3.5k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
The Cult of Friendly URLs
andyhume
79
6.5k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.9k
Speed Design
sergeychernyshev
32
1k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2.1k
Adopting Sorbet at Scale
ufuk
77
9.4k
Embracing the Ebb and Flow
colly
86
4.7k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
Transcript
Rene Treffer, Qaiser Abbasi How to reindex 1B documents in
1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes
Clusters of 30 nodes data size * replication = 120%
* total memory
Cluster 2 Cluster 3 Cluster 1 Cluster 2 Cluster 3
Cluster 1 Multiple clusters per use-case
Problems?
Lead time of features and bugfixes Problems?
Indexing
Indexing 1. Extract
Indexing 1. Extract 2. Build ES documents
Indexing 1. Extract 2. Build ES documents 3. Load into
ES
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates
Indexing 1. Extract 2. Build ES documents 3. Load into
ES 0. Live updates Kafka Kafka +
Kafka historic current compaction Cluster 1 Cluster 2 shipper 1
shipper 2 indexer
Kafka for ES documents 1. Enable compaction 2. Use fast
compression 3. Use enough partitions 4. Use SSDs + 10GBit
ES cluster lifecycle Reindex Live Maintenance
Reindex settings 1. Shards 2. Replication settings 3. Async Translog
4. Refresh Interval
Finish reindexing 1. Merge into one segment*** 2. Set #
replicas
Throughput ≈ 600K OP/s ≈ 30 Mins
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary • Solved initial problem • Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU
[email protected]
[email protected]