How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi
How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by ElasticSearch
Typical search document
Clusters of 30 nodes: data size * replication = 120% * total memory
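The sizing relation on this slide can be turned into a quick calculation. The sketch below solves it for the data size; the node memory and the number of copies are hypothetical values chosen only for illustration, and reading "replication" as the total number of copies of each shard is an assumption.

```python
def max_data_size_gib(nodes, memory_per_node_gib, copies, ratio=1.2):
    """Solve data_size * copies = ratio * total_memory for data_size."""
    total_memory_gib = nodes * memory_per_node_gib
    return ratio * total_memory_gib / copies


# Hypothetical cluster: 30 nodes with 64 GiB each, 2 copies of every shard.
primary_data_gib = max_data_size_gib(nodes=30, memory_per_node_gib=64, copies=2)
print(f"{primary_data_gib:.0f} GiB of primary data")
```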
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems? Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Architecture diagram; labels: Kafka (historic, current), Kafka + compaction, indexer, shipper 1, shipper 2, Cluster 1, Cluster 2]
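The slides pair Kafka with compaction for the stream of ES documents. As a rough illustration of why that helps reindexing, the sketch below simulates what log compaction guarantees: for each key only the newest record survives, so replaying the topic yields one current document per key. This is a simplified in-memory model, not the speakers' pipeline; the `track:N` keys are hypothetical, and real Kafka additionally keeps tombstones for a retention period and never compacts the active segment.

```python
def compact(log):
    """Keep only the newest value per key, in the spirit of Kafka log compaction.

    `log` is a list of (key, value) records in offset order.
    A value of None acts as a tombstone and removes the key.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    )
    return [(key, value) for _, key, value in survivors]


# Hypothetical document stream: two updates to track:1, a delete of track:2.
log = [
    ("track:1", {"title": "a"}),
    ("track:2", {"title": "b"}),
    ("track:1", {"title": "a2"}),
    ("track:2", None),
    ("track:3", {"title": "c"}),
]
compacted = compact(log)
```

Replaying `compacted` into an empty cluster needs two writes instead of five, which is the property that keeps a full reindex proportional to the number of live documents rather than to the update history.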
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
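The first three recommendations map onto Kafka topic settings. The sketch below assembles a `kafka-topics.sh` invocation with `cleanup.policy=compact` and a `compression.type`; the choice of `lz4` as the "fast" codec, the topic name, the partition count, and the broker address are all assumptions, since the slides do not name them.

```python
def topic_create_args(topic, partitions, replication_factor, bootstrap):
    """Build the argument list for creating a compacted, compressed topic."""
    configs = {
        "cleanup.policy": "compact",   # 1. enable compaction
        "compression.type": "lz4",     # 2. fast compression (codec is an assumption)
    }
    args = [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap,
        "--topic", topic,
        "--partitions", str(partitions),  # 3. enough partitions for parallel consumers
        "--replication-factor", str(replication_factor),
    ]
    for name, value in configs.items():
        args += ["--config", f"{name}={value}"]
    return args


# Hypothetical topic name, partition count, and broker address.
args = topic_create_args("es-documents", 64, 3, "localhost:9092")
print(" ".join(args))
```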
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
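The four settings listed above correspond to standard Elasticsearch index settings. The sketch below builds an index-creation body that is tuned for bulk loading; the concrete values (zero replicas, refresh disabled, the shard count) are common bulk-indexing choices assumed here for illustration, since the slides list only the setting names.

```python
import json


def bulk_load_index_body(shards):
    """Index-creation body tuned for a one-off bulk load (values are assumptions)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,           # 1. fixed at index creation
                "number_of_replicas": 0,              # 2. no replicas while loading
                "translog": {"durability": "async"},  # 3. fsync in the background
                "refresh_interval": "-1",             # 4. disable periodic refresh
            }
        }
    }


# Hypothetical shard count; the body would be sent with PUT /<index-name>.
body = bulk_load_index_body(shards=30)
print(json.dumps(body, indent=2))
```

The shard count cannot be changed on an existing index without recreating or resizing it, while the other three settings can be updated later through the index settings API.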
Finish reindexing
1. Merge into one segment***
2. Set # replicas
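The two finishing steps can be expressed as Elasticsearch REST calls: a force merge with `max_num_segments=1`, followed by an index settings update that raises `number_of_replicas`. The sketch below only constructs the requests and does not send them; the base URL, index name, and replica count are hypothetical.

```python
import json
from urllib.request import Request


def finish_requests(base_url, index, replicas):
    """Build the requests that finish a reindex, in the order they should run."""
    # Step 1: merge every shard of the index down to a single segment.
    merge = Request(
        f"{base_url}/{index}/_forcemerge?max_num_segments=1",
        method="POST",
    )
    # Step 2: raise the replica count so the merged segments get copied.
    settings = Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return [merge, settings]


# Hypothetical cluster address, index name, and replica count.
merge, settings = finish_requests("http://localhost:9200", "tracks-v2", replicas=1)
```

Merging before adding replicas means each replica copies the already merged segment instead of repeating the merge work on its own.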
Throughput ≈ 600K OP/s ≈ 30 Mins
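A quick check shows how the throughput figure relates to the duration, assuming one indexing operation per document and the 1B documents from the talk title:

```python
DOCUMENTS = 1_000_000_000   # "1B documents" from the talk title
OPS_PER_SECOND = 600_000    # throughput quoted on the slide

# Assumes exactly one indexing operation per document.
seconds = DOCUMENTS / OPS_PER_SECOND
minutes = seconds / 60
print(f"{minutes:.1f} minutes")  # prints "27.8 minutes"
```

That is consistent with the roughly 30 minutes quoted on the slide.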
4X faster for 95%; ≈ 40ms for 50%
4X reindexing in 1 sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU