How to reindex 1B documents in 1 hour?
Qaiser Abbasi
December 13, 2018
Talk given at www.meetup.com/Elasticsearch-Berlin/
Transcript
Rene Treffer, Qaiser Abbasi
How to reindex 1B documents in 1 hour?
Search @ SoundCloud
Powered by Elasticsearch
Typical search document
Clusters of 30 nodes
data size * replication = 120% * total memory
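The sizing rule on this slide can be turned into a quick calculation. This is only a sketch: it assumes "replication" means the total number of copies of each shard (primary plus replicas), and the node count and memory figures in the usage note are hypothetical, not SoundCloud's.

```python
def primary_data_budget_gb(total_memory_gb: float, copies: int, ratio: float = 1.2) -> float:
    """Largest primary data size that satisfies
    data size * replication = ratio * total memory.

    `copies` is assumed to be primary + replicas; the slide does not
    say which reading of "replication" it uses.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return ratio * total_memory_gb / copies
```

For example, a hypothetical cluster of 30 nodes with 64 GB each has 1920 GB of memory; with 2 copies of every shard the rule allows about 1152 GB of primary data.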
Multiple clusters per use-case: Cluster 1, Cluster 2, Cluster 3
Problems?
Lead time of features and bugfixes
Indexing
0. Live updates
1. Extract
2. Build ES documents
3. Load into ES
[Diagram labels: indexer, Kafka (historic, current, compaction), shipper 1, shipper 2, Cluster 1, Cluster 2]
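The "Load into ES" step is commonly done through Elasticsearch's bulk API, although the slides do not say which API the shippers use. The sketch below only builds the newline-delimited request body for recent Elasticsearch versions; the function name, index name and documents are hypothetical.

```python
import json
from typing import Iterable, Mapping, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[str, Mapping]]) -> str:
    """Build an NDJSON body for POST /_bulk.

    Each document becomes two lines: an action line naming the target
    index and id, then the document source. Assumption: the shippers
    use plain "index" actions.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Using the document id as `_id` means a replayed message overwrites the earlier copy instead of creating a duplicate.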
Kafka for ES documents
1. Enable compaction
2. Use fast compression
3. Use enough partitions
4. Use SSDs + 10GBit
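The first three recommendations map onto Kafka topic settings. A minimal sketch, assuming lz4 as the "fast compression" codec (the slide does not name one); the helper name and the partition count are hypothetical. The SSD and 10GBit points are hardware choices and have no equivalent here.

```python
def es_document_topic_config(partitions: int) -> dict:
    """Describe a Kafka topic that holds ready-made ES documents.

    The keys under "config" are standard Kafka topic-level settings;
    the chosen values are assumptions for illustration.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return {
        "num_partitions": partitions,
        "config": {
            # Keep only the latest message per key (the document id).
            "cleanup.policy": "compact",
            # Assumed codec; snappy or zstd would be other candidates.
            "compression.type": "lz4",
        },
    }
```

With compaction enabled and messages keyed by document id, the topic converges to one message per document, so a full replay is roughly one write per document.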
ES cluster lifecycle: Reindex, Live, Maintenance
Reindex settings
1. Shards
2. Replication settings
3. Async Translog
4. Refresh Interval
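The four settings above correspond to standard Elasticsearch index settings. The slide lists only their names, so the values in this sketch (no replicas and refresh disabled during the load) are assumptions based on common bulk-loading practice, and the function name is hypothetical.

```python
def bulk_load_index_settings(shards: int) -> dict:
    """Body for creating an index tuned for a one-off bulk load.

    Values are illustrative assumptions, not the ones used in the talk.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                # Assumed: write primaries only, add replicas afterwards.
                "number_of_replicas": 0,
                # Fsync the translog in the background, not per request.
                "translog": {"durability": "async"},
                # Assumed: "-1" disables periodic refresh while loading.
                "refresh_interval": "-1",
            }
        }
    }
```

The returned dictionary would be sent as the JSON body of the index creation request.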
Finish reindexing
1. Merge into one segment***
2. Set # replicas
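Both finishing steps have direct counterparts in the Elasticsearch REST API: a force merge down to one segment, then a settings update that raises the replica count. The sketch below only describes the two requests; the helper name, index name and replica count are hypothetical.

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def finish_reindex_requests(index: str, replicas: int) -> List[Request]:
    """Return (method, path, body) for the two finishing steps, in order."""
    return [
        # 1. Merge each shard down to a single segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
        # 2. Add replicas once the segments are final.
        ("PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]
```

Doing the merge before adding replicas means the replicas are built from the already merged segments, so the merge work is done only once.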
Throughput ≈ 600K OP/s ≈ 30 Mins
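The two figures on this slide are consistent with the title's 1B documents, as a quick check shows. This assumes one operation indexes one document.

```python
DOCUMENTS = 1_000_000_000   # from the talk title
OPS_PER_SECOND = 600_000    # from the slide

seconds = DOCUMENTS / OPS_PER_SECOND
minutes = seconds / 60

# Roughly 1667 seconds, i.e. a little under 28 minutes.
print(f"{seconds:.0f} s = {minutes:.1f} min")
```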
4X faster for 95% ≈ 40ms for 50%
4X Reindexing in 1 Sprint
Summary
• Solved initial problem
• Enablement in daily life
Future work
Q & A
Sounds interesting? Come and talk to us!
THANK YOU