Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Refactoring a Solr based api application
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Torsten Bøgh Köster
April 13, 2012
Programming
3
110
Refactoring a Solr based api application
Held on Apache Lucene Eurocon 2011 in Barcelona
Torsten Bøgh Köster
April 13, 2012
Tweet
Share
More Decks by Torsten Bøgh Köster
See All by Torsten Bøgh Köster
LLMs im Griff: Observability, Tracing und Security
tboeghk
0
12
Oder mache ich es lieber selbst? Wie sich Kosten und Geopolitik auf Cloud-Betrieb auswirken
tboeghk
0
11
Taking an abandoned Solr search from zero to GenAI hero
tboeghk
0
37
Oder mache ich es lieber selbst? Wie sich Kosten und Geopolitik auf Cloud-Betrieb auswirken
tboeghk
0
42
🔪 How we cut our AWS costs in half
tboeghk
0
330
Shared Nothing Logging Infrastructure
tboeghk
0
120
Beyond Cloud: A road trip into AWS and back to bare metal
tboeghk
1
110
Shared Nothing Logging Infrastructure
tboeghk
0
1.4k
Kubernetes the ❤️ way
tboeghk
0
1.1k
Other Decks in Programming
See All in Programming
CSC307 Lecture 05
javiergs
PRO
0
500
CSC307 Lecture 03
javiergs
PRO
1
490
CSC307 Lecture 06
javiergs
PRO
0
680
CSC307 Lecture 09
javiergs
PRO
1
830
フルサイクルエンジニアリングをAI Agentで全自動化したい 〜構想と現在地〜
kamina_zzz
0
400
Grafana:建立系統全知視角的捷徑
blueswen
0
330
それ、本当に安全? ファイルアップロードで見落としがちなセキュリティリスクと対策
penpeen
7
3.8k
組織で育むオブザーバビリティ
ryota_hnk
0
170
humanlayerのブログから学ぶ、良いCLAUDE.mdの書き方
tsukamoto1783
0
190
MDN Web Docs に日本語翻訳でコントリビュート
ohmori_yusuke
0
640
AIによる開発の民主化を支える コンテキスト管理のこれまでとこれから
mulyu
3
140
Data-Centric Kaggle
isax1015
2
770
Featured
See All Featured
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
Believing is Seeing
oripsolob
1
53
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
930
What the history of the web can teach us about the future of AI
inesmontani
PRO
1
430
The untapped power of vector embeddings
frankvandijk
1
1.6k
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
300
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
1
1.4k
職位にかかわらず全員がリーダーシップを発揮するチーム作り / Building a team where everyone can demonstrate leadership regardless of position
madoxten
57
50k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Skip the Path - Find Your Career Trail
mkilby
0
53
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.4k
Transcript
Architectural lessons learned from refactoring a Solr based API application.
Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
Contents Shopping24 and it‘s API Technical scaling solutions Sharding Caching
Solr Cores „Elastic“ infrastructure business requirements as key factor
@tboeghk Software- and systems- architect 2 years experience with Solr
3 years experience with Lucene Team of 7 Java developers currently at Shopping24
shopping24 internet group
1 portal became n portals
30 partner shops became 700
500k to 7m documents
index fact time •16 Gig Data •Single-Core-Layout •Up to 17s
response time •Machine size limited •Stalled at solr version 1.4 •API designed for small tools
scaling goal: 15-50m documents
ask the nerds „Shard!“ That‘ll be fun! „Use spare compute
cores at Amazon?“ breathe load into the cloud „Reduce that index size“ „Get rid of those long running queries!“
data sharding ...
... is highly effective. 125ms 250ms 375ms 500ms 1 4
8 12 16 20 1shard 2shard 3shard 4shard 6shard 8shard concurrent requests
Sharding: size matters the bigger your index gets, the more
complex your queries are, the more concurrent requests, the more sharding you need
but wait ...
Why do we have such a big index?
7m documents vs. 2m active poducts
fashion product lifecycle meets SEO Bastografie / photocase.com
Separation of duties! Remove unsearchable data from your index.
Why do we have complex queries?
A Solr index designed for 1 portal
Grown into a multi-portal index
Let “sharding“ follow your data ...
... and build separate cores for every client.
Duplicate data as long as access is fast. andybahn /
photocase.com
Streamline your index provisioning process.
A thousand splendid cores at your fingertips.
Throwing hardware at problems. Automated.
evil traps: latency, $$
mirror your complete system – solve load balancer problems froodmat
/ photocase.com
I said faster!
use a cache layer like Varnish.
What about those complex queries? Why do we have them?
And how do we get rid of them?
Lost in encapsulation: Solr API exposed to world.
What‘s the key factor?
look at your business requirements
decrease complexity
Questions? Comments? Ideas? Twitter: @tboeghk Github: @tboeghk Email:
[email protected]
Web:
http://www.s24.com Images: sxc.hu (unless noted otherwise)