Refactoring a Solr based api application
Torsten Bøgh Köster
April 13, 2012
Held at Apache Lucene Eurocon 2011 in Barcelona
Transcript
Architectural lessons learned from refactoring a Solr based API application.
Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
Contents: Shopping24 and its API. Technical scaling solutions: sharding, caching, Solr cores, "elastic" infrastructure. Business requirements as key factor.
@tboeghk: software and systems architect. 2 years experience with Solr, 3 years experience with Lucene. Team of 7 Java developers, currently at Shopping24.
shopping24 internet group
1 portal became n portals
30 partner shops became 700
500k to 7m documents
index fact time
• 16 GB of data
• Single-core layout
• Up to 17 s response time
• Machine size limited
• Stalled at Solr version 1.4
• API designed for small tools
scaling goal: 15-50m documents
ask the nerds
• "Shard!" That'll be fun!
• "Use spare compute cores at Amazon?" Breathe load into the cloud.
• "Reduce that index size."
• "Get rid of those long-running queries!"
data sharding ...
... is highly effective.
[Chart: y-axis in milliseconds (125 to 500 ms), x-axis concurrent requests (1 to 20), one series each for 1, 2, 3, 4, 6 and 8 shards]
Sharding: size matters. The bigger your index gets, the more complex your queries are, and the more concurrent requests you serve, the more sharding you need.
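A minimal sketch of what sharding could look like from the client side, assuming Solr's distributed search via the `shards` request parameter. The host names, the hash-based document routing, and the function names are assumptions for illustration and are not taken from the talk.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard hosts; replace with your own.
SHARDS = [
    "solr1.example.com:8983/solr",
    "solr2.example.com:8983/solr",
    "solr3.example.com:8983/solr",
    "solr4.example.com:8983/solr",
]


def shard_for(doc_id: str, shard_count: int) -> int:
    """Pick a shard index for a document using a stable hash of its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % shard_count


def distributed_query_url(base_url: str, query: str, shards: list[str]) -> str:
    """Build a select URL that asks one node to fan the query out to all shards."""
    params = {"q": query, "shards": ",".join(shards), "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"
```

At index time each document goes to `SHARDS[shard_for(doc_id, len(SHARDS))]`. At query time any node can receive the URL built by `distributed_query_url` and merge the partial results.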
but wait ...
Why do we have such a big index?
7m documents vs. 2m active products
fashion product lifecycle meets SEO (Image: Bastografie / photocase.com)
Separation of duties! Remove unsearchable data from your index.
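One way to read this slide is that only active products belong in the search index, while the remaining documents live in a separate store. The sketch below assumes that reading. The `Product` type and its `active` flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    title: str
    active: bool  # hypothetical flag: the product can still be found and bought


def split_for_indexing(products):
    """Return (searchable, archive_only) so only searchable products reach Solr."""
    searchable = [p for p in products if p.active]
    archive_only = [p for p in products if not p.active]
    return searchable, archive_only
```

Only the first list would be sent to the search index. The second list could be served from a plain key-value lookup.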
Why do we have complex queries?
A Solr index designed for 1 portal
Grown into a multi-portal index
Let "sharding" follow your data ...
... and build separate cores for every client.
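A rough sketch of per-client routing, assuming a multi-core Solr setup where each core is reachable under its own path (`/solr/<core>/select`). The base URL and the `client_` naming scheme are assumptions, not details from the talk.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host


def core_name(client: str) -> str:
    """Derive a core name from a client identifier (naming scheme is an assumption)."""
    safe = "".join(ch if ch.isalnum() else "_" for ch in client.lower())
    return f"client_{safe}"


def select_url(client: str, query: str) -> str:
    """Route a query to the core that holds only this client's documents."""
    return f"{SOLR_BASE}/{core_name(client)}/select?{urlencode({'q': query})}"
```

Because each core holds one client's data, queries no longer need a per-portal filter clause. That is one way the query complexity mentioned earlier can shrink.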
Duplicate data as long as access is fast. (Image: andybahn / photocase.com)
Streamline your index provisioning process.
A thousand splendid cores at your fingertips.
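Provisioning many cores is easier to automate when it is driven through Solr's CoreAdmin HTTP API. The sketch below only builds the `CREATE` request URLs. The host, the core names, and the choice to reuse the core name as `instanceDir` are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host


def create_core_url(name: str) -> str:
    """Build a CoreAdmin CREATE request for one core."""
    params = {"action": "CREATE", "name": name, "instanceDir": name}
    return f"{SOLR_BASE}/admin/cores?{urlencode(params)}"


def provisioning_plan(core_names):
    """Return one CREATE URL per core, in the order given."""
    return [create_core_url(name) for name in core_names]
```

A deployment script could loop over `provisioning_plan([...])` and issue each request once the matching instance directory and configuration exist on disk.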
Throwing hardware at problems. Automated.
evil traps: latency, $$
mirror your complete system – solve load balancer problems (Image: froodmat / photocase.com)
I said faster!
use a cache layer like Varnish.
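The following is not Varnish configuration. It is a small in-process illustration of two ideas an HTTP cache layer relies on: normalizing requests so equivalent queries share one entry, and expiring entries after a time to live. All names here are hypothetical.

```python
import time
from urllib.parse import parse_qsl, urlencode


def cache_key(path: str, query_string: str) -> str:
    """Sort query parameters so equivalent requests map to the same key."""
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    return f"{path}?{urlencode(params)}"


class TtlCache:
    """Tiny time-to-live cache; `clock` is injectable to make testing easy."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return a fresh cached value, or call `fetch` and store its result."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch()
        self.entries[key] = (now + self.ttl, value)
        return value
```

Without normalization, `rows=10&q=shoes` and `q=shoes&rows=10` would be cached twice, which lowers the hit rate for an API whose clients build query strings in different orders.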
What about those complex queries? Why do we have them?
And how do we get rid of them?
Lost in encapsulation: Solr API exposed to the world.
What's the key factor?
look at your business requirements
decrease complexity
Questions? Comments? Ideas?
Twitter: @tboeghk
GitHub: @tboeghk
Email: [email protected]
Web: http://www.s24.com
Images: sxc.hu (unless noted otherwise)