Refactoring a Solr based api application
Torsten Bøgh Köster
April 13, 2012
Held at Apache Lucene Eurocon 2011 in Barcelona
Transcript
Architectural lessons learned from refactoring a Solr based API application.
Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
Contents
• Shopping24 and its API
• Technical scaling solutions: sharding, caching, Solr cores, "elastic" infrastructure
• Business requirements as key factor
@tboeghk
• Software and systems architect
• 2 years of experience with Solr
• 3 years of experience with Lucene
• Team of 7 Java developers
• Currently at Shopping24
shopping24 internet group
1 portal became n portals
30 partner shops became 700
500k to 7m documents
index fact time
• 16 GB of data
• Single-core layout
• Up to 17 s response time
• Machine size limited
• Stalled at Solr version 1.4
• API designed for small tools
scaling goal: 15-50m documents
ask the nerds
"Shard!" That'll be fun!
"Use spare compute cores at Amazon?" Breathe load into the cloud.
"Reduce that index size."
"Get rid of those long-running queries!"
data sharding ...
... is highly effective.
[Chart: response time (125 ms to 500 ms) against concurrent requests (1 to 20), plotted for 1, 2, 3, 4, 6 and 8 shards]
Sharding: size matters. The bigger your index gets, the more complex your queries are, and the more concurrent requests you serve, the more sharding you need.
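As a rough illustration of what a sharded request looks like from the client side: Solr's distributed search takes a `shards` request parameter listing the shard endpoints to fan the query out to. The sketch below only builds such a URL; the helper name and the host names used with it are hypothetical and not taken from the talk.

```python
from urllib.parse import urlencode


def build_sharded_select_url(coordinator_url, shard_endpoints, query, rows=10):
    """Build a Solr /select URL that fans one query out to several shards.

    coordinator_url and shard_endpoints are placeholders (assumptions).
    Shard endpoints are written without the http:// scheme.
    """
    if not shard_endpoints:
        raise ValueError("at least one shard endpoint is required")
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of shards queried by the coordinating node.
        "shards": ",".join(shard_endpoints),
    }
    return f"{coordinator_url}/select?{urlencode(params)}"


url = build_sharded_select_url(
    "http://solr-a:8983/solr",                 # hypothetical coordinator
    ["solr-a:8983/solr", "solr-b:8983/solr"],  # hypothetical shards
    "red shoes",
)
```

Adding a shard then means adding one entry to the list, which is the kind of change the chart above varies from 1 to 8 shards.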
but wait ...
Why do we have such a big index?
7m documents vs. 2m active products
fashion product lifecycle meets SEO (Image: Bastografie / photocase.com)
Separation of duties! Remove unsearchable data from your index.
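A minimal sketch of that separation, assuming each product record carries a flag that says whether it is still searchable. The `active` field name and the idea of an archive list are assumptions for illustration; the slides do not say where the non-searchable data ends up.

```python
def split_searchable(products):
    """Partition product records into searchable and archive-only lists.

    The 'active' flag is a hypothetical field used for illustration.
    """
    searchable, archived = [], []
    for product in products:
        # Only active products go to the search index; the rest are kept
        # elsewhere so they no longer inflate the index.
        (searchable if product.get("active") else archived).append(product)
    return searchable, archived


products = [
    {"id": "p1", "active": True},
    {"id": "p2", "active": False},  # no longer sold, not searchable
    {"id": "p3", "active": True},
]
to_index, to_archive = split_searchable(products)
```

Only `to_index` would be sent to Solr; with the numbers from the slides this would shrink the index from roughly 7m to 2m documents.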
Why do we have complex queries?
A Solr index designed for 1 portal
Grown into a multi-portal index
Let "sharding" follow your data ...
... and build separate cores for every client.
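With one core per client, the API layer has to pick the right core for each request. A small sketch of that routing step, assuming Solr's multi-core URL layout (`<base>/<core>/select`); the `portal_<name>` core naming scheme and the function name are hypothetical.

```python
import re

# Restrict portal names so they cannot escape the core path segment.
_PORTAL_NAME = re.compile(r"[a-z0-9_-]+")


def core_select_url(solr_base_url, portal):
    """Return the /select URL of the core dedicated to one portal.

    The 'portal_<name>' core naming scheme is an assumption.
    """
    if not _PORTAL_NAME.fullmatch(portal):
        raise ValueError(f"invalid portal name: {portal!r}")
    return f"{solr_base_url}/portal_{portal}/select"


fashion_url = core_select_url("http://localhost:8983/solr", "fashion")
```

Because each core holds only one portal's data, the queries sent to it no longer need the portal-specific filters that a shared multi-portal index requires.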
Duplicate data as long as access is fast. (Image: andybahn / photocase.com)
Streamline your index provisioning process.
A thousand splendid cores at your fingertips.
Throwing hardware at problems. Automated.
evil traps: latency, $$
mirror your complete system – solve load balancer problems (Image: froodmat / photocase.com)
I said faster!
use a cache layer like Varnish.
What about those complex queries? Why do we have them?
And how do we get rid of them?
Lost in encapsulation: Solr API exposed to the world.
What's the key factor?
look at your business requirements
decrease complexity
Questions? Comments? Ideas?
Twitter: @tboeghk
GitHub: @tboeghk
Email: [email protected]
Web: http://www.s24.com
Images: sxc.hu (unless noted otherwise)