Refactoring a Solr based api application
Torsten Bøgh Köster
April 13, 2012
Held at Apache Lucene Eurocon 2011 in Barcelona
Transcript
Architectural lessons learned from refactoring a Solr based API application.
Torsten Bøgh Köster (Shopping24) Apache Lucene Eurocon, 19.10.2011
Contents
• Shopping24 and its API
• Technical scaling solutions: sharding, caching, Solr cores, "elastic" infrastructure
• Business requirements as key factor
@tboeghk
• Software and systems architect
• 2 years of experience with Solr
• 3 years of experience with Lucene
• Team of 7 Java developers, currently at Shopping24
shopping24 internet group
1 portal became n portals
30 partner shops became 700
500k to 7m documents
index fact time
• 16 GB of data
• Single-core layout
• Up to 17 s response time
• Machine size limited
• Stalled at Solr version 1.4
• API designed for small tools
scaling goal: 15-50m documents
ask the nerds
• "Shard!" That'll be fun!
• "Use spare compute cores at Amazon?" Breathe load into the cloud.
• "Reduce that index size"
• "Get rid of those long-running queries!"
data sharding ...
... is highly effective.
[Chart: times from 125 ms to 500 ms plotted against 1 to 20 concurrent requests, for 1, 2, 3, 4, 6 and 8 shards]
Sharding: size matters. The bigger your index gets, the more complex your queries are, and the more concurrent requests you serve, the more sharding you need.
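As a rough illustration of what sharding looks like from the client side, the sketch below builds a Solr distributed-search request using the standard `shards` request parameter. The host names, the query, and the helper name `sharded_select_url` are assumptions made for this example; the talk does not show the actual setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sharded_select_url(coordinator, shard_hosts, query, rows=10):
    """Build a select URL that asks one Solr node to fan out to all shards."""
    params = {
        "q": query,
        "rows": rows,
        # 'shards' is a comma-separated list of host:port/path entries.
        "shards": ",".join(shard_hosts),
    }
    return f"http://{coordinator}/select?{urlencode(params)}"


# Hypothetical hosts, for illustration only.
SHARDS = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]
url = sharded_select_url(SHARDS[0], SHARDS, "category:shoes")

# Decode the query string again to inspect what would be sent.
parsed = parse_qs(urlsplit(url).query)
```

Any node can act as coordinator here; it merges the partial results from every entry in `shards` before answering.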
but wait ...
Why do we have such a big index?
7m documents vs. 2m active products
fashion product lifecycle meets SEO (Image: Bastografie / photocase.com)
Separation of duties! Remove unsearchable data from your index.
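One way to read this slide is that products kept only for SEO purposes need not live in the search index. The sketch below filters a feed down to searchable products before indexing. The `active` flag and the record layout are assumptions for illustration, not the schema used at Shopping24.

```python
def searchable_only(products):
    """Keep only products that should be searchable.

    Assumes each product is a dict with a boolean 'active' flag
    (a hypothetical field name).
    """
    return [p for p in products if p.get("active")]


catalog = [
    {"id": "1", "name": "Sneaker", "active": True},
    {"id": "2", "name": "Last season's boot", "active": False},
    {"id": "3", "name": "Sandal", "active": True},
]

# Only these documents would be sent to the search index; inactive
# products could be served from a separate store.
to_index = searchable_only(catalog)
```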
Why do we have complex queries?
A Solr index designed for 1 portal
Grown into a multi-portal index
Let "sharding" follow your data ...
... and build separate cores for every client.
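A minimal sketch of what per-client cores mean for request routing: each client's queries go to its own core path instead of one shared index. The base URL, the core naming scheme, and the helper `core_select_url` are assumptions for this example.

```python
from urllib.parse import quote, urlencode

# Assumed Solr base URL; multi-core setups expose one path per core.
SOLR_BASE = "http://localhost:8983/solr"


def core_select_url(client, query, rows=10):
    """Build the select URL for the core that belongs to one client."""
    core = quote(client, safe="")  # keep odd characters out of the path
    return f"{SOLR_BASE}/{core}/select?{urlencode({'q': query, 'rows': rows})}"


# 'portal_a' is a hypothetical client name that doubles as the core name.
url = core_select_url("portal_a", "shoes")
```

Because every core holds only one client's data, queries no longer need a client filter, which is one source of query complexity in a shared index.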
Duplicate data as long as access is fast. (Image: andybahn / photocase.com)
Streamline your index provisioning process.
A thousand splendid cores at your fingertips.
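Provisioning many cores is easier to automate when core creation is scripted. The sketch below builds requests for Solr's CoreAdmin `CREATE` action, which takes `name` and `instanceDir` parameters. The base URL, the client names, and the directory layout are assumptions; the talk does not describe the actual provisioning tooling.

```python
from urllib.parse import urlencode


def create_core_url(base, name, instance_dir):
    """Build a CoreAdmin CREATE request for one new core."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


# Hypothetical clients; one core per client under an assumed 'cores/' directory.
clients = ["portal_a", "portal_b"]
urls = [
    create_core_url("http://localhost:8983/solr", client, f"cores/{client}")
    for client in clients
]
```

Each URL could then be requested by a provisioning script once the instance directory with its configuration is in place.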
Throwing hardware at problems. Automated.
evil traps: latency, $$
mirror your complete system, solve load balancer problems (Image: froodmat / photocase.com)
I said faster!
use a cache layer like Varnish.
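An HTTP cache such as Varnish keys entries on the request URL by default, so two requests that differ only in parameter order end up as separate cache entries. The sketch below shows one possible mitigation, normalizing the query string before the request reaches the cache. This step is an assumption added for illustration and is not described in the talk; the URL is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_for_cache(url):
    """Sort query parameters so equivalent requests share one cache key."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


a = normalize_for_cache("http://api.example.com/search?rows=10&q=shoes")
b = normalize_for_cache("http://api.example.com/search?q=shoes&rows=10")
# Both variants now map to the same URL and therefore the same cache entry.
```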
What about those complex queries? Why do we have them?
And how do we get rid of them?
Lost in encapsulation: Solr API exposed to world.
What's the key factor?
look at your business requirements
decrease complexity
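One way to keep the raw Solr API from leaking to clients is a narrow facade that accepts a few business-level parameters and translates them into Solr request parameters. The sketch below is only an illustration of that idea: the field names (`category`, `price`), the sort keys, and the page size cap are all assumptions, not the Shopping24 API.

```python
# Hypothetical whitelist: public sort keys mapped to Solr sort clauses.
ALLOWED_SORTS = {"price_asc": "price asc", "price_desc": "price desc"}
MAX_PAGE_SIZE = 50  # assumed cap to rule out very large result pages


def build_solr_params(keywords, category=None, sort=None, page=1, page_size=20):
    """Translate a small public API into Solr request parameters."""
    if sort is not None and sort not in ALLOWED_SORTS:
        raise ValueError(f"unsupported sort: {sort!r}")
    rows = max(1, min(page_size, MAX_PAGE_SIZE))
    params = {
        "q": keywords,
        # dismax treats user input as plain keywords, not query syntax.
        "defType": "dismax",
        "start": (max(page, 1) - 1) * rows,
        "rows": rows,
    }
    if category:
        # Drop quotes and backslashes so the value stays inside the phrase.
        cleaned = category.replace("\\", "").replace('"', "")
        params["fq"] = f'category:"{cleaned}"'
    if sort:
        params["sort"] = ALLOWED_SORTS[sort]
    return params


params = build_solr_params(
    "red shoes", category="sneakers", sort="price_asc", page=2, page_size=200
)
```

Clients can only express what the business requirements call for, so the set of queries the index has to serve stays small and predictable.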
Questions? Comments? Ideas?
Twitter: @tboeghk
GitHub: @tboeghk
Email: [email protected]
Web: http://www.s24.com
Images: sxc.hu (unless noted otherwise)