Apache Solr: Lessons Learned
Jeroen Rosenberg
June 11, 2013
Lessons learned when working with (a custom version of) Solr for 3 years
Lessons learned: Solr
@jeroenrosenberg
Solr: a frontend of Lucene
Lucene + XML/JSON API + field types + caching + faceting + grouping + ...
Indexing
Lucene's inverted index: each term points to the documents that contain it
Efficient when many docs share the same value
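To make the inverted-index idea concrete, here is a toy Python sketch (not Lucene's actual data structures, which use sorted, compressed postings per segment and run values through an analyzer first): each (field, value) term maps to the set of documents containing it. The hotel documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each (field, value) term to the set of doc ids containing it.

    Toy model only: no analysis, no compression, no segments.
    """
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            index[(field, value)].add(doc_id)
    return dict(index)


# Hypothetical documents: several hotels share the same country value.
docs = {
    "hotel1": {"country": "AN", "name": "Hotel One"},
    "hotel2": {"country": "AN", "name": "Hotel Two"},
    "hotel3": {"country": "ES", "name": "Hotel Three"},
}

index = build_inverted_index(docs)
print(sorted(index[("country", "AN")]))  # ['hotel1', 'hotel2']
```

The term country:AN is stored once, however many hotels carry it, and a single lookup returns all matching ids.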
Field types
Field definition:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
indexed="true" makes a field searchable; stored="true" returns its original value in results.
Field type definition:
...
<fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/>
...
<field name="date" type="pdate" indexed="false" stored="true"/>
<field name="range_date" type="pdate" indexed="true" stored="false"/>
<copyField source="date" dest="range_date"/>
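A toy Python model of what this schema does at index time, assuming the simplified semantics that copyField copies the raw incoming value before anything is stored or indexed: the value sent as date is stored (returned in results) but not searchable, while its copy range_date is searchable (e.g. for range queries) but not stored. The dictionaries are illustrative stand-ins for schema.xml.

```python
# Stand-ins for the <field> and <copyField> declarations above.
FIELDS = {
    "date": {"indexed": False, "stored": True},
    "range_date": {"indexed": True, "stored": False},
}
COPY_FIELDS = [("date", "range_date")]


def apply_schema(doc):
    """Return (stored, indexed) views of a document under the toy schema."""
    values = dict(doc)
    # copyField duplicates the incoming value into the destination field.
    for source, dest in COPY_FIELDS:
        if source in values:
            values[dest] = values[source]
    stored = {f: v for f, v in values.items() if FIELDS[f]["stored"]}
    indexed = {f: v for f, v in values.items() if FIELDS[f]["indexed"]}
    return stored, indexed


stored, indexed = apply_schema({"date": "2013-06-11T00:00:00Z"})
print(stored)   # {'date': '2013-06-11T00:00:00Z'}
print(indexed)  # {'range_date': '2013-06-11T00:00:00Z'}
```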
Schemaless:
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
Any field whose name ends in _s is accepted as an indexed, stored string without being declared.
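Dynamic fields trade a fixed field list for name patterns. A rough Python sketch of the lookup, assuming Solr's rules as we understand them: an explicitly declared field wins, otherwise the longest matching pattern (a single leading or trailing *) decides the type, and a name matching nothing is an error. The field tables are hypothetical.

```python
# Hypothetical explicit fields and dynamic-field patterns (pattern -> type).
EXPLICIT_FIELDS = {"id": "string", "name": "string"}
DYNAMIC_FIELDS = {"*_s": "string"}


def resolve_field_type(name):
    """Look up the type for a field name, falling back to dynamic patterns."""
    if name in EXPLICIT_FIELDS:  # explicitly declared fields win
        return EXPLICIT_FIELDS[name]
    # Try longer (more specific) patterns first.
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return DYNAMIC_FIELDS[pattern]
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return DYNAMIC_FIELDS[pattern]
    raise KeyError(f"undefined field: {name}")


print(resolve_field_type("city_s"))  # string
```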
Segments
Tune the merge factor
Max. # of segments (merge factor):
low: faster search, but slower indexing
high: faster indexing, but slower search
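The trade-off shows up in a toy simulation of a log-structured merge policy, a simplified model in the spirit of Lucene's LogMergePolicy rather than its real code: each flush writes a one-document segment, and whenever merge_factor segments pile up on one level they are merged into a single segment on the next level. The counters are our own proxies: max_segments for search cost, docs_copied for indexing cost.

```python
from collections import defaultdict


def simulate_merges(flushes, merge_factor):
    """Toy log-structured merge policy (not Lucene's actual implementation)."""
    levels = defaultdict(list)  # level -> sizes (in docs) of its segments
    merges = docs_copied = max_segments = 0
    for _ in range(flushes):
        levels[0].append(1)  # each flush writes a new one-doc segment
        level = 0
        # Cascade: merging a full level may fill up the next one.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            levels[level].clear()
            levels[level + 1].append(merged)
            merges += 1
            docs_copied += merged  # every doc in the merge is rewritten
            level += 1
        max_segments = max(max_segments, sum(len(s) for s in levels.values()))
    return {
        "segments": sum(len(s) for s in levels.values()),
        "max_segments": max_segments,
        "merges": merges,
        "docs_copied": docs_copied,
    }


for mf in (2, 10):
    print(mf, simulate_merges(100, mf))
```

After 100 flushes, merge_factor=2 ends with 3 segments (never more than 6) but rewrites 552 documents in 97 merges; merge_factor=10 needs only 11 merges (200 documents rewritten) at the price of up to 18 segments to search.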
Don't commit. Ever.
Don't commit often: by default every commit opens a new searcher, which discards the caches and triggers autowarming.
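One way to follow this advice on the client side is to buffer documents and commit once per batch instead of once per document. A minimal Python sketch; send and commit are hypothetical callbacks standing in for whatever client posts updates to Solr. (Solr can also batch server-side through the autoCommit settings maxDocs/maxTime in solrconfig.xml.)

```python
class BatchingIndexer:
    """Buffer documents and commit once per batch instead of once per doc."""

    def __init__(self, send, commit, max_docs):
        self.send = send          # hypothetical: posts a list of docs to Solr
        self.commit = commit      # hypothetical: issues a single commit
        self.max_docs = max_docs
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)
        if len(self.pending) >= self.max_docs:
            self.flush()

    def flush(self):
        """Send buffered docs (if any), followed by one commit."""
        if self.pending:
            self.send(self.pending)
            self.pending = []
            self.commit()


sent, commits = [], []
indexer = BatchingIndexer(sent.append, lambda: commits.append(True), max_docs=10)
for i in range(25):
    indexer.add({"id": str(i)})
indexer.flush()  # don't forget the tail of the last batch
print(len(sent), len(commits))  # 3 3
```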
Sharding
Manual distribution
[Diagram: index distributor feeding index "foo" on core1, core2 and core3, with replication]
Look Ma, no downtime!
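The slides do not say how the index distributor picks a target. A common approach, shown here purely as an assumption, is to hash the document id modulo the number of shards. Python's built-in hash() is randomized per process for strings, so the sketch uses zlib.crc32 to keep routing stable across restarts; the target names are taken from the diagram.

```python
import zlib

CORES = ["core1", "core2", "core3"]


def route(doc_id, cores=CORES):
    """Pick a shard for a document id with a stable hash (assumed scheme)."""
    return cores[zlib.crc32(doc_id.encode("utf-8")) % len(cores)]


assignments = {doc_id: route(doc_id) for doc_id in ("hotel1", "hotel2", "hotel3")}
print(assignments)
```

Because the shard is a pure function of the id, re-indexing a document always lands on the same shard and overwrites the old version there. The flip side of plain modulo hashing is that adding a shard reassigns most ids.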
Distributed search:
q=name:hotel1&shards=solr2:7070/solr/foo,solr3:7070/solr/foo&partialResults=true
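Building such a request programmatically avoids escaping mistakes. A sketch using only the Python standard library; the node receiving the request (solr1:7070) is hypothetical, while the shard addresses and parameters come from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SHARDS = ["solr2:7070/solr/foo", "solr3:7070/solr/foo"]


def distributed_search_url(query, base="http://solr1:7070/solr/foo/select"):
    """Build a distributed search URL; `base` is a hypothetical host/core."""
    params = {
        "q": query,
        "shards": ",".join(SHARDS),  # as on the slide: host:port/path, no http://
        "partialResults": "true",
    }
    return f"{base}?{urlencode(params)}"


url = distributed_search_url("name:hotel1")
print(url)
# Round-trip check: the encoded query string decodes back to the same values.
print(parse_qs(urlsplit(url).query)["shards"])
```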
Distributed search config:
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="fl">*</str>
    <bool name="partialResults">true</bool>
    <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str>
  </lst>
</requestHandler>
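Distributed search through the configured handler (qt selects the request handler by name, so the shard list lives in the config instead of in every request):
q=name:hotel1&qt=distributedSearch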
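Conceptually, the node handling a distributed request asks every shard for its own top hits and merges them into one global top by score. A Python sketch of that merge step with made-up scores; representing a failed shard as None, and skipping it when partial results are allowed, is our assumption about what partialResults=true means here.

```python
import heapq


def merge_shard_results(shard_hits, rows=10, partial_results=True):
    """Merge per-shard (score, doc_id) hit lists into one global top-`rows`."""
    hits = []
    for shard, result in shard_hits.items():
        if result is None:  # assumed representation of a shard that failed
            if not partial_results:
                raise RuntimeError(f"shard {shard} did not respond")
            continue
        hits.extend(result)
    return heapq.nlargest(rows, hits)  # highest scores first


shard_hits = {
    "solr2:7070/solr/foo": [(2.5, "hotel1"), (0.7, "hotel4")],
    "solr3:7070/solr/foo": None,  # this shard timed out
}
print(merge_shard_results(shard_hits))  # [(2.5, 'hotel1'), (0.7, 'hotel4')]
```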
Caching
Four caches: field value, filter, document, query result
Filter cache: doc ids of results per filter query
Field value cache: field names (facets) mapped to a mapping of doc ids to terms
Query result cache: ordered set of doc ids of the top N results
Document cache: stored fields for each doc
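The filter cache is what makes repeated fq parameters cheap. A toy Python model, assuming LRU eviction like Solr's LRUCache: each filter query string maps to the set of matching doc ids, computed once and then reused, and several filters are combined by intersecting their sets. The documents and the naive match function are hypothetical.

```python
from collections import OrderedDict


class FilterCache:
    """LRU cache: filter query string -> set of matching doc ids."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = self.misses = 0

    def get(self, fq, compute):
        if fq in self.entries:
            self.entries.move_to_end(fq)  # mark as most recently used
            self.hits += 1
            return self.entries[fq]
        self.misses += 1
        doc_ids = compute(fq)
        self.entries[fq] = doc_ids
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict least recently used
        return doc_ids


# Hypothetical documents and a naive matcher for "field:value" filters.
DOCS = {
    "1": {"country": "AN", "stars": "4"},
    "2": {"country": "AN", "stars": "3"},
    "3": {"country": "ES", "stars": "4"},
}


def match(fq):
    field, value = fq.split(":", 1)
    return {doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value}


cache = FilterCache(size=2)
for _ in range(3):  # three requests with the same two filters
    sets = [cache.get(fq, match) for fq in ["country:AN", "stars:4"]]
    result = set.intersection(*sets)
print(result, cache.hits, cache.misses)  # {'1'} 4 2
```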
Autowarming
None
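When a commit opens a new searcher, its caches start out empty; autowarming pre-fills them from the old searcher's caches so the first queries after a commit are not all misses. A Python sketch of the idea, roughly modelled on Solr's autowarmCount setting: recompute the most recently used autowarm_count keys of the old cache against the new index. The cache contents and the lookup are illustrative assumptions.

```python
from collections import OrderedDict


def autowarm(old_cache, recompute, autowarm_count):
    """Build a new LRU cache from the most recently used keys of old_cache.

    Values are recomputed, not copied: cached doc ids belong to the old
    index and may be stale after the commit.
    """
    new_cache = OrderedDict()
    recent_keys = list(old_cache)[-autowarm_count:] if autowarm_count > 0 else []
    for key in recent_keys:  # keeps least- to most-recent order
        new_cache[key] = recompute(key)
    return new_cache


# Old filter cache, ordered from least to most recently used.
old = OrderedDict([("country:ES", {"3"}), ("country:AN", {"1", "2"}), ("stars:4", {"1", "3"})])
# Hypothetical lookup against the new index (doc "4" was added by the commit).
new_index = {"country:ES": {"3"}, "country:AN": {"1", "2", "4"}, "stars:4": {"1", "3", "4"}}
warmed = autowarm(old, lambda fq: new_index[fq], autowarm_count=2)
print(list(warmed))  # ['country:AN', 'stars:4']
```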
Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]&fq=date:[NOW TO 2013-07-01T00:00:00Z]
Match all documents: q=*:*
Filter by field value: fq=country:AN
Range query with wildcard: fq=duration:[1 TO *]
Range query using DateMath syntax: fq=date:[NOW TO 2013-07-01T00:00:00Z]
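Square brackets make a range inclusive at both ends, and * leaves a side open. A small Python sketch that evaluates such a numeric range against made-up trip durations; it only handles the [lo TO hi] form with numbers or *, not Solr's full query parser or DateMath.

```python
import re

RANGE = re.compile(r"^\[(\S+) TO (\S+)\]$")


def in_range(value, expr):
    """Check a number against an inclusive Solr-style range like '[1 TO *]'."""
    match = RANGE.match(expr)
    if match is None:
        raise ValueError(f"unsupported range: {expr}")
    lo, hi = match.groups()
    return (lo == "*" or value >= float(lo)) and (hi == "*" or value <= float(hi))


durations = {"trip1": 0, "trip2": 1, "trip3": 14}
print([doc for doc, days in durations.items() if in_range(days, "[1 TO *]")])
# ['trip2', 'trip3']
```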
q=*:*&rows=10000000 Getting all results
Faceting
A facet query...
rows=0&facet=true&facet.field=departureairport&facet.field=touroperator&facet.limit=-1&facet.mincount=1&f.touroperator.facet.limit=2
Enable faceting: facet=true
Suppress document results: rows=0
Specify a field name: facet.field=departureairport ...and another one: facet.field=touroperator
Unlimited field values (globally): facet.limit=-1. Basically, always a good idea.
Override the global limit for specific field names: f.touroperator.facet.limit=2
At least 1 document per field value: facet.mincount=1
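What these facet parameters ask for can be mimicked in a few lines of Python: count documents per field value, drop values below mincount, sort by count, and cut off at the limit, with a per-field override taking precedence over the global limit (-1 meaning unlimited). A sketch over made-up documents, not Solr's implementation; breaking ties alphabetically is our assumption.

```python
from collections import Counter


def facet_counts(docs, fields, limit=100, mincount=0, per_field_limit=None):
    """Count docs per value for each field, highest count first."""
    per_field_limit = per_field_limit or {}
    result = {}
    for field in fields:
        counts = Counter(doc[field] for doc in docs if field in doc)
        ranked = sorted(
            ((value, n) for value, n in counts.items() if n >= mincount),
            key=lambda item: (-item[1], item[0]),  # count desc, then value
        )
        field_limit = per_field_limit.get(field, limit)  # like f.<field>.facet.limit
        result[field] = ranked if field_limit < 0 else ranked[:field_limit]
    return result


docs = [
    {"departureairport": "AMS", "touroperator": "A"},
    {"departureairport": "AMS", "touroperator": "B"},
    {"departureairport": "EIN", "touroperator": "C"},
    {"departureairport": "RTM", "touroperator": "A"},
]
result = facet_counts(docs, ["departureairport", "touroperator"],
                      limit=-1, mincount=1, per_field_limit={"touroperator": 2})
print(result)
```

Here every airport is listed (global limit -1), while touroperator is cut to its top two values, A (2) and B (1).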
Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true&facet.field={!ex=country}country&facet.limit=-1&facet.mincount=1
Tag a filter query: fq={!tag=country}country:AN
...and exclude it when faceting on that field: facet.field={!ex=country}country
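Tagging and excluding means the country facet is counted as if the country filter were not applied, while all other filters still are, so users keep seeing the alternatives to their current selection. A Python sketch of that rule; the documents and the extra stars filter are hypothetical.

```python
from collections import Counter

DOCS = [
    {"country": "AN", "stars": 4},
    {"country": "AN", "stars": 3},
    {"country": "ES", "stars": 4},
    {"country": "GR", "stars": 2},
]

# Tagged filter queries: tag -> predicate (like fq={!tag=...}...).
FILTERS = {
    "country": lambda doc: doc["country"] == "AN",
    "stars": lambda doc: doc["stars"] >= 3,
}


def facet(field, exclude=()):
    """Count `field` values over docs passing every filter not in `exclude`."""
    active = [f for tag, f in FILTERS.items() if tag not in exclude]
    return Counter(doc[field] for doc in DOCS if all(f(doc) for f in active))


print(facet("country"))                       # Counter({'AN': 2})
print(facet("country", exclude={"country"}))  # Counter({'AN': 2, 'ES': 1})
```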
FACET ALL THE THINGS!
Grouping
A grouping query...
group=true&group.field=accoid&group.sort=price asc&sort=popularity asc&group.facets=UNGROUPED
Enable grouping: group=true
Specify the field name: group.field=accoid
Determines the group head: group.sort=price asc
Determines the order of document results: sort=popularity asc
Only group heads are returned!
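A Python sketch of these grouping semantics as we read the Solr reference documentation: within each accoid group the documents are ordered by group.sort and the first becomes the group head; the groups themselves are ordered by sort, based on the best-ranked document in each group; with the default group.limit=1 only heads come back. The offers are made up, and group.facets=UNGROUPED is not modelled.

```python
from itertools import groupby


def group_heads(docs, group_field, group_key, sort_key):
    """Return one head per group, with groups ordered by their best doc.

    group_key plays the role of group.sort (picks the head inside a group);
    sort_key plays the role of sort (orders the groups relative to each other).
    """
    keyed = sorted(docs, key=lambda d: d[group_field])  # groupby needs sorted input
    groups = [list(members) for _, members in groupby(keyed, key=lambda d: d[group_field])]
    groups.sort(key=lambda members: min(sort_key(d) for d in members))
    return [min(members, key=group_key) for members in groups]


offers = [
    {"id": "o1", "accoid": "acco1", "price": 500, "popularity": 2},
    {"id": "o2", "accoid": "acco1", "price": 300, "popularity": 5},
    {"id": "o3", "accoid": "acco2", "price": 400, "popularity": 1},
]
heads = group_heads(offers, "accoid",
                    group_key=lambda d: d["price"],      # group.sort=price asc
                    sort_key=lambda d: d["popularity"])  # sort=popularity asc
print([d["id"] for d in heads])  # ['o3', 'o2']
```

Note that acco1 is ranked by o1's popularity even though o2 is its head, so the returned heads need not appear in popularity order themselves.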
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true: shows how the query was parsed and explains each document's score
Solr 4.3 is coming http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?