Apache Solr: Lessons Learned
Jeroen Rosenberg
June 11, 2013
Lessons learned when working with (a custom version of) Solr for 3 years
Transcript
lessons learned Solr @jeroenrosenberg
Frontend of Lucene
Lucene + XML/JSON API + field types + caching + faceting + grouping + ...
Indexing
Lucene's inverted index
Efficient when many docs share the same value
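As a rough illustration of why an inverted index is compact when many documents share a value, the sketch below maps each distinct value to the list of document ids that contain it. The document ids, the country values and the function name build_inverted_index are made up for the example; Lucene's real data structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct field value to the sorted list of doc ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        index[docs[doc_id]].append(doc_id)
    return dict(index)


# Hypothetical data: five documents, only two distinct country values.
docs = {1: "NL", 2: "NL", 3: "AN", 4: "NL", 5: "AN"}
index = build_inverted_index(docs)

# Each value is stored once, however many documents share it, and a lookup
# is a single dictionary access rather than a scan over all documents.
print(index["AN"])
```

With only two distinct values the index has two entries, no matter how many documents are added.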
Field types
Field type definition
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
...
<fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/>
...
<field name="date" type="pdate" indexed="false" stored="true"/>
<field name="range_date" type="pdate" indexed="true" stored="false"/>
<copyField source="date" dest="range_date"/>
Schemaless
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
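To make the dynamic field idea concrete, here is a minimal sketch of how a field name could be resolved against explicit fields first and a wildcard pattern such as *_s second. It uses Python's fnmatch as a stand-in for Solr's own pattern matching, and the names EXPLICIT_FIELDS, DYNAMIC_FIELDS and resolve_field_type are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical schema: two explicit fields plus the dynamic pattern from the slide.
EXPLICIT_FIELDS = {"id": "string", "name": "string"}
DYNAMIC_FIELDS = [("*_s", "string")]


def resolve_field_type(field_name):
    """Return the type for a field name, or None if the schema does not cover it."""
    # Explicitly declared fields win over dynamic patterns.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(resolve_field_type("city_s"))
print(resolve_field_type("price_i"))
```

Any field ending in _s is accepted as a string without touching the schema, while a name that matches no pattern is still rejected.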
Segments
Tune the merge factor
Merge factor = max. # of segments before a merge
Low merge factor: faster search, but slower indexing
High merge factor: faster indexing, but slower search
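The trade-off can be made visible with a toy simulation. The model below is a deliberate simplification and not Lucene's actual merge policy: every flush creates one segment, and as soon as a level holds merge_factor segments they are merged into a single segment on the next level. The function name simulate_flushes is hypothetical.

```python
def simulate_flushes(num_flushes, merge_factor):
    """Return (segments_left, merges_done) after num_flushes segment flushes.

    Toy model: whenever a level holds merge_factor segments, they are merged
    into one segment on the next level.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []  # levels[i] = number of segments currently on level i
    merges = 0
    for _ in range(num_flushes):
        level = 0
        while True:
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
            if levels[level] < merge_factor:
                break
            # Level is full: merge its segments into one on the next level.
            levels[level] = 0
            merges += 1
            level += 1
    return sum(levels), merges


low = simulate_flushes(99, merge_factor=2)
high = simulate_flushes(99, merge_factor=10)
print(low, high)
```

In this model a low merge factor leaves few segments to search but pays for it with many merges during indexing; a high merge factor does the opposite.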
Don't commit. Ever.
Don't commit often.
Sharding
Manual distribution
[Diagram: an index distributor feeding core1, core2 and core3, each labelled "foo", with replication]
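The slides do not say how the index distributor decides which core receives a document. One simple possibility, shown here purely as an assumption, is to hash the document id and take the result modulo the number of cores. The function names pick_core and distribute are hypothetical.

```python
import zlib

CORES = ["core1", "core2", "core3"]  # core names as shown on the slide


def pick_core(doc_id, cores=CORES):
    """Pick a core for a document using a stable hash of its id (assumed scheme)."""
    # zlib.crc32 is deterministic across runs, unlike the built-in hash() for str.
    return cores[zlib.crc32(doc_id.encode("utf-8")) % len(cores)]


def distribute(doc_ids, cores=CORES):
    """Group document ids by the core they would be sent to."""
    assignment = {core: [] for core in cores}
    for doc_id in doc_ids:
        assignment[pick_core(doc_id, cores)].append(doc_id)
    return assignment


assignment = distribute(["hotel%d" % i for i in range(1, 10)])
print(assignment)
```

Because the hash is stable, an update for a given document always lands on the same core as the original.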
Look Ma, no downtime!
Distributed search
q=name:hotel1&shards=solr2:7070/solr/foo,solr3:7070/solr/foo&partialResults=true
Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="fl">*</str>
    <bool name="partialResults">true</bool>
    <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str>
  </lst>
</requestHandler>
Distributed search
q=name:hotel1&qt=distributedSearch
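For clients that cannot rely on a preconfigured handler, the explicit form of the distributed query can be assembled in code. The sketch below only builds the query string. The shard addresses and the partialResults parameter are taken from the slides (the deck covers a custom Solr build, so that parameter may not exist in a stock release), and the function name distributed_query is hypothetical.

```python
from urllib.parse import urlencode

SHARDS = ["solr2:7070/solr/foo", "solr3:7070/solr/foo"]  # from the slides


def distributed_query(query, shards=SHARDS, partial_results=True):
    """Build the query string for a search fanned out over several shards."""
    params = [
        ("q", query),
        ("shards", ",".join(shards)),
        ("partialResults", "true" if partial_results else "false"),
    ]
    # Keep ':', '/' and ',' readable; urlencode would otherwise percent-encode them.
    return urlencode(params, safe=":/,")


query_string = distributed_query("name:hotel1")
print(query_string)
```

Moving the shards list into the request handler defaults, as in the config above, keeps this knowledge out of every client.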
Caching
Four caches: field value, filter, document, query result
Filter: doc ids of results per filter query
Field value: field names (facets) mapped to mapping of doc ids to terms
Query result: ordered set of doc ids of top N results
Document: stored fields for each doc
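To show why caching doc ids per filter query pays off, here is a minimal sketch of a filter cache. It assumes an LRU eviction policy and a made-up index (MATCHES); the class name FilterCache and the function search are hypothetical and not Solr's implementation.

```python
from collections import OrderedDict


class FilterCache:
    """Tiny LRU cache mapping a filter query string to the set of matching doc ids."""

    def __init__(self, max_size, compute):
        self.max_size = max_size
        self.compute = compute  # function: fq string -> set of doc ids
        self.entries = OrderedDict()
        self.misses = 0

    def doc_ids(self, fq):
        if fq in self.entries:
            self.entries.move_to_end(fq)  # mark as most recently used
            return self.entries[fq]
        self.misses += 1
        result = self.compute(fq)
        self.entries[fq] = result
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


# Hypothetical index: fq string -> matching doc ids.
MATCHES = {"country:AN": {1, 2, 3}, "duration:[1 TO *]": {2, 3, 4}}
cache = FilterCache(max_size=16, compute=lambda fq: set(MATCHES.get(fq, ())))


def search(filters):
    """Intersect the cached doc id sets of all filter queries."""
    sets = [cache.doc_ids(fq) for fq in filters]
    return set.intersection(*sets) if sets else set()


first = search(["country:AN", "duration:[1 TO *]"])
second = search(["country:AN"])  # served from the cache, no recomputation
```

Each filter query is cached on its own, so a filter that recurs in different combinations is computed only once.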
Autowarming
None
Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]&fq=date:[NOW TO 2013-07-01T00:00:00Z]
Match all documents: q=*:*
Filter by field value: fq=country:AN
Range query with wildcard: fq=duration:[1 TO *]
Range query using DateMath syntax: fq=date:[NOW TO 2013-07-01T00:00:00Z]
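When such a request is sent over HTTP, each filter is a separate, repeated fq parameter and the special characters need percent-encoding. A small sketch using only Python's standard library builds the query string from the parts above and decodes it again to confirm that nothing is lost on the way.

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "*:*"),                                  # match all documents
    ("fq", "country:AN"),                          # filter by field value
    ("fq", "duration:[1 TO *]"),                   # open-ended range
    ("fq", "date:[NOW TO 2013-07-01T00:00:00Z]"),  # date range starting now
]

# urlencode percent-encodes characters such as '*', ':' and '[' and turns
# spaces into '+', so the result is safe to append to a URL.
query_string = urlencode(params)

# parse_qs reverses the encoding and collects repeated keys into lists.
decoded = parse_qs(query_string)
print(decoded["fq"])
```

Keeping the filters as separate fq parameters, rather than folding them into q, is what allows each one to be handled independently.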
Getting all results
q=*:*&rows=10000000
Faceting
A facet query...
rows=0&facet=true&facet.field=departureairport&facet.field=touroperator&facet.limit=-1&facet.mincount=1&f.touroperator.facet.limit=2
Enable faceting: facet=true
Suppress document results: rows=0
Specify a field name: facet.field=departureairport
...and another one: facet.field=touroperator
Unlimited field values (globally): facet.limit=-1 (basically, always a good idea)
Override global limit for specific field names: f.touroperator.facet.limit=2
At least 1 document per field value: facet.mincount=1
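The effect of these parameters can be mimicked on a handful of in-memory documents. The sketch below is only an illustration of the semantics, not Solr's algorithm: the sample documents are invented, the function name facet_counts is hypothetical, and ordering values by descending count is an assumption (in Solr the order depends on facet.sort).

```python
from collections import Counter


def facet_counts(docs, fields, limit=-1, mincount=0, per_field_limit=None):
    """Count field values over docs, mimicking facet.limit and facet.mincount."""
    per_field_limit = per_field_limit or {}
    result = {}
    for field in fields:
        counts = Counter(doc[field] for doc in docs if field in doc)
        # Highest count first, ties broken by value for a stable order.
        ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        ordered = [(value, n) for value, n in ordered if n >= mincount]
        # A per-field limit overrides the global one; a negative limit means unlimited.
        field_limit = per_field_limit.get(field, limit)
        if field_limit >= 0:
            ordered = ordered[:field_limit]
        result[field] = ordered
    return result


# Hypothetical documents.
docs = [
    {"departureairport": "AMS", "touroperator": "A"},
    {"departureairport": "AMS", "touroperator": "B"},
    {"departureairport": "EIN", "touroperator": "A"},
    {"departureairport": "RTM", "touroperator": "C"},
]
facets = facet_counts(
    docs,
    ["departureairport", "touroperator"],
    limit=-1,                             # facet.limit=-1
    mincount=1,                           # facet.mincount=1
    per_field_limit={"touroperator": 2},  # f.touroperator.facet.limit=2
)
```

Here departureairport returns all three values, while touroperator is cut off after its two most frequent ones.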
Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true&facet.field={!ex=country}country&facet.limit=-1&facet.mincount=1
Tag a filter query...: fq={!tag=country}country:AN
...and exclude it for a field value: facet.field={!ex=country}country
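What tagging and excluding achieves is easiest to see on a few in-memory documents: the results honour every filter, but the counts for the facet field are computed as if the filter on that same field were absent. The sketch below illustrates that idea under simplifying assumptions (exact-match filters only); the sample documents and the function name multi_select_facet are hypothetical.

```python
from collections import Counter


def multi_select_facet(docs, filters, facet_field):
    """Return (matching docs, facet counts that ignore the filter on facet_field)."""

    def matches(doc, active):
        return all(doc.get(field) == value for field, value in active.items())

    results = [doc for doc in docs if matches(doc, filters)]
    # Counting with the facet field's own filter excluded keeps the other
    # values of that field visible, so the user can still select them.
    others = {f: v for f, v in filters.items() if f != facet_field}
    counts = Counter(
        doc[facet_field]
        for doc in docs
        if facet_field in doc and matches(doc, others)
    )
    return results, dict(counts)


# Hypothetical documents.
docs = [
    {"id": 1, "country": "AN"},
    {"id": 2, "country": "AN"},
    {"id": 3, "country": "NL"},
]
results, counts = multi_select_facet(docs, {"country": "AN"}, "country")
```

Without the exclusion, the country facet would collapse to the single selected value.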
FACET ALL THE THINGS!
Grouping
A grouping query...
group=true&group.field=accoid&group.sort=price asc&sort=popularity asc&group.facets=UNGROUPED
Enable grouping: group=true
Specify the field name: group.field=accoid
Determines group head: group.sort=price asc
Determines order of document results: sort=popularity asc
Only group heads are returned!
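The interplay of group.sort and sort can be sketched on in-memory documents. This is a simplified model: the head of each group is its cheapest document, and groups are ordered by the lowest popularity value found among their members. That ordering rule is an assumption about how sort is applied to groups, and the custom build described in this deck may behave differently. The sample documents and the function name group_heads are hypothetical.

```python
from operator import itemgetter


def group_heads(docs, field, group_sort_key, sort_key):
    """Return one head document per group, both sort orders ascending."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[field], []).append(doc)

    ranked = []
    for members in groups.values():
        head = min(members, key=itemgetter(group_sort_key))  # group.sort ... asc
        rank = min(doc[sort_key] for doc in members)         # sort ... asc (assumed rule)
        ranked.append((rank, head))

    ranked.sort(key=itemgetter(0))
    return [head for _, head in ranked]


# Hypothetical documents: two accommodations with two offers each.
docs = [
    {"accoid": "h1", "price": 500, "popularity": 3},
    {"accoid": "h1", "price": 400, "popularity": 5},
    {"accoid": "h2", "price": 300, "popularity": 1},
    {"accoid": "h2", "price": 250, "popularity": 9},
]
heads = group_heads(docs, "accoid", group_sort_key="price", sort_key="popularity")
```

Only one document per accoid comes back, which is why the result list contains group heads rather than every matching offer.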
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true
Solr 4.3 is coming http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?