Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Apache Solr: Lessons Learned
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Jeroen Rosenberg
June 11, 2013
Technology
2
100
Apache Solr: Lessons Learned
Lessons learned when working with (a custom version of) Solr for 3 years
Jeroen Rosenberg
June 11, 2013
Tweet
Share
More Decks by Jeroen Rosenberg
See All by Jeroen Rosenberg
Cooking your Ravioli "al dente" with Hexagonal Architecture
jeroenr
0
49
CoffeeScript
jeroenr
2
350
Websocket on Rails
jeroenr
4
590
Stop thinking, go faster
jeroenr
2
220
Git
jeroenr
3
470
Provisioning with Vagrant & Puppet
jeroenr
5
830
Monit
jeroenr
2
230
Other Decks in Technology
See All in Technology
The_Evolution_of_Bits_AI_SRE.pdf
nulabinc
PRO
0
180
DevOpsエージェントで実現する!! AWS Well-Architected(W-A) を実現するシステム設計 / 20260307 Masaki Okuda
shift_evolve
PRO
3
670
NewSQL_ ストレージ分離と分散合意を用いたスケーラブルアーキテクチャ
hacomono
PRO
3
300
わたしがセキュアにAWSを使えるわけないじゃん、ムリムリ!(※ムリじゃなかった!?)
cmusudakeisuke
1
690
進化するBits AI SREと私と組織
nulabinc
PRO
0
120
AIエージェント時代に備える AWS Organizations とアカウント設計
kossykinto
3
870
IBM Bobを使って、PostgreSQLのToDoアプリをDb2へ変換してみよう/202603_Dojo_Bob
mayumihirano
1
330
Claude Code 2026年 最新アップデート
oikon48
12
9.1k
猫でもわかるKiro CLI(AI 駆動開発への道編)
kentapapa
0
160
複数クラスタ運用と検索の高度化:ビズリーチにおけるElastic活用事例 / ElasticON Tokyo2026
visional_engineering_and_design
0
140
ランサムウエア対策してますか?やられた時の対策は本当にできてますか?AWSでのリスク分析と対応フローの泥臭いお話。
hootaki
0
110
AI実装による「レビューボトルネック」を解消する仕様駆動開発(SDD)/ ai-sdd-review-bottleneck
rakus_dev
0
110
Featured
See All Featured
Embracing the Ebb and Flow
colly
88
5k
Building Applications with DynamoDB
mza
96
7k
Utilizing Notion as your number one productivity tool
mfonobong
4
260
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.6k
How to Get Subject Matter Experts Bought In and Actively Contributing to SEO & PR Initiatives.
livdayseo
0
83
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
64
53k
How GitHub (no longer) Works
holman
316
140k
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
3
120
KATA
mclloyd
PRO
35
15k
Evolving SEO for Evolving Search Engines
ryanjones
0
150
Transcript
lessons learned Solr @jeroenrosenberg
Frontend of Lucene
Lucene xml/json api + field types + caching + faceting
+ grouping +
Indexing
Indexing
Lucene's inverted index
Efficient when many docs share the same value
Field types
<field name="id" type="string" indexed="true" stored="true" required=" true" multiValued="false"/> <field name="name"
type="string" indexed="false" stored="true" required="true" multiValued="false"/> Field type definition
<field name="id" type="string" indexed="true" stored="true" required=" true" multiValued="false"/> <field name="name"
type="string" indexed="false" stored="true" required="true" multiValued="false"/> Field type definition
... <fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/> ... <field name="date" type="pdate" indexed="false"
stored="true"/> <field name="range_date" type="pdate" indexed="true" stored="false"/> <copyField source="date" dest="range_date"/> Field type definition
... <fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/> ... <field name="date" type="pdate" indexed="false"
stored="true"/> <field name="range_date" type="pdate" indexed="true" stored="false"/> <copyField source="date" dest="range_date"/> Field type definition
<dynamicField name="*_s" type="string" indexed="true" stored="true"/> Schemaless
Segments
Tune the merge factor
Max. # of segments Faster search, but slower indexing Faster
indexing, but slower search
Don't commit. Ever.
Don't commit often.
Sharding
Manual distribution
foo foo foo core1 core2 core3 Index distributor replication
Look Ma, no downtime!
q=name:hotel1& shards=solr2:7070/solr/foo,solr3: 7070/solr/foo& partialResults=true Distributed search
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <int name="rows">10</int> <str name="fl">*</str>
<bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <int name="rows">10</int> <str name="fl">*</str>
<bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default false"> <lst name="defaults"> <int name="rows">10</int> <str
name="fl">*</str> <bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
q=name:hotel1&qt=distributedSearch Distributed search
Caching
Field value Filter Document Query result
Document Field value Query result Filter Doc ids of results
per filter query
Query result Document Filter Field value Field names (facets) mapped
to mapping of doc ids to terms
Field value Filter Document Query result Ordered set of doc
ids of top N results
Field value Filter Query result Document Stored fields for each
doc
Autowarming
None
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Match all documents q=*:*
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Filter by field value
fq=country:AN
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Range query with wildcard
fq=duration:[1 TO *] range query using DateMath syntax fq=date:[NOW TO 2013-07-01T00:00:00Z]
q=*:*&rows=10000000 Getting all results
Faceting
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 A facet query...
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Enable faceting facet=true
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 rows=0 Suppress document results
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 facet.field=departureairport Specify a field name ...and another
one facet.field=touroperator
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Unlimited field values (globally) facet.limit=-1
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Unlimited field values (globally) facet.limit=-1 Basically, always
a good idea
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Override global limit for specific field names
f.touroperator.facet.limit=2
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 At least 1 document per field value
facet.mincount=1
q=*:*&fq={!tag=country}country:AN&facet=true& facet.field={!ex=country}country&facet.limit=-1& facet.mincount=1 Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true& facet.field={!ex=country}country&facet.limit=-1& facet.mincount=1 fq={!tag=country}country:AN Tag a filter query... ...and exclude
it for a field value facet.field={!ex=country}country
FACET ALL THE THINGS! FACET ALL THE THINGS!
Grouping
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED A grouping query...
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Enable grouping group=true
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Specify the field name group.field=accoid
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Determines group head group.sort=price asc
Determine order of document results sort=popularity asc
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Determines group head group.sort=price asc
Determine order of document results sort=popularity asc Only group heads are returned!
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES ONE DOES NOT
SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true
Solr 4.3 is coming http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?