Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Apache Solr: Lessons Learned
Search
Jeroen Rosenberg
June 11, 2013
Technology
2
100
Apache Solr: Lessons Learned
Lessons learned when working with (a custom version of) Solr for 3 years
Jeroen Rosenberg
June 11, 2013
Tweet
Share
More Decks by Jeroen Rosenberg
See All by Jeroen Rosenberg
Cooking your Ravioli "al dente" with Hexagonal Architecture
jeroenr
0
36
CoffeeScript
jeroenr
2
330
Websocket on Rails
jeroenr
4
580
Stop thinking, go faster
jeroenr
2
210
Git
jeroenr
3
460
Provisioning with Vagrant & Puppet
jeroenr
5
820
Monit
jeroenr
2
230
Other Decks in Technology
See All in Technology
ここ一年のCCoEとしてのAWSコスト最適化を振り返る / CCoE AWS Cost Optimization devio2025
masahirokawahara
1
630
Jaws-ug名古屋_LT資料_20250829
azoo2024
3
190
認知戦の理解と、市民としての対抗策
hogehuga
0
420
実践アプリケーション設計 ②トランザクションスクリプトへの対応
recruitengineers
PRO
4
1.1k
カミナシ社の『ID管理基盤』製品内製 - その意思決定背景と2年間の進化 #AWSUnicornDay / Kaminashi ID - The Big Whys
kaminashi
3
610
知られざるprops命名の慣習 アクション編
uhyo
11
2.8k
AWSで推進するデータマネジメント
kawanago
0
230
KiroでGameDay開催してみよう(準備編)
yuuuuuuu168
1
160
Microsoft Fabric のネットワーク保護のアップデートについて
ryomaru0825
1
120
トヨタ生産方式(TPS)入門
recruitengineers
PRO
5
1.3k
Grafana MCPサーバーによるAIエージェント経由でのGrafanaダッシュボード動的生成
hamadakoji
0
710
異業種出身エンジニアが気づいた、転向して十数年経っても変わらない自分の武器とは
macnekoayu
0
230
Featured
See All Featured
Docker and Python
trallard
45
3.5k
Fireside Chat
paigeccino
39
3.6k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Building Applications with DynamoDB
mza
96
6.6k
Why Our Code Smells
bkeepers
PRO
339
57k
Optimizing for Happiness
mojombo
379
70k
YesSQL, Process and Tooling at Scale
rocio
173
14k
The Cult of Friendly URLs
andyhume
79
6.6k
Imperfection Machines: The Place of Print at Facebook
scottboms
268
13k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
We Have a Design System, Now What?
morganepeng
53
7.8k
Transcript
lessons learned Solr @jeroenrosenberg
Frontend of Lucene
Lucene xml/json api + field types + caching + faceting
+ grouping +
Indexing
Indexing
Lucene's inverted index
Efficient when many docs share the same value
Field types
<field name="id" type="string" indexed="true" stored="true" required=" true" multiValued="false"/> <field name="name"
type="string" indexed="false" stored="true" required="true" multiValued="false"/> Field type definition
<field name="id" type="string" indexed="true" stored="true" required=" true" multiValued="false"/> <field name="name"
type="string" indexed="false" stored="true" required="true" multiValued="false"/> Field type definition
... <fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/> ... <field name="date" type="pdate" indexed="false"
stored="true"/> <field name="range_date" type="pdate" indexed="true" stored="false"/> <copyField source="date" dest="range_date"/> Field type definition
... <fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/> ... <field name="date" type="pdate" indexed="false"
stored="true"/> <field name="range_date" type="pdate" indexed="true" stored="false"/> <copyField source="date" dest="range_date"/> Field type definition
<dynamicField name="*_s" type="string" indexed="true" stored="true"/> Schemaless
Segments
Tune the merge factor
Max. # of segments Faster search, but slower indexing Faster
indexing, but slower search
Don't commit. Ever.
Don't commit often.
Sharding
Manual distribution
foo foo foo core1 core2 core3 Index distributor replication
Look Ma, no downtime!
q=name:hotel1& shards=solr2:7070/solr/foo,solr3: 7070/solr/foo& partialResults=true Distributed search
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <int name="rows">10</int> <str name="fl">*</str>
<bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false"> <lst name="defaults"> <int name="rows">10</int> <str name="fl">*</str>
<bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default false"> <lst name="defaults"> <int name="rows">10</int> <str
name="fl">*</str> <bool name="partialResults">true</bool> <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str> </lst> </requestHandler> Distributed search config
q=name:hotel1&qt=distributedSearch Distributed search
Caching
Field value Filter Document Query result
Document Field value Query result Filter Doc ids of results
per filter query
Query result Document Filter Field value Field names (facets) mapped
to mapping of doc ids to terms
Field value Filter Document Query result Ordered set of doc
ids of top N results
Field value Filter Query result Document Stored fields for each
doc
Autowarming
None
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Match all documents q=*:*
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Filter by field value
fq=country:AN
q=*:*&fq=country:AN&fq=duration:[1 TO *]& fq=date:[NOW TO 2013-07-01T00:00:00Z] Range query with wildcard
fq=duration:[1 TO *] range query using DateMath syntax fq=date:[NOW TO 2013-07-01T00:00:00Z]
q=*:*&rows=10000000 Getting all results
Faceting
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 A facet query...
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Enable faceting facet=true
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 rows=0 Suppress document results
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 facet.field=departureairport Specify a field name ...and another
one facet.field=touroperator
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Unlimited field values (globally) facet.limit=-1
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Unlimited field values (globally) facet.limit=-1 Basically, always
a good idea
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 Override global limit for specific field names
f.touroperator.facet.limit=2
rows=0&facet=true&facet.field=departureairport& facet.field=touroperator&facet.limit=-1& facet.mincount=1&f.touroperator.facet.limit=2 At least 1 document per field value
facet.mincount=1
q=*:*&fq={!tag=country}country:AN&facet=true& facet.field={!ex=country}country&facet.limit=-1& facet.mincount=1 Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true& facet.field={!ex=country}country&facet.limit=-1& facet.mincount=1 fq={!tag=country}country:AN Tag a filter query... ...and exclude
it for a field value facet.field={!ex=country}country
FACET ALL THE THINGS! FACET ALL THE THINGS!
Grouping
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED A grouping query...
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Enable grouping group=true
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Specify the field name group.field=accoid
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Determines group head group.sort=price asc
Determine order of document results sort=popularity asc
group=true&group.field=accoid& group.sort=price asc&sort=popularity asc& group.facets=UNGROUPED Determines group head group.sort=price asc
Determine order of document results sort=popularity asc Only group heads are returned!
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES ONE DOES NOT
SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true
Solr 4.3 is coming http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?