Apache Solr: Lessons Learned
Jeroen Rosenberg
June 11, 2013
Technology
Lessons learned when working with (a custom version of) Solr for 3 years
Transcript
lessons learned Solr @jeroenrosenberg
Frontend of Lucene
Lucene + XML/JSON API + field types + caching + faceting + grouping + ...
Indexing
Lucene's inverted index
Efficient when many docs share the same value
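To make the point about shared values concrete, here is a minimal sketch of an inverted index in Python. It is not Lucene's actual data structure; the documents, the single "country" field and the function name are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each field value to the sorted list of doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, value in sorted(docs.items()):
        index[value].append(doc_id)
    return dict(index)

# Hypothetical documents: doc id -> value of a single "country" field.
docs = {1: "NL", 2: "AN", 3: "NL", 4: "NL"}
index = build_inverted_index(docs)
# Four documents collapse into two index entries because values repeat.
```

The more documents share a value, the fewer entries the index needs relative to the number of documents.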
Field types
Field type definition
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="name" type="string" indexed="false" stored="true" required="true" multiValued="false"/>
Field type definition
...
<fieldtype name="pdate" class="solr.DateField" sortMissingLast="true"/>
...
<field name="date" type="pdate" indexed="false" stored="true"/>
<field name="range_date" type="pdate" indexed="true" stored="false"/>
<copyField source="date" dest="range_date"/>
Schemaless
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
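A rough sketch of how a dynamic field pattern such as `*_s` could be resolved to a type. This only mimics the idea with glob matching; it is not Solr's schema code, and `DYNAMIC_FIELDS`, `resolve_field_type` and the sample field names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern table mirroring the dynamicField above: pattern -> field type.
DYNAMIC_FIELDS = {"*_s": "string"}

def resolve_field_type(field_name, explicit_fields):
    """Return the type of an explicit field, else of the first matching dynamic pattern."""
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for pattern, field_type in DYNAMIC_FIELDS.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None  # unknown field: a strict schema would reject the document

# Explicitly declared fields (names taken from the slides).
explicit = {"id": "string", "date": "pdate"}
```

With this table, a new field like `city_s` gets a type without any schema change, which is what makes the setup feel schemaless.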
Segments
Tune the merge factor
Max. # of segments
Low merge factor: faster search, but slower indexing
High merge factor: faster indexing, but slower search
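The merge factor trade-off can be illustrated with a toy simulation. The model below is a deliberate simplification (equal-sized flushes, merges triggered as soon as a size class is full) and is not Lucene's real merge policy; `simulate_merges` is a hypothetical helper.

```python
def simulate_merges(flushes, merge_factor):
    """Return (segments left, merge count) after `flushes` segment flushes.

    Simplified model: once `merge_factor` segments of equal size exist,
    they are merged into one segment of the next size class.
    """
    levels = [0]  # levels[i] = number of segments in size class i
    merges = 0
    for _ in range(flushes):
        level = 0
        levels[0] += 1
        # Cascade merges upward while a size class is full.
        while levels[level] == merge_factor:
            levels[level] = 0
            merges += 1
            level += 1
            if level == len(levels):
                levels.append(0)
            levels[level] += 1
    return sum(levels), merges

low = simulate_merges(99, 2)    # few segments to search, many merges while indexing
high = simulate_merges(99, 10)  # more segments to search, few merges while indexing
```

In this model a merge factor of 2 leaves 4 segments after 95 merges, while a merge factor of 10 leaves 18 segments after only 9 merges.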
Don't commit. Ever.
Don't commit often.
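One way to commit less often is to send documents in batches and commit once at the end. The sketch below assumes two caller-supplied callables, `send` and `commit`, standing in for whatever client is used; neither is a real Solr client API.

```python
def index_in_batches(docs, batch_size, send, commit):
    """Send docs in batches and commit once at the end instead of per document."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            send(batch)
            batch = []  # start a fresh list; the sent batch is left untouched
    if batch:
        send(batch)     # flush the final, partially filled batch
    commit()            # single commit for the whole run

# Record the calls instead of talking to a server.
calls = []
index_in_batches(
    range(5),
    2,
    send=lambda batch: calls.append(("send", list(batch))),
    commit=lambda: calls.append(("commit",)),
)
```

Five documents result in three sends and exactly one commit.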
Sharding
Manual distribution
[Diagram: index distributor; cores core1, core2 and core3, each with index foo; replication]
Look Ma, no downtime!
Distributed search
q=name:hotel1&shards=solr2:7070/solr/foo,solr3:7070/solr/foo&partialResults=true
Distributed search config
<requestHandler name="distributedSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <int name="rows">10</int>
    <str name="fl">*</str>
    <bool name="partialResults">true</bool>
    <str name="shards">solr2:7070/solr/foo,solr3:7070/solr/foo</str>
  </lst>
</requestHandler>
Distributed search
q=name:hotel1&qt=distributedSearch
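A small sketch of building the explicit distributed query from client code. The shard addresses and the `partialResults` parameter are copied from the slides; the `solr1` host, the `/select` path and the helper name are assumptions.

```python
from urllib.parse import urlencode

def distributed_query_url(base_url, query, shards):
    """Build a select URL that fans the query out over the given shards."""
    params = {
        "q": query,
        "shards": ",".join(shards),
        "partialResults": "true",  # parameter name copied from the slides
    }
    # Keep ':' '/' ',' readable; everything else is percent-encoded.
    return base_url + "/select?" + urlencode(params, safe=":/,")

url = distributed_query_url(
    "http://solr1:7070/solr/foo",  # hypothetical coordinating node
    "name:hotel1",
    ["solr2:7070/solr/foo", "solr3:7070/solr/foo"],
)
```

Moving the shard list into a request handler, as in the config on the slides, keeps this knowledge out of every client.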
Caching
Caches: field value, filter, document, query result
Filter: doc ids of results per filter query
Field value: field names (facets) mapped to a mapping of doc ids to terms
Query result: ordered set of doc ids of top N results
Document: stored fields for each doc
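To illustrate what the filter cache holds, here is a toy LRU cache keyed by filter query string. It is a sketch of the idea only, not Solr's cache implementation; the class name and the `compute` callback are hypothetical.

```python
from collections import OrderedDict

class FilterCache:
    """Tiny LRU cache: filter query string -> frozenset of matching doc ids."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq, compute):
        if fq in self.entries:
            self.hits += 1
            self.entries.move_to_end(fq)      # mark as most recently used
            return self.entries[fq]
        self.misses += 1
        doc_ids = frozenset(compute(fq))      # the expensive index lookup
        self.entries[fq] = doc_ids
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
        return doc_ids

cache = FilterCache(max_size=2)
first = cache.get("country:AN", lambda fq: [1, 2])
second = cache.get("country:AN", lambda fq: [9])  # served from cache, compute is skipped
```

A repeated filter query is answered from memory, which is why reusable filters pay off.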
Autowarming
None
Filter queries...
q=*:*&fq=country:AN&fq=duration:[1 TO *]&fq=date:[NOW TO 2013-07-01T00:00:00Z]
Match all documents: q=*:*
Filter by field value: fq=country:AN
Range query with wildcard: fq=duration:[1 TO *]
Range query using DateMath syntax: fq=date:[NOW TO 2013-07-01T00:00:00Z]
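A sketch of assembling such a request with one `fq` parameter per filter, so that each filter stays a separate parameter. The helper name is hypothetical; spaces are encoded as `+`, which is standard form encoding.

```python
from urllib.parse import urlencode

def select_params(query, filters):
    """Encode q plus one fq parameter per filter."""
    pairs = [("q", query)] + [("fq", fq) for fq in filters]
    # Leave the query syntax characters readable in the resulting string.
    return urlencode(pairs, safe="*:[]")

params = select_params(
    "*:*",
    [
        "country:AN",
        "duration:[1 TO *]",
        "date:[NOW TO 2013-07-01T00:00:00Z]",
    ],
)
```

Passing a list of pairs rather than a dict is what allows the `fq` key to repeat.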
Getting all results
q=*:*&rows=10000000
Faceting
A facet query...
rows=0&facet=true&facet.field=departureairport&facet.field=touroperator&facet.limit=-1&facet.mincount=1&f.touroperator.facet.limit=2
Enable faceting: facet=true
Suppress document results: rows=0
Specify a field name: facet.field=departureairport
...and another one: facet.field=touroperator
Unlimited field values (globally): facet.limit=-1 (basically, always a good idea)
Override global limit for specific field names: f.touroperator.facet.limit=2
At least 1 document per field value: facet.mincount=1
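On the client side, facet counts in Solr's default JSON output arrive as a flat list alternating value and count. A minimal sketch of turning that into a dict follows; the response fragment, its values and its counts are made up.

```python
def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))

# Hypothetical response fragment; the values and counts are invented.
response = {
    "facet_counts": {
        "facet_fields": {
            "departureairport": ["AMS", 120, "EIN", 45, "RTM", 7],
            "touroperator": ["foo", 90, "bar", 82],
        }
    }
}

touroperators = facet_counts(response, "touroperator")
```

Here `touroperator` has only two entries, as the per-field limit of 2 in the query would produce.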
Multi-select faceting...
q=*:*&fq={!tag=country}country:AN&facet=true&facet.field={!ex=country}country&facet.limit=-1&facet.mincount=1
Tag a filter query: fq={!tag=country}country:AN
...and exclude it for a field value: facet.field={!ex=country}country
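The effect of tagging and excluding a filter can be sketched in memory: facet counts for a field are computed as if the tagged filter on that field were not applied. The documents and the helper below are hypothetical and only mimic the semantics, not Solr's implementation.

```python
from collections import Counter

def facet_with_exclusion(docs, filters, facet_field, exclude_tag=None):
    """Count facet values over docs matching all filters except the excluded tag.

    `filters` maps a tag to a (field, value) pair, like fq={!tag=...}field:value.
    """
    active = [fv for tag, fv in filters.items() if tag != exclude_tag]
    matching = [d for d in docs if all(d[f] == v for f, v in active)]
    return Counter(d[facet_field] for d in matching)

docs = [
    {"country": "AN", "touroperator": "foo"},
    {"country": "AN", "touroperator": "bar"},
    {"country": "ES", "touroperator": "foo"},
]
filters = {"country": ("country", "AN")}

narrowed = facet_with_exclusion(docs, filters, "country")
multi_select = facet_with_exclusion(docs, filters, "country", exclude_tag="country")
```

Without the exclusion only the selected country shows up in the facet; with it, the other countries keep their counts, so a user can pick a second value.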
FACET ALL THE THINGS!
Grouping
A grouping query...
group=true&group.field=accoid&group.sort=price asc&sort=popularity asc&group.facets=UNGROUPED
Enable grouping: group=true
Specify the field name: group.field=accoid
Determines group head: group.sort=price asc
Determines order of document results: sort=popularity asc
Only group heads are returned!
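A rough in-memory sketch of what such a grouping query returns. It assumes the behaviour documented for stock Solr, where groups are ordered by their best-sorting member and the in-group sort picks the head; the custom Solr build from the talk may differ. The documents and the helper name are made up.

```python
def group_heads(docs, group_field, head_key, order_key):
    """Return one head per group, with groups ordered by their best order_key."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc[group_field], []).append(doc)
    # Groups are ranked by the best (lowest) order_key of any member (sort=... asc).
    ranked = sorted(groups.values(), key=lambda g: min(d[order_key] for d in g))
    # Within each group, the lowest head_key wins (group.sort=... asc).
    return [min(g, key=lambda d: d[head_key]) for g in ranked]

docs = [
    {"accoid": "A", "price": 500, "popularity": 3},
    {"accoid": "A", "price": 400, "popularity": 9},
    {"accoid": "B", "price": 700, "popularity": 1},
    {"accoid": "B", "price": 650, "popularity": 5},
]

heads = group_heads(docs, "accoid", head_key="price", order_key="popularity")
```

Each accommodation appears once, represented by its cheapest offer, while the order of the groups follows popularity.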
ONE DOES NOT SIMPLY EXPLAIN SOLR QUERIES
debugQuery=true
Solr 4.3 is coming http://docs.lucidworks.com/display/solr/Major+Changes+from+Solr+3+to+Solr+4
Queries?