Slide 1

Open Source RESTful Search
Scott Smerchek
@smerchek
scottsmerchek.com

Slide 2

Lucene
Lucene Core:
– indexing and search technology
– ranking/scoring results
– spellchecking
– hit highlighting
– advanced analysis/tokenization
– long history (est. 2000)

Slide 3

Inverted Index
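The slide names the data structure without spelling it out: an inverted index maps each term to the list of documents that contain it, so a search looks up terms instead of scanning documents. Below is a minimal Python sketch of the idea, assuming a naive lowercase/split-on-punctuation tokenizer; it is not Lucene's on-disk format, which also stores positions, frequencies, and compressed postings.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analysis (assumption): lowercase, then split on non-alphanumeric runs.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    # Map each term to the sorted ids of the documents containing it (its postings list).
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    # AND semantics: intersect the postings lists of every query term.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "You need to test your code.",
    2: "Your code should be testable.",
    3: "Go the right way, or else.",
}
index = build_inverted_index(docs)
print(index["code"], search_all(index, "test code"))
```

Answering a query only touches the postings lists of its terms, which is why lookups stay fast as the number of documents grows.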

Slide 4

Lucene, but better
• Solr:
  – built on Lucene Core
  – REST API (XML/JSON)
  – explicit field types
  – caching
  – faceting/grouping
  – highly configurable/extensible
  – replication/distribution (scalable)

Slide 5

Indexing
Just a POST:
curl "http://localhost:8983/solr/update/json?commit=true" --data-binary @listings.json -H "Content-type: application/json"
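The curl call is an ordinary HTTP POST of a JSON body to the update handler, so any HTTP client can do it. Here is a minimal Python standard-library sketch that builds the same request against the slide's URL. It assumes the body is a JSON array of documents; check which JSON update format your Solr version expects. The sample document is hypothetical, and the request is only sent if you call `urlopen` against a running server.

```python
import json
import urllib.request

SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"


def build_update_request(docs: list[dict]) -> urllib.request.Request:
    # Serialize the documents as the POST body (assumed: a JSON array of documents).
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


# Hypothetical sample document; field names mirror the later slides.
listings = [{"id": "2", "title": "Things", "quantity": 2}]
req = build_update_request(listings)

# Sending it requires a running Solr instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```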

Slide 6

Query Syntax: Term
test → "You need to test your code."
test* → "Your code should be testable."
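A plain term query has to match a whole indexed token. A trailing `*` turns it into a prefix (wildcard) query, which is why only `test*` reaches "testable". The simplified Python model below shows the distinction. Its lowercase/split tokenizer is an assumption, because real matching depends on the field's analyzer.

```python
import re


def tokens(text: str) -> set[str]:
    # Simplified analysis (assumption): lowercase, split on non-letters.
    return {t for t in re.split(r"[^a-z]+", text.lower()) if t}


def matches(query: str, text: str) -> bool:
    # A trailing '*' makes a prefix query; otherwise the whole token must match.
    if query.endswith("*"):
        prefix = query[:-1].lower()
        return any(t.startswith(prefix) for t in tokens(text))
    return query.lower() in tokens(text)


a = "You need to test your code."
b = "Your code should be testable."
print(matches("test", a), matches("test", b))
print(matches("test*", a), matches("test*", b))
```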

Slide 7

Query Syntax: Field
title:"The Right Way" AND text:go
{ id: 1, title: "The Right Way", description: "Go the right way, or else." }
Equivalent, written with required (+) clauses:
+title:"The Right Way" +text:go

Slide 8

Query Syntax: Range
quantity:[2 TO *]
{ id: 2, title: "Things", quantity: 2 } → matches (square brackets include the bound)

Slide 9

Query Syntax: Range
quantity:[2 TO *]
{ id: 2, title: "Things", quantity: 4 } → matches

Slide 10

Query Syntax: Range
quantity:[2 TO *]
{ id: 2, title: "Things", quantity: 8 } → matches

Slide 11

Query Syntax: Range
quantity:[2 TO *]
{ id: 2, title: "Things", quantity: 1 } → no match
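In `quantity:[2 TO *]`, square brackets make a bound inclusive, curly braces make it exclusive, and `*` leaves that end open. The Python sketch below applies that rule to the four quantities from the slides. `in_range` is a hypothetical helper that understands only this simple numeric form. It is not Solr's query parser.

```python
import re

RANGE_RE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")


def in_range(value: float, range_expr: str) -> bool:
    # Parse a simple numeric range such as "[2 TO *]"; '*' means unbounded.
    m = RANGE_RE.match(range_expr.strip())
    if m is None:
        raise ValueError(f"unsupported range: {range_expr!r}")
    open_b, low, high, close_b = m.groups()
    if low != "*":
        lo = float(low)
        # '[' includes the lower bound, '{' excludes it.
        if value < lo or (open_b == "{" and value == lo):
            return False
    if high != "*":
        hi = float(high)
        # ']' includes the upper bound, '}' excludes it.
        if value > hi or (close_b == "}" and value == hi):
            return False
    return True


for quantity in (2, 4, 8, 1):
    print(quantity, in_range(quantity, "[2 TO *]"))
```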

Slide 12

Query Syntax: Range
creation:[NOW/DAY-30DAY TO *]
{ id: 2, title: "Things", quantity: 2, creation: "2012-04-26T02:42:07Z" }

Slide 13

Query Syntax: Range
creation:[NOW/DAY-30DAY TO *]
{ id: 2, title: "Things", quantity: 2, creation: "2012-03-29T05:22:06Z" }

Slide 14

Query Syntax: Range
creation:[NOW/DAY-30DAY TO *]
{ id: 2, title: "Things", quantity: 2, creation: "2012-04-21T11:36:52Z" }

Slide 15

Query Syntax: Range
creation:[NOW/DAY-30DAY TO *]
{ id: 2, title: "Things", quantity: 2, creation: "2012-03-15T11:36:52Z" }
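Solr applies date math left to right and evaluates it in UTC. `NOW/DAY` rounds the current time down to midnight, and `-30DAY` then subtracts 30 days. The Python sketch below computes that lower bound and checks the four creation dates against it. The slides don't say when the query ran, so the "now" value here is a hypothetical choice, and the match results change with it.

```python
from datetime import datetime, timedelta, timezone


def now_day_minus_days(now: datetime, days: int) -> datetime:
    # NOW/DAY: truncate to midnight UTC; then -{days}DAY: subtract whole days.
    start_of_day = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day - timedelta(days=days)


def parse_solr_date(s: str) -> datetime:
    # Dates in these examples use the form 2012-04-26T02:42:07Z (UTC).
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical "NOW", for illustration only.
now = datetime(2012, 4, 27, 15, 0, 0, tzinfo=timezone.utc)
lower = now_day_minus_days(now, 30)

created_dates = ("2012-04-26T02:42:07Z", "2012-03-29T05:22:06Z",
                 "2012-04-21T11:36:52Z", "2012-03-15T11:36:52Z")
for created in created_dates:
    # [lower TO *] includes the lower bound and has no upper bound.
    print(created, parse_solr_date(created) >= lower)
```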

Slide 16

A Solr Query
http://localhost:8983/solr/select?
  q=*:*&
  start=0&
  rows=10&
  fq=tags:Jewelry&
  debugQuery=true&
  wt=json
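The slides put one parameter per line for readability, but on the wire the query is a single URL. The Python sketch below assembles it with `urllib.parse.urlencode`, which also percent-encodes characters such as `*` and `:`. `solr_select_url` is a hypothetical helper name. It takes a list of pairs so that repeated parameters, like several `fq` values, stay possible.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_SELECT = "http://localhost:8983/solr/select"


def solr_select_url(params: list[tuple[str, str]]) -> str:
    # A list of pairs (not a dict) allows repeated keys such as several fq or facet.field.
    return SOLR_SELECT + "?" + urlencode(params)


url = solr_select_url([
    ("q", "*:*"),
    ("start", "0"),
    ("rows", "10"),
    ("fq", "tags:Jewelry"),
    ("debugQuery", "true"),
    ("wt", "json"),
])
print(url)
```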

Slide 17

DebugQuery: Explaining
http://localhost:8983/solr/select?
  q=beautiful&
  debugQuery=true

71407194:
0.6276805 = (MATCH) weight(text:beautiful in 3948), result of:
  0.6276805 = fieldWeight in 3948, product of:
    1.4142135 = tf(freq=2.0), with freq of:
      2.0 = termFreq=2.0
    2.8405576 = idf(docFreq=1972, maxDocs=12430)
    0.15625 = fieldNorm(doc=3948)
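The explain tree follows Lucene's classic TF-IDF similarity (DefaultSimilarity): fieldWeight = tf * idf * fieldNorm, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The Python sketch below recomputes the slide's numbers from those formulas. fieldNorm is copied from the output rather than derived, because Lucene stores it as a lossy one-byte encoding of the field's length normalization.

```python
import math


def classic_tf(freq: float) -> float:
    # DefaultSimilarity: tf = square root of the term's frequency in the field.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # DefaultSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


tf = classic_tf(2.0)
idf = classic_idf(1972, 12430)
field_norm = 0.15625  # taken directly from the explain output
field_weight = tf * idf * field_norm

print(round(tf, 7), round(idf, 7), round(field_weight, 7))
```

The recomputed product agrees with the 0.6276805 score in the explain output to within float precision.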

Slide 18

A Solr DisMax Query
http://localhost:8983/solr/select?
  q=beautiful&
  defType=edismax&
  qf=title^2+description&
  tie=.1
(qf lists fields separated by whitespace, URL-encoded here as "+", not by commas.)
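DisMax combines a term's per-field scores by taking the best field's score and adding `tie` times the sum of the other fields' scores. With tie=0 only the best field counts, and with tie=1 the result is a plain sum. The Python sketch below models just that combination step. The per-field scores are made-up inputs, and treating `title^2` as a simple multiplier is a simplification, since Lucene also normalizes query weights.

```python
def dismax_score(field_scores: dict[str, float], boosts: dict[str, float], tie: float) -> float:
    # Apply per-field boosts (simplified to a plain multiplier), then combine:
    # best field score + tie * (sum of the other field scores).
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie * (sum(boosted) - best)


# Hypothetical per-field scores for the term "beautiful" in one document.
scores = {"title": 0.5, "description": 0.6}
print(dismax_score(scores, {"title": 2.0}, tie=0.1))
```

A small tie value such as .1 mostly rewards the best-matching field while still giving some credit to documents that match in several fields.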

Slide 19

Faceting
Source: newegg.com - Internal SSD listings

Slide 20

A Solr Query: Facets
http://localhost:8983/solr/select?
  q=*:*&
  facet=true&
  facet.field=who_made&
  facet.sort=count&
  f.who_made.facet.method=enum
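Conceptually, field faceting counts how many matching documents carry each value of the field, and `facet.sort=count` lists the most frequent values first. The Python sketch below does that counting over hypothetical documents and `who_made` values. It ignores how `facet.method` (enum vs. field cache) computes the counts internally, and breaking ties alphabetically is this sketch's choice. The default limit of 100 mirrors Solr's documented `facet.limit` default.

```python
from collections import Counter


def facet_counts(docs: list[dict], field: str, limit: int = 100) -> list[tuple[str, int]]:
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        # Multi-valued fields (lists) contribute one count per distinct value.
        counts.update(set(value) if isinstance(value, list) else {value})
    # facet.sort=count: highest count first; ties broken alphabetically here.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# Hypothetical matching documents.
docs = [
    {"id": 1, "who_made": "i_did"},
    {"id": 2, "who_made": "collective"},
    {"id": 3, "who_made": "i_did"},
    {"id": 4, "who_made": "someone_else"},
]
print(facet_counts(docs, "who_made"))
```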

Slide 21

A Solr Query: Date Facets
http://localhost:8983/solr/select?
  q=*:*&
  facet=true&
  facet.date=original_creation_tsz&
  facet.date.start=NOW/YEAR-4YEARS&
  facet.date.gap=%2B1YEAR&
  facet.date.end=NOW/YEAR&
  facet.date.other=before
(The "+" in +1YEAR must be URL-encoded as %2B; a literal "+" in a query string decodes to a space.)
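These parameters split [NOW/YEAR-4YEARS, NOW/YEAR) into one-year buckets and count the documents in each. `facet.date.other=before` adds a count of everything earlier than the start. The Python sketch below reproduces that bucketing with a hypothetical fixed "now" and hypothetical timestamps. Each bucket includes its lower edge and excludes its upper edge, and dates past the end are simply not counted. Solr's `facet.date.include` and `facet.date.hardend` options are not modeled.

```python
from datetime import datetime, timezone


def year_edges(now: datetime, years_back: int) -> list[datetime]:
    # NOW/YEAR rounds down to Jan 1 of the current year; the start is years_back years earlier.
    return [datetime(y, 1, 1, tzinfo=timezone.utc) for y in range(now.year - years_back, now.year + 1)]


def date_facets(dates: list[datetime], now: datetime, years_back: int = 4) -> dict[str, int]:
    edges = year_edges(now, years_back)
    label = lambda d: d.strftime("%Y-%m-%dT%H:%M:%SZ")
    counts = {label(edge): 0 for edge in edges[:-1]}
    counts["before"] = 0
    for d in dates:
        if d < edges[0]:
            counts["before"] += 1  # facet.date.other=before
            continue
        for lo, hi in zip(edges, edges[1:]):
            # Each bucket includes its lower edge and excludes its upper edge.
            if lo <= d < hi:
                counts[label(lo)] += 1
                break
    return counts


# Hypothetical "NOW" and creation timestamps.
now = datetime(2012, 6, 1, tzinfo=timezone.utc)
utc = timezone.utc
dates = [datetime(2007, 5, 1, tzinfo=utc), datetime(2009, 3, 3, tzinfo=utc),
         datetime(2009, 12, 31, tzinfo=utc), datetime(2011, 1, 1, tzinfo=utc),
         datetime(2012, 2, 2, tzinfo=utc)]
print(date_facets(dates, now))
```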

Slide 22

A Solr Query: Term Suggest
http://localhost:8983/solr/select?
  q=tags:Clutch&
  facet=true&
  rows=0&
  facet.field=text&
  facet.limit=10&
  facet.mincount=1&
  facet.prefix=be
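This query reuses faceting for autocomplete. `q` narrows the documents, `rows=0` skips returning them, and faceting the tokenized `text` field with `facet.prefix=be` lists the indexed terms that start with "be", ranked by how many matching documents contain them. The Python sketch below models this over pre-tokenized documents. The terms are hypothetical stand-ins for whatever the analyzer actually produced.

```python
from collections import Counter


def suggest_terms(doc_terms: list[set[str]], prefix: str, limit: int = 10, mincount: int = 1) -> list[tuple[str, int]]:
    # Count, per term, how many matching documents contain it (document counts, not occurrences).
    counts = Counter()
    for terms in doc_terms:
        counts.update(t for t in terms if t.startswith(prefix))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(t, c) for t, c in ranked if c >= mincount][:limit]


# Hypothetical indexed terms of the documents matching q=tags:Clutch.
matching = [
    {"beaded", "clutch", "black"},
    {"beaded", "bag", "evening"},
    {"beige", "leather", "clutch"},
]
print(suggest_terms(matching, "be"))
```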

Slide 23

A Solr Query: Highlighting
http://localhost:8983/solr/select?
  q=beautiful&
  hl=true&
  hl.fl=title,description&
  hl.simple.pre=&
  hl.simple.post=
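The highlighter returns text from the `hl.fl` fields with each matched term wrapped in the `hl.simple.pre` and `hl.simple.post` markers. The Python sketch below models only the wrapping. It assumes Solr's default markers, `<em>` and `</em>`, and whole-word, case-insensitive matching, and it does not split long fields into snippets the way the real highlighter does.

```python
import re


def highlight(text: str, term: str, pre: str = "<em>", post: str = "</em>") -> str:
    # Wrap each whole-word, case-insensitive occurrence of `term` in the pre/post markers.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)


print(highlight("A Beautiful, beautiful ring", "beautiful"))
```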

Slide 24

A Solr Query: Grouping
http://localhost:8983/solr/select?
  q=beautiful&
  group=true&
  group.field=user_id&
  group.limit=3
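`group.field=user_id` collapses the results by user, and `group.limit=3` returns at most three documents per group. The Python sketch below groups already-scored hits, orders the groups by their best hit's score (the default score ordering), and keeps the top `limit` documents of each. The hits are hypothetical, and the output keys are only loosely modeled on Solr's grouped response.

```python
from collections import defaultdict


def group_hits(hits: list[dict], field: str, limit: int = 1) -> list[dict]:
    # Sort hits by descending score, then bucket them by the grouping field.
    groups: dict = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        groups[hit[field]].append(hit)
    # Dicts keep insertion order, so groups come out ordered by their best hit's score.
    return [
        {"groupValue": value, "numFound": len(docs), "docs": docs[:limit]}
        for value, docs in groups.items()
    ]


# Hypothetical scored hits for q=beautiful.
hits = [
    {"id": 1, "user_id": "u1", "score": 0.9},
    {"id": 2, "user_id": "u2", "score": 0.95},
    {"id": 3, "user_id": "u1", "score": 0.5},
    {"id": 4, "user_id": "u1", "score": 0.7},
    {"id": 5, "user_id": "u1", "score": 0.2},
]
result = group_hits(hits, "user_id", limit=3)
print([(g["groupValue"], [d["id"] for d in g["docs"]]) for g in result])
```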

Slide 25

A Solr Query: Statistics
http://localhost:8983/solr/select?
  q=*:*&
  rows=0&
  stats=true&
  stats.field=quantity
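`stats.field=quantity` returns summary statistics for that field over every matching document, and `rows=0` skips the documents themselves. The Python sketch below computes the usual set: min, max, count, missing, sum, sumOfSquares, mean, and stddev. I believe Solr reports a sample standard deviation (n - 1 denominator), but treat that detail as an assumption. The documents are hypothetical.

```python
import math


def field_stats(docs: list[dict], field: str) -> dict:
    values = [d[field] for d in docs if d.get(field) is not None]
    n = len(values)
    stats = {"count": n, "missing": len(docs) - n}
    if n == 0:
        return stats
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    stats.update(min=min(values), max=max(values), sum=total, sumOfSquares=sum_sq, mean=total / n)
    # Sample standard deviation (n - 1 denominator); 0.0 when there is a single value.
    stats["stddev"] = math.sqrt((n * sum_sq - total * total) / (n * (n - 1))) if n > 1 else 0.0
    return stats


# Hypothetical documents; the last one has no quantity and counts as "missing".
docs = [{"quantity": 2}, {"quantity": 4}, {"quantity": 8}, {"quantity": 1}, {"title": "no quantity"}]
print(field_stats(docs, "quantity"))
```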

Slide 26

Solr Cores
*Use multiple cores*
• Rebuild an index without downtime
• Test configuration changes
• Merge indexes of two cores into a third
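A common way to rebuild without downtime is to index into a second core and then swap it with the live one through the CoreAdmin handler. The Python sketch below builds such a request. It assumes CoreAdmin's `action=SWAP` with `core` and `other` parameters at `/solr/admin/cores`, so check your version's CoreAdmin documentation. The core names are hypothetical.

```python
from urllib.parse import urlencode

CORE_ADMIN = "http://localhost:8983/solr/admin/cores"


def swap_cores_url(live: str, rebuilt: str) -> str:
    # SWAP exchanges the names of two cores, so clients querying `live`
    # start hitting the freshly rebuilt index without a restart.
    return CORE_ADMIN + "?" + urlencode({"action": "SWAP", "core": live, "other": rebuilt})


# Hypothetical core names.
print(swap_cores_url("listings", "listings_rebuild"))
```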

Slide 27

Solr Caching
• Filter Cache
  – Caches unordered sets of doc ids that match a key (query)
  – Used for results of fq filter queries and faceting
• Field Value Cache
  – Primarily used by faceting
• Query Result Cache
  – Stores ordered sets of doc ids
• Document Cache
  – Stores Lucene Document objects
• User/Generic Caches
  – Generic object cache for custom Solr plugins
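Repeated `fq` filters are cheap because the filter cache keys on the filter query and stores its unordered set of matching doc ids. A later request reuses that set and intersects it with the main query's results. The Python sketch below models the idea with a small LRU cache. `FilterCache` and `compute_filter` are hypothetical names, the size is arbitrary, and in Solr the real cache is configured in solrconfig.xml.

```python
from collections import OrderedDict
from typing import Callable


class FilterCache:
    """Tiny LRU cache from a filter-query string to the set of matching doc ids."""

    def __init__(self, size: int, compute: Callable[[str], frozenset]):
        self.size = size
        self.compute = compute
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq: str) -> frozenset:
        if fq in self.entries:
            self.hits += 1
            self.entries.move_to_end(fq)  # mark as most recently used
            return self.entries[fq]
        self.misses += 1
        doc_ids = self.compute(fq)
        self.entries[fq] = doc_ids
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return doc_ids


# Hypothetical stand-in for evaluating a filter against the index.
index_tags = {1: "Jewelry", 2: "Clutch", 3: "Jewelry"}


def compute_filter(fq: str) -> frozenset:
    field, value = fq.split(":", 1)
    return frozenset(d for d, tag in index_tags.items() if field == "tags" and tag == value)


cache = FilterCache(size=2, compute=compute_filter)
main_query_hits = {1, 2, 3}
print(sorted(main_query_hits & cache.get("tags:Jewelry")))
cache.get("tags:Jewelry")  # second use of the same fq is served from the cache
print(cache.hits, cache.misses)
```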

Slide 28

Solr in applications
• Ruby
• .NET
• Java
• Python
• JavaScript
• Anything that can make an HTTP request…
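Any language works because a Solr client only needs HTTP plus an XML or JSON parser. The Python sketch below reads a `wt=json` select response. The sample body is hand-written to follow the response shape (`responseHeader`, then `response` with `numFound`, `start`, and `docs`) and is not captured output.

```python
import json


def parse_select_response(body: str) -> tuple[int, list[dict]]:
    # wt=json responses carry hits under "response": the total numFound plus the current page of docs.
    data = json.loads(body)
    response = data["response"]
    return response["numFound"], response["docs"]


# Hand-written sample in the shape of a wt=json select response.
sample = """{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {"numFound": 2, "start": 0, "docs": [
    {"id": "1", "title": "The Right Way"},
    {"id": "2", "title": "Things"}
  ]}
}"""
num_found, docs = parse_select_response(sample)
print(num_found, [d["title"] for d in docs])
```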

Slide 29

Solr in production
• Best on Linux
• Lots of RAM
• Manage the config XML (solrconfig.xml, schema.xml) with git
• Restrict access by IP
• Servlet monitoring solution
• Localize fields with field aliases (dismax)
• Increase memory allocated to the JVM
  – java -Xms512M -Xmx1024M -jar start.jar

Slide 30

Other Features
• Near Real-Time Searching
• Payloads
• SolrCloud (distributed)
• Custom plugins
• (Geo)Spatial Search
• Joins
• Indexing Rich Documents (PDF, etc.)
• Clustering

Slide 31

Solr Resources
• Apache Solr 3 Enterprise Search Server
• Lucene in Action (2nd Edition)
• http://wiki.apache.org/solr/
• http://wiki.apache.org/solr/SolrResources