Open Source RESTful Search: Solr

Solr is an open source search platform built on top of the Apache Lucene project. Solr wraps Lucene with a nice RESTful API and adds other features like faceted search, grouping, field types, caching, XML configuration, an administration interface, and the ability to scale with distributed search. This session will start with the basics of Solr and Lucene. We’ll then touch on some of the advanced/awesome features of Solr and also look at how to extend Solr with additional pluggable functionality. This will of course be supplemented with working examples and demos.

Scott Smerchek

April 28, 2012

Transcript

  1. Lucene
    Lucene Core:
    – indexing and search technology
    – ranking/scoring results
    – spellchecking
    – hit highlighting
    – advanced analysis/tokenization
    – long history (est. 2000)
  2. Lucene, but better
    • Solr:
    – built on Lucene Core
    – REST API (xml/json)
    – explicit field types
    – caching
    – faceting/grouping
    – highly configurable/extensible
    – replication/distribution (scalable)
  3. Query Syntax: Term
    test
        “You need to test your code.”
    test*
        “Your code should be testable.”
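The difference between the two queries on this slide can be sketched with a toy matcher. This is only an illustration, assuming a simple lowercase, split-on-punctuation analyzer with no stemming; a real Solr field type may analyze text differently, and `tokenize` and `matches_term` are hypothetical helper names.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (toy analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_term(query, text):
    """Match `term` against whole tokens, or `prefix*` as a prefix wildcard."""
    tokens = tokenize(text)
    if query.endswith("*"):
        prefix = query[:-1].lower()
        return any(tok.startswith(prefix) for tok in tokens)
    return query.lower() in tokens

docs = [
    "You need to test your code.",
    "Your code should be testable.",
]

for q in ("test", "test*"):
    print(q, [d for d in docs if matches_term(q, d)])
```

Under these assumptions, `test` only finds the first sentence, while `test*` also finds "testable" in the second.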
  4. Query Syntax: Field
    title:"The Right Way" AND text:go
    { id: 1, title: “The Right Way”, description: “Go the right way, or else.” }
    +title:"The Right Way" +text:go
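A fielded query like this one has to be URL-encoded before it is sent to Solr over HTTP. The sketch below builds such a request URL; it assumes the `http://localhost:8983/solr/select` endpoint shown on the debugQuery slide, and `build_select_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the select handler lives at the URL used on the debugQuery slide.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_select_url(query, **params):
    """Return a select URL with the query and extra parameters URL-encoded."""
    all_params = {"q": query, **params}
    return SOLR_SELECT + "?" + urlencode(all_params)

# wt=json asks Solr for a JSON response instead of the default XML.
url = build_select_url('title:"The Right Way" AND text:go', wt="json")
print(url)

# Round-trip check: decoding the URL gives back the original query string.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Encoding matters here because the query contains quotes, colons, and spaces.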
  5. Query Syntax: Range
    creation:[NOW/DAY-30DAY TO *]
    { id: 2, title: “Things”, quantity: 2, creation: "2012-04-26T02:42:07Z" }
  6. Query Syntax: Range
    creation:[NOW/DAY-30DAY TO *]
    { id: 2, title: “Things”, quantity: 2, creation: "2012-03-29T05:22:06Z" }
  7. Query Syntax: Range
    creation:[NOW/DAY-30DAY TO *]
    { id: 2, title: “Things”, quantity: 2, creation: "2012-04-21T11:36:52Z" }
  8. Query Syntax: Range
    creation:[NOW/DAY-30DAY TO *]
    { id: 2, title: “Things”, quantity: 2, creation: "2012-03-15T11:36:52Z" }
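The date math in `creation:[NOW/DAY-30DAY TO *]` can be worked through by hand: `NOW/DAY` rounds the current time down to midnight UTC, `-30DAY` subtracts thirty days, and the square bracket makes the lower bound inclusive. The sketch below evaluates the four `creation` values from the range slides. It assumes "now" is noon UTC on the talk date (April 28, 2012), which the slides do not state, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def lower_bound(now):
    """Evaluate NOW/DAY-30DAY: round down to midnight UTC, then subtract 30 days."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start_of_day - timedelta(days=30)

def parse_solr_date(value):
    """Parse a Solr-style UTC timestamp such as 2012-04-26T02:42:07Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)

def in_range(creation, now):
    """True if creation falls in [NOW/DAY-30DAY TO *] (inclusive lower bound)."""
    return parse_solr_date(creation) >= lower_bound(now)

# Assumption: the query runs at noon UTC on the talk date.
now = datetime(2012, 4, 28, 12, 0, 0, tzinfo=timezone.utc)

for creation in (
    "2012-04-26T02:42:07Z",
    "2012-03-29T05:22:06Z",
    "2012-04-21T11:36:52Z",
    "2012-03-15T11:36:52Z",
):
    print(creation, in_range(creation, now))
```

With that assumed clock, the lower bound is 2012-03-29T00:00:00Z, so the first three documents fall inside the range and the March 15 document falls outside it.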
  9. DebugQuery: Explaining
    http://localhost:8983/solr/select?q=beautiful&debugQuery=true
    71407194:
    0.6276805 = (MATCH) weight(text:beautiful in 3948), result of:
      0.6276805 = fieldWeight in 3948, product of:
        1.4142135 = tf(freq=2.0), with freq of:
          2.0 = termFreq=2.0
        2.8405576 = idf(docFreq=1972, maxDocs=12430)
        0.15625 = fieldNorm(doc=3948)
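The numbers in this explain output can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the slide does not name the similarity, but these formulas match the values shown. The fieldNorm is taken from the output as-is.

```python
import math

def tf(freq):
    """Classic Lucene term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    """Classic Lucene inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Values copied from the explain output on the slide.
term_freq = 2.0
doc_freq = 1972
max_docs = 12430
field_norm = 0.15625

# fieldWeight is the product of the three factors.
score = tf(term_freq) * idf(doc_freq, max_docs) * field_norm

print(f"tf    = {tf(term_freq):.7f}")
print(f"idf   = {idf(doc_freq, max_docs):.7f}")
print(f"score = {score:.7f}")
```

Small differences in the last digit are expected because Lucene computes these values with 32-bit floats.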
  10. Solr Cores
    *Use multiple cores*
    • Rebuild an index without downtime
    • Test configuration changes
    • Merge indexes of two cores into a 3rd
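Rebuilding without downtime and merging indexes are typically driven through Solr's CoreAdmin HTTP API, using the `SWAP` and `MERGEINDEXES` actions. The sketch below only builds the request URLs. The admin path and the core names (`live`, `rebuild`, `merged`, `core0`, `core1`) are assumptions for illustration and depend on how cores are configured in a given install.

```python
from urllib.parse import urlencode

# Assumption: the CoreAdmin handler is exposed at this path.
CORE_ADMIN = "http://localhost:8983/solr/admin/cores"

def swap_cores_url(core, other):
    """URL that swaps two cores, e.g. a freshly rebuilt core with the live one."""
    params = {"action": "SWAP", "core": core, "other": other}
    return CORE_ADMIN + "?" + urlencode(params)

def merge_indexes_url(target, sources):
    """URL that merges the indexes of the source cores into the target core."""
    params = [("action", "MERGEINDEXES"), ("core", target)]
    params += [("srcCore", name) for name in sources]
    return CORE_ADMIN + "?" + urlencode(params)

# Hypothetical core names.
print(swap_cores_url("live", "rebuild"))
print(merge_indexes_url("merged", ["core0", "core1"]))
```

The usual pattern is to index into the spare core, verify it, and then swap it with the live core so searches never hit a half-built index.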
  11. Solr Caching
    • Filter Cache
    – Caches unordered sets of doc ids that match a key (query)
    – Used for results of fq filter queries and faceting
    • Field Value Cache
    – Primarily used by faceting
    • Query Result Cache
    – Stores ordered sets of doc ids
    • Document Cache
    – Stores Lucene Document objects
    • User/Generic Caches
    – Generic object cache for custom Solr plugins
  12. Solr in applications
    • Ruby
    • .NET
    • Java
    • Python
    • JavaScript
    • Anything that can make an HTTP request…
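Because Solr speaks plain HTTP, a client in any of these languages mostly needs to send a request and parse the response. The sketch below handles the parsing half for a JSON response (`wt=json`). The sample payload is hand-written for illustration, not captured from a real server, and `extract_docs` is a hypothetical helper.

```python
import json

def extract_docs(raw_json):
    """Return (numFound, docs) from the body of a Solr JSON select response."""
    payload = json.loads(raw_json)
    response = payload["response"]
    return response["numFound"], response["docs"]

# Hand-written sample in the shape of a Solr JSON response (illustrative only).
sample = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [{"id": 1, "title": "The Right Way"}]
  }
}
"""

num_found, docs = extract_docs(sample)
print(num_found, docs)
```

In a real application the string would come from an HTTP GET against the select handler rather than from a literal.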
  13. Solr in production
    • Best on Linux
    • Lots of RAM
    • Manage config.xml with Git
    • Restrict access by IP
    • Servlet monitoring solution
    • Localize fields with field alias (dismax)
    • Increase memory allocated to the JVM
    – java -Xms512M -Xmx1024M -jar start.jar
  14. Other Features
    • NearRealtime Searching
    • Payloads
    • SolrCloud (distributed)
    • Custom plugins
    • (Geo)Spatial Search
    • Joins
    • Indexing Rich Documents (PDF, etc)
    • Clustering
  15. Solr Resources
    • Apache Solr 3 Enterprise Search Server
    • Lucene in Action (2nd Edition)
    • http://wiki.apache.org/solr/
    • http://wiki.apache.org/solr/SolrResources