Introduction to Searching
with Solr
Krzysztof Skarbek
October 30, 2015
Slide 2
Slide 2 text
Searching
Slide 3
Slide 3 text
Searching
● Most of us start browsing the Internet with a
search engine – that's how natural
searching is
Slide 4
Slide 4 text
Searching
● Most of us start browsing the Internet with a
search engine – that's how natural
searching is
● Unless your site has very little information,
or everything is reachable with a few clicks,
searching becomes a vital feature
Slide 5
Slide 5 text
Searching
● Poor man's solution:
Slide 6
Slide 6 text
Searching
● Poor man's solution:
Slide 7
Slide 7 text
Searching
● Poor man's solution:
● You have to expose the data to Google
Slide 8
Slide 8 text
Searching
● Poor man's solution:
● You have to expose the data to Google
● Guess what? Ads
Slide 9
Slide 9 text
Searching in examples
Slide 10
Slide 10 text
Searching in examples
● We'll demonstrate and compare SQL
database searches and searching with Solr
Slide 11
Slide 11 text
Searching in examples
● We'll demonstrate and compare SQL
database searches and searching with Solr
● We'll use the same example database in all
cases
Slide 12
Slide 12 text
Example database
Slide 13
Slide 13 text
Example database
● Extracted from a real-life web application
Slide 14
Slide 14 text
Example database
● Extracted from a real-life web application
● Contains blog posts and culinary recipes
scraped from cooking blogs
Slide 15
Slide 15 text
Example database
● Extracted from a real-life web application
● Contains blog posts and culinary recipes
scraped from cooking blogs
● 22 tables, 69300 records in posts table,
550 MB of textual data
Slide 16
Slide 16 text
Example database
blogs
id
name
url
...
blogs
id
name
url
...
Slide 17
Slide 17 text
Example database
posts
id
blog_id
title
body
...
posts
id
blog_id
title
body
...
blogs
id
name
url
...
blogs
id
name
url
...
Slide 18
Slide 18 text
Example database
posts
id
blog_id
title
body
...
posts
id
blog_id
title
body
...
recipes
id
post_id
title
...
recipes
id
post_id
title
...
blogs
id
name
url
...
blogs
id
name
url
...
Slide 19
Slide 19 text
Example database
posts
id
blog_id
title
body
...
posts
id
blog_id
title
body
...
recipes
id
post_id
title
...
recipes
id
post_id
title
...
ingredients
id
name
...
ingredients
id
name
...
recipes_ingredients
recipe_id
ingredient_id
amount
...
recipes_ingredients
recipe_id
ingredient_id
amount
...
blogs
id
name
url
...
blogs
id
name
url
...
Slide 20
Slide 20 text
Naïve approach with SQL LIKE
Slide 21
Slide 21 text
Naïve approach with SQL LIKE
●
SELECT posts.id FROM recipes
INNER JOIN posts ON posts.id = recipes.post_id
INNER JOIN blogs ON blogs.id = posts.blog_id
WHERE posts.title LIKE '%cake%'
OR posts.body LIKE '%cake%'
OR recipes.title LIKE '%cake%';
Slide 22
Slide 22 text
Naïve approach with SQL LIKE
●
SELECT posts.id FROM recipes
INNER JOIN posts ON posts.id = recipes.post_id
INNER JOIN blogs ON blogs.id = posts.blog_id
WHERE posts.title LIKE '%cake%'
OR posts.body LIKE '%cake%'
OR recipes.title LIKE '%cake%';
949 rows in set (10.07 sec)
Slide 23
Slide 23 text
Naïve approach with SQL LIKE
● Slow
Slide 24
Slide 24 text
Naïve approach with SQL LIKE
● Slow
● Very bad effect on database performance
Slide 25
Slide 25 text
Naïve approach with SQL LIKE
● Slow
● Very bad effect on database performance
● Problems with whitespace/punctuation
(e.g.: LIKE '% cake %' against “cake!”)
Slide 26
Slide 26 text
Naïve approach with SQL LIKE
● Slow
● Very bad effect on database performance
● Problems with whitespace/punctuation
(e.g.: LIKE '% cake %' against “cake!”)
● Really, a bad idea
Slide 27
Slide 27 text
Searching features built into RDBMS
● MySQL – full text search indexes
Slide 28
Slide 28 text
Searching features built into RDBMS
● MySQL – full text search indexes
CREATE TABLE posts (
id INT NOT NULL AUTO INCREMENT,
title VARCHAR(200),
body TEXT,
...
FULLTEXT (title, body),
...
Slide 29
Slide 29 text
Searching features built into RDBMS
SELECT posts.id FROM recipes
INNER JOIN posts ON posts.id = recipes.post_id
INNER JOIN blogs ON blogs.id = posts.blog_id
WHERE MATCH (posts.title, posts.body)
AGAINST ('cake')
OR MATCH (recipes.title) AGAINST ('cake');
Slide 30
Slide 30 text
Searching features built into RDBMS
SELECT posts.id FROM recipes
INNER JOIN posts ON posts.id = recipes.post_id
INNER JOIN blogs ON blogs.id = posts.blog_id
WHERE MATCH (posts.title, posts.body)
AGAINST ('cake')
OR MATCH (recipes.title) AGAINST ('cake');
1042 rows in set (0.22 sec)
Slide 31
Slide 31 text
Searching features built into RDBMS
● Much faster
Slide 32
Slide 32 text
Searching features built into RDBMS
● Much faster
● But still not blazingly fast
Slide 33
Slide 33 text
Searching features built into RDBMS
● Much faster
● But still not blazingly fast
● Used to only be available for MyISAM tables
Slide 34
Slide 34 text
Searching features built into RDBMS
● Much faster
● But still not blazingly fast
● Used to only be available for MyISAM tables
(not anymore with MySQL 5.6)
Slide 35
Slide 35 text
Searching features built into RDBMS
● Similar features in other RDBMS:
– PostgreSQL: Full Text Searching
– MSSQL: Full-Text Search
– Oracle: Oracle Text
– SQLite: FTS5 extension
Slide 36
Slide 36 text
Enter Solr
Slide 37
Slide 37 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
Slide 38
Slide 38 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
● Notable features:
Slide 39
Slide 39 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
● Notable features:
– ranked searching
Slide 40
Slide 40 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
● Notable features:
– ranked searching
– flexible faceting, highlighting, joins and result
grouping
Slide 41
Slide 41 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
● Notable features:
– ranked searching
– flexible faceting, highlighting, joins and result
grouping
– allows simultaneous update and searching
Slide 42
Slide 42 text
Enter Solr
● Based on Lucene, a high-performance, full-
featured text search engine library
● Notable features:
– ranked searching
– flexible faceting, highlighting, joins and result
grouping
– allows simultaneous update and searching
– support for filters
Slide 43
Slide 43 text
How Solr works
Slide 44
Slide 44 text
How Solr works
● Importing data from a source (e.g. SQL
database)
Slide 45
Slide 45 text
How Solr works
● Importing data from a source (e.g. SQL
database)
– full import
Slide 46
Slide 46 text
How Solr works
● Importing data from a source (e.g. SQL
database)
– full import
– delta import
Slide 47
Slide 47 text
How Solr works
● Importing data from a source (e.g. SQL
database)
– full import
– delta import
● Indexing imported data (with index filters)
Slide 48
Slide 48 text
How Solr works
● Importing data from a source (e.g. SQL
database)
– full import
– delta import
● Indexing imported data (with index filters)
● Searching (with query filters)
Slide 49
Slide 49 text
Solr configuration
Slide 50
Slide 50 text
Solr configuration
● Brace yourselves, XML is coming
Slide 51
Slide 51 text
Solr configuration
...
Slide 52
Slide 52 text
Solr configuration
Slide 53
Slide 53 text
Solr configuration
deltaQuery="
SELECT
r.id AS id,
r.post_id AS post_id,
r.title AS recipe_title,
...
FROM recipes AS r
INNER JOIN posts AS p ON p.id = r.post_id
INNER JOIN blogs AS b ON b.id = p.blog_id
WHERE
p.added_on > '${dataimporter.last_index_time}'
ORDER BY posted_on DESC
"
Using Solr in application code
● As simple as making an HTTP request:
$phrase = 'cake';
$url = 'http://host/solr/database/select?q=' .
urlencode('post_body:' . $phrase);
$data = json_decode(file_get_contents($url), TRUE);
Slide 62
Slide 62 text
Using Solr in application code
● There are client libraries/language bindings
available for all popular languages
Slide 63
Slide 63 text
Filters
Slide 64
Slide 64 text
Filters
● Transform and normalize the search data
Slide 65
Slide 65 text
Filters
● Transform and normalize the search data
● Examples:
Slide 66
Slide 66 text
Filters
● Transform and normalize the search data
● Examples:
– LowerCaseFilterFactory – lower case tokens
Slide 67
Slide 67 text
Filters
● Transform and normalize the search data
● Examples:
– LowerCaseFilterFactory – lower case tokens
– ASCIIFoldingFilterFactory – converts >127
unicode characters into their ASCII equivalents
Slide 68
Slide 68 text
Filters
● Transform and normalize the search data
● Examples:
– LowerCaseFilterFactory – lower case tokens
– ASCIIFoldingFilterFactory – converts >127
unicode characters into their ASCII equivalents
– External: MorfologikFilterFactory – filter to use
with Morfologik stemmer