Slide 1

Slide 1 text

“Terms of Endearment”: the ElasticSearch query language explained
Clinton Gormley (DRTECH, @clintongormley), YAPC::EU 2011

Slide 2

Slide 2 text

We can search for: “DELETE QUERY”

Slide 3

Slide 3 text

We can search for: “DELETE QUERY” and find: “deleteByQuery”

Slide 4

Slide 4 text

but you can only find what is stored in the database

Slide 5

Slide 5 text

Normalise values: “deleteByQuery” ⇒ 'delete' 'by' 'query' 'deletebyquery'

Slide 6

Slide 6 text

Normalise values and search terms: “deleteByQuery”, “DELETE QUERY” ⇒ 'delete' 'by' 'query' 'deletebyquery'

Slide 8

Slide 8 text

Analyse values and search terms: “deleteByQuery”, “DELETE QUERY” ⇒ 'delete' 'by' 'query' 'deletebyquery'
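Why both sides must be analysed can be shown with a toy analyser in Perl. This is a minimal sketch, assuming a hypothetical `toy_analyse` that splits camelCase and non-letters and then lowercases; it is not one of ElasticSearch's built-in analyzers and, unlike the slide, it does not also emit the joined term 'deletebyquery'.

```perl
use strict;
use warnings;

# Hypothetical toy analyser: split camelCase and non-letters, then lowercase.
sub toy_analyse {
    my ($text) = @_;
    $text =~ s/([a-z])([A-Z])/$1 $2/g;            # "deleteByQuery" -> "delete By Query"
    return map { lc($_) } grep { length } split /[^A-Za-z]+/, $text;
}

my @stored = toy_analyse('deleteByQuery');        # terms written to the index
my @search = toy_analyse('DELETE QUERY');         # terms looked up at search time

# A document matches when a search term is also a stored term.
my %in_index = map { $_ => 1 } @stored;
my @hits     = grep { $in_index{$_} } @search;

print "stored: @stored\n";   # stored: delete by query
print "hits:   @hits\n";     # hits:   delete query
```

Without analysis, the raw strings "deleteByQuery" and "DELETE QUERY" share no term at all.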

Slide 9

Slide 9 text

What is stored in ElasticSearch?

Slide 10

Slide 10 text

Document: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }

Slide 11

Slide 11 text

Fields: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }

Slide 12

Slide 12 text

Values: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }

Slide 13

Slide 13 text

Field types:
{                                      # object
    tweet  => "Perl is GREAT!",        # string
    posted => "2011-08-15",            # date
    user   => {                        # nested object
        name  => "Clinton Gormley",    # string
        email => 'drtech@cpan.org',    # string
    },
    tags   => ["perl","opinion"],      # array of enums
    posts  => 2,                       # integer
}

Slide 14

Slide 14 text

Nested objects flattened: { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }

Slide 15

Slide 15 text

Nested objects flattened: { tweet => "Perl is GREAT!", posted => "2011-08-15", 'user.name' => "Clinton Gormley", 'user.email' => 'drtech@cpan.org', tags => ["perl","opinion"], posts => 2, }
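The flattening step can be sketched in a few lines of Perl. This assumes a hypothetical `flatten` helper that joins object keys with dots; it mirrors the `user.name` naming above but is not ElasticSearch's own code.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten nested hashrefs into dotted field names,
# so user => { name => ... } becomes 'user.name' => ...
sub flatten {
    my ($doc, $prefix) = @_;
    my %flat;
    for my $key (keys %$doc) {
        my $name  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $doc->{$key};
        if (ref $value eq 'HASH') {
            %flat = (%flat, flatten($value, $name));   # recurse into inner objects
        }
        else {
            $flat{$name} = $value;                     # scalars and arrays kept as-is
        }
    }
    return %flat;
}

my %flat = flatten({
    tweet  => 'Perl is GREAT!',
    posted => '2011-08-15',
    user   => { name => 'Clinton Gormley', email => 'drtech@cpan.org' },
    tags   => [ 'perl', 'opinion' ],
    posts  => 2,
});

print join(', ', sort keys %flat), "\n";
# posted, posts, tags, tweet, user.email, user.name
```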

Slide 16

Slide 16 text

Values analyzed into terms: { tweet => "Perl is GREAT!", posted => "2011-08-15", 'user.name' => "Clinton Gormley", 'user.email' => 'drtech@cpan.org', tags => ["perl","opinion"], posts => 2, }

Slide 17

Slide 17 text

Values analyzed into terms: { tweet => ['perl','great'], posted => [Date(2011-08-15)], 'user.name' => ['clinton','gormley'], 'user.email' => ['drtech','cpan.org'], tags => ['perl','opinion'], posts => [2], }
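Those terms are what a search actually looks up. The sketch below builds a toy inverted index (field ⇒ term ⇒ document ids) in Perl. Document 1 uses the analysed terms from the slide; document 2 is invented purely for illustration.

```perl
use strict;
use warnings;

# Toy inverted index: field => term => list of doc ids.
my %docs = (
    1 => { tweet => [qw(perl great)], tags => [qw(perl opinion)] },   # from the slide
    2 => { tweet => [qw(ruby great)], tags => [qw(ruby)] },           # invented
);

my %index;
for my $id (sort keys %docs) {
    for my $field (keys %{ $docs{$id} }) {
        push @{ $index{$field}{$_} }, $id for @{ $docs{$id}{$field} };
    }
}

# Searching is a lookup of a stored term, not a scan of the original values.
print "tweet:great => @{ $index{tweet}{great} }\n";   # tweet:great => 1 2
print "tags:perl   => @{ $index{tags}{perl} }\n";     # tags:perl   => 1
```

A search for "GREAT" only finds these documents if the query is analysed to 'great' too.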

Slide 18

Slide 18 text

In MySQL:
database ⇒ many tables
table ⇒ many rows, one schema
row ⇒ many columns

Slide 19

Slide 19 text

In ElasticSearch:
index ⇒ many types
type ⇒ many documents, one mapping
document ⇒ many fields

Slide 20

Slide 20 text

Create index with mappings $es->create_index( index => 'twitter', mappings => { tweet => { properties => { title => { type => 'string' }, created => { type => 'date' } } } } );
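It can help to see the Perl data structure above as JSON. This sketch only serialises the hash with the core JSON::PP module; the assumption that the client sends roughly this body when creating the index is ours and is not shown on the slide.

```perl
use strict;
use warnings;
use JSON::PP;

# The `mappings` argument from the create_index call above, as JSON.
# canonical() sorts keys so the output is stable.
my %body = (
    mappings => {
        tweet => {
            properties => {
                title   => { type => 'string' },
                created => { type => 'date' },
            },
        },
    },
);

my $json = JSON::PP->new->canonical->encode(\%body);
print "$json\n";
# {"mappings":{"tweet":{"properties":{"created":{"type":"date"},"title":{"type":"string"}}}}}
```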

Slide 21

Slide 21 text

Add a mapping $es->put_mapping( index => 'twitter', type => 'user', mapping => { properties => { name => { type => 'string' }, created => { type => 'date' }, } } );

Slide 22

Slide 22 text

Can add to existing mapping

Slide 23

Slide 23 text

Can add to existing mapping Cannot change mapping for field

Slide 24

Slide 24 text

Core field types { type => 'string', }

Slide 25

Slide 25 text

Core field types
{
    type => 'string',
    # byte|short|integer|long|double|float
    # date, ip addr, geolocation
    # boolean
    # binary (base64-encoded)
}

Slide 26

Slide 26 text

Core field types
{
    type  => 'string',
    index => 'analyzed',        # 'Foo Bar' ⇒ [ 'foo', 'bar' ]
}

Slide 27

Slide 27 text

Core field types
{
    type  => 'string',
    index => 'not_analyzed',    # 'Foo Bar' ⇒ [ 'Foo Bar' ]
}

Slide 28

Slide 28 text

Core field types
{
    type  => 'string',
    index => 'no',              # 'Foo Bar' ⇒ [ ]
}
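The three `index` settings can be compared side by side with a toy Perl model. It assumes 'analyzed' is approximated by lowercasing and splitting on non-word characters; the real default analyzer does more (for example, it removes stopwords). The helper name `index_terms` is hypothetical.

```perl
use strict;
use warnings;

# Toy model of what each `index` setting stores for a string value.
sub index_terms {
    my ($mode, $value) = @_;
    if ($mode eq 'analyzed')     { return map { lc($_) } grep { length } split /\W+/, $value }
    if ($mode eq 'not_analyzed') { return ($value) }   # one exact term
    if ($mode eq 'no')           { return () }         # nothing indexed, not searchable
    die "unknown index mode: $mode\n";
}

for my $mode (qw(analyzed not_analyzed no)) {
    my @terms = index_terms($mode, 'Foo Bar');
    print "$mode => [ ", join(', ', map { "'$_'" } @terms), " ]\n";
}
# analyzed => [ 'foo', 'bar' ]
# not_analyzed => [ 'Foo Bar' ]
# no => [  ]
```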

Slide 29

Slide 29 text

Core field types { type => 'string', index => 'analyzed', analyzer => 'default', }

Slide 30

Slide 30 text

Core field types { type => 'string', index => 'analyzed', index_analyzer => 'default', search_analyzer => 'default', }

Slide 31

Slide 31 text

Core field types { type => 'string', index => 'analyzed', analyzer => 'default', boost => 2, }

Slide 32

Slide 32 text

Core field types
{
    type           => 'string',
    index          => 'analyzed',
    analyzer       => 'default',
    boost          => 2,
    include_in_all => 1,        # or 0
}

Slide 33

Slide 33 text

Built-in analyzers: ● Standard ● Simple ● Whitespace ● Stop ● Keyword ● Pattern ● Language ● Snowball ● Custom

Slide 34

Slide 34 text

Input: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
keyword: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
whitespace: The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com
simple: the, brown, cow, s, part, no, a, bc, joe, bloggs, com
standard: brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
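The keyword, whitespace and simple outputs above can be reproduced with rough Perl approximations. These closures are simplifications for illustration, not the Lucene implementations; the standard and snowball analyzers are too involved to approximate fairly here.

```perl
use strict;
use warnings;

# Rough Perl approximations of three built-in analyzers.
my %analyzer = (
    keyword    => sub { ( $_[0] ) },                                             # whole value is one term
    whitespace => sub { grep { length } split /\s+/, $_[0] },                    # split on whitespace only
    simple     => sub { map { lc($_) } grep { length } split /[^A-Za-z]+/, $_[0] },   # letter runs, lowercased
);

my $text = q{The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com};

for my $name (qw(keyword whitespace simple)) {
    print "$name: ", join(', ', $analyzer{$name}->($text)), "\n";
}
```

The output matches the keyword, whitespace and simple rows on the slide.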

Slide 35

Slide 35 text

Token filters ● Standard ● ASCII Folding ● Length ● Lowercase ● NGram ● Edge NGram ● Porter Stem ● Shingle ● Stop ● Word Delimiter ● Stemmer ● KStem ● Snowball ● Phonetic ● Synonym ● Compound Word ● Reverse ● Elision ● Truncate ● Unique

Slide 36

Slide 36 text

Custom Analyzer $es->create_index( index => 'twitter', settings => { analysis => { analyzer => { ascii_html => { type => 'custom', tokenizer => 'standard', filter => [ qw( standard lowercase asciifolding stop ) ], char_filter => ['html_strip'] } } }} );
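The order of stages in `ascii_html` (char filter, then tokenizer, then token filters) can be sketched in Perl. Every stage here is a simplified stand-in: the tag stripping is a regex, the tokenizer splits on non-word characters, and the stopword list is a small illustrative subset rather than Lucene's English list.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

my %stopwords = map { $_ => 1 } qw(a an and the is of to in);   # illustrative subset

sub html_strip { my ($s) = @_; $s =~ s/<[^>]*>/ /g; return $s }   # char_filter
sub tokenize   { grep { length } split /\W+/, $_[0] }             # tokenizer (approximation)
sub lowercase  { map { lc($_) } @_ }
sub asciifolding {
    my @out;
    for my $t (@_) {
        (my $folded = NFD($t)) =~ s/\p{Mn}//g;   # decompose, then drop accents
        push @out, $folded;
    }
    return @out;
}
sub stop { grep { !$stopwords{$_} } @_ }

# Filters run in the order listed in the analyzer definition.
sub ascii_html {
    my ($text) = @_;
    return stop(asciifolding(lowercase(tokenize(html_strip($text)))));
}

print join(', ', ascii_html('<p>The Café is <b>GREAT</b></p>')), "\n";   # cafe, great
```

Because `stop` runs last, it sees folded, lowercased tokens. That is why `lowercase` is listed before it in the filter list.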

Slide 37

Slide 37 text

Searching $result = $es->search( index => 'twitter', type => 'tweet', );

Slide 38

Slide 38 text

Searching $result = $es->search( index => ['twitter','facebook'], type => ['tweet','post'], );

Slide 39

Slide 39 text

Searching
$result = $es->search(
    # all indices
    # all types
);

Slide 40

Slide 40 text

Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, );

Slide 41

Slide 41 text

Searching
$result = $es->search(
    index  => 'twitter',
    type   => 'tweet',
    queryb => 'foo',    # b == ElasticSearch::SearchBuilder
);

Slide 42

Slide 42 text

Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, sort => [{ _score => 'desc' }], );

Slide 43

Slide 43 text

Searching $result = $es->search( index => 'twitter', type => 'tweet', query => { text => { _all => 'foo' }}, sort => [{ _score => 'desc' }], from => 0, size => 10, );
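`from` is an offset and `size` a page length, so paging through results is simple arithmetic. This is a sketch with a hypothetical `page_params` helper that converts a 1-based page number into those two parameters.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based page number => from/size search parameters.
sub page_params {
    my ($page, $per_page) = @_;
    die "page must be >= 1\n" if $page < 1;
    return ( from => ($page - 1) * $per_page, size => $per_page );
}

my %p = page_params(3, 10);
print "from=$p{from} size=$p{size}\n";   # from=20 size=10
```

Since it returns a flat list, it could presumably be spliced straight into the call, e.g. `$es->search( index => 'twitter', query => ..., page_params(3, 10) )`.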

Slide 44

Slide 44 text

Query DSL

Slide 45

Slide 45 text

Queries vs Filters

Slide 46

Slide 46 text

Queries vs Filters
Queries: ● full text & terms
Filters: ● terms only

Slide 47

Slide 47 text

Queries vs Filters
Queries: ● full text & terms ● relevance scoring
Filters: ● terms only ● no scoring

Slide 48

Slide 48 text

Queries vs Filters
Queries: ● full text & terms ● relevance scoring ● slower
Filters: ● terms only ● no scoring ● faster

Slide 49

Slide 49 text

Queries vs Filters
Queries: ● full text & terms ● relevance scoring ● slower ● no caching
Filters: ● terms only ● no scoring ● faster ● cacheable

Slide 50

Slide 50 text

Queries vs Filters
Queries: ● full text & terms ● relevance scoring ● slower ● no caching
Filters: ● terms only ● no scoring ● faster ● cacheable
Use filters for anything that doesn't affect the relevance score!

Slide 51

Slide 51 text

Query only Query DSL: $es->search( query => { text => { title => 'perl' } } ); SearchBuilder: $es->search( queryb => { title => 'perl' } );

Slide 52

Slide 52 text

Filter only Query DSL: $es->search( query => { constant_score => { filter => { term => { tag => 'perl' }} } }); SearchBuilder: $es->search( queryb => { -filter => { tag => 'perl' } });

Slide 53

Slide 53 text

Query and filter Query DSL: $es->search( query => { filtered => { query => { text => { title => 'perl' }}, filter =>{ term => { tag => 'perl' }} } }); SearchBuilder: $es->search( queryb => { title => 'perl', -filter => { tag => 'perl' } });
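The division of labour in a filtered query can be modelled in Perl. The filter gives a yes/no answer per document and its result is cached for reuse. The query only scores the documents that pass. The documents, the cache key format and the crude term-count score are all invented for illustration; real relevance scoring is far richer.

```perl
use strict;
use warnings;

my @docs = (
    { id => 1, title => [qw(perl is great)],     tag => 'perl' },
    { id => 2, title => [qw(great perl tips)],   tag => 'opinion' },
    { id => 3, title => [qw(learning perl perl)], tag => 'perl' },
);

my %filter_cache;   # "field=value" => { doc id => 1 }

# Filter: yes/no per document, computed once and then reused.
sub term_filter {
    my ($field, $value) = @_;
    my $key = "$field=$value";
    $filter_cache{$key} //= { map { $_->{id} => 1 } grep { $_->{$field} eq $value } @docs };
    return $filter_cache{$key};
}

# Query: crude relevance, counting occurrences of the term in the title.
sub score {
    my ($doc, $term) = @_;
    return scalar grep { $_ eq $term } @{ $doc->{title} };
}

my $allowed = term_filter(tag => 'perl');
my @hits = sort { $b->[1] <=> $a->[1] }
           map  { [ $_->{id}, score($_, 'perl') ] }
           grep { $allowed->{ $_->{id} } } @docs;

print join(' ', map { "$_->[0]:$_->[1]" } @hits), "\n";   # 3:2 1:1
```

Document 2 contains 'perl' but never gets scored, because the filter has already excluded it.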

Slide 54

Slide 54 text

Filters

Slide 55

Slide 55 text

Filters : equality Query DSL: { term => { tags => 'perl' }} { terms => { tags => ['perl','ruby'] }} SearchBuilder: { tags => 'perl' } { tags => ['perl','ruby'] }

Slide 56

Slide 56 text

Filters: range Query DSL: { range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}} SearchBuilder: { date => { gte => '2010-11-01', lt => '2010-12-01' }}

Slide 57

Slide 57 text

Filters: range (many values) Query DSL: { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' }}} SearchBuilder: { date => { '>=' => '2010-11-01', '<' => '2010-12-01' }}

Slide 58

Slide 58 text

Filters : and | or | not Query DSL: { and => [ {term=>{X=>1}}, {term=>{Y=>2}} ]} { or => [ {term=>{X=>1}}, {term=>{Y=>2}} ]} { not => { or => [ {term=>{X=>1}}, {term=>{Y=>2}} ] }} SearchBuilder: { X => 1, Y => 2 } [ X => 1, Y => 2 ] { -not => { X => 1, Y => 2 } } # and { -not => [ X => 1, Y => 2 ] } # or

Slide 59

Slide 59 text

Filters : exists | missing Query DSL: { exists => { field => 'title' }} { missing => { field => 'title' }} SearchBuilder: { -exists => 'title' } { -missing => 'title' }

Slide 60

Slide 60 text

Filter example SearchBuilder: { -filter => [ featured => 1, { created_at => { gt => '2011-08-01' }, status => { '!=' => 'pending' }, }, ] }

Slide 61

Slide 61 text

Filter example Query DSL: { constant_score => { filter => { or => [ { term => { featured => 1 }}, { and => [ { not => { term => { status => 'pending' }}}, { range => { created_at => { gt => '2011-08-01' }}}, ] } ] } } }
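To make the logic of that filter tree concrete, here is a small Perl evaluator that checks it against documents held as hashes. It is a sketch: it handles only term, range (gt/gte/lt/lte), and/or/not, exists and missing. It compares dates as 'YYYY-MM-DD' strings, which sort chronologically. The `matches` helper and the sample documents are hypothetical.

```perl
use strict;
use warnings;

sub matches {
    my ($filter, $doc) = @_;
    my ($type, $arg) = %$filter;                       # each clause is a one-key hash

    if ($type eq 'term') {
        my ($field, $value) = %$arg;
        return defined $doc->{$field} && $doc->{$field} eq $value;
    }
    if ($type eq 'range') {
        my ($field, $bounds) = %$arg;
        my $v = $doc->{$field};
        return 0 unless defined $v;
        my %cmp = (
            gt => sub { $v gt $_[0] }, gte => sub { $v ge $_[0] },
            lt => sub { $v lt $_[0] }, lte => sub { $v le $_[0] },
        );
        for my $op (keys %$bounds) {
            my $test = $cmp{$op} or die "unsupported range op: $op\n";
            return 0 unless $test->($bounds->{$op});
        }
        return 1;
    }
    return !grep { !matches($_, $doc) } @$arg      if $type eq 'and';
    return scalar grep { matches($_, $doc) } @$arg if $type eq 'or';
    return !matches($arg, $doc)                    if $type eq 'not';
    return defined $doc->{ $arg->{field} }         if $type eq 'exists';
    return !defined $doc->{ $arg->{field} }        if $type eq 'missing';
    die "unsupported filter: $type\n";
}

# The filter from the slide: featured, OR (not pending AND created after Aug 1st).
my $filter = {
    or => [
        { term => { featured => 1 } },
        { and => [
            { not   => { term => { status => 'pending' } } },
            { range => { created_at => { gt => '2011-08-01' } } },
        ] },
    ],
};

my @docs = (
    { featured => 1, status => 'pending',   created_at => '2011-07-01' },   # featured
    { featured => 0, status => 'published', created_at => '2011-08-15' },   # recent, not pending
    { featured => 0, status => 'pending',   created_at => '2011-08-15' },   # pending
);
print join(' ', map { matches($filter, $_) ? 'yes' : 'no' } @docs), "\n";   # yes yes no
```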

Slide 62

Slide 62 text

Filters : others ● script ● nested ● has_child ● query ● match_all ● prefix ● limit ● ids ● type ● geo_distance ● geo_distance_range ● geo_bbox ● geo_polygon

Slide 63

Slide 63 text

Queries
Text / Analyzed: ● text ● query_string / field ● flt / flt_field ● mlt / mlt_field
Term / Not analyzed: ● term / terms ● range ● prefix ● fuzzy ● wildcard ● ids ● span queries
Combining: ● bool ● dis_max ● boosting
Scripting: ● custom_score ● custom_filters_score
Wrappers: ● match_all ● constant_score ● filtered
“Joins”: ● nested ● has_child ● top_children

Slide 65

Slide 65 text

Text/Analyzed Queries: mapping-aware

Slide 66

Slide 66 text

Text/Analyzed Queries not_analyzed ⇒ term query

Slide 67

Slide 67 text

Text/Analyzed Queries analyzed ⇒ text query using search_analyzer
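The "mapping-aware" behaviour can be sketched in Perl. The function looks up the field's mapping and then builds either one exact term query or one term query per analysed term. The mapping, the toy search analyser and the expansion into a bool/should query are assumptions for illustration, not ElasticSearch internals. The helper names are hypothetical.

```perl
use strict;
use warnings;

my %mapping = (
    title  => { type => 'string', index => 'analyzed' },
    status => { type => 'string', index => 'not_analyzed' },
);

# Toy stand-in for the field's search_analyzer.
sub search_analyze { map { lc($_) } grep { length } split /\W+/, $_[0] }

sub expand_text_query {
    my ($field, $text) = @_;
    my $index = $mapping{$field}{index} // 'analyzed';
    return { term => { $field => $text } } if $index eq 'not_analyzed';   # exact value
    my @terms = search_analyze($text);
    return { bool => { should => [ map { +{ term => { $field => $_ } } } @terms ] } };
}

my $q1 = expand_text_query(status => 'Open');
my $q2 = expand_text_query(title  => 'Great Perl');
# $q1: { term => { status => 'Open' } }
# $q2: { bool => { should => [ { term => { title => 'great' } },
#                              { term => { title => 'perl'  } } ] } }
```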

Slide 68

Slide 68 text

Text-Query Family Query DSL: { text => { title => 'great perl' }} Search Builder: { title => 'great perl' }

Slide 69

Slide 69 text

Text-Query Family Query DSL: { text => { title => { query => 'great perl' }}} Search Builder: { title => { '=' => { query => 'great perl' }}}

Slide 70

Slide 70 text

Text-Query Family Query DSL: { text => { title => { query => 'great perl' , operator => 'and' }}} Search Builder: { title => { '=' => { query => 'great perl', operator => 'and' }}}

Slide 71

Slide 71 text

Text-Query Family Query DSL: { text => { title => { query => 'great perl' , fuzziness => 0.5 }}} Search Builder: { title => { '=' => { query => 'great perl', fuzziness => 0.5 }}}

Slide 72

Slide 72 text

Text-Query Family Query DSL: { text => { title => { query => 'great perl', type => 'phrase' }}} Search Builder: { title => { '==' => { query => 'great perl', }}}

Slide 74

Slide 74 text

Text-Query Family Query DSL: { text => { title => { query => 'perl is great', type => 'phrase' }}} Search Builder: { title => { '==' => { query => 'perl is great', }}}

Slide 75

Slide 75 text

Text-Query Family Query DSL: { text => { title => { query => 'perl great', type => 'phrase', slop => 3 }}} Search Builder: { title => { '==' => { query => 'perl great', slop => 3 }}}
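How `slop` loosens a phrase can be shown with a simplified Perl sketch. For each query term i found at document position p_i, it takes the "match length" max(p_i - i) - min(p_i - i). The phrase matches if some choice of positions gives a length no greater than the slop. This follows the idea behind Lucene's sloppy phrase matching for distinct terms, but it is not that implementation.

```perl
use strict;
use warnings;

sub positions {
    my ($tokens, $term) = @_;
    return grep { $tokens->[$_] eq $term } 0 .. $#$tokens;
}

sub phrase_match {
    my ($tokens, $query, $slop) = @_;
    my @combos = ([]);                                   # candidate position choices
    for my $term (@$query) {
        my @pos = positions($tokens, $term);
        return 0 unless @pos;                            # a missing term never matches
        @combos = map { my $c = $_; map { [ @$c, $_ ] } @pos } @combos;
    }
    for my $combo (@combos) {
        my @offsets = map { $combo->[$_] - $_ } 0 .. $#$combo;
        my ($min, $max) = (sort { $a <=> $b } @offsets)[0, -1];
        return 1 if $max - $min <= $slop;
    }
    return 0;
}

my @doc = qw(perl is great);                             # "Perl is GREAT!" after analysis
print phrase_match(\@doc, [qw(perl great)], 0), "\n";    # 0
print phrase_match(\@doc, [qw(perl great)], 1), "\n";    # 1
print phrase_match(\@doc, [qw(great perl)], 2), "\n";    # 0
print phrase_match(\@doc, [qw(great perl)], 3), "\n";    # 1
```

Reversing the word order costs more slop than leaving a gap, which is why 'great perl' needs a slop of 3 here.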

Slide 76

Slide 76 text

Text-Query Family Query DSL: { text => { title => { query => 'perl is gr', type => 'phrase_prefix', }}} Search Builder: { title => { '^' => { query => 'perl is gr', }}}
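A rough Perl sketch of `phrase_prefix` follows. Every term is used as-is except the last, which is treated as a prefix and expanded against the terms in the index. The term list, the helper name and the `$max_expansions` cap are hypothetical illustrations.

```perl
use strict;
use warnings;

my @index_terms = sort qw(perl is great green grep ruby);   # invented term dictionary

sub expand_phrase_prefix {
    my ($text, $max_expansions) = @_;
    my @words    = split ' ', lc $text;
    my $prefix   = pop @words;                               # last word is the prefix
    my @expanded = grep { index($_, $prefix) == 0 } @index_terms;
    splice(@expanded, $max_expansions) if @expanded > $max_expansions;
    return map { [ @words, $_ ] } @expanded;                 # one candidate phrase each
}

print join(' | ', map { "@$_" } expand_phrase_prefix('perl is gr', 10)), "\n";
# perl is great | perl is green | perl is grep
```

Capping the expansions matters because a very short prefix could match a large part of the term dictionary.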

Slide 77

Slide 77 text

Query string / Field: Lucene Query Syntax aware
"perl is great"~5 AND author:clint* -deleted

Slide 78

Slide 78 text

Query string / Field: Syntax errors:
AND perl is great" author: clint* -

Slide 79

Slide 79 text

Query string / Field: Syntax errors:
AND perl is great" author: clint* -
ElasticSearch::QueryParser
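A different technique from ElasticSearch::QueryParser is to neutralise query syntax rather than repair it. Every Lucene special character is backslash-escaped, and upper-case operator words are lowercased. This sketch uses a hypothetical helper name. It also disables operators that users might have wanted, so it suits plain search boxes rather than power users.

```perl
use strict;
use warnings;

my $special = qr{[-+!(){}\[\]^"~*?:\\/&|]};   # Lucene query-syntax characters

sub escape_query_string {
    my ($text) = @_;
    $text =~ s/\b(AND|OR|NOT)\b/\L$1/g;        # operators only count in upper case
    $text =~ s/($special)/\\$1/g;              # backslash-escape special characters
    return $text;
}

print escape_query_string(q{AND perl is great" author: clint* -}), "\n";
# and perl is great\" author\: clint\* \-
```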

Slide 80

Slide 80 text

Combining: Bool Query DSL: { bool => { must => [ { term => { foo => 1}}, ... ], must_not => [ { term => { bar => 1}}, ... ], should => [ { term => { X => 2}}, { term => { Y => 2}},... ], minimum_number_should_match => 1, }}

Slide 81

Slide 81 text

Combining: Bool SearchBuilder: { foo => 1, bar => { '!=' => 1}, -or => [ X => 2, Y => 2], } { -bool => { must => { foo => 1 }, must_not => { bar => 1 }, should => [{ X => 2}, { Y => 2 }], minimum_number_should_match => 1, }}

Slide 82

Slide 82 text

Combining: DisMax Query DSL: { dis_max => { queries => [ { term => { foo => 1}}, { term => { bar => 1}}, ] }} SearchBuilder: { -dis_max => [ { foo => 1 }, { bar => 1 }, ], }

Slide 83

Slide 83 text

Bool: combines (adds up) the scores of all matching clauses
DisMax: uses the highest score from all matching clauses
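The difference in how clause scores are combined can be checked with a little arithmetic in Perl. This toy sketch ignores Lucene's coord and normalisation factors. It includes dis_max's `tie_breaker`, which adds a fraction of the non-best scores. The clause scores are invented.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

sub bool_score { sum(0, @_) }                                 # add up matching clause scores

sub dis_max_score {
    my ($tie_breaker, @scores) = @_;
    return 0 unless @scores;
    my $best = max(@scores);
    return $best + $tie_breaker * (sum(@scores) - $best);     # others count only via tie_breaker
}

my @clause_scores = (0.5, 0.3);   # e.g. title and content both match
printf "bool:    %.2f\n", bool_score(@clause_scores);           # bool:    0.80
printf "dis_max: %.2f\n", dis_max_score(0,   @clause_scores);   # dis_max: 0.50
printf "tie 0.1: %.2f\n", dis_max_score(0.1, @clause_scores);   # tie 0.1: 0.53
```

With bool, a document matching "perl" in many weak fields can outrank one with a single strong match. dis_max avoids that.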

Slide 84

Slide 84 text

Tweaking relevance:

Slide 85

Slide 85 text

Tweaking relevance: Boosting

Slide 86

Slide 86 text

Boosting: at index time { properties => { content => { type => "string" }, title => { type => "string" }, }, }

Slide 87

Slide 87 text

Boosting: at index time { properties => { content => { type => "string" }, title => { type => "string", boost => 2, }, }, }

Slide 88

Slide 88 text

Boosting: at index time { properties => { content => { type => "string" }, title => { type => "string", boost => 2, }, rank => { type => "integer" }, }, _boost => { name => 'rank', null_value => 1.0 }, }

Slide 89

Slide 89 text

Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => 'perl' }}, ] }} SearchBuilder: { content => 'perl', title => 'perl' }

Slide 90

Slide 90 text

Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => { query => 'perl', }}}, ] }} SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl' }} }

Slide 91

Slide 91 text

Boosting: at search time Query DSL: { bool => { should => [ { text => { content => 'perl' }}, { text => { title => { query => 'perl', boost => 2 }}}, ] }} SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl', boost => 2 }} }

Slide 92

Slide 92 text

Boosting: custom_score Query DSL: { custom_score => { query => { text => { title => 'perl' }}, script => "_score * foo / doc['rank'].value", }} SearchBuilder: { -custom_score => { query => { title => 'perl' }, script => "_score * foo / doc['rank'].value", }}
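As a quick check, here is the arithmetic of that script re-expressed in Perl for a single document. It assumes `foo` is a value supplied to the script (for example as a script parameter). The values below are invented. Note that a `rank` of 0 would make the script divide by zero.

```perl
use strict;
use warnings;

# Perl re-expression of "_score * foo / doc['rank'].value" for one document.
sub custom_score {
    my (%args) = @_;
    my ($score, $foo, $rank) = @args{qw(score foo rank)};
    die "rank must be non-zero\n" unless $rank;   # guard the division
    return $score * $foo / $rank;
}

printf "%.2f\n", custom_score(score => 1.5, foo => 2, rank => 3);   # 1.00
```

Because `rank` is the divisor, higher-ranked values lower the final score with this formula, which may or may not be the intended direction.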

Slide 93

Slide 93 text

Query example SearchBuilder: { -or => [ title => { '=' => { query => 'custom score', boost => 2 }}, content => 'custom score', ], -filter => { repo => 'elasticsearch/elasticsearch', created_at => { '>=' => '2011-07-01', '<' => '2011-08-01'}, -or => [ creator_id => 123, assignee_id => 123, ], labels => ['bug','breaking'] } }

Slide 94

Slide 94 text

Query example Query DSL:
{ query => {
    filtered => {
        query => {
            bool => {
                should => [
                    { text => { content => "custom score" } },
                    { text => { title => { boost => 2, query => "custom score" } } },
                ],
            },
        },
        filter => {
            and => [
                { or => [
                    { term => { creator_id  => 123 } },
                    { term => { assignee_id => 123 } },
                ]},
                { terms => { labels => [ "bug", "breaking" ] } },
                { term  => { repo => "elasticsearch/elasticsearch" } },
                { numeric_range => { created_at => { gte => "2011-07-01", lt => "2011-08-01" } } },
            ],
        },
    },
}}

Slide 95

Slide 95 text

No content

Slide 96

Slide 96 text

https://github.com/clintongormley/GitHubSearch