Terms of endearment - the ElasticSearch Query DSL explained

Given at YAPC::EU 2011

Clinton Gormley

August 17, 2011

Transcript

  1. Document:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }
  2. Fields:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }
  3. Values:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }
  4. Field types:

    {                                     # object
        tweet  => "Perl is GREAT!",       # string
        posted => "2011-08-15",           # date
        user   => {                       # nested object
            name  => "Clinton Gormley",   # string
            email => 'drtech@cpan.org',   # string
        },
        tags   => ["perl","opinion"],     # array of enums
        posts  => 2,                      # integer
    }
  5. Nested objects flattened:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user => { name => "Clinton Gormley", email => 'drtech@cpan.org', }, tags => ["perl","opinion"], posts => 2, }
  6. Nested objects flattened:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user.name => "Clinton Gormley", user.email => 'drtech@cpan.org', tags => ["perl","opinion"], posts => 2, }
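To make the flattening concrete, here is a small pure-Perl sketch that turns nested hashes into dotted field names, as on the slide. `flatten_doc` is a hypothetical helper written for illustration; it is not part of ElasticSearch or its Perl client, and arrays such as `tags` are simply kept as they are.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten nested hashes into dotted field names,
# e.g. { user => { name => 'x' } } becomes { 'user.name' => 'x' }.
# Arrays and scalars are copied unchanged.
sub flatten_doc {
    my ( $doc, $prefix ) = @_;
    my %flat;
    for my $key ( keys %$doc ) {
        my $name  = defined $prefix ? "$prefix.$key" : $key;
        my $value = $doc->{$key};
        if ( ref $value eq 'HASH' ) {
            # Recurse into the nested object, extending the dotted prefix
            my $inner = flatten_doc( $value, $name );
            @flat{ keys %$inner } = values %$inner;
        }
        else {
            $flat{$name} = $value;
        }
    }
    return \%flat;
}

my $flat = flatten_doc({
    tweet  => 'Perl is GREAT!',
    posted => '2011-08-15',
    user   => { name => 'Clinton Gormley', email => 'drtech@cpan.org' },
    tags   => [ 'perl', 'opinion' ],
    posts  => 2,
});

# Print each flattened field; array values are shown in brackets
for my $field ( sort keys %$flat ) {
    my $v = $flat->{$field};
    print "$field => ", ( ref $v ? "[@$v]" : $v ), "\n";
}
```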
  7. Values analyzed into terms:

    { tweet => "Perl is GREAT!", posted => "2011-08-15", user.name => "Clinton Gormley", user.email => 'drtech@cpan.org', tags => ["perl","opinion"], posts => 2, }
  8. Values analyzed into terms:

    { tweet => ['perl','great'], posted => [Date(2011-08-15)], user.name => ['clinton','gormley'], user.email => ['drtech','cpan.org'], tags => ['perl','opinion'], posts => [2], }
  9. In MySQL:

    database ⇒ many tables
    table    ⇒ many rows
             ⇒ one schema
    row      ⇒ many columns
  10. In ElasticSearch:

    index    ⇒ many types
    type     ⇒ many documents
             ⇒ one mapping
    document ⇒ many fields
  11. Create index with mappings:

    $es->create_index(
        index    => 'twitter',
        mappings => {
            tweet => {
                properties => {
                    title   => { type => 'string' },
                    created => { type => 'date' }
                }
            }
        }
    );
  12. Add a mapping:

    $es->put_mapping(
        index   => 'twitter',
        type    => 'user',
        mapping => {
            properties => {
                name    => { type => 'string' },
                created => { type => 'date' },
            }
        }
    );
  13. Core field types:

    {
        type => 'string',
        # byte | short | integer | long | double | float
        # date, ip addr, geolocation
        # boolean
        # binary (as base64)
    }
  14. Core field types:

    {
        type            => 'string',
        index           => 'analyzed',
        index_analyzer  => 'default',
        search_analyzer => 'default',
    }
  15. Core field types:

    {
        type     => 'string',
        index    => 'analyzed',
        analyzer => 'default',
        boost    => 2,
    }
  16. Core field types:

    {
        type           => 'string',
        index          => 'analyzed',
        analyzer       => 'default',
        boost          => 2,
        include_in_all => 1,    # or 0
    }
  17. Built-in analyzers:

    • Standard • Simple • Whitespace • Stop • Keyword • Pattern • Language • Snowball • Custom
  18. Input: The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com

    keyword:            The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com
    whitespace:         The, Brown-Cow's, Part_No., #A.BC123-456, joe@bloggs.com
    simple:             the, brown, cow, s, part, no, a, bc, joe, bloggs, com
    standard:           brown, cow's, part_no, a.bc123, 456, joe, bloggs.com
    snowball (English): brown, cow, part_no, a.bc123, 456, joe, bloggs.com
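The whitespace and simple rows above are easy to reproduce with regular expressions, which shows how little these two analyzers do. This is an approximation assuming plain ASCII input; the standard and snowball rows depend on Lucene's tokenizer grammar, stop words and stemmer, and are not modelled here. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# Approximates the whitespace analyzer: split on whitespace, keep case.
sub whitespace_analyzer {
    my ($text) = @_;
    return grep { length } split /\s+/, $text;
}

# Approximates the simple analyzer: split on anything that is not a
# letter (digits and punctuation are discarded), then lowercase.
sub simple_analyzer {
    my ($text) = @_;
    return map { lc $_ } grep { length } split /[^[:alpha:]]+/, $text;
}

my $input = q{The Brown-Cow's Part_No. #A.BC123-456 joe@bloggs.com};
print 'whitespace: ', join( ', ', whitespace_analyzer($input) ), "\n";
print 'simple:     ', join( ', ', simple_analyzer($input) ),     "\n";
```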
  19. Token filters:

    • Standard • ASCII Folding • Length • Lowercase • NGram • Edge NGram • Porter Stem • Shingle • Stop • Word Delimiter • Stemmer • KStem • Snowball • Phonetic • Synonym • Compound Word • Reverse • Elision • Truncate • Unique
  20. Custom analyzer:

    $es->create_index(
        index    => 'twitter',
        settings => {
            analysis => {
                analyzer => {
                    ascii_html => {
                        type        => 'custom',
                        tokenizer   => 'standard',
                        filter      => [qw( standard lowercase asciifolding stop )],
                        char_filter => ['html_strip'],
                    }
                }
            }
        }
    );
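A custom analyzer runs its char filters first, then the tokenizer, then each token filter in the listed order. The sketch below imitates the ascii_html analyzer in pure Perl under loose assumptions: tags are stripped with a crude regex, splitting on `\W+` stands in for the standard tokenizer, the standard token filter is skipped, ASCII folding only removes accents via Unicode decomposition, and the stop list is assumed to match Lucene's default English set. `ascii_html` here is a hypothetical Perl sub, not the real analyzer.

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Assumed to match Lucene's default English stop words
my %STOP = map { $_ => 1 } qw(
    a an and are as at be but by for if in into is it no not of on or
    such that the their then there these they this to was will with
);

# Rough imitation of the 'ascii_html' custom analyzer
sub ascii_html {
    my ($text) = @_;

    # char_filter html_strip (crude): replace tags with spaces
    $text =~ s/<[^>]*>/ /g;

    # tokenizer: split on non-word characters (stand-in for 'standard')
    my @tokens = grep { length } split /\W+/, $text;

    # token filters, in the listed order ('standard' filter skipped)
    @tokens = map { lc $_ } @tokens;                                   # lowercase
    @tokens = map { ( my $t = NFD($_) ) =~ s/\p{Mn}//g; $t } @tokens;  # asciifolding (accents only)
    return grep { !$STOP{$_} } @tokens;                                # stop
}

print join( ', ', ascii_html('<p>The Café is <b>OPEN</b></p>') ), "\n";
```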
  21. Searching:

    $result = $es->search(
        index  => 'twitter',
        type   => 'tweet',
        queryb => 'foo',    # b == ElasticSearch::SearchBuilder
    );
  22. Searching:

    $result = $es->search(
        index => 'twitter',
        type  => 'tweet',
        query => { text => { _all => 'foo' } },
        sort  => [ { '_score' => 'desc' } ],
    );
  23. Searching:

    $result = $es->search(
        index => 'twitter',
        type  => 'tweet',
        query => { text => { _all => 'foo' } },
        sort  => [ { '_score' => 'desc' } ],
        from  => 0,
        size  => 10,
    );
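These search parameters end up as a JSON request body sent to the `_search` endpoint of the index and type. Here is a sketch that only builds the path and body, without sending anything, assuming the `/index/type/_search` URL layout ElasticSearch used at the time; `build_search_request` is a hypothetical helper, not a method of the Perl client.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: build the path and JSON body for a search like
# the one on the slide. Nothing is sent over the network.
sub build_search_request {
    my (%args) = @_;
    my $path = join( '/', '', $args{index}, $args{type}, '_search' );
    my %body = ( query => $args{query} );
    for my $key (qw(sort from size)) {
        $body{$key} = $args{$key} if exists $args{$key};
    }
    # canonical => keys are sorted, so the output is deterministic
    return ( $path, JSON::PP->new->canonical->encode( \%body ) );
}

my ( $path, $json ) = build_search_request(
    index => 'twitter',
    type  => 'tweet',
    query => { text => { _all => 'foo' } },
    sort  => [ { _score => 'desc' } ],
    from  => 0,
    size  => 10,
);
print "$path\n$json\n";
```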
  24. Queries vs Filters

    Queries: • full text & terms • relevance scoring
    Filters: • terms only • no scoring
  25. Queries vs Filters

    Queries: • full text & terms • relevance scoring • slower
    Filters: • terms only • no scoring • faster
  26. Queries vs Filters

    Queries: • full text & terms • relevance scoring • slower • no caching
    Filters: • terms only • no scoring • faster • cacheable
  27. Queries vs Filters

    Queries: • full text & terms • relevance scoring • slower • no caching
    Filters: • terms only • no scoring • faster • cacheable

    Use filters for anything that doesn't affect the relevance score!
  28. Query only

    Query DSL:     $es->search( query => { text => { title => 'perl' } } );
    SearchBuilder: $es->search( queryb => { title => 'perl' } );
  29. Filter only

    Query DSL:     $es->search( query => { constant_score => { filter => { term => { tag => 'perl' } } } } );
    SearchBuilder: $es->search( queryb => { -filter => { tag => 'perl' } } );
  30. Query and filter

    Query DSL:
    $es->search( query => {
        filtered => {
            query  => { text => { title => 'perl' } },
            filter => { term => { tag => 'perl' } },
        }
    } );

    SearchBuilder:
    $es->search( queryb => { title => 'perl', -filter => { tag => 'perl' } } );
  31. Filters: equality

    Query DSL:     { term  => { tags => 'perl' } }
                   { terms => { tags => ['perl','ruby'] } }
    SearchBuilder: { tags => 'perl' }
                   { tags => ['perl','ruby'] }
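The SearchBuilder shorthand hinges on the type of the value: a scalar becomes a `term` filter, an arrayref a `terms` filter. A hypothetical helper (not SearchBuilder's actual code) that performs just that mapping:

```perl
use strict;
use warnings;

# Hypothetical helper: { field => value } becomes a term filter,
# { field => [values] } becomes a terms filter.
sub equality_filter {
    my ( $field, $value ) = @_;
    return ref $value eq 'ARRAY'
        ? { terms => { $field => [@$value] } }    # copy the list
        : { term  => { $field => $value } };
}

my $one  = equality_filter( tags => 'perl' );
my $many = equality_filter( tags => [ 'perl', 'ruby' ] );
```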
  32. Filters: range

    Query DSL:     { range => { date => { gte => '2010-11-01', lt => '2010-12-01' } } }
    SearchBuilder: { date => { gte => '2010-11-01', lt => '2010-12-01' } }
  33. Filters: range (many values)

    Query DSL:     { numeric_range => { date => { gte => '2010-11-01', lt => '2010-12-01' } } }
    SearchBuilder: { date => { '>=' => '2010-11-01', '<' => '2010-12-01' } }
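Slides 32 and 33 spell the bounds two ways (`gte`/`lt` and `'>='`/`'<'`). A hypothetical helper that accepts either spelling and normalizes it to the range parameter names; the choice between `range` and `numeric_range` is left to the caller rather than guessing how SearchBuilder decides.

```perl
use strict;
use warnings;

# Hypothetical mapping from comparison operators to range parameters
my %OP = ( '>' => 'gt', '>=' => 'gte', '<' => 'lt', '<=' => 'lte' );

# $type is 'range' or 'numeric_range'; $cond is e.g. { '>=' => ..., lt => ... }
sub range_filter {
    my ( $type, $field, $cond ) = @_;
    my %params;
    while ( my ( $op, $value ) = each %$cond ) {
        my $name = $OP{$op} // $op;    # 'gte' etc. pass through unchanged
        die "unknown operator '$op'\n" unless $name =~ /^(?:gt|gte|lt|lte)$/;
        $params{$name} = $value;
    }
    return +{ $type => { $field => \%params } };
}

my $f = range_filter( numeric_range => date => { '>=' => '2010-11-01', '<' => '2010-12-01' } );
my $g = range_filter( range => date => { gte => '2010-11-01', lt => '2010-12-01' } );
```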
  34. Filters: and | or | not

    Query DSL:
    { and => [ { term => { X => 1 } }, { term => { Y => 2 } } ] }
    { or  => [ { term => { X => 1 } }, { term => { Y => 2 } } ] }
    { not => { or => [ { term => { X => 1 } }, { term => { Y => 2 } } ] } }

    SearchBuilder:
    { X => 1, Y => 2 }                # and
    [ X => 1, Y => 2 ]                # or
    { -not => { X => 1, Y => 2 } }    # not (and)
    { -not => [ X => 1, Y => 2 ] }    # not (or)
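The rule behind this shorthand is that a hashref means "and", an arrayref of pairs means "or", and -not wraps either. A hypothetical translator that follows those rules (scalar values only, producing `term` filters); it is an illustration, not SearchBuilder's implementation.

```perl
use strict;
use warnings;

# Hypothetical translator:
#   { X => 1, Y => 2 }  -> and of term filters
#   [ X => 1, Y => 2 ]  -> or of term filters
#   { -not => SPEC }    -> not wrapping the translated SPEC
sub to_filter {
    my ($spec) = @_;
    if ( ref $spec eq 'ARRAY' ) {
        my @pairs = @$spec;
        my @terms;
        while ( my ( $field, $value ) = splice @pairs, 0, 2 ) {
            push @terms, { term => { $field => $value } };
        }
        return { or => \@terms };
    }
    if ( exists $spec->{-not} ) {
        return { not => to_filter( $spec->{-not} ) };
    }
    # Sort keys so the generated 'and' list has a stable order
    return { and => [ map { +{ term => { $_ => $spec->{$_} } } } sort keys %$spec ] };
}

my $and = to_filter( { X => 1, Y => 2 } );
my $or  = to_filter( [ X => 1, Y => 2 ] );
my $nor = to_filter( { -not => [ X => 1, Y => 2 ] } );
```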
  35. Filters: exists | missing

    Query DSL:     { exists  => { field => 'title' } }
                   { missing => { field => 'title' } }
    SearchBuilder: { -exists  => 'title' }
                   { -missing => 'title' }
  36. Filter example

    SearchBuilder:
    { -filter => [
        featured => 1,
        {   created_at => { gt => '2011-08-01' },
            status     => { '!=' => 'pending' },
        },
    ] }
  37. Filter example

    Query DSL:
    { constant_score => {
        filter => {
            or => [
                { term => { featured => 1 } },
                { and => [
                    { not   => { term => { status => 'pending' } } },
                    { range => { created_at => { gt => '2011-08-01' } } },
                ] },
            ]
        }
    } }
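To check what this filter actually selects, here is a toy evaluator that applies it to plain Perl hashes rather than an index. It supports only term, range (compared as strings, which is correct for ISO-8601 dates), and, or and not, and it leaves out the constant_score wrapper. `matches` is a hypothetical name, and this is not how ElasticSearch executes filters internally.

```perl
use strict;
use warnings;

# String comparisons for range bounds (fine for ISO-8601 dates)
my %CMP = (
    gt  => sub { $_[0] gt $_[1] },
    gte => sub { $_[0] ge $_[1] },
    lt  => sub { $_[0] lt $_[1] },
    lte => sub { $_[0] le $_[1] },
);

sub matches {
    my ( $doc, $filter ) = @_;
    my ( $type, $arg ) = %$filter;    # each filter is a single-key hash

    if ( $type eq 'and' ) {
        for my $f (@$arg) { return 0 unless matches( $doc, $f ) }
        return 1;
    }
    if ( $type eq 'or' ) {
        for my $f (@$arg) { return 1 if matches( $doc, $f ) }
        return 0;
    }
    if ( $type eq 'not' ) {
        return matches( $doc, $arg ) ? 0 : 1;
    }

    my ( $field, $want ) = %$arg;
    my $have = $doc->{$field};
    return 0 unless defined $have;    # missing field never matches
    if ( $type eq 'term' ) {
        return $have eq $want ? 1 : 0;
    }
    if ( $type eq 'range' ) {
        for my $op ( keys %$want ) {
            return 0 unless $CMP{$op}->( $have, $want->{$op} );
        }
        return 1;
    }
    die "unsupported filter type '$type'\n";
}

# The filter from the slide, without the constant_score wrapper
my $filter = {
    or => [
        { term => { featured => 1 } },
        {   and => [
                { not   => { term => { status => 'pending' } } },
                { range => { created_at => { gt => '2011-08-01' } } },
            ]
        },
    ]
};

my @docs = (
    { featured => 1, status => 'pending', created_at => '2011-07-01' },
    { featured => 0, status => 'open',    created_at => '2011-08-10' },
    { featured => 0, status => 'pending', created_at => '2011-08-10' },
    { featured => 0, status => 'open',    created_at => '2011-07-15' },
);
print join( ' ', map { matches( $_, $filter ) } @docs ), "\n";    # prints: 1 1 0 0
```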
  38. Filters: others

    • script • nested • has_child • query • match_all • prefix • limit • ids • type • geo_distance • geo_distance_range • geo_bbox • geo_polygon
  39. Queries

    Text / Analyzed:      • text • query_string / field • flt / flt_field • mlt / mlt_field
    Term / Not analyzed:  • term / terms • range • prefix • fuzzy • wildcard • ids • span queries
    Combining:            • bool • dis_max • boosting
    Scripting:            • custom_score • custom_filters_score
    Wrappers:             • match_all • constant_score • filtered
    “Joins”:              • nested • has_child • top_children
  41. Text-Query Family

    Query DSL:     { text => { title => 'great perl' } }
    SearchBuilder: { title => 'great perl' }
  42. Text-Query Family

    Query DSL:     { text => { title => { query => 'great perl' } } }
    SearchBuilder: { title => { '=' => { query => 'great perl' } } }
  43. Text-Query Family

    Query DSL:     { text => { title => { query => 'great perl', operator => 'and' } } }
    SearchBuilder: { title => { '=' => { query => 'great perl', operator => 'and' } } }
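With the default operator, `or`, any analyzed term from the query is enough for a match; `operator => 'and'` requires all of them. A toy model, assuming a simplistic lowercase-and-split analyzer and ignoring scoring; `analyze` and `text_match` are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in analyzer (assumption): split on non-word characters, lowercase
sub analyze { map { lc $_ } grep { length } split /\W+/, $_[0] }

# Does the document match the query under the given operator?
sub text_match {
    my ( $doc_text, $query, $operator ) = @_;
    my %in_doc = map { $_ => 1 } analyze($doc_text);
    my @terms  = analyze($query);
    my $hits   = grep { $in_doc{$_} } @terms;    # number of query terms found
    return ( $operator // 'or' ) eq 'and' ? $hits == @terms : $hits > 0;
}

for my $op (qw(or and)) {
    printf "%-3s: %s\n", $op, text_match( 'Perl is fun', 'great perl', $op ) ? 'match' : 'no match';
}
```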
  44. Text-Query Family

    Query DSL:     { text => { title => { query => 'great perl', fuzziness => 0.5 } } }
    SearchBuilder: { title => { '=' => { query => 'great perl', fuzziness => 0.5 } } }
  45. Text-Query Family

    Query DSL:     { text => { title => { query => 'great perl', type => 'phrase' } } }
    SearchBuilder: { title => { '==' => { query => 'great perl' } } }
  47. Text-Query Family

    Query DSL:     { text => { title => { query => 'perl is great', type => 'phrase' } } }
    SearchBuilder: { title => { '==' => { query => 'perl is great' } } }
  48. Text-Query Family

    Query DSL:     { text => { title => { query => 'perl great', type => 'phrase', slop => 3 } } }
    SearchBuilder: { title => { '==' => { query => 'perl great', slop => 3 } } }
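Slop is how far the terms may move out of position and still count as a phrase. Here is a simplified model, assuming each query term occurs at most once in the document and an analyzer that lowercases and keeps every word: shift each term's position back by its offset in the phrase, then require the spread of the shifted positions to be within the slop. In this model, swapping two adjacent terms costs 2, so the phrase "perl great" needs a slop of 2 to match a document containing "great perl". `phrase_match` is a hypothetical name.

```perl
use strict;
use warnings;

sub phrase_match {
    my ( $doc_text, $phrase, $slop ) = @_;
    my @doc   = map { lc $_ } grep { length } split /\W+/, $doc_text;
    my @terms = map { lc $_ } grep { length } split /\W+/, $phrase;

    # Position of the first occurrence of each document term
    my %pos;
    for my $i ( reverse 0 .. $#doc ) { $pos{ $doc[$i] } = $i }

    # Shift each term's position back by its offset within the phrase
    my @shifted;
    for my $i ( 0 .. $#terms ) {
        return 0 unless exists $pos{ $terms[$i] };
        push @shifted, $pos{ $terms[$i] } - $i;
    }

    # An exact phrase has all shifted positions equal (spread 0)
    my ( $min, $max ) = ( sort { $a <=> $b } @shifted )[ 0, -1 ];
    return $max - $min <= ( $slop // 0 ) ? 1 : 0;
}

for my $slop ( 0 .. 2 ) {
    printf "slop %d: 'perl is great' %d, 'great perl' %d\n", $slop,
        phrase_match( 'perl is great', 'perl great', $slop ),
        phrase_match( 'great perl',    'perl great', $slop );
}
```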
  49. Text-Query Family

    Query DSL:     { text => { title => { query => 'perl is gr', type => 'phrase_prefix' } } }
    SearchBuilder: { title => { '^' => { query => 'perl is gr' } } }
  50. Query string / Field

    Lucene Query Syntax aware:
    "perl is great"~5 AND author:clint* -deleted
  51. Query string / Field

    Syntax errors:               AND perl is great" author: clint* -
    Check or filter them with:   ElasticSearch::QueryParser
  52. Combining: Bool

    Query DSL:
    { bool => {
        must     => [ { term => { foo => 1 } }, ... ],
        must_not => [ { term => { bar => 1 } }, ... ],
        should   => [ { term => { X => 2 } }, { term => { Y => 2 } }, ... ],
        minimum_number_should_match => 1,
    } }
  53. Combining: Bool

    SearchBuilder:
    { foo => 1, bar => { '!=' => 1 }, -or => [ X => 2, Y => 2 ], }

    { -bool => {
        must     => { foo => 1 },
        must_not => { bar => 1 },
        should   => [ { X => 2 }, { Y => 2 } ],
        minimum_number_should_match => 1,
    } }
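The matching rules of `bool` fit in a few lines. A toy sketch over `term` clauses only, ignoring scoring; the default used when `minimum_number_should_match` is absent (at least one `should` clause must match if there are no `must` clauses) is an assumption based on Lucene's boolean query behaviour. `term_hit` and `bool_match` are hypothetical names.

```perl
use strict;
use warnings;

# True if the document has exactly this value for the clause's field
sub term_hit {
    my ( $doc, $clause ) = @_;
    my ( $field, $value ) = %{ $clause->{term} };
    return defined $doc->{$field} && $doc->{$field} eq $value;
}

# Matching rules of a bool query over term clauses (no scoring)
sub bool_match {
    my ( $doc, $bool ) = @_;
    my @must     = @{ $bool->{must}     || [] };
    my @must_not = @{ $bool->{must_not} || [] };
    my @should   = @{ $bool->{should}   || [] };

    for my $clause (@must)     { return 0 unless term_hit( $doc, $clause ) }
    for my $clause (@must_not) { return 0 if term_hit( $doc, $clause ) }

    # Assumed default: with no 'must' clauses, at least one 'should' must match
    my $min  = $bool->{minimum_number_should_match} // ( !@must && @should ? 1 : 0 );
    my $hits = grep { term_hit( $doc, $_ ) } @should;
    return $hits >= $min ? 1 : 0;
}

# The bool query from the slides
my $query = {
    must     => [ { term => { foo => 1 } } ],
    must_not => [ { term => { bar => 1 } } ],
    should   => [ { term => { X => 2 } }, { term => { Y => 2 } } ],
    minimum_number_should_match => 1,
};
print bool_match( { foo => 1, X => 2 }, $query ), "\n";
```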
  54. Combining: DisMax

    Query DSL:     { dis_max => { queries => [ { term => { foo => 1 } }, { term => { bar => 1 } }, ] } }
    SearchBuilder: { -dis_max => [ { term => { foo => 1 } }, { term => { bar => 1 } }, ], }
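The difference from `bool` lies in scoring: `bool` roughly adds up the scores of its matching clauses, whereas `dis_max` takes the best one, optionally adding `tie_breaker` times the others. A sketch of that arithmetic; the sub-query scores are made-up numbers, Lucene's coordination factor for `bool` is ignored, and `dis_max_score` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# dis_max: best sub-query score plus tie_breaker times the other scores
sub dis_max_score {
    my ( $tie_breaker, @scores ) = @_;
    return 0 unless @scores;
    my $best = max(@scores);
    return $best + $tie_breaker * ( sum(@scores) - $best );
}

my @scores = ( 1.2, 0.5 );    # e.g. scores from the foo and bar clauses
printf "bool-like sum:       %.2f\n", sum(@scores);
printf "dis_max, tie 0:      %.2f\n", dis_max_score( 0,   @scores );
printf "dis_max, tie 0.1:    %.2f\n", dis_max_score( 0.1, @scores );
```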
  55. Boosting: at index time

    { properties => {
        content => { type => 'string' },
        title   => { type => 'string' },
    } }
  56. Boosting: at index time

    { properties => {
        content => { type => 'string' },
        title   => { type => 'string', boost => 2, },
    }, }
  57. Boosting: at index time

    {
        properties => {
            content => { type => 'string' },
            title   => { type => 'string', boost => 2, },
            rank    => { type => 'integer' },
        },
        _boost => { name => 'rank', null_value => 1.0 },
    }
  58. Boosting: at search time

    Query DSL:
    { bool => { should => [
        { text => { content => 'perl' } },
        { text => { title   => 'perl' } },
    ] } }

    SearchBuilder: { content => 'perl', title => 'perl' }
  59. Boosting: at search time

    Query DSL:
    { bool => { should => [
        { text => { content => 'perl' } },
        { text => { title   => { query => 'perl' } } },
    ] } }

    SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl' } } }
  60. Boosting: at search time

    Query DSL:
    { bool => { should => [
        { text => { content => 'perl' } },
        { text => { title   => { query => 'perl', boost => 2 } } },
    ] } }

    SearchBuilder: { content => 'perl', title => { '=' => { query => 'perl', boost => 2 } } }
  61. Boosting: custom_score

    Query DSL:
    { custom_score => {
        query  => { text => { title => 'perl' } },
        script => "_score * foo / doc['rank'].value",
    } }

    SearchBuilder:
    { -custom_score => {
        query  => { title => 'perl' },
        script => "_score * foo / doc['rank'].value",
    } }
  62. Query example

    SearchBuilder:
    {
        -or => [
            title   => { '=' => { query => 'custom score', boost => 2 } },
            content => 'custom score',
        ],
        -filter => {
            repo       => 'elasticsearch/elasticsearch',
            created_at => { '>=' => '2011-07-01', '<' => '2011-08-01' },
            -or        => [ creator_id => 123, assignee_id => 123, ],
            labels     => ['bug','breaking'],
        }
    }
  63. Query example

    Query DSL:
    { query => {
        filtered => {
            query => {
                bool => {
                    should => [
                        { text => { content => "custom score" } },
                        { text => { title => { boost => 2, query => "custom score" } } },
                    ],
                },
            },
            filter => {
                and => [
                    { or => [
                        { term => { creator_id  => 123 } },
                        { term => { assignee_id => 123 } },
                    ] },
                    { terms => { labels => ["bug", "breaking"] } },
                    { term  => { repo => "elasticsearch/elasticsearch" } },
                    { numeric_range => { created_at => { gte => "2011-07-01", lt => "2011-08-01" } } },
                ]
            },
        }
    } }
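Because the Query DSL is plain Perl data, a quick way to sanity-check a structure this deep is to let a JSON encoder serialize it. The sketch below uses the core JSON::PP module to print the request body for the example above; it only shows the serialization and does not talk to a server.

```perl
use strict;
use warnings;
use JSON::PP;

# The Query DSL example as a Perl data structure
my $search = {
    query => {
        filtered => {
            query => {
                bool => {
                    should => [
                        { text => { content => 'custom score' } },
                        { text => { title => { boost => 2, query => 'custom score' } } },
                    ],
                },
            },
            filter => {
                and => [
                    {   or => [
                            { term => { creator_id  => 123 } },
                            { term => { assignee_id => 123 } },
                        ]
                    },
                    { terms => { labels => [ 'bug', 'breaking' ] } },
                    { term  => { repo => 'elasticsearch/elasticsearch' } },
                    { numeric_range => { created_at => { gte => '2011-07-01', lt => '2011-08-01' } } },
                ],
            },
        },
    },
};

# canonical: sorted keys; pretty: indented, multi-line output
my $json = JSON::PP->new->canonical->pretty->encode($search);
print $json;
```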