better searching with elasticsearch

Elasticsearch is a distributed, schemaless, document-oriented, Lucene-based search engine with a REST API. This talk looks at what all that actually means in practice, moving from interacting with it directly with cURL to integrating it into PHP applications using Elastica.

Richard Miller

October 23, 2013


Transcript

  1. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  2. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  3. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  4. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  5. Given I am a hungry person When I make a typo entering

    my search terms Then I should see “did you mean?” suggestions
  6. Given I am a hungry person When I start entering

    my search terms Then I should see suggestions straight away
  7. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results
  8. Given I am a hungry person When I view

    details of somewhere to eat Then I should see suggestions of similar eateries
  9. Given I am a restaurateur When I upload my PDF

    menu Then hungry people should be able to search it
  10. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  11. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  12. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  13. the

  14. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  15. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  16. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  17. { "ok" : true, "status" : 200, "name" : "Mr.

    Wu", "version" : { "number" : "0.90.5", "build_hash" : "c8714e8...dedee", "build_timestamp" : "2013-09-17T12:50:20Z", "build_snapshot" : false, "lucene_version" : "4.4" }, "tagline" : "You Know, for Search" } response
  18. { "ok" : true, "_index" : "eatly", "_type" : "eateries",

    "_id" : "cmElnobYSy6386TOUVbGZQ", "_version" : 1 }
  19. { "_index" : "eatly", "_type" : "eateries", "_id" : "1",

    "_version" : 1, "exists" : true, "_source" : { "name" : "Jeff's Burgers", "desc" : "Blah Blah dirty burgers..." } }
  20. { "_index" : "eatly", "_type" : "eateries", "_id" : "1",

    "_version" : 2, "exists" : true, "_source" : { "name" : "Jeff's Burger Joint" } }
  21. { "took" : 7, "timed_out" : false, "_shards" : {

    "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.15342641, "hits" : [ { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_score" : 0.15342641, "_source" : {"name" : "Jeff's Burger Joint"} } ] } }
  22. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" :

    { "number_of_shards" : 4, "number_of_replicas" : 2 } } }'
  23. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  24. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"} } } }'
  25. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  26. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } } } } }'
  27. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc"

    : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ] }'
  28. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  29. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    standard tokenizer: I | really | enjoyed | Jeff's | Burger | Joint | the | burgers | were | excellent | can't | fault | it
  30. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    whitespace tokenizer: I | really | enjoyed | Jeff's | Burger | Joint, | the | burgers | were | excellent! | can't | fault | it
  31. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    letter tokenizer: I | really | enjoyed | Jeff | s | Burger | Joint | the | burgers | were | excellent | can | t | fault | it
  32. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    standard tokenizer + lowercase filter: i | really | enjoyed | jeff's | burger | joint | the | burgers | were | excellent | can't | fault | it
  33. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    standard tokenizer + lowercase filter + stop filter: i | really | enjoyed | jeff's | burger | joint | burgers | were | excellent | can't | fault
  34. I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it

    standard tokenizer + lowercase filter + stop filter + stemmer filter: i | realli | enjoi | jeff | burger | joint | burger | were | excel | can't | fault
  35. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" :

    { "analysis" : { "analyzer" : { "snowball_analyzer" : { "type" : "snowball" } } } } } }'
  36. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "snowball_analyzer" } } } }'
  37. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" :

    { "analysis" : { "analyzer" : { "custom_analyzer" : { "type" : "custom", "tokenizer" : "lowercase", "filter" : ["stop", "extra_stop"] } }, "filter" : { "extra_stop":{ "type" : "stop", "stopwords" : ["food"] } } } } } }'
  38. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "custom_analyzer" } } } }'
  39. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" :

    { "name" : { "query" : "burger bar", "operator" : "and" } } } }'
  40. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name","desc","reviews.review"] } } }'
  41. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'
  42. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results
  43. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burger Joint",

    "desc" : "Blah Blah dirty burgers...", "tags" : ["greasy", "retro"] }'
  44. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name^2", "desc", "reviews.review"] } }, "filter" : { "not" : { "terms" : { "tags" : ["greasy", "meaty"]} } } }'
  45. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  46. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } }, "location" : { "type" : "geo_point" } } } }'
  47. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc"

    : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ], "location" : { "lat" : 41.12, "lon" : -71.34 } }'
  48. curl -XGET '.../eatly/eateries/_search' -d '{ "sort" : [ { "_geo_distance"

    : { "location" : [-40, 70], "order" : "asc", "unit" : "miles" } } ], "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'
  49. $type->addDocument( new \Elastica\Document( $id, array( "id" => $id, "name" =>

    "Jeff's Burger Joint", "desc" => "Blah Blah..." ) ) );
  50. $multiMatchQuery = new \Elastica\Query\MultiMatch(); $multiMatchQuery ->setQuery("burgers bar") ->setFields( array("name^2", "desc",

    "reviews.review") ) ; $query = new \Elastica\Query($multiMatchQuery); $query->setFilter( new \Elastica\Filter\BoolNot( new \Elastica\Filter\Terms( "tags", array("greasy", "meaty") ) ) ); $results = $type->search($query);
  51. $mapping = new \Elastica\Type\Mapping(); $mapping->setType($type); $mapping->setProperties( array( "name" => array(

    "type" => "string" ), "desc" => array( "type" => "string", "boost" => 1.5 ) ) ); $mapping->send();
  52. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance ✔
  53. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews ✔
  54. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words ✔
  55. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location ✔
  56. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results ?
  57. $facets = $resultSet->getFacets(); foreach ($facets["tags"]["terms"] as $facet) { printf( "%s:

    %s\n", $facet["term"], $facet["count"] ); } thai: 4 italian: 8 greek: 4
  58. Given I am a hungry person When I view

    details of somewhere to eat Then I should see suggestions of similar eateries
  59. Given I am a restaurateur When I upload my PDF

    menu Then hungry people should be able to search it
  60. Given I am a hungry person When I make a typo entering

    my search terms Then I should see “did you mean?” suggestions
  61. $query = new \Elastica\Query( array( "query" => array(...), "suggest" =>

    array( "check1" => array( "text" => "berger", "term" => array( "field" => "name" ) ) ) ) );
  62. Array ( [check1] => Array ( [0] => Array (

    [text] => berger [offset] => 0 [length] => 6 [options] => Array ( [0] => Array ( [text] => burger [score] => 0.8333333 [freq] => 1 ) ) ) ) )
  63. Given I am a hungry person When I start entering

    my search terms Then I should see suggestions straight away
  64. $mapping->setProperties( array( "name" => array( "type" => "string" ), "desc"

    => array( "type" => "string", "boost" => 1.5 ), "name_suggest" => array( "type" => "completion" ) ) );
  65. new \Elastica\Document( $id, array( "id" => $id, "name" => "Jeff's

    Burger Joint", "name_suggest" => "Jeff's Burger Joint", "desc" => "Blah Blah..." ) );
  66. $query = array( "eateries" => array( "text" => 'j', "completion"

    => array( "field" => "name_suggest", ) ) );
  67. Array ( [0] => Array ( [text] => j [offset]

    => 0 [length] => 1 [options] => Array ( [0] => Array ( [text] => Jeffs Burger Joint [score] => 1 ) ) ) )
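
The analysis chain on slides 29 to 33 (tokenize, lowercase, drop stop words) can be approximated outside Elasticsearch to get a feel for what ends up in the index. The sketch below is only an illustration built from standard shell tools, not Elasticsearch's own analyzers: the function name `analyze` and the short `STOP_WORDS` list are assumptions made for this example, and the tokenizing rule (keep letters and apostrophes) is a rough stand-in for the standard tokenizer.

```shell
#!/bin/sh
# Illustrative stop list (an assumption for this sketch); Elasticsearch's
# default English stop list is longer.
STOP_WORDS='the|it|a|an|and|of'

# analyze TEXT
# Prints the tokens of TEXT on one line, separated by spaces.
#   1. tr -cs      : split on anything that is not a letter or apostrophe
#   2. tr upper    : lowercase filter
#   3. grep -vxE   : stop filter, drops tokens that exactly match a stop word
#   4. paste       : join the remaining tokens back onto a single line
analyze() {
    printf '%s\n' "$1" |
        tr -cs "[:alpha:]'" '\n' |
        tr '[:upper:]' '[:lower:]' |
        grep -vxE "$STOP_WORDS" |
        paste -sd ' ' -
}

analyze "I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it"
# prints: i really enjoyed jeff's burger joint burgers were excellent can't fault
```

The output matches the token list on slide 33. Stemming (slide 34) is left out because it needs a real stemmer such as Snowball, which is what the "snowball" analyzer on slide 35 provides.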