better searching with elasticsearch

Elasticsearch is a distributed, schemaless, document-oriented, Lucene-based search engine with a REST API. This talk looks at what all that actually means in practice, moving from interacting with it directly with cURL to integrating it into PHP applications using Elastica.

Richard Miller

October 23, 2013

Transcript

  1. better searching with elasticsearch

  2. richard miller @mr_r_miller richarddmiller.co.uk sensiolabs uk

  3. why worry about search?

  4. ok, so what might someone want from search?

  5. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  6. yep, no problem

  7. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  8. sure

  9. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  10. yeah, we can do that

  11. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  12. that’s certainly possible

  13. Given I am a hungry person When I make a typo entering my search terms Then I should see “did you mean?” suggestions
  14. well other sites do that, so why not?

  15. Given I am a hungry person When I start entering

    my search terms Then I should see suggestions straight away
  16. ok, ok

  17. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results
  18. now this we can do

  19. Given I am a hungry person When I view details of somewhere to eat Then I should see suggestions of similar eateries
  20. yep, yep

  21. Given I am a restaurateur When I upload my PDF

    menu Then hungry people should be able to search it
  22. hmm...

  23. well let’s get the easy stuff out of the way

  24. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  25. SELECT * FROM eateries WHERE name LIKE '%searchterm%'

  26. multiple searchterms?

  27. ...WHERE ( name LIKE '%multiple%' OR name LIKE '%searchterm%')

  28. but wait...

  29. but wait... ...multiple fields

  30. ...WHERE ( name LIKE '%multiple%' OR name LIKE '%searchterm%'...

  31. ...OR `desc` LIKE '%multiple%' OR `desc` LIKE '%searchterm%')
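
The growth of the WHERE clause on slides 25 to 31 can be sketched in a few lines. This is a minimal illustration, not the speaker's code: it uses Python's built-in sqlite3 instead of MySQL, renames the `desc` column to `description` to avoid the reserved word, and the sample rows are invented. Every (field, term) pair adds another LIKE, and the rows still come back in no relevance order.

```python
import sqlite3


def build_like_query(terms, fields):
    """Build an OR-of-LIKEs query with one clause per (field, term) pair.

    Field names are interpolated into the SQL, so they must be trusted
    identifiers; the search terms themselves are bound as parameters.
    """
    clauses = [f"{field} LIKE ?" for field in fields for _ in terms]
    params = [f"%{term}%" for _ in fields for term in terms]
    sql = "SELECT name FROM eateries WHERE " + " OR ".join(clauses)
    return sql, params


# Hypothetical sample data, only for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eateries (name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO eateries VALUES (?, ?)",
    [
        ("Jeff's Burgers", "dirty burgers"),
        ("Thai Palace", "noodles and curry"),
        ("The Grill", "grilled burgers and steak"),
    ],
)

sql, params = build_like_query(["burger", "steak"], ["name", "description"])
rows = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(rows)
```

Two terms across two fields already need four LIKE clauses, and nothing in the result says which row matched best.
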

  32. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  33. ORDER?

  34. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  35. the

  36. the with

  37. the with be

  38. the but with be

  39. the but with be it

  40. it the but with be food

  41. oh ok,

  42. oh ok, full text search...

  43. SELECT * FROM eateries WHERE MATCH (name, `desc`) AGAINST ('multiple searchterms')

  44. that should do!

  45. what was left?

  46. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  47. ...LEFT JOIN...

  48. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  49. grilled

  50. grilling grilled

  51. grilling grilled grills

  52. grill grilling grilled grills

  53. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  54. elasticsearch

  55. distributed, schemaless, document oriented, Lucene based search engine with a

    REST API
  56. github

  57. stackoverflow

  58. gov.uk

  59. foursquare

  60. soundcloud

  61. wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz
    tar -zxvf elasticsearch-0.90.5.tar.gz
    cd elasticsearch-0.90.5

  62. bin/elasticsearch

  63. Lucene based

  64. request curl -X GET http://localhost:9200/

  65. response { "ok" : true, "status" : 200, "name" : "Mr. Wu", "version" : { "number" : "0.90.5", "build_hash" : "c8714e8...dedee", "build_timestamp" : "2013-09-17T12:50:20Z", "build_snapshot" : false, "lucene_version" : "4.4" }, "tagline" : "You Know, for Search" }
  66. database index

  67. table type

  68. record document

  69. distributed

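  70. [diagram: shards 0 1 2 3 4 on a single instance]

  71. [diagram: shards 0 1 2 3 4 spread across Instance 1, Instance 2 and Instance 3]

  72. [diagram: two copies of shards 0 1 2 3 4 spread across Instance 1, Instance 2 and Instance 3]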
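
The idea behind the shard diagrams can be sketched as follows: a document id is hashed to pick one of a fixed number of shards, and the shards are spread over however many instances exist. This is only an illustration under assumptions: `zlib.crc32` stands in for whatever hash Elasticsearch really uses, and the round-robin `allocate` helper is hypothetical (the real cluster decides placement itself).

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard for a document id (crc32 is a stand-in hash)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


def allocate(number_of_shards, instances):
    """Spread shard numbers over instances round-robin (illustrative only)."""
    layout = {name: [] for name in instances}
    for shard in range(number_of_shards):
        layout[instances[shard % len(instances)]].append(shard)
    return layout


layout = allocate(5, ["Instance 1", "Instance 2", "Instance 3"])
print(layout)
print(shard_for("1", 5))
```

Because the shard is derived from the id and the shard count, the same document always lands on the same shard, which is why the number of shards is fixed when the index is created while instances can come and go.
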
  73. document oriented

  74. REST API

  75. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  76. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  77. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'
  78. {"ok":true,"_index":"eatly","_type":"eateries","_id":"EGHS ABBxQkaPR_DJxPxXZA","_version":1}

  79. ?pretty=true

  80. { "ok" : true, "_index" : "eatly", "_type" : "eateries",

    "_id" : "cmElnobYSy6386TOUVbGZQ", "_version" : 1 }
  81. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'
  82. { "ok" : true, "_index" : "eatly", "_type" : "eateries",

    "_id" : "1", "_version" : 1 }
  83. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  84. { "_index" : "eatly", "_type" : "eateries", "_id" : "1",

    "_version" : 1, "exists" : true, "_source" : { "name" : "Jeff's Burgers", "desc" : "Blah Blah dirty burgers..." } }
  85. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burger Joint" }'
  86. { "ok" : true, "_index" : "eatly", "_type" : "eateries",

    "_id" : "1", "_version" : 2 }
  87. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  88. { "_index" : "eatly", "_type" : "eateries", "_id" : "1",

    "_version" : 2, "exists" : true, "_source" : { "name" : "Jeff's Burger Joint" } }
  89. curl -XDELETE 'http://localhost:9200/eatly/eateries/1'
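
The index, fetch, overwrite and delete calls from the preceding slides could be driven from code roughly like this. It is a sketch using only Python's standard library, assuming a node on `http://localhost:9200`; `build_request` is a hypothetical helper, and the requests are built but not sent so the example runs without Elasticsearch. Serialising the body with `json.dumps` also avoids hand-quoting the apostrophe in the name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node, as on the slides


def build_request(method, index, doc_type, doc_id=None, body=None):
    """Build (but do not send) a request for /index/type/[id]."""
    url = f"{BASE_URL}/{index}/{doc_type}/"
    if doc_id is not None:
        url += str(doc_id)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=method)


# PUT with an explicit id stores the whole document under that id.
put = build_request("PUT", "eatly", "eateries", 1, {"name": "Jeff's Burger Joint"})
get = build_request("GET", "eatly", "eateries", 1)
delete = build_request("DELETE", "eatly", "eateries", 1)

for request in (put, get, delete):
    print(request.get_method(), request.full_url)
# urllib.request.urlopen(put) would actually send it to a running node.
```

As slides 85 to 88 show, a second PUT to the same id replaces the stored document rather than merging into it: the `desc` field is gone from `_source` afterwards.
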

  90. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  91. { "_index" : "eatly", "_type" : "eateries", "_id" : "1",

    "_version" : 2, "exists" : false }
  92. search engine

  93. curl -XGET '.../eatly/eateries/_search?q=name:Burger'

  94. { "took" : 7, "timed_out" : false, "_shards" : {

    "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.15342641, "hits" : [ { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_score" : 0.15342641, "_source" : {"name" : "Jeff's Burger Joint"} } ] } }
  95. http://localhost:9200/eatly/cafes,pubs/_search?q=Burger
    http://localhost:9200/eatly/_search?q=name:Burger
    http://localhost:9200/eatly,cookr/_search?q=name:Burger
    http://localhost:9200/_all/pubs/_search?q=name:Burger
    http://localhost:9200/_search?q=name:Burger

  96. schemaless?

  97. schemaless?

  98. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" : { "number_of_shards" : 4, "number_of_replicas" : 2 } } }'
  99. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance
  100. name > desc

  101. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"} } } }'
  102. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews
  103. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } } } } }'
  104. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ] }'
  105. curl -XGET '.../eateries/_search?q=reviews.review:Burger'

  106. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words
  107. burgers I really enjoyed Jeff's Burger Joint, the burgers were

    excellent! can't fault it standard tokenizer I really enjoyed Jeff's Burger Joint the were excellent can't fault it
  108. excellent! burgers I really enjoyed Jeff's Burger Joint, the burgers

    were excellent! can't fault it whitespace tokenizer I really enjoyed Jeff's Burger Joint, the were can't fault it
  109. burgers I really enjoyed Jeff's Burger Joint, the burgers were

    excellent! can't fault it letter tokenizer I really enjoyed Jeff Burger Joint the were excellent can fault it s t
  110. burgers I really enjoyed Jeff's Burger Joint, the burgers were

    excellent! can't fault it standard tokenizer + lowercase filter i really enjoyed jeff's burger joint the were excellent can't fault it
  111. burgers I really enjoyed Jeff's Burger Joint, the burgers were

    excellent! can't fault it standard tokenizer + lowercase filter + stop filter i really enjoyed jeff's burger joint were excellent can't fault
  112. burger I really enjoyed Jeff's Burger Joint, the burgers were

    excellent! can't fault it standard tokenizer + lowercase filter + stop filter + stemmer filter i realli enjoi jeff burger joint were excel can't fault
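
The tokenizer and filter chain from slides 107 to 112 can be imitated with a toy pipeline. Everything here is a simplification for illustration: the regular expressions only approximate the standard and letter tokenizers, `STOP_WORDS` is a small hand-picked subset, and `toy_stem` is a crude suffix stripper, not the Snowball stemmer (so it will not produce stems such as "realli" or "enjoi").

```python
import re

STOP_WORDS = {"the", "it", "a", "an", "and", "is", "of", "to"}  # illustrative subset


def standard_tokenize(text):
    """Split on anything that is not a letter, digit or apostrophe."""
    return re.findall(r"[A-Za-z0-9']+", text)


def letter_tokenize(text):
    """Split on anything that is not a letter, so apostrophes break words."""
    return re.findall(r"[A-Za-z]+", text)


def toy_stem(token):
    """Strip a possessive and at most one common suffix (very rough)."""
    if token.endswith("'s"):
        token = token[:-2]
    for suffix in ("ing", "ed", "s"):
        long_enough = len(token) - len(suffix) >= 3
        if token.endswith(suffix) and not token.endswith("ss") and long_enough:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """standard tokenizer + lowercase filter + stop filter + toy stemmer."""
    tokens = [token.lower() for token in standard_tokenize(text)]
    tokens = [token for token in tokens if token not in STOP_WORDS]
    return [toy_stem(token) for token in tokens]


review = "I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it"
print(analyze(review))
print({toy_stem(word) for word in ["grill", "grilling", "grilled", "grills"]})
```

Because the same chain runs over both the indexed text and the search terms, a search for "burgers" and a document containing "Burger" end up comparing the same token.
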
  113. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" :

    { "analysis" : { "analyzer" : { "snowball_analyzer" : { "type" : "snowball" } } } } } }'
  114. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "snowball_analyzer" } } } }'
  115. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" : { "analysis" : { "analyzer" : { "custom_analyzer" : { "type" : "custom", "tokenizer" : "lowercase", "filter" : ["stop", "extra_stop"] } }, "filter" : { "extra_stop":{ "type" : "stop", "stopwords" : ["food"] } } } } } }'
  116. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" :

    { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "custom_analyzer" } } } }'
  117. query DSL

  118. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" :

    { "name" : "burger" } } }'
  119. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" :

    { "name" : "burger bar" } } }'
  120. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" :

    { "name" : { "query" : "burger bar", "operator" : "and" } } } }'
  121. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name","desc","reviews.review"] } } }'
  122. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'
  123. filters

  124. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results
  125. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burger Joint", "desc" : "Blah Blah dirty burgers...", "tags" : ["greasy", "retro"] }'
  126. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" :

    { "query" : "burger bar", "fields" : ["name^2", "desc", "reviews.review"] } }, "filter" : { "not" : { "terms" : { "tags" : ["greasy", "meaty"]} } } }'
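
Hand-written JSON bodies like the one above are easy to get wrong (a single missing comma breaks the request), so it can help to build them as data and serialise them. A minimal sketch, with `build_search_body` as a hypothetical helper that reproduces the multi_match query and the "not terms" filter from this slide:

```python
import json


def build_search_body(text, fields, excluded_tags):
    """Assemble the search body from slide 126 as a plain dict."""
    return {
        "query": {"multi_match": {"query": text, "fields": fields}},
        "filter": {"not": {"terms": {"tags": excluded_tags}}},
    }


body = build_search_body(
    "burger bar",
    ["name^2", "desc", "reviews.review"],  # "^2" boosts matches in name
    ["greasy", "meaty"],
)
payload = json.dumps(body, indent=2)
print(payload)
```

The resulting `payload` string is what would be passed to curl's `-d` option or sent as the request body by an HTTP client.
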
  127. query or filter?

  128. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location
  129. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } }, "location" : { "type" : "geo_point" } } } }'
  130. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ], "location" : { "lat" : 41.12, "lon" : -71.34 } }'
  131. curl -XGET '.../eatly/eateries/_search' -d '{ "sort" : [ { "_geo_distance"

    : { "eatery.location" : [-40, 70], "order" : "asc", "unit" : "miles" } } ], "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'
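
What a distance sort has to compute can be sketched with the haversine formula. This is not how Elasticsearch implements `_geo_distance`; it is a standalone illustration, and the second eatery and the searcher's position are made-up values. Note that when a geo point is written as an array, it is generally read as [lon, lat], the reverse of the `lat`/`lon` object form.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


eateries = [
    {"name": "Hypothetical Grill", "location": {"lat": 41.50, "lon": -71.00}},
    {"name": "Jeff's Burgers", "location": {"lat": 41.12, "lon": -71.34}},
]
here = {"lat": 41.10, "lon": -71.30}  # hypothetical position of the hungry person


def distance_from_here(eatery):
    location = eatery["location"]
    return haversine_miles(here["lat"], here["lon"], location["lat"], location["lon"])


nearest_first = sorted(eateries, key=distance_from_here)
for eatery in nearest_first:
    print(eatery["name"], round(distance_from_here(eatery), 1), "miles")
```
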
  132. sense (chrome extension)

  133. head (elasticsearch plugin)

  134. elastica

  135. sherlock elasticsearch elasticsearch-php (official)

  136. { "require": { "ruflin/elastica": "v0.90.2.0" } }

  137. $client = new \Elastica\Client();

  138. $index = $client->getIndex("eatly"); $type = $index->getType("eateries");

  139. $type->addDocument( new \Elastica\Document( null, array( "name" => "Jeff's Burger Joint",

    "desc" => "Blah Blah..." ) ) );
  140. $type->addDocument( new \Elastica\Document( $id, array( "id" => $id, "name" =>

    "Jeff's Burger Joint", "desc" => "Blah Blah..." ) ) );
  141. $type->addDocuments($documents); $type->getDocument($id); $type->deleteId($id); $type->deleteDocument($document); $type->delete();

  142. $type->search("burger"); $type->search("burger", 10); $type->count("burger");

  143. $multiMatchQuery = new \Elastica\Query\MultiMatch(); $multiMatchQuery ->setQuery("burgers bar") ->setFields( array("name^2", "desc",

    "reviews.review") ) ; $query = new \Elastica\Query($multiMatchQuery); $query->setFilter( new \Elastica\Filter\BoolNot( new \Elastica\Filter\Terms( "tags", array("greasy", "meaty") ) ) ); $results = $type->search($query);
  144. $mapping = new \Elastica\Type\Mapping(); $mapping->setType($type); $mapping->setProperties( array( "name" => array(

    "type" => "string" ), "desc" => array( "type" => "string", "boost" => 1.5 ) ) ); $mapping->send();
  145. Given I am a hungry person When I search for

    somewhere to eat Then I should see relevant results And the results should be ordered by relevance ✔
  146. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for relevant reviews ✔
  147. Given I am a hungry person When I search for

    somewhere to eat Then I should see results for similar words ✔
  148. Given I am a hungry person When I search for

    somewhere to eat near me Then I should see results close to my location ✔
  149. Given I am a hungry person When I start searching

    for somewhere to eat Then I should be able to filter my results ?
  150. $facet = new \Elastica\Facet\Terms("tags"); $facet->setField("tags"); $facet->setSize(10); $facet->setOrder("reverse_count"); $query->addFacet($facet);

  151. $facets = $resultSet->getFacets(); foreach ($facets["tags"]["terms"] as $facet) { printf( "%s: %s\n", $facet["term"], $facet["count"] ); }
    output: thai: 4 italian: 8 greek: 4
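
Conceptually, a terms facet is a count of how many matching documents carry each term. A rough sketch of that idea follows; the documents are invented, `terms_facet` is a hypothetical helper, and the ascending sort is my reading of the "reverse_count" order used on slide 150.

```python
from collections import Counter


def terms_facet(documents, field, size=10):
    """Count each term of `field` across documents, smallest count first."""
    counts = Counter(term for doc in documents for term in doc.get(field, []))
    # Sort by count ascending, breaking ties alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return [{"term": term, "count": count} for term, count in ordered[:size]]


documents = [
    {"name": "A", "tags": ["thai", "spicy"]},
    {"name": "B", "tags": ["thai"]},
    {"name": "C", "tags": ["greek"]},
]
facet = terms_facet(documents, "tags")
for entry in facet:
    print("%s: %s" % (entry["term"], entry["count"]))
```

The counts are what a results page would show next to each filter option, so the user can see how many results a filter would leave.
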
  152. Given I am a hungry person When I view details of somewhere to eat Then I should see suggestions of similar eateries
  153. $type->moreLikeThis($document);

  154. $type->moreLikeThis( new \Elastica\Document($id) );

  155. Given I am a restaurateur When I upload my PDF

    menu Then hungry people should be able to search it
  156. bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.9.0

  157. $propertyMapping = array( /* ... */ "menu" => array("type" => "attachment"), );

  158. $document->addFile('menu', $filepath);

  159. Given I am a hungry person When I make a typo entering my search terms Then I should see “did you mean?” suggestions
  160. $query = new \Elastica\Query( array( "query" => array(...), "suggest" =>

    array( "check1" => array( "text" => "berger", "term" => array( "field" => "name" ) ) ) ) );
  161. $data = $results->getResponse()->getData(); print_r($data["suggest"]);

  162. Array ( [check1] => Array ( [0] => Array (

    [text] => berger [offset] => 0 [length] => 6 [options] => Array ( [0] => Array ( [text] => burger [score] => 0.8333333 [freq] => 1 ) ) ) ) )
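
The "did you mean?" response above can be approximated with edit distance: find indexed terms that are only a character or two away from what was typed. The sketch below is a simplified stand-in, not the term suggester's actual algorithm; the vocabulary, the `max_edits` cut-off and the scoring formula (one minus distance over length) are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def suggest(text, vocabulary, max_edits=2):
    """Return near-miss terms from the vocabulary, best score first."""
    options = []
    for term in vocabulary:
        distance = edit_distance(text, term)
        if 0 < distance <= max_edits:
            score = 1 - distance / max(len(text), len(term))
            options.append({"text": term, "score": round(score, 4)})
    return sorted(options, key=lambda option: -option["score"])


vocabulary = ["burger", "bar", "joint"]  # hypothetical indexed terms
print(suggest("berger", vocabulary))
```
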
  163. Given I am a hungry person When I start entering

    my search terms Then I should see suggestions straight away
  164. $mapping->setProperties( array( "name" => array( "type" => "string" ), "desc"

    => array( "type" => "string", "boost" => 1.5 ), "name_suggest" => array( "type" => "completion" ) ) );
  165. new \Elastica\Document( $id, array( "id" => $id, "name" => "Jeff's

    Burger Joint", "name_suggest" => "Jeff's Burger Joint", "desc" => "Blah Blah..." ) );
  166. $query = array( "eateries" => array( "text" => 'j', "completion"

    => array( "field" => "name_suggest", ) ) );
  167. $response = $index->request( "_suggest", \Elastica\Request::GET, $query ); $data = $response->getData();

    print_r($data["eateries"]);
  168. Array ( [0] => Array ( [text] => j [offset]

    => 0 [length] => 1 [options] => Array ( [0] => Array ( [text] => Jeffs Burger Joint [score] => 1 ) ) ) )
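
Suggest-as-you-type boils down to a fast prefix lookup over the values stored in the suggest field. A minimal sketch of that lookup, using a sorted list and binary search as a stand-in for whatever structure Elasticsearch keeps internally; the names and the `limit` parameter are illustrative.

```python
import bisect


def build_index(names):
    """Sort (lowercased key, original name) pairs for prefix lookups."""
    return sorted((name.lower(), name) for name in names)


def complete(prefix, index, limit=5):
    """Return up to `limit` names whose lowercased form starts with prefix."""
    prefix = prefix.lower()
    # (prefix,) sorts just before every key that starts with the prefix.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for key, original in index[start:]:
        if not key.startswith(prefix) or len(matches) == limit:
            break
        matches.append(original)
    return matches


index = build_index(["Thai Palace", "Jeff's Burger Joint", "Joe's Cafe"])
print(complete("j", index))
print(complete("je", index))
```

Because the lookup only touches the entries sharing the prefix, it stays cheap enough to run on every keystroke.
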
  169. scripts

  170. percolator scripts

  171. percolator river scripts

  172. data analysis percolator river scripts

  173. data analysis percolator logstash river scripts

  174. thank you! questions? @mr_r_miller