better searching with elasticsearch - PHPConfPL

Elasticsearch is a distributed, schemaless, document-oriented, Lucene-based search engine with a REST API. This talk looks at what all that actually means in practice, moving from interacting with it directly using cURL to integrating it into PHP applications using Elastica.

Richard Miller

October 26, 2013

Transcript

  1. better searching with elasticsearch Sunday, 3 November 2013

  2. richard miller @mr_r_miller richarddmiller.co.uk sensiolabs uk

  3. why worry about search?

  4. ok, so what might someone want from search?

  5. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance

  6. yep, no problem

  7. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  8. sure

  9. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  10. yeah, we can do that

  11. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  12. that’s certainly possible

  13. Given I am a hungry person
    When I make a typo entering my search terms
    Then I should see “did you mean?” suggestions

  14. well, other sites do that, so why not?

  15. Given I am a hungry person
    When I start entering my search terms
    Then I should see suggestions straight away

  16. ok, ok

  17. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results

  18. now this we can do

  19. Given I am a hungry person
    When I view the details of somewhere to eat
    Then I should see suggestions of similar eateries

  20. yep, yep

  21. Given I am a restaurateur
    When I upload my PDF menu
    Then hungry people should be able to search it

  22. hmm...

  23. well, let’s get the easy stuff out of the way

  24. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance

  25. SELECT * FROM eateries WHERE name LIKE '%searchterm%'

  26. multiple searchterms?

  27. ...WHERE (name LIKE '%multiple%' OR name LIKE '%searchterm%')

  28. but wait...

  29. but wait... ...multiple fields

  30. ...WHERE (name LIKE '%multiple%' OR name LIKE '%searchterm%'...

  31. ...OR `desc` LIKE '%multiple%' OR `desc` LIKE '%searchterm%')

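To make that growth concrete, here is a minimal PHP sketch of the WHERE clause from slides 27 to 31, generalised to any number of terms and columns. `buildLikeWhere()` and its `:p0`-style placeholder names are hypothetical. The clause always ends up with terms × columns conditions, and nothing in the result says which rows match best.

```php
<?php
// Hypothetical helper: the WHERE clause from slides 27-31, generalised to
// any number of search terms and columns.
function buildLikeWhere(array $terms, array $columns): array
{
    $conditions = [];
    $params = [];
    $i = 0;
    foreach ($columns as $column) {
        foreach ($terms as $term) {
            $placeholder = ':p' . $i++;
            // Backticks let MySQL reserved words such as `desc` be used as column names.
            $conditions[] = sprintf('`%s` LIKE %s', $column, $placeholder);
            $params[$placeholder] = '%' . $term . '%';
        }
    }
    // One OR-ed condition per (column, term) pair: terms × columns in total.
    return ['(' . implode(' OR ', $conditions) . ')', $params];
}

[$where, $params] = buildLikeWhere(['multiple', 'searchterm'], ['name', 'desc']);
echo 'SELECT * FROM eateries WHERE ' . $where . PHP_EOL;
```

Column names are interpolated directly, so they have to come from a fixed whitelist; only the search terms go through placeholders (and `%` or `_` inside a term would still act as LIKE wildcards).
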
  32. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance

  33. ORDER?

  34. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance

  35. the

  36. the with

  37. the with be

  38. the but with be

  39. the but with be it

  40. it the but with be food

  41. oh ok,

  42. oh ok, full text search...

  43. SELECT * FROM eateries WHERE MATCH (name, `desc`) AGAINST ('multiple searchterms')

  44. that should do!

  45. what was left?

  46. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  47. ...LEFT JOIN...

  48. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  49. grilled

  50. grilling grilled

  51. grilling grilled grills

  52. grill grilling grilled grills
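
Collapsing those forms into one is the job of a stemmer. As a rough illustration only, the hypothetical `naiveStem()` below strips three English suffixes; the stemmers Elasticsearch ships (such as Porter or Snowball) handle many more cases.

```php
<?php
// Deliberately naive, hypothetical stemmer: strips the first matching suffix
// from a short list. Porter/Snowball stemmers handle far more cases.
function naiveStem(string $word): string
{
    $word = strtolower($word);
    foreach (['ing', 'ed', 's'] as $suffix) {
        $length = strlen($suffix);
        // Only strip when at least three characters of stem would remain.
        if (strlen($word) - $length >= 3 && substr($word, -$length) === $suffix) {
            return substr($word, 0, -$length);
        }
    }
    return $word;
}

$forms = ['grill', 'grilling', 'grilled', 'grills'];
print_r(array_unique(array_map('naiveStem', $forms)));
```

All four forms reduce to `grill`, so once both the indexed text and the search terms pass through the same step, a search for any one of them can match the others.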

  53. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  54. elasticsearch

  55. distributed, schemaless, document oriented, Lucene based search engine with a REST API

  56. github

  57. stackoverflow

  58. gov.uk

  59. foursquare

  60. soundcloud

  61. wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz
    tar -zxvf elasticsearch-0.90.5.tar.gz
    cd elasticsearch-0.90.5

  62. bin/elasticsearch

  63. Lucene based

  64. request: curl -X GET http://localhost:9200/

  65. response: { "ok" : true, "status" : 200, "name" : "Mr. Wu", "version" : { "number" : "0.90.5", "build_hash" : "c8714e8...dedee", "build_timestamp" : "2013-09-17T12:50:20Z", "build_snapshot" : false, "lucene_version" : "4.4" }, "tagline" : "You Know, for Search" }

  66. database → index

  67. table → type

  68. record → document

  69. distributed

  70. [diagram: shards 0, 1, 2, 3 and 4 on a single instance]

  71. [diagram: shards 0 to 4 spread across Instance 1, Instance 2 and Instance 3]

  72. [diagram: shards 0 to 4 plus a second copy of each, spread across Instance 1, Instance 2 and Instance 3]

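Those diagrams show the shards of an index being spread over several instances, with a second copy of each once replicas are added. The toy allocator below is a hypothetical sketch, not Elasticsearch's real allocation logic: it assumes five shards and one replica (the defaults at the time) on three instances, places primaries round-robin, and shifts every replica onto a different instance from its primary.

```php
<?php
// Hypothetical toy allocator, not Elasticsearch's real allocation logic.
// Copy 0 of each shard is the primary (P), copies 1..$replicas are replicas (R).
function allocateShards(int $shards, int $replicas, int $instances): array
{
    if ($replicas >= $instances) {
        throw new InvalidArgumentException('Each copy of a shard needs its own instance.');
    }
    $layout = array_fill(1, $instances, []);
    for ($shard = 0; $shard < $shards; $shard++) {
        for ($copy = 0; $copy <= $replicas; $copy++) {
            // Shift each copy one instance further along, so copies never share an instance.
            $instance = ($shard + $copy) % $instances + 1;
            $layout[$instance][] = ($copy === 0 ? 'P' : 'R') . $shard;
        }
    }
    return $layout;
}

foreach (allocateShards(5, 1, 3) as $instance => $copies) {
    echo "Instance $instance: " . implode(' ', $copies) . PHP_EOL;
}
```

Each instance ends up with three or four shard copies, and because every shard lives on two different instances, losing any single instance still leaves a full copy of the index.
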
  73. document oriented

  74. REST API

  75. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  76. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  77. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  78. {"ok":true,"_index":"eatly","_type":"eateries","_id":"EGHSABBxQkaPR_DJxPxXZA","_version":1}

  79. ?pretty=true

  80. { "ok" : true, "_index" : "eatly", "_type" : "eateries", "_id" : "cmElnobYSy6386TOUVbGZQ", "_version" : 1 }

  81. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers..." }'

  82. { "ok" : true, "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_version" : 1 }

  83. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  84. { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_version" : 1, "exists" : true, "_source" : { "name" : "Jeff's Burgers", "desc" : "Blah Blah dirty burgers..." } }

  85. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burger Joint" }'

  86. { "ok" : true, "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_version" : 2 }

  87. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  88. { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_version" : 2, "exists" : true, "_source" : { "name" : "Jeff's Burger Joint" } }

  89. curl -XDELETE 'http://localhost:9200/eatly/eateries/1'

  90. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  91. { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_version" : 2, "exists" : false }

  92. search engine

  93. curl -XGET '.../eatly/eateries/_search?q=name:Burger'

  94. { "took" : 7, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.15342641, "hits" : [ { "_index" : "eatly", "_type" : "eateries", "_id" : "1", "_score" : 0.15342641, "_source" : {"name" : "Jeff's Burger Joint"} } ] } }

  95. http://localhost:9200/eatly/cafes,pubs/_search?q=Burger
    http://localhost:9200/eatly/_search?q=name:Burger
    http://localhost:9200/eatly,cookr/_search?q=name:Burger
    http://localhost:9200/_all/pubs/_search?q=name:Burger
    http://localhost:9200/_search?q=name:Burger
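
Those URL patterns follow one shape, `/{indices}/{types}/_search`. The hypothetical `searchUrl()` helper below builds them, assuming a node at `http://localhost:9200`: comma-separated lists select several indices or types, and `_all` stands in for every index when only types are given.

```php
<?php
// Hypothetical helper building the search URLs from slide 95.
// An empty $indices list with types given falls back to "_all".
function searchUrl(array $indices, array $types, string $q, string $host = 'http://localhost:9200'): string
{
    $path = '';
    if ($indices || $types) {
        $path .= '/' . ($indices ? implode(',', $indices) : '_all');
    }
    if ($types) {
        $path .= '/' . implode(',', $types);
    }
    return $host . $path . '/_search?q=' . rawurlencode($q);
}

echo searchUrl(['eatly'], ['cafes', 'pubs'], 'Burger'), PHP_EOL;
echo searchUrl([], ['pubs'], 'name:Burger'), PHP_EOL;
echo searchUrl([], [], 'name:Burger'), PHP_EOL;
```

The query string is percent-encoded, so `name:Burger` goes out as `name%3ABurger`, which the server decodes back to the same query.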

  96. schemaless?

  97. schemaless?

  98. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" : { "number_of_shards" : 4, "number_of_replicas" : 2 } } }'

  99. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance

  100. name > desc

  101. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"} } } }'

  102. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  103. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } } } } }'

  104. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ] }'

  105. curl -XGET '.../eateries/_search?q=reviews.review:Burger'
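
Why does `reviews.review` work as a field name? With the default object mapping, each field of the inner review objects is indexed as one multi-valued field named by its dotted path. The hypothetical `flattenDocument()` below imitates that flattening in plain PHP for the document from slide 104.

```php
<?php
// Hypothetical sketch of object flattening: every leaf value ends up in a
// multi-valued field named by its dotted path (list positions are dropped).
function flattenDocument(array $doc, string $prefix = ''): array
{
    $fields = [];
    foreach ($doc as $key => $value) {
        // Integer keys are positions in a JSON array, so they add nothing to the path.
        $path = is_int($key) ? rtrim($prefix, '.') : $prefix . $key;
        if (is_array($value)) {
            foreach (flattenDocument($value, $path . '.') as $subPath => $values) {
                $fields[$subPath] = array_merge($fields[$subPath] ?? [], $values);
            }
        } else {
            $fields[$path][] = $value;
        }
    }
    return $fields;
}

$doc = [
    'name' => "Jeff's Burgers",
    'desc' => 'Blah Blah dirty burgers...',
    'reviews' => [
        ['review' => 'Yadda, yadda, yadda', 'reviewer' => 'John Smith'],
        ['review' => 'na na na', 'reviewer' => 'Billy Badger'],
    ],
];
print_r(flattenDocument($doc));
```

A side effect of this flattening is that the link between a particular review and its reviewer is lost, so a query cannot insist that both come from the same review; Elasticsearch's `nested` type exists for that case.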

  106. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  107. input: I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it
    standard tokenizer: I | really | enjoyed | Jeff's | Burger | Joint | the | burgers | were | excellent | can't | fault | it

  108. whitespace tokenizer: I | really | enjoyed | Jeff's | Burger | Joint, | the | burgers | were | excellent! | can't | fault | it

  109. letter tokenizer: I | really | enjoyed | Jeff | s | Burger | Joint | the | burgers | were | excellent | can | t | fault | it

  110. standard tokenizer + lowercase filter: i | really | enjoyed | jeff's | burger | joint | the | burgers | were | excellent | can't | fault | it

  111. standard tokenizer + lowercase filter + stop filter: i | really | enjoyed | jeff's | burger | joint | burgers | were | excellent | can't | fault

  112. standard tokenizer + lowercase filter + stop filter + stemmer filter: i | realli | enjoi | jeff | burger | joint | burger | were | excel | can't | fault

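The first four steps of that chain can be approximated in plain PHP. The tokenizers below are hypothetical regex stand-ins rather than Lucene's Unicode rules, the stop list is Lucene's default English stop word set, and the stemmer step of slide 112 is left out.

```php
<?php
// Lucene's default English stop word set, as used by the "stop" token filter.
const ENGLISH_STOP_WORDS = [
    'a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if', 'in',
    'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that', 'the',
    'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was', 'will', 'with',
];

// Regex approximations of three tokenizers (hypothetical helpers).
function whitespaceTokenizer(string $text): array
{
    return preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

function letterTokenizer(string $text): array
{
    // Any run of non-letters ends a token, so "Jeff's" becomes "Jeff" and "s".
    return preg_split('/\P{L}+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

function standardTokenizer(string $text): array
{
    // Runs of letters/digits, keeping inner apostrophes as in "Jeff's" and "can't".
    preg_match_all('/[\p{L}\p{N}]+(?:\'[\p{L}\p{N}]+)*/u', $text, $matches);
    return $matches[0];
}

function lowercaseFilter(array $tokens): array
{
    return array_map('strtolower', $tokens); // ASCII-only lowercasing is enough here
}

function stopFilter(array $tokens, array $stopWords = ENGLISH_STOP_WORDS): array
{
    return array_values(array_diff($tokens, $stopWords));
}

$text = "I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it";
echo implode(' | ', stopFilter(lowercaseFilter(standardTokenizer($text)))), PHP_EOL;
```

Passing a longer stop list, for example `array_merge(ENGLISH_STOP_WORDS, ['food'])`, behaves like the `extra_stop` filter defined in slide 115.
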
  113. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" : { "analysis" : { "analyzer" : { "snowball_analyzer" : { "type" : "snowball" } } } } } }'

  114. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "snowball_analyzer" } } } }'

  115. curl -XPUT 'http://localhost:9200/eatly/' -d '{ "settings" : { "index" : { "analysis" : { "analyzer" : { "custom_analyzer" : { "type" : "custom", "tokenizer" : "lowercase", "filter" : ["stop", "extra_stop"] } }, "filter" : { "extra_stop" : { "type" : "stop", "stopwords" : ["food"] } } } } } }'

  116. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : { "type" : "string", "analyzer": "custom_analyzer" } } } }'

  117. query DSL

  118. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" : { "name" : "burger" } } }'

  119. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" : { "name" : "burger bar" } } }'

  120. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "match" : { "name" : { "query" : "burger bar", "operator" : "and" } } } }'

  121. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name","desc","reviews.review"] } } }'

  122. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'

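Writing JSON by hand inside a shell string is fragile: one missing comma and the request is rejected. From PHP it is safer to assemble arrays and let `json_encode()` produce the body. The helper names below are hypothetical; their output matches the bodies of slides 120 and 122.

```php
<?php
// Hypothetical helpers assembling the request bodies of slides 120 and 122.
function matchAllWords(string $field, string $text): array
{
    // "operator": "and" requires every word to match instead of any word.
    return ['query' => ['match' => [$field => ['query' => $text, 'operator' => 'and']]]];
}

function multiMatch(string $text, array $fields): array
{
    // A "^2" suffix on a field name boosts matches in that field.
    return ['query' => ['multi_match' => ['query' => $text, 'fields' => $fields]]];
}

echo json_encode(matchAllWords('name', 'burger bar')), PHP_EOL;
echo json_encode(multiMatch('burger bar', ['name^2', 'desc', 'reviews.review'])), PHP_EOL;
```

The encoded string can go straight into `curl -d`, and Elastica also accepts raw arrays, as slide 160 shows with `new \Elastica\Query(array(...))`.
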
  123. filters

  124. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results

  125. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{ "name" : "Jeff'\''s Burger Joint", "desc" : "Blah Blah dirty burgers...", "tags" : ["greasy", "retro"] }'

  126. curl -XGET '.../eatly/eateries/_search' -d '{ "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name^2", "desc", "reviews.review"] } }, "filter" : { "not" : { "terms" : { "tags" : ["greasy", "meaty"] } } } }'

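The `not` + `terms` filter of slide 126 drops any eatery carrying at least one of the listed tags. The in-memory sketch below applies the same yes/no rule to made-up data; the function and every eatery except Jeff's are hypothetical.

```php
<?php
// In-memory sketch (hypothetical helper and data) of "not" + "terms":
// an eatery is dropped if it has at least one of the excluded tags.
function excludeTags(array $eateries, array $excluded): array
{
    $kept = array_filter($eateries, function (array $eatery) use ($excluded) {
        return count(array_intersect($eatery['tags'] ?? [], $excluded)) === 0;
    });
    return array_values($kept);
}

$eateries = [
    ['name' => "Jeff's Burger Joint", 'tags' => ['greasy', 'retro']],
    ['name' => 'Leaf & Lentil', 'tags' => ['vegan']],
    ['name' => 'The Carvery', 'tags' => ['meaty', 'retro']],
];
print_r(array_column(excludeTags($eateries, ['greasy', 'meaty']), 'name'));
```

That yes/no behaviour is the usual rule of thumb for slide 127: use a query when a match should affect relevance, and a filter when it only includes or excludes documents, since filters skip scoring and their results can often be cached.
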
  127. query or filter?

  128. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  129. curl -XPUT '.../eatly/eateries/_mapping' -d '{ "eateries" : { "properties" : { "name" : {"type" : "string", "boost" : "1.5"}, "desc" : {"type" : "string"}, "reviews" : { "properties" : { "review" : {"type" : "string"}, "reviewer" : {"type" : "string"} } }, "location" : { "type" : "geo_point" } } } }'

  130. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{ "name" : "Jeff'\''s Burgers", "desc" : "Blah Blah dirty burgers...", "reviews" : [ { "review" : "Yadda, yadda, yadda", "reviewer" : "John Smith" }, { "review" : "na na na", "reviewer" : "Billy Badger" } ], "location" : { "lat" : 41.12, "lon" : -71.34 } }'

  131. curl -XGET '.../eatly/eateries/_search' -d '{ "sort" : [ { "_geo_distance" : { "location" : [-40, 70], "order" : "asc", "unit" : "miles" } } ], "query" : { "multi_match" : { "query" : "burger bar", "fields" : ["name^2","desc","reviews.review"] } } }'

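Sorting on `_geo_distance` orders hits by how far each `location` lies from the given point. The sketch below approximates that with the haversine formula (Elasticsearch's own distance calculation may differ slightly); the helpers and the second eatery are hypothetical. Note that the array form `[-40, 70]` is `[lon, lat]`, the opposite order to the `lat`/`lon` object used when indexing.

```php
<?php
// Haversine great-circle distance in miles (an approximation; Elasticsearch
// may calculate distances differently).
function distanceMiles(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusMiles = 3958.8; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $earthRadiusMiles * asin(min(1.0, sqrt($a)));
}

// Ascending sort by distance from ($lat, $lon), like "order": "asc".
function sortByDistance(array $eateries, float $lat, float $lon): array
{
    usort($eateries, function (array $a, array $b) use ($lat, $lon) {
        $da = distanceMiles($lat, $lon, $a['location']['lat'], $a['location']['lon']);
        $db = distanceMiles($lat, $lon, $b['location']['lat'], $b['location']['lon']);
        return $da <=> $db;
    });
    return $eateries;
}

$eateries = [
    ['name' => "Jeff's Burgers", 'location' => ['lat' => 41.12, 'lon' => -71.34]],
    ['name' => 'Arctic Grill', 'location' => ['lat' => 69.5, 'lon' => -41.0]],
];
// [-40, 70] on the slide is [lon, lat], i.e. lat 70, lon -40.
foreach (sortByDistance($eateries, 70.0, -40.0) as $eatery) {
    $miles = distanceMiles(70.0, -40.0, $eatery['location']['lat'], $eatery['location']['lon']);
    printf("%s: %.0f miles\n", $eatery['name'], $miles);
}
```
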
  132. sense (chrome extension)

  133. head (elasticsearch plugin)

  134. elastica

  135. sherlock, elasticsearch, elasticsearch-php (official)

  136. { "require": { "ruflin/elastica": "v0.90.2.0" } }

  137. $client = new \Elastica\Client();

  138. $index = $client->getIndex("eatly");
    $type = $index->getType("eateries");

  139. $type->addDocument(new \Elastica\Document(null, array("name" => "Jeff's Burger Joint", "desc" => "Blah Blah...")));

  140. $type->addDocument(new \Elastica\Document($id, array("id" => $id, "name" => "Jeff's Burger Joint", "desc" => "Blah Blah...")));

  141. $type->addDocuments($documents);
    $type->getDocument($id);
    $type->deleteId($id);
    $type->deleteDocument($document);
    $type->delete();

  142. $type->search("burger");
    $type->search("burger", 10);
    $type->count("burger");

  143. $multiMatchQuery = new \Elastica\Query\MultiMatch();
    $multiMatchQuery->setQuery("burgers bar")->setFields(array("name^2", "desc", "reviews.review"));
    $query = new \Elastica\Query($multiMatchQuery);
    $query->setFilter(new \Elastica\Filter\BoolNot(new \Elastica\Filter\Terms("tags", array("greasy", "meaty"))));
    $results = $type->search($query);

  144. $mapping = new \Elastica\Type\Mapping();
    $mapping->setType($type);
    $mapping->setProperties(array(
        "name" => array("type" => "string", "boost" => 1.5),
        "desc" => array("type" => "string")
    ));
    $mapping->send();

  145. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And they should be ordered by relevance ✔

  146. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews ✔

  147. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words ✔

  148. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location ✔

  149. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results ?

  150. $facet = new \Elastica\Facet\Terms("tags");
    $facet->setField("tags");
    $facet->setSize(10);
    $facet->setOrder("reverse_count");
    $query->addFacet($facet);

  151. $facets = $resultSet->getFacets();
    foreach ($facets["tags"]["terms"] as $facet) {
        printf("%s: %s\n", $facet["term"], $facet["count"]);
    }
    output:
    thai: 4
    italian: 8
    greek: 4

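A terms facet boils down to counting documents per term, ordering by count (least frequent first with `reverse_count`) and keeping the top `size` entries. The in-memory version below is a hypothetical sketch over made-up documents; it returns entries shaped like `$facets["tags"]["terms"]`, so the `printf` loop from slide 151 runs unchanged.

```php
<?php
// Hypothetical in-memory terms facet: documents per tag, ordered by count
// ("count") or least frequent first ("reverse_count"), cut to $size entries.
function termsFacet(array $docs, string $field, int $size, string $order = 'count'): array
{
    $counts = [];
    foreach ($docs as $doc) {
        // A document counts once per distinct term, however often it repeats it.
        foreach (array_unique($doc[$field] ?? []) as $term) {
            $counts[$term] = ($counts[$term] ?? 0) + 1;
        }
    }
    if ($order === 'reverse_count') {
        asort($counts);
    } else {
        arsort($counts);
    }
    $terms = [];
    foreach (array_slice($counts, 0, $size, true) as $term => $count) {
        $terms[] = ['term' => $term, 'count' => $count];
    }
    return $terms;
}

$docs = [
    ['tags' => ['italian', 'thai']],
    ['tags' => ['italian']],
    ['tags' => ['italian', 'thai']],
    ['tags' => ['greek']],
];
foreach (termsFacet($docs, 'tags', 10, 'reverse_count') as $facet) {
    printf("%s: %s\n", $facet["term"], $facet["count"]);
}
```
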
  152. Given I am a hungry person
    When I view the details of somewhere to eat
    Then I should see suggestions of similar eateries

  153. $type->moreLikeThis($document);

  154. $type->moreLikeThis(new \Elastica\Document($id));

  155. Given I am a restaurateur
    When I upload my PDF menu
    Then hungry people should be able to search it

  156. bin/plugin -install \
        elasticsearch/elasticsearch-mapper-attachments/1.9.0

  157. $propertyMapping = array(
        //...
        "menu" => array("type" => "attachment"),
    );

  158. $document->addFile('menu', $filepath);

  159. Given I am a hungry person
    When I make a typo entering my search terms
    Then I should see “did you mean?” suggestions

  160. $query = new \Elastica\Query(array(
        "query" => array(...),
        "suggest" => array(
            "check1" => array("text" => "berger", "term" => array("field" => "name"))
        )
    ));

  161. $data = $results->getResponse()->getData();
    print_r($data["suggest"]);

  162. Array ( [check1] => Array ( [0] => Array ( [text] => berger [offset] => 0 [length] => 6 [options] => Array ( [0] => Array ( [text] => burger [score] => 0.8333333 [freq] => 1 ) ) ) ) )

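Behind "did you mean?" is a search for indexed terms a few edits away from what was typed. The toy version below uses PHP's built-in `levenshtein()` over a made-up vocabulary; `didYouMean()` is hypothetical, and the term suggester scores candidates its own way, so only the idea carries over.

```php
<?php
// Hypothetical toy "did you mean?": indexed terms within $maxEdits edits of
// the input, closest first. The real term suggester ranks candidates its own way.
function didYouMean(string $input, array $vocabulary, int $maxEdits = 2): array
{
    $candidates = [];
    foreach ($vocabulary as $term) {
        $distance = levenshtein(strtolower($input), strtolower($term));
        // Distance 0 means the input is already an indexed term: nothing to suggest.
        if ($distance > 0 && $distance <= $maxEdits) {
            $candidates[$term] = $distance;
        }
    }
    asort($candidates);
    return array_keys($candidates);
}

print_r(didYouMean('berger', ['burger', 'bar', 'joint', 'jeff']));
```
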
  163. Given I am a hungry person
    When I start entering my search terms
    Then I should see suggestions straight away

  164. $mapping->setProperties(array(
        "name" => array("type" => "string", "boost" => 1.5),
        "desc" => array("type" => "string"),
        "name_suggest" => array("type" => "completion")
    ));

  165. new \Elastica\Document($id, array("id" => $id, "name" => "Jeff's Burger Joint", "name_suggest" => "Jeff's Burger Joint", "desc" => "Blah Blah..."));

  166. $query = array("eateries" => array("text" => 'j', "completion" => array("field" => "name_suggest")));

  167. $response = $index->request("_suggest", \Elastica\Request::GET, $query);
    $data = $response->getData();
    print_r($data["eateries"]);

  168. Array ( [0] => Array ( [text] => j [offset] => 0 [length] => 1 [options] => Array ( [0] => Array ( [text] => Jeffs Burger Joint [score] => 1 ) ) ) )

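The completion suggester answers prefix lookups like this from an in-memory finite state transducer, which is what makes it fast enough to run on every keystroke. The sketch below only mimics the behaviour: a hypothetical `complete()` helper does a linear, case-insensitive prefix scan over made-up names.

```php
<?php
// Hypothetical prefix completion: case-insensitive "starts with" matches,
// sorted and cut to $size. The real completion suggester avoids this linear
// scan by keeping its data in an in-memory FST.
function complete(string $prefix, array $names, int $size = 5): array
{
    $prefix = strtolower($prefix);
    $matches = array_filter($names, function (string $name) use ($prefix) {
        return strpos(strtolower($name), $prefix) === 0;
    });
    sort($matches);
    return array_slice($matches, 0, $size);
}

$names = ["Jeff's Burger Joint", 'Burger Barn', 'Jade Garden'];
print_r(complete('j', $names));
```
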
  169. scripts

  170. percolator scripts

  171. percolator river scripts

  172. data analysis percolator river scripts

  173. data analysis percolator logstash river scripts

  174. thank you! questions? @mr_r_miller