better searching with elasticsearch

Elasticsearch is a distributed, schemaless, document-oriented, Lucene-based search engine with a REST API. This talk looks at what all of that actually means in practice, moving from interacting with it directly with cURL to integrating it into PHP applications using Elastica.

Richard Miller

October 23, 2013

Transcript

  1. better
    searching with
    elasticsearch

  2. richard miller
    @mr_r_miller
    richarddmiller.co.uk
    sensiolabs uk

  3. why worry
    about search?

  4. ok, so what
    might
    someone want
    from search?

  5. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  6. yep, no problem

  7. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  8. sure

  9. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  10. yeah, we can
    do that

  11. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  12. that’s
    certainly
    possible

  13. Given I am a hungry person
    When I make a typo entering my search terms
    Then I should see “did you mean?” suggestions

  14. well other
    sites do that,
    so why not?

  15. Given I am a hungry person
    When I start entering my search terms
    Then I should see suggestions straight away

  16. ok, ok

  17. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results

  18. now this we
    can do

  19. Given I am a hungry person
    When I view details of somewhere to eat
    Then I should see suggestions of similar eateries

  20. yep, yep

  21. Given I am a restaurateur
    When I upload my PDF menu
    Then hungry people should be able to search it

  22. hmm...

  23. well let’s get
    the easy stuff
    out of the way

  24. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  25. SELECT *
    FROM eateries
    WHERE name
    LIKE '%searchterm%'

  26. multiple
    searchterms?

  27. ...WHERE (
    name LIKE '%multiple%'
    OR
    name LIKE '%searchterm%')

  28. but wait...

  29. but wait...
    ...multiple fields

  30. ...WHERE (
    name LIKE '%multiple%'
    OR
    name LIKE '%searchterm%'...

  31. ...OR
    `desc` LIKE '%multiple%'
    OR
    `desc` LIKE '%searchterm%')

  32. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  33. ORDER?

  34. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  35. the

  36. the
    with

  37. the
    with
    be

  38. the
    but
    with
    be

  39. the
    but
    with
    be
    it

  40. it
    the
    but
    with
    be
    food

  41. oh ok,

  42. oh ok,
    full text search...

  43. SELECT *
    FROM eateries
    WHERE MATCH (name, `desc`)
    AGAINST ('multiple searchterms')

  44. that should do!

  45. what was left?

  46. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  47. ...LEFT JOIN...

  48. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  49. grilled

  50. grilling
    grilled

  51. grilling
    grilled
    grills

  52. grill
    grilling
    grilled
    grills
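
The four word forms on this slide are exactly what a stemmer is meant to collapse into one searchable term. A minimal Python sketch of the idea follows, assuming a deliberately naive suffix-stripping rule; `naive_stem` is a hypothetical helper for illustration, not the Snowball or Porter stemmer that Elasticsearch provides.

```python
def naive_stem(word: str) -> str:
    """Strip a few common English suffixes (illustrative only)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a stem of at least three letters is left behind.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


forms = ["grill", "grilling", "grilled", "grills"]
stems = {naive_stem(w) for w in forms}
print(stems)  # all four forms reduce to the same stem
```

Because every form maps to the same stem, a search for any one of them can match documents containing the others.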

  53. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  54. elasticsearch

  55. distributed,
    schemaless,
    document oriented,
    Lucene based
    search engine with a
    REST API

  56. github

  57. stackoverflow

  58. gov.uk

  59. foursquare

  60. soundcloud

  61. wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz
    tar -zxvf elasticsearch-0.90.5.tar.gz
    cd elasticsearch-0.90.5

  62. bin/elasticsearch

  63. Lucene based
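
Lucene's core data structure is the inverted index: a map from each term to the documents that contain it. The Python sketch below illustrates the idea under simplifying assumptions (whitespace splitting, lowercasing only). The sample documents and `build_inverted_index` are hypothetical and far simpler than what Lucene really stores.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "Jeff's Burger Joint",
    2: "Blah Blah dirty burgers",
    3: "Burger bar",
}
index = build_inverted_index(docs)
print(sorted(index["burger"]))  # ids of documents containing the exact term
```

Looking a term up is a single dictionary access rather than a scan of every row, which is the main difference from the `LIKE` queries earlier in the deck. Note that "burgers" is a different term from "burger" here; that gap is what analysis and stemming address later.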

  64. request
    curl -X GET http://localhost:9200/

  65. {
    "ok" : true,
    "status" : 200,
    "name" : "Mr. Wu",
    "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8...dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
    },
    "tagline" : "You Know, for Search"
    }
    response

  66. database → index

  67. table → type

  68. record → document

  69. distributed

  70. [diagram: shards 0, 1, 2, 3 and 4 on a single instance]

  71. [diagram: shards 0 to 4 spread across Instance 1, Instance 2 and Instance 3]

  72. [diagram: two copies of each of shards 0 to 4 spread across Instance 1, Instance 2 and Instance 3]
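
Slides 70 to 72 show an index split into five shards that are spread, and then duplicated, across instances. The Python sketch below shows one way a document could be assigned to a shard, assuming the id is hashed and taken modulo the number of primary shards. `shard_for` is a hypothetical helper and CRC32 is only a stand-in for whatever hash Elasticsearch uses internally.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int = 5) -> int:
    """Pick a primary shard by hashing the id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


for doc_id in ["1", "2", "cmElnobYSy6386TOUVbGZQ"]:
    print(doc_id, "-> shard", shard_for(doc_id))
```

A scheme like this is deterministic, so the same id always lands on the same shard. It also suggests why the number of primary shards is chosen when the index is created: changing the modulus would move existing documents.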

  73. document oriented

  74. REST API

  75. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers..."
    }'

  76. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers..."
    }'

  77. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers..."
    }'

  78. {"ok":true,"_index":"eatly","_type":"eateries","_id":"EGHSABBxQkaPR_DJxPxXZA","_version":1}

  79. ?pretty=true

  80. {
    "ok" : true,
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "cmElnobYSy6386TOUVbGZQ",
    "_version" : 1
    }

  81. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers..."
    }'

  82. {
    "ok" : true,
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_version" : 1
    }

  83. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  84. {
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_version" : 1,
    "exists" : true,
    "_source" : {
    "name" : "Jeff's Burgers",
    "desc" : "Blah Blah dirty burgers..."
    }
    }

  85. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{
      "name" : "Jeff'\''s Burger Joint"
    }'

  86. {
    "ok" : true,
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_version" : 2
    }

  87. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  88. {
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_version" : 2,
    "exists" : true,
    "_source" : {
    "name" : "Jeff's Burger Joint"
    }
    }

  89. curl -XDELETE 'http://localhost:9200/eatly/eateries/1'

  90. curl -XGET 'http://localhost:9200/eatly/eateries/1'

  91. {
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_version" : 2,
    "exists" : false
    }
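
The index, get, update and delete calls above all follow one pattern: an HTTP verb applied to `/index/type/id`. The Python sketch below builds the same requests with the standard library without sending them. It assumes the local node from the slides; `build_request` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the slides


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


index_req = build_request("PUT", "/eatly/eateries/1", {"name": "Jeff's Burger Joint"})
get_req = build_request("GET", "/eatly/eateries/1")
delete_req = build_request("DELETE", "/eatly/eateries/1")

for req in (index_req, get_req, delete_req):
    print(req.get_method(), req.full_url)
# To send one against a running node: urllib.request.urlopen(index_req)
```

Serialising the body with `json.dumps` also avoids the shell-quoting problem of embedding an apostrophe inside a single-quoted curl argument.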

  92. search engine

  93. curl -XGET '.../eatly/eateries/_search?q=name:Burger'

  94. {
    "took" : 7,
    "timed_out" : false,
    "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
    },
    "hits" : {
    "total" : 1,
    "max_score" : 0.15342641,
    "hits" : [ {
    "_index" : "eatly",
    "_type" : "eateries",
    "_id" : "1",
    "_score" : 0.15342641,
    "_source" : {"name" : "Jeff's Burger Joint"}
    } ]
    }
    }

  95. http://localhost:9200/eatly/cafes,pubs/_search?q=Burger
    http://localhost:9200/eatly/_search?q=name:Burger
    http://localhost:9200/eatly,cookr/_search?q=name:Burger
    http://localhost:9200/_all/pubs/_search?q=name:Burger
    http://localhost:9200/_search?q=name:Burger

  96. schemaless?

  97. schemaless?

  98. curl -XPUT 'http://localhost:9200/eatly/' -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 4,
          "number_of_replicas" : 2
        }
      }
    }'

  99. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  100. name > desc

  101. curl -XPUT '.../eatly/eateries/_mapping' -d '{
    "eateries" : {
    "properties" : {
    "name" : {"type" : "string", "boost" : "1.5"},
    "desc" : {"type" : "string"}
    }
    }
    }'

  102. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  103. curl -XPUT '.../eatly/eateries/_mapping' -d '{
      "eateries" : {
        "properties" : {
          "name" : {"type" : "string", "boost" : "1.5"},
          "desc" : {"type" : "string"},
          "reviews" : {
            "properties" : {
              "review" : {"type" : "string"},
              "reviewer" : {"type" : "string"}
            }
          }
        }
      }
    }'

  104. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers...",
      "reviews" : [
        {
          "review" : "Yadda, yadda, yadda",
          "reviewer" : "John Smith"
        },
        {
          "review" : "na na na",
          "reviewer" : "Billy Badger"
        }
      ]
    }'

  105. curl -XGET '.../eateries/_search?q=reviews.review:Burger'

  106. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  107. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    standard tokenizer
    I | really | enjoyed | Jeff's | Burger | Joint | the | burgers | were | excellent | can't | fault | it

  108. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    whitespace tokenizer
    I | really | enjoyed | Jeff's | Burger | Joint, | the | burgers | were | excellent! | can't | fault | it

  109. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    letter tokenizer
    I | really | enjoyed | Jeff | s | Burger | Joint | the | burgers | were | excellent | can | t | fault | it

  110. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    standard tokenizer + lowercase filter
    i | really | enjoyed | jeff's | burger | joint | the | burgers | were | excellent | can't | fault | it

  111. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    standard tokenizer + lowercase filter
    + stop filter
    i | really | enjoyed | jeff's | burger | joint | burgers | were | excellent | can't | fault

  112. I really enjoyed Jeff's Burger Joint, the
    burgers were excellent! can't fault it
    standard tokenizer + lowercase filter
    + stop filter + stemmer filter
    i | realli | enjoi | jeff | burger | joint | burger | were | excel | can't | fault
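
Slides 107 to 112 build up an analysis chain one stage at a time. The Python sketch below mimics the first three stages (tokenize, lowercase, remove stop words) on the same review text. It assumes a regular expression as a rough stand-in for the standard tokenizer and a two-word stop list, and it leaves out the stemmer. `analyze` and `STOP_WORDS` are hypothetical names.

```python
import re

STOP_WORDS = {"the", "it"}  # tiny assumed list; a real stop filter has more entries


def analyze(text):
    """Tokenize, lowercase, then drop stop words (no stemming here)."""
    # Rough stand-in for the standard tokenizer: keep letters, digits, apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    tokens = [t.lower() for t in tokens]  # lowercase filter
    return [t for t in tokens if t not in STOP_WORDS]  # stop filter


review = "I really enjoyed Jeff's Burger Joint, the burgers were excellent! can't fault it"
print(analyze(review))
```

The same chain has to run on the search terms as well as on the documents, otherwise a query for "Burgers" would never meet the lowercased token stored in the index.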

  113. curl -XPUT 'http://localhost:9200/eatly/' -d '{
    "settings" : {
    "index" : {
    "analysis" : {
    "analyzer" : {
    "snowball_analyzer" : {
    "type" : "snowball"
    }
    }
    }
    }
    }
    }'

  114. curl -XPUT '.../eatly/eateries/_mapping' -d '{
    "eateries" : {
    "properties" : {
    "name" : {"type" : "string", "boost" : "1.5"},
    "desc" : {
    "type" : "string",
    "analyzer": "snowball_analyzer"
    }
    }
    }
    }'

  115. curl -XPUT 'http://localhost:9200/eatly/' -d '{
      "settings" : {
        "index" : {
          "analysis" : {
            "analyzer" : {
              "custom_analyzer" : {
                "type" : "custom",
                "tokenizer" : "lowercase",
                "filter" : ["stop", "extra_stop"]
              }
            },
            "filter" : {
              "extra_stop" : {
                "type" : "stop",
                "stopwords" : ["food"]
              }
            }
          }
        }
      }
    }'

  116. curl -XPUT '.../eatly/eateries/_mapping' -d '{
    "eateries" : {
    "properties" : {
    "name" : {"type" : "string", "boost" : "1.5"},
    "desc" : {
    "type" : "string",
    "analyzer": "custom_analyzer"
    }
    }
    }
    }'

  117. query DSL

  118. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "match" : {
    "name" : "burger"
    }
    }
    }'

  119. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "match" : {
    "name" : "burger bar"
    }
    }
    }'

  120. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "match" : {
    "name" : {
    "query" : "burger bar",
    "operator" : "and"
    }
    }
    }
    }'

  121. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "multi_match" : {
    "query" : "burger bar",
    "fields" : ["name","desc","reviews.review"]
    }
    }
    }'

  122. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "multi_match" : {
    "query" : "burger bar",
    "fields" : ["name^2","desc","reviews.review"]
    }
    }
    }'
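
Since the query DSL is plain JSON, a request body like the one above can be assembled as an ordinary data structure before it is sent. A small Python sketch follows, assuming a hypothetical `multi_match_query` helper that appends `^boost` to the field names it is given boosts for.

```python
import json


def multi_match_query(text, fields, boosts=None):
    """Build a multi_match body; boosts maps field name -> numeric boost."""
    boosts = boosts or {}
    boosted = [f"{f}^{boosts[f]}" if f in boosts else f for f in fields]
    return {"query": {"multi_match": {"query": text, "fields": boosted}}}


body = multi_match_query(
    "burger bar",
    ["name", "desc", "reviews.review"],
    boosts={"name": 2},
)
print(json.dumps(body, indent=2))
```

Keeping the boost in the query rather than in the mapping means it can be tuned per search without changing the index.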

  123. filters

  124. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results

  125. curl -XPUT 'http://localhost:9200/eatly/eateries/1' -d '{
      "name" : "Jeff'\''s Burger Joint",
      "desc" : "Blah Blah dirty burgers...",
      "tags" : ["greasy", "retro"]
    }'

  126. curl -XGET '.../eatly/eateries/_search' -d '{
    "query" : {
    "multi_match" : {
    "query" : "burger bar",
    "fields" : ["name^2", "desc", "reviews.review"]
    }
    },
    "filter" : {
    "not" : {
    "terms" : { "tags" : ["greasy", "meaty"]}
    }
    }
    }'

  127. query
    or filter?
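
One way to think about the question on this slide: a query decides how well a document matches and produces a score, while a filter is a plain yes/no test that never changes the score. The in-memory Python sketch below illustrates that split. The documents, the term-count scoring and the helper names are all hypothetical, and real relevance scoring is far more involved.

```python
docs = [
    {"name": "Jeff's Burger Joint", "tags": ["greasy", "retro"]},
    {"name": "Burger Bar", "tags": ["gourmet"]},
    {"name": "Noodle Bar", "tags": ["thai"]},
]


def score(doc, terms):
    """Query side: count matching terms in the name (a crude relevance score)."""
    words = doc["name"].lower().split()
    return sum(words.count(t) for t in terms)


def passes_filter(doc, excluded_tags):
    """Filter side: a yes/no check that never changes the score."""
    return not set(doc["tags"]) & set(excluded_tags)


terms = ["burger", "bar"]
hits = [
    (score(d, terms), d["name"])
    for d in docs
    if passes_filter(d, ["greasy", "meaty"]) and score(d, terms) > 0
]
hits.sort(reverse=True)
print(hits)
```

Because a filter result is just true or false for each document, it is a natural candidate for caching, whereas scores depend on the search terms.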

  128. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  129. curl -XPUT '.../eatly/eateries/_mapping' -d '{
      "eateries" : {
        "properties" : {
          "name" : {"type" : "string", "boost" : "1.5"},
          "desc" : {"type" : "string"},
          "reviews" : {
            "properties" : {
              "review" : {"type" : "string"},
              "reviewer" : {"type" : "string"}
            }
          },
          "location" : {
            "type" : "geo_point"
          }
        }
      }
    }'

  130. curl -XPOST 'http://localhost:9200/eatly/eateries/' -d '{
      "name" : "Jeff'\''s Burgers",
      "desc" : "Blah Blah dirty burgers...",
      "reviews" : [
        {
          "review" : "Yadda, yadda, yadda",
          "reviewer" : "John Smith"
        },
        {
          "review" : "na na na",
          "reviewer" : "Billy Badger"
        }
      ],
      "location" : {
        "lat" : 41.12,
        "lon" : -71.34
      }
    }'

  131. curl -XGET '.../eatly/eateries/_search' -d '{
      "sort" : [
        {
          "_geo_distance" : {
            "location" : [-40, 70],
            "order" : "asc",
            "unit" : "miles"
          }
        }
      ],
      "query" : {
        "multi_match" : {
          "query" : "burger bar",
          "fields" : ["name^2","desc","reviews.review"]
        }
      }
    }'
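
Sorting by `_geo_distance` amounts to computing each eatery's distance from the searcher and ordering by it. The Python sketch below does this with the haversine formula, used here as an approximation and not necessarily the calculation Elasticsearch performs. The searcher location and the "Far Diner" entry are hypothetical; the Jeff's Burgers coordinates come from the slides.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


me = (41.0, -71.0)  # hypothetical searcher location (lat, lon)
eateries = [
    {"name": "Far Diner", "lat": 45.0, "lon": -75.0},  # hypothetical
    {"name": "Jeff's Burgers", "lat": 41.12, "lon": -71.34},  # from the slides
]
nearest_first = sorted(
    eateries, key=lambda e: haversine_miles(me[0], me[1], e["lat"], e["lon"])
)
print([e["name"] for e in nearest_first])
```

Note the argument order: this sketch takes latitude first, whereas the array form in the search request above is `[lon, lat]`.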

  132. sense (chrome extension)

  133. head (elasticsearch plugin)

  134. elastica

  135. sherlock
    elasticsearch
    elasticsearch-php (official)

  136. {
    "require": {
    "ruflin/elastica": "v0.90.2.0"
    }
    }

  137. $client = new \Elastica\Client();

  138. $index = $client->getIndex("eatly");
    $type = $index->getType("eateries");

  139. $type->addDocument(
    new \Elastica\Document(
    null,
    array(
    "name" => "Jeff's Burger Joint",
    "desc" => "Blah Blah..."
    )
    )
    );

  140. $type->addDocument(
    new \Elastica\Document(
    $id,
    array(
    "id" => $id,
    "name" => "Jeff's Burger Joint",
    "desc" => "Blah Blah..."
    )
    )
    );

  141. $type->addDocuments($documents);
    $type->getDocument($id);
    $type->deleteId($id);
    $type->deleteDocument($document);
    $type->delete();

  142. $type->search("burger");
    $type->search("burger", 10);
    $type->count("burger");

  143. $multiMatchQuery = new \Elastica\Query\MultiMatch();
    $multiMatchQuery
    ->setQuery("burgers bar")
    ->setFields(
    array("name^2", "desc", "reviews.review")
    )
    ;
    $query = new \Elastica\Query($multiMatchQuery);
    $query->setFilter(
    new \Elastica\Filter\BoolNot(
    new \Elastica\Filter\Terms(
    "tags",
    array("greasy", "meaty")
    )
    )
    );
    $results = $type->search($query);

  144. $mapping = new \Elastica\Type\Mapping();
    $mapping->setType($type);
    $mapping->setProperties(
        array(
            "name" => array(
                "type" => "string",
                "boost" => 1.5
            ),
            "desc" => array(
                "type" => "string"
            )
        )
    );
    $mapping->send();

  145. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see relevant results
    And the results should be ordered by relevance

  146. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for relevant reviews

  147. Given I am a hungry person
    When I search for somewhere to eat
    Then I should see results for similar words

  148. Given I am a hungry person
    When I search for somewhere to eat near me
    Then I should see results close to my location

  149. Given I am a hungry person
    When I start searching for somewhere to eat
    Then I should be able to filter my results
    ?

  150. $facet = new \Elastica\Facet\Terms("tags");
    $facet->setField("tags");
    $facet->setSize(10);
    $facet->setOrder("reverse_count");
    $query->addFacet($facet);

  151. $facets = $resultSet->getFacets();
    foreach ($facets["tags"]["terms"] as $facet) {
        printf(
            "%s: %s\n",
            $facet["term"],
            $facet["count"]
        );
    }
    thai: 4
    italian: 8
    greek: 4
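
A terms facet is essentially a count of how many matching documents carry each value of a field. The Python sketch below computes the same kind of summary in memory; the sample documents and the `terms_facet` helper are hypothetical.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count how many documents carry each value of `field`."""
    counts = Counter()
    for doc in docs:
        # set() so a value repeated within one document is counted once
        counts.update(set(doc.get(field, [])))
    return counts.most_common(size)


docs = [
    {"tags": ["italian", "pizza"]},
    {"tags": ["italian"]},
    {"tags": ["thai"]},
]
print(terms_facet(docs, "tags"))
```

Counts like these are what lets a results page offer "italian (8)" style links that narrow the search when clicked.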

  152. Given I am a hungry person
    When I view details of somewhere to eat
    Then I should see suggestions of similar eateries

  153. $type->moreLikeThis($document);

  154. $type->moreLikeThis(
    new \Elastica\Document($id)
    );

  155. Given I am a restaurateur
    When I upload my PDF menu
    Then hungry people should be able to search it

  156. bin/plugin -install \
    elasticsearch/elasticsearch-mapper-attachments/1.9.0

  157. $propertyMapping = array(
        //...
        "menu" => array("type" => "attachment"),
    );

  158. $document->addFile('menu', $filepath);

  159. Given I am a hungry person
    When I make a typo entering my search terms
    Then I should see “did you mean?” suggestions

  160. $query = new \Elastica\Query(
    array(
    "query" => array(...),
    "suggest" => array(
    "check1" => array(
    "text" => "berger",
    "term" => array(
    "field" => "name"
    )
    )
    )
    )
    );

  161. $data = $results->getResponse()->getData();
    print_r($data["suggest"]);

  162. Array
    (
    [check1] => Array
    (
    [0] => Array
    (
    [text] => berger
    [offset] => 0
    [length] => 6
    [options] => Array
    (
    [0] => Array
    (
    [text] => burger
    [score] => 0.8333333
    [freq] => 1
    )
    )
    )
    )
    )
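
A "did you mean?" suggestion such as berger → burger can be pictured as finding indexed terms within a small edit distance of what was typed. The Python sketch below uses the classic Levenshtein distance. It is an illustration of the general idea, not the term suggester's actual algorithm or scoring; `suggest` and the vocabulary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return known terms within max_distance edits, closest first."""
    scored = [(edit_distance(word, term), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if 0 < dist <= max_distance]


print(suggest("berger", ["burger", "joint", "bar"]))
```

Terms that match exactly are excluded (distance 0), since a correctly spelled word needs no suggestion.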

  163. Given I am a hungry person
    When I start entering my search terms
    Then I should see suggestions straight away

  164. $mapping->setProperties(
        array(
            "name" => array(
                "type" => "string",
                "boost" => 1.5
            ),
            "desc" => array(
                "type" => "string"
            ),
            "name_suggest" => array(
                "type" => "completion"
            )
        )
    );

  165. new \Elastica\Document(
    $id,
    array(
    "id" => $id,
    "name" => "Jeff's Burger Joint",
    "name_suggest" => "Jeff's Burger Joint",
    "desc" => "Blah Blah..."
    )
    );

  166. $query = array(
    "eateries" => array(
    "text" => 'j',
    "completion" => array(
    "field" => "name_suggest",
    )
    )
    );

  167. $response = $index->request(
    "_suggest",
    \Elastica\Request::GET,
    $query
    );
    $data = $response->getData();
    print_r($data["eateries"]);

  168. Array
    (
    [0] => Array
    (
    [text] => j
    [offset] => 0
    [length] => 1
    [options] => Array
    (
    [0] => Array
    (
    [text] => Jeffs Burger Joint
    [score] => 1
    )
    )
    )
    )
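
Suggest-as-you-type is prefix matching that has to be fast enough to run on every keystroke. The Python sketch below shows the idea with a sorted list and binary search. It is a simple stand-in, not the in-memory structure the completion suggester really uses; `PrefixSuggester` and all names except "Jeff's Burger Joint" are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Sorted-list prefix lookup (a simple stand-in for a real completion index)."""

    def __init__(self, names):
        # Store (lowercased key, original) pairs sorted by key.
        self._entries = sorted((n.lower(), n) for n in names)
        self._keys = [k for k, _ in self._entries]

    def suggest(self, prefix, size=5):
        prefix = prefix.lower()
        # Binary search to the first key that could start with the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key, original in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) == size:
                break
            matches.append(original)
        return matches


suggester = PrefixSuggester(["Jeff's Burger Joint", "Joe's Cafe", "Burger Bar"])
print(suggester.suggest("j"))
```

Because the keys are sorted, all entries sharing a prefix sit next to each other, so a lookup touches only the matches and never scans the whole list.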

  169. scripts

  170. percolator
    scripts

  171. percolator
    river
    scripts

  172. data analysis
    percolator
    river
    scripts

  173. data analysis
    percolator
    logstash
    river
    scripts

  174. thank you!
    questions?
    @mr_r_miller
