
Elasticsearch

Find all the things! An introduction to Elasticsearch and what it can do for your applications. Learn all about searching, boosting, percolating and aggregating your data.

http://www.phpconference.nl/

Alexander

June 28, 2014

Transcript

  1. elasticsearch

  2. asm89 / @iam_asm89

  5. Why do I need a search engine?

  8. Elasticsearch

  9. Elasticsearch
      • Schemaless document store
      • Distributed and horizontally scalable
      • Zero configuration setup
      • REST
  10. Unstructured search

  11. Aggregations

  12. Structured search

  13. Enrichment

  14. Sorting

  15. Pagination

  16. Suggestions

  17. Installation

  18. $ wget https://download../.../elasticsearch-1.2.1.tar.gz
      $ tar xf elasticsearch-1.2.1.tar.gz
      $ cd elasticsearch-1.2.1/
      $ bin/elasticsearch

      Full download URL: https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.tar.gz
  19. Terminology

  20. Relational database ⇒ Elasticsearch
      database ⇒ index
      table ⇒ type
      row ⇒ document
      column ⇒ field
      schema ⇒ mapping
      index ⇒ (everything is indexed)
      SQL ⇒ query DSL
  21. APIs

  22. $ curl -XPUT localhost:9200/social/tweet/42 -d '{
        "text": "#dpc14 is awesome!"
      }'
  23. $ curl -XPUT localhost:9200/social/tweet/42 -d '{
        "text": "#dpc14 is awesome!"
      }'
      $ curl -XGET localhost:9200/social/tweet/42
      $ curl -XDELETE localhost:9200/social/tweet/42
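The same three document calls can be prepared from Python's standard library. This is a minimal sketch, assuming a node listens on localhost:9200 as in the slides; `build_request` is a hypothetical helper name, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the slides


def build_request(method, index, doc_type, doc_id, body=None):
    """Build (but do not send) a document API request. Hypothetical helper."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    # Only PUT carries a JSON body; GET and DELETE are sent without one.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=method)


put = build_request("PUT", "social", "tweet", 42, {"text": "#dpc14 is awesome!"})
get = build_request("GET", "social", "tweet", 42)
delete = build_request("DELETE", "social", "tweet", 42)

for req in (put, get, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would execute the call against a running node.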
  24. $ curl 'localhost:9200/_search?q=dpc'

  25. $ curl localhost:9200/_search -d '{
        "query": {
          "query_string": {
            "query": "dpc awesome"
          }
        }
      }'
  26. {
        "took" : 3,
        "timed_out" : false,
        "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
        "hits" : {
          "max_score" : 1,
          "hits" : [
            {
              "_score" : 1,
              "_id" : "42",
              "_source" : { "text" : "#dpc14 is awesome!" },
              "_type" : "tweet",
              "_index" : "social"
            }
          ],
          "total" : 1
        }
      }
  27. How does a search engine work?

  28. Take some text
      cakedc/search
      liip/search-bundle
      symfony-cmf/search-bundle
      elasticsearch/elasticsearch
      doctrine/orm
      doctrine/dbal
  29. Tokenize it
      cakedc search
      liip search bundle
      symfony cmf search bundle
      elasticsearch elasticsearch
      doctrine orm
      doctrine dbal
  30. Find unique tokens
      bundle liip cakedc orm cmf search dbal elasticsearch doctrine symfony
  31. Link the terms and documents
      Terms: search, bundle, doctrine
      Documents: cakedc/search, liip/search-bundle, symfony-cmf/search-bundle, elasticsearch/elasticsearch, doctrine/orm, doctrine/dbal
  32. cakedc search
      liip search bundle
      symfony cmf search bundle
      elasticsearch elasticsearch
      doctrine orm
      doctrine dbal
  33. Search for “bundle”
      Terms: search, bundle, doctrine
      Documents: cakedc/search, liip/search-bundle, symfony-cmf/search-bundle, elasticsearch/elasticsearch, doctrine/orm, doctrine/dbal
  34. Search for “doctrine”
      Terms: search, bundle, doctrine
      Documents: cakedc/search, liip/search-bundle, symfony-cmf/search-bundle, elasticsearch/elasticsearch, doctrine/orm, doctrine/dbal
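The walkthrough in slides 28 to 34 describes an inverted index. The following is a toy sketch of that idea in Python, not Elasticsearch's actual implementation; the `tokenize` and `build_index` names are made up for illustration, and the tokenizer simply splits on anything that is not a letter or digit.

```python
import re
from collections import defaultdict

# The package names used as example text in the slides.
documents = [
    "cakedc/search",
    "liip/search-bundle",
    "symfony-cmf/search-bundle",
    "elasticsearch/elasticsearch",
    "doctrine/orm",
    "doctrine/dbal",
]


def tokenize(text):
    """Toy tokenizer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Link every unique token to the set of documents that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for token in tokenize(doc):
            index[token].add(doc)
    return index


index = build_index(documents)
print(sorted(index))              # the unique tokens
print(sorted(index["bundle"]))    # search for "bundle"
print(sorted(index["doctrine"]))  # search for "doctrine"
```

A search then becomes a dictionary lookup on the term instead of a scan over every document.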
  35. What is stored in Elasticsearch?

  36. Document
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user": { "name": "Alexander", "nick": "iam_asm89" },
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  37. Fields
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user": { "name": "Alexander", "nick": "iam_asm89" },
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  38. Values
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user": { "name": "Alexander", "nick": "iam_asm89" },
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  39. Field types
      {                                 # object
        "tweet": "PHP is GREAT",        # string
        "posted": "2014-06-28",         # string
        "user": {                       # nested object
          "name": "Alexander",          # string
          "nick": "iam_asm89"           # string
        },
        "tags": ["php", "opinion"],     # array
        "retweets": 42                  # integer
      }
  40. Nested objects are flattened
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user": { "name": "Alexander", "nick": "iam_asm89" },
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  41. Nested objects are flattened
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user.name": "Alexander",
        "user.nick": "iam_asm89",
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  42. Values are analyzed into terms
      {
        "tweet": "PHP is GREAT",
        "posted": "2014-06-28",
        "user.name": "Alexander",
        "user.nick": "iam_asm89",
        "tags": ["php", "opinion"],
        "retweets": 42
      }
  43. Values are analyzed into terms
      {
        "tweet": ['php', 'great'],
        "posted": [Date(2014-06-28)],
        "user.name": ['alexander'],
        "user.nick": ['iam', 'asm89'],
        "tags": ['php', 'opinion'],
        "retweets": [42]
      }
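Slides 40 to 43 describe two steps: nested objects are flattened into dotted field names, then each value is analyzed into terms. A rough Python sketch of both steps follows. It is a toy model, not one of Elasticsearch's built-in analyzers: `flatten`, `analyze` and the one-word stop list are assumptions chosen to reproduce the slide output, and the `posted` date field is left out because date parsing is not modeled here.

```python
import re

STOPWORDS = {"is"}  # assumption: just enough to reproduce the slide


def flatten(doc, prefix=""):
    """Turn nested objects into dotted keys, e.g. user.name."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def analyze(value):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stopwords."""
    if isinstance(value, list):
        return [term for item in value for term in analyze(item)]
    if isinstance(value, str):
        tokens = re.findall(r"[a-z0-9]+", value.lower())
        return [t for t in tokens if t not in STOPWORDS]
    return [value]  # numbers are kept as a single term


tweet = {
    "tweet": "PHP is GREAT",
    "user": {"name": "Alexander", "nick": "iam_asm89"},
    "tags": ["php", "opinion"],
    "retweets": 42,
}

flat = flatten(tweet)
terms = {field: analyze(value) for field, value in flat.items()}
print(terms)
```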
  44. $ curl -XPUT localhost:9200/social/tweet/_mapping -d '{
        "tweet" : {
          "properties" : {
            "tweet" : { "type" : "string" },
            "created" : { "type" : "date" }
          }
        }
      }'
  45. Mapping
      • CAN add to an existing mapping
      • CAN NOT change the mapping for a field
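The two mapping rules can be modeled in a few lines. This is only an illustration of the rule stated on the slide, not how Elasticsearch implements it; `merge_mapping` and `MappingConflict` are hypothetical names.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change an existing field's type."""


def merge_mapping(existing, update):
    """Return a new mapping: new fields are added, existing ones are frozen."""
    merged = dict(existing)
    for field, definition in update.items():
        if field in merged and merged[field]["type"] != definition["type"]:
            raise MappingConflict(
                f"cannot change field {field!r} from "
                f"{merged[field]['type']} to {definition['type']}"
            )
        merged.setdefault(field, definition)
    return merged


current = {"tweet": {"type": "string"}, "created": {"type": "date"}}

# Adding a new field is allowed.
extended = merge_mapping(current, {"retweets": {"type": "integer"}})
print(sorted(extended))

# Changing the type of an existing field is rejected.
try:
    merge_mapping(current, {"created": {"type": "string"}})
except MappingConflict as error:
    print("rejected:", error)
```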
  46. Field types
      • String / integer / long / float / double / boolean / null
      • Date
      • Arrays
      • IP
      • Geo point
      • Geo shape
  47. Index settings
      { "type": "string", "index": "analyzed" }
      # "Foo Bar" => ['foo', 'bar']
  48. Index settings
      { "type": "string", "index": "not_analyzed" }
      # "Foo Bar" => ['Foo Bar']
  49. Index settings
      { "type": "string", "index": "no" }
      # "Foo Bar" => []
  50. Index settings
      { "type": "string", "index": "no", "analyzer": "default" }
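The effect of the three `index` values on a string field can be mimicked with a small function. This is a sketch under the assumption that analysis means lowercasing and splitting on non-word characters; `index_terms` is a made-up name.

```python
import re


def index_terms(value, index="analyzed"):
    """Toy model of which terms end up in the index for a string value."""
    if index == "analyzed":
        # Broken into lowercase terms.
        return re.findall(r"\w+", value.lower())
    if index == "not_analyzed":
        # Indexed as one exact term.
        return [value]
    if index == "no":
        # Not searchable at all.
        return []
    raise ValueError(f"unknown index setting: {index!r}")


for setting in ("analyzed", "not_analyzed", "no"):
    print(setting, index_terms("Foo Bar", setting))
```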

  51. Analyzers
      • Standard
      • Simple
      • Whitespace
      • Stop
      • Keyword
      • Pattern
      • Language
      • Snowball
      • Custom
      http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
  52. Token filters
      • standard
      • ascii folding
      • length
      • lowercase
      • uppercase
      • ngram
      • edge ngram
      • porter stem
      • shingle
      • stop
      • word delimiter
      • stemmer
      • keyword marker
      • snowball
      • phonetic
      • synonym
      • compound word
      • reverse
      • truncate
      • unique
      • pattern replace
      • trim
      • hunspell
      • normalization
      http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html
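Token filters are applied one after another to the token stream. The sketch below reimplements four of the listed filters as plain Python functions to show the chaining idea. These are simplified stand-ins with assumed defaults (a three-word stop list, edge n-grams of length 1 to 3), not the real filter implementations.

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=frozenset({"is", "the", "a"})):
    return [t for t in tokens if t not in stopwords]


def unique(tokens):
    """Drop repeated tokens while keeping first-seen order."""
    seen = set()
    result = []
    for t in tokens:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def edge_ngram(tokens, min_gram=1, max_gram=3):
    """Emit the prefixes of each token, from min_gram to max_gram characters."""
    return [
        t[:n]
        for t in tokens
        for n in range(min_gram, min(max_gram, len(t)) + 1)
    ]


def apply_filters(tokens, filters):
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = "PHP is the GREAT php".split()  # whitespace tokenizer stand-in
print(apply_filters(tokens, [lowercase, stop, unique]))
print(apply_filters(["Search"], [lowercase, edge_ngram]))
```

Order matters: running `stop` before `lowercase` would let an uppercase "IS" slip through this stop list.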
  53. Aggregations

  54. Aggregations
      • Analytics: histograms, distributions, statistics
      • Over a partition of your data
      • Can be composed
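Composition here means that a bucketing step partitions the documents and a metric is then computed per bucket. A small in-memory sketch of that idea follows: the first tweet uses values from the example document in the slides, the other two are made-up sample data, and `terms_buckets` and `avg` are illustrative names rather than the Elasticsearch API.

```python
from collections import defaultdict
from statistics import mean

tweets = [
    {"tags": ["php", "opinion"], "retweets": 42},
    {"tags": ["php"], "retweets": 8},             # made-up sample
    {"tags": ["elasticsearch"], "retweets": 10},  # made-up sample
]


def terms_buckets(docs, field):
    """Partition documents into one bucket per distinct term of a field."""
    buckets = defaultdict(list)
    for doc in docs:
        for term in doc[field]:
            buckets[term].append(doc)
    return dict(buckets)


def avg(docs, field):
    """Metric computed over the documents of a single bucket."""
    return mean(doc[field] for doc in docs)


# Compose: bucket by tag, then compute the average retweets in each bucket.
result = {
    term: {"doc_count": len(bucket), "avg_retweets": avg(bucket, "retweets")}
    for term, bucket in terms_buckets(tweets, "tags").items()
}
print(result)
```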
  60. Percolator

  61. Percolator
      • Search backwards
      • Register a query
      • Get notified when a new document comes in that matches the query
  62. Percolator
      • Real-time search result updates
      • News alerts
      • Price monitoring
      • Logs monitoring
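"Search backwards" means the queries are stored and each incoming document is matched against them. The sketch below models that with a toy class in which a query is simply a set of terms that must all be present. The `Percolator` class, its method names and the sample queries are assumptions for illustration, not the Elasticsearch percolator API.

```python
import re


class Percolator:
    """Toy percolator: stores queries, then matches documents against them."""

    def __init__(self):
        self.queries = {}

    def register(self, name, required_terms):
        """Register a query that matches when all terms occur in a document."""
        self.queries[name] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the names of all registered queries matching the text."""
        terms = set(re.findall(r"[a-z0-9]+", text.lower()))
        return sorted(
            name for name, required in self.queries.items() if required <= terms
        )


percolator = Percolator()
percolator.register("php-news", ["php"])
percolator.register("es-release", ["elasticsearch", "release"])

print(percolator.percolate("PHP is GREAT"))
print(percolator.percolate("New Elasticsearch release announced"))
print(percolator.percolate("Nothing relevant here"))
```

In an alerting setup, the returned query names would identify which subscribers to notify about the new document.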
  63. Shards & clusters

  65. • More primary shards
        – faster indexing
        – scalability
      • More replicas
        – faster searching
        – more failover
  66. Clustering
      • Auto-discovery
      • Single master
      • Immediate failover with master re-election
  67. Tools

  68. https://github.com/elasticsearch/elasticsearch-php
      https://github.com/ruflin/Elastica

  69. https://github.com/mobz/elasticsearch-head

  70. Logstash & Kibana

  72. Try it!

  73. https://github.com/elasticsearch/elasticsearch
      http://elasticsearch.org
      #elasticsearch @ freenode IRC

  74. @iam_asm89
      https://joind.in/10882