
Elasticsearch


Find all the things! An introduction to Elasticsearch and what it can do for your applications. Learn all about searching, boosting, percolating and aggregating your data.

http://www.phpconference.nl/

Alexander

June 28, 2014


Transcript

  1. $ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.tar.gz
    $ tar xf elasticsearch-1.2.1.tar.gz
    $ cd elasticsearch-1.2.1/
    $ bin/elasticsearch
  2. Relational database ⇒ Elasticsearch
    database ⇒ index
    table ⇒ type
    row ⇒ document
    column ⇒ field
    schema ⇒ mapping
    index ⇒ (everything is indexed)
    SQL ⇒ query DSL
  3. $ curl -XPUT localhost:9200/social/tweet/42 -d '{
      "text": "#dpc14 is awesome!"
    }'
    $ curl -XGET localhost:9200/social/tweet/42
    $ curl -XDELETE localhost:9200/social/tweet/42
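The three calls above are plain HTTP requests against `/index/type/id`. A minimal Python sketch, assuming a node on `localhost:9200` as in the slide; the helper name `document_request` is hypothetical, and the requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node on the default port


def document_request(method, index, doc_type, doc_id, body=None):
    """Build (but do not send) an HTTP request for /index/type/id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(url, data=data, method=method)


# Index, fetch and delete the tweet with id 42, mirroring the curl calls.
put = document_request("PUT", "social", "tweet", 42, {"text": "#dpc14 is awesome!"})
get = document_request("GET", "social", "tweet", 42)
delete = document_request("DELETE", "social", "tweet", 42)

for req in (put, get, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node.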
  4. {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "max_score" : 1,
        "hits" : [ {
          "_score" : 1,
          "_id" : "42",
          "_source" : { "text" : "#dpc14 is awesome!" },
          "_type" : "tweet",
          "_index" : "social"
        } ],
        "total" : 1
      }
    }
  5. cakedc search liip search bundle symfony cmf search bundle elasticsearch

    elasticsearch doctrine orm doctrine dbal Tokenize it
  6. Document
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user": {
        "name": "Alexander",
        "nick": "iam_asm89"
      },
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  7. Fields
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user": {
        "name": "Alexander",
        "nick": "iam_asm89"
      },
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  8. Values
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user": {
        "name": "Alexander",
        "nick": "iam_asm89"
      },
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  9. Field types
    {                               # object
      "tweet": "PHP is GREAT",      # string
      "posted": "2014-06-28",       # string
      "user": {                     # nested object
        "name": "Alexander",        # string
        "nick": "iam_asm89"         # string
      },
      "tags": ["php", "opinion"],   # array
      "retweets": 42                # integer
    }
  10. Nested objects are flattened
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user": {
        "name": "Alexander",
        "nick": "iam_asm89"
      },
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  11. Nested objects are flattened
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user.name": "Alexander",
      "user.nick": "iam_asm89",
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  12. Values are analyzed into terms
    {
      "tweet": "PHP is GREAT",
      "posted": "2014-06-28",
      "user.name": "Alexander",
      "user.nick": "iam_asm89",
      "tags": ["php", "opinion"],
      "retweets": 42
    }
  13. Values are analyzed into terms
    {
      "tweet": ['php', 'great'],
      "posted": [Date(2014-06-28)],
      "user.name": ['alexander'],
      "user.nick": ['iam', 'asm89'],
      "tags": ['php', 'opinion'],
      "retweets": [42]
    }
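The two steps from the preceding slides (flatten nested objects, then analyze values into terms) can be imitated in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch's analysis chain: the stop list, the date detection via `date.fromisoformat`, and the function names `flatten`, `analyze` and `index_terms` are all hypothetical choices made to reproduce the slide's output.

```python
import re
from datetime import date

STOPWORDS = {"is", "a", "the"}  # assumption: tiny illustrative stop list


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted field names; other values are kept."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def analyze(value):
    """Toy analyzer: ISO dates become dates, strings become lowercase terms."""
    if not isinstance(value, str):
        return [value]
    try:
        return [date.fromisoformat(value)]  # crude stand-in for date detection
    except ValueError:
        pass
    tokens = re.split(r"[^A-Za-z0-9]+", value)
    return [t.lower() for t in tokens if t and t.lower() not in STOPWORDS]


def index_terms(doc):
    """Flatten the document, then analyze every value (arrays item by item)."""
    terms = {}
    for field, value in flatten(doc).items():
        values = value if isinstance(value, list) else [value]
        terms[field] = [term for v in values for term in analyze(v)]
    return terms


tweet = {
    "tweet": "PHP is GREAT",
    "posted": "2014-06-28",
    "user": {"name": "Alexander", "nick": "iam_asm89"},
    "tags": ["php", "opinion"],
    "retweets": 42,
}
terms = index_terms(tweet)
for field, field_terms in terms.items():
    print(field, field_terms)
```

Which terms a real index stores depends on the analyzer configured for each field, so the split of `iam_asm89` here follows the slide rather than any particular analyzer.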
  14. $ curl -XPUT localhost:9200/social/tweet/_mapping -d '{
      "tweet" : {
        "properties" : {
          "tweet" : {"type" : "string" },
          "created" : {"type" : "date" }
        }
      }
    }'
  15. Mapping
    • CAN add to an existing mapping
    • CAN NOT change the mapping for a field
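The add-but-not-change rule can be pictured as a merge that accepts new fields and refuses to alter existing ones. The sketch below is a toy model of that rule only; `merge_mapping` and `MappingConflict` are hypothetical names, not part of any Elasticsearch client.

```python
class MappingConflict(Exception):
    """Raised when an update tries to change an existing field's definition."""


def merge_mapping(existing, update):
    """Return a new mapping with the fields from `update` added to `existing`."""
    merged = dict(existing)
    for field, spec in update.items():
        if field in merged and merged[field] != spec:
            raise MappingConflict(f"cannot change mapping for field {field!r}")
        merged[field] = spec
    return merged


mapping = {"tweet": {"type": "string"}, "created": {"type": "date"}}

# Adding a new field to the existing mapping is accepted.
mapping = merge_mapping(mapping, {"retweets": {"type": "integer"}})

# Changing the type of an existing field is refused.
try:
    merge_mapping(mapping, {"created": {"type": "string"}})
    conflict = False
except MappingConflict:
    conflict = True

print(sorted(mapping), conflict)
```

In practice, changing a field's mapping usually means creating a new index with the desired mapping and reindexing the documents into it.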
  16. Field types
    • String / integer / long / float / double / boolean / null
    • Date
    • Arrays
    • IP
    • Geo point
    • Geo shape
  17. Analyzers
    • Standard
    • Simple
    • Whitespace
    • Stop
    • Keyword
    • Pattern
    • Language
    • Snowball
    • Custom
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html
  18. Token filters
    • standard • ascii folding • length • lowercase • uppercase
    • ngram • edge ngram • porter stem • shingle • stop
    • word delimiter • stemmer • keyword marker • snowball • phonetic
    • synonym • compound word • reverse • truncate • unique
    • pattern replace • trim • hunspell • normalization
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html
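Token filters run in sequence, each one transforming the token stream produced by the previous step. The sketch below chains simplified versions of three filters from the list (lowercase, stop, edge ngram). The implementations, the stop list and the `apply_filters` helper are illustrative assumptions and do not reproduce Elasticsearch's behavior in detail.

```python
def lowercase(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def stop(tokens, stopwords=frozenset({"is", "a", "the"})):
    """Drop tokens that appear in the (assumed) stop list."""
    return [t for t in tokens if t not in stopwords]


def edge_ngram(tokens, min_gram=2, max_gram=4):
    """Emit the leading substrings of each token, from min_gram to max_gram."""
    grams = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            grams.append(t[:n])
    return grams


def apply_filters(tokens, filters):
    """Pass the token stream through each filter in order."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = "PHP is GREAT".split()  # whitespace tokenization, for simplicity
print(apply_filters(tokens, [lowercase, stop, edge_ngram]))
# ['ph', 'php', 'gr', 'gre', 'grea']
```

Order matters: running `stop` before `lowercase` would leave a capitalized `Is` in place, because this stop list only contains lowercase words.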
  19. Percolator
    • Search backwards
    • Register a query
    • Get notified when a new document comes in that matches the query
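"Search backwards" means the queries are stored and each incoming document is matched against them. A toy in-memory model, assuming a query is simply a set of terms that must all occur in the document; the `ToyPercolator` class and its methods are hypothetical and unrelated to the real percolator API.

```python
class ToyPercolator:
    """Stores queries and reports which ones match an incoming document."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        """Register a query: every term must occur in a matching document."""
        self._queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the ids of all registered queries that match the text."""
        terms = set(text.lower().split())
        return sorted(
            query_id
            for query_id, required in self._queries.items()
            if required <= terms
        )


percolator = ToyPercolator()
percolator.register("php-fans", ["php"])
percolator.register("es-talks", ["elasticsearch", "talk"])

print(percolator.percolate("PHP is great"))
# ['php-fans']
print(percolator.percolate("great elasticsearch talk about php"))
# ['es-talks', 'php-fans']
```

A caller would turn the returned ids into notifications, for example by alerting the user who registered each query.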
  20. • More primary shards
        – faster indexing
        – scalability
    • More replicas
        – faster searching
        – more failover
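One way to see why more primary shards help indexing: each document is routed to a single primary shard, so more shards spread the write load over more places. The sketch below is a simplification that assumes routing by a hash of the document id modulo the number of primary shards. `zlib.crc32` is a stand-in hash, and `shard_for` and `distribution` are hypothetical helper names.

```python
import zlib
from collections import Counter


def shard_for(doc_id, number_of_primary_shards):
    """Pick a primary shard from a hash of the document id."""
    # zlib.crc32 is a stand-in (assumption); Elasticsearch uses its own hash.
    digest = zlib.crc32(str(doc_id).encode("utf-8"))
    return digest % number_of_primary_shards


def distribution(doc_ids, number_of_primary_shards):
    """Count how many documents land on each primary shard."""
    return Counter(shard_for(d, number_of_primary_shards) for d in doc_ids)


counts = distribution(range(1000), 5)
for shard, count in sorted(counts.items()):
    print(f"shard {shard}: {count} documents")
```

Replicas are copies of these primary shards. Under this model they do not change where a document is routed, but they add copies that can answer searches and take over when a node fails.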