ElasticSearch {r}Evolution. Welcome.

ElasticSearch is quickly becoming one of the primary contenders in the search space: it is distributed, highly available, fast, RESTful, and ready to be plugged into Web applications. Its developers have been busy in the last year; this talk will do a quick introduction to ElasticSearch and cover some of the most interesting and exciting new features.

Andrei Zmievski

February 29, 2012

Transcript

  1. who am i?

    curl http://localhost:9200/speaker/info/andrei

    {"name": "Andrei Zmievski",
     "works": "AppDynamics",
     "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
     "likes": ["coding", "beer", "brewing", "photography"],
     "twitter": "@a",
     "email": "[email protected]"}

    Wednesday, February 29, 12
  2. what is elasticsearch?

    a search engine for the NoSQL generation
    domain-driven
    document-oriented
    distributed
    RESTful
    Lucene-based engine
  3. what has happened?

    A year ago it was at 0.15.0
    Just released 0.19.0RC3
    Continuous progress: lots of new features, improved stability, and more
    Increasing adoption by small and big companies
    No cloud hosting option yet, but maybe soon
  4. API conventions

    append ?pretty=true to get readable JSON
    boolean values: false/0/off = false, rest is true
    JSONP support via callback parameter
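The two conventions above can be sketched in a few lines of Python. This is only an illustration of the rule as the slide states it, not ElasticSearch's own parsing code; the helper names `parse_bool_param` and `with_pretty` are made up here, and case-sensitive matching is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Per the slide: these literal values mean false, anything else means true.
# Exact, case-sensitive matching is an assumption of this sketch.
FALSE_VALUES = frozenset({"false", "0", "off"})


def parse_bool_param(value: str) -> bool:
    """Interpret a request parameter value using the false/0/off rule."""
    return value not in FALSE_VALUES


def with_pretty(url: str) -> str:
    """Return the URL with pretty=true added to its query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query, keep_blank_values=True)
    params.append(("pretty", "true"))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

For example, `with_pretty` keeps an existing `q=beer` parameter and appends `pretty=true` after it.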
  5. API structure

    http://host:port/[index]/[type]/[_action/id]

    GET http://es:9200/twitter/_status
    GET http://es:9200/twitter/tweet/1
    GET http://es:9200/twitter/tweet/_search
    GET http://es:9200/twitter/tweet,user/_search
    GET http://es:9200/twitter,facebook/_search
    GET http://es:9200/_search
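A small Python sketch of how these URLs compose, assuming the pattern shown above: comma-joined index names, then comma-joined type names, then an action or document id. The function name `es_url` is hypothetical, and the rule that types require an index is an assumption drawn from the URL shape.

```python
from typing import Optional, Sequence


def es_url(base: str,
           indices: Sequence[str] = (),
           types: Sequence[str] = (),
           action_or_id: Optional[str] = None) -> str:
    """Compose http://host:port/[index]/[type]/[_action/id]."""
    if types and not indices:
        # Assumption: a type segment only makes sense after an index segment.
        raise ValueError("types require at least one index")
    parts = [base.rstrip("/")]
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    if action_or_id:
        parts.append(action_or_id)
    return "/".join(parts)
```

Calling `es_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search")` reproduces the multi-type search URL from the slide.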
  6. API query example

    {
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "query": "foo bar",
              "default_operator": "AND",
              "fields": ["title", "description"],
              "boost": 2.0
            }
          },
          "filter": {
            "range": {"date": {"gt": "2012-02-09"}}
          }
        }
      },
      "from": 10,
      "size": 10
    }
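The same request body can be assembled programmatically. The sketch below is one possible way to do it in Python; `build_search_body` and its parameters are hypothetical names, and treating "from" as a zero-based page number times the page size is an assumption about how a caller would paginate.

```python
import json


def build_search_body(text, fields, after_date, page=0, page_size=10, boost=2.0):
    """Build a filtered query: query_string for scoring, range filter on date."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": list(fields),
                        "boost": boost,
                    }
                },
                # The filter narrows the result set without affecting scores.
                "filter": {"range": {"date": {"gt": after_date}}},
            }
        },
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits to return
    }


body = build_search_body("foo bar", ["title", "description"], "2012-02-09", page=1)
payload = json.dumps(body)  # JSON text to send as the request body
```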
  7. 1. index

    request:
    curl -XPOST http://localhost:9200/conf/speaker/1 -d '
    {
      "name": "Andrei Zmievski",
      "talk": "ElasticSearch Revolution. Welcome.",
      "likes": ["coding", "beer", "photography"],
      "twitter": "a",
      "height": 187
    }'

    response:
    {
      "ok": true,
      "_index": "conf",
      "_type": "speaker",
      "_id": "1"
    }
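For readers who prefer code to curl, here is a rough Python equivalent of the index request using only the standard library. The helper name `build_index_request` is made up, the Content-Type header is an addition not shown on the slide, and the request is only constructed, not sent, so no running node is needed to try it.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, doc):
    """Prepare a POST to /index/type/id with the document as JSON."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumption, not on the slide
        method="POST",
    )


speaker = {
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187,
}
request = build_index_request("http://localhost:9200", "conf", "speaker", 1, speaker)
# To actually send it against a running node:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```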
  8. 2. search

    request:
    curl http://localhost:9200/conf/speaker/_search?q=beer

    response:
    {
      "took": 3,
      "_shards": {"total": 1, "successful": 1, "failed": 0},
      "hits": {
        "total": 1,
        "max_score": 0.5908709,
        "hits": [{
          "_index": "conf",
          "_type": "speaker",
          "_id": "1",
          "_score": 0.5908709,
          "_source": {
            "name": "Andrei Zmievski",
            "lives": "San Francisco",
            "likes": ["coding", "beer", "photography"],
            "twitter": "a",
            "height": 187
          }
        }]
      }
    }

  9. 2. search (same request and response): "total" under "hits" is the total number of hits
  10. 2. search (same request and response): "_index" is the index of the doc
  11. 2. search (same request and response): "_type" is the type of the doc
  12. 2. search (same request and response): "_id" is the id of the doc
  13. 2. search (same request and response): "_score" is the hit score
  14. 2. search (same request and response): "_source" is the original doc contents
  15. 2. search (same request and response): "took" is the execution time
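The fields called out on these slides can be pulled from a parsed response as follows. This is a sketch that works on the sample response shown in the talk; `summarize_hits` is a hypothetical helper, and a real response may carry more fields than it reads.

```python
import json

# The sample response from the slides, as JSON text.
SAMPLE_RESPONSE = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski",
                                "lives": "San Francisco",
                                "likes": ["coding", "beer", "photography"],
                                "twitter": "a", "height": 187}}]}}
"""


def summarize_hits(response):
    """Collect execution time, total hit count, and per-hit details."""
    hits = response["hits"]
    return {
        "took": response["took"],
        "total": hits["total"],
        "docs": [
            {
                "index": hit["_index"],
                "type": hit["_type"],
                "id": hit["_id"],
                "score": hit["_score"],
                "source": hit["_source"],
            }
            for hit in hits["hits"]
        ],
    }


summary = summarize_hits(json.loads(SAMPLE_RESPONSE))
```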
  16. transactional model

    write consistency is per-document
    uses write-ahead transaction log
    1 second index refresh rate by default
  17. storage

    node data is considered transient
    can be stored in local file system, JVM heap, native OS memory, or FS & memory combination
    gateway is a persistent storage mechanism: local, shared FS, HDFS, S3
  18. mapping

    describes document structure
    automatically created with sensible defaults, but can be overridden per field
    many field types: string, integer/long, float/double, boolean, date, geo, array, object, and more
  19. sample mapping

    document:
    {"user": "derick",
     "title": "Don’t Panic",
     "tags": ["profiling", "debugging", "php"],
     "postDate": "2010-12-22T17:14:12",
     "priority": 2}

    mapping:
    {"post": {
      "properties": {
        "user":     {"type": "string", "index": "not_analyzed"},
        "message":  {"type": "string", "boost": 1.5},
        "tags":     {"type": "string", "include_in_all": "no"},
        "postDate": {"type": "date", "store": "no"},
        "priority": {"type": "integer"}
      }
    }}

  20. sample mapping (same document and mapping): the "priority" entry, {"type": "integer"}, is not really needed
  21. analyzers

    break down (tokenize) and normalize fields during indexing and query strings at search time

    elasticsearch.yml:
    index:
      analysis:
        analyzer:
          eulang:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase, stop, asciifolding, porterStem]

    mapping:
    … "title": {"type": "string", "analyzer": "eulang"}, …
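To make the tokenize-then-normalize idea concrete, here is a toy analyzer in Python. It is not Lucene's implementation: the regex tokenizer, the tiny stop-word list, and the NFKD-based folding are all stand-ins chosen for this sketch, and the stemming step from the config above is left out entirely.

```python
import re
import unicodedata

# A tiny, made-up stop-word list; real analyzers ship a longer one.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})


def ascii_fold(token: str) -> str:
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", token)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def analyze(text: str) -> list:
    """Tokenize, lowercase, drop stop words, then fold to ASCII."""
    tokens = re.findall(r"\w+", text)                     # tokenizer
    tokens = [t.lower() for t in tokens]                   # lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop filter
    tokens = [ascii_fold(t) for t in tokens]               # asciifolding filter
    return [t for t in tokens if t]                        # drop emptied tokens
```

Because the same function would run on both the indexed field and the query string, `analyze("the CAFE")` and `analyze("café")` produce the same tokens, which is what lets the two match.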
  22. filters

    share some similar features with queries
    apply to the result of the query
    why use a filter?
  23. filters

    faster than queries
    cached (depends on the filter)
    the cache is used for different queries against the same filter
    no scoring
    more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
  24. facets

    provide aggregated data based on the search request
    usual purpose is to offer a faceted navigation, or faceted search (eBay and more)
    facet types: terms, histogram, date histogram, range, statistical, and more
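As an illustration of what a terms facet aggregates, the Python sketch below counts field values over a handful of in-memory documents. It only mimics the idea locally; the function `terms_facet`, the sample posts, and the tie-breaking order are assumptions of this example. `FACET_REQUEST` shows roughly what the corresponding request body looks like, with the facet name "tags" chosen arbitrarily.

```python
from collections import Counter

# Rough shape of a search request asking for a terms facet on "tags".
FACET_REQUEST = {
    "query": {"match_all": {}},
    "facets": {"tags": {"terms": {"field": "tags", "size": 10}}},
}


def terms_facet(docs, field, size=10):
    """Count how often each value of a field occurs across the documents."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"term": term, "count": count} for term, count in ranked[:size]]


posts = [
    {"user": "derick", "tags": ["profiling", "debugging", "php"]},
    {"user": "andrei", "tags": ["php", "search"]},
]
tag_counts = terms_facet(posts, "tags")
```

A faceted navigation would render each entry of `tag_counts` as a clickable refinement with its count next to it.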
  25. rivers

    pluggable service running within the cluster
    pulls in data from external sources and indexes it
    automatic failover
    current: Twitter, MongoDB, CouchDB, RabbitMQ, RSS, Wikipedia
  26. percolator

    turns searching on its head
    search: index docs and run queries for matches
    percolator: index queries and run docs for matches
    great feature for notification/triggers implementation
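The inversion is easy to see in a toy model. The Python class below stores queries and checks each incoming document against them; it is a conceptual sketch only, with a made-up `ToyPercolator` class and a deliberately naive single-term match, and it does not reflect how ElasticSearch implements or exposes percolation.

```python
class ToyPercolator:
    """Store named single-term queries, then match documents against them."""

    def __init__(self):
        self._queries = {}

    def register(self, name, field, term):
        """Index a query: 'field contains term'."""
        self._queries[name] = (field, term.lower())

    def percolate(self, doc):
        """Run a document past every stored query and return matching names."""
        matches = []
        for name, (field, term) in self._queries.items():
            tokens = str(doc.get(field, "")).lower().split()
            if term in tokens:
                matches.append(name)
        return sorted(matches)


percolator = ToyPercolator()
percolator.register("beer-alert", "message", "beer")
percolator.register("php-alert", "message", "php")
matched = percolator.percolate({"message": "Free beer tonight"})
```

Each name in `matched` identifies a stored query whose owner could then be notified, which is the notification/trigger use case mentioned above.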
  27. index aliases

    each index name can have one or more aliases
    atomic renames allow on-the-fly index switching
    actual index: tweets{date}
    alias: tweets
    on update, create new index and switch alias
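The alias switch can be expressed as a single `_aliases` request that removes the alias from the old index and adds it to the new one. The Python sketch below only builds that request body; the helper name `alias_switch_actions` and the dated index names are hypothetical.

```python
import json


def alias_switch_actions(alias, old_index, new_index):
    """Build an _aliases body that moves an alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical dated index names following the tweets{date} pattern.
switch = alias_switch_actions("tweets", "tweets_20120228", "tweets_20120229")
switch_payload = json.dumps(switch)  # body for POST /_aliases
```

Searches keep addressing `tweets` throughout, so clients never need to know which dated index is live.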
  28. filtered index aliases

    allows creation of “views” into an index
    associates a filter with an alias

    filtered alias:
    curl -XPOST http://localhost:9200/_aliases -d '
    {
      "actions": [
        {
          "add": {
            "index": "posts",
            "alias": "posts_by_andrei",
            "filter": {"term": {"user": "andrei"}}
          }
        }
      ]
    }'
  29. parent/child docs

    _parent field in mapping establishes relationship between doc types, e.g. comment and post
    used with has_child and top_children queries
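A sketch of the pieces involved, using the comment and post types from the slide. The Python helpers below only build JSON bodies and a URL; their names, the `blog` index, and the `author` field are made up for illustration, and the shapes follow the `_parent` mapping, `parent` URL parameter, and `has_child` query as I understand them for this generation of ElasticSearch.

```python
def parent_mapping(child_type, parent_type):
    """Mapping fragment declaring that child_type documents have a parent."""
    return {child_type: {"_parent": {"type": parent_type}}}


def child_index_url(base_url, index, child_type, child_id, parent_id):
    """URL for indexing a child document, naming its parent via ?parent=."""
    return f"{base_url.rstrip('/')}/{index}/{child_type}/{child_id}?parent={parent_id}"


def has_child_query(child_type, child_query):
    """Search body returning parents that have at least one matching child."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


mapping = parent_mapping("comment", "post")
url = child_index_url("http://localhost:9200", "blog", "comment", 7, 1)
# Find posts that have a comment whose (hypothetical) author field is "andrei".
search_body = has_child_query("comment", {"term": {"author": "andrei"}})
```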
  30. plugins

    add custom functionality to ES
    written in Java
    installable from GitHub
    custom mapping types, scripting language support, custom discovery, admin tools, and more
  31. interfaces

    REST
    Java / Groovy
    clients/integration: Python, PHP, Ruby, Perl, Erlang, Django, Drupal, Symfony2, CouchDB, Flume, Flume sink implementation
  32. data import

    ES is not the primary data store (usually)
    to import/synchronize data:
    write an agent (Gearman, message queues, etc.)
    use a river plugin (CouchDB, RabbitMQ, Twitter)