ElasticSearch {r}Evolution. Welcome. [DPC12]

ElasticSearch is quickly becoming one of the main contenders in the search space: it is distributed, highly available, fast, RESTful, and ready to be plugged into Web applications. Its developers have been busy over the last year. This talk gives a quick introduction to ElasticSearch and covers some of the most interesting and exciting new features. We might even take down a live server or two to illustrate a point.


Andrei Zmievski

June 08, 2012

Transcript

  1. elasticsearch {r}evolution welcome. Andrei Zmievski • DPC • June 8, 2012
  2. TRUTH

  3. who am i?
    curl http://localhost:9200/speaker/info/andrei
    {"name": "Andrei Zmievski", "works": "AppDynamics", "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"], "likes": ["coding", "beer", "brewing", "photography"], "twitter": "@a", "email": "andrei@zmievski.org"}
  4. what is elasticsearch?
    a search engine for the NoSQL generation
    domain-driven, document-oriented, distributed, RESTful, Lucene-based engine
  5. what has happened?
    a year ago: version 0.15.0
    just released: 0.19.4
    continuous progress: lots of new features, improved stability, and more
    increasing adoption by small and big companies
    no cloud hosting option yet, but maybe soon
  6. API conventions
    append ?pretty=true to get readable JSON
    boolean values: false/0/off mean false, everything else is true
    JSONP support via the callback parameter
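These conventions are easy to mirror on the client side. Below is a minimal Python sketch. It assumes the flag rule is exactly what the slide states: only the literal strings false, 0 and off are false. The helper names `parse_flag` and `with_params` are hypothetical and are not part of any Elasticsearch client.

```python
from urllib.parse import urlencode

# Per the slide: only these strings mean false; every other value means true.
FALSE_VALUES = {"false", "0", "off"}


def parse_flag(value: str) -> bool:
    """Interpret a boolean request parameter using the slide's convention."""
    return value not in FALSE_VALUES


def with_params(url: str, **params: str) -> str:
    """Append query-string parameters such as pretty=true or callback=fn (JSONP).

    Assumes `url` does not already carry a query string.
    """
    return url + "?" + urlencode(params)


print(with_params("http://localhost:9200/twitter/tweet/1", pretty="true"))
# http://localhost:9200/twitter/tweet/1?pretty=true
print(with_params("http://localhost:9200/twitter/tweet/1", callback="handle"))
# http://localhost:9200/twitter/tweet/1?callback=handle
print(parse_flag("off"), parse_flag("1"))  # False True
```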
  7. API structure
    http://host:port/[index]/[type]/[_action/id]
    GET http://es:9200/twitter/_status
    GET http://es:9200/twitter/tweet/1
    GET http://es:9200/twitter/tweet/_search
    GET http://es:9200/twitter/tweet,user/_search
    GET http://es:9200/twitter,facebook/_search
    GET http://es:9200/_search
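The URL pattern above is regular enough to generate. This is a small Python sketch under one assumption: an omitted slot is simply dropped from the path, and several indices or types are joined with commas, as in the examples. The function `es_url` is a hypothetical helper.

```python
from typing import Optional, Sequence, Union

# An index or type slot may name one item or several (comma-separated in the URL).
Names = Union[str, Sequence[str]]


def _join(names: Optional[Names]) -> Optional[str]:
    if names is None:
        return None
    return names if isinstance(names, str) else ",".join(names)


def es_url(host: str, port: int = 9200, index: Optional[Names] = None,
           doc_type: Optional[Names] = None,
           action_or_id: Optional[str] = None) -> str:
    """Build http://host:port/[index]/[type]/[_action/id]; omitted slots are skipped."""
    if doc_type is not None and index is None:
        raise ValueError("a type only makes sense together with an index")
    parts = [p for p in (_join(index), _join(doc_type), action_or_id) if p]
    return f"http://{host}:{port}/" + "/".join(parts)


print(es_url("es", index="twitter", action_or_id="_status"))
# http://es:9200/twitter/_status
print(es_url("es", index="twitter", doc_type=["tweet", "user"], action_or_id="_search"))
# http://es:9200/twitter/tweet,user/_search
print(es_url("es", index=["twitter", "facebook"], action_or_id="_search"))
# http://es:9200/twitter,facebook/_search
print(es_url("es", action_or_id="_search"))
# http://es:9200/_search
```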
  8. API query example
    {
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "query": "foo bar",
              "default_operator": "AND",
              "fields": ["title", "description"],
              "boost": 2.0
            }
          },
          "filter": {
            "range": {"date": {"gt": "2012-02-09"}}
          }
        }
      },
      "from": 10,
      "size": 10
    }
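Query bodies like this one are usually assembled in code rather than written by hand. The Python sketch below reproduces the slide's body and assumes 1-based page numbers: page 2 with 10 hits per page gives "from": 10. The function `filtered_query` and its parameters are hypothetical, and the date field is hard-coded as "date" to match the slide.

```python
import json


def filtered_query(text, fields, since, page=1, per_page=10, boost=2.0):
    """Build a body shaped like the slide's example: a query_string query
    (all terms required) restricted by a date range filter, paginated."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": list(fields),
                        "boost": boost,
                    }
                },
                "filter": {"range": {"date": {"gt": since}}},
            }
        },
        "from": (page - 1) * per_page,  # offset of the first hit to return
        "size": per_page,               # number of hits per page
    }


body = filtered_query("foo bar", ["title", "description"], "2012-02-09", page=2)
print(json.dumps(body, indent=2))
```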
  9. 3 easy steps

  10. 1. index
    request:
    curl -XPOST http://localhost:9200/conf/speaker/1 -d '{
      "name": "Andrei Zmievski",
      "talk": "ElasticSearch Revolution. Welcome.",
      "likes": ["coding", "beer", "photography"],
      "twitter": "a",
      "height": 187
    }'
    response:
    {"ok": true, "_index": "conf", "_type": "speaker", "_id": "1"}
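The same indexing call can be made from a program. This Python sketch uses only the standard library and builds the request without sending it, so it runs without a cluster. `index_request` is a hypothetical helper, and the base URL assumes a local node on the default port, as on the slide.

```python
import json
import urllib.request


def index_request(base_url: str, index: str, doc_type: str, doc_id: str,
                  doc: dict) -> urllib.request.Request:
    """Prepare (without sending) the step-1 request: POST a JSON doc to /index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


req = index_request("http://localhost:9200", "conf", "speaker", "1", {
    "name": "Andrei Zmievski",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
})
print(req.get_method(), req.full_url)  # POST http://localhost:9200/conf/speaker/1
# Against a running node, json.load(urllib.request.urlopen(req)) would return
# a response shaped like the one on the slide.
```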
  11-18. 2. search
    request:
    curl http://localhost:9200/conf/speaker/_search?q=beer
    response:
    {
      "took": 3,
      "_shards": {"total": 1, "successful": 1, "failed": 0},
      "hits": {
        "total": 1,
        "max_score": 0.5908709,
        "hits": [{
          "_index": "conf",
          "_type": "speaker",
          "_id": "1",
          "_score": 0.5908709,
          "_source": {
            "name": "Andrei Zmievski",
            "lives": "San Francisco",
            "likes": ["coding", "beer", "photography"],
            "twitter": "a",
            "height": 187
          }
        }]
      }
    }
    annotations (one per slide):
    hits.total: total number of hits
    _index: the index of the doc
    _type: the type of the doc
    _id: the id of the doc
    _score: the hit score
    _source: the original doc contents
    took: the execution time (in milliseconds)
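A client usually needs only the annotated fields. The Python sketch below extracts them from a trimmed copy of the response above. `summarize` is a hypothetical helper name.

```python
import json

# Trimmed copy of the response above.
RESPONSE = json.loads("""
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski", "twitter": "a"}}]}}
""")


def summarize(response: dict):
    """Return (took, total hits, [(index, type, id, score, source), ...])."""
    hits = response["hits"]
    rows = [(h["_index"], h["_type"], h["_id"], h["_score"], h["_source"])
            for h in hits["hits"]]
    return response["took"], hits["total"], rows


took, total, rows = summarize(RESPONSE)
print(f"{total} hit(s) in {took} ms")  # 1 hit(s) in 3 ms
for index, doc_type, doc_id, score, source in rows:
    print(index, doc_type, doc_id, score, source["name"])
```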
  19. 3. profit that’s up to you

  20. demo

  21. distributed model
    built for performance and resiliency
    zero-conf discovery
    sharding/replication
    auto-routing
  22. replicas
    each shard can have 1 or more replicas
    the number of replicas can be updated dynamically after index creation
    replicas can be used for querying in parallel
  23. shard allocation node 1 start with a single node

  24. shard allocation
    PUT /person {"index": {"number_of_shards": 2, "number_of_replicas": 1}}
    node 1: person1 person2
  25. shard allocation node 1 person1 person2 node 2 person1 person2

    start the second node
  26. shard allocation node 1 node 2 node 3 node 4

    person1 person2 person1 person2 start 2 more nodes
  28. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 PUT /person/info/1 { … }
  29. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 hashed to shard 1 PUT /person/info/1 { … }
  30. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 replicated PUT /person/info/1 { … }
  31. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 PUT /person/info/2 { … }
  32. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 hashed to shard 2 PUT /person/info/2 { … }
  33. document sharding node 1 node 2 node 3 node 4

    person1 person2 person1 person2 replicated PUT /person/info/2 { … }
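The routing in these slides comes down to one formula: shard = hash(routing key) mod number of primary shards, where the routing key defaults to the document id. The Python sketch below makes one stated assumption: `zlib.crc32` stands in for Elasticsearch's own hash function, so the shard numbers it prints will not match a real cluster. `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(doc_id: str, number_of_shards: int) -> int:
    """Pick the primary shard for a document: hash(routing key) mod shard count.

    The routing key defaults to the doc id. zlib.crc32 is only a stand-in for
    the hash function Elasticsearch actually uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


# Index "person" has 2 primary shards, shown on the slides as person1 and person2.
for doc_id in ["1", "2", "3", "4"]:
    print(f"PUT /person/info/{doc_id} -> person{shard_for(doc_id, 2) + 1}")
```

The formula depends on the shard count. That is why the number of primary shards is fixed when the index is created, while the number of replicas can still change later.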
  34. scatter-gather
    node 1 node 2 node 3 node 4 person1 person2 person1 person2
    GET /person/_search?q=name:thomas
  35. shard allocation node 1 node 2 node 3 node 4

    person1 person2 person1 person2 GET /person/_search?q=name:thomas
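Here is a toy model of the scatter-gather search in Python. The search is sent to one copy of every shard, each shard returns its local top hits, and the coordinating node merges them. The scores are hypothetical numbers assumed to be already computed for name:thomas. Elasticsearch's default search type also separates a query phase from a fetch phase that loads full documents only for the final hits, and this sketch leaves that out.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc id)


def query_shard(scored_docs: Dict[str, float], size: int) -> List[Hit]:
    """Scatter phase on one shard: return that shard's local top `size` hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, score in scored_docs.items()))


def scatter_gather(shards: List[Dict[str, float]], size: int) -> List[Hit]:
    """Ask every shard for its top `size` (any of them might be globally best),
    then merge on the coordinating node and keep the overall top `size`."""
    partial = [query_shard(docs, size) for docs in shards]
    return heapq.nlargest(size, (hit for hits in partial for hit in hits))


# Hypothetical scores that shards person1 and person2 computed for name:thomas.
person1 = {"a": 0.9, "b": 0.2, "c": 0.5}
person2 = {"d": 0.8, "e": 0.7}
print(scatter_gather([person1, person2], 3))  # [(0.9, 'a'), (0.8, 'd'), (0.7, 'e')]
```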
  38. transactional model
    write consistency is per-document
    uses a write-ahead transaction log
    1-second index refresh rate by default
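The last two points interact: a write is logged and accepted immediately, but search sees it only after the next refresh. The toy Python model below illustrates this and simplifies the real engine. The class `NearRealTimeIndex` and its methods are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]


class NearRealTimeIndex:
    """Toy model: every write goes to a write-ahead log and an in-memory buffer
    right away, but only becomes searchable after refresh()."""

    def __init__(self) -> None:
        self.translog: List[Tuple[str, str, Doc]] = []  # replayed for recovery
        self._buffer: Dict[str, Doc] = {}       # indexed, not yet visible
        self._searchable: Dict[str, Doc] = {}   # visible to search

    def index(self, doc_id: str, doc: Doc) -> None:
        self.translog.append(("index", doc_id, doc))  # log before applying
        self._buffer[doc_id] = doc

    def refresh(self) -> None:
        # Elasticsearch triggers this periodically, every second by default.
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, predicate: Callable[[Doc], bool]) -> List[str]:
        return [i for i, d in self._searchable.items() if predicate(d)]


def likes_beer(doc: Doc) -> bool:
    return "beer" in doc["likes"]


idx = NearRealTimeIndex()
idx.index("1", {"likes": ["coding", "beer"]})
print(idx.search(likes_beer))  # [] (not refreshed yet)
idx.refresh()
print(idx.search(likes_beer))  # ['1']
```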
  39. storage
    node data is considered transient
    it can be stored in the local file system, JVM heap, native OS memory, or a FS & memory combination
    the gateway is a persistent storage mechanism: local, shared FS, HDFS, S3
  40. mapping
    describes document structure
    automatically created with sensible defaults, but can be overridden per field
    many field types: string, integer/long, float/double, boolean, date, geo, array, object, and more
  41-42. sample mapping
    document:
    {"user": "derick", "title": "Don’t Panic", "tags": ["profiling", "debugging", "php"], "postDate": "2010-12-22T17:14:12", "priority": 2}
    mapping:
    {"post": {
      "properties": {
        "user":     {"type": "string", "index": "not_analyzed"},
        "message":  {"type": "string", "boost": 1.5},
        "tags":     {"type": "string", "include_in_all": "no"},
        "postDate": {"type": "date", "store": "no"},
        "priority": {"type": "integer"}
      }
    }}
    note (slide 42): the "priority" mapping is not really needed
  43. analyzers
    break down (tokenize) and normalize fields during indexing, and query strings at search time
    elasticsearch.yml:
    index:
      analysis:
        analyzer:
          eulang:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase, stop, asciifolding, porterStem]
    mapping:
    … "title": {"type": "string", "analyzer": "eulang"}, …
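A custom analyzer like "eulang" is a pipeline: a tokenizer followed by token filters, applied in order. The Python sketch below mimics that chain with simplified stand-ins. A regex replaces the standard tokenizer, the stop-word list is a small sample, and `crude_stem` only strips "ing" and "s". It is NOT the Porter algorithm. All function names are hypothetical.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}  # small sample list


def tokenize(text):
    # Rough stand-in for the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Strip accents, e.g. "cafés" -> "cafes".
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def crude_stem(token):
    # NOT the Porter algorithm: strips "ing" or "s" if at least 3 chars remain.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    tokens = [t.lower() for t in tokenize(text)]          # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop
    tokens = [ascii_fold(t) for t in tokens]              # asciifolding
    return [crude_stem(t) for t in tokens]                # stemming stand-in


print(analyze("The Cafés are brewing beers"))  # ['cafe', 'brew', 'beer']
```

The same chain runs over query strings at search time, so a search for "Beers" is reduced to the token "beer" and matches the indexed text.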
  44. filters
    share some features with queries
    apply to the result of the query
    why use a filter?
  45. filters
    faster than queries
    cached (depends on the filter)
    the cache is reused by different queries against the same filter
    no scoring
    more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
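Why does caching a filter pay off across different queries? The set of documents a filter matches does not depend on the query. The Python sketch below caches that set under a string key and reuses it. The key format "term:user=andrei" and the class `FilteredSearcher` are hypothetical, and real filter caches store compact bitsets per segment, not Python sets.

```python
from typing import Callable, Dict, FrozenSet, List

Doc = Dict[str, str]
Predicate = Callable[[Doc], bool]


class FilteredSearcher:
    """Toy searcher that caches the doc ids matching each filter."""

    def __init__(self, docs: List[Doc]):
        self.docs = docs
        self._filter_cache: Dict[str, FrozenSet[int]] = {}
        self.filter_evaluations = 0  # how often a filter was actually computed

    def _filter_ids(self, key: str, flt: Predicate) -> FrozenSet[int]:
        if key not in self._filter_cache:
            self.filter_evaluations += 1
            self._filter_cache[key] = frozenset(
                i for i, doc in enumerate(self.docs) if flt(doc))
        return self._filter_cache[key]

    def search(self, query: Predicate, filter_key: str, flt: Predicate) -> List[int]:
        # The filter narrows the candidates without scoring; the query runs only on those.
        allowed = self._filter_ids(filter_key, flt)
        return [i for i in sorted(allowed) if query(self.docs[i])]


def by_andrei(doc: Doc) -> bool:
    return doc["user"] == "andrei"


docs = [
    {"user": "andrei", "text": "beer brewing"},
    {"user": "derick", "text": "php profiling"},
    {"user": "andrei", "text": "php unicode"},
]
searcher = FilteredSearcher(docs)
print(searcher.search(lambda d: "beer" in d["text"], "term:user=andrei", by_andrei))  # [0]
print(searcher.search(lambda d: "php" in d["text"], "term:user=andrei", by_andrei))   # [2]
print(searcher.filter_evaluations)  # 1
```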
  46. facets
    provide aggregated data based on the search request
    the usual purpose is to offer faceted navigation, or faceted search (eBay and more)
    facet types: terms, histogram, date histogram, range, statistical, and more
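A terms facet counts the values of one field across all documents that match the query. The Python sketch below shows a request body in the facets syntax of this era, then a local computation of the same aggregation over sample hits. `terms_facet` is a hypothetical helper, and the second sample document is made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Request body asking for a terms facet on "tags" alongside the hits.
FACET_REQUEST = {
    "query": {"query_string": {"query": "php"}},
    "facets": {"tags": {"terms": {"field": "tags"}}},
}


def terms_facet(hits: List[Dict], field: str, size: int = 10) -> List[Tuple[str, int]]:
    """Count each value of `field` across the matching docs (array fields count
    every element) and return the `size` most frequent, like a terms facet."""
    counts: Counter = Counter()
    for doc in hits:
        value = doc.get(field)
        if value is None:
            continue
        counts.update(value if isinstance(value, list) else [value])
    return counts.most_common(size)


hits = [
    {"title": "Don't Panic", "tags": ["profiling", "debugging", "php"]},
    {"title": "Unicode in PHP", "tags": ["php", "i18n"]},  # sample doc
]
print(terms_facet(hits, "tags"))
```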
  47. rivers
    pluggable service running within the cluster
    pulls in data from external sources and indexes it
    automatic failover
    current: Twitter, MongoDB, CouchDB, RabbitMQ, RSS, Wikipedia
  48. percolator
    turns searching on its head
    search: index docs and run queries for matches
    percolator: index queries and run docs for matches
    great feature for implementing notifications/triggers
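The percolator's inversion fits in a few lines. In the Python sketch below, queries are registered up front and each incoming document reports which ones it matches. The assumption is that a query is only a set of field/value term matches, while the real percolator accepts full queries. The class and query names are hypothetical.

```python
from typing import Dict, List


class Percolator:
    """Register queries first, then ask which of them a new document matches.
    Queries here are plain field -> value term matches only."""

    def __init__(self) -> None:
        self.queries: Dict[str, Dict[str, str]] = {}

    def register(self, name: str, terms: Dict[str, str]) -> None:
        self.queries[name] = terms

    def percolate(self, doc: Dict[str, object]) -> List[str]:
        return [name for name, terms in self.queries.items()
                if all(self._matches(doc.get(field), value)
                       for field, value in terms.items())]

    @staticmethod
    def _matches(field_value: object, wanted: str) -> bool:
        # Array fields match if any element equals the wanted value.
        if isinstance(field_value, list):
            return wanted in field_value
        return field_value == wanted


p = Percolator()
p.register("beer-lovers", {"likes": "beer"})
p.register("php-speakers", {"projects": "PHP"})
p.register("berlin-locals", {"lives": "Berlin"})
doc = {"name": "Andrei Zmievski", "likes": ["coding", "beer"], "projects": ["PHP", "Smarty"]}
print(p.percolate(doc))  # ['beer-lovers', 'php-speakers']
```

A notification system would register each user's saved search once, then percolate every new document to find out whom to notify.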
  49. index aliases
    each index name can have one or more aliases
    atomic renames allow on-the-fly index switching
    actual index: tweets{date}, alias: tweets
    on update, create a new index and switch the alias
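The alias switch is a single POST to /_aliases that carries a remove action and an add action together. The Python sketch below builds that body. One assumption: the tweets{date} pattern is rendered as tweets plus a YYYYMMDD date, which is a hypothetical naming choice. `dated_index` and `switch_alias` are hypothetical helpers.

```python
import json
from datetime import date


def dated_index(prefix: str, day: date) -> str:
    # Hypothetical naming scheme for tweets{date}, e.g. tweets20120608.
    return prefix + day.strftime("%Y%m%d")


def switch_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.
    Both actions travel in one request and are applied atomically."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


body = switch_alias("tweets",
                    dated_index("tweets", date(2012, 6, 7)),
                    dated_index("tweets", date(2012, 6, 8)))
print(json.dumps(body, indent=2))
```

Clients keep searching "tweets" throughout. After the new index is fully built, the alias is moved and the old index can be deleted.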
  50. filtered index aliases
    allow the creation of "views" into an index
    associate a filter with an alias
    curl -XPOST http://localhost:9200/_aliases -d '{
      "actions": [
        {
          "add": {
            "index": "posts",
            "alias": "posts_by_andrei",
            "filter": {"term": {"user": "andrei"}}
          }
        }
      ]
    }'
    filtered alias
  51. parent/child docs
    the _parent field in the mapping establishes a relationship between doc types, e.g. comment and post
    used with has_child and top_children queries
  52. geo search
    implemented as filters (and a facet): geo_distance, geo_bounding_box, geo_polygon
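A geo_distance filter keeps the documents whose point lies within a given distance of a center. The Python sketch below computes the distance with the haversine formula on a spherical Earth (mean radius 6371 km) and then applies it as a filter. The haversine choice, the (lat, lon) tuple layout, and the approximate city coordinates are assumptions of this sketch, not Elasticsearch internals.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geo_distance_filter(docs: List[Dict], field: str,
                        center: Tuple[float, float], max_km: float) -> List[Dict]:
    """Keep docs whose `field` point lies within max_km of center (no scoring)."""
    return [d for d in docs if haversine_km(center, d[field]) <= max_km]


places = [
    {"name": "Amsterdam", "location": (52.37, 4.90)},
    {"name": "Rotterdam", "location": (51.92, 4.48)},
    {"name": "San Francisco", "location": (37.77, -122.42)},
]
near = geo_distance_filter(places, "location", (52.37, 4.90), 100)
print([p["name"] for p in near])  # ['Amsterdam', 'Rotterdam']
```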
  53. plugins
    add custom functionality to ES
    written in Java
    installable from GitHub
    custom mapping types, scripting language support, custom discovery, admin tools, and more
  54. interfaces
    REST
    Java / Groovy
    clients/integration: Python, PHP, Ruby, Perl, Erlang, Django, Drupal, Symfony2, CouchDB, Flume, Flume sink implementation
  55. References
    http://github.com/elasticsearch/elasticsearch
    https://groups.google.com/group/elasticsearch
    IRC: #elasticsearch on irc.freenode.net
    twitter: @elasticsearch
    Useful tutorials: Query DSL Explained, ElasticSearch on EC2
  56. Dank u wel! (Thank you!) http://joind.in/6236