Slide 1

Slide 1 text

elasticsearch {r}evolution welcome. Andrei Zmievski • ConFoo • Feb 29, 2012 Wednesday, February 29, 12

Slide 2

Slide 2 text

who am i?
curl http://localhost:9200/speaker/info/andrei
{"name": "Andrei Zmievski",
 "works": "AppDynamics",
 "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes": ["coding", "beer", "brewing", "photography"],
 "twitter": "@a",
 "email": "[email protected]"}

Slide 3

Slide 3 text

what is elasticsearch?
a search engine for the NoSQL generation
domain-driven, document-oriented, distributed, RESTful, Lucene-based engine

Slide 4

Slide 4 text

what has happened?
A year ago it was at 0.15.0
Just released 0.19.0RC3
Continuous progress: lots of new features, improved stability, and more
Increasing adoption by small and big companies
No cloud hosting option yet, but maybe soon

Slide 5

Slide 5 text

API conventions
append ?pretty=true to get readable JSON
boolean values: false/0/off = false, rest is true
JSONP support via callback parameter
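
The boolean rule and the pretty flag can be sketched in a few lines of Python. This is only an illustration of the convention as stated on the slide, not Elasticsearch's own parser; the helper names `parse_bool_param` and `with_pretty` are made up here, and exact, case-sensitive matching is an assumption.

```python
# Values the slide lists as false; every other value counts as true.
# Assumption: matching is exact and case-sensitive.
FALSE_VALUES = {"false", "0", "off"}


def parse_bool_param(value: str) -> bool:
    """Interpret a query-string boolean the way the slide describes."""
    return value not in FALSE_VALUES


def with_pretty(url: str) -> str:
    """Append pretty=true, respecting an existing query string."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}pretty=true"


print(parse_bool_param("off"))
print(with_pretty("http://localhost:9200/conf/speaker/_search?q=beer"))
```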

Slide 6

Slide 6 text

API structure
http://host:port/[index]/[type]/[_action/id]
GET http://es:9200/twitter/_status
GET http://es:9200/twitter/tweet/1
GET http://es:9200/twitter/tweet/_search
GET http://es:9200/twitter/tweet,user/_search
GET http://es:9200/twitter,facebook/_search
GET http://es:9200/_search
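
A small helper shows how the optional path segments compose. It is a sketch under the URL pattern above; the function name `es_url` and its parameters are hypothetical and not part of any client library.

```python
from typing import Optional, Sequence


def es_url(base: str,
           indices: Sequence[str] = (),
           types: Sequence[str] = (),
           action_or_id: Optional[str] = None) -> str:
    """Join the optional segments; several indices or types are comma-separated."""
    segments = []
    if indices:
        segments.append(",".join(indices))
    if types:
        segments.append(",".join(types))
    if action_or_id:
        segments.append(action_or_id)
    return "/".join([base.rstrip("/")] + segments)


print(es_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search"))
print(es_url("http://es:9200", action_or_id="_search"))
```

Leaving out the index or type widens the scope of the request, which is what the last two GET lines on the slide do.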

Slide 7

Slide 7 text

API query example
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "foo bar",
                    "default_operator": "AND",
                    "fields": ["title", "description"],
                    "boost": 2.0
                }
            },
            "filter": {
                "range": {"date": {"gt": "2012-02-09"}}
            }
        }
    },
    "from": 10,
    "size": 10
}
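
The same request body can be assembled programmatically, which makes the paging arithmetic explicit: "from" is the offset of the first hit, so page 2 with 10 hits per page starts at 10. The function `filtered_query` and its 1-based `page` parameter are assumptions for this sketch.

```python
import json


def filtered_query(text, fields, after_date, page=1, page_size=10):
    """Build the slide's filtered query; page is 1-based (an assumption here)."""
    return {
        "query": {
            "filtered": {
                # Scored full-text part.
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": list(fields),
                        "boost": 2.0,
                    }
                },
                # Unscored restriction applied to the query results.
                "filter": {"range": {"date": {"gt": after_date}}},
            }
        },
        "from": (page - 1) * page_size,
        "size": page_size,
    }


body = filtered_query("foo bar", ["title", "description"], "2012-02-09", page=2)
print(json.dumps(body, indent=2))
```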

Slide 8

Slide 8 text

3 easy steps

Slide 9

Slide 9 text

1. index
request:
curl -XPOST http://localhost:9200/conf/speaker/1 -d '
{
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187
}'
response:
{
    "ok": true,
    "_index": "conf",
    "_type": "speaker",
    "_id": "1"
}
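
For readers who prefer a script over curl, the indexing request might look like this with Python's standard library. The helper `build_index_request` is hypothetical; the request is only constructed here, so the snippet runs without a node on localhost:9200.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Prepare (but do not send) a POST to /index/type/id with a JSON body."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


speaker = {
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187,
}
request = build_index_request("http://localhost:9200", "conf", "speaker", 1, speaker)
print(request.get_method(), request.full_url)
# With a node running, urllib.request.urlopen(request) would send it.
```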

Slide 10

Slide 10 text

2. search
request:
curl http://localhost:9200/conf/speaker/_search?q=beer
response:
{
  "took": 3,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5908709,
    "hits": [{
      "_index": "conf",
      "_type": "speaker",
      "_id": "1",
      "_score": 0.5908709,
      "_source": {
        "name": "Andrei Zmievski",
        "lives": "San Francisco",
        "likes": ["coding", "beer", "photography"],
        "twitter": "a",
        "height": 187
      }
    }]
  }
}

Slide 11

Slide 11 text

2. search (same request and response)
hits.total: total number of hits

Slide 12

Slide 12 text

2. search (same request and response)
_index: the index of the doc

Slide 13

Slide 13 text

2. search (same request and response)
_type: the type of the doc

Slide 14

Slide 14 text

2. search (same request and response)
_id: the id of the doc

Slide 15

Slide 15 text

2. search (same request and response)
_score: the hit score

Slide 16

Slide 16 text

2. search (same request and response)
_source: the original doc contents

Slide 17

Slide 17 text

2. search (same request and response)
took: the execution time
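
The fields called out on the preceding slides can be pulled from the response with ordinary dictionary access. The sketch below parses the sample response from the slides; the function name `summarize` and the shape of its return value are illustrative choices.

```python
import json

# The search response shown on the slides.
RESPONSE = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski",
                                "lives": "San Francisco",
                                "likes": ["coding", "beer", "photography"],
                                "twitter": "a",
                                "height": 187}}]}}
"""


def summarize(response):
    """Collect execution time, hit count, and per-hit metadata."""
    hits = response["hits"]
    return {
        "took": response["took"],
        "total": hits["total"],
        "docs": [
            (hit["_index"], hit["_type"], hit["_id"], hit["_score"],
             hit["_source"]["name"])
            for hit in hits["hits"]
        ],
    }


summary = summarize(json.loads(RESPONSE))
print(summary)
```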

Slide 18

Slide 18 text

3. profit
that’s up to you

Slide 19

Slide 19 text

demo

Slide 20

Slide 20 text

distributed model
built for performance and resiliency
zero-conf discovery
sharding/replication
auto-routing

Slide 21

Slide 21 text

transactional model
write consistency is per-document
uses write-ahead transaction log
1 second index refresh rate by default

Slide 22

Slide 22 text

storage
node data considered transient
can be stored in local file system, JVM heap, native OS memory, or FS & memory combination
gateway is a persistent storage mechanism
local, shared FS, HDFS, S3

Slide 23

Slide 23 text

mapping
describes document structure
automatically created with sensible defaults, but can be overridden per field
many field types: string, integer/long, float/double, boolean, date, geo, array, object, and more

Slide 24

Slide 24 text

sample mapping
document:
{"user":     "derick",
 "title":    "Don’t Panic",
 "tags":     ["profiling", "debugging", "php"],
 "postDate": "2010-12-22T17:14:12",
 "priority": 2}
mapping:
{"post": {
  "properties": {
    "user":     {"type": "string", "index": "not_analyzed"},
    "message":  {"type": "string", "boost": 1.5},
    "tags":     {"type": "string", "include_in_all": "no"},
    "postDate": {"type": "date", "store": "no"},
    "priority": {"type": "integer"}
}}}

Slide 25

Slide 25 text

sample mapping (same document and mapping)
"priority": {"type": "integer"} is not really needed

Slide 26

Slide 26 text

analyzers
break down (tokenize) and normalize fields during indexing and query strings at search time
elasticsearch.yml:
index:
    analysis:
        analyzer:
            eulang:
                type: custom
                tokenizer: standard
                filter: [standard, lowercase, stop, asciifolding, porterStem]
mapping:
… "title": {"type": "string", "analyzer": "eulang"}, …
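
To get a feel for what such a chain does to text, here is a toy approximation in plain Python. It is not the Lucene implementation: the tokenizer is a simple regex, the stop list is a tiny made-up sample, and the stemming step is left out entirely.

```python
import re
import unicodedata

# Tiny illustrative stop list; real stop filters use a longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def ascii_fold(token: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list:
    """Tokenize, lowercase, remove stop words, then fold to ASCII."""
    tokens = re.findall(r"\w+", text)
    tokens = [token.lower() for token in tokens]
    tokens = [token for token in tokens if token not in STOP_WORDS]
    return [ascii_fold(token) for token in tokens]


print(analyze("The Caf\u00e9 of Z\u00fcrich"))  # ['cafe', 'zurich']
```

Because the same chain runs on both documents and query strings, a search for "cafe" and a document containing "Café" end up with the same token.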

Slide 27

Slide 27 text

filters
share some similar features with queries
apply to the result of the query
why use a filter?

Slide 28

Slide 28 text

filters
faster than queries
cached (depends on the filter)
the cache is reused by different queries that use the same filter
no scoring
more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query

Slide 29

Slide 29 text

facets
provide aggregated data based on the search request
usual purpose is to offer a faceted navigation, or faceted search (eBay and more)
facet types: terms, histogram, date histogram, range, statistical, and more
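
A terms facet boils down to counting field values across the matching documents. The sketch below builds a request body in the facets syntax of this era and then mimics the counting locally; `terms_facet_request` and `local_terms_facet` are hypothetical helpers, and the local version only illustrates the idea, not how the server computes it.

```python
from collections import Counter


def terms_facet_request(query, field, size=10):
    """Search body asking for the most frequent values of one field."""
    return {
        "query": query,
        "facets": {field: {"terms": {"field": field, "size": size}}},
    }


def local_terms_facet(docs, field, size=10):
    """Count field values over a list of documents (illustration only)."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(values)
    return counts.most_common(size)


docs = [
    {"likes": ["coding", "beer"]},
    {"likes": ["beer"]},
    {"likes": "photography"},
]
print(terms_facet_request({"match_all": {}}, "likes"))
print(local_terms_facet(docs, "likes", size=1))
```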

Slide 30

Slide 30 text

rivers
pluggable service running within the cluster
pulls in data from external sources and indexes it
automatic failover
current: Twitter, MongoDB, CouchDB, RabbitMQ, RSS, Wikipedia

Slide 31

Slide 31 text

percolator
turns searching on its head
search: index docs and run queries for matches
percolator: index queries and run docs for matches
great feature for notification/triggers implementation
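
The inversion is easy to see in a toy model: queries are stored first, and each incoming document is checked against them. The class below is purely illustrative and does not use the Elasticsearch percolator API; a "query" here is just a set of words that must all appear in the document.

```python
class ToyPercolator:
    """Stores queries and reports which ones a new document matches."""

    def __init__(self):
        self._queries = {}

    def register(self, name, required_terms):
        """Index a query: every listed term must occur in a document."""
        self._queries[name] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Run a document against all stored queries."""
        words = set(text.lower().split())
        return sorted(
            name for name, terms in self._queries.items() if terms <= words
        )


percolator = ToyPercolator()
percolator.register("beer-alert", ["beer"])
percolator.register("php-brewing", ["php", "brewing"])
print(percolator.percolate("Free beer at ConFoo"))  # ['beer-alert']
```

Each returned name could then trigger a notification, which is the use case the slide mentions.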

Slide 32

Slide 32 text

index aliases
each index name can have one or more aliases
atomic renames allow on-the-fly index switching
actual index: tweets{date}
alias: tweets
on update, create new index and switch alias
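
The switch can be expressed as one request to the _aliases endpoint that removes the alias from the old index and adds it to the new one. The sketch below only builds that body; the function name and the dated index names are made-up examples.

```python
import json


def switch_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves an alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical dated indices behind the "tweets" alias.
body = switch_alias_actions("tweets", "tweets_20120228", "tweets_20120229")
print(json.dumps(body, indent=2))
```

Clients keep querying "tweets" throughout and never need to know which dated index is behind it.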

Slide 33

Slide 33 text

filtered index aliases
allows creation of “views” into an index
associates a filter with an alias
filtered alias:
curl -XPOST http://localhost:9200/_aliases -d '
{
    "actions": [
        {
            "add": {
                "index": "posts",
                "alias": "posts_by_andrei",
                "filter": {"term": {"user": "andrei"}}
            }
        }
    ]
}'

Slide 34

Slide 34 text

parent/child docs
_parent field in mapping
establishes relationship between doc types, e.g. comment and post
used with has_child and top_children queries
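
As a rough sketch of the comment/post example: the child type declares its parent in the mapping, and a has_child query then returns parents whose children match. The helper names and the "author" field are assumptions for illustration.

```python
def child_mapping(child_type, parent_type):
    """Mapping for the child type that points at its parent type."""
    return {child_type: {"_parent": {"type": parent_type}}}


def has_child_query(child_type, child_query):
    """Find parent docs that have at least one child matching child_query."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


mapping = child_mapping("comment", "post")
# "author" is a hypothetical field on the comment documents.
query = has_child_query("comment", {"term": {"author": "andrei"}})
print(mapping)
print(query)
```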

Slide 35

Slide 35 text

geo search
implemented as filters (and a facet)
geo_distance
geo_bounding_box
geo_polygon
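
Since geo search is a filter, it slots into the filter part of a filtered query. Below is a sketch of a geo_distance filter body; the field name "location", the coordinates, and the helper `geo_distance_filter` are made up for the example.

```python
def geo_distance_filter(field, lat, lon, distance):
    """Keep only docs whose geo point lies within distance of (lat, lon)."""
    return {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}}


# Hypothetical "location" field; coordinates roughly in Montreal.
body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": geo_distance_filter("location", 45.5, -73.57, "5km"),
        }
    }
}
print(body)
```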

Slide 36

Slide 36 text

plugins
add custom functionality to ES
written in Java
installable from GitHub
custom mapping types, scripting language support, custom discovery, admin tools, and more

Slide 37

Slide 37 text

interfaces
REST
Java / Groovy
clients/integration: Python, PHP, Ruby, Perl, Erlang, Django, Drupal, Symfony2, CouchDB, Flume, Flume sink implementation

Slide 38

Slide 38 text

data import
ES is not the primary data store (usually)
to import/synchronize data:
write an agent (Gearman, message queues, etc.)
use a river plugin (CouchDB, RabbitMQ, Twitter)

Slide 39

Slide 39 text

References
http://github.com/elasticsearch/elasticsearch
https://groups.google.com/group/elasticsearch
IRC: #elasticsearch on irc.freenode.net
twitter: @elasticsearch
Useful tutorials: Query DSL Explained, ElasticSearch on EC2

Slide 40

Slide 40 text

Merci!
http://joind.in/5998