Slide 1

Slide 1 text

elasticsearch {r}evolution welcome. Andrei Zmievski • DPC • June 8, 2012

Slide 2

Slide 2 text

TRUTH

Slide 3

Slide 3 text

who am i?
curl http://localhost:9200/speaker/info/andrei
{"name": "Andrei Zmievski",
 "works": "AppDynamics",
 "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes": ["coding", "beer", "brewing", "photography"],
 "twitter": "@a",
 "email": "[email protected]"}

Slide 4

Slide 4 text

what is elasticsearch?
a search engine for the NoSQL generation:
- domain-driven
- document-oriented
- distributed
- RESTful
- Lucene-based

Slide 5

Slide 5 text

what has happened?
- a year ago, ES was at 0.15.0; 0.19.4 was just released
- continuous progress: lots of new features, improved stability, and more
- increasing adoption by both small and big companies
- no cloud hosting option yet, but maybe soon

Slide 6

Slide 6 text

API conventions
- append ?pretty=true to get readable JSON
- boolean parameters: false, 0, and off mean false; any other value means true
- JSONP support via the callback parameter
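The boolean convention can be made concrete with a small parser. This is a minimal sketch, assuming exact, case-sensitive matching of the three "false" spellings; the function name `parse_bool_param` and the treatment of a missing parameter are hypothetical choices for illustration, not ES internals.

```python
from typing import Optional

# Values the slide lists as meaning "false"; anything else counts as true.
FALSE_VALUES = {"false", "0", "off"}


def parse_bool_param(value: Optional[str], default: bool = False) -> bool:
    """Interpret a boolean URL parameter such as ?pretty=... ."""
    if value is None:
        # Parameter absent: fall back to the default (an assumption of this sketch).
        return default
    return value not in FALSE_VALUES


print(parse_bool_param("off"), parse_bool_param("true"), parse_bool_param("yes"))
```

Note the consequence of "rest is true": even an unexpected value like "yes" or an empty string turns the flag on.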

Slide 7

Slide 7 text

API structure
http://host:port/[index]/[type]/[_action/id]
GET http://es:9200/twitter/_status
GET http://es:9200/twitter/tweet/1
GET http://es:9200/twitter/tweet/_search
GET http://es:9200/twitter/tweet,user/_search
GET http://es:9200/twitter,facebook/_search
GET http://es:9200/_search
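The URL pattern composes mechanically: optional comma-separated indices, optional comma-separated types, then an action or a document id. A minimal sketch of that composition, where `build_url` is a hypothetical helper (not part of any ES client) and types are assumed to appear only under explicit indices:

```python
from typing import Optional, Sequence


def build_url(host: str, indices: Sequence[str] = (), types: Sequence[str] = (),
              action_or_id: Optional[str] = None) -> str:
    """Assemble http://host:port/[index]/[type]/[_action/id]."""
    if types and not indices:
        # This sketch only supports types nested under explicit indices.
        raise ValueError("types require at least one index")
    parts = [host.rstrip("/")]
    if indices:
        parts.append(",".join(indices))   # several indices: comma-separated
    if types:
        parts.append(",".join(types))     # several types: comma-separated
    if action_or_id:
        parts.append(action_or_id)        # e.g. "_search", "_status", or a doc id
    return "/".join(parts)


print(build_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search"))
```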

Slide 8

Slide 8 text

API query example
{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "foo bar",
                    "default_operator": "AND",
                    "fields": ["title", "description"],
                    "boost": 2.0
                }
            },
            "filter": {
                "range": {"date": {"gt": "2012-02-09"}}
            }
        }
    },
    "from": 10,
    "size": 10
}
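"from" and "size" implement paging: "from" is a 0-based offset into the ranked hits and "size" is the page length, so from=10, size=10 is the second page of ten. A minimal sketch that builds the same body from parameters; `filtered_query` and its 1-based `page` argument are hypothetical conveniences for illustration.

```python
import json


def filtered_query(text, fields, since, page, per_page=10):
    """Build the slide's filtered query; `page` is 1-based (an assumption)."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": list(fields),
                        "boost": 2.0,
                    }
                },
                # The filter narrows the matches; it does not affect scoring.
                "filter": {"range": {"date": {"gt": since}}},
            }
        },
        "from": (page - 1) * per_page,  # offset of the first hit to return
        "size": per_page,               # number of hits per page
    }


body = filtered_query("foo bar", ["title", "description"], "2012-02-09", page=2)
print(json.dumps(body, indent=2))
```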

Slide 9

Slide 9 text

3 easy steps

Slide 10

Slide 10 text

1. index
request:
curl -XPOST http://localhost:9200/conf/speaker/1 -d '
{
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187
}'
response:
{
    "ok": true,
    "_index": "conf",
    "_type": "speaker",
    "_id": "1"
}
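The same indexing call can be prepared from code with only the Python standard library. This sketch builds the request without sending it (sending needs a running node at the assumed `http://localhost:9200`); `index_request` is a hypothetical helper, and the Content-Type header is a harmless addition not shown on the slide.

```python
import json
import urllib.request


def index_request(base_url, index, doc_type, doc_id, doc):
    """Prepare (but do not send) POST /{index}/{type}/{id} with a JSON body."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = index_request("http://localhost:9200", "conf", "speaker", 1, {
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187,
})
# With a node running, this would send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```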

Slide 11

Slide 11 text

2. search
request:
curl http://localhost:9200/conf/speaker/_search?q=beer
response:
{
    "took": 3,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 0.5908709,
        "hits": [{
            "_index": "conf",
            "_type": "speaker",
            "_id": "1",
            "_score": 0.5908709,
            "_source": {
                "name": "Andrei Zmievski",
                "talk": "ElasticSearch Revolution. Welcome.",
                "likes": ["coding", "beer", "photography"],
                "twitter": "a",
                "height": 187
            }
        }]
    }
}
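A client typically pulls a few fields out of this response. A minimal sketch over the slide's response (with "_source" trimmed to two fields for brevity); `summarize` and its output keys are hypothetical names chosen for illustration.

```python
import json

# The slide's response, with "_source" trimmed to two fields for brevity.
RESPONSE = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski",
                                "likes": ["coding", "beer", "photography"]}}]}}
"""


def summarize(resp: dict) -> dict:
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],   # execution time in milliseconds
        "total": hits["total"],    # total number of matching docs
        "hits": [
            {"index": h["_index"], "type": h["_type"], "id": h["_id"],
             "score": h["_score"], "source": h["_source"]}
            for h in hits["hits"]
        ],
    }


summary = summarize(json.loads(RESPONSE))
print(summary["total"], summary["hits"][0]["source"]["name"])
```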

Slide 12

Slide 12 text

2. search (same request and response as Slide 11): "total" inside "hits" is the total number of hits

Slide 13

Slide 13 text

2. search (same request and response as Slide 11): "_index" is the index of the doc

Slide 14

Slide 14 text

2. search (same request and response as Slide 11): "_type" is the type of the doc

Slide 15

Slide 15 text

2. search (same request and response as Slide 11): "_id" is the id of the doc

Slide 16

Slide 16 text

2. search (same request and response as Slide 11): "_score" is the hit score

Slide 17

Slide 17 text

2. search (same request and response as Slide 11): "_source" is the original doc contents

Slide 18

Slide 18 text

2. search (same request and response as Slide 11): "took" is the execution time, in milliseconds

Slide 19

Slide 19 text

3. profit that’s up to you

Slide 20

Slide 20 text

demo

Slide 21

Slide 21 text

distributed model
- built for performance and resiliency
- zero-conf discovery
- sharding/replication
- auto-routing

Slide 22

Slide 22 text

replicas
- each shard can have zero or more replicas
- the number of replicas can be updated dynamically after index creation
- replicas can be used for querying in parallel

Slide 23

Slide 23 text

shard allocation
node 1
start with a single node

Slide 24

Slide 24 text

shard allocation
PUT /person
{"index": {"number_of_shards": 2, "number_of_replicas": 1}}
node 1: person1, person2

Slide 25

Slide 25 text

shard allocation
node 1: person1, person2
node 2: person1, person2
start the second node

Slide 26

Slide 26 text

shard allocation node 1 node 2 node 3 node 4 person1 person2 person1 person2 start 2 more nodes
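The allocation sequence above (one node, then two, then four) follows from one rule: two copies of the same shard never share a node. The greedy placement below is a simplified sketch of that rule, not ES's actual allocator, which rebalances incrementally as nodes join rather than recomputing from scratch; `allocate` and its least-loaded tie-breaking are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def allocate(num_shards: int, num_replicas: int,
             nodes: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Place primaries first, then replicas, never two copies of a shard on one node."""
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    unassigned: List[str] = []
    for copy in range(1 + num_replicas):            # copy 0 is the primary
        for shard in range(1, num_shards + 1):
            label = f"person{shard}"
            # Only nodes that do not already hold a copy of this shard qualify.
            candidates = [n for n in nodes if label not in layout[n]]
            if not candidates:
                unassigned.append(label)            # e.g. replicas on a 1-node cluster
                continue
            # Least-loaded candidate; ties go to the earlier node in the list.
            target = min(candidates, key=lambda n: len(layout[n]))
            layout[target].append(label)
    return layout, unassigned


for n in (1, 2, 4):
    print(allocate(2, 1, [f"node{i}" for i in range(1, n + 1)]))
```

With one node the two replicas stay unassigned; with two nodes each node holds a full copy; with four nodes each holds exactly one shard copy.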

Slide 28

Slide 28 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 PUT /person/info/1 { … }

Slide 29

Slide 29 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 hashed to shard 1 PUT /person/info/1 { … }

Slide 30

Slide 30 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 replicated PUT /person/info/1 { … }

Slide 31

Slide 31 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 PUT /person/info/2 { … }

Slide 32

Slide 32 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 hashed to shard 2 PUT /person/info/2 { … }

Slide 33

Slide 33 text

document sharding node 1 node 2 node 3 node 4 person1 person2 person1 person2 replicated PUT /person/info/2 { … }
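"Hashed to shard N" means the target shard is computed from the document's routing value (the id by default) as a hash modulo the number of shards. A minimal sketch, using CRC32 purely as a stand-in hash (ES uses its own hash function, so the concrete shard numbers here will not match a real cluster); `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(routing: str, number_of_shards: int) -> int:
    """Pick a shard from the routing value (the doc id by default).
    CRC32 is a stand-in hash; only the hash-then-modulo idea is illustrated."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", shard_for(doc_id, 2))
```

Because the result depends on the shard count, the number of shards is fixed at index creation, whereas replicas can change at any time.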

Slide 34

Slide 34 text

scatter-gather node 1 node 2 node 3 node 4 person1 person2 person1 person2 GET /person/_search?q=name:thomas
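In scatter-gather, the search goes to one copy (primary or replica) of every shard, each shard returns its own top hits, and the receiving node merges them into a global ranking. The merge step is sketched below under the assumption that each shard already returned its top `size` hits as (score, id) pairs; `gather` is a hypothetical name, and the real search also runs a separate fetch phase to load document contents.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc id)


def gather(shard_results: Dict[str, List[Hit]], size: int) -> List[Hit]:
    """Merge per-shard top hits into the global top `size`, best score first."""
    all_hits = (hit for hits in shard_results.values() for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda h: h[0])


results = {
    "person1": [(0.9, "a"), (0.4, "b")],   # top hits from a copy of shard 1
    "person2": [(0.7, "c"), (0.6, "d")],   # top hits from a copy of shard 2
}
print(gather(results, 3))
```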

Slide 38

Slide 38 text

transactional model
- write consistency is per document
- uses a write-ahead transaction log
- the index is refreshed every 1 second by default, so new docs become searchable shortly after they are indexed
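The interplay of the transaction log and the refresh interval can be shown with a toy model: every write is recorded in the log immediately, but searches only see documents after a refresh. `ToyShard` is a hypothetical, in-memory illustration of that visibility rule, not how Lucene segments or the ES translog are implemented.

```python
class ToyShard:
    """Writes are logged immediately but become searchable only after refresh()."""

    def __init__(self):
        self.translog = []     # ordered record of every write (write-ahead)
        self.buffer = {}       # indexed, not yet visible to search
        self.searchable = {}   # what a search can see

    def index(self, doc_id, doc):
        self.translog.append(("index", doc_id, doc))  # log first
        self.buffer[doc_id] = doc

    def refresh(self):
        # ES does this on a timer (every 1 second by default).
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return sorted(i for i, d in self.searchable.items()
                      if term in d.get("likes", []))


shard = ToyShard()
shard.index("1", {"likes": ["coding", "beer"]})
print(shard.search("beer"))   # not visible yet
shard.refresh()
print(shard.search("beer"))   # visible after refresh
```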

Slide 39

Slide 39 text

storage
- node data is considered transient
- it can be stored in the local file system, JVM heap, native OS memory, or a file system & memory combination
- the gateway is the persistent storage mechanism: local, shared FS, HDFS, S3

Slide 40

Slide 40 text

mapping
- describes document structure
- automatically created with sensible defaults, but can be overridden per field
- many field types: string, integer/long, float/double, boolean, date, geo, array, object, and more

Slide 41

Slide 41 text

sample mapping
document:
{"user": "derick",
 "title": "Don’t Panic",
 "tags": ["profiling", "debugging", "php"],
 "postDate": "2010-12-22T17:14:12",
 "priority": 2}
mapping:
{"post": {
    "properties": {
        "user": {"type": "string", "index": "not_analyzed"},
        "message": {"type": "string", "boost": 1.5},
        "tags": {"type": "string", "include_in_all": "no"},
        "postDate": {"type": "date", "store": "no"},
        "priority": {"type": "integer"}
    }
}}

Slide 42

Slide 42 text

sample mapping (same document and mapping as Slide 41): "priority": {"type": "integer"} is not really needed
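Why "priority" needs no explicit mapping: when a field is first seen, its type is guessed from the JSON value. The sketch below is a simplified model of that dynamic mapping, assuming whole numbers map to long and fractional ones to double; its date pattern is a narrow stand-in for ES's real date detection, and `infer_mapping` is a hypothetical name.

```python
import re
from typing import Any, Dict, Optional

# Narrow stand-in for date detection: yyyy-MM-dd with optional Thh:mm:ss.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$")


def infer_field(value: Any) -> Optional[Dict[str, Any]]:
    """Guess a field mapping from a JSON value (simplified dynamic mapping)."""
    if isinstance(value, bool):      # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "date" if DATE_RE.match(value) else "string"}
    if isinstance(value, list):
        # Arrays have no type of their own: the first element decides.
        return infer_field(value[0]) if value else None
    if isinstance(value, dict):
        return {"type": "object", "properties": infer_mapping(value)}
    return None                      # null: nothing to infer yet


def infer_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    props = {}
    for field, value in doc.items():
        mapping = infer_field(value)
        if mapping is not None:
            props[field] = mapping
    return props


doc = {"user": "derick", "title": "Don't Panic",
       "tags": ["profiling", "debugging", "php"],
       "postDate": "2010-12-22T17:14:12", "priority": 2}
print(infer_mapping(doc))
```

Explicit mappings remain necessary for settings that cannot be guessed, such as "not_analyzed", "boost", or a custom analyzer.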

Slide 43

Slide 43 text

analyzers
- break down (tokenize) and normalize fields during indexing and query strings at search time
elasticsearch.yml:
index:
    analysis:
        analyzer:
            eulang:
                type: custom
                tokenizer: standard
                filter: [standard, lowercase, stop,
                         asciifolding, porterStem]
mapping:
…
"title": {"type": "string", "analyzer": "eulang"},
…
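An analyzer is a pipeline: a tokenizer followed by token filters applied in order. The sketch below mimics part of the "eulang" chain (tokenize, lowercase, stop, asciifolding) with crude stand-ins: a regex instead of the standard tokenizer and a tiny hypothetical stop list. The standard and porterStem filters are left out.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}  # tiny illustrative list


def tokenize(text):
    # Rough stand-in for the standard tokenizer: runs of word characters.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Decompose accented characters and drop combining marks: "café" -> "cafe".
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def analyze(text):
    tokens = tokenize(text)
    tokens = [t.lower() for t in tokens]                 # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop
    return [ascii_fold(t) for t in tokens]               # asciifolding


print(analyze("The Café is OPEN"))
```

The same chain runs on query strings at search time, which is why a search for "cafe" can match a document containing "Café".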

Slide 44

Slide 44 text

filters
- share many features with queries
- apply to the result of the query
- why use a filter?

Slide 45

Slide 45 text

filters
- faster than queries
- cached (depending on the filter)
- the cache is reused by different queries that use the same filter
- no scoring
- most useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
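The filter cache can be pictured as a map from a filter's definition to the set of matching doc ids: the first use computes the set, later queries with the same filter reuse it, and filtering is a plain set intersection with no scoring. `FilterCache`, its key format, and `search` are hypothetical; ES caches per-segment bitsets rather than Python sets.

```python
from typing import Dict, FrozenSet, Iterable, List


class FilterCache:
    def __init__(self, docs: Dict[str, dict]):
        self.docs = docs
        self.cache: Dict[str, FrozenSet[str]] = {}
        self.misses = 0

    def term_filter(self, field: str, value: object) -> FrozenSet[str]:
        key = f"term:{field}={value}"
        if key not in self.cache:                 # compute once, then reuse
            self.misses += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, d in self.docs.items() if d.get(field) == value)
        return self.cache[key]


def search(cache: FilterCache, query_ids: Iterable[str],
           field: str, value: object) -> List[str]:
    # Filtering keeps or drops query matches; it never changes their scores.
    return sorted(set(query_ids) & cache.term_filter(field, value))


docs = {"1": {"user": "andrei"}, "2": {"user": "derick"}, "3": {"user": "andrei"}}
cache = FilterCache(docs)
print(search(cache, ["1", "2"], "user", "andrei"))  # first query fills the cache
print(search(cache, ["2", "3"], "user", "andrei"))  # different query, cached filter
```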

Slide 46

Slide 46 text

facets
- provide aggregated data based on the search request
- usual purpose is to offer faceted navigation, or faceted search (eBay and more)
- facet types: terms, histogram, date histogram, range, statistical, and more
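A terms facet counts how often each value of a field occurs among the docs matching the query, which is what drives "php (2), debugging (1)"-style navigation. A minimal sketch over already-matched docs; `terms_facet` is a hypothetical name and multi-valued fields are assumed to be lists.

```python
from collections import Counter


def terms_facet(hits, field, size=10):
    """Count field values across matching docs; return the `size` most common."""
    counts = Counter()
    for doc in hits:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return counts.most_common(size)


hits = [{"tags": ["php", "debugging"]}, {"tags": ["php"]}, {"tags": ["java"]}]
print(terms_facet(hits, "tags", size=2))
```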

Slide 47

Slide 47 text

rivers
- pluggable service running within the cluster
- pulls in data from external sources and indexes it
- automatic failover
- current: Twitter, MongoDB, CouchDB, RabbitMQ, RSS, Wikipedia

Slide 48

Slide 48 text

percolator
- turns searching on its head
- search: index docs and run queries for matches
- percolator: index queries and run docs for matches
- great feature for implementing notifications/triggers
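The inversion is easy to see in miniature: queries are stored up front, and each incoming document is checked against all of them. In this sketch `Percolator` is a hypothetical class, each registered query is a single term on a single field, and `_terms` is a crude whitespace-and-lowercase stand-in for real analysis.

```python
from typing import Any, Dict, List, Set, Tuple


def _terms(value: Any) -> Set[str]:
    """Crude analysis: lowercase and split strings; flatten lists."""
    if value is None:
        return set()
    if isinstance(value, list):
        return {t for v in value for t in _terms(v)}
    return set(str(value).lower().split())


class Percolator:
    def __init__(self) -> None:
        # Registered queries: name -> (field, term). The queries are what gets stored.
        self.queries: Dict[str, Tuple[str, str]] = {}

    def register(self, name: str, field: str, term: str) -> None:
        self.queries[name] = (field, term.lower())

    def percolate(self, doc: Dict[str, Any]) -> List[str]:
        """Run one document against every registered query; return matching names."""
        return sorted(name for name, (field, term) in self.queries.items()
                      if term in _terms(doc.get(field)))


p = Percolator()
p.register("beer-alert", "likes", "beer")
p.register("sf-alert", "lives", "Francisco")
p.register("java-alert", "likes", "java")
print(p.percolate({"likes": ["coding", "beer"], "lives": "San Francisco"}))
```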

Slide 49

Slide 49 text

index aliases
- each index can have one or more aliases
- atomic renames allow on-the-fly index switching
- actual index: tweets{date}; alias: tweets
- on update, create a new index and switch the alias
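The switch itself is one POST /_aliases request containing a "remove" and an "add" action, applied together so readers of the alias never see it missing. A minimal sketch that builds that body; `swap_alias_actions` is a hypothetical helper and the date-suffixed index names are illustrative.

```python
import json


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


body = swap_alias_actions("tweets", "tweets20120607", "tweets20120608")
print(json.dumps(body))
```

Clients keep querying "tweets" throughout; only the alias target changes, and the old index can be dropped afterwards.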

Slide 50

Slide 50 text

filtered index aliases
- allow creation of "views" into an index
- associate a filter with an alias
filtered alias:
curl -XPOST http://localhost:9200/_aliases -d '
{
    "actions": [
        {
            "add": {
                "index": "posts",
                "alias": "posts_by_andrei",
                "filter": {"term": {"user": "andrei"}}
            }
        }
    ]
}'

Slide 51

Slide 51 text

parent/child docs
- _parent field in mapping
- establishes a relationship between doc types, e.g. comment and post
- used with has_child and top_children queries
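The idea behind has_child is to run the query against child docs and return their parents. A minimal in-memory sketch, assuming each child carries its parent's id under a "_parent" key; `has_child` here is a hypothetical function, not the ES query implementation.

```python
from typing import Callable, Dict, List


def has_child(parents: Dict[str, dict], children: List[dict],
              predicate: Callable[[dict], bool]) -> List[str]:
    """Ids of parent docs with at least one child doc matching `predicate`."""
    matched = {child["_parent"] for child in children if predicate(child)}
    return sorted(pid for pid in parents if pid in matched)


posts = {"p1": {"title": "elasticsearch"}, "p2": {"title": "PHP-GTK"}}
comments = [
    {"_parent": "p1", "text": "great talk"},
    {"_parent": "p2", "text": "old news"},
    {"_parent": "p1", "text": "agreed"},
]
print(has_child(posts, comments, lambda c: "great" in c["text"]))
```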

Slide 52

Slide 52 text

geo search
- implemented as filters (and a facet)
- geo_distance
- geo_bounding_box
- geo_polygon
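A geo_distance filter keeps docs whose point lies within a given distance of an origin. The sketch uses the haversine great-circle formula with a mean Earth radius of 6371 km as a stand-in for whichever distance calculation ES applies; `geo_distance_filter` and the (lat, lon) tuple layout are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_filter(docs, field, origin, max_km):
    """Keep docs whose (lat, lon) in `field` is within max_km of origin."""
    lat0, lon0 = origin
    return [d for d in docs if haversine_km(lat0, lon0, *d[field]) <= max_km]


docs = [{"name": "Amsterdam", "location": (52.37, 4.89)},
        {"name": "San Francisco", "location": (37.77, -122.42)}]
print([d["name"] for d in geo_distance_filter(docs, "location", (52.37, 4.90), 50)])
```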

Slide 53

Slide 53 text

plugins
- add custom functionality to ES
- written in Java
- installable from GitHub
- custom mapping types, scripting language support, custom discovery, admin tools, and more

Slide 54

Slide 54 text

interfaces
- REST
- Java / Groovy
- clients/integration: Python, PHP, Ruby, Perl, Erlang, Django, Drupal, Symfony2, CouchDB, Flume (Flume sink implementation)

Slide 55

Slide 55 text

References
- http://github.com/elasticsearch/elasticsearch
- https://groups.google.com/group/elasticsearch
- IRC: #elasticsearch on irc.freenode.net
- twitter: @elasticsearch
- useful tutorials: Query DSL Explained, ElasticSearch on EC2

Slide 56

Slide 56 text

Dank u wel! (Thank you!)
http://joind.in/6236