ElasticSearch {r}Evolution. Welcome. [DPC12]

elasticsearch {r}evolution welcome.
Andrei Zmievski • DPC • June 8, 2012

who am i?
curl http://localhost:9200/speaker/info/andrei
{"name": "Andrei Zmievski",
 "works": "AppDynamics",
 "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
 "likes": ["coding", "beer", "brewing", "photography"],
 "twitter": "@a",
 "email": "[email protected]"}

what is elasticsearch?
a search engine for the NoSQL generation
domain-driven, document-oriented, distributed, RESTful, Lucene-based engine

what has happened?
A year ago ElasticSearch was at 0.15.0; 0.19.4 was just released.
Continuous progress: lots of new features, improved stability, and more.
Increasing adoption by small and big companies.
No cloud hosting option yet, but maybe soon.

API conventions
append ?pretty=true to get readable JSON
boolean values: false/0/off = false, rest is true
JSONP support via callback parameter
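The boolean convention can be mirrored on the client side. A minimal sketch, assuming an exact, case-sensitive string comparison (the slide does not say how case is handled); parse_es_bool is a hypothetical helper name, not part of any client library.

```python
# The three spellings the slide lists as "false"; everything else counts as true.
FALSE_VALUES = {"false", "0", "off"}

def parse_es_bool(value: str) -> bool:
    """Interpret a request parameter using the false/0/off convention."""
    return value not in FALSE_VALUES
```

Under this reading, parse_es_bool("off") is False while any other string, such as "yes", is True.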

API structure
http://host:port/[index]/[type]/[_action/id]
GET http://es:9200/twitter/_status
GET http://es:9200/twitter/tweet/1
GET http://es:9200/twitter/tweet/_search
GET http://es:9200/twitter/tweet,user/_search
GET http://es:9200/twitter,facebook/_search
GET http://es:9200/_search
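The URL pattern lends itself to a small builder. This is only a sketch: es_url and its parameter names are hypothetical, and it assumes that multiple indices or types are always joined with commas, as in the examples.

```python
from typing import Optional, Sequence

def es_url(base: str,
           indices: Sequence[str] = (),
           types: Sequence[str] = (),
           tail: Optional[str] = None) -> str:
    """Assemble http://host:port/[index]/[type]/[_action/id].

    Passing types without indices is not handled here; the slide shows
    no such URL, so this sketch leaves that case to the caller.
    """
    parts = [base.rstrip("/")]
    if indices:
        parts.append(",".join(indices))   # e.g. twitter,facebook
    if types:
        parts.append(",".join(types))     # e.g. tweet,user
    if tail:
        parts.append(tail)                # an _action such as _search, or a doc id
    return "/".join(parts)
```

For instance, es_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search") reproduces the fourth GET example.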

API query example
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "foo bar",
          "default_operator": "AND",
          "fields": ["title", "description"],
          "boost": 2.0
        }
      },
      "filter": {
        "range": {"date": {"gt": "2012-02-09"}}
      }
    }
  },
  "from": 10,
  "size": 10
}

3 easy steps

1. index
request:
curl -XPOST http://localhost:9200/conf/speaker/1 -d '{
  "name": "Andrei Zmievski",
  "talk": "ElasticSearch Revolution. Welcome.",
  "likes": ["coding", "beer", "photography"],
  "twitter": "a",
  "height": 187
}'
response:
{
  "ok": true,
  "_index": "conf",
  "_type": "speaker",
  "_id": "1"
}

2. search
request:
curl http://localhost:9200/conf/speaker/_search?q=beer
response:
{
  "took": 3,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.5908709,
    "hits": [
      {
        "_index": "conf",
        "_type": "speaker",
        "_id": "1",
        "_score": 0.5908709,
        "_source": {
          "name": "Andrei Zmievski",
          "lives": "San Francisco",
          "likes": ["coding", "beer", "photography"],
          "twitter": "a",
          "height": 187
        }
      }
    ]
  }
}
response fields:
hits.total: total number of hits
_index: the index of the doc
_type: the type of the doc
_id: the id of the doc
_score: the hit score
_source: the original doc contents
took: the execution time

3. profit
that's up to you

distributed model
built for performance and resiliency
zero-conf discovery
sharding/replication
auto-routing

replicas
each shard can have 1 or more replicas
number of replicas can be updated dynamically after index creation
replicas can be used for querying in parallel

shard allocation
start with a single node
node 1

shard allocation
PUT /person {"index": {"number_of_shards": 2, "number_of_replicas": 1}}
node 1: person1, person2

shard allocation
start the second node
node 1: person1, person2
node 2: person1, person2

shard allocation
start 2 more nodes
the four shard copies (person1, person2, person1, person2) spread across node 1 to node 4
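The three cluster sizes shown in the shard allocation slides can be reproduced with a toy allocator. This is not ElasticSearch's actual allocation algorithm, only a sketch under two assumptions: a node never holds two copies of the same shard, and each copy goes to the least-loaded eligible node. It computes a fresh placement per cluster size rather than simulating relocation.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place every shard copy; return (placement, unassigned shard numbers)."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for _copy in range(1 + num_replicas):            # primaries first, then replicas
        for shard in range(1, num_shards + 1):
            # Only nodes that do not already hold this shard are eligible.
            candidates = [n for n in nodes if shard not in placement[n]]
            if not candidates:
                unassigned.append(shard)             # no eligible node yet
                continue
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append(shard)
    return placement, unassigned

one_node = allocate(2, 1, ["node1"])
two_nodes = allocate(2, 1, ["node1", "node2"])
four_nodes = allocate(2, 1, ["node1", "node2", "node3", "node4"])
```

With one node both replicas stay unassigned; with two nodes each node holds both shards; with four nodes each node holds a single copy.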

document sharding
node 1 to node 4 hold person1, person2, person1, person2
PUT /person/info/1 { … }: hashed to shard 1, then replicated
PUT /person/info/2 { … }: hashed to shard 2, then replicated
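Hash-based routing of a document id to a shard can be sketched like this. CRC32 is used purely as a stand-in hash; the deck does not say which function ElasticSearch uses, so the shard a given id lands on here will not necessarily match the slides.

```python
import zlib

def route(doc_id: str, number_of_shards: int) -> int:
    """Map a document id to a shard number in 1..number_of_shards."""
    digest = zlib.crc32(doc_id.encode("utf-8"))   # stand-in hash, an assumption
    return digest % number_of_shards + 1
```

The property that matters is determinism: the same id always maps to the same shard, so a later lookup by id knows where to go.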

scatter-gather
GET /person/_search?q=name:thomas
node 1 to node 4 hold person1, person2, person1, person2
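The slides name scatter-gather without spelling it out. One common reading: the query goes to one copy of every shard, each shard returns its own best hits, and the results are merged by score. A toy sketch under that assumption, with precomputed scores standing in for real relevance scoring:

```python
import heapq

def scatter_gather(shards, matches, size=10):
    """Query every shard, then merge the per-shard top hits by score."""
    partial_results = []
    for shard_docs in shards:                        # scatter
        hits = [(doc["score"], doc["id"]) for doc in shard_docs if matches(doc)]
        partial_results.append(heapq.nlargest(size, hits))
    all_hits = (hit for hits in partial_results for hit in hits)
    merged = heapq.nlargest(size, all_hits)          # gather
    return [doc_id for _score, doc_id in merged]

shards = [
    [{"id": "1", "name": "thomas", "score": 0.9},
     {"id": "3", "name": "anna", "score": 0.5}],
    [{"id": "2", "name": "thomas", "score": 1.2}],
]
result = scatter_gather(shards, lambda doc: doc["name"] == "thomas")
```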

transactional model
write consistency is per-document
uses write-ahead transaction log
1 second index refresh rate by default

storage
node data considered transient
can be stored in local file system, JVM heap, native OS memory, or FS & memory combination
gateway is a persistent storage mechanism: local, shared FS, HDFS, S3

mapping
describes document structure
automatically created with sensible defaults, but can be overridden per field
many field types: string, integer/long, float/double, boolean, date, geo, array, object, and more

sample mapping
document:
{"user": "derick",
 "title": "Don't Panic",
 "tags": ["profiling", "debugging", "php"],
 "postDate": "2010-12-22T17:14:12",
 "priority": 2}
mapping:
{"post": {
  "properties": {
    "user": {"type": "string", "index": "not_analyzed"},
    "message": {"type": "string", "boost": 1.5},
    "tags": {"type": "string", "include_in_all": "no"},
    "postDate": {"type": "date", "store": "no"},
    "priority": {"type": "integer"}    <- not really needed
  }
}}

analyzers
break down (tokenize) and normalize fields during indexing and query strings at search time
elasticsearch.yml:
index:
  analysis:
    analyzer:
      eulang:
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop, asciifolding, porterStem]
mapping:
… "title": {"type": "string", "analyzer": "eulang"}, …
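What such an analyzer chain does to a field can be approximated in a few lines. This is a sketch only: a regex stands in for the standard tokenizer, the stop list is a tiny illustrative subset, and the stemming step is left out.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}   # illustrative subset

def ascii_fold(token):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Tokenize, lowercase, drop stop words, then fold to ASCII."""
    tokens = re.findall(r"\w+", text)                    # stand-in tokenizer
    tokens = [t.lower() for t in tokens]                 # lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop
    return [ascii_fold(t) for t in tokens]               # asciifolding

terms = analyze("The Caf\u00e9 of Z\u00fcrich")
```

The same function would be applied to field values at index time and to query strings at search time, so both sides end up with comparable terms.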

filters
share some similar features with queries
apply to the result of the query
why use a filter?

filters
faster than queries
cached (depends on the filter)
the cache is used for different queries against the same filter
no scoring
more useful ones: term, terms, range, prefix, and, or, not, exists, missing, query
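The caching point can be illustrated with a toy model: a filter yields a plain set of matching doc ids with no scores, so the set can be stored and reused by any later query that applies the same filter. FilterCache and search are hypothetical names, and real caches are more involved than a dictionary.

```python
class FilterCache:
    """Caches the set of doc ids matching each (field, value) term filter."""

    def __init__(self, docs):
        self.docs = docs          # doc id -> document
        self._cache = {}
        self.computed = 0         # how many times a filter was actually evaluated

    def term_filter(self, field, value):
        key = (field, value)
        if key not in self._cache:
            self.computed += 1
            self._cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items()
                if doc.get(field) == value
            )
        return self._cache[key]

def search(cache, scored_hits, field, value):
    """Keep only the scored hits that pass the (cached) filter."""
    allowed = cache.term_filter(field, value)
    return [(doc_id, score) for doc_id, score in scored_hits if doc_id in allowed]

cache = FilterCache({1: {"user": "andrei"}, 2: {"user": "derick"}})
first = search(cache, [(1, 0.9), (2, 0.4)], "user", "andrei")
second = search(cache, [(2, 0.7), (1, 0.2)], "user", "andrei")
```

Two different queries ran here, but the filter itself was evaluated only once.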

facets
provide aggregated data based on the search request
usual purpose is to offer a faceted navigation, or faceted search (eBay and more)
facet types: terms, histogram, date histogram, range, statistical, and more
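Conceptually, a terms facet is a count of field values across the matching docs. A minimal sketch, assuming the docs are already the search hits; terms_facet is a hypothetical helper, not the facet API itself.

```python
from collections import Counter

def terms_facet(docs, field, size=10):
    """Count values of `field` across docs and return the most frequent ones."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)

hits = [
    {"tags": ["php", "debugging"]},
    {"tags": ["php"]},
    {"tags": "profiling"},
]
top = terms_facet(hits, "tags", size=1)
```

The counts are what a faceted navigation sidebar would display next to each tag.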

rivers
pluggable service running within the cluster
pulls in data from external sources and indexes it
automatic failover
current: Twitter, MongoDB, CouchDB, RabbitMQ, RSS, Wikipedia

percolator
turns searching on its head
search: index docs and run queries for matches
percolator: index queries and run docs for matches
great feature for notification/triggers implementation
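The inversion is easy to see in a toy model where queries are stored and each incoming doc is tested against them. Plain Python predicates stand in for real queries here; the class and the query names are hypothetical.

```python
class Percolator:
    """Stores named queries and reports which ones a document matches."""

    def __init__(self):
        self._queries = {}

    def register(self, name, predicate):
        self._queries[name] = predicate      # "index" the query

    def percolate(self, doc):
        # "Run" the doc against every stored query.
        return sorted(name for name, pred in self._queries.items() if pred(doc))

percolator = Percolator()
percolator.register("beer-alert", lambda d: "beer" in d.get("likes", []))
percolator.register("tall-speakers", lambda d: d.get("height", 0) > 190)
matched = percolator.percolate({"likes": ["coding", "beer"], "height": 187})
```

Each matched name could then trigger a notification for whoever registered that query.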

index aliases
each index name can have one or more aliases
atomic renames allow on-the-fly index switching
actual index: tweets{date}
alias: tweets
on update, create new index and switch alias
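Switching an alias amounts to one _aliases request that removes the alias from the old index and adds it to the new one. A sketch of the request body; the dated index names are made up for illustration.

```python
import json

def alias_switch_body(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

# Hypothetical index names following the tweets{date} pattern.
switch = alias_switch_body("tweets", "tweets20120601", "tweets20120608")
payload = json.dumps(switch)   # body to POST to /_aliases
```

Because both actions travel in a single request, clients querying the alias do not need to know that the underlying index changed.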

filtered index aliases
allows creation of "views" into an index
associates a filter with an alias
filtered alias:
curl -XPOST http://localhost:9200/_aliases -d '{
  "actions": [
    {
      "add": {
        "index": "posts",
        "alias": "posts_by_andrei",
        "filter": {"term": {"user": "andrei"}}
      }
    }
  ]
}'

parent/child docs
_parent field in mapping establishes relationship between doc types, e.g. comment and post
used with has_child and top_children queries

geo search
implemented as filters (and a facet)
geo_distance
geo_bounding_box
geo_polygon
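The idea behind a distance filter can be sketched with the haversine formula. This only illustrates the concept and is not how ElasticSearch computes distances internally; the lat/lon field names and the city coordinates are approximate assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geo_distance_filter(docs, center, max_km):
    """Keep docs whose location lies within max_km of center (lat, lon)."""
    lat, lon = center
    return [d for d in docs if haversine_km(lat, lon, d["lat"], d["lon"]) <= max_km]

places = [
    {"name": "Rotterdam", "lat": 51.92, "lon": 4.48},
    {"name": "Paris", "lat": 48.86, "lon": 2.35},
]
nearby = geo_distance_filter(places, (52.37, 4.90), 100)   # around Amsterdam
```

As with other filters, the result is a yes/no decision per doc with no scoring involved.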

plugins
add custom functionality to ES
written in Java
installable from GitHub
custom mapping types, scripting language support, custom discovery, admin tools, and more

interfaces
REST
Java / Groovy
clients/integration: Python, PHP, Ruby, Perl, Erlang, Django, Drupal, Symfony2, CouchDB, Flume, Flume sink implementation

References
http://github.com/elasticsearch/elasticsearch
https://groups.google.com/group/elasticsearch
IRC: #elasticsearch on irc.freenode.net
twitter: @elasticsearch
Useful tutorials: Query DSL Explained, ElasticSearch on EC2

Thank you very much! http://joind.in/6236
