From A to JSON - an overview of Elasticsearch

Slide 1

Slide 1 text

Boaz Leskes @bleskes From A to JSON an overview of Elasticsearch

Slide 2

Slide 2 text

Plug & Play

Slide 3

Slide 3 text

Installation

$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.9.tar.gz
$ ./elasticsearch-0.90.9/bin/elasticsearch -f
...
[INFO ][node][Ghost Maker] {0.90.2}[5645]: initializing ...

Slide 4

Slide 4 text

Index a document...

$ curl -X PUT localhost:9200/products/product/1 -d '{ "title" : "Welcome!" }'

Slide 5

Slide 5 text

Update a document...

$ curl -X PUT localhost:9200/products/product/1 -d '{ "title" : "Welcome to the breakfast. Bon appétit!" }'

Slide 6

Slide 6 text

Search for documents...

$ curl -X GET 'localhost:9200/products/_search?q=welcome'

Slide 7

Slide 7 text

Add a node...

$ ./elasticsearch-0.90.9/bin/elasticsearch -f -D es.node.name=Node2
...[cluster.service] [Node2] detected_master [Node1] ...

Slide 8

Slide 8 text

Add another node...

$ ./elasticsearch-0.90.9/bin/elasticsearch -f -D es.node.name=Node3
...[cluster.service] [Node3] detected_master [Node1] ...

Slide 9

Slide 9 text

Documents as JSON
Data structure with basic types, arrays and deep hierarchies

{
  "id" : "abc123",
  "title" : "A JSON Document",
  "body" : "A JSON document is a ...",
  "published_on" : "2013/06/27 10:00:00",
  "featured" : true,
  "tags" : ["search", "json"],
  "author" : {
    "first_name" : "Clara",
    "last_name" : "Rice",
    "email" : "[email protected]"
  }
}
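The query on a later slide addresses a nested field as `author.first_name`. As a rough illustration of that dotted naming, here is a minimal Python sketch that walks the document above and lists its leaf values by path. The helper `dotted_paths` and the e-mail value are hypothetical, and the sketch says nothing about how Elasticsearch stores documents internally.

```python
import json

# Document from the slide; the e-mail value is a hypothetical placeholder.
doc = {
    "id": "abc123",
    "title": "A JSON Document",
    "published_on": "2013/06/27 10:00:00",
    "featured": True,
    "tags": ["search", "json"],
    "author": {
        "first_name": "Clara",
        "last_name": "Rice",
        "email": "clara@example.com",
    },
}


def dotted_paths(value, prefix=""):
    """Yield (path, leaf) pairs, joining nested object keys with dots."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from dotted_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Every array element shares the path of the array itself.
        for item in value:
            yield from dotted_paths(item, prefix)
    else:
        yield prefix, value


paths = list(dotted_paths(doc))
body = json.dumps(doc)  # the string that would be sent as the request body
```

Each array element appears under the same path as the array, which is why a single `tags` field can later be counted per tag.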

Slide 10

Slide 10 text

http://
Lingua Franca of APIs
Also supported: Native Java protocol, Thrift, Memcached

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Until you know what to tweak...

Slide 13

Slide 13 text

Search & Find

Slide 14

Slide 14 text

Terms      apple    apple iphone
Phrases    "apple iphone"
Proximity  "apple safari"~5
Fuzzy      apple~0.8
Wildcards  app*    *pp*
Boosting   apple^10 safari
Range      [2011/05/01 TO 2011/05/31]    [java TO json]
Boolean    apple AND NOT iphone    +apple -iphone    (apple OR iphone) AND NOT review
Fields     title:iphone^15 OR body:iphone    published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

http://lucene.apache.org/core/4_5_0/queryparser...

$ curl -X GET "http://localhost:9200/_search?q="

Slide 15

Slide 15 text

JSON-based Query DSL

curl -X GET localhost:9200/articles/_search -d '{
  "query" : {
    "filtered" : {
      "query" : {
        "bool" : {
          "must" : {
            "match" : {
              "author.first_name" : {
                "query" : "claire",
                "fuzziness" : 0.1
              }
            }
          },
          "must" : {
            "multi_match" : {
              "query" : "elasticsearch",
              "fields" : ["title^10", "body"]
            }
          }
        }
      },
      "filter": {
        "and" : [
          { "terms" : { "tags" : ["search"] } },
          { "range" : { "published_on": {"from": "2013"} } },
          { "term" : { "featured" : true } }
        ]
      }
    }
  }
}'
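Because the query is plain JSON, it can be assembled programmatically. The following Python sketch builds the same request body from parameters; the function name `filtered_query` is hypothetical. A Python dict cannot repeat the `must` key, so the two clauses are passed as a list, which the `bool` query also accepts.

```python
import json


def filtered_query(text, author, tag, since):
    """Build the request body from the slide (hypothetical helper)."""
    must = [
        {"match": {"author.first_name": {"query": author, "fuzziness": 0.1}}},
        {"multi_match": {"query": text, "fields": ["title^10", "body"]}},
    ]
    filters = [
        {"terms": {"tags": [tag]}},
        {"range": {"published_on": {"from": since}}},
        {"term": {"featured": True}},
    ]
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": must}},
                "filter": {"and": filters},
            }
        }
    }


# Serialized body, ready to be passed to curl -d or an HTTP client.
body = json.dumps(filtered_query("elasticsearch", "claire", "search", "2013"), indent=2)
```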

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

Full-text Search
“Find all articles with ‘search’ in their title or body, give matches in titles a higher score”

Structured Search
“Find all articles from year 2013 tagged ‘search’”

Custom Scoring
See custom_score and custom_filters_score queries

Slide 21

Slide 21 text

How Do Users See Search?
Query ➝ Results

How Does a Search Engine Work?
Fetch document field ➝ Pick configured analyzer ➝ Parse text into tokens ➝ Apply token filters ➝ Store into index
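The indexing steps above can be sketched as a toy pipeline in Python. Everything here is an assumption made for illustration: the regex tokenizer, the tiny stop-word list and the in-memory dictionary stand in for the real analyzers and the Lucene index.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "of"}  # assumed, deliberately tiny


def tokenize(text):
    """Parse text into tokens (letters and digits only)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, token_filters=(lowercase, remove_stopwords)):
    """Tokenize, then apply each token filter in order."""
    tokens = tokenize(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


def index_documents(docs, field):
    """Store analyzed tokens in an inverted index: token -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index[token].add(doc_id)
    return dict(index)


docs = {
    1: {"title": "A JSON Document"},
    2: {"title": "The Search Engine"},
    3: {"title": "JSON search"},
}
index = index_documents(docs, "title")
```

Looking up `index["json"]` then returns the ids of the matching documents, which is the "Query ➝ Results" view the user sees.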

Slide 22

Slide 22 text

Mapping
Configuring document properties for the search engine

curl -X PUT localhost:9200/articles/_mapping -d '{
  "article" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "analyzer" : "english"
      }
    }
  }
}'

Slide 23

Slide 23 text

The _analyze API

_analyze?pretty&format=text&text=jumping+jack+flash.
[jumping:0->7:] [jack:8->12:] [flash:13->18:]

_analyze?pretty&format=text&text=jumping+jack+flash.&analyzer=english
[jump:0->7:] [jack:8->12:] [flash:13->18:]

_analyze?text=...&tokenizer=X&filters=A,B,C
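To make the `term:start->end` notation concrete, here is a small Python sketch that reproduces the two outputs above for this one input. The function `naive_stem` is a crude, hypothetical stand-in for the stemmer used by the `english` analyzer; it only strips a trailing "ing" and should not be read as how the real analyzer works.

```python
import re


def naive_stem(token):
    """Crude stand-in for an English stemmer: strip a trailing 'ing'."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    return token


def analyze_text(text, stem=False):
    """Return tokens with character offsets in the slide's text format."""
    parts = []
    for match in re.finditer(r"\w+", text):
        term = match.group().lower()
        if stem:
            term = naive_stem(term)
        # Offsets always refer to the original text, even after stemming.
        parts.append(f"[{term}:{match.start()}->{match.end()}:]")
    return " ".join(parts)


plain = analyze_text("jumping jack flash.")
stemmed = analyze_text("jumping jack flash.", stem=True)
```

Note that the stemmed token `jump` keeps the offsets `0->7` of the original word, which is what lets features such as highlighting point back into the source text.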

Slide 24

Slide 24 text

Slice & Dice

Slide 25

Slide 25 text

Query Facets

Slide 26

Slide 26 text

“Tag Cloud” With the terms Facet

curl -X POST 'localhost:9200/articles/_search?search_type=count&pretty' -d '{
  "facets": {
    "tag-cloud": {
      "terms" : {
        "field" : "tags"
      }
    }
  }
}'

"facets" : {
  "tag-cloud" : {
    "terms" : [
      { "term" : "ruby", "count" : 3 },
      { "term" : "java", "count" : 2 },
      ...
    ]
  }
}

Simplest “map/reduce” aggregation: document count per tag
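The "map/reduce" idea behind the terms facet can be sketched in a few lines of Python. The sample articles are made up so that the counts match the response above; `terms_facet` is a hypothetical helper, not part of any client library.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count documents per term of `field`, most frequent first."""
    counts = Counter()
    for doc in docs:
        # "map": emit every value of the field for this document
        counts.update(doc.get(field, []))
    # "reduce": the per-term sums, ordered by count
    return [{"term": term, "count": count} for term, count in counts.most_common(size)]


# Hypothetical sample data
articles = [
    {"tags": ["ruby", "java"]},
    {"tags": ["ruby"]},
    {"tags": ["ruby", "java"]},
]
tag_cloud = terms_facet(articles, "tags")
```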

Slide 27

Slide 27 text

Statistics on Student Scores With the terms_stats Facet

curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
  "facets": {
    "scores-per-subject" : {
      "terms_stats" : {
        "key_field" : "subject",
        "value_field" : "score"
      }
    }
  }
}'

"facets" : {
  "scores-per-subject" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [
      {
        "term" : "math",
        "count" : 4,
        "total_count" : 4,
        "min" : 25.0,
        "max" : 92.0,
        "total" : 267.0,
        "mean" : 66.75
      },
      ...
    ]
  }
}

Aggregating statistics per subject
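As a rough sketch of what the terms_stats facet computes, the Python below groups documents by a key field and derives statistics over a value field. The individual scores are invented so that the "math" entry matches the response above; the slide does not show the underlying documents.

```python
from collections import defaultdict


def terms_stats(docs, key_field, value_field):
    """Group docs by key_field and compute statistics over value_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])

    entries = []
    for term, values in groups.items():
        total = sum(values)
        entries.append({
            "term": term,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": total,
            "mean": total / len(values),
        })
    # Largest groups first, ties broken alphabetically
    entries.sort(key=lambda entry: (-entry["count"], entry["term"]))
    return entries


# Hypothetical scores, chosen to reproduce the "math" entry on the slide
scores = [
    {"subject": "math", "score": 25.0},
    {"subject": "math", "score": 92.0},
    {"subject": "math", "score": 65.0},
    {"subject": "math", "score": 85.0},
    {"subject": "physics", "score": 70.0},
    {"subject": "physics", "score": 80.0},
]
stats = terms_stats(scores, "subject", "score")
```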

Slide 28

Slide 28 text

Statistics on Student Scores With the terms_stats Facet

curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
  "query" : {
    "match" : {
      "student" : "john"
    }
  },
  "facets": {
    "scores-per-subject" : {
      "terms_stats" : {
        "key_field" : "subject",
        "value_field" : "score"
      }
    }
  }
}'

"facets" : {
  "scores-per-subject" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [
      {
        "term" : "math",
        "count" : 1,
        "total_count" : 1,
        "min" : 85.0,
        "max" : 85.0,
        "total" : 85.0,
        "mean" : 85.0
      },
      ...
    ]
  }
}

Realtime filtering with queries and filters

Slide 29

Slide 29 text

Facets

Terms
Terms Stats
Statistical
Range
Histogram
Date Histogram
Filter
Query
Geo Distance

Slide 30

Slide 30 text

Above & Beyond

Slide 31

Slide 31 text

Above & Beyond

Bulk operations (For indexing and search operations)
Percolator (“reversed search”: alerts, classification, …)
Suggesters (“Did you mean …?”)
Index aliases (Grouping, filtering or “renaming” of indices)
Index templates (Automatic index configuration)
Monitoring API (Amount of memory used, number of operations, …)
…

Slide 32

Slide 32 text

Aggregations

Slide 33

Slide 33 text

What’s wrong with facets?
Nothing, it’s just that we want more…

Slide 34

Slide 34 text

curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
  "query" : {
    "match" : {
      "student" : "john"
    }
  },
  "aggs": {
    "scores-per-subject" : {
      "terms" : { "field" : "subject" },
      "aggs" : {
        "avg_score" : {
          "avg" : { "field" : "score" }
        }
      }
    }
  }
}'

"aggregations" : {
  "scores-per-subject" : {
    "terms" : [
      {
        "term" : "math",
        "doc_count" : 1,
        "avg_score" : { "value": 85.0 }
      },
      ...
    ]
  }
}

Slide 35

Slide 35 text

curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
  "query" : {
    "match" : {
      "student" : "john"
    }
  },
  "aggs": {
    "scores-per-subject" : {
      "terms" : { "field" : "subject" },
      "aggs" : {
        "avg_score_by_year": {
          "date_histogram": {
            "field" : "date",
            "interval" : "year",
            "format" : "yyyy"
          },
          "aggs": {
            "avg_score" : {
              "avg": { "field" : "score" }
            }
          }
        }
      }
    }
  }
}'

"aggregations" : {
  "scores-per-subject" : {
    "terms" : [
      {
        "term" : "math",
        "doc_count" : 1,
        "avg_score_by_year" : [
          { "key_as_string": "2013", "avg_score": { "value": 85.0 } }
          …
        ]
      },
      ...
    ]
  }
}
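What aggregations add over facets is nesting: each bucket can be split again and a metric computed per innermost bucket. A minimal Python sketch of the subject, then year, then average nesting follows. The sample documents and the ISO date format are assumptions, and the returned structure is simplified compared with the response above.

```python
from collections import defaultdict


def avg_score_by_subject_and_year(docs):
    """Bucket by subject, then by year, then average the scores."""
    buckets = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        year = doc["date"][:4]  # assumes ISO "yyyy-mm-dd" dates
        buckets[doc["subject"]][year].append(doc["score"])

    return {
        subject: {
            year: sum(values) / len(values)
            for year, values in sorted(years.items())
        }
        for subject, years in buckets.items()
    }


# Hypothetical documents for one student
john_scores = [
    {"subject": "math", "date": "2013-05-01", "score": 85.0},
    {"subject": "math", "date": "2012-05-01", "score": 75.0},
    {"subject": "math", "date": "2012-11-01", "score": 65.0},
]
result = avg_score_by_subject_and_year(john_scores)
```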

Slide 36

Slide 36 text

Distributed Percolation

Slide 37

Slide 37 text

$ curl -XPUT "localhost:9200/twitter/.percolator/es-tweets" -d '{
  "query": {
    "match": {
      "body": "elasticsearch"
    }
  }
}'

$ curl -XGET "localhost:9200/twitter/_percolate" -d '{
  "doc": {
    "body": "#elasticsearch is awesome",
    "nick": "@imotov",
    "name": "Igor Motov",
    "date": "2013-11-03"
  }
}'

{
  …
  "matches": [
    {
      "_index": "twitter",
      "_id": "es-tweets"
    }
  ]
}
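Percolation reverses the usual roles: queries are stored first, and an incoming document is then checked against them. The Python class below is a toy model of that idea only. `ToyPercolator` is hypothetical, it supports a single "any term matches" query type, and it ignores everything that makes the real feature distributed.

```python
import re


def terms_of(text):
    """Lowercased word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


class ToyPercolator:
    """Stores queries, then reports which of them match a given document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, field, text):
        self.queries[query_id] = (field, terms_of(text))

    def percolate(self, doc):
        # A query matches when the document field shares at least one term.
        return [
            query_id
            for query_id, (field, wanted) in self.queries.items()
            if wanted & terms_of(doc.get(field, ""))
        ]


percolator = ToyPercolator()
percolator.register("es-tweets", "body", "elasticsearch")
percolator.register("ruby-tweets", "body", "ruby")  # hypothetical second query

matches = percolator.percolate({
    "body": "#elasticsearch is awesome",
    "nick": "@imotov",
})
```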

Slide 38

Slide 38 text

So what’s in distribution?

• Highlighting
• Sorting
• Multi-Index support
• Aggregations
• Multi-Percolate

Slide 39

Slide 39 text

Snapshot & Restore

Slide 40

Slide 40 text

Backup, 0.90 style

1. disable flush
2. find all primary shard locations (optional)
3. copy files from primary shards (rsync)
4. enable flush

Slide 41

Slide 41 text

Backup, 1.0 style

curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101"

Slide 42

Slide 42 text

Register a repository

curl -XPUT "localhost:9200/_snapshot/my_backup" -d '{
  "type": "fs",
  "settings": {
    "location":"/mnt/es-test-repo"
  }
}'

Slide 43

Slide 43 text

Creating a Snapshot

curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101" -d '{
  "indices":"+test_*,-test_4"
}'
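The `indices` value combines an include pattern (`+test_*`) with an exclude (`-test_4`). The Python sketch below approximates how such an expression selects index names. The function `resolve_indices` and the list of existing indices are hypothetical, and the real resolution logic may differ in edge cases.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Apply +include / -exclude wildcard parts from left to right."""
    selected = []
    for part in expression.split(","):
        exclude = part.startswith("-")
        pattern = part[1:] if part[:1] in ("+", "-") else part
        matches = [name for name in existing if fnmatchcase(name, pattern)]
        if exclude:
            selected = [name for name in selected if name not in matches]
        else:
            selected += [name for name in matches if name not in selected]
    return selected


# Hypothetical cluster contents
existing = ["test_1", "test_2", "test_4", "prod_1"]
snapshot_indices = resolve_indices("+test_*,-test_4", existing)
```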

Slide 44

Slide 44 text

Restore, 0.90 style

1. close the index (shut down the cluster)
2. find all existing index shards
3. replace all index shards with data from backup
4. open the index (start the cluster)

Slide 45

Slide 45 text

Restore, 1.0 style

curl -XPOST "localhost:9200/test_*/_close"

curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_20140101/_restore" -d '{
  "indices":"test_*"
}'

Slide 46

Slide 46 text

Shard & Cluster

Slide 47

Slide 47 text

curl -XPUT 'http://localhost:9200/a/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 1
    }
  }
}'

Index A is partitioned into 3 primary shards, each duplicated in 1 replica shard

Primaries: A1 A2 A3
Replicas: A1' A2' A3'
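A common way to picture the partitioning is "hash of the document id, modulo the number of primary shards". The Python sketch below follows that picture. The use of `zlib.crc32` is an assumption chosen only because it is deterministic and in the standard library; it is not the hash function Elasticsearch uses.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a primary shard for a document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Primaries plus one full set of shards per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Settings from the slide: 3 primaries, 1 replica of each
placement = {doc_id: shard_for(doc_id, 3) for doc_id in ["1", "2", "abc123"]}
copies = total_shard_copies(3, 1)
```

Because the shard is derived from the id and the shard count, the same document always lands on the same primary, and the cluster has six shard copies to spread across nodes as they join.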

Slide 48

Slide 48 text

Demo

1 node
2 nodes
3 nodes

"index.routing.allocation.exclude.name" : "Node1"
"cluster.routing.allocation.exclude.name" : "Node3"
...

Slide 49

Slide 49 text

thanks!