Slide 1

Slide 1 text

Elasticsearch in action By Thijs Feryn

Slide 2

Slide 2 text

Explain in 1 slide

Slide 3

Slide 3 text

•Full-text search engine
•NoSQL database
•Analytics engine
•Written in Java
•Lucene based (~Solr)
•Inverted indices
•Easy to scale (~Elastic)
•RESTful interface (HTTP/JSON)
•Schemaless
•Real-time
•ELK stack

Slide 4

Slide 4 text

Still with me?

Slide 5

Slide 5 text

Hi, I’m Thijs

Slide 6

Slide 6 text

I’m @ThijsFeryn on Twitter

Slide 7

Slide 7 text

I’m an Evangelist At

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Bonjour

Slide 10

Slide 10 text

https://www.elastic.co/downloads/elasticsearch

Slide 11

Slide 11 text

http://localhost:9200

{
  "cluster_name": "elasticsearch",
  "cluster_uuid": "KKD4RjtCTTWoRomBDXyCDA",
  "name": "y3qufp6",
  "tagline": "You Know, for Search",
  "version": {
    "build_date": "2017-02-09T22:05:32.386Z",
    "build_hash": "db0d481",
    "build_snapshot": false,
    "lucene_version": "6.4.1",
    "number": "5.2.1"
  }
}

Slide 12

Slide 12 text

RDBMS      Elasticsearch
Database   Index
Table      Type
Row        Document

Slide 13

Slide 13 text

Contains 5.x syntax, not the old 2.x syntax!

Slide 14

Slide 14 text

PUT /blog

{
  "acknowledged": true,
  "shards_acknowledged": true
}

Slide 15

Slide 15 text

POST /blog/post/6576
{
  "language": "en",
  "title": "Combell revamps its hosting range: fewer packages, more features!",
  "date": "Thu, 08 Dec 2016 15:35:59 +0000",
  "author": "Combell",
  "category": [ "Combell news", "Hosting", "Combell", "hosting" ],
  "guid": "6576"
}

Slide 16

Slide 16 text

Confirmation

{
  "_index": "blog",
  "_type": "post",
  "_id": "6576",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "created": true
}
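The indexing call is plain HTTP with a JSON body, so it can be prepared from any language. A minimal Python sketch, assuming a node on http://localhost:9200 as in the earlier slide; the helper name `build_index_request` is hypothetical, and the request is only built, not sent:

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) the HTTP request that indexes one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# A shortened version of the blog post from the slide.
post = {"language": "en", "author": "Combell", "guid": "6576"}
request = build_index_request("http://localhost:9200", "blog", "post", "6576", post)
print(request.get_method(), request.full_url)

# To send it against a running node (assumption: one is listening locally):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```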

Slide 17

Slide 17 text

Retrieve document by id

GET /blog/post/6576

Document & meta data

{
  "_index": "blog",
  "_type": "post",
  "_id": "6576",
  "_version": 1,
  "found": true,
  "_source": {
    "language": "en",
    "title": "Combell revamps its hosting range: fewer packages, more features!",
    "date": "Thu, 08 Dec 2016 15:35:59 +0000",
    "author": "Combell",
    "category": [ "Combell news", "Hosting", "Combell", "hosting" ],
    "guid": "6576"
  }
}

Slide 18

Slide 18 text

Schemaless? Not really …
“Guesses” mapping on insert

GET /blog/_mapping

{ "blogger": { "mappings": { "post": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "category": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "date": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "guid": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256

Slide 19

Slide 19 text

String types
✓Text: analyzed
✓Keyword: non-analyzed
✓String
 •Analyzed
 •Extra non-analyzed keyword field
 •Deprecated

Slide 20

Slide 20 text

0 hits, analyzed data

POST /blog/_search
{
  "_source": "date",
  "query": {
    "wildcard": { "date": { "value": "*11:50*" } }
  }
}

2 hits, non-analyzed data

POST /blog/_search
{
  "_source": "date",
  "query": {
    "wildcard": { "date.keyword": { "value": "*11:50*" } }
  }
}
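Why the first request finds nothing can be sketched in Python. This is only an illustration under assumptions: `analyze` is a rough stand-in for the standard analyzer (not Elasticsearch's real tokenizer), the date value is hypothetical, and `fnmatchcase` plays the role of the wildcard matcher.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def wildcard_on_text(value, pattern):
    """On an analyzed field the pattern is tested against each token."""
    return any(fnmatchcase(token, pattern) for token in analyze(value))


def wildcard_on_keyword(value, pattern):
    """On a keyword field the pattern is tested against the whole value."""
    return fnmatchcase(value, pattern)


date = "Thu, 08 Dec 2016 11:50:59 +0000"  # hypothetical field value
print(analyze(date))
print(wildcard_on_text(date, "*11:50*"))     # no single token contains "11:50"
print(wildcard_on_keyword(date, "*11:50*"))  # the untouched value does
```

The colon is lost during analysis, so "11" and "50" end up as separate tokens; the `date.keyword` sub-field keeps the original string intact.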

Slide 21

Slide 21 text

Explicit mapping

Slide 22

Slide 22 text

Explicit mapping at index creation time

PUT /blog
{
  "mappings": {
    "post": {
      "properties": {
        "title": { "type": "text" },
        "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
        "author": { "type": "keyword" },
        "category": { "type": "keyword" },
        "guid": { "type": "integer" }
      }
    }
  }
}

Slide 23

Slide 23 text

PUT /blog {
 "mappings": {
 "post": {
 "properties": {
 "title": {
 "type": "text",
 "fields": {
 "nl": {
 "type": "text",
 "analyzer": "dutch"
 },
 "en": {
 "type": "text",
 "analyzer": "english"
 },
 "keyword": {
 "type": "keyword"
 } 
 }
 },
 "date": {
 "type": "date",
 "format": "E, dd MMM YYYY HH:mm:ss Z"
 },
 "language": {
 "type": "keyword"
 },
 "author": {
 "type": "keyword"
 },
 "category": {
 "type": "keyword"
 },
 "guid": {
 "type": "integer"
 }
 }
 }
 }
}
Alternative mapping

Slide 24

Slide 24 text

"properties": {
 "title": {
 "type": "text",
 "fields": {
 "nl": {
 "type": "text",
 "analyzer": "dutch"
 },
 "en": {
 "type": "text",
 "analyzer": "english"
 },
 "keyword": {
 "type": "keyword"
 } 
 }
 },
What’s with the analyzers?

Slide 25

Slide 25 text

Analyzed vs non-analyzed

Slide 26

Slide 26 text

Full-text vs exact value

Slide 27

Slide 27 text

By default strings are analyzed … unless you mention it in the mapping

Slide 28

Slide 28 text

Built-in analyzers
•Standard
•Simple
•Whitespace
•Stop
•Keyword
•Pattern
•Language
•Snowball
•Custom

Standard tokenizer
Lowercase token filter
English stop word token filter

Slide 29

Slide 29 text

Test the analyzer

POST /_analyze
{ "analyzer": "standard", "text": "Hey man, how are you doing?" }

POST /_analyze
{ "analyzer": "whitespace", "text": "Hey man, how are you doing?" }

POST /_analyze
{ "analyzer": "english", "text": "Hey man, how are you doing?" }

Slide 30

Slide 30 text

Input:      Hey man, how are you doing?
Standard:   hey man how are you doing
Whitespace: Hey man, how are you doing?
English:    hei man how you do
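The first two outputs can be approximated in a few lines of Python. This is a simplified sketch, not the Lucene implementation: the function names are hypothetical, the regular expression only imitates the standard tokenizer for plain English text, and the english analyzer (stemming and stop words) is left out.

```python
import re


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_like_analyzer(text):
    """Approximate the standard analyzer: drop punctuation, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


sentence = "Hey man, how are you doing?"
print(whitespace_analyzer(sentence))
print(standard_like_analyzer(sentence))
```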

Slide 31

Slide 31 text

POST /blog/post/_search
{
  "_source": ["title"],
  "query": {
    "match": { "title": "working" }
  }
}

Slide 32

Slide 32 text

"hits": { "total": 1, "max_score": 4.5011063, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.5011063, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } } ] } }

Slide 33

Slide 33 text

POST /blog/post/_search
{
  "_source": ["title"],
  "query": {
    "match": { "title.en": "working" }
  }
}

Slide 34

Slide 34 text

}, "hits": { "total": 4, "max_score": 4.9479203, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 4.9479203, "_source": { "title": "Hosted SharePoint 2010: working efficiently as a team" } }, { "_index": "blog", "_type": "post", "_id": "5586", "_score": 4.896652, "_source": { "title": "WebAssembly: several world players work on a faster Internet" } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 4.8334804, "_source": { "title": "SSL: what is it and how does it work?" } },

Slide 35

Slide 35 text

Search

Slide 36

Slide 36 text

164 posts

POST /blog/post/_count
{
  "query": {
    "match": { "title": "PROXY protocol support in Varnish" }
  }
}

1 post

POST /blog/post/_count
{
  "query": {
    "bool": {
      "filter": {
        "term": { "title.keyword": "PROXY protocol support in Varnish" }
      }
    }
  }
}
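The gap between the two counts comes from how each query treats its input: `match` analyzes the text and, by default, accepts a document that contains any of the resulting tokens, while `term` on a keyword field needs the exact value. A toy Python model of that difference, with a hypothetical `analyze` helper and a four-title sample taken from titles shown elsewhere in these slides:

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer, returning a token set."""
    return set(re.findall(r"\w+", text.lower()))


def match_count(titles, query):
    """match: count titles sharing at least one analyzed token (OR)."""
    wanted = analyze(query)
    return sum(1 for title in titles if wanted & analyze(title))


def term_count(titles, value):
    """term on a keyword field: count titles equal to the exact value."""
    return sum(1 for title in titles if title == value)


titles = [
    "PROXY protocol support in Varnish",
    "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar",
    "SSL: what is it and how does it work?",
    "Hosted SharePoint 2010: working efficiently as a team",
]
query = "PROXY protocol support in Varnish"
print(match_count(titles, query), term_count(titles, query))
```

On a real blog index, a common word such as "in" alone is enough to pull in many posts, which is consistent with the 164 hits on the slide.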

Slide 37

Slide 37 text

Filter context vs Query context

Slide 38

Slide 38 text

Filter
•Does it match? Yes or no
•When relevance doesn’t matter
•Faster & cacheable
•For non-analyzed data

Query
•How well does it match?
•For full-text search
•On analyzed/tokenized data
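In a request body the two contexts usually sit side by side inside one `bool` query: scored clauses under `must`, yes/no clauses under `filter`. A small Python sketch of such a body; the helper `bool_search` is hypothetical and only assembles the dictionary:

```python
import json


def bool_search(query_clauses=None, filter_clauses=None):
    """Build a search body: scored clauses in 'must', unscored in 'filter'."""
    bool_query = {}
    if query_clauses:
        bool_query["must"] = list(query_clauses)
    if filter_clauses:
        bool_query["filter"] = list(filter_clauses)
    return {"query": {"bool": bool_query}}


body = bool_search(
    query_clauses=[{"match": {"title": "varnish thijs"}}],  # how well does it match?
    filter_clauses=[{"term": {"language": "en"}}],          # does it match at all?
)
print(json.dumps(body, indent=2))
```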

Slide 39

Slide 39 text

Match Query, Match Phrase Query, Match Phrase Prefix Query, Multi Match Query, Common Terms Query, Query String Query, Simple Query String Query, Term Query, Terms Query, Range Query, Exists Query, Prefix Query, Wildcard Query, Regexp Query, Fuzzy Query, Type Query, Ids Query, Constant Score Query, Bool Query, Dis Max Query, Function Score Query, Boosting Query, Indices Query, Nested Query, Has Child Query, Has Parent Query, Parent Id Query, GeoShape Query, Geo Bounding Box Query, Geo Distance Query, Geo Distance Range Query, Geo Polygon Query, More Like This Query, Template Query, Script Query, Percolate Query, Span Term Query, Span Multi Term Query, Span First Query, Span Near Query, Span Or Query, Span Not Query, Span Containing Query, Span Within Query, Span Field Masking Query

Slide 40

Slide 40 text

Filter examples

Slide 41

Slide 41 text

POST /blog/post/_search?pretty
{
  "query": {
    "bool": {
      "filter": {
        "ids": { "values": [231, 234, 258] }
      }
    }
  }
}

Slide 42

Slide 42 text

POST /blog/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "language": "en" } },
            { "range": { "date": { "gte": "2016-01-01", "format": "yyyy-MM-dd" } } }
          ],
          "must_not": [
            { "term": { "category": "joomla" } }
          ],
          "should": [
            { "term": { "category": "Hosting" } },
            { "term": { "category": "evangelist" } }
          ]
        }
      }
    }
  }
}

Slide 43

Slide 43 text

POST /blog/_search?pretty
{
  "query": {
    "bool": {
      "filter": {
        "prefix": { "title.keyword": "Combell" }
      }
    }
  }
}

Slide 44

Slide 44 text

Requires “geo point” typed field

POST /cities/city/_search
{
  "size": 200,
  "sort": [ { "city": { "order": "asc" } } ],
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": { "lat": 51.033333, "lon": 2.866667 }
        }
      }
    }
  }
}
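Conceptually, the distance filter keeps every document whose location lies within 5 km of the given point. A Python sketch of that check using the haversine formula; this is an approximation for illustration (Elasticsearch's own distance computation may differ), and the two candidate locations are hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(center, point, max_km):
    """True when point lies at most max_km from center."""
    return haversine_km(*center, *point) <= max_km


center = (51.033333, 2.866667)  # lat/lon from the slide
nearby = (51.05, 2.88)          # hypothetical location, roughly 2 km away
far_away = (51.2, 3.2)          # hypothetical location, well beyond 5 km
print(within_distance(center, nearby, 5), within_distance(center, far_away, 5))
```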

Slide 45

Slide 45 text

Requires “geo point” typed field
Draw a “box”

POST /cities/city/_search
{
  "size": 200,
  "query": {
    "bool": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "bottom_left": { "lat": 51.1, "lon": 2.6 },
            "top_right": { "lat": 51.2, "lon": 2.7 }
          }
        }
      }
    }
  }
}

Slide 46

Slide 46 text

Relevance

Slide 47

Slide 47 text

POST /blog/_search
{
  "_source": ["title"],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "varnish thijs" } },
        { "bool": { "filter": { "term": { "language": "en" } } } }
      ]
    }
  }
}

Slide 48

Slide 48 text

Hits both terms. More relevant

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 8,
    "max_score": 1.984594,
    "hits": [
      {
        "_index": "blog", "_type": "post", "_id": "4275", "_score": 1.984594,
        "fields": { "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ] }
      },
      {
        "_index": "blog", "_type": "post", "_id": "6238", "_score": 0.8335616,
        "fields": { "title": [ "PROXY protocol support in Varnish" ] }
      },
      { "_index": "blog",

Slide 49

Slide 49 text

Using a filter instead of a query
We don’t care about the source

POST /blog/_search?_source=false
{
  "query": {
    "bool": {
      "filter": {
        "term": { "category": "PHPBenelux" }
      }
    }
  }
}

Slide 50

Slide 50 text

No relevance on filters
Score is always 0

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": [
      { "_index": "blog", "_type": "post", "_id": "6700", "_score": 0 },
      { "_index": "blog", "_type": "post", "_id": "13425", "_score": 0 },
      { "_index": "blog", "_type": "post", "_id": "6254", "_score": 0 },

Slide 51

Slide 51 text

Only search for “thijs feryn”
Increase relevance if category contains “Varnish”

POST /blog/_search
{
  "_source": ["title", "category"],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "thijs feryn" } }
      ],
      "should": [
        { "match": { "category": "Varnish" } }
      ]
    }
  }
}

Slide 52

Slide 52 text

Combining filters & queries
Increase relevance

POST /blog/_search
{
  "_source": ["title", "category"],
  "query": {
    "bool": {
      "must_not": [
        { "bool": { "filter": { "term": { "author": "Romy" } } } }
      ],
      "should": [
        { "match": { "category": "Magento" } }
      ]
    }
  }
}

Slide 53

Slide 53 text

Query-time boosting
Increase relevance

POST /blog/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "Magento", "boost": 3 } } },
        { "match": { "title": { "query": "Wordpress", "boost": 2 } } }
      ]
    }
  }
}

Slide 54

Slide 54 text

Multi index multi type

Slide 55

Slide 55 text

/_search
/products/_search
/products/product/_search
/products,clients/_search
/pro*/_search
/pro*,cli*/_search
/products/product,invoice/_search
/products/pro*/_search
/_all/product/_search
/_all/product,invoice/_search
/_all/pro*/_search
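All of these paths follow one pattern: an optional comma-separated list of indices, an optional comma-separated list of types, then `_search`, with `_all` standing in when only types are given. A Python sketch of that pattern; the helper name `search_path` is hypothetical:

```python
def search_path(indices=None, types=None):
    """Build a _search path for any mix of indices and types."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # types need an index segment in front of them
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path())
print(search_path(["products", "clients"]))
print(search_path(["products"], ["product", "invoice"]))
print(search_path(types=["pro*"]))
```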

Slide 56

Slide 56 text

Multi “all the things”

Slide 57

Slide 57 text

Aggregations

Slide 58

Slide 58 text

Group by on steroids

Slide 59

Slide 59 text

Aggregations in SQL

SELECT author, COUNT(guid) FROM blog.post GROUP BY author

Metric: COUNT(guid)
Bucket: GROUP BY author

Slide 60

Slide 60 text

SELECT author, COUNT(guid) FROM blog.post GROUP BY author

POST /blog/post/_search?pretty
{
  "size": 0,
  "aggs": {
    "popular_bloggers": {
      "terms": { "field": "author" }
    }
  }
}

Only aggs, no docs
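What the terms aggregation computes can be mimicked in memory with a counter: one bucket per distinct value, with the document count as the metric. A Python sketch with hypothetical sample documents; `terms_aggregation` is an illustrative helper, not part of any client library:

```python
from collections import Counter


def terms_aggregation(docs, field, size=10):
    """Group docs by a field and return the largest buckets first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


posts = [  # hypothetical documents
    {"author": "Romy", "guid": 1},
    {"author": "Tom", "guid": 2},
    {"author": "Romy", "guid": 3},
    {"author": "Romy", "guid": 4},
    {"author": "Tom", "guid": 5},
    {"author": "Combell", "guid": 6},
]
buckets = terms_aggregation(posts, "author")
print(buckets)
```

The `size` default of 10 mirrors the number of buckets a terms aggregation returns when no size is given.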

Slide 61

Slide 61 text

"aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 1, "buckets": [ { "key": "Romy", "doc_count": 458 }, { "key": "Jimmy Cappaert", "doc_count": 160 }, { "key": "Tom", "doc_count": 144 }, { "key": "Combell", "doc_count": 143 }, { "key": "Christophe", "doc_count": 32 }, { "key": "Dorien Marinus", "doc_count": 19 }, Aggregation output

Slide 62

Slide 62 text

Nested multi-group by alongside query

POST /blog/_search
{
  "query": {
    "match": { "title": "varnish" }
  },
  "aggs": {
    "popular_bloggers": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "used_languages": {
          "terms": { "field": "language", "size": 10 }
        }
      }
    }
  }
}

Slide 63

Slide 63 text

}, "aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 6, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en", "doc_count": 4 }, { "key": "nl", "doc_count": 2 } ] } }, { "key": "Combell", "doc_count": 5, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl", "doc_count": 4 }, { "key": "en", "doc_count": 1 } ] } }, Aggregation output

Slide 64

Slide 64 text

Avg Aggregation, Cardinality Aggregation, Extended Stats Aggregation, Geo Bounds Aggregation, Geo Centroid Aggregation, Max Aggregation, Min Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Scripted Metric Aggregation, Stats Aggregation, Sum Aggregation, Top hits Aggregation, Value Count Aggregation, Children Aggregation, Date Histogram Aggregation, Date Range Aggregation, Diversified Sampler Aggregation, Filter Aggregation, Filters Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation, Global Aggregation, Histogram Aggregation, IP Range Aggregation, Missing Aggregation, Nested Aggregation, Range Aggregation, Reverse nested Aggregation, Sampler Aggregation, Significant Terms Aggregation, Terms Aggregation, Avg Bucket Aggregation, Derivative Aggregation, Max Bucket Aggregation, Min Bucket Aggregation, Sum Bucket Aggregation, Stats Bucket Aggregation, Extended Stats Bucket Aggregation, Percentiles Bucket Aggregation, Moving Average Aggregation, Cumulative Sum Aggregation, Bucket Script Aggregation, Bucket Selector Aggregation, Serial Differencing Aggregation, Matrix Stats

Slide 65

Slide 65 text

Managing Elasticsearch

Slide 66

Slide 66 text

Plenty of ways … for which we don’t have enough time

Slide 67

Slide 67 text

Clustering

Slide 68

Slide 68 text

Single node 2 node cluster 3 node cluster

Slide 69

Slide 69 text

Example config settings

node.rack: my-location
node.master: true
node.data: true
http.enabled: true
cluster.name: my-cluster
node.name: my-node
index.number_of_shards: 5
index.number_of_replicas: 1
discovery.zen.minimum_master_nodes: 2
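The value 2 for `discovery.zen.minimum_master_nodes` matches the commonly recommended quorum of (master-eligible nodes / 2) + 1, assuming the cluster has three master-eligible nodes. A one-function Python sketch of that rule of thumb (the function name is hypothetical):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum rule of thumb: a strict majority of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
```

Requiring a majority means two halves of a partitioned cluster cannot both elect a master at the same time.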

Slide 70

Slide 70 text

GET /_cat

Slide 71

Slide 71 text

GET /_cat

=^.^=
/_cat/repositories
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/recovery
/_cat/recovery/{index}
/_cat/allocation
/_cat/pending_tasks
/_cat/health
/_cat/shards
/_cat/shards/{index}
/_cat/plugins
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/templates
/_cat/master
/_cat/tasks
/_cat/segments
/_cat/segments/{index}
/_cat/indices
/_cat/indices/{index}
/_cat/aliases
/_cat/aliases/{alias}
/_cat/count
/_cat/count/{index}
/_cat/nodeattrs
/_cat/snapshots/{repository}
/_cat/nodes

Non-JSON output

Slide 72

Slide 72 text

GET /_cat/shards?v

index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3

5 shards & a single replica by default
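Because the `_cat` output is a plain whitespace-separated table, it is easy to post-process. A Python sketch that parses the rows from this slide and counts primaries, replicas and shards per node; `parse_cat_table` is a hypothetical helper and assumes no column is empty (rows for unassigned shards would need more careful parsing):

```python
from collections import Counter

CAT_SHARDS_OUTPUT = """\
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
"""


def parse_cat_table(text):
    """Turn a ?v style _cat table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_table(CAT_SHARDS_OUTPUT)
by_role = Counter(row["prirep"] for row in rows)  # p = primary, r = replica
per_node = Counter(row["node"] for row in rows)
print(dict(by_role), dict(per_node))
```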

Slide 73

Slide 73 text

Cluster health

GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init

cluster   status node.total shards pri unassign init
mycluster green  3          12     6   0        0

Slide 74

Slide 74 text

The ELK stack

Slide 75

Slide 75 text

No content

Slide 76

Slide 76 text

Logs Parse & ship Store Visualize

Slide 77

Slide 77 text

Beats
•Filebeat
•Topbeat
•Packetbeat
•Winlogbeat

Slide 78

Slide 78 text

Logs Parse Store Visualize Ship

Slide 79

Slide 79 text

No content

Slide 80

Slide 80 text

Integrating Elasticsearch

Slide 81

Slide 81 text

It’s REST, deal with it!

Slide 82

Slide 82 text

Or just use an API: PHP, Java, Perl, Python, Ruby, .NET

Slide 83

Slide 83 text

Try it yourself!
http://github.com/thijsferyn/elasticsearch_tutorial

Slide 84

Slide 84 text

No content

Slide 85

Slide 85 text

https://blog.feryn.eu
https://talks.feryn.eu
https://book.feryn.eu
https://youtube.com/thijsferyn
https://soundcloud.com/thijsferyn
https://twitter.com/thijsferyn
http://itunes.feryn.eu

Slide 86

Slide 86 text

No content