Slide 1

Slide 1 text

Elasticsearch in action By Thijs Feryn

Slide 2

Slide 2 text

Explain in 1 slide

Slide 3

Slide 3 text

•Full-text search engine
•NoSQL database
•Analytics engine
•Written in Java
•Lucene based (~Solr)
•Inverted indices
•Easy to scale (~Elastic)
•RESTful interface (HTTP/JSON)
•Schemaless
•Real-time
•ELK stack
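To make the "inverted indices" bullet concrete, here is a minimal sketch in Python. It is not how Lucene stores its index; the mini-corpus and the function name are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical mini-corpus, not taken from the talk's data set
docs = {
    1: "varnish cache on wordpress",
    2: "proxy protocol support in varnish",
    3: "wordpress is available",
}

index = build_inverted_index(docs)
print(index["varnish"])    # [1, 2]
print(index["wordpress"])  # [1, 3]
```

Looking up a term is a single dictionary access, which is why term lookups stay fast as the number of documents grows.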

Slide 4

Slide 4 text

Still with me?

Slide 5

Slide 5 text

Hi, I’m Thijs

Slide 6

Slide 6 text

I’m @ThijsFeryn on Twitter

Slide 7

Slide 7 text

I’m an Evangelist At

Slide 8

Slide 8 text

I’m a board member at

Slide 9

Slide 9 text

This is my 150th presentation

Slide 10

Slide 10 text

https://www.elastic.co/downloads/elasticsearch

Slide 11

Slide 11 text

http://localhost:9200
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.2.0",
    "build_hash" : "8ff36d139e16f8720f2947ef62c8167a888992fe",
    "build_timestamp" : "2016-01-27T13:32:39Z",
    "build_snapshot" : false,
    "lucene_version" : "5.4.1"
  },
  "tagline" : "You Know, for Search"
}

Slide 12

Slide 12 text

RDBMS      Elasticsearch
Database   Index
Table      Type
Row        Document

Slide 13

Slide 13 text

POST /blog
Confirmation:
{"acknowledged":true}

Slide 14

Slide 14 text

POST /blog/post/6160
{
  "language": "en-US",
  "title": "WordPress 4.4 is available! And these are the new features…",
  "date": "Tue, 15 Dec 2015 13:28:23 +0000",
  "author": "Romy",
  "category": [
    "News", "PHP", "Sector news", "Webdesign & development", "CMS",
    "content management system", "wordpress", "WordPress 4.4"
  ],
  "guid": "6160"
}

Slide 15

Slide 15 text

Confirmation:
{
  "_index": "blog",
  "_type": "post",
  "_id": "6160",
  "_version": 1,
  "created": true
}

Slide 16

Slide 16 text

Retrieve document by id:
GET /blog/post/6160
Document & meta data:
{
  "_index": "blog",
  "_type": "post",
  "_id": "6160",
  "_version": 1,
  "found": true,
  "_source": {
    "language": "en-US",
    "title": "WordPress 4.4 is available! And these are the new features…",
    "date": "Tue, 15 Dec 2015 13:28:23 +0000",
    "author": "Romy",
    "category": [
      "News", "PHP", "Sector news", "Webdesign & development", "CMS",
      "content management system", "wordpress", "WordPress 4.4"
    ],
    "guid": "6160"
  }
}
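The index and retrieve calls are plain HTTP, so any HTTP client will do. A minimal sketch using only Python's standard library; it builds the requests without sending them, so it runs without a node. The helper names are made up, and the base URL assumes the local node shown on slide 11.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node as on slide 11

def index_request(index, doc_type, doc_id, document):
    """Build (not send) the POST request that indexes a document under a given id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

def get_request(index, doc_type, doc_id):
    """Build (not send) the GET request that retrieves a document by id."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(url, method="GET")

post = {"language": "en-US", "author": "Romy", "guid": "6160"}
req = index_request("blog", "post", 6160, post)
print(req.get_method(), req.full_url)
```

With a running node, `urllib.request.urlopen(req)` would send the request and return the JSON confirmation.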

Slide 17

Slide 17 text

GET /blog/_mapping
{
  "blog": {
    "mappings": {
      "post": {
        "properties": {
          "author": { "type": "string" },
          "category": { "type": "string" },
          "date": { "type": "string" },
          "guid": { "type": "string" },
          "language": { "type": "string" },
          "title": { "type": "string" }
        }
      }
    }
  }
}
Schemaless? Not really …
“Guesses” mapping on insert

Slide 18

Slide 18 text

Explicit mapping

Slide 19

Slide 19 text

Explicit mapping at index creation time
POST /blog
{
  "mappings" : {
    "post" : {
      "properties": {
        "title" : { "type" : "string" },
        "date" : { "type" : "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
        "author": { "type": "string" },
        "category": { "type": "string" },
        "guid": { "type": "integer" }
      }
    }
  }
}
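Why map "date" as a date instead of leaving it a string? A small Python sketch of the difference, using the date from the sample post. It only illustrates the idea; Elasticsearch does its own parsing based on the "format" in the mapping.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

raw = "Tue, 15 Dec 2015 13:28:23 +0000"  # date string from the sample post

# Parsed as a real date, the value can be compared and ranged over
parsed = parsedate_to_datetime(raw)
cutoff = datetime(2016, 1, 1, tzinfo=timezone.utc)
print(parsed >= cutoff)      # False: December 2015 is before 2016

# Compared as plain strings, the ordering is alphabetical and meaningless
print(raw >= "2016-01-01")   # True: "T" sorts after "2"
```

A range filter on the date only gives sensible results once the field is stored as a date.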

Slide 20

Slide 20 text

Alternative mapping
POST /blog
{
  "mappings": {
    "post": {
      "properties": {
        "author": { "type": "string", "index": "not_analyzed" },
        "category": { "type": "string", "index": "not_analyzed" },
        "date": { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
        "guid": { "type": "integer" },
        "language": { "type": "string", "index": "not_analyzed" },
        "title": {
          "type": "string",
          "fields": {
            "en": { "type": "string", "analyzer": "english" },
            "nl": { "type": "string", "analyzer": "dutch" },
            "raw": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}

Slide 21

Slide 21 text

"type": "integer" }, "language": { "type": "string", "index": "not_analyzed" }, "title": { "type": "string", "fields": { "en": { "type": "string", "analyzer": "english" }, "nl": { "type": "string", "analyzer": "dutch" }, "raw": { "type": "string", "index": "not_analyzed" } } } } } } } What’s with the analyzers?

Slide 22

Slide 22 text

Analyzed vs non-analyzed

Slide 23

Slide 23 text

Full-text vs exact value

Slide 24

Slide 24 text

By default strings are analyzed … unless you mention it in the mapping

Slide 25

Slide 25 text

Analyzer
•Character filters: replace characters in the text to be analyzed
•Tokenizers: break text down into terms
•Token filters: add, modify or delete tokens
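The three stages chain together in that order. A minimal Python sketch of the pipeline; the filter functions and the stop word list are invented for illustration and are not Elasticsearch's implementations.

```python
import re

def strip_html(text):
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def word_tokenizer(text):
    """Tokenizer: break the text into word-like terms."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: modify tokens."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens, stop_words=frozenset({"a", "an", "and", "are", "is", "the"})):
    """Token filter: delete tokens (the stop word list is a made-up sample)."""
    return [t for t in tokens if t not in stop_words]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run character filters, then the tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

terms = analyze(
    "<p>The Quick brown foxes are fast</p>",
    [strip_html], word_tokenizer, [lowercase, remove_stop_words],
)
print(terms)  # ['quick', 'brown', 'foxes', 'fast']
```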

Slide 26

Slide 26 text

Built-in analyzers
•Standard
•Simple
•Whitespace
•Stop
•Keyword
•Pattern
•Language
•Snowball
•Custom
Standard tokenizer
Lowercase token filter
English stop word token filter

Slide 27

Slide 27 text

Input: Hey man, how are you doing?
Standard: hey man how are you doing
Whitespace: Hey man, how are you doing?
English: hei man how you do
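The Standard and Whitespace results for this sentence can be approximated in a few lines of Python. This is a rough imitation (the real standard tokenizer follows Unicode text segmentation rules), and the English analyzer is left out because it needs a stemmer.

```python
import re

SENTENCE = "Hey man, how are you doing?"

def whitespace_analyze(text):
    """Split on whitespace only: case and punctuation are kept."""
    return text.split()

def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]

print(standard_like_analyze(SENTENCE))  # ['hey', 'man', 'how', 'are', 'you', 'doing']
print(whitespace_analyze(SENTENCE))     # ['Hey', 'man,', 'how', 'are', 'you', 'doing?']
```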

Slide 28

Slide 28 text

POST /blog/post/_search
{
  "fields": ["title"],
  "query": {
    "match": { "title": "working" }
  }
}

Slide 29

Slide 29 text

"total": 1, "max_score": 1.7562683, "hits": [ { "_index": "blog", "_type": "post", "_id": "2742", "_score": 1.7562683, "fields": { "title": [ "Hosted SharePoint 2010: working efficiently as a team" ] } } ] } }

Slide 30

Slide 30 text

POST /blog/post/_search
{
  "fields": ["title"],
  "query": {
    "match": { "title.en": "working" }
  }
}

Slide 31

Slide 31 text

"failed": 0 }, "hits": { "total": 6, "max_score": 2.4509864, "hits": [ { "_index": "blog", "_type": "post", "_id": "828", "_score": 2.4509864, "fields": { "title": [ "Still a lot of work in store" ] } }, { "_index": "blog", "_type": "post", "_id": "3873", "_score": 2.144613, "fields": { "title": [ "SSL: what is it and how does it work?" ] } }, { "_index": "blog",

Slide 32

Slide 32 text

Search

Slide 33

Slide 33 text

GET /blog/post/_search?pretty
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 963,
    "max_score": 1,
    "hits": [
      {
        "_index": "blog",
        "_type": "post",
        "_id": "6067",
        "_score": 1,
        "_source": {
          "language": "en-US",
          "title": "My Combell Power Tips: Registrant Templates and new domain name overview",
          "date": "Tue, 24 Nov 2015 15:58:48 +0000",
          "author": "Romy",
          "category": [
            "Combell news", "Domain names", "News", "Tools", "control panel",
            "domain name", "my combell", "register", "templates"
          ],
          "guid": "6067"

Slide 34

Slide 34 text

Search “lite” vs full query DSL
GET /blog/post/_search?pretty
POST /blog/post/_search?pretty
{
  "query": { "match_all": {} }
}

Slide 35

Slide 35 text

Search “lite” vs full query DSL
GET /blog/post/_search?pretty&q=title:Thijs
POST /blog/post/_search?pretty
{
  "query": {
    "match": { "title": "Thijs" }
  }
}

Slide 36

Slide 36 text

POST /blog/post/_count
{
  "query": {
    "match": { "title": "PROXY protocol support in Varnish" }
  }
}
162 posts

POST /blog/post/_count
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "title.raw": "PROXY protocol support in Varnish" }
      }
    }
  }
}
1 post
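Why 162 posts versus 1 post? The match query analyzes the search text and, by default, accepts a document that contains any of the resulting terms, while the term filter on the not_analyzed "title.raw" field only accepts the exact value. A Python sketch of that difference on a made-up list of titles (the analysis step is a crude approximation):

```python
import re

def analyze(text):
    """Crude analysis: word tokens, lowercased."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_count(titles, query):
    """Count titles sharing at least one analyzed term with the query (OR semantics)."""
    wanted = set(analyze(query))
    return sum(1 for title in titles if wanted & set(analyze(title)))

def term_count(titles, value):
    """Count titles whose raw, unanalyzed value equals the given value exactly."""
    return sum(1 for title in titles if title == value)

# Hypothetical titles, not the talk's data set
titles = [
    "PROXY protocol support in Varnish",
    "Varnish Cache on WordPress",
    "SSL support in your browser",
    "Still a lot of work in store",
]

query = "PROXY protocol support in Varnish"
print(match_count(titles, query))  # 4: every title shares at least one term, e.g. "in"
print(term_count(titles, query))   # 1: only the exact title
```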

Slide 37

Slide 37 text

Filter vs Query

Slide 38

Slide 38 text

Filter
•Does it match? Yes or no
•When relevance doesn’t matter
•Faster & cacheable
•For non-analyzed data
Query
•How well does it match?
•For full-text search
•On analyzed/tokenized data

Slide 39

Slide 39 text

Match Query Multi Match Query Bool Query Boosting Query Common Terms Query Constant Score Query Dis Max Query Filtered Query Fuzzy Like This Query Fuzzy Like This Field Query Function Score Query Fuzzy Query GeoShape Query Has Child Query Has Parent Query Ids Query Indices Query Match All Query More Like This Query Nested Query Prefix Query Query String Query Simple Query String Query Range Query Regexp Query Span First Query Span Multi Term Query Span Near Query Span Not Query Span Or Query Span Term Query Term Query Terms Query Top Children Query Wildcard Query Minimum Should Match Multi Term Query Rewrite Template Query

Slide 40

Slide 40 text

And Filter Bool Filter Exists Filter Geo Bounding Box Filter Geo Distance Filter Geo Distance Range Filter Geo Polygon Filter GeoShape Filter Geohash Cell Filter Has Child Filter Has Parent Filter Ids Filter Indices Filter Limit Filter Match All Filter Missing Filter Nested Filter Not Filter Or Filter Prefix Filter Query Filter Range Filter Regexp Filter Script Filter Term Filter Terms Filter Type Filter

Slide 41

Slide 41 text

Filter examples

Slide 42

Slide 42 text

POST /blog/post/_search?pretty
{
  "query": {
    "filtered": {
      "filter": {
        "ids": { "values": [231,234,258] }
      }
    }
  }
}

Slide 43

Slide 43 text

POST /blog/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must" : [
            { "term" : { "language" : "en-US" } },
            { "range" : { "date" : { "gte" : "2016-01-01", "format" : "yyyy-MM-dd" } } }
          ],
          "must_not" : [
            { "term" : { "category" : "joomla" } }
          ],
          "should" : [
            { "term" : { "category" : "Hosting" } },
            { "term" : { "category" : "evangelist" } }
          ]
        }
      }
    }
  }
}

Slide 44

Slide 44 text

POST /blog/_search?pretty
{
  "query": {
    "filtered": {
      "filter": {
        "prefix": { "title.raw": "Combell" }
      }
    }
  }
}

Slide 45

Slide 45 text

Requires “geo point” typed field
POST /cities/city/_search
{
  "size": 200,
  "sort": [ { "city": { "order": "asc" } } ],
  "query": {
    "filtered": {
      "filter": {
        "geo_distance_range": {
          "lt": "5km",
          "location": { "lat": 51.033333, "lon": 2.866667 }
        }
      }
    }
  }
}
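What "closer than 5km to a point" boils down to can be sketched with the haversine formula in Python. This only illustrates the idea; Elasticsearch may use a different distance calculation internally. The origin is the point from the request; the other two coordinates are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within(origin, point, max_km):
    """True when the point lies closer than max_km to the origin."""
    return haversine_km(*origin, *point) < max_km

origin = (51.033333, 2.866667)  # lat/lon from the request above
nearby = (51.05, 2.9)           # hypothetical point, roughly 3 km away
far_away = (51.2, 3.2)          # hypothetical point, well over 5 km away

print(within(origin, nearby, 5.0))
print(within(origin, far_away, 5.0))
```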

Slide 46

Slide 46 text

Requires “geo point” typed field
Draw a “box”
POST /cities/city/_search
{
  "size": 200,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "geo_bounding_box": {
          "location": {
            "bottom_left": { "lat": 51.1, "lon": 2.6 },
            "top_right": { "lat": 51.2, "lon": 2.7 }
          }
        }
      }
    }
  }
}
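The "box" test itself is just two range checks. A minimal Python sketch using the corners from the request; it ignores boxes that cross the date line, and the test points are made up.

```python
def in_bounding_box(point, bottom_left, top_right):
    """True when a (lat, lon) point lies inside the box spanned by the two corners."""
    lat, lon = point
    return (bottom_left[0] <= lat <= top_right[0]
            and bottom_left[1] <= lon <= top_right[1])

bottom_left = (51.1, 2.6)  # corners from the request above
top_right = (51.2, 2.7)

print(in_bounding_box((51.15, 2.65), bottom_left, top_right))          # True
print(in_bounding_box((51.033333, 2.866667), bottom_left, top_right))  # False
```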

Slide 47

Slide 47 text

Relevance

Slide 48

Slide 48 text

POST /blog/_search
{
  "fields": ["title"],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "varnish thijs" } },
        { "filtered": { "filter": { "term": { "language": "en-US" } } } }
      ]
    }
  }
}

Slide 49

Slide 49 text

The first hit matches both terms, so it is more relevant
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 8,
    "max_score": 1.984594,
    "hits": [
      {
        "_index": "blog",
        "_type": "post",
        "_id": "4275",
        "_score": 1.984594,
        "fields": {
          "title": [ "Thijs Feryn gave a demo of Varnish Cache on WordPress during a Future Insights webinar" ]
        }
      },
      {
        "_index": "blog",
        "_type": "post",
        "_id": "6238",
        "_score": 0.8335616,
        "fields": {
          "title": [ "PROXY protocol support in Varnish" ]
        }
      },
      {
        "_index": "blog",

Slide 50

Slide 50 text

Using a filter instead of a query
We don’t care about the source
POST /blog/_search?_source=false
{
  "query": {
    "filtered": {
      "filter": {
        "term": { "category": "PHPBenelux" }
      }
    }
  }
}

Slide 51

Slide 51 text

No relevance on filters
Score is always 1
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "blog", "_type": "post", "_id": "6254", "_score": 1 },
      { "_index": "blog", "_type": "post", "_id": "11749", "_score": 1 }
    ]
  }
}

Slide 52

Slide 52 text

Only search for “thijs feryn”
Increase relevance if category contains “Varnish”
POST /blog/_search
{
  "fields": ["title", "category"],
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "thijs feryn" } }
      ],
      "should": [
        { "match": { "category": "Varnish" } }
      ]
    }
  }
}

Slide 53

Slide 53 text

Increase relevance
Combining filters & queries
POST /blog/_search
{
  "fields": ["title", "category"],
  "query": {
    "bool": {
      "must_not": [
        { "filtered": { "filter": { "term": { "author": "Romy" } } } }
      ],
      "should": [
        { "match": { "category": "Magento" } }
      ]
    }
  }
}

Slide 54

Slide 54 text

Increase relevance
Query-time boosting
POST /blog/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "Magento", "boost" : 3 } } },
        { "match": { "title": { "query": "Wordpress", "boost" : 2 } } }
      ]
    }
  }
}

Slide 55

Slide 55 text

Multi index multi type

Slide 56

Slide 56 text

/_search
/products/_search
/products/product/_search
/products,clients/_search
/pro*/_search
/pro*,cli*/_search
/products/product,invoice/_search
/products/pro*/_search
/_all/product/_search
/_all/product,invoice/_search
/_all/pro*/_search
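All of these paths follow one pattern: comma-separated indices, then comma-separated types, then "_search". A small Python helper (a made-up function, not part of any client library) that builds them:

```python
def search_path(indices=None, types=None):
    """Build a search path from optional lists of index and type names or patterns."""
    if types and not indices:
        indices = ["_all"]  # types without an index are searched across all indices
    parts = []
    if indices:
        parts.append(",".join(indices))
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)

print(search_path())                                     # /_search
print(search_path(["products", "clients"]))              # /products,clients/_search
print(search_path(["products"], ["product", "invoice"])) # /products/product,invoice/_search
print(search_path(types=["pro*"]))                       # /_all/pro*/_search
```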

Slide 57

Slide 57 text

Multi “all the things”

Slide 58

Slide 58 text

Aggregations

Slide 59

Slide 59 text

Group by on steroids

Slide 60

Slide 60 text

Aggregations in SQL
SELECT author, COUNT(guid) FROM blog.post GROUP BY author
Metric: COUNT(guid)
Bucket: GROUP BY author

Slide 61

Slide 61 text

SELECT author, COUNT(guid) FROM blog.post GROUP BY author
POST /blog/post/_search?pretty&search_type=count
{
  "aggs": {
    "popular_bloggers": {
      "terms": { "field": "author" }
    }
  }
}
Only aggs, no docs
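The same "group by author and count" idea in a few lines of Python, as a rough sketch of what a terms aggregation computes. The documents are made up, and the real aggregation runs distributed across shards.

```python
from collections import Counter

def terms_aggregation(docs, field, size=10):
    """Bucket documents by the value of a field and count the documents per bucket."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]

# Hypothetical documents, not the talk's data set
posts = [
    {"author": "Romy"}, {"author": "Tom"}, {"author": "Romy"},
    {"author": "Christophe"}, {"author": "Romy"}, {"author": "Tom"},
]

buckets = terms_aggregation(posts, "author")
print(buckets)
# [{'key': 'Romy', 'doc_count': 3}, {'key': 'Tom', 'doc_count': 2}, {'key': 'Christophe', 'doc_count': 1}]
```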

Slide 62

Slide 62 text

"aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 415 }, { "key": "Combell", "doc_count": 184 }, { "key": "Tom", "doc_count": 184 }, { "key": "Jimmy Cappaert", "doc_count": 157 }, { "key": "Christophe", "doc_count": 23 } ] } } Aggregation output

Slide 63

Slide 63 text

Nested multi-group by alongside query
POST /blog/_search
{
  "query": {
    "match": { "title": "varnish" }
  },
  "aggs": {
    "popular_bloggers": {
      "terms": { "field": "author", "size": 10 },
      "aggs": {
        "used_languages": {
          "terms": { "field": "language", "size": 10 }
        }
      }
    }
  }
}
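A nested aggregation groups first by one field and then, inside each bucket, by another. A Python sketch of that two-level grouping; the function name and the sample documents are made up for illustration.

```python
from collections import Counter, defaultdict

def nested_terms(docs, outer_field, inner_field, inner_name):
    """Group by outer_field, then count inner_field values inside each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[outer_field]].append(doc)
    buckets = []
    # Largest outer bucket first
    for key, members in sorted(groups.items(), key=lambda item: -len(item[1])):
        inner = Counter(member[inner_field] for member in members)
        buckets.append({
            "key": key,
            "doc_count": len(members),
            inner_name: [{"key": k, "doc_count": c} for k, c in inner.most_common()],
        })
    return buckets

# Hypothetical documents that already matched the query part
posts = (
    [{"author": "Romy", "language": "en-US"}] * 3
    + [{"author": "Romy", "language": "nl-NL"}]
    + [{"author": "Combell", "language": "nl-NL"}] * 3
)

buckets = nested_terms(posts, "author", "language", "used_languages")
print(buckets[0])
```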

Slide 64

Slide 64 text

"aggregations": { "popular_bloggers": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Romy", "doc_count": 4, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "en-US", "doc_count": 3 }, { "key": "nl-NL", "doc_count": 1 } ] } }, { "key": "Combell", "doc_count": 3, "used_languages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "nl-NL", "doc_count": 3 } ] } }, Aggregation output

Slide 65

Slide 65 text

Min Aggregation Max Aggregation Sum Aggregation Avg Aggregation Stats Aggregation Extended Stats Aggregation Value Count Aggregation Percentiles Aggregation Percentile Ranks Aggregation Cardinality Aggregation Geo Bounds Aggregation Top hits Aggregation Scripted Metric Aggregation Global Aggregation Filter Aggregation Filters Aggregation Missing Aggregation Nested Aggregation Reverse nested Aggregation Children Aggregation Terms Aggregation Significant Terms Aggregation Range Aggregation Date Range Aggregation IPv4 Range Aggregation Histogram Aggregation Date Histogram Aggregation Geo Distance Aggregation GeoHash grid Aggregation

Slide 66

Slide 66 text

Managing Elasticsearch

Slide 67

Slide 67 text

Plenty of ways … for which we don’t have enough time

Slide 68

Slide 68 text

Clustering

Slide 69

Slide 69 text

Single node 2 node cluster 3 node cluster

Slide 70

Slide 70 text

Example config settings
node.rack: my-location
node.master: true
node.data: true
http.enabled: true
cluster.name: my-cluster
node.name: my-node
index.number_of_shards: 5
index.number_of_replicas: 1
discovery.zen.minimum_master_nodes: 2
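The value 2 for discovery.zen.minimum_master_nodes fits a 3 node cluster if it follows the commonly recommended quorum rule of (master-eligible nodes / 2) + 1. That the slide used this rule is an assumption; the arithmetic as a Python sketch:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum: more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

for nodes in (1, 2, 3, 5):
    print(nodes, "node(s) ->", minimum_master_nodes(nodes))
```

With 3 master-eligible nodes the quorum is 2, so a single isolated node cannot elect itself master.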

Slide 71

Slide 71 text

GET /_cat

Slide 72

Slide 72 text

GET /_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
Non-JSON output

Slide 73

Slide 73 text

GET /_cat/shards?v
index    shard prirep state   docs store ip             node
my-index 2     r      STARTED 6    7.2kb 192.168.10.142 node3
my-index 2     p      STARTED 6    9.5kb 192.168.10.142 node2
my-index 0     p      STARTED 4    7.1kb 192.168.10.142 node3
my-index 0     r      STARTED 4    4.8kb 192.168.10.142 node2
my-index 3     r      STARTED 5    7.1kb 192.168.10.142 node1
my-index 3     p      STARTED 5    7.2kb 192.168.10.142 node3
my-index 1     p      STARTED 1    2.4kb 192.168.10.142 node1
my-index 1     r      STARTED 1    2.4kb 192.168.10.142 node2
my-index 4     p      STARTED 5    9.5kb 192.168.10.142 node1
my-index 4     r      STARTED 5    9.4kb 192.168.10.142 node3
5 shards & a single replica by default

Slide 74

Slide 74 text

Cluster health
GET /_cat/health?v&h=cluster,status,node.total,shards,pri,unassign,init
cluster   status node.total shards pri unassign init
mycluster green  3          12     6   0        0

Slide 75

Slide 75 text

The ELK stack

Slide 76

Slide 76 text

No content

Slide 77

Slide 77 text

Logs Parse & ship Store Visualize

Slide 78

Slide 78 text

Beats
•Filebeat
•Topbeat
•Packetbeat
•Winlogbeat

Slide 79

Slide 79 text

Logs Ship Parse Store Visualize

Slide 80

Slide 80 text

No content

Slide 81

Slide 81 text

Integrating Elasticsearch

Slide 82

Slide 82 text

It’s REST, deal with it!

Slide 83

Slide 83 text

Or just use an API
PHP
Java
Perl
Python
Ruby
.NET

Slide 84

Slide 84 text

Try it yourself!
http://github.com/thijsferyn/elasticsearch_tutorial

Slide 85

Slide 85 text

No content

Slide 86

Slide 86 text

https://blog.feryn.eu
https://talks.feryn.eu
https://youtube.com/thijsferyn
https://soundcloud.com/thijsferyn
https://twitter.com/thijsferyn
http://itunes.feryn.eu

Slide 87

Slide 87 text

No content