Slide 1

Slide 1 text

Elasticsearch: Search Engine on your server
Aravind Putrevu, Developer | Evangelist
@aravindputrevu | aravindputrevu.in | elastic.co/community

Slide 2

Slide 2 text

Agenda
1. Terms
2. Talking to Elasticsearch
3. Mappings
4. Analyzers and Aggregations
5. Capacity Planning

Slide 7

Slide 7 text

Elastic Stack
No enterprise edition
All new versions with 6.2
X-Pack: Security, Alerting, Monitoring, Reporting, Machine Learning, Graph

Slide 8

Slide 8 text

Why is it popular?
● Speed
● Scale
● Relevance

Slide 9

Slide 9 text

Terms: Node, Cluster, Index, Type, Document, Shard, Replica
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html
● Cluster: A cluster is a collection of one or more nodes (servers).
● Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
● Index: An index is a collection of documents that have somewhat similar characteristics.
● Type (deprecated in 6.0.0): A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index.
● Document: A document is a basic unit of information that can be indexed. It is expressed in JSON (JavaScript Object Notation), a ubiquitous internet data interchange format.
● Shard: Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
● Replica: Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Slide 10

Slide 10 text

Elasticsearch Node Types
[Diagram: Elasticsearch and X-Pack node types: Master (3), Ingest (X), Machine Learning (2+), Data – Hot (X), Data – Warm (X), Coordinating (X)]
• Master Nodes
  – Control the cluster; a minimum of 3 is required, and one is active at any given time
• Data Nodes
  – Hold indexed data and perform data-related operations
  – Differentiated Hot and Warm data nodes can be used
• Coordinating Nodes
  – Route requests, handle the search reduce phase, distribute bulk indexing
  – All nodes function as coordinating nodes
• Ingest Nodes
  – Use ingest pipelines to transform and enrich documents before indexing
• Machine Learning Nodes
  – Run machine learning jobs
Nodes can play one or more roles, for workload isolation and scaling.
All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.

Slide 11

Slide 11 text

What powers Elasticsearch?
Apache Lucene
● A Java library
● Great for full-text search
But
● Challenging to use
● Not designed for scale
https://www.elastic.co/blog/found-elasticsearch-top-down

Slide 12

Slide 12 text

Talking to Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/client/index.html

Slide 13

Slide 13 text

Indexing a document
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

Slide 14

Slide 14 text

Inserting data
_bulk
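As a rough illustration of what a _bulk request body looks like, the sketch below builds the newline-delimited JSON payload: one action line followed by one source line per document, with a trailing newline. The index name, document IDs, and fields are hypothetical, and on versions that still use mapping types the type would also need to be supplied in the URL or the action line.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk endpoint.

    docs is a list of (doc_id, source_dict) pairs (hypothetical shape).
    """
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch what to do with the next line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "talks",
    [("1", {"title": "Elasticsearch"}), ("2", {"title": "Kibana"})],
)
print(body)
```

The resulting string would be sent as the request body with the `application/x-ndjson` content type.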

Slide 15

Slide 15 text

Where will my data go?
The default value used for _routing is the document’s _id.
0 <= shard <= number_of_primary_shards - 1
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html
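The routing idea can be sketched as a hash of the _routing value taken modulo the number of primary shards. This is only an illustration: Elasticsearch uses a Murmur3 hash internally, and `zlib.crc32` is swapped in here as a stand-in so the example runs with the standard library alone. The function name `pick_shard` is hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a shard number.

    Stand-in hash: crc32, NOT the Murmur3 hash Elasticsearch actually uses.
    """
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


# Every document ID lands on a shard between 0 and number_of_primary_shards - 1.
shards = [pick_shard(f"doc-{i}", 5) for i in range(100)]
print(min(shards), max(shards))
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards, which is why it is fixed when the index is created.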

Slide 16

Slide 16 text

Mappings
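A minimal sketch of an explicit mapping, written as the request body for index creation. The field names are hypothetical. The body uses the typeless form accepted by 7.x and later; on 6.x the `properties` object would sit under a type name.

```python
import json

# Hypothetical fields: full-text title, exact-value status, and a geo point.
mapping_request = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "status": {"type": "keyword"},    # stored as a single exact term
            "published": {"type": "date"},
            "location": {"type": "geo_point"},
        }
    }
}

print(json.dumps(mapping_request, indent=2))
```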

Slide 17

Slide 17 text

Full Text Analysis
Inverted Index
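An inverted index maps each term to the documents that contain it. The toy version below only lowercases and splits on whitespace; it is a sketch of the data structure, not of how Lucene stores it, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Return a dict mapping each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "The quick brown fox",
    2: "Quick search engines",
    3: "Brown bears search for food",
}
inverted = build_inverted_index(docs)

# Looking up a term is a dictionary access rather than a scan of every document.
print(sorted(inverted["quick"]))
print(sorted(inverted["search"]))
```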

Slide 18

Slide 18 text

Analyzer
Helps convert text into tokens for better search capability
1. Character filters
2. Tokenizer
3. Token filters
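The three analyzer stages can be imitated in a few lines. This is a simplified sketch, not Elasticsearch's implementation: the character filter strips HTML tags, the tokenizer splits on word characters, and the token filters lowercase and drop a small, hypothetical stopword list.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "is"}  # hypothetical stopword list


def char_filter(text):
    """Stage 1: character filter, here replacing HTML tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Stage 2: tokenizer, here splitting into runs of word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Stage 3: token filters, here lowercasing then removing stopwords."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<p>The QUICK brown fox</p>"))
```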

Slide 19

Slide 19 text

Aggregations
● Metrics
● Bucket
● Pipeline
● and so on...
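One request body can combine the three aggregation families listed above. In this sketch the field names `status` and `price` are hypothetical: `terms` is a bucket aggregation, `avg` is a metrics aggregation nested inside each bucket, and `max_bucket` is a pipeline aggregation that reads the output of the other two.

```python
import json

aggs_request = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "by_status": {
            "terms": {"field": "status"},  # bucket: one bucket per status value
            "aggs": {
                "avg_price": {"avg": {"field": "price"}}  # metric per bucket
            },
        },
        # pipeline: find the bucket with the highest average price
        "best_status": {"max_bucket": {"buckets_path": "by_status>avg_price"}},
    },
}

print(json.dumps(aggs_request, indent=2))
```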

Slide 20

Slide 20 text

Querying Data
● Full Text Queries
● Term Level Queries
● Compound Queries
● Geo Queries

Slide 21

Slide 21 text

Query DSL
Match Query

Slide 22

Slide 22 text

Query DSL
Term Queries
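To contrast a match query with a term query, the sketch below writes both as request bodies for the search API. The field names and values are hypothetical. A match query analyzes its input before searching, while a term query looks for the exact term as given, so it is usually aimed at keyword fields.

```python
import json

# Full-text query: the input text is analyzed into tokens before matching.
match_request = {"query": {"match": {"title": "search engine"}}}

# Term-level query: the value is compared as-is, without analysis.
term_request = {"query": {"term": {"status": {"value": "published"}}}}

for request in (match_request, term_request):
    print(json.dumps(request))
```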

Slide 23

Slide 23 text

Query DSL
Nested queries
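A nested query searches objects stored in a field mapped as `nested`, treating each object as its own hidden document. The sketch below assumes a hypothetical `comments` field of that type with a `text` sub-field.

```python
import json

nested_request = {
    "query": {
        "nested": {
            "path": "comments",  # hypothetical field mapped with type "nested"
            "query": {
                # Sub-fields are addressed by their full path.
                "match": {"comments.text": "great talk"}
            },
        }
    }
}

print(json.dumps(nested_request, indent=2))
```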

Slide 24

Slide 24 text

Query DSL
Geo queries
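As one example of a geo query, a `geo_distance` filter keeps documents whose geo point lies within a given distance of a location. The `location` field name and the coordinates below are hypothetical, and the field is assumed to be mapped as `geo_point`.

```python
import json

geo_request = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    # hypothetical geo_point field and coordinates
                    "location": {"lat": 12.97, "lon": 77.59},
                }
            }
        }
    }
}

print(json.dumps(geo_request, indent=2))
```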

Slide 25

Slide 25 text

[Architecture diagram: Elastic Stack deployment]
Data sources: Log Files, Metrics, Wire Data, Datastore, Web APIs, Social, Sensors
Ingestion: Beats, your{beat}, Messaging Queue (Kafka, Redis), Logstash Nodes (X)
Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X), X-Pack
Kibana: Instances (X), X-Pack
Integrations: ES-Hadoop, Hadoop Ecosystem, Custom UI, Authentication (LDAP, AD, SSO), Notification

Slide 26

Slide 26 text

Capacity Planning
It depends...
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Slide 27

Slide 27 text

Capacity Planning
What is your use case?
● Full text search
● Logging/Metrics
● Complex aggregations with a lot of users
Each use case needs a different cluster configuration.
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Slide 28

Slide 28 text

Capacity Planning
Let us take logging...
● Inflow of data
  ○ Per day: 10GB
  ○ Per month: 300GB
  ○ Per year: 3600GB
● Data retention
  ○ 15 days
● High availability (replication factor)
  ○ 1, i.e., 7200GB per year
● Type of queries
Master Node: X
Data Node: X
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
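The arithmetic on this slide can be written out as a small calculation. It is a sketch of raw data volume only: it assumes 30-day months (a 360-day year, which is how 10GB per day becomes 3600GB), and it ignores indexing overhead and compression. The function name is hypothetical.

```python
def storage_gb(daily_gb, days, replicas):
    """Raw volume held for `days` of data, counting the primary plus each replica."""
    return daily_gb * days * (1 + replicas)


DAILY_GB = 10
REPLICAS = 1

# One replica doubles the yearly volume: 3600GB becomes 7200GB.
yearly_gb = storage_gb(DAILY_GB, 360, REPLICAS)

# With 15-day retention, only the last 15 days are on disk at any time.
retained_gb = storage_gb(DAILY_GB, 15, REPLICAS)

print(yearly_gb, retained_gb)
```

Under these assumptions the retention period, not the yearly inflow, is what sets the disk space the cluster has to provide.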

Slide 29

Slide 29 text

Capacity Planning
Hardware Recommendations
● SSDs are the best
● Local disk is king!
● Prefer medium-size machines over large-size machines
● Give only 50% of your RAM to Elasticsearch
● Don’t cross 32GB of Java heap space
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
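The two memory rules above can be combined into one small helper. This is a sketch with a hypothetical function name; the default cap of 31GB is an assumption chosen to stay safely below the 32GB limit the slide mentions, not a value taken from the talk.

```python
def recommended_heap_gb(ram_gb, heap_cap_gb=31):
    """Half of the machine's RAM, but never more than the cap (assumed 31GB)."""
    return min(ram_gb / 2, heap_cap_gb)


for ram in (16, 32, 64, 128):
    print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
```

The remaining memory is left to the operating system, whose file system cache Lucene relies on.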

Slide 30

Slide 30 text

[Architecture diagram: Elastic Stack deployment with hot-warm data nodes]
Data sources: Log Files, Metrics, Wire Data, Datastore, Web APIs, Social, Sensors
Ingestion: Beats, your{beat}, Messaging Queue (Kafka, Redis), Logstash Nodes (X)
Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X), X-Pack
Kibana: Instances (X), X-Pack
Integrations: ES-Hadoop, Hadoop Ecosystem, Custom UI, Authentication (LDAP, AD, SSO), Notification
https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x

Slide 31

Slide 31 text

training.elastic.co

Slide 32

Slide 32 text

Resources
• https://www.elastic.co/learn
• https://www.elastic.co/blog/category/engineering
• https://discuss.elastic.co/
• https://fb.com/groups/ElasticIndiaUserGroup
• https://elastic.co/community

Slide 33

Slide 33 text

Fin!
discuss.elastic.co | [email protected] | @aravindputrevu