Slide 1

Slide 1 text

Elastic Revolution • Pablo Musa (@pablitomusa) • October 2016

Slide 2

Slide 2 text

Pablo Musa • Engineer PUC-Rio • Master PUC-Rio • Backend Developer • Software Architect • Infra Lover • 2 years Hadoop DevOps • 3 years Elastic Guru

Slide 3

Slide 3 text

Education Engineer • User enablement • Content creation • Travel the world (16 months) • 4 continents • 16 countries • 160+ classes • 2400+ enabled users • 4 new trainings

Slide 4

Slide 4 text

Which?? What?? When?? How?? • Products: Elasticsearch, Kibana, Logstash, Cloud, Beats, Prelert • Versions shown: 2.3.1, 4.5.2, 2.2.0, 1.2.3, ???, 1.2.3

Slide 5

Slide 5 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • First version of Elasticsearch released in February

Slide 6

Slide 6 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • Elasticsearch becomes a company • Total cumulative downloads: 2M

Slide 7

Slide 7 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • Kibana and Logstash open source projects join • Total cumulative downloads: 5M

Slide 8

Slide 8 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • Elasticsearch 1.0 GA • Total cumulative downloads: 18M

Slide 9

Slide 9 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • 1st Elastic{ON} user conference • We are now Elastic • Cloud acquired • Beats team joins • Total cumulative downloads: 45M

Slide 10

Slide 10 text

Timeline (2010, 2012, 2013, 2014, 2015, 2016) • 2nd Elastic{ON} user conference • ELK becomes the "Elastic Stack" • Prelert acquired • Total cumulative downloads: 75M

Slide 11

Slide 11 text

[Diagram: Kibana, Elasticsearch, Beats and Logstash; X-Pack (Security, Alerting, Monitoring, Reporting, Graph); Elastic Cloud]

Slide 12

Slide 12 text

5.0 is here. All new versions. All aligned.

Slide 13

Slide 13 text


Slide 14

Slide 14 text

"It doesn't make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do." Steve Jobs

Slide 15

Slide 15 text

TRUST "I don't want to monitor people or know where anyone is on a Tuesday at 2 PM."

Slide 16

Slide 16 text

We Are Everywhere

Slide 17

Slide 17 text

WE DON'T SELL SHIT

Slide 18

Slide 18 text

Usability

Slide 19

Slide 19 text

"Everything should be made as simple as possible, but not simpler." Albert Einstein

Slide 20

Slide 20 text

Community

Slide 21

Slide 21 text

Pioneer Program https://www.elastic.co/blog/elastic-pioneer-program

Slide 22

Slide 22 text

Different Real World Problems

Slide 23

Slide 23 text

We Love It All

Slide 24

Slide 24 text

"Without data you are just another person with an opinion." William Edwards Deming

Slide 25

Slide 25 text


Slide 26

Slide 26 text

"Gotta Catch 'Em All"

Slide 27

Slide 27 text

"Gotta Catch 'Em All" [Diagram: cluster my_cluster with Server 1 running Node A, holding index twitter (documents d1 to d12) and index logs (documents d1 to d6)]

Slide 28

Slide 28 text

Split Data [Diagram: cluster my_cluster with Server 1 running Node A; index logs (documents d1 to d6) and index twitter (documents d1 to d12), with a shard labeled Shard 0]

Slide 29

Slide 29 text

[Diagram: cluster my_cluster with Server 1 (Node A) and Server 2 (Node B); index twitter is split into shards 0 to 4 and index logs into shards 0 and 1, with the documents spread across the shards and the shards spread across the two nodes]

Slide 30

Slide 30 text

Distribute the Load [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]

Slide 31

Slide 31 text

2 Shards for Logs and Metrics: NOT OPTIMAL [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]

Slide 32

Slide 32 text

4 Shards for Logs and Metrics: BETTER. But what about shard size? [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]

Slide 33

Slide 33 text

Math Time! • ~1000 events per second • 60 sec * 60 min * 24 hours * 1000 events => ~86.4M events per day • 1 KB per event => ~82 GB per day • 4 shards => ~20.5 GB per shard • https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing • For my use case, each shard should handle 45 GB • 4 shards per day is NOT OPTIMAL
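The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate: the variable names are mine, and it assumes 1 GB = 1024 * 1024 KB, which is what reproduces the slide's ~82 GB. The exact figures come out slightly different from the rounded ones on the slide.

```python
# Back-of-the-envelope shard sizing with the numbers from the slide.
EVENTS_PER_SECOND = 1000   # from the slide
EVENT_SIZE_KB = 1          # from the slide
SHARDS_PER_DAY = 4         # one shard per node in the 4-node cluster
TARGET_SHARD_GB = 45       # the speaker's target for his own use case

SECONDS_PER_DAY = 60 * 60 * 24

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
# Assumption: 1 GB = 1024 * 1024 KB.
gb_per_day = events_per_day * EVENT_SIZE_KB / (1024 * 1024)
gb_per_shard = gb_per_day / SHARDS_PER_DAY
fill_ratio = gb_per_shard / TARGET_SHARD_GB

print(f"events per day: {events_per_day:,}")
print(f"GB per day:     {gb_per_day:.1f}")
print(f"GB per shard:   {gb_per_shard:.1f}")
print(f"shard is filled to {fill_ratio:.0%} of the 45 GB target")
```

Each daily shard ends up at less than half of the 45 GB target, which is why 4 shards per day is called not optimal for storage.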

Slide 34

Slide 34 text

Deadlock! Optimize Throughput or Optimize Data Storage: which one is better? [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]

Slide 35

Slide 35 text


Slide 36

Slide 36 text

First, Maximize Throughput: create a daily index with one shard per node. [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]

Slide 37

Slide 37 text

Then, Maximize Storage: shrink the index to the optimal number of shards. [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D]
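A minimal sketch of the shrink step, assuming the 4-shard daily index of roughly 82 GB and the 45 GB per-shard target from the sizing slide. The helper names, the node name node-a and the target index name are hypothetical. The two request bodies follow my reading of the Shrink docs: the source index is first made read-only with a copy of every shard on one node, and the target shard count must be a factor of the source count. Nothing is sent to a cluster; the code only builds and prints the requests.

```python
import json

def shrink_target_shards(source_shards, index_size_gb, max_shard_gb):
    """Smallest factor of source_shards that keeps each shard under max_shard_gb."""
    for candidate in range(1, source_shards + 1):
        is_factor = source_shards % candidate == 0
        if is_factor and index_size_gb / candidate <= max_shard_gb:
            return candidate
    return source_shards  # nothing smaller fits, keep the current count

def shrink_requests(source, target, shards, node_name):
    """Return the (method, path, body) calls that prepare and run a shrink."""
    prepare = {
        "settings": {
            # Hypothetical node name: gather a copy of every shard on one node.
            "index.routing.allocation.require._name": node_name,
            # Block writes so the source index stays unchanged while shrinking.
            "index.blocks.write": True,
        }
    }
    shrink = {"settings": {"index.number_of_shards": shards}}
    return [
        ("PUT", f"/{source}/_settings", prepare),
        ("POST", f"/{source}/_shrink/{target}", shrink),
    ]

shards = shrink_target_shards(source_shards=4, index_size_gb=82.4, max_shard_gb=45)
requests = shrink_requests("logs-2016-10-19", "logs-2016-10-19-shrunk", shards, "node-a")
for method, path, body in requests:
    print(method, path)
    print(json.dumps(body, indent=2))
```

With these numbers the helper picks 2 shards of about 41 GB each, since a single 82 GB shard would exceed the target.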

Slide 39

Slide 39 text

Goals and Mechanisms • Goals: achieve high ingest rates; don't waste resources • Mechanisms: Daily Indices, Templates, Alias, Rollover, Shrink
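Of the mechanisms listed, Rollover is the one the slides do not spell out, so here is a hedged sketch of what such a call could look like. It assumes the conditions max_age and max_docs described in the Rollover docs and a write alias named logs-write as used later in the deck; the helper names and the chosen thresholds are mine. As I read the docs, the alias has to point at a single index for the call to work. The code only builds the request, it does not talk to a cluster.

```python
import json

def docs_for_size(size_gb, event_kb=1):
    """Number of events that add up to size_gb, assuming 1 GB = 1024 * 1024 KB."""
    return int(size_gb * 1024 * 1024 / event_kb)

def rollover_request(write_alias, max_age, max_docs):
    """Return (method, path, body) for a rollover call on the write alias."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", f"/{write_alias}/_rollover", body

# Illustrative thresholds: roll over after one day, or once the index holds
# about one 45 GB shard's worth of 1 KB events, whichever comes first.
method, path, body = rollover_request("logs-write", "1d", docs_for_size(45))
print(method, path)
print(json.dumps(body, indent=2))
```

Rolling over on document count instead of the calendar is one way to end up with indices whose size is predictable before they are shrunk.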

Slide 40

Slide 40 text

Daily Indices [Diagram: cluster my_cluster with index logs-2016-10-19 (documents d1 to d6)]

Slide 41

Slide 41 text

Daily Indices [Diagram: cluster my_cluster with indices logs-2016-10-19 and logs-2016-10-20 (documents d1 to d6 each)]

Slide 42

Slide 42 text

Daily Indices [Diagram: cluster my_cluster with indices logs-2016-10-19, logs-2016-10-20 and logs-2016-10-21 (documents d1 to d6 each)]
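The naming scheme in these diagrams is easy to generate on the client side. A small sketch, assuming the logs-YYYY-MM-DD pattern shown on the slides; the function names are hypothetical.

```python
from datetime import date, timedelta

def daily_index(prefix, day):
    """Index name for one day, e.g. logs-2016-10-19."""
    return f"{prefix}-{day:%Y-%m-%d}"

def indices_for_range(prefix, first_day, last_day):
    """Index names for every day from first_day to last_day, inclusive."""
    days = (last_day - first_day).days
    return [daily_index(prefix, first_day + timedelta(days=n)) for n in range(days + 1)]

names = indices_for_range("logs", date(2016, 10, 19), date(2016, 10, 21))
print(names)
```

Because every day gets its own index, old data can be dropped or shrunk one index at a time.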

Slide 43

Slide 43 text

Templates: every new index whose name starts with 'logs-' will have 4 shards and '_all' disabled.

PUT _template/logs
{
  "template": "logs-*",
  "settings": { "number_of_shards": 4 },
  "mappings": {
    "_default_": { "_all": { "enabled": false } }
  }
}
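Hand-written JSON bodies are easy to get subtly wrong, so one option is to build the template body as a Python dict and let the json module serialize it. This is a sketch under that assumption; the function name is hypothetical, the field names are the ones on the slide, and the request is only printed, not sent.

```python
import json

def logs_template(pattern="logs-*", shards=4):
    """Template body: fixed shard count and '_all' disabled for matching indices."""
    return {
        "template": pattern,
        "settings": {"number_of_shards": shards},
        "mappings": {"_default_": {"_all": {"enabled": False}}},
    }

body = logs_template()
print("PUT _template/logs")
print(json.dumps(body, indent=2))
```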

Slide 44

Slide 44 text

Alias [Diagram: cluster my_cluster with index logs-2016-10-19; users and the Application reach it through the aliases logs-write and logs-read]

Slide 45

Slide 45 text

Alias [Diagram: cluster my_cluster with indices logs-2016-10-19 and logs-2016-10-20; users and the Application reach them through the aliases logs-write and logs-read]

Slide 46

Slide 46 text

Alias [Diagram: cluster my_cluster with indices logs-2016-10-19, logs-2016-10-20 and logs-2016-10-21; users and the Application reach them through the aliases logs-write and logs-read]

Slide 47

Slide 47 text

Templates: aliases can also be defined in templates.

PUT _template/logs
{
  "template": "logs-*",
  "settings": { "number_of_shards": 4 },
  "mappings": { ... },
  "aliases": { "logs-write": {}, "logs-read": {} }
}

* You still need to remove the "write" alias from the previous index.
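Removing the write alias from the previous index can be done in the same request that points it at the new one. A sketch of the body for the _aliases endpoint, assuming the alias names from the slides; the function name is hypothetical and the request is only printed. If the template already attaches both aliases to the new index, the two add actions are redundant but should be harmless.

```python
import json

def alias_swap_actions(previous_index, new_index,
                       write_alias="logs-write", read_alias="logs-read"):
    """Body for POST /_aliases: move the write alias, extend the read alias."""
    return {
        "actions": [
            {"remove": {"index": previous_index, "alias": write_alias}},
            {"add": {"index": new_index, "alias": write_alias}},
            {"add": {"index": new_index, "alias": read_alias}},
        ]
    }

swap = alias_swap_actions("logs-2016-10-19", "logs-2016-10-20")
print("POST /_aliases")
print(json.dumps(swap, indent=2))
```

The read alias keeps pointing at the older indices, so searches still cover all days while writes go only to the newest index.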

Slide 48

Slide 48 text

Let's Scale... [Diagram: cluster my_cluster with Servers 1 to 4 running Nodes A to D; the Application uses the aliases logs-write and logs-read, and a legend lists the indices behind each alias]

Slide 55

Slide 55 text

What is not here • JSON: the only way to use Elasticsearch; too verbose for presentations, and you can always go back to the docs • Replicas: high availability; diagrams would be even worse • Hot/Warm/Cold Architecture: lets you get the most out of your hardware; diagrams would be even worse

Slide 56

Slide 56 text

Where to go • Blog Post: https://www.elastic.co/blog/managing-time-based-indices-efficiently • Shrink Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-shrink-index.html • Rollover Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html

Slide 57

Slide 57 text

Elastic Revolution Principles • Smart people should be free to be smart • Trust • Distributed • We don't sell shit • Usability and Simplicity • Different Real World Problems • Community

Slide 58

Slide 58 text

We ♥ our community • https://www.elastic.co (Website) • https://www.elastic.co/learn (Learning Resources) • https://www.elastic.co/community/meetups (Meetups) • https://discuss.elastic.co (Discussion Forum) • [email protected] (Portuguese-language mailing list)

Slide 59

Slide 59 text

March 7-9, 2017 • Pier 48, San Francisco, CA • 2,500 attendees • Annual Elasticsearch User Conference • SUBMIT A TALK: Call for Speakers Open • SUBMIT A CAUSE: First Cause Awards • https://www.elastic.co/elasticon/conf/2017/sf/registration • Thank You! Questions?