Elastic Revolution

Pablo Musa
October 29, 2016

A bit about the Elastic story and why Elastic is transforming the software industry.

Transcript

  1. Pablo Musa • Engineer PUC-Rio • Master PUC-Rio • Backend Developer • Software Architect • Infra Lover • 2 years Hadoop DevOps • 3 years Elastic Guru
  2. Education Engineer • User enablement • Content creation • Travel the world (16 months) • 4 continents • 16 countries • 160+ classes • 2400+ enabled users • 4 new trainings
  3. [Timeline: 2010, 2012, 2013, 2014, 2015, 2016] First version of Elasticsearch released in February
  4. [Timeline: 2010, 2012, 2013, 2014, 2015, 2016] Elasticsearch becomes a company • Total cumulative downloads: 2M
  5. [Timeline: 2010, 2012, 2013, 2014, 2015, 2016] 1st Elastic{ON} user conference • "we are now Elastic" • Cloud acquired • Beats team joins • Total cumulative downloads: 45M
  6. [Timeline: 2010, 2012, 2013, 2014, 2015, 2016] 2nd Elastic{ON} user conference • ELK “Elastic Stack” • Prelert acquired • Total cumulative downloads: 75M
  7. [Slide with no transcribed text]

  8. "It doesn't make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do." Steve Jobs
  9. TRUST: "I don't want to monitor people or know where anyone is on a Tuesday at 2 PM."
  10. "Without data you are just another person with an opinion." William Edwards Deming
  11. [Slide with no transcribed text]

  12. "Gotta Catch 'Em All" [Diagram: cluster my_cluster, Server 1, Node A; index twitter holds documents d1 to d12 and index logs holds documents d1 to d6]
  13. Split Data [Diagram: cluster my_cluster, Server 1, Node A; index logs (documents d1 to d6) and index twitter (documents d1 to d12), with "Shard 0" labeled]
  14. [Diagram: cluster my_cluster with Server 1 (Node A) and Server 2 (Node B); index twitter (documents d1 to d12) is split into shards 0 to 4 and index logs (documents d1 to d6) into shards 0 and 1, spread across the two nodes]
  15. Distribute the Load [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
  16. 2 Shards for Logs and Metrics: NOT OPTIMAL [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
  17. 4 Shards for Logs and Metrics: BETTER. But what about shard size? [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
  18. Math Time! • ~1000 events per second • 60 sec * 60 min * 24 hours * 1000 events => ~86.4M events per day • 1 KB per event => ~82 GB per day • 4 shards => ~20.5 GB per shard • https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing • For my use case, each shard should handle 45 GB • 4 shards per day is NOT OPTIMAL
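
The arithmetic on this slide can be reproduced with a short Python sketch. It assumes "1kb per event" means 1 KiB (1024 bytes) and that sizes are reported in GiB, which is the reading that yields the slide's ~82 GB figure; the function name `daily_sizing` is made up for illustration. The shard count it derives from the 45 GB target is an inference from the slide's numbers, not something the slide states.

```python
import math

EVENTS_PER_SECOND = 1000   # from the slide
EVENT_SIZE_BYTES = 1024    # assumption: "1kb per event" read as 1 KiB
TARGET_SHARD_GIB = 45      # per-shard target quoted on the slide


def daily_sizing(events_per_second, event_size_bytes, target_shard_gib):
    """Return (events per day, GiB per day, shard count implied by the target)."""
    events_per_day = events_per_second * 60 * 60 * 24
    gib_per_day = events_per_day * event_size_bytes / 2**30
    # At least one shard; otherwise enough shards to stay at or under the target.
    shards = max(1, math.ceil(gib_per_day / target_shard_gib))
    return events_per_day, gib_per_day, shards


events, gib, shards = daily_sizing(
    EVENTS_PER_SECOND, EVENT_SIZE_BYTES, TARGET_SHARD_GIB
)
print(f"events per day: {events:,}")
print(f"data per day: {gib:.1f} GiB")
print(f"with 4 shards: {gib / 4:.1f} GiB per shard")
print(f"shards implied by a {TARGET_SHARD_GIB} GiB target: {shards}")
```

Under these assumptions a day of data fits in fewer shards than the four nodes used for ingest, which is the tension the following slides address.
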
  19. Deadlock! Optimize Throughput • Optimize Data Storage • Which one is better? [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
  20. [Slide with no transcribed text]

  21. First, Maximize Throughput: create a daily index with one shard per node. [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
  22-23. Then, Maximize Storage: shrink the index to the optimal number of shards. [Diagram: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D)]
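
The shrink step can be sketched as two requests, following the Shrink documentation linked later in the deck as best I understand it: first make the source index read-only and move a copy of every shard onto one node, then call `_shrink` with a target shard count that is a factor of the source count. The snippet below only builds the requests and sends nothing; the node name `node-a`, the target index name, and the helper `shrink_requests` are hypothetical.

```python
import json


def shrink_requests(source, target, source_shards, target_shards, node_name):
    """Build the (method, path, body) requests for shrinking an index."""
    # The target shard count must be a factor of the source shard count.
    if source_shards % target_shards != 0:
        raise ValueError("target shard count must be a factor of the source count")
    prepare = (
        "PUT",
        f"/{source}/_settings",
        {
            "settings": {
                # Relocate a copy of every shard to a single node.
                "index.routing.allocation.require._name": node_name,
                # Block writes so the index is read-only during the shrink.
                "index.blocks.write": True,
            }
        },
    )
    shrink = (
        "POST",
        f"/{source}/_shrink/{target}",
        {"settings": {"index.number_of_shards": target_shards}},
    )
    return [prepare, shrink]


# Hypothetical names: yesterday's 4-shard index shrunk to 2 shards on "node-a".
requests_to_send = shrink_requests(
    "logs-2016-10-19", "logs-2016-10-19-shrunk", 4, 2, "node-a"
)
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Validating the factor rule up front is a design choice of this sketch, so a bad shard count is caught before any request would be issued.
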
  24. Goals and Mechanisms • Goals: achieve high ingest rates; don't waste resources • Mechanisms: Daily Indices, Templates, Alias, Rollover, Shrink
  25. Daily Indices [Diagram: cluster my_cluster with indices logs-2016-10-19 and logs-2016-10-20, each holding documents d1 to d6]
  26. Daily Indices [Diagram: cluster my_cluster with indices logs-2016-10-19, logs-2016-10-20 and logs-2016-10-21, each holding documents d1 to d6]
  27. Templates: every new index whose name starts with 'logs-' will have 4 shards and '_all' disabled • PUT _template/logs { "template": "logs-*", "settings": { "number_of_shards": 4 }, "mappings": { "_default_": { "_all": { "enabled": false } } } }
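
As a sketch of the template on this slide, the body can be built as a Python dictionary and serialized, which guarantees well-formed JSON. It mirrors the slide's syntax (the `template` key and the `_default_` mapping), which may differ in other Elasticsearch versions; the helper name `logs_template` is hypothetical and nothing is sent to a cluster.

```python
import json


def logs_template(shards=4):
    """Build the index template body shown on the slide."""
    return {
        # Applies to every new index whose name matches this pattern.
        "template": "logs-*",
        "settings": {"number_of_shards": shards},
        # Disable the _all field for every mapping type, as on the slide.
        "mappings": {"_default_": {"_all": {"enabled": False}}},
    }


body = json.dumps(logs_template(), indent=2)
print("PUT _template/logs")
print(body)
```
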
  28. Alias [Diagram: cluster my_cluster with users, an application, aliases logs-write and logs-read, and index logs-2016-10-19 (documents d1 to d6)]
  29. Alias [Diagram: cluster my_cluster with users, an application, aliases logs-write and logs-read, and indices logs-2016-10-19 and logs-2016-10-20 (documents d1 to d6 each)]
  30. Alias [Diagram: cluster my_cluster with users, an application, aliases logs-write and logs-read, and indices logs-2016-10-19, logs-2016-10-20 and logs-2016-10-21 (documents d1 to d6 each)]
  31. Templates: aliases can also be defined in templates • PUT _template/logs { "template": "logs-*", "settings": { "number_of_shards": 4 }, "mappings": { ... }, "aliases": { "logs-write": {}, "logs-read": {} } } • Note: you still need to remove the "write" alias (logs-write) from the previous index
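
The slide's footnote, that the write alias must still be removed from the previous day's index, could be handled with a request to the `_aliases` endpoint. The sketch below only builds that request body, assuming the daily naming scheme `logs-YYYY-MM-DD` seen in the diagrams; the helper names `daily_index` and `retire_write_alias` are hypothetical.

```python
import json
from datetime import date, timedelta


def daily_index(day, prefix="logs-"):
    """Name of the daily index for a given date, e.g. logs-2016-10-20."""
    return f"{prefix}{day.isoformat()}"


def retire_write_alias(today, alias="logs-write"):
    """Body for POST _aliases that removes the write alias from yesterday's index."""
    previous = daily_index(today - timedelta(days=1))
    return {"actions": [{"remove": {"index": previous, "alias": alias}}]}


actions = retire_write_alias(date(2016, 10, 20))
print("POST _aliases")
print(json.dumps(actions, indent=2))
```

The read alias is left alone, so searches through logs-read keep covering every daily index while writes go only to the newest one.
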
  32-38. Let's Scale... [Diagram repeated across seven slides: cluster my_cluster with Server 1 (Node A), Server 2 (Node B), Server 3 (Node C), Server 4 (Node D); an application using aliases logs-write and logs-read; the labels "logs-write:" and "logs-read:"]
  39. What is not here • JSON: the only way to use Elasticsearch; too verbose for presentations, and you can always go back to the docs • Replicas: high availability; diagrams would be even worse • Hot/Warm/Cold Architecture: lets you get the most out of your hardware; diagrams would be even worse
  40. Where to go • Blog Post: https://www.elastic.co/blog/managing-time-based-indices-efficiently • Shrink Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-shrink-index.html • Rollover Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html
  41. Elastic Revolution Principles • Smart people should be free to be smart • Trust • Distributed • We don't sell shit • Usability and Simplicity • Different Real World Problems • Community
  42. We ❤ our community • https://www.elastic.co (Website) • https://www.elastic.co/learn (Learning Resources) • https://www.elastic.co/community/meetups (Meetups) • https://discuss.elastic.co (Discussion Forum) • [email protected] (Portuguese-language mailing list)
  43. March 7-9, 2017 • Pier 48, San Francisco, CA • 2,500 attendees • Annual Elasticsearch User Conference • SUBMIT A TALK: Call for Speakers Open • SUBMIT A CAUSE: First Cause Awards • https://www.elastic.co/elasticon/conf/2017/sf/registration • Thank You! Questions?