
Introduction to Elasticsearch & the ELK Stack

Presented by Leslie Hawthorn at the Rio de Janeiro Elasticsearch Meetup

Introduction to Elasticsearch Inc and the ELK stack, including a crash course in Elasticsearch's features and Logstash + Kibana for centralized logging and simple, beautiful data visualizations. Overview of major features for each product and a teaser for Leslie Hawthorn's address at the FISL 15 conference, fisl.softwarelivre.org

Elasticsearch Inc

May 05, 2014

Transcript

  1. Copyright Elasticsearch Inc 2014. All rights reserved.

    Elasticsearch: Making sense of your data through search and analytics.
  2. Agenda for Tonight

    •  What is Elasticsearch? – Leslie Hawthorn, Elasticsearch Inc
      –  Company and technology
    •  Introduction to the ELK Stack – Leslie
    •  FISL 15 Talk Preview: Beyond Contributing to Software Livre – Leslie
    •  Elasticsearch Use Case: Globo.com – Luiz Guilherme Pais dos Santos, Software Engineer
    •  Introduction to Analyzers – Pablo Musa, EmergiNet
  3. Elasticsearch: Who we are.

    Our mission is to help businesses get actionable insights from their data.
  4. Where we are today.

    •  Dual HQ: Amsterdam / Los Altos, CA
    •  Three Open Source Software Projects: Elasticsearch, Logstash, and Kibana (the ELK stack)
    •  6+ million total downloads
    •  Supporting hundreds of organizations across the globe
    •  Offer Training and Support Subscriptions through our expanding partner ecosystem
  5. Every day more data is created

    User Generated
    •  Tweets
    •  Blogs
    •  Books
    •  Notes
    •  Emails
    •  Comments

    Machine Generated
    •  Logs
    •  Events
    •  Statistics
    •  Alerts
    •  Metrics

    Structured and Unstructured Data
  6. Where is the industry today?

    •  Massive effort in cleaning the data.
    •  Takes weeks to analyze.
    •  Insights come too late.

    “I’m a data janitor.” Sr Dir of Data Science
  7. Search is critical

    •  Search as Navigation
    •  Search Driven Design
    •  Search as Structure
    •  Keeping users Engaged
    •  Leverage existing assets:
      –  RDBMS, Data warehouse, SharePoint, Intranet, Documents
      –  NoSQL, Hadoop, Data Grids
  8. Realizing the value of Elasticsearch

    Free Text Search Code Search Geo Location ecommerce Content Management Media Classification Social
  9. The ELK Stack

    •  Logstash: Data From Any Source
    •  Elasticsearch: Instantly Search & Analyze
    •  Kibana: Actionable Insights
  10. Even more products

    •  Elasticsearch Marvel – Kibana based monitoring platform for Elasticsearch clusters
    •  Elasticsearch for Apache Hadoop – Interface to Hadoop to make it easier to extract insights from your “data lake”
  11. Introduction to the ELK Stack
  12. Elasticsearch in 10 seconds

    •  Schema-free, REST & JSON based document store
    •  Distributed and horizontally scalable
    •  Open Source: Apache License 2.0
    •  Zero configuration
    •  Written in Java, extensible
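To make "REST & JSON based document store" concrete, here is a minimal Python sketch that builds, but does not send, the HTTP request that would store one JSON document. The host and port, the index and type names, and the document itself are assumptions made for illustration. The `/index/type/id` URL shape is the one used by the 1.x releases current at the time of this talk; later releases changed it.

```python
import json
from urllib.request import Request

# Assumed for illustration: a local node on the default HTTP port,
# plus a hypothetical index, type, and document.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) an HTTP PUT that stores one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "meetups", "talk", 1,
    {"title": "Introduction to the ELK Stack", "city": "Rio de Janeiro"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running node; the sketch stops short of that so it runs without a cluster.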
  13. Unstructured search
  14. Aggregation to find languages
  15. Structured search
  16. Enrichment
  17. Sorting
  18. Sorting
  19. Pagination
  20. Suggestions
  21. Basic terms

    •  Index
      –  Logical collection of data; might be time based
      –  Analogous to a database
    •  Sharding
      –  Split logical data over several machines
      –  Write scalability
      –  Control data flows
    •  Replication
      –  Read scalability
      –  Removing SPOF
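The three terms above can be sketched in a few lines of Python. This is a simplified illustration, not how Elasticsearch is implemented: the shard and replica counts, the `logs` prefix, and the document ids are hypothetical, and `crc32` is only a stand-in for whatever hash function the real routing uses.

```python
import zlib
from datetime import date

# Hypothetical cluster layout for illustration.
NUMBER_OF_SHARDS = 5      # primary shards: spread writes across machines
NUMBER_OF_REPLICAS = 1    # extra copies: serve reads and survive a node loss


def daily_index_name(prefix, day):
    """Name a time-based index, one per day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def shard_for(doc_id, number_of_shards=NUMBER_OF_SHARDS):
    """Pick a primary shard by hashing the document id.

    crc32 is a stand-in hash chosen for this sketch; it is not the hash
    function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


index = daily_index_name("logs", date(2014, 5, 5))
placement = {doc_id: shard_for(doc_id) for doc_id in ("order-1", "order-2", "order-3")}
total_shard_copies = NUMBER_OF_SHARDS * (1 + NUMBER_OF_REPLICAS)
print(index, placement, total_shard_copies)
```

The same id always maps to the same shard, which is what lets a distributed index find a document again without searching every machine.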
  22. Elasticsearch Use Case Overview: Product Catalog Analytics
  23. Analytics

    •  Aggregation of information
    •  Facets are one dimensional
      –  Categories/brands/material of all results of this query
    •  Questions are multidimensional
      –  Average revenue per product category per day
    •  Elasticsearch 1.0 has aggregations
      –  Nested facets
      –  Basic stats (mean, min/max, std dev, term counts)
      –  Significant terms, Percentiles, Cardinality estimations
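The multidimensional question on this slide, average revenue per product category per day, might be expressed as a nested aggregation along these lines. This is a sketch of a search request body built as a Python dictionary; the field names `category`, `sold_at`, and `revenue` are assumptions, and the `interval` parameter is written as the 1.x releases accepted it (later releases changed this parameter).

```python
import json

# Hypothetical field names ("category", "sold_at", "revenue") for illustration.
search_body = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": {
        "per_category": {
            "terms": {"field": "category"},
            "aggs": {
                "per_day": {
                    "date_histogram": {"field": "sold_at", "interval": "day"},
                    "aggs": {
                        "average_revenue": {"avg": {"field": "revenue"}},
                    },
                },
            },
        },
    },
}

print(json.dumps(search_body, indent=2))
```

Each level of nesting adds one dimension: a bucket per category, inside it a bucket per day, and inside that a single average.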
  24. Create knowledge from data

    •  Orders
      –  How many orders were created every day in the last month?
      –  How many orders were created per state per day in the last month?
    •  Revenue
      –  What is the average revenue per shopping cart?
      –  What is the average shopping cart size per order per hour?
    •  Tip: bring together multiple sources of data, especially time-series data, then compare
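To show what answers to these questions look like, the following Python sketch computes three of them locally over a handful of made-up orders. It only mimics the shape of the buckets an aggregation would return; the sample data and the field names `created_at`, `state`, and `cart_total` are invented for this example.

```python
from collections import Counter
from datetime import datetime

# Made-up sample orders; field names are assumptions for this sketch.
orders = [
    {"created_at": "2014-04-28T09:15:00", "state": "RJ", "cart_total": 120.0},
    {"created_at": "2014-04-28T17:40:00", "state": "SP", "cart_total": 80.0},
    {"created_at": "2014-04-29T11:05:00", "state": "RJ", "cart_total": 40.0},
    {"created_at": "2014-04-29T13:30:00", "state": "RJ", "cart_total": 60.0},
]


def day_of(order):
    """Reduce an order's timestamp to its calendar day (the bucket key)."""
    parsed = datetime.strptime(order["created_at"], "%Y-%m-%dT%H:%M:%S")
    return parsed.date().isoformat()


# One dimension: orders per day.
orders_per_day = Counter(day_of(o) for o in orders)
# Two dimensions: orders per state per day.
orders_per_state_per_day = Counter((day_of(o), o["state"]) for o in orders)
# A single metric: average revenue per shopping cart.
average_cart_revenue = sum(o["cart_total"] for o in orders) / len(orders)

print(dict(orders_per_day))
print(dict(orders_per_state_per_day))
print(average_cart_revenue)
```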
  25. Ecosystem

    •  Plugins
      –  Many third party plugins available
    •  Clients for many languages
      –  Ruby, Python, PHP, Perl, JavaScript, (.NET coming)
      –  Scala, Clojure, Go
    •  Kibana
    •  Logstash
    •  Hadoop integration
  26. Monitor your cluster with Marvel

    •  Point in time views are a start
    •  Marvel shows historical trends
    •  Visualize cluster behavior, act before problems
    •  Free for development, $500/year for up to 5 nodes
  27. Overview

  28. Log analysis with Logstash and Kibana
  29. Logstash in 10 seconds

    •  Managing events and logs
    •  Collect, parse, enrich, store data
    •  Modular: many, many inputs and outputs
    •  Apache License 2.0
    •  Ruby app (JRuby)
    •  Part of Elasticsearch family
  30. What is a log?

    •  Time-based data
    •  This data is everywhere!
      –  Server logs
      –  Twitter stream
      –  Financial transactions
      –  Metric / monitoring data
      –  ...
    •  Log all things
  31. Why collect & centralize logs?

    •  Access log files without system access
    •  Shell scripting: Too limited or slow
    •  Use unique IDs for errors to aggregate them across your stack
    •  Reporting (everyone can create their own report)
    •  Tip: Unify your data to make it easily searchable
  32. Logstash architecture

    Logstash: Input → Filter → Output
    •  Input: collect and split
    •  Filter: alter and enrich
    •  Output: store and visualize
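The three stages can be mimicked in plain Python to show how an event flows through them. This is not Logstash code: the function names are hypothetical, the key=value parsing is a simplified stand-in for a real filter, and a list stands in for a real output destination.

```python
def read_input(raw_text):
    """Input stage: collect raw text and split it into one event per line."""
    for line in raw_text.splitlines():
        if line.strip():
            yield {"message": line}


def apply_filter(event):
    """Filter stage: alter and enrich by parsing key=value pairs into fields."""
    enriched = dict(event)
    for token in event["message"].split():
        key, sep, value = token.partition("=")
        if sep and key:
            enriched[key] = value
    return enriched


def write_output(events, store):
    """Output stage: store events; a list stands in for a real destination."""
    store.extend(events)
    return store


raw = "status=200 path=/index.html\nstatus=500 path=/checkout\n"
stored = write_output((apply_filter(e) for e in read_input(raw)), [])
for event in stored:
    print(event)
```

Each stage only knows about events, not about the other stages, which is what makes the inputs, filters, and outputs interchangeable.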
  33. Inputs

    •  Monitoring: collectd, graphite, ganglia, snmptrap, zenoss
    •  Datastores: elasticsearch, redis, sqlite, s3
    •  Queues: rabbitmq, zeromq
    •  Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnish log
    •  Platforms: drupal_dblog, gemfire, heroku, sqs, s3, twitter
    •  Local: exec, generator, file, stdin, pipe, unix
    •  Protocol: imap, irc, stomp, tcp, udp, websocket, wmi, xmpp
  34. Filters

    •  alter, anonymize, checksum, csv, drop, multiline
    •  dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range
    •  json, urldecode, useragent
    •  metrics, sleep
    •  … many, many more …
  35. Outputs

    •  Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq
    •  Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix
    •  Notification: email, hipchat, irc, pagerduty, sns
    •  Protocol: gelf, http, lumberjack, metriccatcher, stomp, tcp, udp, websocket, xmpp
    •  External Monitoring: boundary, circonus, cloudwatch, datadog, librato
    •  External service: google big query, google cloud storage, jira, loggly, riemann, s3, sqs, syslog, zeromq
    •  Local: csv, exec, file, pipe, stdout, null
  36. Deploying ELK for scale

    Diagram stages: Shipper, Logstash, Store/Search, Visualize
  37. Visualize with Kibana
  38. Kibana in 10 seconds

    •  Visualize data in Elasticsearch
    •  See real-time updates to the data
    •  Build custom charts and dashboards
    •  Apache License 2.0
    •  Runs in browser (Chrome, FF, IE, Safari)
    •  Part of Elasticsearch family
  39. Heartbleed

    •  Following Twitter #Heartbleed in real time
    •  Using the ELK stack
    •  Thanks to Om
    •  http://palakonda.org/author/web4by2/
  40. Use the Twitter input

    From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/

    input {
      twitter {
        consumer_key => ""
        consumer_secret => ""
        keywords => ["#heartbleed","heartbleed","heartbleed.com"]
        oauth_token => ""
        oauth_token_secret => ""
        tags => ["#heartbleed"]
        type => "heartbleed"
      }
    }
  41. Heartbleed - mentions

    From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/
  42. Heartbleed - with OpenSSL

    From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/
  43. Kibana
  44. Kibana
  45. Useful helpers

    •  Curator: index management
      –  http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/
    •  Puppet module
      –  https://github.com/elasticsearch/puppet-logstash
    •  Logstash forwarder: low overhead collector
      –  https://github.com/elasticsearch/logstash-forwarder
    •  Logstash cookbook
      –  http://cookbook.logstash.net/
  46. More info

    •  GitHub: https://github.com/elasticsearch
      –  Code, issues there
    •  Mailing lists
      –  Google groups, logstash-users and elasticsearch
    •  IRC channels
      –  #logstash and #elasticsearch on freenode
  47. FISL 15 Talk Preview

    Community 2.0: Beyond Using Software Livre
  48. Software Livre in Brasil is Huge

    •  FISL is the world’s largest open source software conference
    •  Government regulations on use of software livre have been a model for other nations
    •  Few know what is happening in Brasil
  49. Why is there a disconnect?

    •  Language barriers
    •  Emphasis in Brasil on usage vs. contribution
      –  2007 survey by Brasilian Ministry of Science and Technology found that only 14% of respondents were creating software livre
      –  Only 33% of respondents made their source code available publicly
  50. Why does it matter?

    •  Software livre as economic engine
      –  Crowd sourced platform for innovation in R&D
      –  Distributed project work environment prepares workers for life in a distributed team
      –  Show the world Brasil is a global player in software development
    •  Sharing is fundamental to human nature & our success as learners
  51. Ways to Contribute – Code, Code, Code

    •  Release works under a livre license
      –  Source code developed for your own use
      –  Code derived from other livre works – “giving back”
    •  Testing
      –  Write test suites
      –  Perform QA testing
    •  This can be as simple as filing a bug report
  52. Ways to Contribute – Beyond Code

    •  Translation
      –  Software documentation, How To’s & tutorials
    •  Software localization, a.k.a. l10n
    •  Writing new articles in BR-PT to help others

    Fun fact: there are 12 volunteers on the BR-PT translations team for Mozilla’s Developer Network
  53. More Ways to Contribute

    •  Installfests
    •  Events to teach people to use software livre
    •  Giving talks about livre software tools, development methodologies and ideologies
  54. Advocacy

    Proudly use software livre, and ask others to do the same
  55. Questions, then over to Luiz & Pablo …

    Elasticsearch Use Case