Slide 1

Slide 1 text

Copyright Elasticsearch Inc 2014. All rights reserved. Elasticsearch Making sense of your data through search and analytics.

Slide 2

Slide 2 text

Agenda for Tonight
•  What is Elasticsearch? – Leslie Hawthorn, Elasticsearch Inc
   –  Company and technology
•  Introduction to the ELK Stack – Leslie
•  FISL 15 Talk Preview: Beyond Contributing to Software Livre – Leslie
•  Elasticsearch Use Case: Globo.com – Luiz Guilherme Pais dos Santos, Software Engineer
•  Introduction to Analyzers – Pablo Musa, EmergiNet

Slide 3

Slide 3 text

Elasticsearch: Who we are
Our mission is to help businesses get actionable insights from their data.

Slide 4

Slide 4 text

Where we are today
•  Dual HQ: Amsterdam / Los Altos, CA
•  Three open source software projects: Elasticsearch, Logstash, and Kibana (the ELK stack)
•  6+ million total downloads
•  Supporting hundreds of organizations across the globe
•  Offer training and support subscriptions through our expanding partner ecosystem

Slide 5

Slide 5 text

Every day more data is created
•  User generated: tweets, blogs, books, notes, emails, comments
•  Machine generated: logs, events, statistics, alerts, metrics
Structured and unstructured data

Slide 6

Slide 6 text

Where is the industry today?
•  Massive effort in cleaning the data
•  Takes weeks to analyze
•  Insights come too late
“I’m a data janitor.” – Sr Dir of Data Science

Slide 7

Slide 7 text

Search is critical
•  Search as navigation
•  Search-driven design
•  Search as structure
•  Keeping users engaged
•  Leverage existing assets:
   –  RDBMS, data warehouse, SharePoint, intranet, documents
   –  NoSQL, Hadoop, data grids

Slide 8

Slide 8 text

Realizing the value of Elasticsearch
•  Free text search
•  Code search
•  Geolocation
•  E-commerce
•  Content management
•  Media classification
•  Social

Slide 9

Slide 9 text

The ELK Stack
•  Logstash: data from any source
•  Elasticsearch: instantly search & analyze
•  Kibana: actionable insights

Slide 10

Slide 10 text

Even more products
•  Elasticsearch Marvel – Kibana-based monitoring platform for Elasticsearch clusters
•  Elasticsearch for Apache Hadoop – interface to Hadoop that makes it easier to extract insights from your “data lake”

Slide 11

Slide 11 text

Introduction to the ELK Stack

Slide 12

Slide 12 text

Elasticsearch in 10 seconds
•  Schema-free, REST & JSON-based document store
•  Distributed and horizontally scalable
•  Open source: Apache License 2.0
•  Zero configuration to get started
•  Written in Java, extensible
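To make "REST & JSON-based document store" concrete, here is a minimal sketch that only builds (does not send) the HTTP request that stores one JSON document. It assumes a node on localhost:9200 (the default HTTP port) and uses hypothetical index, type and ID names (`talks`, `talk`, `1`); the /index/type/id path follows the 1.x-era API, when document types still existed.

```python
import json
import urllib.request

def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that stores `document`
    under /{index}/{doc_type}/{doc_id}, the 1.x-era document URL layout."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical document: no schema has to be declared before indexing it.
req = build_index_request(
    "http://localhost:9200", "talks", "talk", "1",
    {"title": "Introduction to the ELK Stack", "speaker": "Leslie", "year": 2014},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running node.
```

Because the body is plain JSON over HTTP, any language with an HTTP client can talk to Elasticsearch; the official clients mostly add connection handling on top of this.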

Slide 13

Slide 13 text

Unstructured search
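A full-text (unstructured) search is typically expressed as a `match` query. A minimal sketch that builds the request body without contacting a cluster; the `title` field is a hypothetical example.

```python
import json

def full_text_query(field, text, size=10):
    """Search body for a full-text `match` query. Elasticsearch analyzes
    `text` with the field's analyzer (the standard analyzer lowercases it and
    splits it into terms) and returns hits ordered by relevance score."""
    return {"query": {"match": {field: text}}, "size": size}

# Hypothetical field name; the body would be sent to /<index>/_search.
body = full_text_query("title", "Making sense of your data")
print(json.dumps(body))
```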

Slide 14

Slide 14 text

Aggregation to find languages
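Finding which languages appear in a result set maps onto a `terms` aggregation, which returns one bucket per distinct value, largest first. A sketch, assuming a hypothetical `lang` field; the sample response is made up and only mirrors the shape of a terms-aggregation result.

```python
import json

def language_breakdown(field="lang", top_n=10):
    """Search body that skips the hits and asks for a `terms` aggregation."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "aggs": {"languages": {"terms": {"field": field, "size": top_n}}},
    }

def bucket_counts(response):
    """Flatten the buckets of the `languages` aggregation into {key: doc_count}."""
    buckets = response["aggregations"]["languages"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}

body = language_breakdown()
print(json.dumps(body))

# Hypothetical response, shaped like a terms-aggregation result.
sample_response = {
    "aggregations": {
        "languages": {
            "buckets": [{"key": "en", "doc_count": 120}, {"key": "pt", "doc_count": 45}]
        }
    }
}
print(bucket_counts(sample_response))  # → {'en': 120, 'pt': 45}
```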

Slide 15

Slide 15 text

Structured search
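Structured search matches exact values and ranges instead of ranking free text. The sketch below uses the 1.x-era `filtered` query syntax from the deck's time; later versions express the same conditions in a `bool` query's `filter` clause. The `lang` and `created_at` fields are hypothetical.

```python
import json

def structured_filter(lang, start, end):
    """1.x-style `filtered` query: an exact-match term plus a date range,
    both as filters, so no relevance scoring is involved."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"lang": lang}},
                            {"range": {"created_at": {"gte": start, "lt": end}}},
                        ]
                    }
                }
            }
        }
    }

body = structured_filter("en", "2014-04-01", "2014-05-01")
print(json.dumps(body, indent=2))
```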

Slide 16

Slide 16 text

Enrichment

Slide 17

Slide 17 text

Sorting

Slide 18

Slide 18 text

Sorting
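By default hits come back ordered by relevance score; a `sort` clause overrides that. A minimal sketch, assuming hypothetical `message` and `created_at` fields, that sorts newest first and falls back to `_score` to break ties.

```python
import json

def sorted_search(text, sort_field):
    """Full-text match sorted by `sort_field` descending, with the
    relevance score as a tie-breaker."""
    return {
        "query": {"match": {"message": text}},
        "sort": [{sort_field: {"order": "desc"}}, "_score"],
    }

body = sorted_search("heartbleed", "created_at")
print(json.dumps(body))
```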

Slide 19

Slide 19 text

Pagination
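Pagination in a search request is controlled by `from` (hits to skip) and `size` (hits to return). A small helper, hypothetical rather than part of any client library, that converts a 1-based page number into those two parameters:

```python
def page_request(query, page, per_page=10):
    """Translate a 1-based page number into Elasticsearch's `from`/`size`."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"query": query, "from": (page - 1) * per_page, "size": per_page}

body = page_request({"match_all": {}}, page=3, per_page=20)
print(body["from"], body["size"])  # → 40 20
```

Deep pages get expensive: each shard has to collect its top `from + size` hits and the coordinating node sorts all of them, so page 500 costs far more than page 1.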

Slide 20

Slide 20 text

Suggestions
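"Did you mean?" spelling corrections can come from the `term` suggester, added to a search body under `suggest`. A sketch, assuming a hypothetical `title` field; `title-suggestion` is just an arbitrary label for the suggestion in the response. For search-as-you-type, the completion suggester, which needs a dedicated field mapping, is the usual choice instead.

```python
import json

def spelling_suggestion(text, field="title"):
    """Search body asking the term suggester for corrections of `text`."""
    return {
        "suggest": {
            "title-suggestion": {  # arbitrary name, echoed back in the response
                "text": text,
                "term": {"field": field},
            }
        }
    }

body = spelling_suggestion("elasticsaerch")
print(json.dumps(body))
```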

Slide 21

Slide 21 text

Basic terms
•  Index
   –  Logical collection of data; may be time-based
   –  Analogous to a database
•  Sharding
   –  Split logical data over several machines
   –  Write scalability
   –  Control data flows
•  Replication
   –  Read scalability
   –  Removing the single point of failure (SPOF)
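Elasticsearch picks a document's primary shard as hash(routing) % number_of_primary_shards, where routing defaults to the document ID; that is why, in the 1.x versions this deck covers, the number of primary shards cannot change after the index is created. The sketch swaps in CRC32 as a stand-in hash (Elasticsearch uses its own hash function) and also counts shard copies, since every replica is a full copy of its primary. The 1.x defaults were 5 primaries and 1 replica.

```python
import zlib

def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Route a document to a primary shard. CRC32 is only a stand-in for
    Elasticsearch's real routing hash; the modulo step is the point."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def total_shard_copies(num_primary_shards: int, num_replicas: int) -> int:
    """Each primary shard plus its replicas: more copies, more read capacity."""
    return num_primary_shards * (1 + num_replicas)

ids = [f"doc-{i}" for i in range(1000)]
counts = [0] * 5
for doc_id in ids:
    counts[shard_for(doc_id, 5)] += 1
print(counts)                    # roughly even spread across 5 primaries
print(total_shard_copies(5, 1))  # → 10
```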

Slide 22

Slide 22 text

Elasticsearch Use Case Overview: Product Catalog Analytics

Slide 23

Slide 23 text

Analytics
•  Aggregation of information
•  Facets are one-dimensional
   –  Categories/brands/material of all results of this query
•  Questions are multidimensional
   –  Average revenue per product category per day
•  Elasticsearch 1.0 has aggregations
   –  Nested facets
   –  Basic stats (mean, min/max, std dev, term counts)
   –  Significant terms, percentiles, cardinality estimation
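The multidimensional question "average revenue per product category per day" becomes nested aggregations: a daily `date_histogram`, a `terms` bucket per category inside each day, and an `avg` metric inside each category. A sketch of the request body, assuming hypothetical `created_at`, `category` and `revenue` fields; `"interval": "day"` is the 1.x spelling (newer releases call it `calendar_interval`).

```python
import json

def avg_revenue_per_category_per_day():
    """Nested aggregations: day -> category -> average revenue."""
    return {
        "size": 0,  # buckets only, no hits
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "created_at", "interval": "day"},
                "aggs": {
                    "per_category": {
                        "terms": {"field": "category"},
                        "aggs": {"avg_revenue": {"avg": {"field": "revenue"}}},
                    }
                },
            }
        },
    }

body = avg_revenue_per_category_per_day()
print(json.dumps(body, indent=2))
```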

Slide 24

Slide 24 text

Create knowledge from data
•  Orders
   –  How many orders were created every day in the last month?
   –  How many orders were created per state per day in the last month?
•  Revenue
   –  What is the average revenue per shopping cart?
   –  What is the average shopping cart size per order per hour?
•  Tip: bring together multiple sources of data, especially time-series data, then compare
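To show what "orders per state per day" actually returns, here is the same bucketing computed locally in plain Python over a few hypothetical orders. In Elasticsearch this would be a daily `date_histogram` with a `terms` aggregation on the state field inside it; the sketch only illustrates the shape of the answer, one count per (day, state) pair.

```python
from collections import Counter
from datetime import datetime

# Hypothetical orders: (ISO timestamp, state)
orders = [
    ("2014-04-01T09:15:00", "SP"),
    ("2014-04-01T17:40:00", "RJ"),
    ("2014-04-01T21:05:00", "SP"),
    ("2014-04-02T08:30:00", "MG"),
]

def orders_per_state_per_day(orders):
    """Count orders in (calendar day, state) buckets."""
    counts = Counter()
    for ts, state in orders:
        day = datetime.fromisoformat(ts).date().isoformat()
        counts[(day, state)] += 1
    return counts

for (day, state), n in sorted(orders_per_state_per_day(orders).items()):
    print(day, state, n)
```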

Slide 25

Slide 25 text

Ecosystem
•  Plugins
   –  Many third-party plugins available
•  Clients for many languages
   –  Ruby, Python, PHP, Perl, JavaScript (.NET coming)
   –  Scala, Clojure, Go
•  Kibana
•  Logstash
•  Hadoop integration

Slide 26

Slide 26 text

Monitor your cluster with Marvel
•  Point-in-time views are a start
•  Marvel shows historical trends
•  Visualize cluster behavior and act before problems occur
•  Free for development, $500/year for up to 5 nodes

Slide 27

Slide 27 text

Overview

Slide 28

Slide 28 text

Log analysis with Logstash and Kibana

Slide 29

Slide 29 text

Logstash in 10 seconds
•  Managing events and logs
•  Collect, parse, enrich, store data
•  Modular: many, many inputs and outputs
•  Apache License 2.0
•  Ruby app (JRuby)
•  Part of Elasticsearch family

Slide 30

Slide 30 text

What is a log?
•  Time-based data
•  This data is everywhere!
   –  Server logs
   –  Twitter stream
   –  Financial transactions
   –  Metric / monitoring data
   –  ...
•  Log all things

Slide 31

Slide 31 text

Why collect & centralize logs?
•  Access log files without system access
•  Shell scripting: too limited or slow
•  Use unique IDs for errors and aggregate them across your stack
•  Reporting (everyone can create their own report)
•  Tip: Unify your data to make it easily searchable

Slide 32

Slide 32 text

Logstash architecture
?  →  Input  →  Filter  →  Output  →  ?
•  Input: collect and split
•  Filter: alter and enrich
•  Output: store and visualize
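The input → filter → output flow can be modeled as three chained stages. This is a toy model in Python generators, not Logstash's implementation: the input splits raw text into one event per line, the filter parses key=value pairs (similar in spirit to the kv filter) and adds a tag, and the output stores events in a list. All function names are hypothetical.

```python
def line_input(raw_text):
    """Input stage: collect raw data and split it into one event per line."""
    for line in raw_text.splitlines():
        if line.strip():
            yield {"message": line}

def kv_like_filter(events):
    """Filter stage: alter and enrich each event with parsed key=value fields."""
    for event in events:
        for token in event["message"].split():
            if "=" in token:
                key, value = token.split("=", 1)
                event[key] = value
        event["tags"] = ["parsed"]
        yield event

def list_output(events, store):
    """Output stage: store each finished event (here, in a Python list)."""
    for event in events:
        store.append(event)

store = []
raw = "status=200 path=/index.html\nstatus=404 path=/missing\n"
list_output(kv_like_filter(line_input(raw)), store)
print([e["status"] for e in store])  # → ['200', '404']
```

Because each stage only consumes and yields events, stages can be swapped independently, which is the same modularity the many Logstash inputs, filters and outputs rely on.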

Slide 33

Slide 33 text

Inputs
•  Monitoring: collectd, graphite, ganglia, snmptrap, zenoss
•  Datastores: elasticsearch, redis, sqlite, s3
•  Queues: rabbitmq, zeromq
•  Logging: eventlog, lumberjack, gelf, log4j, relp, syslog, varnishlog
•  Platforms: drupal_dblog, gemfire, heroku, sqs, s3, twitter
•  Local: exec, generator, file, stdin, pipe, unix
•  Protocol: imap, irc, stomp, tcp, udp, websocket, wmi, xmpp

Slide 34

Slide 34 text

Filters
•  alter, anonymize, checksum, csv, drop, multiline
•  dns, date, extractnumbers, geoip, i18n, kv, noop, ruby, range
•  json, urldecode, useragent
•  metrics, sleep
•  … many, many more …

Slide 35

Slide 35 text

Outputs
•  Store: elasticsearch, gemfire, mongodb, redis, riak, rabbitmq
•  Monitoring: ganglia, graphite, graphtastic, nagios, opentsdb, statsd, zabbix
•  Notification: email, hipchat, irc, pagerduty, sns
•  Protocol: gelf, http, lumberjack, metriccatcher, stomp, tcp, udp, websocket, xmpp
•  External monitoring: boundary, circonus, cloudwatch, datadog, librato
•  External service: google_bigquery, google_cloud_storage, jira, loggly, riemann, s3, sqs, syslog, zeromq
•  Local: csv, exec, file, pipe, stdout, null

Slide 36

Slide 36 text

Deploying ELK for scale
Shipper  →  Logstash  →  Store/Search  →  Visualize

Slide 37

Slide 37 text

Visualize with Kibana

Slide 38

Slide 38 text

Kibana in 10 seconds
•  Visualize data in Elasticsearch
•  See real-time updates to the data
•  Build custom charts and dashboards
•  Apache License 2.0
•  Runs in the browser (Chrome, Firefox, IE, Safari)
•  Part of Elasticsearch family

Slide 39

Slide 39 text

Heartbleed
•  Following Twitter #Heartbleed in real time
•  Using the ELK stack
•  Thanks to Om
•  http://palakonda.org/author/web4by2/

Slide 40

Slide 40 text

Use the Twitter input
From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/

input {
  twitter {
    # Fill in your own Twitter API credentials
    consumer_key => ""
    consumer_secret => ""
    keywords => ["#HEARTBLEED","HEARTBLEED","HEARTBLEED.COM"]
    oauth_token => ""
    oauth_token_secret => ""
    tags => ["#HEARTBLEED"]
    type => "HEARTBLEED"
  }
}

Slide 41

Slide 41 text

Heartbleed - mentions
From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/

Slide 42

Slide 42 text

Heartbleed - with OpenSSL
From http://palakonda.org/2014/04/15/real-time-stats-for-heartbleed-on-twitter-with-logstash-elasticsearch-and-kibana/

Slide 43

Slide 43 text

Kibana

Slide 44

Slide 44 text

Kibana

Slide 45

Slide 45 text

Useful helpers
•  Curator: index management
   –  http://www.elasticsearch.org/blog/curator-tending-your-time-series-indices/
•  Puppet module
   –  https://github.com/elasticsearch/puppet-logstash
•  Logstash forwarder: low-overhead collector
   –  https://github.com/elasticsearch/logstash-forwarder
•  Logstash cookbook
   –  http://cookbook.logstash.net/
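Time-series index management mostly comes down to spotting daily indices older than a retention window. A minimal sketch of that selection step, assuming Logstash's default `logstash-YYYY.MM.DD` index naming; the function is hypothetical and is not Curator's API, and it only picks names without deleting anything.

```python
from datetime import date, datetime, timedelta

def indices_older_than(index_names, days, today, prefix="logstash-"):
    """Return daily indices whose date suffix is more than `days` before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but no parsable date suffix
        if day < cutoff:
            old.append(name)
    return old

names = [
    "logstash-2014.04.01",
    "logstash-2014.04.20",
    "logstash-2014.05.01",
    ".marvel-2014.05.01",
    "logstash-latest",
]
print(indices_older_than(names, days=14, today=date(2014, 5, 2)))
# → ['logstash-2014.04.01']
```

Dropping a whole daily index is much cheaper than deleting individual documents, which is why time-based indices pair well with this kind of retention job.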

Slide 46

Slide 46 text

More info
•  GitHub: https://github.com/elasticsearch
   –  Code and issues
•  Mailing lists
   –  Google Groups: logstash-users and elasticsearch
•  IRC channels
   –  #logstash and #elasticsearch on Freenode

Slide 47

Slide 47 text

FISL 15 Talk Preview
Community 2.0: Beyond Using Software Livre

Slide 48

Slide 48 text

Software Livre in Brasil is Huge
•  FISL is the world’s largest open source software conference
•  Government regulations on use of software livre have been a model for other nations
•  Few know what is happening in Brasil

Slide 49

Slide 49 text

Why is there a disconnect?
•  Language barriers
•  Emphasis in Brasil on usage vs. contribution
   –  A 2007 survey by the Brazilian Ministry of Science and Technology found that only 14% of respondents were creating software livre
   –  Only 33% of respondents made their source code publicly available

Slide 50

Slide 50 text

Why does it matter?
•  Software livre as economic engine
   –  Crowdsourced platform for innovation in R&D
   –  Distributed project work prepares workers for life in a distributed team
   –  Shows the world Brasil is a global player in software development
•  Sharing is fundamental to human nature & our success as learners

Slide 51

Slide 51 text

Ways to Contribute – Code, Code, Code
•  Release works under a livre license
   –  Source code developed for your own use
   –  Code derived from other livre works (“giving back”)
•  Testing
   –  Write test suites
   –  Perform QA testing
•  This can be as simple as filing a bug report

Slide 52

Slide 52 text

Ways to Contribute – Beyond Code
•  Translation
   –  Software documentation, how-tos & tutorials
•  Software localization, a.k.a. l10n
•  Writing new articles in Brazilian Portuguese (pt-BR) to help others
Fun fact: there are 12 volunteers on the pt-BR translation team for the Mozilla Developer Network

Slide 53

Slide 53 text

More Ways to Contribute
•  Installfests
•  Events to teach people to use software livre
•  Giving talks about software livre tools, development methodologies and ideologies

Slide 54

Slide 54 text

Advocacy
Proudly use software livre and ask others to do the same

Slide 55

Slide 55 text

Questions? Then over to Luiz & Pablo …
Elasticsearch Use Case