Going Beyond the Needle in a Haystack: Elasticsearch and the ELK Stack


The canonical challenge of finding a needle in a haystack is easy compared with finding out whether there are other needles in the haystack, what their average length is, and where they tend to group together. The combined power of Elasticsearch, Logstash, and Kibana, the three massively popular open source projects that make up the ELK stack, makes this kind of data exploration possible.

Elasticsearch Inc

February 18, 2015


Transcript

  1. Going Beyond the Needle in a Haystack: Elasticsearch and the ELK Stack

    Kurt Hurtado (@kurtado), Tal Levy (@talevy), Elasticsearch, Inc.
  2. We’ve clearly hit a nerve

    13M+ cumulative downloads. Nearly 5M downloads in the last year.
    [Chart: millions of downloads (0 to 8), Oct '12 to Apr '14]
  3. History of the ELK Stack

    Copyright Elasticsearch 2014. Copying, publishing and/or distributing without written permission is strictly prohibited.

    • Logstash was started in 2009 by Jordan Sissel
    • Elasticsearch was first released in 2010 by Shay Banon
    • Kibana was begun in 2011 by Rashid Khan
    • Elasticsearch (the company) was founded in 2012
    • Rashid joined Elasticsearch in January, 2013
    • Jordan joined Elasticsearch in August, 2013
    • Much of the development on all three projects is now done in-house, in addition to open source contributions
  4. Unstructured search
  5. Structured search
  6. Enrichment
  7. Sorting
  8. Pagination
  9. Aggregation
  10. Suggestions
  11. What is Elasticsearch?

    • Document-oriented search engine
    • JSON based (both document store and REST API)
    • Built on top of Apache Lucene
    • Schema Free / Schema-Less
    • Yet enables control of schema when needed (via mappings)
    • Distributed Model
    • Scales Up+Out, Highly Available
    • Multi-tenant data
    • Dynamically create/delete indices
    • API centric & RESTful
    • Most functionality + cluster statistics are exposed via API
  12. Basic glossary

    cluster: A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced automatically if the current master node fails.

    node: A node is a running instance of Elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server. At startup, a node will use multicast (or unicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.
  13. Basic glossary

    index: An index can be seen as a named collection of documents. It is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

    shard: A shard is a single Apache Lucene instance. It is a low-level “worker” unit which is managed automatically. Shards are distributed across all nodes in the cluster, and the cluster can move shards automatically from one node to another in the case of node failure or the addition of new nodes. There are two types of shards: primary and replica.
  14. Basic glossary

    Primary shard: An index can have one or more primary shards (defaults to 5) and it is not possible to change this number after index creation. When you index a document, it is first indexed on the primary shard, then on all replicas of this shard.

    Replica shard: Each primary shard can have zero or more replicas (defaults to 1). A replica is a copy of the primary shard, and serves two purposes:
    ‣ Increase high availability - a replica is another copy of the data and will be promoted to a primary shard if the primary fails
    ‣ Increase performance - get and search requests can be handled by primary or replica shards
  15. Create Index API

    Creating index a with 2 shards and 1 replica (a total of 4 shards):

        curl -XPUT 'localhost:9200/a' -d '{
          "settings" : { "number_of_shards" : 2, "number_of_replicas" : 1 }
        }'

    Creating index b with 3 shards and 1 replica (a total of 6 shards):

        curl -XPUT 'localhost:9200/b' -d '{
          "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 }
        }'
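The shard totals quoted on this slide follow from simple arithmetic: every primary shard gets one extra copy per replica. A minimal Python sketch of that calculation (the helper name `total_shards` is made up for illustration and is not part of any Elasticsearch client):

```python
def total_shards(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus one full copy of every primary per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Settings taken from the two curl commands on the slide.
index_settings = {
    "a": {"number_of_shards": 2, "number_of_replicas": 1},
    "b": {"number_of_shards": 3, "number_of_replicas": 1},
}

totals = {name: total_shards(**settings) for name, settings in index_settings.items()}
print(totals)  # {'a': 4, 'b': 6}
```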
  16. Index API

        curl -XPUT 'localhost:9200/crunchbase/person/1' -d '{
          "first_name" : "Tony",
          "last_name" : "Stark"
        }'

    Callouts: HTTP REST method (PUT), target index name (crunchbase), document type (person), document id (1), document JSON source (the -d body).
  17. Get API

    • It is possible to retrieve a specific document from the index using its _type and _id
    • The GET operation is realtime: once a document is indexed, it is immediately available to be retrieved using the GET API

        curl -XGET 'localhost:9200/crunchbase/person/1'

    Callouts: HTTP REST operation (GET), target index name (crunchbase), document type (person), document id (1).
  18. Exists API

    • Check if a document is in the index, without the overhead of loading it
    • The response is based on the HTTP status code: 200 (OK) if it exists, 404 (NOT FOUND) if it doesn’t exist

        curl -XHEAD -i 'localhost:9200/crunchbase/person/1'
  19. Update API

    • Update by partial data
    • Partial doc is merged with existing doc
    • Non-object properties with the same key are replaced
    • Object properties are recursively merged

        curl -XPOST 'localhost:9200/crunchbase/person/1/_update' -d '{
          "doc" : { "first_name" : "Antonio" }
        }'
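The merge rule described above (replace non-object values, recurse into objects) can be sketched in a few lines of Python. This only illustrates the stated rule on plain dictionaries; it is not the Elasticsearch implementation, and the function name `merge_partial` is hypothetical.

```python
import copy


def merge_partial(existing: dict, partial: dict) -> dict:
    """Return a new document with `partial` merged into `existing`."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Object properties are merged recursively.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Non-object properties with the same key are replaced.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"first_name": "Tony", "last_name": "Stark"}
updated = merge_partial(stored, {"first_name": "Antonio"})
print(updated)  # {'first_name': 'Antonio', 'last_name': 'Stark'}
```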
  20. Delete API

    • Deleting a specific document by _id
    • Response: 200 (OK) if deleted, 404 if not found

        curl -XDELETE 'localhost:9200/crunchbase/person/1'

        {
          "found" : true,
          "_index" : "test",
          "_type" : "person",
          "_id" : "1",
          "_version" : 3
        }

    Callout: "found" indicates whether the document was actually found.
  21. Multi Get API

    • Sometimes you'd like to get multiple documents in one go
    • Avoid round trips when using the Get API

        curl 'localhost:9200/_mget' -d '{
          "docs" : [
            { "_index" : "crunchbase", "_type" : "person", "_id" : "1" },
            { "_index" : "marvels", "_type" : "hero", "_id" : "2", "_source" : [ "first_name" ] }
          ]
        }'

    Callouts: index name (_index), document type (_type), document id (_id); optionally specify what fields should be returned (_source).
  22. Bulk API

    • Minimizes round trips when performing bulk index/delete/update operations
    • The format of a bulk request is as follows:

        { "delete" : { "_index" : "crunchbase", "_type" : "person", "_id" : "2" } }\n
        { "index" : { "_index" : "crunchbase", "_type" : "person", "_id" : "1" } }\n
        { "first_name" : "Tony", "last_name" : "Stark" }\n
        . . .
        { "create" : { "_index" : "crunchbase", "_type" : "person", "_id" : "3" } }\n
        { "first_name" : "Thor", "last_name" : "Odinson" }\n

    Callouts: action metadata (the delete/index/create lines), optional action body (the document lines); each line must end with a line break (incl. the last line).
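The newline rule is easy to get wrong when building a bulk body by hand. A minimal Python sketch that assembles the body shown above, one JSON object per line, each terminated by a line break (the helper `build_bulk_body` is hypothetical and does not send anything to a cluster):

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    `source` is None for actions that carry no document, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, ends with a line break.
    return "".join(line + "\n" for line in lines)


operations = [
    ("delete", {"_index": "crunchbase", "_type": "person", "_id": "2"}, None),
    ("index", {"_index": "crunchbase", "_type": "person", "_id": "1"},
     {"first_name": "Tony", "last_name": "Stark"}),
    ("create", {"_index": "crunchbase", "_type": "person", "_id": "3"},
     {"first_name": "Thor", "last_name": "Odinson"}),
]

body = build_bulk_body(operations)
print(body, end="")
```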
  23. Grep!

    • Which plays contain the word “darling” in the complete works of Shakespeare?
      ‣ Grep it!
      ‣ Go over each play, word by word, and mark the play that contains it
    • Linear to the number of words
    • Fails at large scale
  24. Inverted Index

    • Inverting Shakespeare
      ‣ Take all the plays and break them down word by word
      ‣ For each word, store the ids of the documents that contain it
      ‣ Sort all tokens (words)
    • Search
      ‣ First look for the relevant word (fast, as words are sorted); if found, iterate over the document ids that are associated with it
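The steps above can be sketched in Python. This is a toy illustration, not how Lucene stores its index: the three documents are made-up stand-ins for plays, tokenization is a plain lowercase split, and the sorted term list is searched with a binary search.

```python
from bisect import bisect_left


def build_inverted_index(docs):
    """Map each word to the ids of the documents that contain it."""
    postings = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings.setdefault(word, set()).add(doc_id)
    terms = sorted(postings)  # sorted terms allow a fast lookup
    return terms, postings


def search(terms, postings, word):
    """Binary-search the sorted terms, then return the matching doc ids."""
    word = word.lower()
    i = bisect_left(terms, word)
    if i < len(terms) and terms[i] == word:
        return sorted(postings[word])
    return []


# Hypothetical toy documents, keyed by document id.
docs = {
    1: "the darling buds",
    2: "death brings no delight",
    3: "my darling day",
}

terms, postings = build_inverted_index(docs)
print(search(terms, postings, "darling"))  # [1, 3]
print(search(terms, postings, "needle"))   # []
```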
  25. [Table: inverted index with columns Term, Doc 1, Doc 2, Doc 3]

    Terms (sorted): breathe, brings, buds, but, by, can, …, damasked, darling, date, day, deaf, death, declines, delight
  26. Analyzers

    • Analysis => Tokenization and normalization
    • Analyzers => Analysis and token filters
    • Token filters act on the token stream - can drop and modify existing tokens, or add new ones.
    • Out of the box, many analyzers are available: Standard analyzer, Whitespace analyzer, language analyzers
    • Can define/build custom analyzers
  27. _analyze API

        GET /_analyze?analyzer=whitespace&text=FOO BAR

        {
          "tokens": [
            { "token": "FOO", "start_offset": 0, "end_offset": 3, "type": "word", "position": 1 },
            { "token": "BAR", "start_offset": 4, "end_offset": 7, "type": "word", "position": 2 }
          ]
        }
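To make the response fields concrete, here is a rough Python imitation of whitespace tokenization that emits the same fields as the slide. It is only a sketch: the function `whitespace_analyze` is hypothetical, and numbering positions from 1 is an assumption taken from the response shown here.

```python
import re


def whitespace_analyze(text):
    """Split on whitespace and record offsets and positions for each token."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text), start=1):
        tokens.append({
            "token": match.group(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "type": "word",
            "position": position,
        })
    return {"tokens": tokens}


result = whitespace_analyze("FOO BAR")
for token in result["tokens"]:
    print(token)
```

No lowercasing or other normalization happens in this sketch, which is why FOO and BAR come back unchanged.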
  28. Rich Search via Query DSL

    • Queries: unstructured search, which lets you query the data based on textual analysis (free text search). Queries score documents by relevancy (supports powerful custom scoring algorithms). To name a few:
      ‣ match
      ‣ bool (boolean)
      ‣ histogram
    • Filters: structured search, which narrows the search context based on known document structure (no scoring and very fast). To name a few:
      ‣ term
      ‣ range
      ‣ bool (boolean)
  29. Querying

    • Powerful and rich Query DSL
    • Queries are analyzed too
    • Near real time (from indexing to querying)

        GET /_search -d '
        {
          "query": {
            "match": { "tweet": "elasticsearch" }
          }
        }'

  30. Results

        {
          "took": 15,
          "timed_out": false,
          "_shards": { "total": 5, "successful": 5, "failed": 0 },
          "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [ {
              "_index": "twitter",
              "_type": "tweets",
              "_id": "cxxV4_TST_iR2zH1GuedVQ",
              "_score": 0.30685282,
              "_source": { "awesome #logstash #kibana #elasticsearch presentation with real life use case demo by @webmat at @devopsmontreal"

  31. Suggestions

    Look familiar?
  32. Analytics

    • Un-invert the inverted index (field data)
    • Load the field data to memory
    • Group by: popular terms, significant terms, ranges, dates, geolocation
    • Metrics: count, min, max, sum, avg, percentiles, cardinality
    • Nested aggregations help slice and dice data
  33. Tweets per month

        GET /_all/tweet/_search -d '
        {
          "aggs": {
            "tweets_by_month": {
              "date_histogram": {
                "field": "date",
                "interval": "month"
              }
            }
          }
        }'
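Conceptually, a month-interval date histogram groups documents by the calendar month of a date field and counts each group. A small Python sketch of that grouping over made-up tweet dates (the function name, the sample dates, and the output shape are illustrative assumptions, not the Elasticsearch response format):

```python
from collections import Counter
from datetime import datetime


def tweets_by_month(dates):
    """Count documents per calendar month, ordered chronologically."""
    counts = Counter(
        datetime.strptime(d, "%Y-%m-%d").strftime("%Y-%m") for d in dates
    )
    return [{"key": month, "doc_count": n} for month, n in sorted(counts.items())]


# Hypothetical values of the "date" field.
sample_dates = ["2014-01-03", "2014-01-20", "2014-03-05"]
buckets = tweets_by_month(sample_dates)
print(buckets)
# [{'key': '2014-01', 'doc_count': 2}, {'key': '2014-03', 'doc_count': 1}]
```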

  34. Aggregations

    Use aggregations to build analytics tools & dashboards
  35. Aggregations: Buckets & Metrics

    • Two categories of aggregations: Buckets & Metrics
    • Buckets: aggregations that build buckets. Each bucket is associated with some criteria over documents. During query execution, each document is evaluated against the created buckets and each bucket keeps track of what documents “fall” in it. Each bucket effectively defines a set of documents derived from the document set within the aggregation's scope.
    • Metrics: aggregations that, given a set of documents, produce one or more scalar values. Typically, metrics aggregations generate numeric stats that are computed over a specific document set.
  36. Bucket - terms

    Response:

        "aggregations": {
          "states": {
            "buckets": [
              { "key": "ma", "doc_count": 841 },
              { "key": "ca", "doc_count": 631 },
              ...
              { "key": "ny", "doc_count": 630 },
              { "key": "nj", "doc_count": 560 },
              { "key": "wa", "doc_count": 525 }
            ]
          }
        }

    [Pie chart: ma 26%, ca 20%, ny 20%, nj 18%, wa 16%]
  37. Bucket - *_range

    • date_range: a dedicated range aggregation that works on date fields. Ranges can be defined as date math expressions

        { "from" : "now-1M", "to" : "now" }

    • ip_range: a dedicated range aggregation that works on ip fields. Ranges can be defined as ipv4 strings or CIDR masks

        { "from" : "10.0.0.0", "to" : "10.0.0.128" }
        { "mask" : "10.0.0.0/25" }
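The two ip_range examples describe the same block of addresses if the upper bound of a range is treated as exclusive, which is an assumption here. Python's standard `ipaddress` module can be used to check how the CIDR mask expands; the helper `in_mask` is made up for this sketch.

```python
import ipaddress

network = ipaddress.ip_network("10.0.0.0/25")

first = network.network_address    # lowest address covered by the mask
last = network.broadcast_address   # highest address covered by the mask
exclusive_upper = last + 1         # first address outside the mask

print(first, last, exclusive_upper, network.num_addresses)
# 10.0.0.0 10.0.0.127 10.0.0.128 128


def in_mask(ip: str, mask: str) -> bool:
    """True when `ip` falls inside the CIDR block `mask`."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(mask)


print(in_mask("10.0.0.127", "10.0.0.0/25"))  # True
print(in_mask("10.0.0.128", "10.0.0.0/25"))  # False
```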
  38. Bucket - histogram

    Response:

        "aggregations": {
          "grades_distribution": {
            "buckets": [
              { "key": 60, "doc_count": 467 },
              { "key": 70, "doc_count": 873 },
              { "key": 80, "doc_count": 930 },
              { "key": 90, "doc_count": 915 }
            ]
          }
        }

    [Bar chart: doc_count (0 to 1000) per bucket key: 60: 467, 70: 873, 80: 930, 90: 915]

    By default, only non-empty buckets will be returned
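A histogram bucket key can be thought of as the value rounded down to a multiple of the interval. The Python sketch below assumes an interval of 10, inferred from the keys 60, 70, 80, 90 in the response, and uses made-up grades. Because it only counts keys that occur, empty buckets are left out, matching the default noted on the slide.

```python
from collections import Counter


def histogram(values, interval):
    """Bucket numeric values by rounding each down to a multiple of `interval`."""
    counts = Counter((value // interval) * interval for value in values)
    return [{"key": key, "doc_count": n} for key, n in sorted(counts.items())]


# Hypothetical grades; no value falls into the 70 or 80 buckets.
grades = [65, 69, 95]
print(histogram(grades, 10))
# [{'key': 60, 'doc_count': 2}, {'key': 90, 'doc_count': 1}]
```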
  39. Metrics - extended_stats

    The following computes statistics on student exam scores over a set of documents (each representing an exam result).

    Sample document:

        { "subject": "Mathematics", "state": "CA", "age": 8, "grade": 69, "male": true }

    Request:

        { "aggs" : { "grades" : { "extended_stats" : { "field" : "grade" } } } }

    Response:

        "aggregations": {
          "grades": {
            "count": 4375,
            "min": 65,
            "max": 99,
            "avg": 82.14765714285714,
            "sum": 359396,
            "sum_of_squares": 29970052,
            "variance": 102.06002593959144,
            "std_deviation": 10.102476228113158
          }
        }
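The fields in this response are related by simple formulas. The numbers on the slide are consistent with the population variance, sum_of_squares / count minus avg squared, and the Python sketch below uses that formula as an assumption. The function name and the sample values are hypothetical.

```python
import math


def extended_stats(values):
    """Compute the same statistics as the response above for a list of numbers."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance; clamp at 0 to absorb floating-point rounding.
    variance = max(sum_of_squares / count - avg * avg, 0.0)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    }


stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["avg"], stats["variance"], stats["std_deviation"])  # 5.0 4.0 2.0
```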
  40. Aggregations

    • Enables slicing & dicing the data
    • Provides multi-dimensional grouping of results, e.g. top URLs by country
    • Many types available
    • All operate over values extracted from the documents, usually from specific fields of the documents, but highly customizable using scripts
      ‣ terms
      ‣ range / date_range / ip_range
      ‣ geo_distance / geohash_grid
      ‣ histogram / date_histogram
      ‣ stats / avg / max / min / sum / percentiles
  41. Problem 1: No Consistency

    • Every application and device logs in its own special way.
    • Expert in each log format required to use the logs.
    • Difficult to search across because of this formatting problem.
  42. No Consistency

        120707  0:40:34  4 Connect root@localhost on
                         4 Query   select @@version_comment limit 1
        120707  0:40:45  4 Query   select * from mysql.user
        120707  0:41:18  5 Query   hello world
  43. No Consistency

        120707 0:37:09 [Note] Plugin 'FEDERATED' is disabled.
        120707 0:37:09 InnoDB: The InnoDB memory heap is disabled
        120707 0:37:09 InnoDB: Mutexes and rw_locks use GCC atomic builtins
        120707 0:37:09 InnoDB: Compressed tables use zlib 1.2.5
        120707 0:37:09 InnoDB: Using Linux native AIO
        120707 0:37:09 InnoDB: Initializing buffer pool, size = 128.0M
        120707 0:37:09 InnoDB: Completed initialization of buffer pool
  44. No Consistency

        # User@Host: biz_1[biz_1] @ localhost []
        # Query_time: 0.000273 Lock_time: 0.000104 Rows_sent: 1 Rows_examined: 1
        SET timestamp=1255345490;
        SELECT * FROM organization_details;
  45. No Consistency

        Mar 23 22:05:24 Macintosh com.apple.launchd[1] (httpd): Throttling respawn: Will start in 10 seconds
  46. Problem 2: Time Formats

        130460505
        Oct 11 20:21:47
        [29/Apr/2011:07:05:26 +0000]
        020805 13:51:24
        @4000000037c219bf2ef02e94
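Normalizing such timestamps usually means trying a list of known patterns until one matches. The Python sketch below handles only two of the formats above and returns None for the rest. It is an illustration, not how the Logstash date filter works, and treating timestamps without an offset as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone

# Patterns this sketch knows about; the other formats on the slide are not handled.
FORMATS = [
    "%d/%b/%Y:%H:%M:%S %z",  # e.g. 29/Apr/2011:07:05:26 +0000
    "%y%m%d %H:%M:%S",       # e.g. 020805 13:51:24
]


def parse_timestamp(raw):
    """Return a UTC datetime for a recognized timestamp, else None."""
    text = raw.strip("[]")
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None


for sample in ["[29/Apr/2011:07:05:26 +0000]", "020805 13:51:24", "@4000000037c219bf2ef02e94"]:
    print(sample, "->", parse_timestamp(sample))
```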
  47. Problem 3: Decentralized

    • Logs are spread across all of your servers
    • Many servers have many different kinds of logs
    • ssh + grep aren’t scalable
  48. Problem 4: Experts Required

    • People interested in the logs often…
      ‣ Do not have access to read the logs.
      ‣ Do not have expertise to understand the data.
      ‣ Do not know where the logs are.
  49. What is Logstash?

    • Event processing engine, optimized for logs
    • Raw data in, enriched data out
    • Written in Ruby, runs on JRuby
    • Simple to extend, efficient to run
    • Events pass through a pipeline
      ‣ Inputs: receive data from files, network, etc.
      ‣ Filters: enrich, massage, process the event data
      ‣ Outputs: send event data to other systems
    • Designed to be extremely flexible
    • Most commonly used to index data in Elasticsearch
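The input, filter, output flow can be pictured as plain function composition. The Python sketch below is a conceptual toy only: Logstash itself is configured declaratively and runs on JRuby, and every name here (`run_pipeline`, `add_length`) is hypothetical.

```python
def run_pipeline(lines, filters, outputs):
    """Turn each input line into an event, run it through filters, then outputs."""
    for line in lines:                  # input: raw data in
        event = {"message": line}
        for apply_filter in filters:    # filters: enrich or modify the event
            event = apply_filter(event)
        for send in outputs:            # outputs: hand the event to other systems
            send(event)


def add_length(event):
    """Example filter: enrich the event with the length of its message."""
    return {**event, "length": len(event["message"])}


collected = []  # stands in for an output such as Elasticsearch
run_pipeline(["Hello world!"], [add_length], [collected.append])
print(collected)  # [{'message': 'Hello world!', 'length': 12}]
```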
  50. Event Flow

    Simple Apache flow from input to Elasticsearch
  51. Logstash Pipeline

    • Inputs:
      ‣ Network (TCP/UDP), File, syslog, stdin
      ‣ RabbitMQ, Redis
      ‣ Twitter, IMAP, S3, gelf, collectd
    • Filters:
      ‣ grok, date, mutate, ruby, geoip, etc.
    • Outputs:
      ‣ Elasticsearch, MongoDB, File, S3
      ‣ PagerDuty, Nagios, Zabbix, Email
      ‣ TCP, Redis, RabbitMQ, syslog
      ‣ Graphite, Ganglia, StatsD, etc.
  52. Inputs (50+)

    • Network (TCP/UDP) + File: most common
    • syslog / rsyslog: supports multiple simultaneous clients
    • RabbitMQ, Redis, Kafka: used in larger clusters
    • stdin: handy for "backfilling" data, or testing
    • Twitter: follow your brand's social media activity
    • Email (IMAP): so you don't need to read it all yourself!
    • Lumberjack: resilient, compressed, secure
    • Amazon S3, gelf, collectd, ganglia, sqs, varnishlog, etc.
  53. Filters (60+)

    • grok: for extracting data using pattern matching
    • date: parse timestamps from fields, for use as "official" timestamps
    • mutate: rename, remove, replace, and modify fields in your events
    • ruby: run arbitrary Ruby code in the pipeline
    • geoip: determine geographical location based on IP address
    • csv: parse CSV data (or any pattern-separated data)
    • kv: parse key-value pairs in event data
    • And many, many more
  54. Outputs (75+)

    Outputs tend to fit certain categories:
    • Storage: Elasticsearch, MongoDB, S3, File, etc.
    • Notification: PagerDuty, Nagios, Zabbix, Email, etc.
    • Relay: TCP, Redis, RabbitMQ, Syslog, etc.
    • Metrics: Graphite, Ganglia, StatsD, etc.
  55. Configuration

        input {}
        filter {}
        output {}
  56. Input

    • input {}
    • input { plugin { setting_1 => "value" } }
    • input {
        plugin {
          setting_1 => "value"
          array_2 => ["value1","value2"]
          hash_3 => { key => "value" }
          # comment
        }
      }
  57. Filter

    • filter {}
    • filter { plugin { setting_1 => "value" } }
    • filter {
        plugin {
          setting_1 => "value"
          array_2 => ["value1","value2"]
          hash_3 => { key => "value" }
          # comment
        }
      }
  58. Output

    • output {}
    • output { plugin { setting_1 => "value" } }
    • output {
        plugin {
          setting_1 => "value"
          array_2 => ["value1","value2"]
          hash_3 => { key => "value" }
          # comment
        }
      }
  59. What is Kibana?

    • Data Visualization tool
    • Runs in a browser (served locally or remote)
    • No programming necessary
    • Reads data from Elasticsearch
    • Multiple panel types
    • Save and share dashboards
    • Democratize your data!
  60. table

    Drill into individual events.
  61. Use Cases

    • Free search
    • Structured Search
    • Data Analytics
    • Log analysis
    • Event analysis
    • Visual Exploration via Kibana
    • Social Streams
  62. Distributed by design

    • Sharding is the unit of distribution in Elasticsearch
    • A shard is a fully-functional Lucene search engine and contains many Lucene segments
    • Primary shard: all data is indexed here first
    • Replica shard: copy of indexed data which serves 2 purposes:
      ‣ Increase high availability
      ‣ Increase read throughput
  63. [Diagram: many segments → one shard]
  64. [Diagram: many segments → one shard; many shards → one index]
  65. Scale-out with shards

    • Shards are moved around in the cluster
    • For performance, 1 index with 5 shards is the same as 5 indices with 1 shard each
    • Once an index is created, its number of shards cannot be changed
    • Replicas can be added or increased at any time
  66. Create Index API

    Creating index a with 2 shards and 1 replica (a total of 4 shards):

        curl -XPUT 'localhost:9200/a' -d '{
          "settings" : { "number_of_shards" : 2, "number_of_replicas" : 1 }
        }'

    Creating index b with 3 shards and 1 replica (a total of 6 shards):

        curl -XPUT 'localhost:9200/b' -d '{
          "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 }
        }'
  67. Node/Shard Allocation

    Indices “a” and “b”, “a” with 2 shards and 1 replica, and “b” with 3 shards with 1 replica, on a 4 node cluster.
    [Diagram: primary and replica copies of shards a0, a1, b0, b1, b2 spread across node1, node2, node3, node4]
  68. start small

    [Diagram: node_A holds shard_0, shard_1, shard_2]
  69. add more nodes

    [Diagram: node_A holds shard_0, shard_1, shard_2; node_B and node_C are empty]
  70. shards migrate

    [Diagram: shard_1 moves from node_A to node_B; shard_2 moves from node_A to node_C]
  71. rebalanced automatically

    [Diagram: node_A holds shard_0; node_B holds shard_1; node_C holds shard_2]
  72. Master node

    • Master manages cluster activity
    • By default all nodes are master ready
    • Master election automatic
    • If master node fails, another node elected automatically
    • Cluster has just one master at any time
    • Three master-eligible nodes recommended
  73. Hands-on Lab

    Copyright Elasticsearch 2015. Copying, publishing and/or distributing without written permission is strictly prohibited.

    Elasticsearch | Logstash | Kibana
  74. Lab Resources

    • http://bit.ly/17fzASv (https://s3.amazonaws.com/elk-workshop.elasticsearch.org/hands-on-workshop/20150218-strata/hands-on-workshop.tar.gz)
    • USB Drive
  75. Lab: Logstash

    Installing and Running Logstash
  76. Installation

    Goals:
    • Install Logstash
    • Create basic configuration file
  77. Step 1

    • Obtain tarball
    • Uncompress into your directory
      ‣ tar -zxf logstash/logstash-1.4.2.tar.gz
    • Result will be a directory: logstash-1.4.2
    • Change directory to logstash-1.4.2
      ‣ cd logstash-1.4.2
  78. Step 2

    Run the following test:

        $ bin/logstash -e \
          'input { stdin {} } output { stdout { codec => rubydebug } }'
        …
        Hello world!
        {
              "message" => "Hello world!",
             "@version" => "1",
           "@timestamp" => "2014-07-11T23:09:11.981Z",
                 "host" => "oh-my"
        }

  79. Step 3

    • View log file logstash/sample1.log
    • View config file logstash/logstash-lab4.conf
    • Run the following with the log entry found in the log file:

        $ bin/logstash -f ../logstash/logstash-lab4.conf
        <paste sample1 log entry here>

  80. Step 4

    Your results should look similar to this:

        {
              "message" => "22/Mar/2014:16:38:00 -0700 183.60.215.50 <This>",
             "@version" => "1",
           "@timestamp" => "2014-03-22T23:38:00.000Z",
                 "host" => "oh-my",
                   "ip" => "183.60.215.50",
                  "msg" => "This",
                "geoip" => { … <this part on next slide> … }
        }
  81. Step 4 (cont.)

        "geoip" => {
                          "ip" => "183.60.215.50",
               "country_code2" => "CN",
               "country_code3" => "CHN",
                "country_name" => "China",
              "continent_code" => "AS",
                 "region_name" => "30",
                   "city_name" => "Guangzhou",
                    "latitude" => 23.11670000000001,
                   "longitude" => 113.25,
                    "timezone" => "Asia/Chongqing",
            "real_region_name" => "Guangdong",
                    "location" => [
                [0] 113.25,
                [1] 23.11670000000001
            ]
        }
  82. Lab: Elasticsearch

    Installing and Running Elasticsearch
  83. Configuration

    • Node configuration in config/elasticsearch.yml

        cluster.name: test_cluster
        discovery.zen.ping.multicast.enabled: false
        http.cors.enabled: true
  84. ES Lab: Starting a Node

    • Extract Elasticsearch, Install and Execute

        # extract and cd into the directory
        % tar zxf elasticsearch/elasticsearch-1.4.2.tar.gz
        % cd elasticsearch-1.4.2

        # install marvel from network or local file
        % bin/plugin -i elasticsearch/marvel/latest
        - or -
        % bin/plugin -i marvel -u file:../marvel/marvel-latest.zip

        # run in the foreground
        % bin/elasticsearch
  85. ES Lab: Is it running?

    Check the result:

        curl 'localhost:9200'

        {
          "status" : 200,
          "name" : "Akhenaten",
          "cluster_name" : "elasticsearch",
          "version" : {
            "number" : "1.4.2",
            "build_hash" : "927caff6f05403e936c20bf4529f144f0c89fd8c",
            "build_timestamp" : "2014-12-16T14:11:12Z",
            "build_snapshot" : false,
            "lucene_version" : "4.10.2"
          },
          "tagline" : "You Know, for Search"
        }
  86. ES Lab: Apache Logs

    Storing Logstash-processed logs in Elasticsearch
  87. ES Lab: Populating Data
      • Run Logstash using this new configuration
        % cd logstash-1.4.2
        % cp ../logstash/complete.conf .
        % bin/logstash -f complete.conf
  88. ES Lab: Populating ES (cont.)
      • Copy/paste sample log
        • cat ../sample2.log
        71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
  89. ES Lab: Populating ES (cont.)
        {
          "message" => "71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] \"GET /admin HTTP/1.1\" 301 566 \"-\" \"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3\"",
          "@version" => "1",
          "@timestamp" => "2011-05-18T08:48:10.000Z",
          "host" => "cadenza",
          "clientip" => "71.141.244.242",
          "ident" => "-",
          "auth" => "kurt",
          "timestamp" => "18/May/2011:01:48:10 -0700",
          "verb" => "GET",
          "request" => "/admin",
          …truncated…
        }
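To make the field extraction above less of a black box, here is a rough Python approximation of what the grok and date filters do to the sample line. This is a sketch, not the grok pattern Logstash actually uses: the regular expression is hand-written, and the field names beyond those shown on the slide (`httpversion`, `referrer`, `agent`) are assumptions. It also assumes an English locale so that `%b` parses "May".

```python
import re
from datetime import datetime, timezone

# Sample line from the slide above.
LINE = ('71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" '
        '301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) '
        'Gecko/20100401 Firefox/3.6.3"')

# Hand-written approximation of an Apache combined log pattern.
PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into named fields and add a UTC @timestamp."""
    match = PATTERN.match(line)
    if match is None:
        return None
    event = match.groupdict()
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    # Normalize to UTC, mirroring the @timestamp value in the output above.
    event["@timestamp"] = parsed.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S.000Z")
    return event


event = parse_line(LINE)
print(event["clientip"], event["verb"], event["request"], event["@timestamp"])
```

Note how the local time 01:48:10 at offset -0700 becomes 08:48:10 in `@timestamp`, matching the Logstash output.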
  90. ES Lab: Simple Queries
      • Get a count of documents (across all logstash-* indices)
        % curl -XGET 'localhost:9200/logstash-*/_count?pretty'
        {
          "count" : 1,
          "_shards" : {
            "total" : 5,
            "successful" : 5,
            "failed" : 0
          }
        }
  91. ES Lab: Simple Queries
      • Perform a simple search for the term "mozilla"
        % curl -XGET 'localhost:9200/_search?q=mozilla&pretty'
        {
          "took" : 2,
          "timed_out" : false,
          "_shards" : {
            "total" : 5,
            "successful" : 5,
            "failed" : 0
          },
          "hits" : {
            "total" : 1,
            "max_score" : 0.047945753,
            "hits" : [ {
              "_index" : "logstash-2013.12.11",
              "_type" : "logs",
              "_id" : "z8V_NXAHQkigh-SaFW26yg",
              ...
            } ]
          }
        }
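When the search term contains characters such as `:` or spaces, building the URL by hand gets error-prone. A small sketch of constructing the same query-string search from Python; `search_url` and `total_hits` are hypothetical helper names, and `pretty=true` is used in place of the bare `pretty` flag on the slide, which Elasticsearch treats the same way:

```python
from urllib.parse import urlencode


def search_url(query, index=None, host="localhost:9200"):
    """Build a URI search URL; the query is percent-encoded for safety."""
    path = "/{}/_search".format(index) if index else "/_search"
    params = urlencode({"q": query, "pretty": "true"})
    return "http://{}{}?{}".format(host, path, params)


def total_hits(response):
    """Return the hit count from a parsed search response (1.x layout)."""
    return response["hits"]["total"]


print(search_url("mozilla"))
print(search_url("response:200", index="logstash-*"))
```

The second call restricts the search to the logstash-* indices and to one field, the same idea as the structured search exercise in the Kibana lab.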
  92. ES Lab: Complex Search Example
      • Index some more documents using Logstash
      • Sample dataset in logs.gz file (copy into the current directory and unzip it)
      • Use the same configuration as previous example
        % gzip -d ../logs.gz
        % cp ../logs ./logs
        % cat logs | bin/logstash -f complete.conf
  93. ES Lab: Monitoring with Marvel
      • Simply browse to the Marvel installation!
        http://localhost:9200/_plugin/marvel
  94. ES Lab: Aggregations
      • What are the top IP addresses in our Apache logs?
      • http://bit.ly/1h2tmqt <- JSON can be found here!
        GET logstash-*/_search
        {
          "aggs" : {
            "top_uris" : {
              "terms" : {
                "field" : "clientip",
                "size" : 3
              }
            }
          }
        }
  95. ES Lab: Searching Elasticsearch
      • Results of this query (top IPs in the log)
        {
          "hits" : {… list of hits …},
          "aggregations" : {
            "top_uris" : {
              "buckets" : [
                { "key" : "128.30.28.58", "doc_count" : 104 },
                { "key" : "65.115.35.83", "doc_count" : 71 },
                { "key" : "151.250.94.199", "doc_count" : 52 }
                …
              ]
            }
          }
        }
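Conceptually, a terms aggregation counts documents per distinct field value and returns the largest buckets. The sketch below reproduces that idea locally with `collections.Counter`. It is only an illustration of the result shape, not how Elasticsearch computes it across shards, and the input list is synthetic, sized to match the doc counts on the slides:

```python
from collections import Counter


def top_terms(values, size):
    """Return the `size` most frequent values as terms-aggregation-style buckets."""
    counts = Counter(values)
    return [{"key": key, "doc_count": count}
            for key, count in counts.most_common(size)]


# Synthetic clientip values whose frequencies mirror the slides.
clientips = (["128.30.28.58"] * 104 + ["65.115.35.83"] * 71 +
             ["151.250.94.199"] * 52 + ["46.105.14.53"] * 5)

buckets = top_terms(clientips, 3)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```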
  96. ES Lab: Aggregations
      • Exclude a specific term from the aggregation
        GET logstash-*/_search
        {
          "size": 0,
          "aggs" : {
            "top_uris" : {
              "terms" : {
                "field" : "clientip",
                "size" : 3,
                "exclude" : "128.30.28.58"
              }
            }
          }
        }
  97. ES Lab: Aggregations Result
        "aggregations" : {
          "top_uris" : {
            "buckets" : [
              { "key" : "65.115.35.83", "doc_count" : 71 },
              { "key" : "151.250.94.199", "doc_count" : 52 },
              { "key" : "46.105.14.53", "doc_count" : 5 }
            ]
          }
        }
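The request body for the exclusion query can be generated rather than typed by hand. A minimal sketch, with `top_terms_body` as a hypothetical helper; the aggregation name `top_uris` is taken from the slides. Setting the top-level `size` to 0 suppresses the list of hits so that only the aggregation comes back. As far as I know, a string `exclude` is interpreted as a regular expression in Elasticsearch 1.x, so the dots in the IP address match any character; this sketch passes the value through unescaped, as the slide does.

```python
import json


def top_terms_body(field, size, exclude=None, name="top_uris"):
    """Build a search body holding a single terms aggregation and no hits."""
    terms = {"field": field, "size": size}
    if exclude is not None:
        terms["exclude"] = exclude
    return {"size": 0, "aggs": {name: {"terms": terms}}}


body = top_terms_body("clientip", 3, exclude="128.30.28.58")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET logstash-*/_search`.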
  98. Lab: Kibana

    Time to explore those logs.
  99. Step 0: Run Kibana
      • Logstash ships with Kibana for easy deployment
        • bin/logstash web
      • Enable CORS on Elasticsearch
        • off by default since version 1.4.0
        • set "http.cors.enabled: true" in "elasticsearch.yml"
      • Visit: http://localhost:9292/
      • Alternate means of deployment:
        • Apache, nginx, lighttpd.
  100. Exercise 2: Structured Search
      • Remember our grok field ‘response’
      • Search for: response:200
      • Observation: an Apache log line can contain ‘200’ anywhere in its text, not only in the status code.
      • Searching a specific field gets you more accurate results!
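The difference between the two searches can be illustrated offline. In this sketch the three events are made up for illustration, and the tokenizer is a crude stand-in for full-text analysis (it simply splits on non-word characters):

```python
import re

# Made-up events; only the first actually has status 200.
events = [
    {"message": '1.2.3.4 - - "GET /index.html HTTP/1.1" 200 1024', "response": 200},
    {"message": '1.2.3.4 - - "GET /img/200 HTTP/1.1" 404 512', "response": 404},
    {"message": '5.6.7.8 - - "GET /a HTTP/1.1" 301 200', "response": 301},
]


def free_text(events, term):
    """Match events whose message contains the term as any token."""
    return [e for e in events if term in re.findall(r"\w+", e["message"])]


def by_field(events, field, value):
    """Match events where one specific field equals the value."""
    return [e for e in events if e.get(field) == value]


print(len(free_text(events, "200")))        # matches the path and byte count too
print(len(by_field(events, "response", 200)))
```

The free-text search picks up a `200` in a request path and a `200` in the byte count, while the field search returns only the event whose status code is 200.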
  101. Exercise 3: Exploration
      • Search for: *
      • Time: Click-drag on histogram to zoom in on time
      • Field exploration (the table panel!)
        • Click an event to expand it.
        • Click the include or exclude (⃠) icon next to a field value to filter on it.
  102. Exercise 4: Ranges and Labels
      • Search for: response:>=200 AND response:<300
        • or, equivalently: response:[200 TO 299]
      • Click + to add another query.
        • Repeat for 300 TO 399, 400 TO 499, 500 TO 599
      • Click colored dot next to each query
        • Choose desired color, if any.
        • Set legend value:
          • “OK” for 200, “Redirect” for 300,
          • “Client Error” for 400, “Server Error” for 500
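The four queries and their legend labels follow a regular pattern, which a few lines of Python can generate. This is a sketch for illustration; the `"Other"` fallback label is an addition not present in the exercise:

```python
# Legend labels from the exercise, keyed by the hundreds digit of the status code.
LABELS = {2: "OK", 3: "Redirect", 4: "Client Error", 5: "Server Error"}


def range_query(status_class):
    """Build the inclusive range query for one status class, e.g. 2 -> 200 TO 299."""
    low = status_class * 100
    return "response:[{} TO {}]".format(low, low + 99)


def label_for(status):
    """Map a status code to its legend label ("Other" is a hypothetical fallback)."""
    return LABELS.get(status // 100, "Other")


queries = [(range_query(c), LABELS[c]) for c in sorted(LABELS)]
for query, label in queries:
    print(label, "->", query)
```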
  103. Exercise 5a: Top N query
      • Change query type to topN
        • Field: useragent.os
        • Count: 10
      • What looks wrong?
  104. Exercise 5b: Top N query
      • Change query type to topN
        • Field: useragent.os.raw
        • Count: 10
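What looks wrong in 5a is that an analyzed string field is broken into lowercase tokens before counting, so multi-word values are split apart; the `.raw` variant keeps each value whole. The sketch below imitates this with a simplified tokenizer. The operating system names are made-up sample values, and the tokenizer only approximates the default analyzer:

```python
import re
from collections import Counter


def analyze(text):
    """Crude stand-in for the default analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Made-up useragent.os values for illustration.
values = ["Mac OS X", "Windows 7", "Windows XP", "Mac OS X", "Linux"]

analyzed = Counter(token for value in values for token in analyze(value))
raw = Counter(values)

print(analyzed.most_common(3))  # fragments such as "mac", "os", "x"
print(raw.most_common(3))       # whole values such as "Mac OS X"
```

Counting on the analyzed field mixes "Windows 7" and "Windows XP" into a single "windows" bucket and scatters "Mac OS X" across three, which is why the raw field gives the meaningful top N.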
  105. Exercise 6: Plot Metrics!
      • Goal: plot bandwidth usage over time
      • topN query on ‘geoip.country_name.raw’ field
      • Add new panel: Histogram
        • Set style to: line
        • Chart value: total
        • Value field: bytes
      • View last 30 days. See anything?
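The histogram configured above sums the `bytes` field per time bucket, with one series per country from the topN query. A local sketch of that computation, using daily buckets and a handful of made-up events (the field layout is flattened for brevity and `bytes_per_country_day` is a hypothetical name):

```python
from collections import defaultdict

# Made-up events for illustration only.
events = [
    {"@timestamp": "2014-03-22T23:38:00.000Z", "country": "China", "bytes": 566},
    {"@timestamp": "2014-03-22T23:40:12.000Z", "country": "China", "bytes": 1024},
    {"@timestamp": "2014-03-23T01:02:03.000Z", "country": "China", "bytes": 200},
    {"@timestamp": "2014-03-22T10:00:00.000Z", "country": "United States", "bytes": 300},
]


def bytes_per_country_day(events):
    """Sum bytes per (country, day), i.e. the 'total' chart value per series and bucket."""
    totals = defaultdict(int)
    for event in events:
        day = event["@timestamp"][:10]  # YYYY-MM-DD prefix of the ISO timestamp
        totals[(event["country"], day)] += event["bytes"]
    return dict(sorted(totals.items()))


totals = bytes_per_country_day(events)
for (country, day), total in totals.items():
    print(country, day, total)
```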
  106. Exercise 7: Custom Dashboard
      • Dashboard Goal: Apache Logs Overview
      • Save your work while you build this dashboard!
      • Panels:
        • Event Count Histogram
        • Bytes Histogram
        • (terms panel) HTTP Response Pie Chart
        • (terms panel) Top 10 IPs
        • (terms panel) Top 10 Requests
        • (map panel) geoip.country_code2.raw
  107. Resources
      • Support: http://www.elasticsearch.com/support
      • Community Resources:
        • irc: #logstash and #elasticsearch on freenode
        • email: [email protected]
        • email: [email protected]
        • meetups: http://elasticsearch.meetup.com/
        • twitter: @elasticsearch
        • github: https://github.com/elasticsearch/