Slide 1

Slide 1 text

Elastic Stack/X-Pack 5.0 for IT Ops Workshop
Edition: March 23, 2017

Slide 2

Slide 2 text

Objective
• Become an expert with broader knowledge of the Elastic Stack and X-Pack
• Secure your cluster
• Leverage the real-time alerting capability for daily IT operations
"Take full advantage of the Elastic Stack and the X-Pack to maximize your IT operational excellence."

Slide 3

Slide 3 text

Dealing with Time-series Data

Slide 4

Slide 4 text

Curl vs Console with Kibana
$ curl -XGET "https://ES_HOST:ES_PORT/_search" \
    -H "Content-Type: application/json" \
    -u ES_USER:ES_PASSWORD \
    -d '{ "query": { "match_all": {} } }'
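For scripting, the same request can be prepared without curl. The following is a minimal sketch using only the Python standard library; it builds the request but does not send it, and the port 9200 and the helper name `build_search_request` are assumptions made for illustration.

```python
import base64
import json
import urllib.request


def build_search_request(host, port, user, password):
    """Build (but do not send) the match_all search request from the slide."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    # HTTP basic auth is what curl's -u option produces.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}:{port}/_search",
        data=body,
        method="GET",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# ES_HOST, ES_USER and ES_PASSWORD are the placeholders used on the slide.
request = build_search_request("ES_HOST", 9200, "ES_USER", "ES_PASSWORD")
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

In the Kibana Console the same search is simply `GET _search` followed by the JSON body, which is why the rest of the workshop uses Console syntax.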

Slide 5

Slide 5 text

CRUD
PUT my-metrics-2017-03-02/my-type/1
{
  "@timestamp": "2017-03-02T14:12:00",
  "host": "server-01",
  "cpu_usage": 0.10,
  "free_memory": 12285
}
GET my-metrics-2017-03-02/my-type/1
PUT my-metrics-2017-03-02/my-type/1
{
  "@timestamp": "2017-03-02T14:12:00",
  "host": "server-01",
  "cpu_usage": 0.10,
  "free_memory": 12285,
  "load_average": 1.52
}
DELETE my-metrics-2017-03-02/my-type/1

Slide 6

Slide 6 text

Search Basics
URI search with Query String Query (Lucene syntax):
GET my-metrics-*/_search?q=*
Search with Query DSL:
GET my-metrics-*/_search
{
  "size": 10,
  "query": { "match_all": {} }
}

Slide 7

Slide 7 text

Range Query with Date Math
GET my-metrics-2017-03-02/_search
{
  "query": {
    "range": {
      "@timestamp": { "gte": "now-10m" }
    }
  }
}
Searches for events that happened within the last 10 minutes.
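To make the date math concrete, here is a small sketch of how an expression such as "now-10m" resolves to a timestamp. It handles only the simplified form `now[+-]<n><unit>`, not the full Elasticsearch date math syntax (rounding, months and years are left out), and the function name `resolve_date_math` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Subset of date math units mapped to timedelta arguments (assumption: s, m, h, d, w only).
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def resolve_date_math(expression, now):
    """Resolve a simplified 'now[+-]<n><unit>' expression against a reference time."""
    match = re.fullmatch(r"now(?:([+-])(\d+)([smhdw]))?", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    sign, amount, unit = match.groups()
    if sign is None:
        return now
    delta = timedelta(**{UNITS[unit]: int(amount)})
    return now + delta if sign == "+" else now - delta


now = datetime(2017, 3, 2, 14, 12, tzinfo=timezone.utc)
lower_bound = resolve_date_math("now-10m", now)
print(lower_bound.isoformat())  # 2017-03-02T14:02:00+00:00
```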

Slide 8

Slide 8 text

Stats Aggregation
GET metricbeat-2017.02.28/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-10m" } }
  },
  "aggs": {
    "1": { "stats": { "field": "system.cpu.user.pct" } }
  }
}
Response:
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 280646, "max_score": 0, "hits": [] },
  "aggregations": {
    "1": {
      "count": 1222,
      "min": 0,
      "max": 0.6950000000000001,
      "avg": 0.09341571194762681,
      "sum": 114.15399999999995
    }
  }
}
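The five numbers in the response can be reproduced by hand. The sketch below computes them over a few in-memory documents; the sample documents and the flat `cpu` field are invented for illustration. It also shows why `count` can be smaller than `hits.total`: only documents that carry a value for the field are counted.

```python
def stats(docs, field):
    """Compute count/min/max/avg/sum over the documents that have the field."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical sample: four hits, but only three of them have a "cpu" value.
docs = [{"cpu": 0.0}, {"cpu": 0.5}, {"memory": 1}, {"cpu": 0.25}]
result = stats(docs, "cpu")
print(result)
# {'count': 3, 'min': 0.0, 'max': 0.5, 'avg': 0.25, 'sum': 0.75}
```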

Slide 9

Slide 9 text

Date Histogram Aggregation
GET my-metrics-2017-02-28/_search
{
  "size": 0,
  "aggs": {
    "1": {
      "date_histogram": { "field": "@timestamp", "interval": "minute" }
    }
  }
}
Response (excerpt):
{
  "hits": { "total": 280646, "max_score": 0, "hits": [] },
  "aggregations": {
    "1": {
      "buckets": [
        { "key_as_string": "2017-02-28T05:21:00.000Z", "key": 1488259260000, "doc_count": 686 },
        { "key_as_string": "2017-02-28T05:22:00.000Z", "key": 1488259320000, "doc_count": 1387 },
        { "key_as_string": "2017-02-28T05:23:00.000Z", "key": 1488259380000, "doc_count": 1384 },
        …

Slide 10

Slide 10 text

Stats Aggregation over Minutes
GET metricbeat-2017.02.28/_search
{
  "size": 0,
  "aggs": {
    "1": {
      "date_histogram": { "field": "@timestamp", "interval": "minute" },
      "aggs": {
        "2": { "stats": { "field": "system.cpu.user.pct" } }
      }
    }
  }
}
Response (excerpt):
{
  "aggregations": {
    "1": {
      "buckets": [
        {
          "2": { "count": 3, "min": 0, "max": 0.148, "avg": 0.08033333333333333, "sum": 0.241 },
          "key_as_string": "2017-02-28T05:21:00.000Z",
          "key": 1488259260000,
          "doc_count": 686
        },
        {
          "2": { "count": 6, "min": 0.03, "max": 0.196, "avg": 0.11583333333333333, "sum": 0.695 },
          "key_as_string": "2017-02-28T05:22:00.000Z",
          "key": 1488259320000,
          "doc_count": 1387
        },
        …
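Nesting a metric under a date histogram amounts to "group by minute, then compute the metric per group". The sketch below imitates that in memory; the sample documents, the flat `cpu` field and the function name are assumptions. Bucket keys are epoch milliseconds, as in the response above, and `doc_count` counts every document in the bucket while `count` covers only those with the field.

```python
from collections import defaultdict
from datetime import datetime


def minute_histogram(docs, field):
    """Group documents by minute and summarise one numeric field per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        ts = datetime.fromisoformat(doc["@timestamp"])
        # Truncate to the start of the minute and express it as epoch milliseconds.
        key = int(ts.replace(second=0, microsecond=0).timestamp() * 1000)
        buckets[key].append(doc)
    result = []
    for key in sorted(buckets):
        values = [d[field] for d in buckets[key] if field in d]
        result.append({
            "key": key,
            "doc_count": len(buckets[key]),
            "count": len(values),
            "avg": sum(values) / len(values) if values else None,
        })
    return result


# Hypothetical events; the second one has no "cpu" value.
docs = [
    {"@timestamp": "2017-02-28T05:21:10+00:00", "cpu": 0.5},
    {"@timestamp": "2017-02-28T05:21:40+00:00"},
    {"@timestamp": "2017-02-28T05:22:05+00:00", "cpu": 0.25},
    {"@timestamp": "2017-02-28T05:22:30+00:00", "cpu": 0.75},
]
histogram = minute_histogram(docs, "cpu")
```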

Slide 11

Slide 11 text

Timelion

Slide 12

Slide 12 text

Pull Data from Elasticsearch
Show document counts:
.es(index=metricbeat-*)
Plot aggregated values (avg, sum, min, max or cardinality):
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct)
Apply a moving average:
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct).movingaverage(window=3)
Draw a static line:
.static(1000)
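As a rough picture of what a window of 3 does to a series, the sketch below computes a trailing moving average in plain Python. The alignment (each point averaged with the two points before it, and no value until the window is full) is an assumption made for illustration; Timelion's own alignment options are not covered here.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


smoothed = moving_average([3, 6, 9, 12], 3)
print(smoothed)  # [None, None, 6.0, 9.0]
```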

Slide 13

Slide 13 text

Styles
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct).bars()
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct).lines()
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct).points()

Slide 14

Slide 14 text

Colors
.es(index=metricbeat-*,metric=avg:system.cpu.user.pct).bars().color(lightblue)

Slide 15

Slide 15 text

Operations
Assignment:
$avg=.es(index=metricbeat-*,metric=avg:system.cpu.user.pct)
Arithmetic:
($avg).add($avg)
($avg).multiply(2)
($avg).subtract($avg)
($avg).divide($avg)
Conditional:
($avg).if(gt,0.2,$avg,null)
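The conditional reads as "for each point, if the value is greater than 0.2, keep the point, otherwise plot nothing". A point-by-point sketch in Python follows; the function name `series_if` and the sample values are hypothetical, and the operator table beyond `gt` is an assumption about which comparison names are accepted.

```python
import operator

# Comparison names mapped to Python operators (only "gt" appears on the slide).
OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def series_if(series, op, threshold, then_series, otherwise=None):
    """Per point: take the 'then' value when the comparison holds, else 'otherwise'."""
    compare = OPERATORS[op]
    return [
        then if value is not None and compare(value, threshold) else otherwise
        for value, then in zip(series, then_series)
    ]


avg = [0.1, 0.25, 0.2, 0.6]
highlighted = series_if(avg, "gt", 0.2, avg, None)
print(highlighted)  # [None, 0.25, None, 0.6]
```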

Slide 16

Slide 16 text

Metricbeat

Slide 17

Slide 17 text

Metricbeat
A lightweight shipper that you can install on your servers to periodically collect metrics from the operating system and from the services running on them.
Modules: Apache, HAProxy, MongoDB, MySQL, Nginx, PostgreSQL, Redis, System, Zookeeper
Report on Error: Lets you monitor not only the metrics but also any errors, with the full message string, that occur while the metrics are being collected.
Retrieve Raw Data: Metricbeat does not aggregate. All raw data is available in Elasticsearch for drilling down into the details, and the data can be reprocessed at any time.
Multiple Metrics in One Event: Elasticsearch can directly store and query the metrics as a nested JSON document.

Slide 18

Slide 18 text

Ready-Made Dashboards for the System Module

Slide 19

Slide 19 text

Configuration
Modules and output (metricbeat.yml):
metricbeat.modules:
- module: system
  metricsets:
    # CPU stats
    - cpu
    …
output.elasticsearch:
  hosts: ["ES_HOST:9200"]
  protocol: "https"
  username: "elastic"
  password: "changeme"
Running from the command line:
$ metricbeat -e -c metricbeat.yml

Slide 20

Slide 20 text

X-Pack: Security

Slide 21

Slide 21 text

Security Features
Access Control: Role-based access control over indices, documents and fields. Native, LDAP, AD, PKI and custom realms are supported.
Encrypting Communications: Enable SSL/TLS on endpoints and on cluster-internal communications.
IP Filtering: Allow or deny access from specific hosts and IP addresses.
Auditing Security Events: Record security events to an index and to a log file.

Slide 22

Slide 22 text


Slide 23

Slide 23 text

Built-in kibana_user Role

Slide 24

Slide 24 text

Creating a Read-only Role

Slide 25

Slide 25 text

Creating a User with the Read-only Role

Slide 26

Slide 26 text

X-Pack: Alerting

Slide 27

Slide 27 text

Your Watch
A watch can be described in natural language as: [Action] when [input] is [condition]. Check it [trigger].
E.g.
• Send an e-mail to the web admins when the number of accesses per minute exceeds 120% of the moving average. Check it every minute.
• Post to Slack channel #it-sec when the number of login failures per minute per IP is greater than 5. Check it every 5 seconds.
• Generate a report from a dashboard, always. Check it at 8am on Mondays.

Slide 28

Slide 28 text

Watch APIs
PUT _xpack/watcher/watch/my-watch
{ … }
GET _xpack/watcher/watch/my-watch
DELETE _xpack/watcher/watch/my-watch
PUT _xpack/watcher/watch/my-watch/_activate
PUT _xpack/watcher/watch/my-watch/_deactivate

Slide 29

Slide 29 text

Watch Definition
trigger: Determines how frequently the watch is checked. (hourly, daily, weekly, monthly, yearly, cron or interval)
input: Loads data into the watch payload, that is, what to alert on. Typically an Elasticsearch query. (simple, search, http, chain)
condition: Decides whether to take actions. (always, never, compare, array_compare, script)
transform: Processes the watch payload. Available at both the watch level and the action level.
actions: Specifies the actions to take when the condition is met. (email, webhook, index, logging, etc.)
metadata: Defines optional static metadata.
PUT _xpack/watcher/watch/my-watch
{
  "trigger": {…},
  "input": {…},
  "condition": {…},
  "transform": {…},
  "actions": {…},
  "metadata": {…}
}
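To see the parts side by side, the sketch below assembles a complete watch body as a Python dictionary and prints it as JSON. The trigger, input and condition are copied from the later slides in this section; the `log_hits` logging action and the `owner` metadata entry are invented for illustration.

```python
import json

watch = {
    # Check every five minutes.
    "trigger": {"schedule": {"interval": "5m"}},
    # Load the result of a search into the watch payload.
    "input": {
        "search": {
            "request": {
                "indices": ["logs"],
                "body": {"query": {"match_all": {}}},
            }
        }
    },
    # Act only when at least five documents matched.
    "condition": {"compare": {"ctx.payload.hits.total": {"gte": 5}}},
    # Hypothetical action: write a line to the Elasticsearch log.
    "actions": {
        "log_hits": {
            "logging": {"text": "{{ctx.payload.hits.total}} documents found"}
        }
    },
    # Hypothetical static metadata.
    "metadata": {"owner": "it-ops"},
}

body = json.dumps(watch, indent=2)
print(body)
```

The printed JSON is what would go in the body of `PUT _xpack/watcher/watch/my-watch`.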

Slide 30

Slide 30 text

Watch History
GET .watcher-history-*/_search
watch_id: The name of the watch that was triggered.
trigger_event: How the watch was triggered (manual or schedule), plus the watch’s scheduled time and actual trigger time.
input: The input type (http, search, or simple) and definition.
condition: The condition type (always, never, or script) and definition.
state: The state of the watch execution (execution_not_needed, executed, throttled).
result: The results of each phase of the watch execution. Shows the input payload, condition status, transform status (if defined), and actions status.

Slide 31

Slide 31 text

Watch Context
ctx.watch_id: The id of the watch that is currently executing.
ctx.execution_time: The time execution of this watch started.
ctx.trigger.triggered_time: The time this watch was triggered.
ctx.trigger.scheduled_time: The time this watch was supposed to be triggered.
ctx.metadata.*: Any metadata associated with the watch.
ctx.payload.*: The payload data loaded by the watch’s input.

Slide 32

Slide 32 text

Trigger - Interval
{
  "trigger" : {
    "schedule" : { "interval" : "5m" }
  }
}
Triggers the watch every five minutes.

Slide 33

Slide 33 text

Input - Search
{
  "input": {
    "search": {
      "request": {
        "indices": [ "logs" ],
        "body": {
          "query": { "match_all": {} }
        }
      },
      "extract": [ "hits.total" ]
    }
  }
}
Runs a query or aggregation against the local Elasticsearch cluster.

Slide 34

Slide 34 text

Watch Payload
ctx.payload.hits: All the search hits.
ctx.payload.hits.total: The number of documents that matched.
ctx.payload.hits.hits.0: The first document of the hits.
ctx.payload.hits.hits.<index>.fields.<field_name>: A field value of a particular hit.
ctx.payload.aggregations.<agg_name>.buckets.<index>.<sub_agg_name>.value: An aggregated value of a specific bucket.

Slide 35

Slide 35 text

Conditions
compare: Frequently used for comparing a value in the watch payload with a threshold. Available operators: eq, not_eq, gt, gte, lt and lte.
{ "condition" : { "compare" : { "ctx.payload.hits.total" : { "gte" : 5 } } } }
always: Forces the watch actions to be executed unless they are throttled.
{ "condition": { "always": {} } }
never: Never executes the actions.
{ "condition": { "never": {} } }
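The compare condition is essentially "look up a dotted path in the watch context and apply an operator". The sketch below imitates that with a plain dictionary standing in for the context; it is an illustration of the idea rather than how Watcher is implemented, and the helper names are hypothetical.

```python
import operator

# The operators listed on the slide, mapped to Python comparisons.
OPERATORS = {
    "eq": operator.eq,
    "not_eq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def resolve(context, path):
    """Walk a dotted path such as 'ctx.payload.hits.total'; digits index into lists."""
    current = context
    for part in path.split("."):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


def compare_condition_met(condition, context):
    """Return True when every operator under every path holds."""
    for path, checks in condition["compare"].items():
        value = resolve(context, path)
        for op, expected in checks.items():
            if not OPERATORS[op](value, expected):
                return False
    return True


condition = {"compare": {"ctx.payload.hits.total": {"gte": 5}}}
context = {"ctx": {"payload": {"hits": {"total": 7, "hits": []}}}}
print(compare_condition_met(condition, context))  # True
```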

Slide 36

Slide 36 text

Action - Email Setup
Configure an email account in elasticsearch.yml:
xpack.notification.email.account:
  gmail_account:
    profile: gmail
    smtp:
      auth: true
      starttls.enable: true
      host: smtp.gmail.com
      port: 587
      user:

Slide 37

Slide 37 text

Action - Email
{
  "actions": {
    "send_email": {
      "email": {
        "to": "@",
        "subject": "Watcher Notification",
        "body": "{{ctx.payload.hits.total}} error logs found",
        "attachments": {
          "dashboard.pdf": {
            "reporting": {
              "url": "http://example.org:5601/api/reporting/generate/dashboard/Error-Monitoring"
            }
          }
        }
      }
    }
  }
}
The subject and the body can contain static text mixed with values from the watch context, written as Mustache templates. Attachments of type http, data and reporting are supported.
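The effect of the Mustache placeholder in the body can be imitated in a few lines. The sketch below only substitutes `{{dotted.path}}` variables against a dictionary; it is not a full Mustache implementation (no sections, no escaping), and the `render` helper is hypothetical.

```python
import re


def render(template, context):
    """Replace each {{dotted.path}} with the value found in the context."""
    def lookup(match):
        current = context
        for part in match.group(1).strip().split("."):
            current = current[int(part)] if isinstance(current, list) else current[part]
        return str(current)

    return re.sub(r"\{\{([^{}]+)\}\}", lookup, template)


context = {"ctx": {"payload": {"hits": {"total": 12}}}}
message = render("{{ctx.payload.hits.total}} error logs found", context)
print(message)  # 12 error logs found
```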

Slide 38

Slide 38 text

Action - Webhook
"actions" : {
  "create_github_issue" : {
    "webhook" : {
      "method" : "POST",
      "url" : "https://api.github.com/repos///issues",
      "body" : "{ \"title\": \"Found errors in 'contact.html'\", \"body\": \"Found {{ctx.payload.hits.total}} errors in the last 5 minutes\", \"assignee\": \"web-admin\", \"labels\": [ \"bug\", \"sev2\" ] }",
      "auth" : {
        "basic" : { "username" : "", "password" : "" }
      }
    }
  }
}
Performs an HTTP/HTTPS request to any third-party web service.

Slide 39

Slide 39 text

Action - Index Single Document
"actions" : {
  "index_payload" : {
    "index" : {
      "index" : "my-index",
      "doc_type" : "my-type",
      "execution_time_field": "@timestamp"
    }
  }
}
Indexes ctx.payload into an Elasticsearch index as a single document.

Slide 40

Slide 40 text

Action - Time Based Throttling
Action-level throttling:
"actions" : {
  "email_administrator" : {
    "throttle_period": "15m",
    "email" : { …
Watch-level throttling:
"throttle_period" : "15m",
"actions" : {
  "email_administrator" : {
    "email" : { …
  "notify_pager" : { …
Throttling is available at both the watch level and the action level. An action is not taken again while it is throttled (default throttle period: 5 seconds).
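A throttle period can be pictured as a simple "not again until this much time has passed" check. The sketch below is a simplified model of that idea only; it ignores acknowledgement and other details of Watcher, treats the boundary case (exactly 15 minutes later) as allowed by assumption, and uses a hypothetical `ThrottledAction` class.

```python
from datetime import datetime, timedelta


class ThrottledAction:
    """Remembers the last execution and refuses to run again within the period."""

    def __init__(self, throttle_period):
        self.throttle_period = throttle_period
        self.last_executed = None

    def try_execute(self, now):
        recently_run = (
            self.last_executed is not None
            and now - self.last_executed < self.throttle_period
        )
        if recently_run:
            return "throttled"
        self.last_executed = now
        return "executed"


action = ThrottledAction(timedelta(minutes=15))
start = datetime(2017, 3, 23, 8, 0)
outcomes = [
    action.try_execute(start),
    action.try_execute(start + timedelta(minutes=5)),
    action.try_execute(start + timedelta(minutes=15)),
]
print(outcomes)  # ['executed', 'throttled', 'executed']
```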

Slide 41

Slide 41 text

Execute Watch API
Forces execution of a stored watch outside of its triggering logic:
PUT _xpack/watcher/watch/my-watch/_execute
Executes a watch inline, without registering it, for debugging:
PUT _xpack/watcher/watch/_execute
{
  "watch" : {
    "trigger": { … },
    "input": { … },
    "condition": { … },
    "actions": { … },
    "metadata": { … },
    "throttle_period": { … }
  }
}

Slide 42

Slide 42 text

Alerting Idea - Minute by Minute Roll-up
Strategy: Run a stats aggregation on a specific field every minute and index the result.
"input": {
  "search": {
    "request": {
      "indices": ["flight-track-*"],
      "body": {
        "query": { "range": { "@timestamp": { "gte": "now-1m" } } },
        "aggs": { "1": { "stats": { "field": "speed" } } …
"actions": {
  "index_payload": {
    "transform": {
      "script": { "lang": "painless", "inline": "return ctx.payload.aggregations.1" }
    },
    "index": {
      "index": "rollup-speed",
      "doc_type": "metric",
      "execution_time_field": "@timestamp"

Slide 43

Slide 43 text

Alerting Idea - Alert with Moving Average Aggregation
Strategy: Run a moving_avg aggregation on the target index, then compare the actual value in the last bucket with the moving average for that bucket. This example covers 30 days with "interval": "day", so 30 buckets are returned and the last one is bucket 29.
{
  "condition": {
    "script": {
      "lang": "painless",
      "inline": "return ctx.payload.aggregations.agg_day.buckets.29.agg_bytes.value > ctx.payload.aggregations.agg_day.buckets.29.agg_moving_avg.value * params.gap",
      "params": { "gap": 1.2 }
    }
  }
}
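The script condition boils down to one comparison on the last bucket. The sketch below runs that comparison on an invented 30-bucket payload; the bucket values are made up, and only the key names `agg_bytes`, `agg_moving_avg` and the `gap` parameter come from the slide.

```python
def exceeds_moving_average(buckets, gap):
    """True when the last bucket's actual value is above gap times its moving average."""
    last = buckets[-1]
    return last["agg_bytes"]["value"] > last["agg_moving_avg"]["value"] * gap


# Hypothetical payload: 29 quiet days followed by a spike on the last day.
buckets = [
    {"agg_bytes": {"value": 1000.0}, "agg_moving_avg": {"value": 1000.0}}
    for _ in range(29)
]
buckets.append({"agg_bytes": {"value": 1300.0}, "agg_moving_avg": {"value": 1000.0}})

alert = exceeds_moving_average(buckets, gap=1.2)
print(alert)  # True, because 1300 is above 1.2 times 1000
```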

Slide 44

Slide 44 text

Alerting Idea - Measure Time Differences
Strategy: Run a terms aggregation on a field that identifies a series of related events (here, the session id) and calculate the maximum timestamp minus the minimum timestamp for each bucket.
"aggs": {
  "agg_session_id": {
    "terms": { "field": "session_id.keyword" },
    "aggs": {
      "agg_user": { "terms": { "field": "user.keyword" } },
      "agg_start": { "min": { "field": "@timestamp" } },
      "agg_end": { "max": { "field": "@timestamp" } },
      "agg_duration": {
        "bucket_script": {
          "buckets_path": { "min": "agg_start", "max": "agg_end" },
          "script": { "lang": "painless", "inline": "return params.max - params.min" }
        }
      }
    }
  }
}
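The same "max minus min per session" calculation can be checked on a handful of events in memory. In the sketch below the sample events and the function name are invented, and the result is expressed in milliseconds on the assumption that the min and max of a date field are epoch-millisecond values.

```python
from collections import defaultdict
from datetime import datetime


def session_durations_ms(events):
    """Group events by session_id and return last minus first timestamp in milliseconds."""
    timestamps = defaultdict(list)
    for event in events:
        timestamps[event["session_id"]].append(
            datetime.fromisoformat(event["@timestamp"])
        )
    return {
        session_id: int((max(ts) - min(ts)).total_seconds() * 1000)
        for session_id, ts in timestamps.items()
    }


# Hypothetical events: session "a" spans 2.5 minutes, session "b" has a single event.
events = [
    {"session_id": "a", "@timestamp": "2017-03-23T10:00:00"},
    {"session_id": "b", "@timestamp": "2017-03-23T10:01:00"},
    {"session_id": "a", "@timestamp": "2017-03-23T10:02:30"},
]
durations = session_durations_ms(events)
print(durations)  # {'a': 150000, 'b': 0}
```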

Slide 45

Slide 45 text

Restrictions on Elastic Cloud
• When the Email action is taken, the email is delivered by Elastic Cloud itself, so you cannot use your own SMTP server.
• The default throttle period is not configurable. You can still specify a throttle period per watch or per action.