Slide 1

Slide 1 text

411: Automated alerts on Elasticsearch
Kai Zhong
Security Engineer @ Etsy
@sixhundredns

Slide 2

Slide 2 text

•  LAMP stack
•  Continuous Deployment
   –  Tests
   –  Feature flags
   –  Logs

Slide 3

Slide 3 text

Main Elasticsearch cluster
•  Types
   –  Access logs
   –  Application logs
   –  Error logs
•  3,000,000,000 lines/day
•  30 day log retention
•  211TB data

LOG ALL THE THINGS

Slide 4

Slide 4 text

•  Heavily depend on alerting
•  Moved to ES in mid 2014
•  We wanted
   –  Concise query syntax
   –  Automatic query scheduling
•  No good options at the time

ALERT ON ALL THE THINGS

Slide 5

Slide 5 text

Search scheduling
Alert management

Slide 6

Slide 6 text

Scheduling

Slide 7

Slide 7 text

Searches: Automatically query a data source and return information

Types
•  Ping
   –  Check the reachability of a host
•  HTTP
   –  Check the response code of a URL
•  Logstash
   –  Retrieve results from Elasticsearch

Slide 8

Slide 8 text

Filters: Remove matching Alerts

Types
•  Regex
   –  Filter Alerts matching a regex
•  Dedupe
   –  Filter Alerts that have been seen recently
•  Throttle
   –  Filter Alerts that occur frequently
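
As a rough illustration of the three filter types, the sketch below models each filter as a function that takes a list of alerts and returns the ones that survive. The `Alert` class, its fields, and the `window` and `limit` parameters are hypothetical and are not taken from 411's code; in particular, the throttle here simply caps the number of alerts per batch, which is only one possible reading of "occur frequently".

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; 411's real schema may differ.
    content: str
    timestamp: float  # seconds since epoch


def regex_filter(alerts, pattern):
    """Drop alerts whose content matches the regex."""
    compiled = re.compile(pattern)
    return [a for a in alerts if not compiled.search(a.content)]


def dedupe_filter(alerts, seen, window):
    """Drop alerts whose content was let through within the last `window` seconds.

    `seen` maps content -> timestamp of the last alert that passed the filter.
    """
    kept = []
    for a in alerts:
        last = seen.get(a.content)
        if last is None or a.timestamp - last > window:
            kept.append(a)
            seen[a.content] = a.timestamp
    return kept


def throttle_filter(alerts, limit):
    """Let at most `limit` alerts through per batch (assumed semantics)."""
    return alerts[:limit]
```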

Slide 9

Slide 9 text

Targets: Send Alerts to external services

Types
•  WebHook
   –  Send Alerts to an HTTP endpoint
•  Notification
   –  Send Alerts to an (extra) email address

Slide 10

Slide 10 text

Alert Pipeline

Search -> (Alerts) -> Filters -> (Alerts) -> Targets

./search | filter1 | filter2 | tee target1 target2
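
The shell analogy on this slide can be sketched in a few lines of Python: a search produces alerts, each filter is applied in order, and every target receives the surviving alerts (the `tee` step). The names `run_pipeline`, `Filter`, and `Target`, and the use of plain dicts for alerts, are assumptions made for illustration, not 411's actual interfaces.

```python
from typing import Callable, Iterable, List

Alert = dict  # hypothetical: an alert is just a dict in this sketch
Filter = Callable[[List[Alert]], List[Alert]]
Target = Callable[[List[Alert]], None]


def run_pipeline(search: Callable[[], List[Alert]],
                 filters: Iterable[Filter],
                 targets: Iterable[Target]) -> List[Alert]:
    """Run search | filter1 | filter2 | tee target1 target2."""
    alerts = search()
    for apply_filter in filters:
        alerts = apply_filter(alerts)
    for send in targets:
        # Each target gets its own copy so targets cannot affect each other.
        send(list(alerts))
    return alerts
```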

Slide 11

Slide 11 text

Searches

Fields
•  Query
   –  The query to execute
•  Frequency
   –  How often to schedule the query
•  Assignee
   –  User/Group responsible for these Alerts
•  Priority
   –  How important these Alerts are
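
One way to picture how these fields drive scheduling is the sketch below. It assumes the frequency is expressed in minutes and adds a hypothetical `last_run` bookkeeping field; neither detail comes from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Search:
    name: str
    query: str
    frequency: int         # assumed unit: minutes between runs
    assignee: str
    priority: str          # "high", "medium" or "low"
    last_run: float = 0.0  # hypothetical bookkeeping field (epoch seconds)


def due_searches(searches: List[Search], now: float) -> List[Search]:
    """Return the searches whose frequency interval has elapsed."""
    return [s for s in searches if now - s.last_run >= s.frequency * 60]
```

A scheduler loop would call `due_searches` periodically, execute each returned search, and update its `last_run`.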

Slide 12

Slide 12 text

Alert Emails

Priority   Email Frequency
High       Immediately
Medium     Hourly Rollup
Low        Never
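
The priority table maps naturally onto a small routing function. The sketch below is an illustration only: the tuple layout and the function name are assumptions, and actual delivery (sending mail, running the hourly job) is left out.

```python
from collections import defaultdict


def route_alerts(alerts):
    """Split alerts by email policy.

    `alerts` is an iterable of (priority, assignee, message) tuples
    (a hypothetical shape). Returns (immediate, rollup):
      immediate: list of (assignee, message) to email right away
      rollup:    dict of assignee -> messages for the hourly rollup
    Low-priority alerts are never emailed.
    """
    immediate = []
    rollup = defaultdict(list)
    for priority, assignee, message in alerts:
        if priority == "high":
            immediate.append((assignee, message))
        elif priority == "medium":
            rollup[assignee].append(message)
    return immediate, dict(rollup)
```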

Slide 13

Slide 13 text

Frontend

Slide 14

Slide 14 text

Dashboard
•  Summary of active alerts
•  Historical alert information

Slide 15

Slide 15 text

User management
•  Manage users
   –  Create
   –  Modify
   –  Delete

Slide 16

Slide 16 text

Group management
•  Manage groups
   –  Create
   –  Modify
   –  Delete

Slide 17

Slide 17 text

Searches page
•  Manage searches
   –  Create
   –  Enable/Disable
   –  View Health

Slide 18

Slide 18 text

Search page
•  Manage a search
   –  Modify
   –  Delete
   –  Test
   –  Execute
   –  Configure Filters/Targets
•  View statistics
•  Changelog

Slide 19

Slide 19 text

Alerts: Are actionable events

Actions
•  Escalation
   –  Promotes an Alert to high priority
•  Assignment
   –  Sets a new Assignee for an Alert
•  Resolution
   –  Marks an Alert as finished
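
The three actions can be pictured as methods on an alert record, each of which also appends to a changelog. The class below is a hypothetical model for illustration; the field names and state strings are assumptions rather than 411's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    message: str
    priority: str = "medium"
    assignee: str = "unassigned"
    state: str = "new"
    changelog: List[str] = field(default_factory=list)

    def escalate(self) -> None:
        """Escalation: promote the alert to high priority."""
        self.priority = "high"
        self.changelog.append("escalated")

    def assign(self, who: str) -> None:
        """Assignment: set a new assignee."""
        self.assignee = who
        self.changelog.append(f"assigned to {who}")

    def resolve(self) -> None:
        """Resolution: mark the alert as finished."""
        self.state = "resolved"
        self.changelog.append("resolved")
```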

Slide 20

Slide 20 text

Alerts page
•  Filter Alerts
•  Manage Alerts
   –  Escalate/De-escalate
   –  Assign
   –  Mark New/In Progress/Resolved
   –  Add Note

Slide 21

Slide 21 text

Alert page
•  Manage Alert
   –  Escalate/De-escalate
   –  Assign
   –  Mark New/In Progress/Resolved
   –  Add Note
•  View changelog

Slide 22

Slide 22 text

ES_Proxy
Pipelined Lucene shorthand

Slide 23

Slide 23 text

Features       Command Syntax
Joins          * | join source:src_ip target:dst_ip
Aggregations   * | agg:terms field:src_ip | agg:terms field:user_id
Transactions   * | trans field:request_uuid
Lists          src_ip:@internal_ips
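
To make the pipelined syntax concrete, the sketch below splits a query into its Lucene part and its commands, then turns chained `agg:terms` stages into an Elasticsearch request body with nested terms aggregations. This is not ES_Proxy's parser: the function names are hypothetical, the split on `|` is naive (it ignores quoting), and mapping chained aggregations to nested sub-aggregations is an assumption about the intended meaning.

```python
def parse_pipeline(query):
    """Split 'lucene | cmd k:v | cmd k:v' into (lucene, [(cmd, {k: v}), ...])."""
    stages = [stage.strip() for stage in query.split("|")]
    lucene, commands = stages[0], []
    for stage in stages[1:]:
        name, *args = stage.split()
        params = dict(arg.split(":", 1) for arg in args)
        commands.append((name, params))
    return lucene, commands


def to_es_body(lucene, commands):
    """Build a search body; only agg:terms is handled in this sketch."""
    body = {"query": {"query_string": {"query": lucene}}}
    node = body
    for name, params in commands:
        if name != "agg:terms":
            raise ValueError(f"unsupported command in this sketch: {name}")
        field = params["field"]
        # Each later aggregation nests under the previous one.
        node["aggs"] = {field: {"terms": {"field": field}}}
        node = node["aggs"][field]
    return body
```

For the aggregation example on the slide, `to_es_body(*parse_pipeline("* | agg:terms field:src_ip | agg:terms field:user_id"))` yields a `query_string` query for `*` with a `user_id` terms aggregation nested inside a `src_ip` terms aggregation.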

Slide 24

Slide 24 text

Logstash Search page

Fields:
•  Time Range
   –  How far back to query
•  Result Type
   –  The type of data to return
•  Result Filter
   –  Only return results if the result set matches a condition
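
A minimal sketch of two of these fields follows. The time range is expressed as an Elasticsearch range clause on `@timestamp` (the default Logstash timestamp field) using date math, and the result filter is modelled as a comparison on the number of hits. The function names, the minute-based unit, and the count-based condition are assumptions for illustration.

```python
import operator

_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def time_range_clause(minutes):
    """Restrict a query to the last `minutes` minutes."""
    return {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}


def apply_result_filter(hits, op, threshold):
    """Return `hits` only if len(hits) <op> threshold holds, else an empty list."""
    return hits if _OPS[op](len(hits), threshold) else []
```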

Slide 25

Slide 25 text

Demo

Slide 26

Slide 26 text

Search Ideas
•  Spike in HTTP 500 responses
•  POSTs with a referrer from another site
•  Odd HTTP verbs
•  Googlebot useragent from non-Google IP
•  Requests from known bad IPs
•  Sign-ins from unusual locations
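
As one hedged example, the first idea (a spike in HTTP 500 responses) could be approximated by comparing the latest count against a recent average. The function, its `factor` and `min_count` defaults, and the idea of per-interval counts are all assumptions; they are not taken from the talk.

```python
def is_spike(current, history, factor=3.0, min_count=10):
    """Flag `current` if it is large and well above the recent average.

    `history` holds counts of HTTP 500 responses from earlier intervals.
    """
    if current < min_count:
        return False
    if not history:
        return True
    baseline = sum(history) / len(history)
    return current > factor * baseline
```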

Slide 27

Slide 27 text

Thanks
•  Emily Sommer
•  Ken Lee
•  Avleen Vig
•  Security Operations

Slide 29

Slide 29 text

Questions?
https://github.com/Etsy/411
Kai Zhong
@sixhundredns
