What's the 411? Building Alerts on Elasticsearch at Etsy

Elastic Co
February 18, 2016

When working with a web application, you may find yourself drowning in logs. Some of this data is vital for debugging your application, and some can be a rich source of data for alerting. This talk will cover how Etsy constructs and responds to alert queries, as well as offer up ideas for additional types of alerts.

Transcript

  1. Kai Zhong
    Security Engineer @ Etsy
    @sixhundredns
    411: Automated alerts on Elasticsearch

  2. •  LAMP stack
    •  Continuous Deployment
    –  Tests
    –  Feature flags
    –  Logs

  3. Main Elasticsearch cluster
    •  Types
    –  Access logs
    –  Application logs
    –  Error logs
    •  3,000,000,000 lines/day
    •  30 day log retention
    •  211TB data
    LOG ALL THE THINGS

  4. •  Heavily depend on alerting
    •  Moved to ES in mid 2014
    •  We wanted
    –  Concise query syntax
    –  Automatic query scheduling
    •  No good options at the time
    ALERT ON ALL THE THINGS

  5. Search scheduling
    Alert management

  6. Searches:
    Automatically query a data source and return information
    Types
    •  Ping
    –  Check the reachability of a host
    •  HTTP
    –  Check the response code of a URL
    •  Logstash
    –  Retrieve results from Elasticsearch

  7. Filters:
    Remove matching Alerts
    Types
    •  Regex
    –  Filter Alerts matching a regex
    •  Dedupe
    –  Filter Alerts that have been seen recently
    •  Throttle
    –  Filter Alerts that occur frequently
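
The three filter types above can be sketched as plain functions over a list of alerts. This is only an illustration, not 411's implementation: the alert shape (a dict with a `content` key), the function names, the time-window dedupe, and the "keep at most N per run" reading of Throttle are all assumptions.

```python
import re
from typing import Dict, Iterable, List

Alert = Dict[str, str]  # hypothetical alert shape: {"content": "..."}


def regex_filter(alerts: Iterable[Alert], pattern: str) -> List[Alert]:
    """Remove alerts whose content matches the regex."""
    rx = re.compile(pattern)
    return [a for a in alerts if not rx.search(a["content"])]


def dedupe_filter(alerts: Iterable[Alert], last_seen: Dict[str, float],
                  now: float, window: float) -> List[Alert]:
    """Remove alerts whose content was seen within the last `window` seconds."""
    kept = []
    for a in alerts:
        key = a["content"]
        prev = last_seen.get(key)
        if prev is None or now - prev > window:
            kept.append(a)
        last_seen[key] = now  # remember the most recent sighting
    return kept


def throttle_filter(alerts: Iterable[Alert], limit: int) -> List[Alert]:
    """Keep at most `limit` alerts per run (one possible reading of Throttle)."""
    return list(alerts)[:limit]
```

Each function returns the alerts that survive, so the filters can be chained in any order.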

  8. Targets:
    Send Alerts to external services
    Types
    •  WebHook
    –  Send Alerts to an HTTP endpoint
    •  Notification
    –  Send Alerts to an (extra) email address

  9. Alert Pipeline
    Search -> Alerts -> Filters -> Alerts -> Targets
    ./search  |  filter1  |  filter2  |  tee  target1  target2
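
The shell-style pipeline on this slide can be mirrored in a few lines of Python. This is a rough sketch under assumed types (searches, filters, and targets are modelled as plain callables, and `run_pipeline` is a hypothetical name), not how 411 is actually structured.

```python
from typing import Callable, Iterable, List

Alert = dict
Search = Callable[[], List[Alert]]             # produces alerts
Filter = Callable[[List[Alert]], List[Alert]]  # removes matching alerts
Target = Callable[[List[Alert]], None]         # sends alerts elsewhere


def run_pipeline(search: Search, filters: Iterable[Filter],
                 targets: Iterable[Target]) -> List[Alert]:
    """Run search | filter1 | filter2 | tee target1 target2."""
    alerts = search()
    for apply_filter in filters:
        alerts = apply_filter(alerts)
    for send in targets:
        send(list(alerts))  # like tee: every target gets its own copy
    return alerts


# Usage with stand-in stages.
sent_to_webhook: List[Alert] = []
remaining = run_pipeline(
    search=lambda: [{"content": "500 on /cart"}, {"content": "healthcheck ok"}],
    filters=[lambda alerts: [a for a in alerts if "healthcheck" not in a["content"]]],
    targets=[sent_to_webhook.extend],
)
```

Keeping every stage a function of the alert list is what lets filters and targets be configured per search without changing the runner.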

  10. Searches
    Fields
    •  Query
    –  The query to execute
    •  Frequency
    –  How often to schedule the query
    •  Assignee
    –  User/Group responsible for these Alerts
    •  Priority
    –  How important these Alerts are
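
As a small illustration of how these four fields might fit together, the sketch below models a search as a record and uses the frequency to decide whether it is due to run. The class name, field types, and the minutes-based scheduling check are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SearchConfig:
    query: str              # the query to execute
    frequency_minutes: int  # how often to schedule the query
    assignee: str           # user or group responsible for the alerts
    priority: str           # "high", "medium", or "low"


def is_due(search: SearchConfig, minutes_since_last_run: int) -> bool:
    """A search is due once its configured frequency has elapsed."""
    return minutes_since_last_run >= search.frequency_minutes


# Hypothetical search; the query string is a placeholder.
errors = SearchConfig(query="status:500", frequency_minutes=5,
                      assignee="ops", priority="high")
```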

  11. Alert Emails
    Priority   Email Frequency
    High       Immediately
    Medium     Hourly Rollup
    Low        Never
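
The priority-to-email mapping can be expressed as a lookup table. The sketch below assumes each alert is a dict with a `priority` key and that the policy labels (`immediate`, `hourly_rollup`, `never`) are illustrative names, not identifiers from 411.

```python
from typing import Dict, List

# Mapping taken from the slide; the label strings are hypothetical.
EMAIL_POLICY = {"high": "immediate", "medium": "hourly_rollup", "low": "never"}


def plan_emails(alerts: List[dict]) -> Dict[str, List[dict]]:
    """Group alerts by how their email should be delivered."""
    plan: Dict[str, List[dict]] = {"immediate": [], "hourly_rollup": []}
    for alert in alerts:
        policy = EMAIL_POLICY[alert["priority"]]
        if policy != "never":  # low-priority alerts never generate email
            plan[policy].append(alert)
    return plan
```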

  12. Dashboard
    •  Summary of active alerts
    •  Historical alert information

  13. User management
    •  Manage users
    –  Create
    –  Modify
    –  Delete

  14. Group management
    •  Manage groups
    –  Create
    –  Modify
    –  Delete

  15. Searches page
    •  Manage searches
    –  Create
    –  Enable/Disable
    –  View Health

  16. Search page
    •  Manage a search
    –  Modify
    –  Delete
    –  Test
    –  Execute
    –  Configure Filters/Targets
    •  View statistics
    •  Changelog

  17. Alerts:
    Are actionable events
    Actions
    •  Escalation
    –  Promotes an Alert to high priority
    •  Assignment
    –  Sets a new Assignee for an Alert
    •  Resolution
    –  Marks an Alert as finished
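
The three actions can be pictured as state changes on an alert record. This is a minimal sketch with assumed field names and state strings; the changelog list is included only because later slides mention one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    content: str
    assignee: str
    priority: str = "low"
    state: str = "new"
    changelog: List[str] = field(default_factory=list)

    def escalate(self) -> None:
        """Escalation: promote the alert to high priority."""
        self.priority = "high"
        self.changelog.append("escalated")

    def assign(self, assignee: str) -> None:
        """Assignment: set a new assignee."""
        self.assignee = assignee
        self.changelog.append(f"assigned to {assignee}")

    def resolve(self) -> None:
        """Resolution: mark the alert as finished."""
        self.state = "resolved"
        self.changelog.append("resolved")
```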

  18. Alerts page
    •  Filter Alerts
    •  Manage Alerts
    –  Escalate/De-escalate
    –  Assign
    –  Mark New/In Progress/Resolved
    –  Add Note

  19. Alert page
    •  Manage Alert
    –  Escalate/De-escalate
    –  Assign
    –  Mark New/In Progress/Resolved
    –  Add Note
    •  View changelog

  20. ES_Proxy
    Pipelined Lucene shorthand

  21. Command Syntax
    Features
    •  Joins
    –  *  |  join  source:src_ip  target:dst_ip
    •  Aggregations
    –  *  |  agg:terms  field:src_ip  |  agg:terms  field:user_id
    •  Transactions
    –  *  |  trans  field:request_uuid
    •  Lists
    –  src_ip:@internal_ips
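
To make the pipelined shape of these queries concrete, the sketch below splits a query string into stages and tokens. It is a toy tokenizer written for this transcript, not the actual parser: `split_pipeline` is a hypothetical name, and quoting or escaping of `|` inside a Lucene query is deliberately ignored.

```python
from typing import List


def split_pipeline(query: str) -> List[List[str]]:
    """Split a pipelined query on '|' into stages, then on whitespace into tokens."""
    return [stage.split() for stage in query.split("|") if stage.strip()]


stages = split_pipeline("*  |  agg:terms  field:src_ip  |  agg:terms  field:user_id")
```

The first stage is the Lucene query itself (`*` here), and each later stage is a command followed by its `key:value` arguments.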

  22. Logstash Search page
    Fields:
    •  Time Range
    –  How far back to query
    •  Result Type
    –  The type of data to return
    •  Result Filter
    –  Only return results if the result set matches a condition
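
One way to picture the Result Filter is as a comparison applied to the size of the result set. The slide does not say which conditions are supported, so the count-based comparison, the operator table, and the function name below are assumptions.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical set of supported comparisons.
OPS: Dict[str, Callable[[int, int], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def apply_result_filter(results: List[dict], op: str, threshold: int) -> List[dict]:
    """Return the results only if their count satisfies the condition."""
    if OPS[op](len(results), threshold):
        return results
    return []
```

With a condition such as `> 100`, a search only produces alerts when the number of matching log lines crosses the threshold.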

  23. Search Ideas
    •  Spike in HTTP 500 responses
    •  POSTs with a referrer from another site
    •  Odd HTTP verbs
    •  Googlebot useragent from non-Google IP
    •  Requests from known bad IPs
    •  Sign-ins from unusual locations
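
Two of these ideas are simple enough to sketch directly. The code below is illustrative only: the log-line shape (a dict with a `verb` key), the list of "common" verbs, and the spike thresholds are assumptions, and in practice the counts would come from an Elasticsearch query rather than being passed in by hand.

```python
from typing import Iterable, List

# Hypothetical allowlist; anything else counts as an odd verb.
COMMON_VERBS = {"GET", "POST", "HEAD", "PUT", "DELETE", "OPTIONS", "PATCH"}


def odd_verb_lines(log_lines: Iterable[dict]) -> List[dict]:
    """Return access-log lines whose HTTP verb is not in the allowlist."""
    return [line for line in log_lines if line["verb"].upper() not in COMMON_VERBS]


def is_500_spike(current_count: int, baseline_count: int,
                 factor: float = 3.0, floor: int = 10) -> bool:
    """Flag a spike when the current window has at least `floor` HTTP 500s
    and more than `factor` times the baseline window's count."""
    return current_count >= floor and current_count > factor * baseline_count
```

The `floor` keeps quiet periods from alerting on tiny absolute numbers, such as going from zero errors to two.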

  24. Thanks
    Emily Sommer
    Ken Lee
    Avleen Vig
    Security
    Operations

  25. Questions?
    https://github.com/Etsy/411
    Kai Zhong
    [email protected]
    @sixhundredns
