ElasticSearch for Logging

Brad Lhotsky
September 20, 2013

A brief overview of the landscape of logging data with ElasticSearch, followed by a number of lessons learned. By the end of the talk you should want to use ElasticSearch for logging and know enough to avoid shooting yourself in the foot.

Transcript

  1. ElasticSearch for Logging: One Man's Sordid Journey of Discovery. Brad Lhotsky http://twitter.com/reyjrar http://github.com/reyjrar
  2. ‣Agile Development (for Structure!) ‣Test everything (mostly in production) ‣Failure is encouraged ‣IT Budget for taking the site down ‣Amazing Business Monitoring ‣KPIs for IT tied to business metrics ‣ElasticSearch was successful for Front-End
  3. bouncing logs into ElasticSearch

  4. None
  5. LogStash ‣Many Input / Filter / Output Plugins ‣Thriving Community ‣Daily Index Layout ‣Front-end? Not so much.
  6. Graylog2 ‣Pluggable Event Stream ‣Excellent Front-end ‣Index Layout based on number of documents
  7. Daily Schema (logstash-YYYY.MM.DD): ‣Dealing with "days" makes sense ‣Maintenance Operations Easy: Delete, Optimize, Close, Open ‣Results in a higher number of shards. Capacity Schema (Graylog2): ‣Which indexes do I search for 1 week of data? ‣Maintenance Operations Expensive ‣Potentially lower number of shards and even index sizes
  8. Shameful Self Plug: https://github.com/reyjrar/es-utils Set of utilities for managing data in daily index schemas
  9. Roll Your Own! Perl: ElasticSearch.pm, Python: pyes, Ruby: tire, JavaScript: Elastic.js http://www.elasticsearch.org/guide/clients/
  10. You want pretty pictures?

  11. None
  12. None
  13. ‣Composable dashboards ‣Create incident specific dashboards while investigating the incident ‣Leverage the speed of ElasticSearch ‣Melt your cluster!
  14. None
  15. None
  16. None
  17. ElasticSearch is Magic

  18. ElasticSearch Black Magic

  19. index.auto_expand_replicas ‣Clustering order of operations issue ‣Can cause enormous data transfers between nodes leaving and entering a cluster ‣Defaults to true ‣You should set it to false
  20. Understanding Shard Allocation Tales from the script gone wrong.

  21. None
  22. None
  23. None
  24. None
  25. None
  26. None
  27. For planned maintenance disable reallocation! curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : true } }'
  28. Re-enable when your node is back. curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : false } }'
  29. A Perl programmer learns about Java memory management. ‣There are no query killers! ‣Memory is limited. ‣Aggressive caching by default consumes the heap. ‣This is normally good ‣Except when it's not ‣Thread pools are malleable by default, and maintaining buffers for them can also cost memory.
  30. Thanks to Jason for learning me the Graphites! http://goo.gl/XS0wzG

  31. Prevent Some Bad Queries: index.cache.filter.max_size, index.cache.filter.expire, indices.fielddata.cache.size, indices.fielddata.cache.expire

  32. Thread Pool Management (less relevant since 0.90.0): threadpool: { index: { type: fixed, size: 30, queue_size: 1000, reject_policy: caller } }
  33. A Security guy asks about Access Control

  34. There are no solutions, aside from firewalls. ‣If you can search, you can search any data in the cluster. ‣If you can search, you can modify or delete data from that index.
  35. ElasticSearch is not a System of Record ‣Not legit for Legal Uses ‣That's O.K., we can handle that use case cheaply.
  36. ElasticSearch, Graphite of Logging? ‣Composable investigations with Kibana ‣Easy access to everything for everyone ‣Simple API (REST) and data format (JSON) ‣We can get pretty pictures from it! ‣Encourages interaction with data
  37. We're Hiring! Developers, System Administrators, Analysts, Designers! booking.com/jobs
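
Slide 7 asks which indexes to search for one week of data. Under the daily schema (logstash-YYYY.MM.DD) the answer can be computed from the date alone. The following is a minimal sketch of that idea, not code from the talk: the helper names `daily_indices` and `search_path` are hypothetical, the `logstash-` prefix is taken from the slide, and the end date is assumed to be expressed in the same timezone the indexer uses when naming its indexes.

```python
from datetime import date, timedelta


def daily_indices(end, days=7, prefix="logstash-"):
    """Return daily index names for the `days` days ending on `end`, oldest first."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = end - timedelta(days=days - 1)
    return [
        prefix + (start + timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days)
    ]


def search_path(indices):
    """Join index names into the comma-separated form of a multi-index search path."""
    return "/" + ",".join(indices) + "/_search"


if __name__ == "__main__":
    # One week of indexes ending on the date of the talk.
    week = daily_indices(date(2013, 9, 20))
    print(search_path(week))
```

The resulting path can be appended to the cluster address (for example the localhost:9200 used on slides 27 and 28) to search only the indexes that cover the requested week. With a capacity-based schema there is no such shortcut, which is the trade-off the slide points at.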