A brief overview of the landscape of logging data with ElasticSearch, followed by a number of lessons learned. By the end of the talk you should want to use ElasticSearch for logging and know enough to avoid shooting yourself in the foot.
‣Agile Development (for Structure!)
‣Test everything (mostly in production)
‣Failure is encouraged
‣IT Budget for taking the site down
‣Amazing Business Monitoring
‣KPIs for IT tied to business metrics
‣ElasticSearch was successful for the Front-End
Daily Schema (logstash-YYYY.MM.DD)
‣Dealing with "days" makes sense
‣Maintenance operations easy: delete, optimize, close, open
‣Results in a higher number of shards

Capacity Schema (Graylog2)
‣Which indexes do I search for 1 week of data? (see the sketch below)
‣Maintenance operations expensive
‣Potentially a lower number of shards and more even index sizes
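One consequence of the daily schema is that the "1 week of data" question answers itself, because index names are a pure function of the date. A minimal sketch in Python; the "logstash-" prefix and the seven-day window are just illustrative assumptions:

    # Sketch: expand a one-week window into daily index names under the
    # logstash-YYYY.MM.DD schema.
    from datetime import date, timedelta

    def daily_indices(end, days=7, prefix="logstash-"):
        """Index names for the `days` days ending at `end` (inclusive)."""
        return [prefix + (end - timedelta(days=i)).strftime("%Y.%m.%d")
                for i in range(days)][::-1]

    # Search one week of data by listing the indices in the URL, e.g.
    #   GET /logstash-2014.01.01,logstash-2014.01.02,.../_search
    print(",".join(daily_indices(date.today())))

Under a capacity schema there is no such mapping from dates to indices, which is why the question is painful there.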
index.auto_expand_replicas
‣Clustering order-of-operations issue
‣Can cause enormous data transfers between nodes leaving and entering a cluster
‣Defaults to true
‣You should set it to false
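A hedged sketch of flipping the setting on a live index via the index settings API; the host and the index name are placeholders:

    # Sketch, assuming a node at localhost:9200 and a placeholder index:
    # disable auto_expand_replicas through the index settings endpoint.
    import json
    import requests

    resp = requests.put(
        "http://localhost:9200/logstash-2014.01.01/_settings",
        data=json.dumps({"index": {"auto_expand_replicas": False}}),
        headers={"Content-Type": "application/json"},
    )
    print(resp.status_code, resp.text)

Putting the same setting into an index template means each new daily index comes up with it already disabled.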
‣There are no query killers!
‣Memory is limited
‣Aggressive caching by default consumes the heap
‣This is normally good
‣Except when it's not
‣Thread pools are malleable by default, and maintaining buffers for them can also cost memory

A Perl programmer learns about Java memory management.
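One concrete mitigation, assuming ElasticSearch 1.4 or later where the fielddata circuit breaker is a dynamic cluster setting: cap how much heap fielddata may claim, so a single expensive facet trips the breaker instead of taking the node down. The 60% figure is only an illustrative value:

    # Sketch, assuming ES 1.4+ at localhost:9200: cap the fielddata
    # circuit breaker via the cluster settings API.
    import json
    import requests

    resp = requests.put(
        "http://localhost:9200/_cluster/settings",
        data=json.dumps({
            "persistent": {"indices.breaker.fielddata.limit": "60%"},
        }),
        headers={"Content-Type": "application/json"},
    )
    print(resp.json())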
There are no solutions, aside from firewalls.
‣If you can search, you can search any data in the cluster.
‣If you can search an index, you can also modify or delete data in that index.
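A sketch of why that matters, with a placeholder host and index name: the same unauthenticated HTTP access that serves a search will happily serve a delete.

    # Sketch: with no firewall in the way, nothing separates a
    # "read-only" user from a destructive one.
    import requests

    base = "http://localhost:9200/logstash-2014.01.01"

    requests.get(base + "/_search", params={"q": "error"})  # reading...
    requests.delete(base)                                   # ...and destroying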
ElasticSearch, the Graphite of Logging?
‣Composable investigations with Kibana
‣Easy access to everything for everyone
‣Simple API (REST) and data format (JSON)
‣We can get pretty pictures from it!
‣Encourages interaction with data
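As an illustration of how low the barrier is, an entire ad-hoc investigation fits in a few lines; the node address, index name, and "status" field here are hypothetical:

    # Sketch: the whole API is HTTP plus JSON, which is what makes
    # ad-hoc interaction with the data so easy.
    import json
    import requests

    query = {"query": {"query_string": {"query": "status:500"}}, "size": 5}
    resp = requests.post(
        "http://localhost:9200/logstash-2014.01.01/_search",
        data=json.dumps(query),
        headers={"Content-Type": "application/json"},
    )
    for hit in resp.json()["hits"]["hits"]:
        print(hit["_source"])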