Berlin 2013 - Session - Brad Lhotsky

Monitorama
September 20, 2013


  1. ElasticSearch for Logging: One Man's Sordid Journey of Discovery. Brad

  2. ‣Agile Development (for Structure!)
     ‣Test everything (mostly in production)
     ‣Failure is encouraged
     ‣IT Budget for taking the site down
     ‣Amazing Business Monitoring
     ‣KPIs for IT tied to business metrics
     ‣ElasticSearch was successful for Front-End
  3. bouncing logs into ElasticSearch

  4. None
  5. LogStash
     ‣Many Input / Filter / Output Plugins
     ‣Thriving Community
     ‣Daily Index Layout
     ‣Front-end? Not so much.
  6. Graylog2
     ‣Pluggable Event Stream
     ‣Excellent Front-end
     ‣Index layout based on number of documents
  7. Daily Schema (logstash-YYYY.MM.DD)
     ‣Dealing with "days" makes sense
     ‣Maintenance Operations Easy: Delete, Optimize, Close, Open
     ‣Results in a higher number of shards
     Capacity Schema (Graylog2)
     ‣Which indexes do I search for 1 week of data?
     ‣Maintenance Operations Expensive
     ‣Potentially lower number of shards and even index sizes
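With the daily schema, the question "which indexes do I search for 1 week of data?" has a mechanical answer. A minimal Python sketch, assuming the `logstash-YYYY.MM.DD` naming shown on the slide; the function name `daily_indexes` and its parameters are hypothetical and not part of any tool mentioned in the talk:

```python
from datetime import date, timedelta


def daily_indexes(end, days=7, prefix="logstash-"):
    """Return the daily index names covering `days` days ending on `end`, oldest first."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return [
        prefix + (end - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days - 1, -1, -1)
    ]


if __name__ == "__main__":
    # One week of indexes ending on the date of the talk.
    print(",".join(daily_indexes(date(2013, 9, 20))))
```

The comma-joined result can be used as the index part of a search URL, since ElasticSearch accepts a comma-separated list of index names there.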
  8. Shameful Self Plug: set of utilities for managing data in daily index schemas
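The kind of bookkeeping such utilities do can be sketched in a few lines. This is not the author's tooling, only an illustration of why daily indexes make maintenance easy: the age of each index is readable from its name. The function `plan_maintenance` and the 30/90-day thresholds are assumptions made up for the example:

```python
from datetime import date, datetime


def plan_maintenance(index_names, today, close_after=30, delete_after=90,
                     prefix="logstash-"):
    """Sort daily indexes into keep/close/delete buckets by age in days.

    Names that do not match prefix + YYYY.MM.DD are ignored.
    The thresholds are illustrative defaults, not recommendations.
    """
    plan = {"keep": [], "close": [], "delete": []}
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        age = (today - day).days
        if age >= delete_after:
            plan["delete"].append(name)
        elif age >= close_after:
            plan["close"].append(name)
        else:
            plan["keep"].append(name)
    return plan
```

The function only produces a plan; issuing the actual delete or close requests is left to whatever client is in use.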
  9. Roll Your Own! Perl, Python (pyes), Ruby (tire), JavaScript

  10. You want pretty pictures?

  11. None
  12. None
  13. ‣Composable dashboards
      ‣Create incident specific dashboards while investigating the incident
      ‣Leverage the speed of ElasticSearch
      ‣Melt your cluster!
  14. None
  15. None
  16. None
  17. ElasticSearch is Magic

  18. ElasticSearch Black Magic

  19. index.auto_expand_replicas
      ‣Clustering order of operations issue
      ‣Can cause enormous data transfers between nodes leaving and entering a cluster
      ‣Defaults to true
      ‣You should set it to false
  20. Understanding Shard Allocation: Tales from the script gone wrong.

  21. None
  22. None
  23. None
  24. None
  25. None
  26. None
  27. For planned maintenance disable reallocation!
      curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : true } }'
  28. Re-enable when your node is back.
      curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : false } }'
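The disable/re-enable pair is easy to forget halfway through, so it may help to wrap it. A minimal Python sketch using only the standard library, assuming the setting name and the `localhost:9200` endpoint from the two slides above (later ElasticSearch releases may use a different setting name); `allocation_body`, `put_cluster_settings`, and `allocation_disabled` are hypothetical names:

```python
import json
import urllib.request
from contextlib import contextmanager

# Setting name taken from the slides.
SETTING = "cluster.routing.disable_allocation"


def allocation_body(disabled):
    """Build the transient cluster-settings payload used on the slides."""
    return json.dumps({"transient": {SETTING: disabled}})


def put_cluster_settings(body, base_url="http://localhost:9200"):
    """PUT a JSON body to /_cluster/settings and return the response text."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


@contextmanager
def allocation_disabled(send=put_cluster_settings):
    """Disable shard reallocation for the duration of the block, then re-enable it."""
    send(allocation_body(True))
    try:
        yield
    finally:
        # Runs even if the maintenance step raises.
        send(allocation_body(False))
```

Usage would look like `with allocation_disabled(): restart_node()`, where `restart_node` stands in for whatever the maintenance step is. The `send` parameter exists so the wrapper can be exercised without a running cluster.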
  29. A Perl programmer learns about Java memory management.
      ‣There are no query killers!
      ‣Memory is limited.
      ‣Aggressive caching by default consumes the heap.
      ‣This is normally good
      ‣Except when it's not
      ‣Thread pools are malleable by default, and maintaining buffers for them can also cost memory.
  30. Thanks to Jason for learning me the Graphites!

  31. Prevent Some Bad Queries
      index.cache.filter.max_size
      index.cache.filter.expire
      indices.fielddata.cache.size
      indices.fielddata.cache.expire

  32. Thread Pool Management (less relevant since 0.90.0)
      threadpool:
        index:
          type: fixed
          size: 30
          queue_size: 1000
          reject_policy: caller
  33. A Security guy asks about Access Control

  34. There are no solutions, aside from firewalls.
      ‣If you can search, you can search any data in the cluster.
      ‣If you can search, you can modify or delete data from that index.
  35. ElasticSearch is not a System of Record
      ‣Not legit for Legal Uses
      ‣That's O.K. we can handle that use case cheaply.
  36. ElasticSearch, Graphite of Logging?
      ‣Composable investigations with Kibana
      ‣Easy access to everything for everyone
      ‣Simple API (REST) and data format (JSON)
      ‣We can get pretty pictures from it!
      ‣Encourages interaction with data
  37. We're Hiring! Developers, System Administrators, Analysts, Designers!