Berlin 2013 - Session - Brad Lhotsky

Monitorama
September 20, 2013

Transcript

  1. ElasticSearch for Logging
    One Man's Sordid Journey of Discovery
    Brad Lhotsky
    http://twitter.com/reyjrar
    http://github.com/reyjrar

  2. ‣Agile Development (for Structure!)
    ‣Test everything (mostly in production)
    ‣Failure is encouraged
    ‣IT Budget for taking the site down
    ‣Amazing Business Monitoring
    ‣KPIs for IT tied to business metrics
    ‣ElasticSearch was successful for Front-End

  3. bouncing logs into ElasticSearch

  5. LogStash
    ‣Many Input / Filter / Output Plugins
    ‣Thriving Community
    ‣Daily Index Layout
    ‣Front-end? Not so much.

  6. Graylog2
    ‣Pluggable Event Stream
    ‣Excellent Front-end
    ‣Index layout based on number of documents

  7. Daily Schema (logstash-YYYY.MM.DD)
    ‣Dealing with "days" makes sense
    ‣Maintenance operations are easy: Delete, Optimize, Close, Open
    ‣Results in a higher number of shards
    Capacity Schema (Graylog2)
    ‣Which indexes do I search for 1 week of data?
    ‣Maintenance operations are expensive
    ‣Potentially lower number of shards and even index sizes
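
Under the daily schema, the "which indexes do I search for 1 week of data?" question has a mechanical answer, because the date is in the index name. The following is a minimal sketch, assuming the `logstash-YYYY.MM.DD` naming from the slide; the function name `daily_indexes` and its parameters are hypothetical, not part of LogStash.

```python
from datetime import date, timedelta


def daily_indexes(end, days=7, prefix="logstash-"):
    """Return daily index names for the `days` days ending on `end`, oldest first."""
    return [
        prefix + (end - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days - 1, -1, -1)
    ]


# One week of data ending on the day of the talk.
week = daily_indexes(date(2013, 9, 20))

# ElasticSearch accepts a comma-separated list of index names in the URL path.
search_path = "/" + ",".join(week) + "/_search"
```

With a capacity-based schema there is no such shortcut: the index names say nothing about time, so a time-range search has to consider every index.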

  8. Shameful Self Plug
    https://github.com/reyjrar/es-utils
    Set of utilities for managing data in daily index schemas
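
To illustrate the kind of maintenance a daily schema makes easy, here is a rough sketch of picking the indexes that have aged out of a retention window. It is not taken from es-utils; the function name `expired_indexes`, the 30-day default, and the `logstash-` prefix are assumptions for the example.

```python
from datetime import date, datetime


def expired_indexes(names, today, keep_days=30, prefix="logstash-"):
    """Return the daily indexes in `names` that are older than `keep_days` days."""
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our daily indexes
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matched, but the suffix is not a date
        if (today - day).days > keep_days:
            expired.append(name)
    return sorted(expired)
```

The returned names are candidates for the Delete or Close operations; the sketch deliberately stops short of issuing any request.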

  9. Roll Your Own!
    Perl ElasticSearch.pm
    Python pyes
    Ruby tire
    JavaScript Elastic.js
    http://www.elasticsearch.org/guide/clients/
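
Rolling your own does not strictly need a client library, since the API is plain REST and JSON. As a minimal sketch using only the Python standard library, the function below builds (but does not send) a `query_string` search across several indexes. The name `search_request`, the `localhost:9200` address, and the sample query are assumptions for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def search_request(indexes, query, size=10, base_url="http://localhost:9200"):
    """Build a search request for `query` across the given index names."""
    body = {"query": {"query_string": {"query": query}}, "size": size}
    path = ",".join(quote(name, safe="") for name in indexes)
    return Request(
        f"{base_url}/{path}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = search_request(["logstash-2013.09.19", "logstash-2013.09.20"], "status:500")
```

Passing `request` to `urllib.request.urlopen` would run the search against a live cluster.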

  10. You want pretty pictures?

  13. ‣Composable dashboards
    ‣Create incident-specific dashboards while investigating the incident
    ‣Leverage the speed of ElasticSearch
    ‣Melt your cluster!

  17. ElasticSearch is Magic

  18. ElasticSearch
    Black Magic

  19. index.auto_expand_replicas
    ‣Clustering order of operations issue
    ‣Can cause enormous data transfers between nodes leaving and entering a cluster
    ‣Defaults to true
    ‣You should set it to false
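
One way to apply the advice above is through the index settings API. The sketch below only builds the request, assuming a node at `localhost:9200` and an index name chosen for the example; the helper name `auto_expand_request` is hypothetical.

```python
import json
from urllib.request import Request


def auto_expand_request(index, base_url="http://localhost:9200"):
    """Build a PUT request that sets index.auto_expand_replicas to false."""
    body = json.dumps({"index": {"auto_expand_replicas": False}})
    return Request(
        f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed index name, following the daily schema.
request = auto_expand_request("logstash-2013.09.20")
```

Sending it with `urllib.request.urlopen(request)` would change the setting on that one index; new daily indexes would need the same setting, for example through an index template.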

  20. Understanding Shard Allocation
    Tales from the script gone wrong.

  27. curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.disable_allocation" : true
      }
    }'
    For planned maintenance, disable reallocation!

  28. curl -XPUT localhost:9200/_cluster/settings -d '{
      "transient" : {
        "cluster.routing.disable_allocation" : false
      }
    }'
    Re-enable when your node is back.
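
The disable and re-enable steps are easy to forget when done by hand, so they can be wrapped around the maintenance work. This is a minimal sketch, assuming the `cluster.routing.disable_allocation` setting and `localhost:9200` address from the slides; the names `put_allocation_disabled` and `allocation_disabled` are hypothetical. Later ElasticSearch releases replaced this setting with `cluster.routing.allocation.enable`, so check the name against your version.

```python
import json
from contextlib import contextmanager
from urllib.request import Request, urlopen

SETTINGS_URL = "http://localhost:9200/_cluster/settings"


def put_allocation_disabled(disabled, url=SETTINGS_URL):
    """PUT the transient cluster setting; returns the HTTP status code."""
    body = json.dumps(
        {"transient": {"cluster.routing.disable_allocation": disabled}}
    )
    request = Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return response.status


@contextmanager
def allocation_disabled(put=put_allocation_disabled):
    """Disable shard reallocation for the duration of the block."""
    put(True)
    try:
        yield
    finally:
        # Re-enable even if the maintenance step raised an exception.
        put(False)
```

Usage would look like `with allocation_disabled(): restart_node()`, where `restart_node` stands in for whatever the maintenance actually is. The `put` parameter exists so the wrapper can be exercised without a live cluster.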

  29. A Perl programmer learns about Java memory management.
    ‣There are no query killers!
    ‣Memory is limited.
    ‣Aggressive caching by default consumes the heap.
    ‣This is normally good
    ‣Except when it's not
    ‣Thread pools are malleable by default, and maintaining buffers for them can also cost memory.

  30. Thanks to Jason for learning me the
    Graphites!
    http://goo.gl/XS0wzG

  31. Prevent Some Bad Queries
    index.cache.filter.max_size
    index.cache.filter.expire
    indices.fielddata.cache.size
    indices.fielddata.cache.expire

  32. Thread Pool Management (less relevant since 0.90.0)
    threadpool:
      index:
        type: fixed
        size: 30
        queue_size: 1000
        reject_policy: caller

  33. A Security guy asks
    about Access Control

  34. There are no solutions, aside from firewalls.
    ‣If you can search, you can search any data in the cluster.
    ‣If you can search, you can modify or delete data from that index.

  35. ElasticSearch is not a System of Record
    ‣Not legit for Legal Uses
    ‣That's O.K., we can handle that use case cheaply.

  36. ElasticSearch, Graphite of Logging?
    ‣Composable investigations with Kibana
    ‣Easy access to everything for everyone
    ‣Simple API (REST) and data format (JSON)
    ‣We can get pretty pictures from it!
    ‣Encourages interaction with data

  37. We're Hiring!
    Developers,
    System Administrators,
    Analysts,
    Designers!
    booking.com/jobs
