ElasticSearch for Logging

ElasticSearch for Logging: One Man's Sordid Journey of Discovery
Brad Lhotsky
http://twitter.com/reyjrar
http://github.com/reyjrar

‣Agile Development (for Structure!) ‣Test everything (mostly in production) ‣Failure is encouraged ‣IT Budget for taking the site down ‣Amazing Business Monitoring ‣KPIs for IT tied to business metrics ‣ElasticSearch was successful for Front-End

bouncing logs into ElasticSearch

LogStash ‣Many Input / Filter / Output Plugins ‣Thriving Community
‣Daily Index Layout ‣Front-end? Not so much.

Graylog2 ‣Pluggable Event Stream ‣Excellent Front-end ‣Index Layout-based on number
of documents

Daily Schema (logstash-YYYY.MM.DD) ‣Dealing with "days" makes sense ‣Maintenance operations are easy: Delete, Optimize, Close, Open ‣Results in a higher number of shards

Capacity Schema (Graylog2) ‣Which indexes do I search for 1 week of data? ‣Maintenance operations are expensive ‣Potentially lower number of shards and even index sizes
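With the daily schema, the "1 week of data" question has a mechanical answer: list the last seven index names and search them together. A minimal sketch, assuming the `logstash-YYYY.MM.DD` naming shown above; the function name `daily_indexes` is made up for illustration.

```python
from datetime import date, timedelta


def daily_indexes(end, days, prefix="logstash-"):
    """Return daily index names for `days` days ending at `end`, newest first."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return [
        prefix + (end - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days)
    ]


# One week ending on 1 March 2013, which crosses a month boundary.
week = daily_indexes(date(2013, 3, 1), 7)

# ElasticSearch accepts a comma-separated list of indexes in the URL path.
search_path = "/" + ",".join(week) + "/_search"
print(search_path)
```

The same list, computed for dates older than your retention window, gives the indexes to close or delete.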

Shameful Self Plug https://github.com/reyjrar/es-utils Set of utilities for managing data
in daily index schemas

Roll Your Own!
Perl: ElasticSearch.pm
Python: pyes
Ruby: tire
JavaScript: Elastic.js
http://www.elasticsearch.org/guide/clients/

You want pretty pictures?

‣Composable dashboards ‣Create incident-specific dashboards while investigating the incident ‣Leverage the speed of ElasticSearch ‣Melt your cluster!

ElasticSearch is Magic

ElasticSearch Black Magic

index.auto_expand_replicas ‣Clustering order of operations issue ‣Can cause enormous data
transfers between nodes leaving and entering a cluster ‣Defaults to true ‣You should set it to false
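Turning the setting off for one index could look like the sketch below. It only builds the request and does not send it. The `localhost:9200` address matches the curl examples in this deck; the function name and the example index name are assumptions for illustration.

```python
import json
from urllib.request import Request


def disable_auto_expand(index, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT that sets index.auto_expand_replicas to false."""
    body = json.dumps({"index": {"auto_expand_replicas": False}}).encode("utf-8")
    return Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical index name, following the daily schema.
req = disable_auto_expand("logstash-2013.03.01")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; try it on a test cluster first.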

Understanding Shard Allocation: Tales from the script gone wrong.

For planned maintenance, disable reallocation!

curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : true } }'

Re-enable when your node is back.

curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : false } }'
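A maintenance script that forgets the second call leaves the cluster with allocation switched off. One way to guard against that is to wrap the pair in a context manager, sketched below. The `put(path, body)` callable is a hypothetical stand-in for whatever HTTP client you use; the setting name is the one from the curl commands above, and newer ElasticSearch releases may name it differently, so check the documentation for your version.

```python
import json
from contextlib import contextmanager

# Setting name as used in the curl commands in this deck.
SETTING = "cluster.routing.disable_allocation"


def allocation_body(disabled):
    """JSON body for the transient cluster setting."""
    return json.dumps({"transient": {SETTING: disabled}})


@contextmanager
def allocation_disabled(put):
    """Disable shard reallocation, then re-enable it even if the work fails.

    `put(path, body)` is a hypothetical callable that sends the PUT request.
    """
    put("/_cluster/settings", allocation_body(True))
    try:
        yield
    finally:
        put("/_cluster/settings", allocation_body(False))


# Dry run: record the requests instead of sending them.
calls = []
with allocation_disabled(lambda path, body: calls.append((path, body))):
    calls.append(("maintenance", None))  # restart the node here

for path, body in calls:
    print(path, body)
```

The `finally` clause re-enables allocation as soon as the block exits, so the block should only end once the node has rejoined the cluster.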

A Perl programmer learns about Java memory management. ‣There are no query killers! ‣Memory is limited. ‣Aggressive caching by default consumes the heap. ‣This is normally good ‣Except when it's not ‣Thread pools are malleable by default, and maintaining buffers for them can also cost memory.

Thanks to Jason for learning me the Graphites! http://goo.gl/XS0wzG

Prevent Some Bad Queries
index.cache.filter.max_size
index.cache.filter.expire
indices.fielddata.cache.size
indices.fielddata.cache.expire

Thread Pool Management (less relevant since 0.90.0)

threadpool:
  index:
    type: fixed
    size: 30
    queue_size: 1000
    reject_policy: caller

A Security guy asks about Access Control

There are no solutions, aside from firewalls. ‣If you can search, you can search any data in the cluster. ‣If you can search, you can modify or delete data from that index.

ElasticSearch is not a System of Record ‣Not legit for
Legal Uses ‣That's O.K. we can handle that use case cheaply.

ElasticSearch, Graphite of Logging? ‣Composable investigations with Kibana ‣Easy access
to everything for everyone ‣Simple API (REST) and data format (JSON) ‣We can get pretty pictures from it! ‣Encourages interaction with data

We're Hiring! Developers, System Administrators, Analysts, Designers! booking.com/jobs
