Slide 1

Slide 1 text

ElasticSearch for Logging
One Man's Sordid Journey of Discovery
Brad Lhotsky
http://twitter.com/reyjrar
http://github.com/reyjrar

Slide 2

Slide 2 text

‣ Agile Development (for Structure!)
‣ Test everything (mostly in production)
‣ Failure is encouraged
‣ IT Budget for taking the site down
‣ Amazing Business Monitoring
‣ KPIs for IT tied to business metrics
‣ ElasticSearch was successful for Front-End

Slide 3

Slide 3 text

bouncing logs into ElasticSearch

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

LogStash
‣ Many Input / Filter / Output Plugins
‣ Thriving Community
‣ Daily Index Layout
‣ Front-end? Not so much.

Slide 6

Slide 6 text

Graylog2
‣ Pluggable Event Stream
‣ Excellent Front-end
‣ Index layout based on number of documents

Slide 7

Slide 7 text

Daily Schema (logstash-YYYY.MM.DD)
‣ Dealing with "days" makes sense
‣ Maintenance operations are easy: Delete, Optimize, Close, Open
‣ Results in a higher number of shards

Capacity Schema (Graylog2)
‣ Which indexes do I search for 1 week of data?
‣ Maintenance operations are expensive
‣ Potentially lower number of shards and even index sizes
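With the daily schema, the "which indexes do I search for 1 week of data?" question has a mechanical answer: the index names can be computed from the dates. The following is a minimal sketch, assuming the logstash-YYYY.MM.DD naming from the slide; the function name `daily_indexes` and the example date are made up for illustration.

```python
from datetime import date, timedelta


def daily_indexes(days, today=None, prefix="logstash-"):
    """Return the daily index names covering the last `days` days, newest first.

    `prefix` assumes the logstash-YYYY.MM.DD layout; adjust for other schemas.
    """
    today = today or date.today()
    return [
        prefix + (today - timedelta(days=offset)).strftime("%Y.%m.%d")
        for offset in range(days)
    ]


# One week of data ending on a fixed, hypothetical date.
week = daily_indexes(7, today=date(2013, 10, 3))

# ElasticSearch accepts a comma-separated list of index names in the URL path.
search_path = "/" + ",".join(week) + "/_search"
print(search_path)
```

Passing `today` explicitly keeps the helper testable; in a real script it would default to the current date.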

Slide 8

Slide 8 text

Shameful Self Plug
https://github.com/reyjrar/es-utils
Set of utilities for managing data in daily index schemas
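To give a feel for what managing data in a daily index schema involves, here is a rough sketch of an age-based maintenance plan. It is not how es-utils is implemented; the function `plan_maintenance`, the 30 and 90 day thresholds, and the sample index names are all assumptions chosen for illustration.

```python
from datetime import date, datetime


def plan_maintenance(names, today, close_after=30, delete_after=90,
                     prefix="logstash-"):
    """Sort daily indexes into keep / close / delete buckets by age in days.

    Thresholds are hypothetical. Names that do not match the
    prefix + YYYY.MM.DD pattern are ignored rather than touched.
    """
    plan = {"keep": [], "close": [], "delete": []}
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        age = (today - day).days
        if age >= delete_after:
            plan["delete"].append(name)
        elif age >= close_after:
            plan["close"].append(name)
        else:
            plan["keep"].append(name)
    return plan


plan = plan_maintenance(
    ["logstash-2013.10.01", "logstash-2013.08.15",
     "logstash-2013.05.01", "kibana-int"],
    today=date(2013, 10, 3),
)
print(plan)
```

The plan is only a list of names per action; actually issuing the delete or close calls is left out on purpose.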

Slide 9

Slide 9 text

Roll Your Own!
Perl: ElasticSearch.pm
Python: pyes
Ruby: tire
JavaScript: Elastic.js
http://www.elasticsearch.org/guide/clients/

Slide 10

Slide 10 text

You want pretty pictures?

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

‣ Composable dashboards
‣ Create incident-specific dashboards while investigating the incident
‣ Leverage the speed of ElasticSearch
‣ Melt your cluster!

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

ElasticSearch is Magic

Slide 18

Slide 18 text

ElasticSearch Black Magic

Slide 19

Slide 19 text

index.auto_expand_replicas
‣ Clustering order of operations issue
‣ Can cause enormous data transfers between nodes leaving and entering a cluster
‣ Defaults to true
‣ You should set it to false

Slide 20

Slide 20 text

Understanding Shard Allocation
Tales from the script gone wrong.

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

For planned maintenance disable reallocation!
curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : true } }'

Slide 28

Slide 28 text

Re-enable when your node is back.
curl -XPUT localhost:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.disable_allocation" : false } }'
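The disable and re-enable calls differ only in one boolean, so a maintenance script can build both from the same helper. This is a small sketch using only the Python standard library; the function name `allocation_request` is made up, and the host default simply mirrors the curl commands on the slides. The requests are constructed but not sent.

```python
import json
import urllib.request


def allocation_request(disable, host="localhost:9200"):
    """Build (but do not send) the PUT request that toggles shard allocation.

    Uses the transient cluster.routing.disable_allocation setting shown
    on the slides; `host` is an assumption matching the curl examples.
    """
    body = {"transient": {"cluster.routing.disable_allocation": disable}}
    return urllib.request.Request(
        url="http://%s/_cluster/settings" % host,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


before_maintenance = allocation_request(True)
after_maintenance = allocation_request(False)

# urllib.request.urlopen(before_maintenance) would actually send it.
print(before_maintenance.get_method(), before_maintenance.full_url)
print(before_maintenance.data.decode("utf-8"))
```

Keeping construction separate from sending makes it easy to log or review the exact payload before touching a production cluster.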

Slide 29

Slide 29 text

A Perl programmer learns about Java memory management.
‣ There are no query killers!
‣ Memory is limited.
‣ Aggressive caching by default consumes the heap.
‣ This is normally good
‣ Except when it's not
‣ Thread pools are malleable by default, and maintaining buffers for them can also cost memory.

Slide 30

Slide 30 text

Thanks to Jason for learning me the Graphites! http://goo.gl/XS0wzG

Slide 31

Slide 31 text

Prevent Some Bad Queries
index.cache.filter.max_size
index.cache.filter.expire
indices.fielddata.cache.size
indices.fielddata.cache.expire

Slide 32

Slide 32 text

Thread Pool Management (less relevant since 0.90.0)
threadpool:
  index:
    type: fixed
    size: 30
    queue_size: 1000
    reject_policy: caller
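The combination of a fixed pool, a bounded queue, and a "caller" reject policy is easier to reason about with a toy model. The sketch below is not ElasticSearch code; it is a small Python illustration, assuming "caller" means that a task which does not fit in the queue is run by the submitting thread itself, which slows the submitter down instead of growing memory. The class name `CallerRunsPool` and the tiny sizes are made up.

```python
import queue
import threading


class CallerRunsPool:
    """Toy fixed-size pool with a bounded queue and caller-runs rejection."""

    def __init__(self, size, queue_size):
        self._tasks = queue.Queue(maxsize=queue_size)
        self.ran_in_caller = 0
        self._workers = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(size)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            task = self._tasks.get()
            if task is None:  # shutdown marker
                return
            task()

    def submit(self, task):
        try:
            self._tasks.put_nowait(task)
        except queue.Full:
            # Queue is full: the submitting thread does the work itself.
            self.ran_in_caller += 1
            task()

    def shutdown(self):
        # One marker per worker; queued tasks ahead of them still run.
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


pool = CallerRunsPool(size=2, queue_size=5)
results = []
for i in range(50):
    pool.submit(lambda i=i: results.append(i))
pool.shutdown()
print(len(results), "tasks done,", pool.ran_in_caller, "run by the caller")
```

No task is dropped and the queue never holds more than `queue_size` entries, so memory stays bounded. How many tasks end up running in the caller depends on thread timing and varies between runs.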

Slide 33

Slide 33 text

A Security guy asks about Access Control

Slide 34

Slide 34 text

There are no solutions, aside from firewalls.
‣ If you can search, you can search any data in the cluster.
‣ If you can search, you can modify or delete data from that index.

Slide 35

Slide 35 text

ElasticSearch is not a System of Record
‣ Not legit for Legal Uses
‣ That's O.K. we can handle that use case cheaply.

Slide 36

Slide 36 text

ElasticSearch, Graphite of Logging?
‣ Composable investigations with Kibana
‣ Easy access to everything for everyone
‣ Simple API (REST) and data format (JSON)
‣ We can get pretty pictures from it!
‣ Encourages interaction with data

Slide 37

Slide 37 text

We're Hiring!
Developers, System Administrators, Analysts, Designers!
booking.com/jobs