What is Logstash?
● Log management framework
● Actually been around a while (2008ish)
● Started by Jordan Sissel and Pete Fritchman
● Stuff comes in
● Bits get twiddled
● Stuff comes out

INPUT → FILTER → OUTPUT
Inputs
This is where events enter the pipeline, e.g. the source (and optionally the format)
● STDIN
● Twitter search
● Socket
● ZeroMQ
● Heroku
● AMQP
● Currently 14 inputs in MASTER
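A minimal input section in the 1.1-era config format might look like this sketch (the port and type values are made up for illustration):

input {
  # read events typed at the console
  stdin {
    type => "stdin-events"
  }
  # listen on a TCP socket for syslog-style traffic
  tcp {
    type => "syslog"
    port => 5514
  }
}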
Quick note on Input “formats”
● Assumed format for incoming events is defined by the plugin
● Most plugins assume plain text
● Others assume JSON
● Some speak 'json_event'*
* I'll get to this in a moment
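Many inputs of this vintage also expose a format setting to override that assumption; a hedged sketch (the tcp settings are illustrative):

input {
  tcp {
    type   => "app-logs"
    port   => 3333
    # "plain" is the usual default; "json" and "json_event" are the alternatives
    format => "json"
  }
}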
Hash ALL the things
● Once an event is received, it is converted to a Ruby hash for the remainder of the pipeline

{
  "@source"      => "",
  "@type"        => "",
  "@tags"        => [],
  "@fields"      => {},
  "@timestamp"   => "",
  "@source_host" => "",
  "@source_path" => "",
  "@message"     => ""
}
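The 'json_event' format mentioned earlier is just this hash serialized as JSON, so the sender gets to set type, tags, and fields itself. An illustrative payload (all values made up):

{
  "@type": "nginx",
  "@tags": ["prod"],
  "@fields": {"status": "200"},
  "@message": "GET /index.html HTTP/1.1 200",
  "@timestamp": "2012-05-14T01:09:31Z",
  "@source_host": "web01"
}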
Hey man, nice shot!
● Filters are where you do the work
● Break the “@message” into constituent parts
● Identify original timestamp
● Add tags
● Move parts around
● External processing via 0mq
● Currently 13 filters in MASTER
Grok and Roll
● DRY and RAD for RegEx
● Originally a C library
● Pure Ruby version in Logstash since 1.1
● Identify patterns, attach identifier
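As a sketch, a grok filter that names the interesting parts of a log line might look like this (the pattern and field names are illustrative, not from the talk):

filter {
  grok {
    type    => "haproxy"
    # stock grok patterns, with an identifier attached to each capture
    pattern => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{IP:client_ip} %{WORD:http_verb} %{URIPATHPARAM:request}"
  }
}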
Named fields
● When you 'identify' something, it is added to the hash under the @fields key

{
  "@source"      => "",
  "@type"        => "",
  "@tags"        => ["haproxy_event"],
  "@fields"      => {"syslog_timestamp" => "May 11 06:00:27"},
  "@timestamp"   => "",
  "@source_host" => "",
  "@source_path" => "",
  "@message"     => ""
}
Rinse/Repeat
● Filters are processed in order
● Once a field is identified, it can be used in interpolation later (see the sketch below)
  ● %{syslog_timestamp}
  ● %{@type}
● @ fields are special but not sacred
  ● date and mutate filters, for instance
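For example, a date filter placed after the grok filter from a few slides back can consume the syslog_timestamp field it created (a sketch using the same field => format style as the chef_handler example later in this deck):

filter {
  # runs after grok, so syslog_timestamp already exists in @fields;
  # parses it and sets @timestamp to the original event time
  date {
    type             => "haproxy"
    syslog_timestamp => ["MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
  }
}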
Get out(put)
● An event can be routed to all or a subset of defined outputs based on various criteria
● Outputs block (sort of)
● Logstash takes a default position that you don't want to lose an event
● 1 thread per defined output
● ALWAYS use a stdout output for debugging
● This is where it gets REALLY cool
● Currently 27 outputs in MASTER
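The debugging advice translates to something like this; in this era, debug => true dumps each event's full hash to the console:

output {
  # keep this around while developing; drop it in production if too chatty
  stdout {
    debug => true
  }
}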
What does it mean?
● When a message is tagged “haproxy-event”
● I want to write to Graphite: a value of 1
● as 'stats.enstratus.X.request_type.Y'
  ● where X is the source of the event
  ● and Y is the HTTP verb
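A hedged sketch of that routing, using the tags setting outputs support for selecting events (the Graphite host and the http_verb field name are assumptions; http_verb would come from an earlier grok filter):

output {
  graphite {
    # only events carrying this tag reach this output
    tags    => ["haproxy-event"]
    host    => "graphite.example.com"   # assumed Graphite endpoint
    port    => 2003
    # X = source of the event, Y = the HTTP verb
    metrics => ["stats.enstratus.%{@source_host}.request_type.%{http_verb}", "1"]
  }
}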
And publishing them for remote tailing

zeromq {
  topology => "pubsub"
  address  => "tcp://*:5558"
  mode     => "server"
  topic    => "%{@source_host}.%{level}.%{log}"
}
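On the other end, the tail could be another Logstash instance subscribing to that publisher; a sketch, assuming the zeromq input of the same vintage mirrors the output's topology/mode/address settings and takes a topic prefix to subscribe on (the hostname and topic are made up):

input {
  zeromq {
    type     => "remote-tail"
    topology => "pubsub"
    mode     => "client"
    address  => "tcp://logs.example.com:5558"
    topic    => "web01"   # assumed: prefix-subscribe to one host's stream
  }
}
output {
  stdout { debug => true }
}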
Let's use a familiar example
● Default Chef Handler JSON files
● Parse with Logstash
● Apply some filters
● Send to some places
Chef-handler JSON logs
● We're not going to be concerned with how you get them INTO Logstash
● chef-gelf handler works (Logstash has a gelf input)
● You can write your own (I'm partial to ZeroMQ!)
● Set your input format to “json”
● Set the “type” to something that flags it as a chef event
● If you send the WHOLE thing, be prepared to cut some stuff (you don't really want the Ohai data in your logs)
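Putting those bullets together, a hedged sketch of a ZeroMQ-based input for chef run reports (topology, mode, and address are assumptions, not from the talk):

input {
  zeromq {
    # flags these events as chef runs for later filters/outputs
    type     => "chef_handler"
    # the handler ships JSON, not plain text
    format   => "json"
    topology => "pushpull"
    mode     => "server"
    address  => "tcp://*:5556"
  }
}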
What do we care about?

{
  "node": {"name": "foo"},
  "success": true,
  "start_time": "2012-05-14 01:09:31 +0000",
  "end_time": "2012-05-14 01:10:46 +0000",
  "elapsed_time": "1.14",
  "updated_resources": [],
  "exception": "",
  "backtrace": ""
}
Apply a date filter (set @timestamp to start_time)

date {
  start_time => "yyyy-MM-dd HH:mm:ss Z"
  type       => "chef_handler"
}
Send result and timing to Graphite

graphite {
  metrics => ["stats.%{@source_host}.chef_run.success.%{success}", "1",
              "stats.%{@source_host}.chef_run.duration", "%{elapsed_time}"]
  type    => "chef_handler"
}
A better approach
● Create a custom handler
● Build custom JSON from data
● Strip the extra stuff
● Join stack trace array elements into a single newline-separated value
● Send to Logstash and fanout from there
The Future
● Run-time Configuration Changes
  ● No more restarting
  ● Push support
● MOAR PLUGINS!
● Internal Metrics Channel
  ● input { metrics }
● Improved AMQP Support
● Your imagination