What is Logstash? Do you use it? What is it for? Why Ruby? How does it work?
This talk also covers what the newly released Logstash 1.5 brings and what's planned for 2.x.
Presented at DevoxxPL 2015 by João Duarte on the 24th of June, 2015.
software logging is targeted at humans
• Single line, multi line
• Plain text, JSON, XML
• Log4j, log files, Syslog, TCP, UDP
• Don’t get me started on date formats
Heterogeneity
at Oracle Labs in 2013
• Implementation of the Ruby language using
  • Graal dynamic compiler
  • Truffle AST interpreter framework
• Simpler code, evolving AST
• Control over the compiler with Graal
https://github.com/jruby/jruby/wiki/Truffle
• Input: capture an occurrence in the outside world into an event;
• Filter: transform, drop and validate events;
• Output: send an event to the outside world.
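The three stages above can be sketched as a tiny pipeline. This is an illustrative Python model only, not Logstash's actual Ruby implementation: the function names (`line_input`, `drop_blank`, `add_length`, `run_pipeline`) and the choice of a plain dict as the event are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict  # assumption: an event is modelled as a plain dict


def line_input(lines: Iterable[str]) -> Iterator[Event]:
    """Input: capture each line from the outside world as an event."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def drop_blank(event: Event) -> Optional[Event]:
    """Filter: drop events whose message is empty (None means dropped)."""
    return event if event["message"].strip() else None


def add_length(event: Event) -> Optional[Event]:
    """Filter: transform the event by adding a derived field."""
    event["length"] = len(event["message"])
    return event


def run_pipeline(source: Iterable[Event],
                 filters: List[Callable[[Event], Optional[Event]]],
                 output: Callable[[Event], None]) -> None:
    """Pass every event through the filters in order, then to the output."""
    for event in source:
        for apply_filter in filters:
            event = apply_filter(event)
            if event is None:  # a filter dropped the event
                break
        else:
            # Output: send the surviving event to the outside world.
            output(event)


collected = []
run_pipeline(line_input(["first line\n", "\n", "second line\n"]),
             [drop_blank, add_length],
             collected.append)
print(collected)
```

The blank line is dropped by the first filter, so only two events reach the output.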
27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000
SET timestamp=1345373510;
SELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;
.....
Query_time: 27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000\nSET timestamp=1345373510;\nSELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;",
"@timestamp" => "2015-05-11T12:23:31.241Z",
"path" => "/var/lib/mysql/mysql-slow.log",
"tags" => [ [0] "multiline" ]
}
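How several physical log lines end up in one event with a "multiline" tag can be sketched as follows. This is a simplified Python illustration, not the Logstash multiline codec itself. The rule used here (a line matching `^# Time:` starts a new entry, every other line is appended to the previous one) and the helper names are assumptions, and the `# Time:` line is reconstructed from the field values shown on the slides.

```python
import re
from typing import Iterable, Iterator, List


def make_event(buffered: List[str]) -> dict:
    """Build one event from the buffered lines of a single log entry."""
    event = {"message": "\n".join(buffered)}
    if len(buffered) > 1:
        # Mark events that were assembled from more than one line.
        event["tags"] = ["multiline"]
    return event


def join_multiline(lines: Iterable[str], start_pattern: str) -> Iterator[dict]:
    """Group lines into events; a line matching start_pattern opens a new event."""
    start = re.compile(start_pattern)
    buffered: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if start.search(line) and buffered:
            yield make_event(buffered)
            buffered = []
        buffered.append(line)
    if buffered:  # flush the last entry
        yield make_event(buffered)


raw_lines = [
    "# Time: 120819 5:51:50",
    "# Query_time: 27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000",
    "SET timestamp=1345373510;",
    "SELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN "
    "( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;",
]
events = list(join_multiline(raw_lines, r"^# Time:"))
print(events[0]["message"])
```

A real deployment would also need a timeout to flush the last buffered entry, which this sketch leaves out.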
27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000\nSET timestamp=1345373510;\nSELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;
27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000\nSET timestamp=1345373510;\nSELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;
# Time: %{GREEDYDATA:time}\n# Query_time: %{NUMBER:query_time:float} Lock_time: %{NUMBER:lock_time:float} Rows_sent: %{NUMBER:rows_sent:int} Rows_examined: %{NUMBER:rows_examined:int}\nSET timestamp=%{NUMBER:query_timestamp};\n%{GREEDYDATA:query}
27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000\nSET timestamp=1345373510;\nSELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;",
"@timestamp" => "2015-05-11T13:32:16.328Z",
"path" => "/var/lib/mysql/mysql-slow.log",
"tags" => [ [0] "multiline" ],
"time" => "120819 5:51:50",
"query_time" => 27.115751,
"lock_time" => 7.0e-05,
"rows_sent" => 55996,
"rows_examined" => 56000,
"query_timestamp" => "1345373510",
"query" => "SELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN ( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;"
}
Added structure to the event
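The step from a raw multiline message to typed fields can be approximated with named regular-expression groups. The sketch below is a Python stand-in for the grok pattern on the slides, not grok itself: `NUMBER` is a deliberately simplified version of grok's NUMBER pattern, and the names `SLOW_QUERY`, `CASTS` and `add_structure` are made up for the example. The `_grokparsefailure` tag mirrors what the grok filter adds when a pattern does not match.

```python
import re

# Simplified stand-ins for the grok patterns used on the slides (assumptions).
NUMBER = r"[-+]?\d+(?:\.\d+)?"
GREEDYDATA = r".*"

SLOW_QUERY = re.compile(
    rf"# Time: (?P<time>{GREEDYDATA})\n"
    rf"# Query_time: (?P<query_time>{NUMBER}) Lock_time: (?P<lock_time>{NUMBER}) "
    rf"Rows_sent: (?P<rows_sent>{NUMBER}) Rows_examined: (?P<rows_examined>{NUMBER})\n"
    rf"SET timestamp=(?P<query_timestamp>{NUMBER});\n"
    rf"(?P<query>{GREEDYDATA})"
)

# Fields with a ":float" or ":int" suffix in the grok pattern are cast;
# everything else stays a string.
CASTS = {"query_time": float, "lock_time": float,
         "rows_sent": int, "rows_examined": int}


def add_structure(event: dict) -> dict:
    """Return a copy of the event with fields extracted from its message."""
    match = SLOW_QUERY.match(event["message"])
    if match is None:
        return {**event, "tags": event.get("tags", []) + ["_grokparsefailure"]}
    fields = {name: CASTS.get(name, str)(value)
              for name, value in match.groupdict().items()}
    return {**event, **fields}


message = (
    "# Time: 120819 5:51:50\n"
    "# Query_time: 27.115751 Lock_time: 0.000070 Rows_sent: 55996 Rows_examined: 56000\n"
    "SET timestamp=1345373510;\n"
    "SELECT ID FROM wp_posts WHERE post_parent = 17 AND post_status IN "
    "( 'publish', 'closed' ) AND post_type = 'topic' ORDER BY ID DESC;"
)
structured = add_structure({"message": message, "tags": ["multiline"]})
print(structured["query_time"], structured["rows_sent"], structured["query_timestamp"])
```

As in the event shown above, `query_timestamp` stays a string because the pattern gives it no type suffix.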
• Core and plugins can have separate release cycles
• Install/Uninstall/Update plugins
  • from rubygems.org, local .gem file, local path
• A plugin's spec suite can be executed in its repo
usage scenarios => multiple gemsets
• run Logstash from a clean git clone
• run core tests from a clean git clone
• package a release from a clean git clone
• run Logstash from a release
• run plugin tests from a release
The fix:
• patching Bundler to reduce side effects
a Logstash instance
• Allow a set of Logstash instances to fetch configurations from a common location
• Allow dynamic configuration updates
• Coordinate multiple instances to reach a cluster-level entity
Add support for clustering Logstash instances
https://github.com/elastic/logstash/issues/2632
running instance
• Use a (REST?) API to do runtime status queries of
  • health
  • throughput
  • queue sizes
  • latencies
• Minimize impact of extraction (make it togglable?)
Provide APIs to monitor pipeline
https://github.com/elastic/logstash/issues/2611
plugins
  • Must not alienate contributions
  • Testing needs to be easier
  • How to easily communicate the quality status
b) Improve integration testing
  • Lots of experimentation with containers
c) Better performance
d) Predictable behaviour
it
• Create a plugin
• File an issue
• Write a test
• Experiment with the ELK stack
  • Logstash → Elasticsearch → Kibana
• Complain on IRC. I’m "jsvd" on freenode #logstash