Dive Deep with Logstash – From Pipelines to Persistent Queues

Elastic Co
February 18, 2016

Last year at Elastic{ON} you heard about what’s coming in Logstash. Core developers Andrew and Colin have been busy making this future a reality. Come see them demonstrate the work they’ve done so far to make Logstash more resilient and even easier to use!

Transcript

  1. Dive Deep with Logstash – From Pipelines to Persistent Queues
    Colin Surprenant (colinsurprenant), Software Engineer; Andrew Cholakian (andrewvc), Software Engineer. Feb 2016
  2. Agenda: (1) Logstash quick intro/overview, (2) The (Old) Life of an Event, (3) Moving the Old Pipeline Into the Future, (4) Java Event, (5) Persistence
  3. Input Plugins, Filter Plugins, Output Plugins: ~200 plugins in total
    Filter plugins include: date, advisor, alter, anonymize, checksum, cidr, cipher, geoip, clone, collate, csv, dns, drop, elapsed, elasticsearch, environment, extractnumbers, fingerprint, gelfify, useragent, grep, grok, grokdiscovery, i18n, json, json_encode, kv, metaevent, metrics, multiline, mutate, noop
  5. Definitions
    Pipeline: 3-stage processing; orchestrates data flow; manages queuing; manages the plugin lifecycle.
    Event: the internal data representation; raw input data is turned into an Event at the input; the Event is mutated across filters; the main API in the config language; the main API in plugins.
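
The pipeline's job of owning plugin lifecycle while moving events through the three stages can be sketched in Ruby. This is a minimal illustration, not Logstash's implementation: `MiniPipeline`, `UppercaseFilter`, and `ArrayOutput` are hypothetical classes, events are plain Hashes instead of `LogStash::Event`, and the method names `register`, `filter`, `receive`, and `close` are assumed here only to mimic a register, process, close lifecycle.

```ruby
# Hypothetical filter plugin: mimics a register -> filter -> close lifecycle.
class UppercaseFilter
  attr_reader :state

  def register
    @state = :registered
  end

  # Mutates the event in place, as a filter stage would.
  def filter(event)
    event["message"] = event["message"].upcase
    event
  end

  def close
    @state = :closed
  end
end

# Hypothetical output plugin that just collects events.
class ArrayOutput
  attr_reader :state, :events

  def register
    @events = []
    @state = :registered
  end

  def receive(event)
    @events << event
  end

  def close
    @state = :closed
  end
end

# Hypothetical pipeline: registers plugins, runs input -> filters -> outputs
# one event at a time, and always closes plugins afterwards.
class MiniPipeline
  def initialize(filters, outputs)
    @filters = filters
    @outputs = outputs
  end

  def run(raw_lines)
    plugins = @filters + @outputs
    plugins.each(&:register)
    begin
      raw_lines.each do |raw|
        event = { "message" => raw } # raw input data becomes an event
        @filters.each { |f| event = f.filter(event) }
        @outputs.each { |o| o.receive(event) }
      end
    ensure
      plugins.each(&:close) # release plugin resources even on error
    end
  end
end

filter = UppercaseFilter.new
output = ArrayOutput.new
MiniPipeline.new([filter], [output]).run(["hello", "world"])
```

Keeping lifecycle calls in the pipeline rather than in each plugin means a plugin author only implements the per-stage hooks.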
  6. Example Configuration - Codecs
    input { file { codec => lines } }
    filter { … }
    output { file { … codec => json } }
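
What the two codecs in this configuration do can be sketched as follows: a line codec turns a raw byte stream into one event per complete line, and a JSON codec turns each event back into a JSON document. `LinesCodec` and `JsonCodec` are hypothetical stand-ins (the real Logstash codecs work on `LogStash::Event` objects); here events are plain Hashes with a `"message"` field.

```ruby
require "json"

# Hypothetical line codec: buffers partial data, yields one event per line.
class LinesCodec
  def initialize
    @buffer = +""
  end

  def decode(data)
    @buffer << data
    while (idx = @buffer.index("\n"))
      line = @buffer.slice!(0..idx).chomp # remove the line and its newline
      yield({ "message" => line })
    end
  end
end

# Hypothetical JSON codec: one JSON document per event, newline-terminated.
class JsonCodec
  def encode(event)
    JSON.generate(event) + "\n"
  end
end

input_codec = LinesCodec.new
output_codec = JsonCodec.new
encoded = []
# A line split across two reads is only emitted once it is complete.
input_codec.decode("first line\nsecond ") { |e| encoded << output_codec.encode(e) }
input_codec.decode("line\n") { |e| encoded << output_codec.encode(e) }
# encoded == ["{\"message\":\"first line\"}\n", "{\"message\":\"second line\"}\n"]
```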
  7. Agenda, section 2: The (Old) Life of an Event
  8. The Old (Logstash <= 2.1) Pipeline: one event at a time with buffered queues
    Inputs (each with a codec) -> Sized Queue (20) -> Filter Worker 1, Filter Worker 2, …, Filter Worker (n) -> Sized Queue (20) -> Output A Worker 1, Output A Worker 2, Output B Worker 1
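
The topology on this slide can be modelled with Ruby's `SizedQueue`, whose push blocks once the queue is full, much like the 20-slot buffers in the diagram. This is a toy sketch, not Logstash code: the end-of-stream marker, the worker count, and the `"filtered"` flag are hypothetical, and events are plain Hashes.

```ruby
# Toy model: input -> SizedQueue(20) -> filter workers -> SizedQueue(20) -> output.
stop = Object.new                  # hypothetical end-of-stream marker
filter_workers = 2
filter_queue = SizedQueue.new(20)  # push blocks once 20 events are buffered
output_queue = SizedQueue.new(20)
delivered = []

input = Thread.new do
  100.times { |i| filter_queue << { "message" => "line #{i}" } }
  filter_workers.times { filter_queue << stop } # one marker per filter worker
end

filters = Array.new(filter_workers) do
  Thread.new do
    until (event = filter_queue.pop).equal?(stop)
      event["filtered"] = true     # each worker handles one event at a time
      output_queue << event
    end
    output_queue << stop
  end
end

output = Thread.new do
  finished = 0
  while finished < filter_workers
    event = output_queue.pop
    if event.equal?(stop)
      finished += 1
    else
      delivered << event           # only this thread appends to `delivered`
    end
  end
end

[input, *filters, output].each(&:join)
```

Every event crosses two in-memory handoffs, and anything sitting in either queue is lost if the process dies, which is the motivation for the durable-queue slides that follow.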
  16. Agenda, section 3: Moving the Old Pipeline Into the Future
  17. The Simplest Thing We Could Do: Make Both Queues Durable
    Starting point, the old pipeline: Inputs (with codecs) -> Sized Queue (20) -> Filter Workers 1…n -> Sized Queue (20) -> Output A Worker 1, Output A Worker 2, Output B Worker 1
  18. The Simplest Thing We Could Do: Make Both Queues Durable
    Inputs (with codecs) -> Persistent Sized Queue (20) -> Filter Workers 1…n -> Persistent Sized Queue (20) -> Output A Worker 1, Output A Worker 2, Output B Worker 1
  19. One Durable Queue, One In-Memory: Make the First Queue Durable
    Inputs (with codecs) -> Persistent Sized Queue (20) -> Filter Workers 1…n -> Sized Queue (20) -> Output A Worker 1, Output A Worker 2, Output B Worker 1
  20. One Durable Queue
    Inputs (with codecs) -> Persistent Sized Queue (20) -> parallel workers, each running Filters -> Outputs
  21. One Durable Queue + Batcher
    Inputs (with codecs) -> Persistent Sized Queue (20) -> parallel workers, each running Batcher -> Filters -> Outputs
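
A batcher in front of each worker can be sketched as "block for the first event, then drain whatever is already buffered, up to a batch size". This is a simplified illustration under stated assumptions: `take_batch`, `STOP_MARKER`, and the batch size of 5 are hypothetical, an in-memory `SizedQueue` stands in for the persistent queue, and a real batcher may also wait briefly for more events, which is omitted here.

```ruby
STOP_MARKER = Object.new # hypothetical end-of-stream marker

# Block for one event, then take already-buffered events up to batch_size.
# Stops early after a STOP_MARKER so each worker consumes exactly one marker.
def take_batch(queue, batch_size)
  batch = [queue.pop]
  while batch.size < batch_size && !batch.last.equal?(STOP_MARKER)
    begin
      batch << queue.pop(true) # non-blocking; raises ThreadError when empty
    rescue ThreadError
      break
    end
  end
  batch
end

# Deterministic check: 7 buffered events, batch size 5.
q = SizedQueue.new(20)
7.times { |i| q << i }
first = take_batch(q, 5)  # [0, 1, 2, 3, 4]
second = take_batch(q, 5) # [5, 6]

# Pipeline workers: each runs "filters" and "outputs" over a whole batch.
queue = SizedQueue.new(20)
outputs = Queue.new
workers = Array.new(2) do
  Thread.new do
    loop do
      batch = take_batch(queue, 5)
      done = batch.last.equal?(STOP_MARKER)
      batch.pop if done
      filtered = batch.map { |e| e.merge("filtered" => true) }
      outputs << filtered unless filtered.empty? # output receives the batch
      break if done
    end
  end
end
12.times { |i| queue << { "n" => i } }
2.times { queue << STOP_MARKER }
workers.each(&:join)
batches = []
batches << outputs.pop until outputs.empty?
```

Handing outputs a whole batch is what lets them issue one bulk request per batch instead of one request per event.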
  22. AFile.Log holds Line Data - 1, Line Data - 2, Line Data - 3, Line Data - 4
    Stages: File Input -> JSON Codec -> Synchronous Queue -> Worker Threads; Queue “ACK” for persistence
  23. Line Data - 1 is read from AFile.Log by the File Input and decoded by the JSON Codec into Event / Line 1
    Stages: File Input -> JSON Codec -> Synchronous Queue -> Worker Threads; Queue “ACK” for persistence
  24. Event / Line 1 at the Synchronous Queue: where to? Which of the worker threads (each Batcher -> Filters -> Outputs) takes it?
    Stages: JSON Codec -> Synchronous Queue -> Worker Threads; Queue “ACK” for persistence
  25. Event / Line 1 is handed to one of the worker threads (Batcher -> Filters -> Outputs)
    Stages: JSON Codec -> Synchronous Queue -> Worker Threads; Queue “ACK” for persistence
  26. Stages: JSON Codec -> Synchronous Queue -> Worker Threads (each Batcher -> Filters -> Outputs); Queue “ACK” for persistence
  28. The Old (Logstash <= 2.1) Pipeline: one event at a time with buffered queues, thread view
    Inputs (with codecs) -> Sized Queue (20) -> Filter Workers 1…n -> Sized Queue (20) -> Output A Worker 1, Output A Worker 2, Output B Worker 1
    Threads: an Input Thread per input, a Filter Worker Thread per filter worker, and an Output Processing Thread made up of an Output Delegating Thread and Output Worker Threads
  29. A Simpler Threading Story
    Inputs (with codecs) -> Persistent Sized Queue (20) -> workers, each running Batcher -> Filters -> Outputs
    Threads: an Input Thread per input, and Pipeline Worker Threads that each run a batcher, the filters, and the outputs
  30. Apache Pipeline (no IO) Overview
    input { stdin {} }
    filter { grok { … } geoip { … } useragent { … } date { … } }
    output { stdout { codec => dots } }
    Full config @ https://gist.github.com/andrewvc/a5708783166e01d904ef
  31. User Execution Time for Apache Parser: parsing Apache common log format with Geo-IP, Date, and UserAgent filters
    [Chart: user execution time, lower is better; Logstash 2.2.0 (NG) vs. Logstash 2.1.2]
  32. System Execution Time for Apache Parser: parsing Apache common log format with Geo-IP, Date, and UserAgent filters
    [Chart: system execution time, lower is better; Logstash 2.2.0 (NG) vs. Logstash 2.1.2]
  33. Wall Clock Execution Time: parsing Apache common log format with Geo-IP, Date, and UserAgent filters
    [Chart: wall clock execution time in seconds, lower is better; Logstash 2.2.0 (NG) vs. Logstash 2.1.2]
  34. Performance Summary: -26%, -30%, -13%
    User Time: time spent executing userspace code.
    System Time: time spent in kernel code, including resolving lock contention.
    Wall Time: the total processing time as measured by the clock on the wall.
  35. Apache Pipeline (no IO) Overview
    input { file { … } }
    filter { grok { … } geoip { … } useragent { … } date { … } }
    output { elasticsearch { … } }
    Full test info @ https://github.com/elastic/logstash/pull/4340#issuecomment-164062362
  36. Event Throughput / Time: 28.67% speedup on the new pipeline, comparing best case to worst case
    [Chart: events per second, larger is better; Logstash 2.2.0 (NG) vs. Logstash 2.1.2]
  37. Performance Tips
    • TEST EVERY CHANGE!
    • Tune the worker count with -w. More IO = more workers!
    • Tune the batch size with -b. Bigger batches are not always better!
    • The pipeline batch size is the new maximum batch size for output plugins.
    • Monitor GC activity for memory pressure!
  38. Agenda, section 4: Java Event
  40. Event Object: simplified object composition (Event, Accessors, Timestamp)
    (1) Field reference handling: config API and plugin API
    (2) Date/time normalization
    (3) Notable functions: sprintf(), to_json() and from_json()
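
Field reference handling amounts to turning a reference such as `[times][created_at]` into a path and walking a nested structure. The helpers below are a hypothetical sketch of that idea on plain Hashes (`parse_field_ref`, `get_field`, and `set_field` are illustrative names, not the Logstash Accessors API), and they ignore edge cases such as array indexes.

```ruby
# "[a][b]" -> ["a", "b"]; a bare name like "type" -> ["type"].
def parse_field_ref(ref)
  return [ref] unless ref.start_with?("[")
  ref.scan(/\[([^\[\]]+)\]/).flatten
end

# Walk the path; returns nil as soon as a level is missing.
def get_field(data, ref)
  parse_field_ref(ref).reduce(data) do |node, key|
    node.is_a?(Hash) ? node[key] : nil
  end
end

# Create intermediate Hashes as needed, then set the leaf value.
def set_field(data, ref, value)
  *parents, leaf = parse_field_ref(ref)
  target = parents.reduce(data) { |node, key| node[key] ||= {} }
  target[leaf] = value
end

event = { "type" => "syslog", "syslog_timestamp" => "Feb 18 10:00:00" }
set_field(event, "[times][created_at]", event["syslog_timestamp"])
get_field(event, "[times][created_at]") # "Feb 18 10:00:00"
get_field(event, "type")                # "syslog"
get_field(event, "[missing][field]")    # nil
```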
  41. Event Object: Logstash Config Accessors
    filter {
      if [type] == "syslog" {
        mutate {
          add_field => ["[times][created_at]", "%{syslog_timestamp}"]
          add_field => ["[times][received_at]", "%{@timestamp}"]
        }
      }
    }
    (1) Field reference in a conditional expression. (2) Nested field reference.
  42. Event Object: Logstash Config sprintf()
    Same configuration as slide 41.
    (1) sprintf format strings: refer to field values from within strings, as in "%{syslog_timestamp}" and "%{@timestamp}".
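
The `%{...}` substitution used in the `add_field` values can be sketched as a regex replacement that looks up each field reference. `sprintf_event` and `lookup` are hypothetical helpers on plain Hashes; this sketch leaves unresolved references untouched and does not handle Logstash's date-format directives such as `%{+YYYY.MM.dd}`.

```ruby
# Resolve a field reference ("name" or "[a][b]") in a nested Hash.
def lookup(data, ref)
  keys = ref.start_with?("[") ? ref.scan(/\[([^\[\]]+)\]/).flatten : [ref]
  keys.reduce(data) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# Replace every %{ref} with the referenced value; unknown refs stay as-is.
def sprintf_event(data, format)
  format.gsub(/%\{([^}]+)\}/) do |match|
    value = lookup(data, Regexp.last_match(1))
    value.nil? ? match : value.to_s
  end
end

event = {
  "host" => "web-1",
  "@timestamp" => "2016-02-18T10:00:00.000Z",
  "times" => { "received_at" => "2016-02-18T10:00:00.000Z" }
}
sprintf_event(event, "%{host} got it at %{[times][received_at]}")
# "web-1 got it at 2016-02-18T10:00:00.000Z"
sprintf_event(event, "%{missing}") # "%{missing}"
```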
  43. Event Object: Ruby plugin API
    event[@target] = value
    if event["[deep][field]"] == value
      event.tag("sometag")
    end
    event["[deep][field]"] = event.sprintf(format)
    event.timestamp = LogStash::Timestamp.new
    json = event.to_json
    e = Event.from_json(s)
    (1) Field reference getters & setters, (2) tag setter, (3) sprintf function, (4) timestamp getter & setter, (5) JSON serialization/deserialization
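
The tag, timestamp, and JSON parts of this API can be illustrated with a small stand-in class. `MiniEvent` is hypothetical and only mirrors the shape of the calls above; the `"tags"` and `"@timestamp"` field names follow Logstash conventions, while normalizing every timestamp to UTC with millisecond ISO 8601 output is an assumption of this sketch, not a statement of how `LogStash::Timestamp` behaves.

```ruby
require "json"
require "time"

# Hypothetical stand-in for an event: tags, timestamp normalization, JSON.
class MiniEvent
  attr_reader :timestamp

  def initialize(data = {})
    @data = data.dup
    ts = @data.delete("@timestamp")
    self.timestamp = ts.nil? ? Time.now : ts
  end

  # Accept a Time or an ISO 8601 string; store a new UTC Time.
  def timestamp=(value)
    @timestamp = (value.is_a?(Time) ? value : Time.iso8601(value)).getutc
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  # Add a tag once, creating the "tags" array on first use.
  def tag(name)
    tags = (@data["tags"] ||= [])
    tags << name unless tags.include?(name)
  end

  def to_json(*)
    JSON.generate(@data.merge("@timestamp" => @timestamp.iso8601(3)))
  end

  def self.from_json(json)
    new(JSON.parse(json))
  end
end

e = MiniEvent.new({ "message" => "hello", "@timestamp" => "2016-02-18T10:00:00+01:00" })
e.tag("sometag")
e.tag("sometag") # no duplicate tag
json = e.to_json
# '{"message":"hello","tags":["sometag"],"@timestamp":"2016-02-18T09:00:00.000Z"}'
copy = MiniEvent.from_json(json)
```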
  44. Java Event Performance - config #1
    (1) Dec 16, Pipeline-TNG merge: 70% increase
    (2) Jan 27, Java Event merge: 60% increase
  45. Java Event Performance - config #2
    (1) Dec 16, Pipeline-TNG merge: 70% increase
    (2) Jan 27, Java Event merge: 90% decrease
    (3) Jan 31, Java Event fix merge: 50% increase
  46. Why Java?
    Java API: paves the way for native Java/Scala/Clojure/Groovy plugins; share plugins with the ES Ingest Node.
    Faster Serialization: Java Serializable interface; pure inner Java data structures.
    Faster Persistence: leverage faster serialization; direct access to Java NIO + memory mapping.
  47. Java Event: 100% Ruby plugin compatibility
    Components: Event, Accessors, Timestamp, JRuby API Proxy
    (1) Explicit control over Ruby/Java type conversions. (2) Pure Java internal object representation.
  48. Agenda, section 5: Persistence
  49. Reliability: legacy pipeline, Output Stage
    (1) Look ma, another queue! (2) Bulk requests batching 1000s of items.
  50. The Road to Reliability
    (1) Java Event ✓ (2) Filter & Output Merged ✓ (3) Micro Batching & Acknowledgement
  51. The Road to Reliability
    input -> Persistent Queue -> filter + output
    Batches N, N+1, and N+2 are in flight; batch N is acknowledged (ACK N) back to the persistent queue.
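
The acknowledgement idea can be sketched with an in-memory stand-in for the persistent queue: events stay in the log until their batch is acknowledged, and after a crash every unacknowledged batch is handed out again. `AckedQueue` and its methods are hypothetical, the Array stands in for on-disk pages, and the sketch assumes batches are acknowledged in the order they were read.

```ruby
# Hypothetical acked queue; the Array stands in for persisted pages.
class AckedQueue
  def initialize
    @log = []       # all written events, in order
    @read_pos = 0   # next event to hand to a worker
    @acked_pos = 0  # events before this index are acknowledged
  end

  def write(event)
    @log << event
  end

  # Hand out up to max events that have not been read yet.
  def read_batch(max)
    batch = @log[@read_pos, max] || []
    @read_pos += batch.size
    batch
  end

  # Called after filters and outputs are done (assumes in-order acks).
  def ack(batch)
    @acked_pos += batch.size
  end

  # After a restart, redeliver everything that was never acknowledged.
  def recover
    @read_pos = @acked_pos
  end
end

q = AckedQueue.new
%w[l1 l2 l3 l4].each { |line| q.write(line) }
batch_n = q.read_batch(2)  # ["l1", "l2"]
batch_n1 = q.read_batch(2) # ["l3", "l4"], still in flight
q.ack(batch_n)             # ACK N
q.recover                  # simulated crash before batch N+1 was acknowledged
replayed = q.read_batch(2) # ["l3", "l4"] are delivered again
```

This gives at-least-once delivery: a batch that was processed but not acknowledged before a crash is processed a second time.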
  52. Simplified Architecture: Web Server, Payments Server, Database, … and elasticsearch
    (1) Variable size persistent queue
  54. Please attribute Elastic with a link to elastic.co. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/
    Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries. Third party marks and brands are the property of their respective holders.