Slide 1

Slide 1 text

Log Aggregation Using Fluentd + Elasticsearch + Kibana❤️ hsatac@KKTIX, 2014 1

Slide 2

Slide 2 text

Ash Wu hSATAC

Slide 3

Slide 3 text


Slide 4

Slide 4 text

Issues to Solve
• Log files are scattered across servers.
• grep & zgrep through all the mess.
• Servers come and go (autoscaling) and we lose log data.
• Keep logs of the last 6 months.
=> Querying & Archiving

Slide 5

Slide 5 text

Fluentd Comes to the Rescue

Slide 6

Slide 6 text

Fluentd
• Data collector
• Open source
• Written in Ruby (easy to write plugins)
• Combines C and Ruby for performance
• Processes 13,000 events/second/core
• Simple, flexible and reliable.

Slide 7

Slide 7 text

Before

Slide 8

Slide 8 text

After

Slide 9

Slide 9 text

No Fixed 2-tier Setup
• Where's the server? Where are the clients?
• A Fluentd instance is a node.
• Each node can be a server, or a client, or both at the same time.
• Define the data flow and server topology as you like.

Slide 10

Slide 10 text

Data Flow
Input & Output

Slide 11

Slide 11 text

Input
• in_forward (listens on UDP & TCP)
• in_http
• in_tail (tails a regular file)
• in_exec
• in_syslog
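As an illustration of the in_http input, the sketch below builds (without sending) the kind of request it accepts: a POST to a path named after the event tag, with the record passed as a form parameter called json. The host, port 9880, the tag app.access, and the helper name build_fluentd_http_request are assumptions made for this example; use whatever your in_http source is actually configured with.

```python
import json
import urllib.parse
import urllib.request


def build_fluentd_http_request(base_url, tag, record):
    """Build (but do not send) a POST request for a Fluentd in_http source.

    The URL path carries the event tag and the form field "json"
    carries the record itself.
    """
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/{tag}"
    return urllib.request.Request(url, data=body, method="POST")


# Hypothetical endpoint, tag and record, for illustration only.
request = build_fluentd_http_request(
    "http://localhost:9880", "app.access", {"path": "/", "status": 200}
)
print(request.get_method(), request.full_url)
# To actually send the event: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a Fluentd instance listening.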

Slide 12

Slide 12 text

Output
• out_file
• out_forward (sends to another Fluentd instance)
• out_exec
• out_copy
• out_stdout
• out_s3

Slide 13

Slide 13 text

Collector Example

Slide 14

Slide 14 text

Buffered Output
e.g. out_file, out_forward
Events will be flushed when:
• The event chunk size exceeds buffer_chunk_limit
• The time limit flush_interval elapses
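The two flush conditions can be modelled with a small sketch. This is a simplified toy model, not Fluentd's implementation (it has no chunk queue, retries, or file buffers); the class name BufferedOutput, its methods, and the injectable clock are assumptions made for the example. Only buffer_chunk_limit and flush_interval come from the slide.

```python
import time


class BufferedOutput:
    """Toy model of a buffered output with size-based and time-based flushing."""

    def __init__(self, buffer_chunk_limit, flush_interval, clock=time.monotonic):
        self.buffer_chunk_limit = buffer_chunk_limit  # max chunk size in bytes
        self.flush_interval = flush_interval          # max seconds between flushes
        self.clock = clock                            # injectable for testing
        self.chunk = []
        self.chunk_size = 0
        self.last_flush = clock()
        self.flushed = []                             # stands in for the real destination

    def emit(self, event):
        # Condition 1: the chunk would exceed buffer_chunk_limit, so flush it first.
        if self.chunk and self.chunk_size + len(event) > self.buffer_chunk_limit:
            self.flush()
        self.chunk.append(event)
        self.chunk_size += len(event)

    def tick(self):
        # Condition 2: flush_interval has elapsed since the last flush.
        if self.chunk and self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        self.flushed.append(b"".join(self.chunk))
        self.chunk, self.chunk_size = [], 0
        self.last_flush = self.clock()
```

Calling emit() for each event and tick() periodically is enough to exercise both conditions; passing a fake clock makes the time-based path easy to test.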

Slide 15

Slide 15 text


Slide 16

Slide 16 text

Time Sliced Plugin
• Flush regularly (daily, hourly...) according to time_slice_format.
• e.g. %Y%m%d for daily chunks
• Just like logrotate
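To see how a strftime-style time_slice_format decides which chunk an event lands in, here is a minimal sketch. The function names and sample events are hypothetical; only the %Y%m%d pattern comes from the slide, and the hourly pattern %Y%m%d%H is added for comparison.

```python
from collections import defaultdict
from datetime import datetime


def time_slice_key(event_time, time_slice_format="%Y%m%d"):
    """Return the chunk key for an event, e.g. '20140301' for a daily slice."""
    return event_time.strftime(time_slice_format)


def group_by_time_slice(events, time_slice_format="%Y%m%d"):
    """Group (time, record) pairs into chunks keyed by their time slice."""
    chunks = defaultdict(list)
    for event_time, record in events:
        chunks[time_slice_key(event_time, time_slice_format)].append(record)
    return dict(chunks)


# Hypothetical events around a day boundary.
events = [
    (datetime(2014, 3, 1, 23, 59), "a"),
    (datetime(2014, 3, 2, 0, 1), "b"),
    (datetime(2014, 3, 2, 12, 0), "c"),
]
daily = group_by_time_slice(events)                # two chunks, one per day
hourly = group_by_time_slice(events, "%Y%m%d%H")   # three chunks, one per hour
print(daily)
print(hourly)
```

A coarser format yields fewer, larger chunks, which is why the choice resembles picking a logrotate schedule.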

Slide 17

Slide 17 text

Station Example

Slide 18

Slide 18 text

Overview

Slide 19

Slide 19 text

Overview

Slide 20

Slide 20 text

Or...

Slide 21

Slide 21 text

Or...

Slide 22

Slide 22 text

Or...

Slide 23

Slide 23 text

Elasticsearch & Kibana
• Elasticsearch
  • Distributed RESTful search and analytics (schema free)
• Kibana
  • Real-time query frontend
  • Data visualizer
  • No code required

Slide 24

Slide 24 text

Kibana Basics
• Query & filters
• Panels (charts)

Slide 25

Slide 25 text


Slide 26

Slide 26 text


Slide 27

Slide 27 text


Slide 28

Slide 28 text


Slide 29

Slide 29 text


Slide 30

Slide 30 text


Slide 31

Slide 31 text


Slide 32

Slide 32 text


Slide 33

Slide 33 text


Slide 34

Slide 34 text

Rails & Fluentd Integration

Slide 35

Slide 35 text

Attempt 1
Official Document

Slide 36

Slide 36 text

Official Document
http://www.fluentd.org/datasources/rails
• Rails side:
  • lograge gem (suppresses logs)
  • act-fluent-logger-rails sends logs to Fluentd

Slide 37

Slide 37 text

Official Document
http://www.fluentd.org/datasources/rails
• Fluentd side:
  • fluent-plugin-parser plugin parses JSON.
  • fluent-plugin-elasticsearch plugin sends parsed data to Elasticsearch.

Slide 38

Slide 38 text

FAIL

Slide 39

Slide 39 text

Official Document
http://www.fluentd.org/datasources/rails
• lograge is good for the access log, but what about other logs?
• act-fluent-logger-rails crashes puma
• What's all the fuss about the JSON encode/decode thing?

Slide 40

Slide 40 text

Attempt 2

Slide 41

Slide 41 text

Attempt 2
• Use act-fluent-logger-rails only, to keep all the logs.
• Replace puma with unicorn to avoid the threading issue.

Slide 42

Slide 42 text

FAIL

Slide 43

Slide 43 text

Attempt 2
• All logs are kept, but the format is not search friendly.
  2013-01-18T15:04:50+09:00 foo { "messages":"Started GET \"/\" for 127.0.0.1 at 2013-01-18 15:04:49 +0900\n Processing by TopController#index as HTML\n Completed 200 OK in 635ms (Views: 479.3ms | ActiveRecord: 39.6ms)"], "level":"INFO" }
• Unicorn returns weird 404s with the act-fluent-logger-rails gem under massive requests.

Slide 44

Slide 44 text

Logging should never interfere with the functionality of your application.
- Jack Ma

Slide 45

Slide 45 text

I didn't say that.
- Jack Ma

Slide 46

Slide 46 text

Attempt 3
New Strategy

Slide 47

Slide 47 text

Attempt 3
• Rails side:
  • Remove act-fluent-logger-rails (unstable).
  • Keep the lograge gem, using the Logstash formatter.
  • logstash-logger gem converts other logs into Logstash format.
  • Output logs to production.log.

Slide 48

Slide 48 text

Attempt 3
• Fluentd side:
  • Tail production.log with format json
  • Forward to the station Fluentd.
  • Done.
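Why one JSON object per line makes the Fluentd side so simple can be shown with a rough sketch: each line of production.log parses independently into a structured record that can be tagged and forwarded. This only illustrates the idea behind tailing with format json; it is not Fluentd's parser. The function name, the tag rails.production, and the sample log lines (including their field names) are hypothetical, not lograge's actual output.

```python
import json


def parse_json_lines(lines, tag):
    """Turn JSON-per-line log text into (tag, record) events.

    Blank lines and lines that are not valid JSON objects are skipped
    in this sketch.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. a stray plain-text log line
        if isinstance(record, dict):
            events.append((tag, record))
    return events


# Hypothetical contents of production.log, for illustration only.
sample_log = [
    '{"method": "GET", "path": "/", "status": 200, "duration": 635.0}',
    "Started GET / for 127.0.0.1",
    '{"method": "POST", "path": "/orders", "status": 302, "duration": 81.2}',
]
events = parse_json_lines(sample_log, "rails.production")
for tag, record in events:
    print(tag, record["status"], record["path"])
```

Because every field is already a key in the record, a store such as Elasticsearch can index and query status or path directly, unlike the multi-line message string from Attempt 2.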

Slide 49

Slide 49 text

Thanks
Q & A