Slide 1

Slide 1 text

Norikra to realtime log analytics harukasan / MICHII Shunsuke

Slide 2

Slide 2 text

Harukasan / MICHII Shunsuke - Infrastructure engineer at pixiv since 2012 - Develops content distribution / converter / storage - distributes up to 16 Gbps of image traffic - Log collecting/analytics platform - Elasticsearch/Kibana - Fluentd

Slide 3

Slide 3 text

http://qiita.com/harukasan/items/957012833e5a361f7aa1

Slide 4

Slide 4 text

http://qiita.com/harukasan/items/a7c1dd1a11a61cd1ad75

Slide 5

Slide 5 text

Agenda - Log ecosystem - Batch processing vs. Stream processing - Getting started with Norikra - Norikra Deployment

Slide 6

Slide 6 text

Application Application Application Database Storage service HDFS RDB / Other ? rsync syslog ssh custom script … Storage log storage

Slide 7

Slide 7 text

Application Application Application Database Storage service Fluentd HDFS RDB / Other  Storage log storage

Slide 8

Slide 8 text

Application Application Application Database Storage service Fluentd HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data

Slide 9

Slide 9 text

Application Application Application Database Storage service HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data Fluentd

Slide 10

Slide 10 text

Application Application Application Database Storage service HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data Fluentd Kibana Spreadsheet HRForecast Tableau GrowthForecast Custom Script visualisation / analytics

Slide 11

Slide 11 text

Application Application Application Database Storage service HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data Fluentd Kibana Spreadsheet HRForecast Tableau GrowthForecast Custom Script visualisation / analytics GAS

Slide 12

Slide 12 text

Application Application Application Database Storage service HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data Fluentd Kibana Spreadsheet HRForecast Tableau GrowthForecast Custom Script visualisation / analytics Shib

Slide 13

Slide 13 text

Application Application Application Database Storage service HDFS RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Treasure Data Fluentd Kibana Spreadsheet HRForecast Tableau GrowthForecast Custom Script visualisation / analytics

Slide 14

Slide 14 text

Application Application Application Database Storage pixiv RDB / Other  Storage Google BigQuery Elasticsearch MongoDB log storage Fluentd Kibana HRForecast Tableau Custom Script visualisation / analytics Jenkins

Slide 15

Slide 15 text

Log ecosystem with Fluentd - Every log can be streamed to any type of storage or queue - Every log is converted to structured data
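As a rough illustration of what "structured data" means here, the sketch below turns one raw access-log line into a key-value record. The log format, the regular expression, and the field names are assumptions made up for this example; they are not taken from the slides or from any Fluentd parser.

```python
import re

# Hypothetical access-log format: HOST "METHOD PATH" STATUS SIZE
LINE_PATTERN = re.compile(
    r'^(?P<host>\S+) "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)$'
)


def parse_line(line):
    """Return the log line as a dict, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Keep status as a string so it can be matched against patterns later;
    # convert only the byte size to a number.
    record["size"] = int(record["size"])
    return record


record = parse_line('192.0.2.1 "GET /index.html" 200 1024')
print(record)
```

Once every log line has this shape, any downstream storage or queue can consume it without knowing the original text format.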

Slide 16

Slide 16 text

Log Analytics Batch processing Ad-hoc analysis Offline analysis Stream processing

Slide 17

Slide 17 text

Batch processing Daily / Weekly / Monthly Reporting - page view - conversion count - num. of events
Daily report
================
- Updated 2015/06/03
■ Page views
2015/05/30 (Wed) 888888 PV
2015/05/30 (Thu) 888888 PV
2015/05/30 (Fri) 888888 PV
2015/05/30 (Sat) 888888 PV
2015/05/31 (Sun) 888888 PV ★ Record high
2015/06/01 (Mon) 888888 PV
2015/06/02 (Tue) 888888 PV
2015/06/03 (Wed) 888888 PV
■ New registrations
2015/05/30 (Wed) 8888 users

Slide 18

Slide 18 text

Ad-hoc analysis - Kibana with Elasticsearch - BI Tools: Tableau, QlikView, Pentaho…

Slide 19

Slide 19 text

Offline Analysis - Excel is awesome - Analyze small data on laptops - Many techniques and much know-how in Japan

Slide 20

Slide 20 text

Sometimes, batch processes are too heavy
Minutely Report - to detect burst access - to see changes within the day
Minutely Notification - to report errors - to detect attacks

Slide 21

Slide 21 text

Stream processing for realtime analytics - Processes small data (in most cases, in memory) - High throughput - Low latency
[Diagram: a 1 min. time window over a data stream]
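The time-window idea can be sketched in a few lines: events arrive in timestamp order, only the currently open window is held in memory, and a result is emitted each time a window closes. This is a minimal model of a tumbling 1-minute window, not Norikra's implementation; the function name and the `(timestamp, record)` event shape are assumptions for the example.

```python
def tumbling_counts(events, window_seconds=60):
    """Yield (window_start, count) each time a window closes.

    `events` is an iterable of (unix_timestamp, record) pairs that is
    assumed to be ordered by timestamp. Only one counter is kept in memory.
    """
    current_start = None
    count = 0
    for timestamp, _record in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        # Flush the last window when the (finite) input ends.
        yield current_start, count


events = [(0, {"path": "/"}), (30, {"path": "/a"}), (61, {"path": "/b"})]
results = list(tumbling_counts(events))
print(results)  # [(0, 2), (60, 1)]
```

Windows that receive no events produce no output in this sketch; a real stream processor is driven by a clock and may emit empty windows as well.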

Slide 22

Slide 22 text

Norikra - Stream processing server - Schema-less - Uses SQL-like queries

Slide 23

Slide 23 text

Realtime Aggregation
SELECT
  COUNT(1, status REGEXP '^2..$') AS count_2xx,
  COUNT(1, status REGEXP '^3..$') AS count_3xx,
  COUNT(1, status REGEXP '^4..$') AS count_4xx,
  COUNT(1, status REGEXP '^5..$') AS count_5xx
FROM access_log.win:time_batch(1 min)
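To make the result of this query concrete, the sketch below computes the same four counters for one window's worth of events in plain Python. It only mirrors the query's logic (one filtered count per status class); the record shape and the sample data are assumptions for the example.

```python
import re

# One pattern per output column, mirroring the REGEXP filters in the query.
STATUS_CLASSES = {
    "count_2xx": re.compile(r"^2..$"),
    "count_3xx": re.compile(r"^3..$"),
    "count_4xx": re.compile(r"^4..$"),
    "count_5xx": re.compile(r"^5..$"),
}


def count_status_classes(records):
    """Count the records of one time window per HTTP status class."""
    counts = {name: 0 for name in STATUS_CLASSES}
    for record in records:
        status = str(record.get("status", ""))
        for name, pattern in STATUS_CLASSES.items():
            if pattern.match(status):
                counts[name] += 1
    return counts


# Hypothetical events collected during a single 1-minute window.
window = [
    {"status": "200"}, {"status": "204"}, {"status": "302"},
    {"status": "404"}, {"status": "500"}, {"status": "503"},
]
counts = count_status_classes(window)
print(counts)
```

In the streaming setting, one such row of counters is produced every minute and the events of the closed window are discarded.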

Slide 24

Slide 24 text

Fluentd Norikra

Slide 25

Slide 25 text

Output from fluent-plugin-norikra
type forward
# output to Norikra
type norikra
norikra localhost:26571 # specify norikra host (26571: default port)
target_map_tag true # create target with tag

Slide 26

Slide 26 text

Auto generated targets

Slide 27

Slide 27 text

Fluentd Norikra Elasticsearch GrowthForecast Idobata Google BigQuery ?

Slide 28

Slide 28 text

Fluentd Norikra Elasticsearch GrowthForecast Idobata Google BigQuery Fluentd

Slide 29

Slide 29 text

Sweep from Norikra
type norikra
norikra localhost:26571
method sweep # sweep output of query
target gf # specify query group
tag query_name # use query_name as tag
tag_prefix norikra.gf # add tag prefix
interval 10s
…

Slide 30

Slide 30 text

Sweep from Norikra
…
method sweep # sweep output of query
target idobata # specify query group
tag query_name # use query_name as tag
tag_prefix norikra.idobata # add tag prefix
interval 10s
…

Slide 31

Slide 31 text

Sweep from Norikra
…
method sweep # sweep output of query
target es # specify query group
tag query_name # use query_name as tag
tag_prefix norikra.es # add tag prefix
interval 10s

Slide 32

Slide 32 text

Output to GrowthForecast
type growthforecast
remove_prefix norikra.gf
name_key_pattern .
gfapi_url http://localhost:5125/api/
graph_path norikra/${tag}/${key_name}
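The way a swept query result ends up at a graph path can be traced with a small model. The sketch assumes that the sweep joins `tag_prefix` and the query name with a dot, that `remove_prefix` strips that prefix again, and that `${tag}` and `${key_name}` are substituted literally. These functions are illustrative stand-ins, not the plugins' code, and the query name `status_count` and key `count_2xx` are example values.

```python
def swept_tag(tag_prefix, query_name):
    # Assumption: the prefix and the query name are joined with a dot.
    return f"{tag_prefix}.{query_name}"


def remove_prefix(tag, prefix):
    """Strip `prefix` and the following dot from `tag` if present."""
    if tag.startswith(prefix + "."):
        return tag[len(prefix) + 1:]
    return tag


def graph_path(template, tag, key_name):
    """Fill the ${tag} and ${key_name} placeholders of a path template."""
    return template.replace("${tag}", tag).replace("${key_name}", key_name)


tag = swept_tag("norikra.gf", "status_count")
short_tag = remove_prefix(tag, "norikra.gf")
path = graph_path("norikra/${tag}/${key_name}", short_tag, "count_2xx")
print(tag)   # norikra.gf.status_count
print(path)  # norikra/status_count/count_2xx
```

Under these assumptions each column of a query result becomes its own graph, grouped by query name.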

Slide 33

Slide 33 text

Output to Idobata
type idobata
webhook_url #{put_your_hook_url_here}
message_template <%= record['message'] %>

Slide 34

Slide 34 text

Routing query to output

Slide 35

Slide 35 text

HTTP Status count
Name: status_count
Group: gf
Query:
SELECT
  COUNT(1, status REGEXP '^2..$') AS count_2xx,
  COUNT(1, status REGEXP '^3..$') AS count_3xx,
  COUNT(1, status REGEXP '^4..$') AS count_4xx,
  COUNT(1, status REGEXP '^5..$') AS count_5xx
FROM access_log.win:time_batch(1 min)

Slide 36

Slide 36 text

HTTP Status count
Name: status_count
Group: mackerel
Query:
SELECT
  COUNT(1, status REGEXP '^2..$') AS count_2xx,
  COUNT(1, status REGEXP '^3..$') AS count_3xx,
  COUNT(1, status REGEXP '^4..$') AS count_4xx,
  COUNT(1, status REGEXP '^5..$') AS count_5xx
FROM access_log.win:time_batch(1 min)

Slide 37

Slide 37 text

HTTP Status count
Name: notify_error
Group: idobata
Query:
SELECT
  "Notify: over 1000 access" AS message,
  COUNT(*) AS count
FROM access_log.win:time_batch(1 min)
HAVING COUNT(*) > 1000
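The intent of this query is to emit a notification row only when the number of events in a 1-minute window exceeds 1000. A minimal Python model of that check is shown below; the function name and the event representation are assumptions for the example, and the returned dict stands in for the row that would be passed on to the Idobata output.

```python
def notify_if_over(window_events, threshold=1000):
    """Return a notification record if the window holds too many events.

    Returns None when the count is at or below the threshold, which
    corresponds to the query producing no output row for that window.
    """
    count = len(window_events)
    if count > threshold:
        return {"message": f"Notify: over {threshold} access", "count": count}
    return None


quiet_minute = [{"path": "/"}] * 1000
busy_minute = [{"path": "/"}] * 1001

print(notify_if_over(quiet_minute))  # None
print(notify_if_over(busy_minute))
```

Because the condition applies to the aggregated count rather than to individual events, it has to be evaluated after the window is counted.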

Slide 38

Slide 38 text

Fluentd Norikra Elasticsearch GrowthForecast Idobata Google BigQuery Fluentd Output to anywhere with Fluentd

Slide 39

Slide 39 text

Norikra Deployment

Slide 40

Slide 40 text

Application Application Application Database Storage service Fluentd Active Fluentd Standby Computing node Norikra Fluentd GrowthForecast SPOF

Slide 41

Slide 41 text

Hardware structure - Norikra needs a lot of memory (min. 8 GB) - Not many CPU cores are required - Norikra is still a SPOF - Norikra can’t share query stats between active/standby

Slide 42

Slide 42 text

Build environment - Install JVM 1.7 with apt - Build JRuby with xbuild
xbuild/ruby-install jruby-1.7.18 ~/local/jruby-1.7.18/

Slide 43

Slide 43 text

Install with Gemfile
Gemfile:
source "https://rubygems.org/"
platforms :jruby do
  gem "norikra"
end

Slide 44

Slide 44 text

Daemonize with Supervisord
[program:norikra]
command=/home/norikra/local/jruby-1.7.18/bin/norikra start \
  --logdir=/var/log/norikra \
  -s /home/norikra/norikra/norikra-stat.json \
  --ui-context-path=/norikra \
  -Xmx2048m …
user=norikra
directory=/home/norikra/norikra
autostart=true
autorestart=true
environment=LANG=C

Slide 45

Slide 45 text

Conclusion - Use Norikra with Fluentd - Contribute to Norikra