Norikra to realtime log analytics

Norikra meetup #2
2015-06-03

Harukasan

June 03, 2015

Transcript

  1. Norikra to realtime log analytics
    harukasan / MICHII Shunsuke

  2. Harukasan / MICHII Shunsuke
    - Infrastructure engineer at pixiv since 2012
    - Develops content distribution / converter / storage
    - Distributes up to 16 Gbps of image traffic
    - Log collection/analytics platform
    - Elasticsearch/Kibana
    - Fluentd

  3. http://qiita.com/harukasan/items/957012833e5a361f7aa1

  4. http://qiita.com/harukasan/items/a7c1dd1a11a61cd1ad75

  5. Agenda
    - Log ecosystem
    - Batch processing vs. Stream processing
    - Getting started with Norikra
    - Norikra Deployment

  6. [Diagram]
    - service: Application ×3, Database, Storage
    - ?: rsync, syslog, ssh, custom script …
    - log storage: HDFS, RDB / Other, Storage

  7. [Diagram]
    - service: Application ×3, Database, Storage
    - Fluentd
    - log storage: HDFS, RDB / Other, Storage

  8. [Diagram]
    - service: Application ×3, Database, Storage
    - Fluentd
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data

  9. [Diagram]
    - service: Application ×3, Database, Storage
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data
    - Fluentd

  10. [Diagram]
    - service: Application ×3, Database, Storage
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data
    - Fluentd
    - visualisation / analytics: Kibana, Spreadsheet, HRForecast, Tableau, GrowthForecast, Custom Script

  11. [Diagram]
    - service: Application ×3, Database, Storage
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data
    - Fluentd
    - visualisation / analytics: Kibana, Spreadsheet, HRForecast, Tableau, GrowthForecast, Custom Script
    - also shown: GAS

  12. [Diagram]
    - service: Application ×3, Database, Storage
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data
    - Fluentd
    - visualisation / analytics: Kibana, Spreadsheet, HRForecast, Tableau, GrowthForecast, Custom Script
    - also shown: Shib

  13. [Diagram]
    - service: Application ×3, Database, Storage
    - log storage: HDFS, RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB, Treasure Data
    - Fluentd
    - visualisation / analytics: Kibana, Spreadsheet, HRForecast, Tableau, GrowthForecast, Custom Script

  14. [Diagram]
    - pixiv: Application ×3, Database, Storage
    - log storage: RDB / Other, Storage, Google BigQuery, Elasticsearch, MongoDB
    - Fluentd
    - visualisation / analytics: Kibana, HRForecast, Tableau, Custom Script
    - also shown: Jenkins

  15. Log ecosystem with Fluentd
    - Any log can be streamed to any type of storage or queue
    - Every log is converted to structured data
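
To make "structured data" concrete, here is a minimal Python sketch that turns one raw access-log line into a flat record. The Common Log Format layout, the field names, and the `to_record` helper are assumptions for illustration; the slides do not say which formats are parsed or how.

```python
import re

# Hypothetical access-log layout (Common Log Format); not taken from the talk.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def to_record(line):
    """Turn one raw log line into a flat dict, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # "-" means no body was sent; store the size as an integer either way.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = '192.0.2.1 - - [03/Jun/2015:19:00:00 +0900] "GET /index.html HTTP/1.1" 200 1024'
record = to_record(line)
print(record)
```

The status stays a string here, which is what a regular-expression match on `status` would expect.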

  16. Log Analytics
    Batch processing
    Ad-hoc analysis
    Offline analysis
    Stream processing

  17. Batch processing
    Daily / Weekly / Monthly Reporting
    - page view
    - conversion count
    - num. of events
    Daily report
    ================
    - Updated 2015/06/03
    ■ Page views
    2015/05/27 (Wed) 888888 PV
    2015/05/28 (Thu) 888888 PV
    2015/05/29 (Fri) 888888 PV
    2015/05/30 (Sat) 888888 PV
    2015/05/31 (Sun) 888888 PV ★ record high
    2015/06/01 (Mon) 888888 PV
    2015/06/02 (Tue) 888888 PV
    2015/06/03 (Wed) 888888 PV
    ■ New registrations
    2015/05/27 (Wed) 8888 users
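
A batch report like this boils down to grouping stored records by day and counting them. The following Python sketch shows that shape; the `time` field, its ISO 8601 format, and the sample records are made up for illustration and are not data from the talk.

```python
from collections import Counter
from datetime import datetime


def daily_page_views(records):
    """Count records per calendar day; 'time' is an assumed ISO 8601 field."""
    counts = Counter()
    for record in records:
        day = datetime.fromisoformat(record["time"]).date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))


# Made-up sample records.
records = [
    {"time": "2015-06-02T23:59:10", "path": "/"},
    {"time": "2015-06-03T00:00:05", "path": "/"},
    {"time": "2015-06-03T12:30:00", "path": "/ranking"},
]
report = daily_page_views(records)
for day, pv in report.items():
    print(f"{day} {pv} PV")
```

The whole data set has to be read before the report exists, which is why this style suits daily or weekly runs rather than per-minute ones.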

  18. Ad-hoc analysis
    - Kibana with Elasticsearch
    - BI Tools: Tableau, QlikView, Pentaho…

  19. Offline Analysis
    - Excel is awesome
    - Analyze small data on laptops
    - Plenty of techniques and know-how in Japan

  20. Sometimes, batch processes are too heavy
    Minutely report
    - to spot access bursts
    - to see changes within the day
    Minutely notification
    - to report errors
    - to detect attacks

  21. Stream processing for realtime analytics
    - Processes small data (in most cases, in memory)
    - High throughput
    - Low latency
    [Diagram] a 1 min. time window over the data stream
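
The time-window idea can be sketched in a few lines of Python. This is a simplified illustration, not how Norikra is implemented: it assumes events arrive in timestamp order as `(unix_timestamp, payload)` pairs, cuts windows by event timestamp, and skips empty windows. The `time_batches` name and the sample stream are hypothetical.

```python
from itertools import groupby

WINDOW_SECONDS = 60  # the 1 min. window from the slide


def time_batches(events, window=WINDOW_SECONDS):
    """Yield (window_start, payloads) for each non-empty tumbling window.

    Only the events of the current window are held in memory at a time.
    """
    def window_start(event):
        timestamp = event[0]
        return timestamp - timestamp % window

    for start, batch in groupby(events, key=window_start):
        yield start, [payload for _, payload in batch]


# Made-up timestamps (seconds) for illustration.
stream = [(0, "a"), (30, "b"), (59, "c"), (60, "d"), (125, "e")]
batches = list(time_batches(stream))
print(batches)
```

Because each window is discarded once it has been emitted, memory use depends on the window size and event rate rather than on the total volume of logs.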

  22. Norikra
    - Stream processing server
    - Schema-less
    - Uses SQL-like queries

  23. Realtime Aggregation
    SELECT
      COUNT(1, status REGEXP '^2..$') AS count_2xx,
      COUNT(1, status REGEXP '^3..$') AS count_3xx,
      COUNT(1, status REGEXP '^4..$') AS count_4xx,
      COUNT(1, status REGEXP '^5..$') AS count_5xx
    FROM access_log.win:time_batch(1 min)
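
For readers unfamiliar with the filtered `COUNT`, this Python sketch computes the same four counters for the events of a single one-minute batch. It assumes `status` is a string field; the `count_status` helper and the sample batch are hypothetical.

```python
import re

# Same patterns as the query; `status` is assumed to be a string field.
STATUS_CLASSES = {
    "count_2xx": re.compile(r"^2..$"),
    "count_3xx": re.compile(r"^3..$"),
    "count_4xx": re.compile(r"^4..$"),
    "count_5xx": re.compile(r"^5..$"),
}


def count_status(batch):
    """Count events per status class for one window's worth of events."""
    result = {name: 0 for name in STATUS_CLASSES}
    for event in batch:
        for name, pattern in STATUS_CLASSES.items():
            if pattern.search(event["status"]):
                result[name] += 1
    return result


# Made-up events standing in for one minute of access_log.
batch = [
    {"status": "200"}, {"status": "204"}, {"status": "302"},
    {"status": "404"}, {"status": "503"}, {"status": "500"},
]
counts = count_status(batch)
print(counts)
```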

  24. Fluentd Norikra

  25. Output from fluent-plugin-norikra
    <source>
      type forward
    </source>

    # output to Norikra
    # (match pattern below is an assumption)
    <match **>
      type norikra
      norikra localhost:26571  # specify norikra host (26571: default port)
      target_map_tag true      # create target with tag
    </match>

  26. Auto generated targets

  27. [Diagram] Fluentd → Norikra → ? → Elasticsearch, GrowthForecast, Idobata, Google BigQuery

  28. [Diagram] Fluentd → Norikra → Fluentd → Elasticsearch, GrowthForecast, Idobata, Google BigQuery

  29. Sweep from Norikra
    <source>
      type norikra
      norikra localhost:26571
      <fetch>
        method sweep            # sweep output of query
        target gf               # specify query group
        tag query_name          # use query_name as tag
        tag_prefix norikra.gf   # add tag prefix
        interval 10s
      </fetch>
    </source>

  30. Sweep from Norikra
    <fetch>
      method sweep                 # sweep output of query
      target idobata               # specify query group
      tag query_name               # use query_name as tag
      tag_prefix norikra.idobata   # add tag prefix
      interval 10s
    </fetch>

  31. Sweep from Norikra
    <fetch>
      method sweep            # sweep output of query
      target es               # specify query group
      tag query_name          # use query_name as tag
      tag_prefix norikra.es   # add tag prefix
      interval 10s
    </fetch>

  32. Output to GrowthForecast
    # match pattern assumed from tag_prefix norikra.gf
    <match norikra.gf.**>
      type growthforecast
      remove_prefix norikra.gf
      name_key_pattern .
      gfapi_url http://localhost:5125/api/
      graph_path norikra/${tag}/${key_name}
    </match>
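
To see what `remove_prefix` and `graph_path` do together, this Python sketch expands the template for one record and builds the requests GrowthForecast would receive (its API takes a `number` form field at `/api/<service>/<section>/<graph>`). Nothing is sent. The incoming tag `norikra.gf.status_count`, the helper names, and the sample record are assumptions for illustration.

```python
from string import Template
from urllib.parse import urlencode

GFAPI_URL = "http://localhost:5125/api/"
GRAPH_PATH = Template("norikra/${tag}/${key_name}")


def remove_prefix(tag, prefix):
    """Strip 'prefix.' from the front of the tag, if present."""
    if tag.startswith(prefix + "."):
        return tag[len(prefix) + 1:]
    return tag


def graph_requests(tag, record):
    """Return one (url, form_body) pair per field in the record."""
    short_tag = remove_prefix(tag, "norikra.gf")
    requests = []
    for key_name, value in record.items():
        path = GRAPH_PATH.substitute(tag=short_tag, key_name=key_name)
        requests.append((GFAPI_URL + path, urlencode({"number": value})))
    return requests


# Made-up record, shaped like one row of the status-count query.
reqs = graph_requests("norikra.gf.status_count", {"count_2xx": 120, "count_5xx": 3})
for url, body in reqs:
    print("POST", url, body)
```

Each output column of a query therefore becomes its own graph, grouped under a section named after the query.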

  33. Output to Idobata
    # match pattern assumed from tag_prefix norikra.idobata
    <match norikra.idobata.**>
      type idobata
      webhook_url #{put_your_hook_url_here}
      message_template <%= record['message'] %>
    </match>

  34. Routing query to output
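
Routing depends on the tag each swept event carries. The Python sketch below assumes the tag is `tag_prefix` and the query name joined with a dot, and mimics a prefix-based match; the routing table, the `elasticsearch` output name, and the helper names are hypothetical.

```python
# Query group -> tag_prefix, as configured in the sweep slides.
TAG_PREFIXES = {
    "gf": "norikra.gf",
    "idobata": "norikra.idobata",
    "es": "norikra.es",
}

# Hypothetical routing table: tag prefix -> output plugin type.
OUTPUTS = {
    "norikra.gf": "growthforecast",
    "norikra.idobata": "idobata",
    "norikra.es": "elasticsearch",
}


def sweep_tag(group, query_name):
    """Assumed tag layout: '<tag_prefix>.<query_name>'."""
    return f"{TAG_PREFIXES[group]}.{query_name}"


def route(tag):
    """Pick the output whose prefix matches the tag, or None if nothing matches."""
    for prefix, output in OUTPUTS.items():
        if tag == prefix or tag.startswith(prefix + "."):
            return output
    return None


tag = sweep_tag("gf", "status_count")
print(tag, "->", route(tag))
```

With this layout, moving a query to another destination only means changing its group in Norikra; the Fluentd configuration stays the same.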

  35. HTTP Status count
    Name:  status_count
    Group: gf
    Query:
      SELECT
        COUNT(1, status REGEXP '^2..$') AS count_2xx,
        COUNT(1, status REGEXP '^3..$') AS count_3xx,
        COUNT(1, status REGEXP '^4..$') AS count_4xx,
        COUNT(1, status REGEXP '^5..$') AS count_5xx
      FROM access_log.win:time_batch(1 min)

  36. HTTP Status count
    Name:  status_count
    Group: mackerel
    Query:
      SELECT
        COUNT(1, status REGEXP '^2..$') AS count_2xx,
        COUNT(1, status REGEXP '^3..$') AS count_3xx,
        COUNT(1, status REGEXP '^4..$') AS count_4xx,
        COUNT(1, status REGEXP '^5..$') AS count_5xx
      FROM access_log.win:time_batch(1 min)

  37. HTTP Status count
    Name:  notify_error
    Group: idobata
    Query:
      SELECT
        "Notify: over 1000 access" AS message,
        COUNT(*) AS count
      FROM access_log.win:time_batch(1 min)
      HAVING COUNT(*) > 1000
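
The query above is meant to emit a row only when more than 1000 events arrive within a one-minute batch. A minimal Python sketch of that threshold logic, plus the message rendering an output template would do, follows; the `notify_if_over` and `render` helpers are hypothetical names.

```python
THRESHOLD = 1000  # same limit as in the query


def notify_if_over(batch, threshold=THRESHOLD):
    """Return a one-element list with the alert record, or [] at or below the threshold."""
    count = len(batch)
    if count > threshold:
        return [{"message": "Notify: over 1000 access", "count": count}]
    return []


def render(record):
    """Rough equivalent of the template <%= record['message'] %>."""
    return str(record["message"])


quiet = notify_if_over([{}] * 1000)  # exactly 1000 events: no alert
busy = notify_if_over([{}] * 1001)   # 1001 events: one alert record
for record in busy:
    print(render(record))
```

Since quiet minutes produce no rows at all, the chat room only receives a message when the threshold is actually crossed.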

  38. Output to anywhere with Fluentd
    [Diagram] Fluentd → Norikra → Fluentd → Elasticsearch, GrowthForecast, Idobata, Google BigQuery

  39. Norikra Deployment

  40. [Diagram]
    - service: Application ×3, Database, Storage
    - Fluentd (Active), Fluentd (Standby)
    - Computing node: Norikra, Fluentd
    - GrowthForecast
    - SPOF

  41. Hardware structure
    - Norikra needs a lot of memory (min. 8 GB)
    - It does not need many CPU cores
    - Norikra is still a SPOF
    - Norikra can’t share query stats between active/standby

  42. Build environment
    - Install JVM 1.7 with apt
    - Build JRuby with xbuild
      xbuild/ruby-install jruby-1.7.18 ~/local/jruby-1.7.18/

  43. Install with Gemfile
    Gemfile:
      source "https://rubygems.org/"
      platforms :jruby do
        gem "norikra"
      end

  44. Daemonize with Supervisord
    [program:norikra]
    command=/home/norikra/local/jruby-1.7.18/bin/norikra start \
      --logdir=/var/log/norikra \
      -s /home/norikra/norikra/norikra-stat.json \
      --ui-context-path=/norikra \
      -Xmx2048m

    user=norikra
    directory=/home/norikra/norikra
    autostart=true
    autorestart=true
    environment=LANG=C

  45. Conclusion
    - Use Norikra with Fluentd
    - Contribute to Norikra
