How to monitor your Symfony application

Alexandre Salomé

September 16, 2016

Transcript

  1. How to monitor your Symfony applications Alexandre Salomé

  2. About me • The “Poney” guy • Architect on the

    back-office software, Auchan Retail France • 7 years of experience on Symfony • Developer since my childhood
  3. About you

  4. Summary • A little theory • Overview of existing solutions

    • What to monitor • Alerting • Our solution at Auchan Retail France
  5. A little theory

  6. Metrics vs Events

  7. Metrics vs Events

  8. Metrics • Numbers that change over time • Formally, time-series

    data • name + time = value
  9. Metrics: Examples • System metrics ◦ Load average ◦ RAM

    usage ◦ Disk I/O • Service metrics ◦ Number of SQL queries ◦ Cache hits and misses • Application metrics ◦ Number of registrations ◦ Page generation duration
  10. Metrics: Aggregation • Important when querying • Different aggregations ◦

    Sum ◦ Average ◦ Max ◦ Min ◦ 90th percentile • Can be used to reduce storage size ◦ Every minute from now to 1 week ago ◦ Every 15 minutes from 1 week ago to 1 month ago ◦ Every hour from 1 month ago to 6 months ago ◦ Every day from 6 months ago to …
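
The aggregation idea above can be sketched in a few lines of Python. This is only an illustration, not how Graphite or any other storage implements it: the function names `rollup` and `percentile`, the (timestamp, value) tuple layout, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math
from collections import defaultdict


def rollup(points, bucket_seconds, aggregate):
    """Group (timestamp, value) points into fixed-size time buckets
    and reduce each bucket to a single value with `aggregate`."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical data: one point per minute for 30 minutes, values 1..30.
points = [(minute * 60, minute + 1) for minute in range(30)]

# Reduce one-minute resolution to 15-minute resolution, two different ways.
per_quarter_avg = rollup(points, 15 * 60, lambda v: sum(v) / len(v))
per_quarter_p90 = rollup(points, 15 * 60, lambda v: percentile(v, 90))
```

The choice of aggregation matters: the average and the 90th percentile of the same bucket can tell very different stories, and once old data has been rolled up the other aggregations can no longer be recovered.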
  11. Metrics: Deviation Some metrics are pushed as ever-growing counters: •

    MySQL query count • Network transfer To get the rate, compute the deviation: the difference between two consecutive values, divided by the time elapsed between them.
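
As a rough sketch of that computation, assuming samples arrive as (timestamp in seconds, counter value) pairs, a per-second rate can be derived as follows. The function name `counter_to_rate` and the decision to skip counter resets are assumptions for this example, not something taken from the talk.

```python
def counter_to_rate(samples):
    """Convert (timestamp, counter) samples into (timestamp, per-second rate)."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            # Skip out-of-order samples and counter resets (e.g. a service restart).
            continue
        rates.append((t1, (v1 - v0) / elapsed))
    return rates


# Hypothetical MySQL query counter sampled every 60 seconds, with one reset.
samples = [(0, 1000), (60, 1600), (120, 2500), (180, 100), (240, 400)]
rates = counter_to_rate(samples)
```

Here the counter drops between the third and fourth sample, so that interval produces no rate instead of a large negative one.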
  12. Metrics vs Events

  13. Events An event is a message from an application, your

    system, or service. It’s in a text format: 2016-09-06 20:47:13 - Alexandre is preparing slides for the conference
  14. Events: examples • Linux logs • Apache or Nginx logs

    • Symfony logs • MySQL logs • Slow query logs
  15. Events: field extraction Parse messages with a regex:

    2016-01-14 12:34:32 boston-01: User “alice” connected to the application from IP 12.34.56.78

    Get a data table:

    Date        2016-01-14
    Time        12:34:32
    Server      boston-01
    Event type  login
    Username    alice
    IP          12.34.56.78
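
A minimal sketch of this extraction in Python, using named groups. The pattern and the `extract_login` helper are written for this example only, and it assumes the real log line uses plain double quotes around the username rather than the typographic quotes shown on the slide.

```python
import re

# One named group per column of the resulting data table.
LOGIN_PATTERN = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<server>[\w-]+): '
    r'User "(?P<username>[^"]+)" connected to the application '
    r'from IP (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$'
)


def extract_login(line):
    """Return the extracted fields as a dict, or None if the line is not a login event."""
    match = LOGIN_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The event type is implied by which pattern matched.
    fields["event_type"] = "login"
    return fields


line = '2016-01-14 12:34:32 boston-01: User "alice" connected to the application from IP 12.34.56.78'
fields = extract_login(line)
```

Each distinct message shape needs its own pattern, which is why tools that ship a library of ready-made patterns save a lot of work.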
  16. Events: field extraction • Logstash provides a lot of built-in

    regular expressions : Example: parsing of Apache/Nginx logs grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  17. Metrics and Events > Comparison Metrics are good at: •

    Time-series data • Consolidation over time • Mathematics • Storage size Numbers Events are good at: • Storing any message in any format • Extracting fields from messages for indexing and queries Text
  18. Overview of existing solutions

  19. Metrics storage • ++ Graphite: aggregation • + InfluxDB:

    clustering • OpenTSDB: scalable • Prometheus
  20. Metrics from your system and services A good solution: collectd

    • Plugins for system and service metrics: CPU usage, RAM, load average, network, MySQL, Apache, AMQP, Carbon, CPU Temperature, Filesystem, Disk, IRQ, NFS, PostgreSQL, Syslog, MongoDB, Redis, File count, … • You can add custom metrics by using the Exec plugin • Sends all metrics to your storage
  21. Metrics from your system and services A complete solution: Zabbix

    • Agents to collect metrics • Web UI to get realtime alerting • Alert by Mail/SMS/Anything • Complete metrics extraction ◦ System metrics ◦ Service metrics ◦ Remote calls
  22. Metrics buffering with StatsD • A Node.js application to buffer

    your metrics flow • Lots of available backends • Manages different metric types ◦ Counters (+1, +3, +2) ◦ Sampling ( ◦ Gauges (200, +3, -2) • A very simple UDP protocol • Flushes metrics every X seconds • Optimizes performance
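
To give a feel for how simple the UDP protocol is, here is a small Python sketch that formats counter and gauge lines in the StatsD text format (`name:value|type`, with an optional `|@rate` suffix for sampled counters) and sends them as datagrams. The helper names are made up for this example, and the host and port (127.0.0.1, 8125, the usual StatsD default) are assumptions about your setup.

```python
import socket


def format_counter(name, value=1, sample_rate=None):
    """Counter line, e.g. 'mysite.forum.read:1|c' or 'x:3|c|@0.1' when sampled."""
    line = f"{name}:{value}|c"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def format_gauge(name, value, delta=False):
    """Gauge line: an absolute value (200) or, with delta=True, a signed change (+3, -2)."""
    rendered = f"{value:+d}" if delta else str(value)
    return f"{name}:{rendered}|g"


def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP does not wait for the StatsD daemon to answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Usage (requires nothing to be listening, since UDP is connectionless):
#   send(format_counter("mysite.forum.read"))
#   send(format_gauge("mysite.workers.busy", 200))
```

Because sending is fire-and-forget, instrumenting a request costs very little, and the daemon does the aggregation before flushing to the backend.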
  23. Metrics from your Symfony application • <3 m6web/statsd-bundle <3 •

    algatux/influxdb-bundle • https://packagist.org/search/?q=<your-solution>-bundle
  24. Events: the famous ELK stack • Elasticsearch is the storage •

    Logstash is the log processing tool • Kibana is the dashboard
  25. See also Events storage: • Graylog • Fluentd Awesome Sysadmin:

    https://github.com/kahun/awesome-sysadmin
  26. A simple start

  27. A simple start: metrics

  28. A simple start: events

  29. A good fail • Metrics are events • 1 hour

    = 10 MB • 1 day = 200 MB • 1 week = 1.5 GB
  30. What to monitor

  31. Anything that changes can be measured • Measure anything and

    everything • 3 levels: ◦ System: the Debian/Arch Linux/whatever system you are using ◦ Services: Apache, MySQL, Docker, Nginx, Redis, … ◦ Applications: your Symfony application • Book: How to Measure Anything: Finding the Value of Intangibles in Business, by Douglas W. Hubbard
  32. System Metrics • Load average • RAM • Free disk

    • IOWait • Network usage • Inodes Events • System logs
  33. Services Metrics • MySQL ◦ Query count ◦ Cache hit/miss

    • Apache ◦ Query count ◦ Busy/idle workers • HAProxy • Redis • … Events • Apache|Nginx access logs • Apache|Nginx error logs • MySQL logs • Elasticsearch logs • …
  34. Applications Metrics • Memory/duration per route • Feature usage •

    Custom metrics ◦ Registration ◦ Checkout process Events • Symfony logs • Custom logs ◦ Registration GeoIP ◦ Checkout details ◦ Feature details
  35. Application: generic measures for your application http://bit.ly/2ciZDLI

  36. Application: generic measures for your application http://bit.ly/2ciZDLI

  37. Application: generic measures for your application http://bit.ly/2ciZDLI

  38. Application metrics • Use application events ◦ Don’t couple your

    application code to your monitoring • M6Web/StatsdBundle provides a smart way to achieve this, by mapping an application event to a metric in the configuration:

    m6_statsd:
        clients:
            default:
                events:
                    forum.read:
                        increment: mysite.forum.read
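
The decoupling idea can be illustrated with a small Python sketch. It is not the Symfony EventDispatcher or the M6Web bundle API: `EventDispatcher`, `MetricsListener`, and `read_forum_thread` are hypothetical names, and the in-memory counter stands in for a real StatsD client.

```python
from collections import Counter, defaultdict


class EventDispatcher:
    """Tiny stand-in for an application event dispatcher."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def add_listener(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def dispatch(self, event_name, payload=None):
        for listener in self._listeners.get(event_name, []):
            listener(event_name, payload)


class MetricsListener:
    """Increments a metric for each configured event; the mapping plays the
    role of the 'event name: increment metric name' configuration."""

    def __init__(self, event_to_metric):
        self.event_to_metric = event_to_metric
        self.counters = Counter()  # stand-in for a real metrics client

    def __call__(self, event_name, payload):
        self.counters[self.event_to_metric[event_name]] += 1


dispatcher = EventDispatcher()
metrics = MetricsListener({"forum.read": "mysite.forum.read"})
for event_name in metrics.event_to_metric:
    dispatcher.add_listener(event_name, metrics)


def read_forum_thread(thread_id):
    """Application code: it only announces what happened, and knows nothing about metrics."""
    dispatcher.dispatch("forum.read", {"thread_id": thread_id})
    return f"thread {thread_id}"


read_forum_thread(42)
read_forum_thread(43)
```

Removing or replacing the monitoring then means changing the listener registration, with no change to the application code.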
  39. Application events • Use Symfony Monolog channels to route your

    messages and create powerful dashboards • Example: the deprecated channel for deprecation messages
  40. Deprecated channel http://bit.ly/2c8oWpk

  41. Deprecated channel http://bit.ly/2c8oWpk

  42. Deprecated channel

  43. What to measure • System and service performance ◦ Load

    average ◦ Free disk • System and service errors ◦ Syslog errors ◦ HTTP codes >= 500 • User behavior ◦ Feature usage ◦ Registration count ◦ Page views
  44. Alerting

  45. A little note on alerting • It’s nice to measure,

    it’s better to be alerted • Define rules and get notified when a rule is violated • Don’t put thresholds at 95%: if your filesystem is 95% full, your system is probably already suffering ◦ Prefer 60% • Handling the problem before it happens avoids having to recover from a crash • The alerting rules can be complex ◦ During working hours, send a mail to the team ◦ Otherwise, send an SMS to the IT manager’s phone ◦ If the IT manager is on holiday, send it to their backup
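
The rules on this slide could be sketched like this in Python. The 60% threshold comes from the slide; everything else (the function names, working hours of 9:00 to 18:00 on weekdays, and the recipient labels) is an assumption made for the example.

```python
from datetime import datetime


def disk_alert(used_percent, threshold=60):
    """Alert early: at 60% there is still time to act before the disk fills up."""
    return used_percent >= threshold


def route_alert(now, manager_on_holiday):
    """Return (channel, recipient) for an alert raised at `now`."""
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    in_work_hours = is_weekday and 9 <= now.hour < 18  # assumed office hours
    if in_work_hours:
        return ("mail", "team")
    if manager_on_holiday:
        return ("sms", "backup")
    return ("sms", "it_manager")


friday_morning = datetime(2016, 9, 16, 10, 0)
saturday_morning = datetime(2016, 9, 17, 10, 0)

during_work = route_alert(friday_morning, manager_on_holiday=False)
weekend = route_alert(saturday_morning, manager_on_holiday=False)
weekend_holiday = route_alert(saturday_morning, manager_on_holiday=True)
```

Keeping the threshold check and the routing decision as separate functions makes each rule easy to test on its own.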
  46. Grafana alerting • Since version 3.1.0 • For now, only the

    Graphite backend is supported
  47. Our experience at Auchan Retail France

  48. Our stack • Splunk for events • Zabbix for metrics

  49. Zabbix • Used for monitoring and alerting of system/service metrics

  50. Splunk • ELK + Cash effect = Splunk • The

    whole company can use it • On-the-fly field extraction ◦ Beautiful interface to configure them • Powerful expression language: ◦ index=apache sourcetype=frontend | timechart count BY host ◦ index=apache sourcetype=frontend host=auchan.fr | stats avg(response_time) BY path • Powerful graph constructor • Data models → Pivot tables for business
  51. Conclusion • Track everything that changes • Instrument your application

    • Track your critical business features • Create decision-making dashboards • Alert at 60%, not at 95% • If you have (a lot of) money, choose Splunk
  52. The end Thank you!

  53. Questions & Answers

  54. Photo credits • Andrew Malone - Measuring - https://flic.kr/p/aqhCH8 •

    Sebastian Schulze - SymfonyLive 2010 - https://flic.kr/p/7Ef7vx • KimManleyOrt - At the Math Grad House - https://flic.kr/p/m2UBWH • Usehung - Chemistry - https://flic.kr/p/4uT7Er • Cybjorg - Gauges - https://flic.kr/p/5r3LuJ • Shan Ambrose - alert - https://flic.kr/p/cAk4KC • Nicolas Buffler - Projet 365 - 209/365 - https://flic.kr/p/mkHfLF • Derek Bridges - Questions - https://flic.kr/p/5DeuzB • Poneys - Internet