Slide 1

Slide 1 text

How to monitor your Symfony applications Alexandre Salomé

Slide 2

Slide 2 text

About me ● The “Poney” guy ● Architect on the back-office software, Auchan Retail France ● 7 years of experience on Symfony ● Developer since my childhood

Slide 3

Slide 3 text

About you

Slide 4

Slide 4 text

Summary ● A little theory ● Overview of existing solutions ● What to monitor ● Alerting ● Our solution at Auchan Retail France

Slide 5

Slide 5 text

A little theory

Slide 6

Slide 6 text

Metrics vs Events

Slide 7

Slide 7 text

Metrics vs Events

Slide 8

Slide 8 text

Metrics ● Numbers that change over time ● Formally: time-series data ● name + time = value

Slide 9

Slide 9 text

Metrics: Examples ● System metrics ○ Load average ○ RAM usage ○ Disk I/O ● Service metrics ○ Number of SQL queries ○ Cache hits and misses ● Application metrics ○ Number of registrations ○ Page generation duration

Slide 10

Slide 10 text

Metrics: Aggregation ● Important when querying ● Different aggregations ○ Sum ○ Average ○ Max ○ Min ○ 90th percentile ● Can be used to reduce storage size ○ Every minute from now to 1 week ago ○ Every 15 minutes from 1 week ago to 1 month ago ○ Every hour from 1 month ago to 6 months ago ○ Every day from 6 months ago to …
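The aggregation idea can be sketched in a few lines of Python. This is only an illustration, not how any particular storage implements it: the `downsample` and `percentile` helpers, the nearest-rank percentile method, and the sample values are all assumptions made for the example.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# The aggregations listed on the slide, keyed by a short name.
AGGREGATIONS = {
    "sum": sum,
    "avg": mean,
    "max": max,
    "min": min,
    "p90": lambda values: percentile(values, 90),
}


def downsample(points, bucket_seconds, how):
    """Group (timestamp, value) points into fixed-size buckets and aggregate each one."""
    buckets = {}
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets.setdefault(bucket_start, []).append(value)
    aggregate = AGGREGATIONS[how]
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Hypothetical page generation durations in ms, one point per minute.
points = [(0, 120), (60, 80), (120, 100), (900, 300), (960, 100)]
per_15_min_avg = downsample(points, 900, "avg")
per_15_min_max = downsample(points, 900, "max")
```

The choice of aggregation matters: averaging the first bucket hides the 120 ms peak that the max keeps, which is why the function has to be chosen per metric before old data is consolidated.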

Slide 11

Slide 11 text

Metrics: Deviation Some metrics are pushed as ever-growing counters: ● MySQL query count ● Network transfer To get the rate, compute the deviation between consecutive values, i.e. the difference in value divided by the elapsed time.
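A minimal Python sketch of that computation, assuming samples arrive as `(timestamp, counter)` pairs. The `rate_per_second` name and the sample values are made up for the example, and skipping negative differences is one possible way to deal with a counter that restarts from zero.

```python
def rate_per_second(samples):
    """Turn (timestamp, counter) samples into (timestamp, rate per second) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        delta = v1 - v0
        if elapsed <= 0 or delta < 0:
            # Out-of-order sample or counter reset (e.g. service restart): skip it.
            continue
        rates.append((t1, delta / elapsed))
    return rates


# Hypothetical MySQL query counter sampled every 60 seconds; the last sample
# simulates a restart that reset the counter.
samples = [(0, 1000), (60, 1600), (120, 2500), (180, 40)]
rates = rate_per_second(samples)
```

With these samples the counter grows by 600 and then 900 queries per minute, which gives 10 and 15 queries per second; the reset at the last sample produces no point instead of a large negative rate.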

Slide 12

Slide 12 text

Metrics vs Events

Slide 13

Slide 13 text

Events An event is a message from an application, your system, or service. It’s in a text format: 2016-09-06 20:47:13 - Alexandre is preparing slides for the conference

Slide 14

Slide 14 text

Events: examples ● Linux logs ● Apache or Nginx logs ● Symfony logs ● MySQL logs ● Slow query logs

Slide 15

Slide 15 text

Events: field extraction
Parse messages with a regex:
2016-01-14 12:34:32 boston-01: User “alice” connected to the application from IP 12.34.56.78
Get a data table:
Date        2016-01-14
Time        12:34:32
Server      boston-01
Event type  login
Username    alice
IP          12.34.56.78
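As a rough illustration, the extraction above could look like this in Python. The pattern and the `extract_fields` helper are assumptions written for this one message shape, and the `login` event type is inferred from the word "connected" rather than read from the message.

```python
import re

# Named groups become the columns of the data table.
LOGIN_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<server>[\w.-]+): "
    r'User ["“](?P<username>[^"”]+)["”] connected to the application '
    r"from IP (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def extract_fields(message):
    """Return the extracted fields as a dict, or None if the message does not match."""
    match = LOGIN_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["event_type"] = "login"  # assumption: this pattern only matches logins
    return fields


message = (
    "2016-01-14 12:34:32 boston-01: User “alice” connected "
    "to the application from IP 12.34.56.78"
)
fields = extract_fields(message)
```

Messages that do not match return `None`, so a real pipeline would typically try several patterns in turn and keep the raw text when none applies.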

Slide 16

Slide 16 text

Events: field extraction ● Logstash provides a lot of built-in regular expressions. Example: parsing of Apache/Nginx logs
grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }

Slide 17

Slide 17 text

Metrics and Events > Comparison
Metrics (numbers) are good at: ● Time series data ● Consolidation over time ● Mathematics ● Storage size
Events (text) are good at: ● Storing any message in any format ● Extracting fields from messages for indexing and queries

Slide 18

Slide 18 text

Overview of existing solutions

Slide 19

Slide 19 text

Metrics storage ● ++ Graphite: aggregation ● + InfluxDB: clustering ● OpenTSDB: scalable ● Prometheus

Slide 20

Slide 20 text

Metrics from your system and services A good solution: collectd ● Plugins for system and services metrics: CPU usage, RAM, load average, network, MySQL, Apache, AMQP, Carbon, CPU Temperature, Filesystem, Disk, IRQ, NFS, PostgreSQL, Syslog, MongoDB, Redis, File count, … ● You can add custom metrics by using the Exec plugin ● Sends all metrics to your storage

Slide 21

Slide 21 text

Metrics from your system and services A complete solution: Zabbix ● Agents to collect metrics ● Web UI to get realtime alerting ● Alert by Mail/SMS/Anything ● Complete metrics extraction ○ System metrics ○ Service metrics ○ Remote calls

Slide 22

Slide 22 text

Metrics buffering with StatsD ● A Node.js application to buffer your metrics flow ● Lots of available backends ● Manages different metric types ○ Counters (+1, +3, +2) ○ Sampling ( ○ Gauges (200, +3, -2) ● A very simple UDP protocol ● Flushes metrics every X seconds ● Optimizes performance
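To show how simple the UDP protocol is, here is a small Python sketch that builds StatsD lines by hand and sends them as datagrams. The `format_metric` and `send_metric` helpers and the metric names are made up for the example; the default address assumes a StatsD daemon listening locally on its usual port 8125.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD line: <name>:<value>|<type>[|@<sample rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget: one UDP datagram per metric, no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, one line per metric type.
counter = format_metric("mysite.forum.read", 1, "c")
sampled = format_metric("mysite.forum.read", 1, "c", sample_rate=0.1)
gauge_set = format_metric("mysite.queue.size", 200, "g")
gauge_delta = format_metric("mysite.queue.size", "+3", "g")
timer = format_metric("mysite.page.duration", 320, "ms")
```

Calling `send_metric(counter)` would emit the datagram. Because UDP is connectionless, the application does not wait for StatsD and keeps working even if the daemon is down, which is a large part of why this approach costs so little at runtime.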

Slide 23

Slide 23 text

Metrics from your Symfony application ● <3 m6web/statsd-bundle <3 ● algatux/influxdb-bundle ● https://packagist.org/search/?q=-bundle

Slide 24

Slide 24 text

Events: the famous ELK stack ● Elasticsearch is the storage ● Logstash is the log processing tool ● Kibana is the dashboard

Slide 25

Slide 25 text

See also Events storage: ● Graylog ● Fluentd
Awesome Sysadmin: https://github.com/kahun/awesome-sysadmin

Slide 26

Slide 26 text

A simple start

Slide 27

Slide 27 text

A simple start: metrics

Slide 28

Slide 28 text

A simple start: events

Slide 29

Slide 29 text

A good fail ● Metrics are events ● 1 hour = 10 MB ● 1 day = 200 MB ● 1 week = 1.5 GB

Slide 30

Slide 30 text

What to monitor

Slide 31

Slide 31 text

Anything that changes can be measured ● Measure anything and everything ● 3 levels: ○ System: the Debian/Arch Linux/whatever system you are using ○ Services: Apache, MySQL, Docker, Nginx, Redis, … ○ Applications: your Symfony application
Book: How to Measure Anything: Finding the Value of Intangibles in Business, by Douglas W. Hubbard

Slide 32

Slide 32 text

System Metrics ● Load average ● RAM ● Free disk ● IOWait ● Network usage ● Inodes Events ● System logs

Slide 33

Slide 33 text

Services Metrics ● MySQL ○ Query count ○ Cache hit/miss ● Apache ○ Query count ○ Busy/idle workers ● HAProxy ● Redis ● …. Events ● Apache|Nginx access logs ● Apache|Nginx error logs ● MySQL logs ● ElasticSearch logs ● ...

Slide 34

Slide 34 text

Applications Metrics ● Memory/duration per route ● Feature usage ● Custom metrics ○ Registration ○ Checkout process Events ● Symfony logs ● Custom logs ○ Registration GeoIP ○ Checkout details ○ Feature details

Slide 35

Slide 35 text

Application: generic measures for your application http://bit.ly/2ciZDLI

Slide 36

Slide 36 text

Application: generic measures for your application http://bit.ly/2ciZDLI

Slide 37

Slide 37 text

Application: generic measures for your application http://bit.ly/2ciZDLI

Slide 38

Slide 38 text

Application metrics ● Use application events ○ Don’t couple your application code to your monitoring ● M6Web/StatsdBundle provides a smart way to achieve this:
m6_statsd:
    clients:
        default:
            events:
                forum.read:
                    increment: mysite.forum.read
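The decoupling idea is independent of the bundle and can be sketched in plain Python. Everything below is hypothetical: the `EventDispatcher` and `MetricsListener` classes stand in for Symfony's event dispatcher and a StatsD client, and they are not the bundle's actual implementation.

```python
from collections import Counter, defaultdict


class EventDispatcher:
    """Tiny stand-in for an application event dispatcher."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def add_listener(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def dispatch(self, event_name, payload=None):
        for listener in self._listeners[event_name]:
            listener(event_name, payload)


class MetricsListener:
    """Maps event names to metric names, driven purely by configuration."""

    def __init__(self, event_to_metric):
        self.event_to_metric = event_to_metric
        self.counters = Counter()  # stand-in for a real StatsD client

    def __call__(self, event_name, payload):
        self.counters[self.event_to_metric[event_name]] += 1


def read_forum(dispatcher, topic_id):
    # Application code only announces what happened; it knows nothing about metrics.
    dispatcher.dispatch("forum.read", {"topic_id": topic_id})
    return f"topic {topic_id}"


dispatcher = EventDispatcher()
metrics = MetricsListener({"forum.read": "mysite.forum.read"})
for event_name in metrics.event_to_metric:
    dispatcher.add_listener(event_name, metrics)

read_forum(dispatcher, 42)
read_forum(dispatcher, 43)
```

Adding or renaming a metric then only touches the mapping passed to the listener, in the same way the configuration above maps `forum.read` to `mysite.forum.read`.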

Slide 39

Slide 39 text

Application events ● Use Symfony Monolog channels to route your messages and create powerful dashboards ● Example: the deprecated channel for deprecation messages

Slide 40

Slide 40 text

Deprecated channel http://bit.ly/2c8oWpk

Slide 41

Slide 41 text

Deprecated channel http://bit.ly/2c8oWpk

Slide 42

Slide 42 text

Deprecated channel

Slide 43

Slide 43 text

What to measure ● System and service performance ○ Load average ○ Free disk ● System and service errors ○ Syslog errors ○ HTTP codes >= 500 ● User behavior ○ Feature usage ○ Registration count ○ Page views

Slide 44

Slide 44 text

Alerting

Slide 45

Slide 45 text

A little note on alerting ● It’s nice to measure, it’s better to be alerted ● Define rules and get notified when a rule is violated ● Don’t put thresholds at 95%: if your filesystem is 95% full, your system is probably already suffering ○ Prefer 60% ● Handling the problem before it happens avoids having to recover from a crash ● The alerting rules can be complex ○ During work hours, send a mail to the team ○ Otherwise, send an SMS to the IT manager’s phone ○ If the IT manager is on holiday, send it to their backup
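The rules above could be expressed roughly as follows. This is a sketch under assumptions: the work hours (Monday to Friday, 9:00 to 18:00), the channel and recipient names, and both function names are invented for the example and not taken from any alerting tool.

```python
from datetime import datetime

DISK_THRESHOLD = 0.60  # alert early, well before the disk is actually full


def disk_alert_needed(used_bytes, total_bytes, threshold=DISK_THRESHOLD):
    """True when disk usage has reached the alerting threshold."""
    return used_bytes / total_bytes >= threshold


def choose_notification(now, manager_on_holiday):
    """Return (channel, recipient) for an alert raised at the given time."""
    # Assumption: work hours are Monday to Friday, 9:00 to 18:00.
    is_work_hours = now.weekday() < 5 and 9 <= now.hour < 18
    if is_work_hours:
        return ("mail", "team")
    if manager_on_holiday:
        return ("sms", "backup")
    return ("sms", "it-manager")


tuesday_morning = choose_notification(datetime(2016, 9, 6, 10, 0), manager_on_holiday=False)
tuesday_night = choose_notification(datetime(2016, 9, 6, 23, 0), manager_on_holiday=False)
saturday_holiday = choose_notification(datetime(2016, 9, 10, 14, 0), manager_on_holiday=True)
```

Keeping the threshold check and the routing decision in separate functions makes each rule easy to test on its own, which helps once the routing grows more complex.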

Slide 46

Slide 46 text

Grafana alerting ● Since version 3.1.0 ● For now, only supports the Graphite backend

Slide 47

Slide 47 text

Our experience at Auchan Retail France

Slide 48

Slide 48 text

Our stack ● Splunk for events ● Zabbix for metrics

Slide 49

Slide 49 text

Zabbix ● Used for monitoring and alerting of system/service metrics

Slide 50

Slide 50 text

Splunk ● ELK + Cash effect = Splunk ● The whole company can use it ● On-the-fly field extraction ○ Beautiful interface to configure them ● Powerful expression language: ○ index=apache sourcetype=frontend | timechart count BY host ○ index=apache sourcetype=frontend host=auchan.fr | stats avg(response_time) BY path ● Powerful graph constructor ● Data models → Pivot tables for business

Slide 51

Slide 51 text

Conclusion ● Track everything that changes ● Instrument your application ● Track your critical business features ● Create decision-making dashboards ● Alert at 60%, not at 95% ● If you have (a lot of) money, consider Splunk

Slide 52

Slide 52 text

The end Thank you!

Slide 53

Slide 53 text

Questions & Answers

Slide 54

Slide 54 text

Photo credits ● Andrew Malone - Measuring - https://flic.kr/p/aqhCH8 ● Sebastian Schulze - SymfonyLive 2010 - https://flic.kr/p/7Ef7vx ● KimManleyOrt - At the Math Grad House - https://flic.kr/p/m2UBWH ● Usehung - Chemistry - https://flic.kr/p/4uT7Er ● Cybjorg - Gauges - https://flic.kr/p/5r3LuJ ● Shan Ambrose - alert - https://flic.kr/p/cAk4KC ● Nicolas Buffler - Projet 365 - 209/365 - https://flic.kr/p/mkHfLF ● Derek Bridges - Questions - https://flic.kr/p/5DeuzB ● Poneys - Internet