
Things Your Application Does When You're Not Looking (PHP Serbia 2017)

It’s 2:00am. Is your application working properly? How will you know for sure when you wake up tomorrow morning? In this talk we’ll look at strategies for effective logging, monitoring and metrics collection from within your application. We’ll look at what kind of data you should and shouldn’t log, and with what frequency, as well as some great open-source tools to use. We’ll also take some first steps with Statsd and Graphite to help see trends in performance, user behavior, and overall application health. We’ll also learn some useful ways to interpret the data we collect to get the best picture of what’s really going on when you’re not looking.

Josh Butts

May 28, 2017

Transcript

  1. Things Your Application Does While You’re Not Looking
     Josh Butts
     PHP Srbija 2017
  2. About Me
     • VP of Engineering, Ziff Davis Commerce
     • Austin PHP Organizer
     • github.com/jimbojsb
     • @jimbojsb
  3. None
  4. Ⓧ ✔

  5. Agenda
     • What is application health?
     • How can we collect data to determine if our application is healthy?
     • How can we make this data actionable?
  6. What is application health?
     • Depends on who you ask
     • Combination of performance and quality
       – Uptime
       – Response time
       – Error rate
  7. Uptime
     • Set realistic expectations - no one is up 100% of the time
     • How many 9’s can you tolerate?
     • Measure uptime monthly
     • Planned maintenance counts!
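
To make "how many 9's" concrete, here is a minimal sketch that converts an uptime target into a monthly downtime budget. The function name `allowedDowntimeMinutes` and the 30-day month are assumptions made for illustration, not something from the talk.

```php
<?php
// Hypothetical helper: minutes of downtime a given uptime target allows per month.
// Planned maintenance comes out of the same budget.
function allowedDowntimeMinutes(float $uptimePercent, int $daysInMonth = 30): float
{
    $totalMinutes = $daysInMonth * 24 * 60;
    return $totalMinutes * (1 - $uptimePercent / 100);
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows %.1f minutes of downtime per 30-day month\n",
        $target,
        allowedDowntimeMinutes($target)
    );
}
```

Each extra 9 cuts the budget by a factor of ten, which is why the target should be agreed on before anyone starts measuring.
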
  8. Up isn’t good enough, but it’s a start
     • Ping monitoring is an absolute minimum
     • ICMP ping is not good enough
     • Need to at least check the status code
     • Should really check for a content snippet
     • You should outsource this
       – Pingdom
       – UptimeRobot
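
The difference between "responds" and "works" can be sketched as a check on both the status code and the body. `isHealthy` and the sample snippet are hypothetical; a hosted service such as the ones named above would run the equivalent check from outside your network.

```php
<?php
// Hypothetical check: "up" means a 200 status AND the expected snippet in the body.
// A 200 page that renders "Database error" would pass a status-only check.
function isHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    return $statusCode === 200 && strpos($body, $expectedSnippet) !== false;
}

var_dump(isHealthy(200, '<h1>Welcome back</h1>', 'Welcome'));     // healthy page
var_dump(isHealthy(200, '<h1>Database error</h1>', 'Welcome'));   // up, but broken
```
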
  9. Error Rate
     • Number of requests that generate an E_WARNING or above / total requests
     • Uncaught exceptions: fatal error (E_ERROR)
     • What’s acceptable?
       – 0% is not realistic
       – 1% is a good place to start
       – 0.5% is what we use
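
The error-rate definition above is simple arithmetic. A minimal sketch, with hypothetical function names and the 0.5% threshold taken from the slide:

```php
<?php
// Error rate as a percentage: requests that raised E_WARNING or above / total requests.
function errorRate(int $failedRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 0.0; // no traffic, nothing to measure
    }
    return 100 * $failedRequests / $totalRequests;
}

// Hypothetical threshold check; 0.5 is the percentage quoted on the slide.
function exceedsThreshold(int $failedRequests, int $totalRequests, float $thresholdPercent = 0.5): bool
{
    return errorRate($failedRequests, $totalRequests) > $thresholdPercent;
}

var_dump(exceedsThreshold(12, 1000)); // 1.2% of requests failed
```
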
  10. Why Error Rate is Hard
      • PHP error handlers are terrible
      • You really need an extension
      • There are a few third-party tools that do this, but they aren’t cheap
  11. Silent Killers
      • Does a caught exception count towards your error rate?
      • How many times do you drop the exception?
      • Would you even know if your password reset page was throwing a 500 error?
      • Even the best testing can’t fix stupid users
  12. You’re not the only one

  13. Application Logs

  14. Application Logs
      • Logs are your best source for debugging production errors
      • Log facts
      • Speak to your future self
      • Use a service or tool to aggregate logs
  15. Log Highlights
      • Be wordy, but avoid pointless words
      • Take advantage of log levels
      • Take advantage of different application environments
      • Keep your logs to “one-liners”
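
One way to keep entries to "one-liners" is to collapse whitespace in the message and append the facts as compact JSON. This is a sketch only; `formatLogLine` is a hypothetical helper, not part of Monolog.

```php
<?php
// Hypothetical formatter: one log entry per line, facts appended as JSON context.
function formatLogLine(string $level, string $message, array $context = []): string
{
    // Collapse newlines and repeated whitespace so the entry stays on a single line
    $oneLine = trim(preg_replace('/\s+/', ' ', $message));
    $line = sprintf('[%s] %s', strtoupper($level), $oneLine);
    if ($context !== []) {
        $line .= ' ' . json_encode($context);
    }
    return $line;
}

echo formatLogLine('info', "Order\n  placed", ['order_id' => 42]), "\n";
```

Single-line entries are easier to grep and survive log aggregators that split on newlines.
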
  16. Log Levels
      • DEBUG
      • INFO
      • NOTICE
      • WARNING
      • ERROR
      • CRITICAL
      • ALERT
      • EMERGENCY
  17. DEBUG
      • Most detailed and verbose level
      • Database queries
      • “per-item” information in a loop
      • Probably turn this off in production
  18. INFO
      • This is the “default” for most things
      • General events
        – user logins
        – application state changes
        – material domain object modifications
  19. NOTICE
      • Like INFO but slightly more important
      • You might actually care about these
      • Transactions with values that are acceptable but higher or lower than usual
      • Might review these weekly
  20. WARNING
      • Undesired behavior that isn’t necessarily wrong
      • Calling deprecated APIs
      • Unexpected null result sets
  21. ERROR
      • Runtime logic errors
      • Unexpected invalid arguments
      • Caught exceptions
      • Doesn’t require immediate attention
      • Look at these daily
  22. CRITICAL
      • First level where you should consider real-time notifications
      • Unable to connect to a 3rd party service
      • Connection timeouts
      • High latency
  23. ALERT
      • Application is partially down or non-functional
      • Failed to connect to a critical internal resource
      • This should send SMS messages, wake people up
      • Recommend a time or repeat threshold
  24. EMERGENCY
      • Everything has gone to hell
      • Hardware failures
      • Wake everyone up, keep calling until someone acknowledges
      • Rare to see this, because logging has probably also failed
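
The escalation described across the level slides can be sketched as a severity comparison. The numeric values below mirror Monolog's level constants; the `notifiesInRealTime` policy function is a hypothetical illustration of "CRITICAL is the first level to notify on".

```php
<?php
// Numeric severities mirroring Monolog's level constants.
const LEVELS = [
    'DEBUG' => 100, 'INFO' => 200, 'NOTICE' => 250, 'WARNING' => 300,
    'ERROR' => 400, 'CRITICAL' => 500, 'ALERT' => 550, 'EMERGENCY' => 600,
];

// Hypothetical policy: CRITICAL and above trigger real-time notifications,
// everything below is reviewed on a daily or weekly schedule.
function notifiesInRealTime(string $level): bool
{
    $level = strtoupper($level);
    if (!array_key_exists($level, LEVELS)) {
        throw new InvalidArgumentException("Unknown log level: $level");
    }
    return LEVELS[$level] >= LEVELS['CRITICAL'];
}

var_dump(notifiesInRealTime('ERROR'));
var_dump(notifiesInRealTime('ALERT'));
```
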
  25. PHP Logging Software
      • Monolog
      • Just use this one
      • Your favorite framework probably has a wrapper around Monolog
  26. Basic Monolog Setup

          <?php
          $loggerName = 'myapp';
          $logger = new \Monolog\Logger($loggerName);
  27. Useful Monolog Setup

          <?php
          $loggerName = 'myapp';
          $logger = new \Monolog\Logger($loggerName);

          // Create the log file up front and make it writable by both the web and CLI users
          $file = __DIR__ . '/app.log';
          touch($file);
          chmod($file, 0666);
          $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
  28. SAPI-aware Monolog

          // Use a different logger name and destination for CLI and web requests
          $sapi = php_sapi_name();
          $loggerName = $sapi == 'cli' ? 'myapp-cli' : 'myapp-web';
          $logger = new \Monolog\Logger($loggerName);
          if ($sapi == 'cli') {
              $logger->pushHandler(new \Monolog\Handler\StreamHandler('php://stdout'));
          } else {
              // file setup here, touch, chmod, etc
              $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
          }
  29. Add extra info to your logs

          $logger->pushProcessor(function ($record) {
              // Store the hostname under its own key so extra data from other processors survives
              $record['extra']['hostname'] = gethostname();
              return $record;
          });
  30. Logging to a Service

          $handler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
          $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
          $logger->pushHandler($handler);
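  31. Environment-Aware Log Levels

          use Monolog\Logger;
          use Monolog\Handler\SyslogUdpHandler;
          use Monolog\Handler\SwiftMailerHandler;

          if (APPLICATION_ENV == 'production') {
              // SyslogUdpHandler takes (host, port, facility, level), so the level is the 4th argument
              $udpHandler = new SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, Logger::INFO);
              $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());
              // SwiftMailerHandler takes (mailer, message template, level);
              // $message is assumed to be a Swift_Message template defined elsewhere
              $emailHandler = new SwiftMailerHandler($swiftMailer, $message, Logger::ALERT);
              $logger->pushHandler($udpHandler);
              $logger->pushHandler($emailHandler);
          }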
  32. Application Stats

  33. Metrics Collection
      • Everyone likes graphs
      • Data visualizations help you spot outliers in real time
      • Create a dashboard that displays them
  34. Example Baseline Metrics
      • PHP execution time
      • PHP memory usage
      • Number of database queries per request
      • Job queue length
      • Time to process jobs
      • Emails sent
  35. Application Metrics
      • User logins / failed logins
      • Password resets
      • Page views for key pages
      • Deployments
      • Caught exceptions
      • Overall page views
  36. Statsd / Graphite
      • Statsd is a node.js tool that collects stats from your application
      • Graphite is a visualization tool that lets you access information from Statsd in graph form
  37. Graphite UI

  38. Graphite Example

  39. Types of Statsd Metrics
      • Counters
      • Timers
      • Gauges
      • Sets
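
On the wire, the four metric types differ only in a short type suffix on a plain-text line of the form `name:value|type`. The sketch below builds those lines by hand to show what a client library sends; `statsdPacket` is a hypothetical helper and the metric names are made up.

```php
<?php
// Hypothetical helper: build one StatsD line-protocol packet ("name:value|type").
function statsdPacket(string $name, $value, string $type): string
{
    // Type suffixes used by the StatsD line protocol
    $suffixes = ['counter' => 'c', 'timer' => 'ms', 'gauge' => 'g', 'set' => 's'];
    if (!isset($suffixes[$type])) {
        throw new InvalidArgumentException("Unknown metric type: $type");
    }
    return sprintf('%s:%s|%s', $name, $value, $suffixes[$type]);
}

echo statsdPacket('myapp.pageviews', 1, 'counter'), "\n";     // something happened once
echo statsdPacket('myapp.render_time', 320, 'timer'), "\n";   // duration in milliseconds
echo statsdPacket('myapp.queue_length', 17, 'gauge'), "\n";   // current value of something
echo statsdPacket('myapp.unique_users', 'u42', 'set'), "\n";  // count of distinct values
```

In practice a client library sends these over UDP for you, so the application never waits on the stats server.
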
  40. Examples of Counters
      • Count every request
      • Count every transactional email sent
      • Count every job from your job queue by type
      • Count every caught exception
  41. Examples of Timers
      • Time your index.php at the top and bottom
      • Time your crontabs, especially overnight ones
      • You can even submit timers for multi-page events (conversion funnels, etc)
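
Timing "at the top and bottom" amounts to taking a timestamp before and after the work. A minimal sketch with a hypothetical `timeMs` wrapper; the resulting milliseconds are what you would submit as a timer metric.

```php
<?php
// Hypothetical wrapper: run a task and return its result plus elapsed milliseconds.
function timeMs(callable $task): array
{
    $start = microtime(true);
    $result = $task();
    $elapsedMs = (microtime(true) - $start) * 1000;
    return [$result, $elapsedMs];
}

// Stand-in for an overnight cron job
list($result, $elapsedMs) = timeMs(function () {
    usleep(2000);
    return 'report generated';
});
printf("%s in %.1f ms\n", $result, $elapsedMs);
```
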
  42. Metric Naming
      • . delimited names
      • Think of it like namespaces
      • Use a top-level namespace per app (client-side)
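
Dot-delimited names with a per-app prefix can be built with a small helper. This is a sketch; `metricName` and its sanitizing rule are assumptions, and a client library may accept the namespace directly in its constructor instead.

```php
<?php
// Hypothetical helper: join an app namespace and name parts into a dot-delimited metric name.
function metricName(string $app, string ...$parts): string
{
    $segments = array_merge([$app], $parts);
    $clean = array_map(function (string $segment): string {
        // Lowercase, and replace anything that could be mistaken for a delimiter
        return trim(preg_replace('/[^a-z0-9_-]+/', '_', strtolower($segment)), '_');
    }, $segments);
    return implode('.', $clean);
}

echo metricName('orca', 'worker', 'Email Send', 'success'), "\n";
```

Keeping the app name as the first segment lets several applications share one stats server without their metrics colliding.
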
  43. Send PHP data to Statsd

          $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
          $client = new Domnikl\Statsd\Client($connection, 'myapp');
  44. Time your “Page Render”

          <?php
          $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
          $client = new Domnikl\Statsd\Client($connection, 'orca');

          // Start the timer before dispatch and stop it once the response is sent
          $client->startTiming('render_time');
          $application = new Application();
          $response = $application->dispatch();
          echo $response;
          $client->endTiming('render_time');
  45. Page Render Example

  46. Count Your Pageviews

          $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
          $client = new Domnikl\Statsd\Client($connection, 'orca');

          $client->startTiming('render_time');
          $application = new Application();
          $response = $application->dispatch();
          echo $response;
          $client->increment('pageviews');
          $client->endTiming('render_time');
  47. Pageview Count Example

  48. Absence of Data is also Data

  49. Job Queue Example

          class Worker
          {
              protected $statsd;

              public function run($job)
              {
                  try {
                      $this->processJob($job);
                      $this->statsd->increment("worker.success");
                  } catch (\Exception $e) {
                      // Bury the failed job and count it so failures show up on the graph
                      $this->buryJob($job);
                      $this->statsd->increment("worker.buried");
                  }
              }
          }
  50. Job Queue Example

  51. Logs vs Stats
      • Why not both?
      • Logs are searchable
      • Stats are graph-able, visual
      • Make sure you can correlate logs and stats
  52. Make it Actionable
      • You have to actually look at this stuff
      • Identify problems with stats
      • Investigate problems with logs
      • Revisit your data collection when you encounter anything serious
      • Get tools to help you
  53. Got Budget?

  54. Questions?

  55. https://joind.in/talk/0a751