
Things Your Application Does When You're Not Looking (PHP Serbia 2017)


It’s 2:00am. Is your application working properly? How will you know for sure when you wake up tomorrow morning? In this talk we’ll look at strategies for effective logging, monitoring and metrics collection from within your application. We’ll look at what kinds of data you should and shouldn’t log, and how often, as well as some great open-source tools for the job. We’ll also take some first steps with Statsd and Graphite to help spot trends in performance, user behavior, and overall application health. Finally, we’ll learn some useful ways to interpret the data we collect to get the best picture of what’s really going on when you’re not looking.

Josh Butts

May 28, 2017



Transcript

  1. About Me
     • VP of Engineering, Ziff Davis Commerce
     • Austin PHP Organizer
     • github.com/jimbojsb
     • @jimbojsb
  2. Agenda
     • What is application health?
     • How can we collect data to determine if our application is healthy?
     • How can we make this data actionable?
  3. What is application health?
     • Depends on who you ask
     • Combination of performance and quality
       – Uptime
       – Response time
       – Error rate
  4. Uptime
     • Set realistic expectations: no one is up 100% of the time
     • How many 9’s can you tolerate?
     • Measure uptime monthly
     • Planned maintenance counts!
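
To make "how many 9’s" concrete, here is a small sketch (the function names are illustrative, not from the talk) that turns an availability target into a monthly downtime budget and computes a month's measured uptime with planned maintenance counted as downtime. It assumes a 30-day month: 99.9% leaves 43,200 × 0.001 = 43.2 minutes of downtime.

```php
<?php
// Total minutes in a month of $days days (30 by default).
function minutesInMonth(int $days = 30): int
{
    return $days * 24 * 60;
}

// Downtime budget in minutes for an availability target given in percent, e.g. 99.9.
function downtimeBudgetMinutes(float $targetPercent, int $days = 30): float
{
    return minutesInMonth($days) * (100.0 - $targetPercent) / 100.0;
}

// Measured uptime for the month; planned maintenance counts as downtime too.
function monthlyUptimePercent(float $unplannedMinutes, float $plannedMinutes, int $days = 30): float
{
    $down = $unplannedMinutes + $plannedMinutes;
    return 100.0 * (1 - $down / minutesInMonth($days));
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf("%.2f%% allows %.1f minutes of downtime per 30-day month\n", $target, downtimeBudgetMinutes($target));
}
```

Each extra 9 divides the budget by ten, which is why "how many can you tolerate" is a business question as much as a technical one.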
  5. Up isn’t good enough, but it’s a start
     • Ping monitoring is an absolute minimum
     • ICMP ping is not good enough
     • Need to at least check the status code
     • Should really check for a content snippet
     • You should outsource this
       – Pingdom
       – UptimeRobot
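
The slide recommends outsourcing this check, but it helps to know what a good check actually verifies. The sketch below (hypothetical function names; the URL and snippet are placeholders you would choose) treats a page as "up" only if the final HTTP status is 200 and the body contains an expected snippet. It uses PHP's built-in HTTP stream wrapper, so it needs no extra libraries.

```php
<?php
// A page counts as "up" only if the status is 200 AND the body contains
// a snippet that only a healthy page would render.
function isHealthy(int $status, string $body, string $expectedSnippet): bool
{
    return $status === 200 && strpos($body, $expectedSnippet) !== false;
}

// Fetch a URL and apply the check. ignore_errors lets us read the body and
// status of 4xx/5xx responses instead of getting false back.
function checkUrl(string $url, string $expectedSnippet, float $timeoutSeconds = 5.0): bool
{
    $context = stream_context_create(['http' => [
        'timeout' => $timeoutSeconds,
        'ignore_errors' => true,
    ]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return false; // DNS failure, timeout, connection refused, ...
    }
    // $http_response_header holds the headers of every response, including
    // redirects; keep the last status line ("HTTP/1.1 200 OK").
    $status = 0;
    foreach ($http_response_header ?? [] as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return isHealthy($status, $body, $expectedSnippet);
}
```

A usage example would be `checkUrl('https://example.com/login', 'Forgot your password?')`: a 200 response showing a database error page fails the check, which a status-code-only check would miss.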
  6. Error Rate
     • Number of requests that generate an E_WARNING or above / total requests
     • Uncaught exceptions: fatal errors (E_ERROR)
     • What’s acceptable?
       – 0% is not realistic
       – 1% is a good place to start
       – 0.5% is what we use
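
The error-rate definition above is a simple ratio; the sketch below (illustrative function names) computes it and compares it to a threshold, defaulting to the 0.5% the slide mentions. Counting which requests "errored" is the hard part and is assumed to happen elsewhere.

```php
<?php
// Error rate = requests with at least one E_WARNING-or-worse error / total requests.
function errorRate(int $erroredRequests, int $totalRequests): float
{
    if ($totalRequests <= 0) {
        return 0.0; // no traffic, no meaningful rate
    }
    return $erroredRequests / $totalRequests;
}

// Threshold as a fraction: 0.005 is the slide's 0.5%, 0.01 would be 1%.
function withinErrorBudget(int $erroredRequests, int $totalRequests, float $threshold = 0.005): bool
{
    return errorRate($erroredRequests, $totalRequests) <= $threshold;
}

printf("%.2f%%\n", 100 * errorRate(37, 10000));
```

For example, 37 errored requests out of 10,000 is 0.37%, inside a 0.5% budget; 120 out of 10,000 (1.2%) is not.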
  7. Why Error Rate is Hard
     • PHP error handlers are terrible
     • You really need an extension
     • There are a few third-party tools that do this, but they aren’t cheap
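
To see why userland handlers fall short, consider this approximation (the class name is hypothetical). `set_error_handler()` sees warnings, but it is never called for fatal errors such as uncaught exceptions, so the sketch also checks `error_get_last()` in a shutdown function. Even then it misses some failures, which is why the slide points to an extension or a third-party tool.

```php
<?php
// Hypothetical request-level tracker: flags the request as "errored" if any
// warning-or-worse error happens. A userland approximation only.
final class RequestErrorTracker
{
    // Levels that set_error_handler() can actually intercept.
    private const COUNTED = E_WARNING | E_USER_WARNING | E_RECOVERABLE_ERROR | E_USER_ERROR;
    // Fatal levels; only visible after the fact via error_get_last().
    private const FATAL = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;

    private $errored = false;

    public function register(): void
    {
        set_error_handler(function (int $errno): bool {
            if ($errno & self::COUNTED) {
                $this->errored = true;
            }
            return false; // continue with PHP's normal error handling
        });
        register_shutdown_function(function (): void {
            $last = error_get_last();
            if ($last !== null && ($last['type'] & self::FATAL)) {
                $this->errored = true; // e.g. "Uncaught Exception ..."
            }
            // Hypothetical: report $this->errored to your metrics backend here.
        });
    }

    public function hasErrored(): bool
    {
        return $this->errored;
    }
}
```

Note that the handler is called even for `@`-suppressed errors, so suppressed warnings still count against the error rate.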
  8. Silent Killers
     • Does a caught exception count towards your error rate?
     • How many times do you drop the exception?
     • Would you even know if your password reset page was throwing a 500 error?
     • Even the best testing can’t fix stupid users
  9. Application Logs
     • Logs are your best source for debugging production errors
     • Log facts
     • Speak to your future self
     • Use a service or tool to aggregate logs
  10. Log Highlights
      • Be wordy, but avoid pointless words
      • Take advantage of log levels
      • Take advantage of different application environments
      • Keep your logs to “one-liners”
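
Here is a sketch of a "one-liner" record: facts go in a structured context rather than being spliced into the message, and embedded newlines are collapsed so each record stays on one line. The layout is loosely modeled on Monolog's default line format; the function itself is hypothetical and not part of any library.

```php
<?php
// Render one log record on a single line:
// "[datetime] channel.LEVEL: message {json context}".
function formatLogLine(
    DateTimeInterface $time,
    string $channel,
    string $level,
    string $message,
    array $context = []
): string {
    // Collapse embedded newlines so the record stays a "one-liner".
    $message = str_replace(["\r\n", "\r", "\n"], ' ', $message);
    // json_encode escapes newlines inside context values, keeping one line.
    $json = $context === [] ? '' : ' ' . json_encode($context, JSON_UNESCAPED_SLASHES);
    return sprintf('[%s] %s.%s: %s%s', $time->format(DATE_ATOM), $channel, strtoupper($level), $message, $json);
}

echo formatLogLine(
    new DateTimeImmutable('2017-05-28 02:00:00', new DateTimeZone('UTC')),
    'myapp-web',
    'info',
    'Password reset email sent',
    ['user_id' => 42, 'template' => 'password_reset']
), "\n";
```

One record per line keeps logs easy to grep and lets aggregation services parse each event without multi-line stitching.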
  11. Log Levels
      • DEBUG
      • INFO
      • NOTICE
      • WARNING
      • ERROR
      • CRITICAL
      • ALERT
      • EMERGENCY
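
The eight levels are ordered by severity, and a handler writes a record only if its level is at or above the handler's minimum. The sketch below uses the numeric values Monolog assigns to these levels. The constant, the helper names, and the "INFO in production, DEBUG elsewhere" policy are illustrative assumptions.

```php
<?php
// Numeric severities for the eight levels (the values Monolog uses);
// a higher number means more severe.
const APP_LOG_LEVELS = [
    'DEBUG'     => 100,
    'INFO'      => 200,
    'NOTICE'    => 250,
    'WARNING'   => 300,
    'ERROR'     => 400,
    'CRITICAL'  => 500,
    'ALERT'     => 550,
    'EMERGENCY' => 600,
];

// A record is written only if its level is at or above the minimum.
function shouldLog(string $level, string $minimum): bool
{
    return APP_LOG_LEVELS[strtoupper($level)] >= APP_LOG_LEVELS[strtoupper($minimum)];
}

// Example policy: everything in development, INFO and above in production.
function minimumLevelFor(string $environment): string
{
    return $environment === 'production' ? 'INFO' : 'DEBUG';
}
```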
  12. DEBUG
      • Most detailed and verbose level
      • Database queries
      • “Per-item” information in a loop
      • Probably turn this off in production
  13. INFO
      • This is the “default” for most things
      • General events
        – User logins
        – Application state changes
        – Material domain object modifications
  14. NOTICE
      • Like INFO but slightly more important
      • You might actually care about these
      • Transactions with values that are acceptable but higher or lower than usual
      • Might review these weekly
  15. WARNING
      • Undesired behavior that isn’t necessarily wrong
      • Calling deprecated APIs
      • Unexpected null result sets
  16. ERROR
      • Runtime logic errors
      • Unexpected invalid arguments
      • Caught exceptions
      • Doesn’t require immediate attention
      • Look at these daily
  17. CRITICAL
      • First level where you should consider real-time notifications
      • Unable to connect to a third-party service
      • Connection timeouts
      • High latency
  18. ALERT
      • Application is partially down or non-functional
      • Failed to connect to a critical internal resource
      • This should send SMS messages and wake people up
      • Recommend a time or repeat threshold
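
A repeat threshold means an alert wakes someone up only if it keeps happening, not on a single blip. Below is a minimal in-memory sketch with a hypothetical class name. Because each PHP request is a fresh process, a real version would have to keep these timestamps in shared storage (for example a cache server), which is left out here.

```php
<?php
// Escalate only when the same alert fires at least $threshold times
// within a sliding window of $windowSeconds.
final class RepeatThreshold
{
    private $threshold;
    private $windowSeconds;
    /** @var array<string, float[]> timestamps of recent hits per alert key */
    private $hits = [];

    public function __construct(int $threshold, int $windowSeconds)
    {
        $this->threshold = $threshold;
        $this->windowSeconds = $windowSeconds;
    }

    // Record one occurrence at time $now; returns true when it should escalate.
    public function hit(string $key, float $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        // Drop hits that have fallen out of the window.
        $recent = array_filter($this->hits[$key] ?? [], function (float $t) use ($cutoff): bool {
            return $t > $cutoff;
        });
        $recent[] = $now;
        $this->hits[$key] = array_values($recent);
        return count($recent) >= $this->threshold;
    }
}
```

For example, with `new RepeatThreshold(3, 300)`, three failed connections to the same internal resource within five minutes trigger the page, while one isolated failure does not.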
  19. EMERGENCY
      • Everything has gone to hell
      • Hardware failures
      • Wake everyone up, keep calling until someone acknowledges
      • Rare to see this, because logging has probably also failed
  20. PHP Logging Software
      • Monolog
      • Just use this one
      • Your favorite framework probably has a wrapper around Monolog
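  21. Useful Monolog Setup

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

$loggerName = 'myapp';
$logger = new \Monolog\Logger($loggerName);

// Create the log file up front; 0666 lets any user (CLI or web server) write to it.
$file = __DIR__ . '/app.log';
touch($file);
chmod($file, 0666);

$logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
```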
  22. SAPI-aware Monolog

```php
$sapi = php_sapi_name();
$loggerName = $sapi === 'cli' ? 'myapp-cli' : 'myapp-web';
$logger = new \Monolog\Logger($loggerName);

if ($sapi === 'cli') {
    // Command-line scripts log straight to the terminal.
    $logger->pushHandler(new \Monolog\Handler\StreamHandler('php://stdout'));
} else {
    // Web requests log to a file (touch, chmod, etc. as in the previous setup).
    $file = __DIR__ . '/app.log';
    $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
}
```
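  23. Logging to a Service

```php
// Ship log records to a hosted service over syslog/UDP
// (the port is a placeholder for the one your provider assigns).
$handler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
$handler->setFormatter(new \Monolog\Formatter\LineFormatter());
$logger->pushHandler($handler);
```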
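  24. Environment-Aware Log Levels

```php
use Monolog\Formatter\LineFormatter;
use Monolog\Handler\SwiftMailerHandler;
use Monolog\Handler\SyslogUdpHandler;
use Monolog\Logger;

if (APPLICATION_ENV === 'production') {
    // INFO and above go to the log service. The third constructor argument
    // is the syslog facility, so the minimum level is the fourth.
    $udpHandler = new SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, Logger::INFO);
    $udpHandler->setFormatter(new LineFormatter());

    // ALERT and above are also emailed. $swiftMailer (a Swift_Mailer) and
    // $message (a Swift_Message template) are assumed to be set up elsewhere.
    $emailHandler = new SwiftMailerHandler($swiftMailer, $message, Logger::ALERT);

    $logger->pushHandler($udpHandler);
    $logger->pushHandler($emailHandler);
}
```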
  25. Metrics Collection
      • Everyone likes graphs
      • Data visualizations help you spot outliers in real time
      • Create a dashboard that displays them
  26. Example Baseline Metrics
      • PHP execution time
      • PHP memory usage
      • Number of database queries per request
      • Job queue length
      • Time to process jobs
      • Emails sent
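
The first two baseline metrics come straight from PHP itself. This sketch (hypothetical helper name) reads the request start time from `$_SERVER['REQUEST_TIME_FLOAT']`, which PHP sets for both web requests and CLI scripts, and peak memory from `memory_get_peak_usage()`. Where the numbers are sent is left open.

```php
<?php
/**
 * Execution time and peak memory for the current request.
 * Call at the very end of index.php; pass $startedAt to time a narrower span.
 *
 * @return array{time_ms: float, peak_memory_bytes: int}
 */
function requestMetrics(?float $startedAt = null): array
{
    // REQUEST_TIME_FLOAT is set by PHP when the request (or CLI script) starts.
    $startedAt = $startedAt ?? $_SERVER['REQUEST_TIME_FLOAT'];
    return [
        'time_ms' => (microtime(true) - $startedAt) * 1000,
        'peak_memory_bytes' => memory_get_peak_usage(true),
    ];
}

$start = microtime(true);
usleep(20000); // stand-in for handling a request
$metrics = requestMetrics($start);
printf("%.1f ms, %d bytes peak\n", $metrics['time_ms'], $metrics['peak_memory_bytes']);
```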
  27. Application Metrics
      • User logins / failed logins
      • Password resets
      • Page views for key pages
      • Deployments
      • Caught exceptions
      • Overall page views
  28. Statsd / Graphite
      • Statsd is a Node.js daemon that collects and aggregates stats sent from your application and forwards them to a backend such as Graphite
      • Graphite stores those stats and lets you view them in graph form
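
StatsD's wire format is a plain-text line per metric, `<name>:<value>|<type>`, where `c` is a counter, `ms` a timer in milliseconds and `g` a gauge, sent as UDP datagrams (port 8125 by default). The later slides use the domnikl/statsd client library; this dependency-free sketch with hypothetical function names only shows what travels over the wire.

```php
<?php
// Build one StatsD line, e.g. "myapp.pageviews:1|c" or "myapp.render_time:123|ms".
// $value is an int or float.
function statsdLine(string $name, $value, string $type): string
{
    return sprintf('%s:%s|%s', $name, $value, $type);
}

// Fire-and-forget UDP send: no response is awaited, and failures are
// swallowed so that metrics can never break the application.
function statsdSend(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen("udp://{$host}", $port, $errno, $errstr);
    if ($socket === false) {
        return;
    }
    @fwrite($socket, $line);
    fclose($socket);
}

statsdSend(statsdLine('myapp.pageviews', 1, 'c'));
statsdSend(statsdLine('myapp.render_time', 123, 'ms'));
```

UDP is the key design choice: if the StatsD daemon is down, the application loses a few data points but never blocks or errors.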
  29. Examples of Counters
      • Count every request
      • Count every transactional email sent
      • Count every job from your job queue, by type
      • Count every caught exception
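
Counting every event does not have to mean one packet per event. In this sketch (hypothetical class), increments are aggregated during the request and rendered as a single multi-metric payload, which StatsD accepts with metrics separated by newlines. Keep the payload below your network's UDP packet size.

```php
<?php
// Aggregate counter increments in-process and flush them as one StatsD payload.
final class CounterBuffer
{
    private $prefix;
    /** @var array<string, int> */
    private $counts = [];

    public function __construct(string $prefix)
    {
        $this->prefix = $prefix; // per-app top-level namespace, e.g. "myapp"
    }

    public function increment(string $name, int $by = 1): void
    {
        $this->counts[$name] = ($this->counts[$name] ?? 0) + $by;
    }

    // Render the buffered counters (one "name:count|c" line each) and clear them.
    // The returned string would be sent in a single UDP datagram.
    public function flush(): string
    {
        $lines = [];
        foreach ($this->counts as $name => $count) {
            $lines[] = "{$this->prefix}.{$name}:{$count}|c";
        }
        $this->counts = [];
        return implode("\n", $lines);
    }
}
```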
  30. Examples of Timers
      • Time your index.php at the top and bottom
      • Time your crontabs, especially overnight ones
      • You can even submit timers for multi-page events (conversion funnels, etc.)
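
For crontabs, the duration matters most when a job fails or runs long. This wrapper (hypothetical name) times any callable with PHP's monotonic `hrtime()` clock (PHP 7.3+) and reports the duration from a `finally` block, so a job that throws is still timed. The `$report` callback is where you would send a `|ms` timer to StatsD.

```php
<?php
// Run $task, then pass its duration in milliseconds to $report,
// even if $task throws. Returns whatever $task returns.
function timeTask(callable $task, callable $report)
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    try {
        return $task();
    } finally {
        $report((hrtime(true) - $start) / 1e6);
    }
}

$result = timeTask(
    function () {
        usleep(10000); // stand-in for an overnight report job
        return 'done';
    },
    function (float $ms) {
        echo "myapp.cron.nightly_report:" . round($ms) . "|ms\n";
    }
);
```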
  31. Metric Naming
      • Dot-delimited names
      • Think of them like namespaces
      • Use a top-level namespace per app (client-side)
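
Because dots act as namespace separators, values that end up in a metric name (job types, page names) need cleaning. Otherwise a stray `.` creates an extra level in Graphite, and `:` or `|` break the StatsD line format. The sanitizing rule in this sketch (lowercase, anything outside `[a-z0-9_-]` becomes `_`) is an illustrative choice, not a StatsD requirement.

```php
<?php
// Build "app.segment.segment..." with each segment sanitized.
function metricName(string $app, string ...$segments): string
{
    $clean = array_map(function (string $s): string {
        return preg_replace('/[^a-z0-9_-]+/', '_', strtolower($s));
    }, array_merge([$app], $segments));
    return implode('.', $clean);
}

echo metricName('orca', 'jobs', 'Resize Image'), "\n";
```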
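  32. Time Your “Page Render”

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader (domnikl/statsd)

// Send metrics over UDP to a local StatsD daemon, namespaced under "orca".
$connection = new \Domnikl\Statsd\Connection\UdpSocket('localhost');
$client = new \Domnikl\Statsd\Client($connection, 'orca');

$client->startTiming('render_time');

// Application stands in for your app's front controller.
$application = new Application();
$response = $application->dispatch();
echo $response;

$client->endTiming('render_time');
```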
  33. Count Your Pageviews

```php
$connection = new \Domnikl\Statsd\Connection\UdpSocket('localhost');
$client = new \Domnikl\Statsd\Client($connection, 'orca');

$client->startTiming('render_time');
$application = new Application();
$response = $application->dispatch();
echo $response;

$client->increment('pageviews');
$client->endTiming('render_time');
```
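  34. Job Queue Example

```php
class Worker
{
    /** @var \Domnikl\Statsd\Client */
    protected $statsd;

    public function __construct(\Domnikl\Statsd\Client $statsd)
    {
        $this->statsd = $statsd;
    }

    public function run($job)
    {
        try {
            $this->processJob($job);
            $this->statsd->increment('worker.success');
        } catch (\Exception $e) {
            // Set the failed job aside (bury it) and count the failure.
            $this->buryJob($job);
            $this->statsd->increment('worker.buried');
        }
    }

    // processJob() and buryJob() are queue-specific and omitted here.
}
```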
  35. Logs vs Stats
      • Why not both?
      • Logs are searchable
      • Stats are graphable and visual
      • Make sure you can correlate logs and stats
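
One simple way to correlate the two is to record an event both ways under the same name: a StatsD counter line for the graph, and a log line that carries the metric name and a UTC timestamp. A spike in, say, `orca.worker.buried` can then be looked up in the logs by searching for that name around that time. The helper below is hypothetical and only builds the two strings; sending and writing them is left to your StatsD client and logger.

```php
<?php
// Build a counter line and a matching log line for one event.
// Pass $now in UTC so log timestamps line up with the graphs.
function recordEvent(string $metric, string $message, array $context, DateTimeInterface $now): array
{
    $context['metric'] = $metric; // the shared key that ties the log line to the graph
    return [
        'stat' => "{$metric}:1|c",
        'log'  => sprintf('[%s] %s %s', $now->format(DATE_ATOM), $message, json_encode($context, JSON_UNESCAPED_SLASHES)),
    ];
}
```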
  36. Make it Actionable
      • You have to actually look at this stuff
      • Identify problems with stats
      • Investigate problems with logs
      • Revisit your data collection when you encounter anything serious
      • Get tools to help you