Things Your Application Does While You're Not Looking (Zendcon 2015)

Josh Butts
October 20, 2015

Transcript

  1. About Me • VP of Engineering at Offers.com • Austin PHP Organizer • github.com/jimbojsb • @jimbojsb
  2. About Offers.com • We help people save money • Launched in 2009 • 100k lines of PHP across multiple apps • Millions of Uniques / Month
  3. Agenda • What is application health? • How can we collect data to determine if our application is healthy? • How can we make this data actionable?
  4. What is application health? • Depends on who you ask • Combination of performance and quality – Uptime – Response time – Error rate
  5. Uptime • Set realistic expectations - no one is up 100% of the time • How many 9’s can you tolerate? • Measure uptime monthly • Planned maintenance counts!
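
The "how many 9's" question can be turned into minutes. The sketch below is not from the talk; the function name `downtimeBudgetMinutes` and the 30-day month are assumptions chosen for illustration.

```php
<?php
// Allowed downtime in minutes for an availability target over a period.
// Assumption: a 30-day month; change $daysInPeriod to match how you measure.
function downtimeBudgetMinutes(float $availabilityPercent, int $daysInPeriod = 30): float
{
    $totalMinutes = $daysInPeriod * 24 * 60;
    return $totalMinutes * (1 - $availabilityPercent / 100);
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows %.1f minutes of downtime per 30 days\n",
        $target,
        downtimeBudgetMinutes($target)
    );
}
```

Under these assumptions, 99.9% leaves roughly 43 minutes per month, and planned maintenance has to fit inside that budget too.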
  6. Up isn’t good enough, but it’s a start • Ping monitoring is an absolute minimum • ICMP ping is not good enough • Need to at least check the status code • Should really check for a content snippet • You should outsource this – Pingdom – UptimeRobot
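
To make "check the status code and a content snippet" concrete, here is a minimal sketch of what such a check does. It is not the talk's code and not a substitute for a hosted monitor; the function names `looksHealthy` and `checkUrl` are hypothetical, and `checkUrl` assumes the curl extension is installed.

```php
<?php
// A response counts as "up" only with a 200 status AND the expected content.
function looksHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    return $statusCode === 200 && strpos($body, $expectedSnippet) !== false;
}

// Fetch a URL with the curl extension and apply the check above.
function checkUrl(string $url, string $expectedSnippet, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $body !== false && looksHealthy($status, $body, $expectedSnippet);
}
```

A 200 response that renders an error page fails this check, which is the case a plain ping or status-only monitor would miss.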
  7. Error Rate • Number of requests that generate an E_WARNING or above / total requests • Uncaught exceptions: fatal error (E_ERROR) • What’s acceptable? – 0% is not realistic – 1% is a good place to start – 0.5% is what we use
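
The error-rate definition above is simple arithmetic. As a sketch (the function names are hypothetical, and the counts would come from whatever tool collects your errors):

```php
<?php
// Error rate as a percentage: requests with E_WARNING or worse / total requests.
function errorRatePercent(int $errorRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 0.0; // no traffic, nothing to measure
    }
    return $errorRequests * 100 / $totalRequests;
}

// Compare against a threshold; 0.5 is the figure quoted in the slide.
function withinErrorBudget(int $errorRequests, int $totalRequests, float $maxPercent = 0.5): bool
{
    return errorRatePercent($errorRequests, $totalRequests) <= $maxPercent;
}

printf("%.2f%%\n", errorRatePercent(120, 20000));
```
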
  8. Why Error Rate is Hard • PHP error handlers are terrible • You really need an extension • There are a few third party tools that do this, but they aren’t cheap
  9. Silent Killers • Does a caught exception count towards your error rate? • How many times do you drop the exception? • Would you even know if your password reset page was throwing a 500 error? • Even the best testing can’t fix stupid users
  10. Application Logs • Logs are your best source for debugging production errors • Log facts • Speak to your future self • Use a service or tool to aggregate logs
  11. Log Highlights • Be wordy, but avoid pointless words • Take advantage of log levels • Take advantage of different application environments • Keep your logs to “one-liners”
  12. Log Levels • DEBUG • INFO • NOTICE • WARNING • ERROR • CRITICAL • ALERT • EMERGENCY
  13. DEBUG • Most detailed and verbose level • Database queries • “per-item” information in a loop • Probably turn this off in production
  14. INFO • This is the “default” for most things • General events – user logins – application state changes – material domain object modifications
  15. NOTICE • Like INFO but slightly more important • You might actually care about these • Transactions with values that are normal but higher or lower than expected • Might review these weekly
  16. WARNING • Undesired behavior that isn’t necessarily wrong • Calling deprecated APIs • Unexpected null result sets
  17. ERROR • Runtime logic errors • Unexpected invalid arguments • Caught exceptions • Doesn’t require immediate attention • Look at these daily
  18. CRITICAL • First level where you should consider real-time notifications • Unable to connect to a 3rd party service • Connection timeouts • High latency
  19. ALERT • Application is partially down or non-functional • Failed to connect to a critical internal resource • This should send SMS messages, wake people up • Recommend a time threshold
  20. EMERGENCY • Everything has gone to hell • Hardware failures • Wake everyone up, keep calling until someone acknowledges • Rare to see this, because logging has probably also failed
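
The eight levels above each imply a different response. One way to summarize them is a lookup from level to action. This is only an illustrative sketch: the function name `actionForLevel` and the action strings are made up, and where the slides give no review cadence (DEBUG, INFO, WARNING) "log only" is an assumption.

```php
<?php
// Map a log level name to the kind of response the slides suggest.
function actionForLevel(string $level): string
{
    switch (strtoupper($level)) {
        case 'DEBUG':
            return 'log only (probably disabled in production)';
        case 'INFO':
            return 'log only';
        case 'NOTICE':
            return 'review weekly';
        case 'WARNING':
            return 'log only'; // assumption: the slides give no cadence here
        case 'ERROR':
            return 'review daily';
        case 'CRITICAL':
            return 'real-time notification';
        case 'ALERT':
            return 'send SMS';
        case 'EMERGENCY':
            return 'call until acknowledged';
        default:
            throw new InvalidArgumentException("Unknown log level: $level");
    }
}

echo actionForLevel('alert'), "\n"; // send SMS
```

In practice this routing usually lives in the logger configuration (one handler per minimum level) rather than in a function like this.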
  21. PHP Logging Software • Monolog • Pretty much everyone uses this one • Log4PHP • Pretty much no one uses this one • The one that comes with your favorite framework
  22. Useful Monolog Setup

        <?php
        $loggerName = 'myapp';
        $logger = new \Monolog\Logger($loggerName);

        // Create the log file up front and make it writable by every user
        $file = __DIR__ . '/app.log';
        touch($file);
        chmod($file, 0666);
        $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));

  23. SAPI-aware Monolog

        $sapi = php_sapi_name();
        $loggerName = $sapi == 'cli' ? 'myapp-cli' : 'myapp-web';
        $logger = new \Monolog\Logger($loggerName);
        if ($sapi == 'cli') {
            // CLI scripts log to the terminal
            $logger->pushHandler(new \Monolog\Handler\StreamHandler('php://stdout'));
        } else {
            // file setup here, touch, chmod, etc
            $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
        }

  24. Logging to a Service

        $handler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
        $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
        $logger->pushHandler($handler);

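  25. Environment-Aware Log Levels

        if (APPLICATION_ENV == 'production') {
            // SyslogUdpHandler takes the syslog facility third and the level fourth
            $udpHandler = new \Monolog\Handler\SyslogUdpHandler(
                'data.logentries.com', 12345, LOG_USER, \Monolog\Logger::INFO
            );
            $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());

            // SwiftMailerHandler also needs a message template; $message is assumed
            // to be a preconfigured Swift_Message (not shown on the slide)
            $emailHandler = new \Monolog\Handler\SwiftMailerHandler(
                $swiftMailer, $message, \Monolog\Logger::ALERT
            );

            $logger->pushHandler($udpHandler);
            $logger->pushHandler($emailHandler);
        }
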
  26. Metrics Collection • Everyone likes graphs • Data visualizations help you spot outliers in real-time • Create a dashboard that displays them
  27. Example Baseline metrics • PHP execution time • PHP memory usage • Number of database queries per request • Job queue length • Time to process jobs • Emails sent
  28. Application Metrics • User logins / failed logins • Password resets • Page views for key pages • Deployments • Caught exceptions • Overall page views
  29. Statsd / Graphite • Statsd is a node.js tool that collects stats from your application • Graphite is a visualization tool that lets you access information from Statsd in graph form
  30. Examples of Counters • Count every request • Count every transactional email sent • Count every job from your job queue by type • Count every caught exception
  31. Examples of Timers • Time your index.php at the top and bottom • Time your crontabs, especially overnight ones • You can even submit timers for multi-page events (conversion funnels, etc)
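
Timing a cron job or any other unit of work comes down to reading a clock before and after. The sketch below is independent of any StatsD client; the helper name `timeIt` is hypothetical, and it assumes PHP 7.3 or later for `hrtime()`.

```php
<?php
// Run a callable and return [result, elapsed milliseconds].
// hrtime(true) is a monotonic clock in nanoseconds, so wall-clock
// adjustments during the run do not skew the measurement.
function timeIt(callable $work): array
{
    $start = hrtime(true);
    $result = $work();
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return [$result, $elapsedMs];
}

[$result, $ms] = timeIt(function () {
    usleep(20000); // stand-in for the real job
    return 'done';
});
printf("job finished (%s) in %.1f ms\n", $result, $ms);
```

The elapsed value is what you would hand to your stats client as a timer.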
  32. Metric Naming • Dot-delimited names • Think of it like namespaces • Use a top-level namespace per-app (client-side)
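
To show how dot-delimited names end up on the wire, here is a small sketch that builds names under a per-app namespace and formats them as StatsD datagrams (`name:value|c` for counters, `name:value|ms` for timers). The helper names are hypothetical and this is not the client library used in the slides; 8125 is StatsD's default UDP port, and `orca` is the namespace that appears in the talk's examples.

```php
<?php
// Build a dot-delimited metric name under a per-app namespace.
function metricName(string $appNamespace, string ...$parts): string
{
    return implode('.', array_merge([$appNamespace], $parts));
}

// Counter datagram: "<name>:<delta>|c"
function counterPacket(string $name, int $delta = 1): string
{
    return sprintf('%s:%d|c', $name, $delta);
}

// Timer datagram: "<name>:<milliseconds>|ms"
function timerPacket(string $name, float $milliseconds): string
{
    return sprintf('%s:%d|ms', $name, (int) round($milliseconds));
}

// Fire-and-forget send over UDP; failures are deliberately ignored.
function sendPacket(string $packet, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen("udp://$host", $port);
    if ($socket !== false) {
        fwrite($socket, $packet);
        fclose($socket);
    }
}

echo counterPacket(metricName('orca', 'worker', 'success')), "\n"; // orca.worker.success:1|c
echo timerPacket(metricName('orca', 'render_time'), 42.3), "\n";   // orca.render_time:42|ms
```

Keeping the app namespace as the first segment means several applications can share one Graphite tree without colliding.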
  33. Time your “Page Render”

        <?php
        $connection = new \Domnikl\Statsd\Connection\UdpSocket('localhost');
        $client = new \Domnikl\Statsd\Client($connection, 'orca');

        // Start the timer before dispatch, stop it after the response is sent
        $client->startTiming('render_time');
        $application = new Application();
        $response = $application->dispatch();
        echo $response;
        $client->endTiming('render_time');

  34. Count Your Pageviews

        $connection = new \Domnikl\Statsd\Connection\UdpSocket('localhost');
        $client = new \Domnikl\Statsd\Client($connection, 'orca');

        $client->startTiming('render_time');
        $application = new Application();
        $response = $application->dispatch();
        echo $response;
        $client->increment('pageviews'); // one counter tick per rendered page
        $client->endTiming('render_time');

  35. Job Queue Example

        class Worker
        {
            protected $statsd;

            public function run($job)
            {
                try {
                    $this->processJob($job);
                    $this->statsd->increment("worker.success");
                } catch (\Exception $e) {
                    // Count the failure instead of silently dropping it
                    $this->buryJob($job);
                    $this->statsd->increment("worker.buried");
                }
            }
        }

  36. Logs vs Stats • Why not both? • Logs are searchable • Stats are graph-able, visual • Make sure you can correlate logs and stats
  37. Make it Actionable • You have to actually look at this stuff • Identify problems with stats • Investigate problems with logs • Revisit your data collection when you encounter anything serious • Get tools to help you