Things Your Application Does While You're Not Looking (Zendcon 2015)

Josh Butts
October 20, 2015

Transcript

  1. Things Your Application Does While You’re Not Looking
    Josh Butts
    ZendCon 2015
  2. About Me
    • VP of Engineering at Offers.com
    • Austin PHP Organizer
    • github.com/jimbojsb
    • @jimbojsb
  3. About Offers.com
    • We help people save money
    • Launched in 2009
    • 100k lines of PHP across multiple apps
    • Millions of Uniques / Month
  4. Agenda
    • What is application health?
    • How can we collect data to determine if our application is healthy?
    • How can we make this data actionable?
  5. What is application health?
    • Depends on who you ask
    • Combination of performance and quality
      – Uptime
      – Response time
      – Error rate
  6. Uptime
    • Set realistic expectations: no one is up 100% of the time
    • How many 9’s can you tolerate?
    • Measure uptime monthly
    • Planned maintenance counts!
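
To make "how many 9's" concrete, here is a minimal sketch that turns an availability target into a monthly downtime budget. The function name and the 30-day month are assumptions for illustration, not something from the talk.

```php
<?php
// Sketch: translate an availability target ("number of nines") into a
// monthly downtime budget. The 30-day month is an assumption.
function allowedDowntimeMinutes(float $availabilityPercent, int $daysInMonth = 30): float
{
    $minutesInMonth = $daysInMonth * 24 * 60;
    return $minutesInMonth * (100.0 - $availabilityPercent) / 100.0;
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows %.1f minutes of downtime per month\n",
        $target,
        allowedDowntimeMinutes($target)
    );
}
```

Under these assumptions, three nines leaves roughly 43 minutes a month, and planned maintenance has to fit inside that budget.
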
  7. Up isn’t good enough, but it’s a start
    • Ping monitoring is an absolute minimum
    • ICMP ping is not good enough
    • Need to at least check the status code
    • Should really check for a content snippet
    • You should outsource this
      – Pingdom
      – UptimeRobot
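
The difference between "the server answered" and "the page works" can be sketched as a single pass/fail decision. `isHealthy()` is a hypothetical helper; it assumes the status code and body have already been fetched by whatever monitor or HTTP client is in use.

```php
<?php
// Sketch of the decision a monitor should make for one response.
// isHealthy() is a hypothetical helper, not part of any monitoring product.
function isHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    // A non-200 status fails immediately.
    if ($statusCode !== 200) {
        return false;
    }
    // A 200 that does not contain the expected content also fails,
    // which catches error pages served with a "successful" status.
    return strpos($body, $expectedSnippet) !== false;
}

var_dump(isHealthy(200, '<h1>Today\'s Deals</h1>', 'Deals'));
var_dump(isHealthy(200, 'Fatal error: ...', 'Deals'));
var_dump(isHealthy(500, '', 'Deals'));
```
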
  8. Error Rate
    • Number of requests that generate an E_WARNING or above / total requests
    • Uncaught exceptions: fatal error (E_ERROR)
    • What’s acceptable?
      – 0% is not realistic
      – 1% is a good place to start
      – 0.5% is what we use
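
The ratio above is simple arithmetic once the two counts exist. A minimal sketch, with a hypothetical function name and the 0.5% threshold taken from the slide:

```php
<?php
// Error rate as a percentage: requests with an error / total requests.
function errorRatePercent(int $errorRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 0.0; // no traffic, nothing to report
    }
    return 100.0 * $errorRequests / $totalRequests;
}

// Threshold from the slide; choose a value that fits your own application.
const ERROR_RATE_THRESHOLD = 0.5;

$rate = errorRatePercent(42, 10000);
printf(
    "error rate: %.2f%% (%s)\n",
    $rate,
    $rate > ERROR_RATE_THRESHOLD ? 'over threshold' : 'ok'
);
```
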
  9. Why Error Rate is Hard
    • PHP error handlers are terrible
    • You really need an extension
    • There are a few third party tools that do this, but they aren’t cheap
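
For context, this is roughly what a userland approximation looks like, and why it falls short. It is a sketch using only built-in PHP functions; `ErrorTally` is a hypothetical class. Fatal errors bypass `set_error_handler()` entirely, so they can only be inspected in a shutdown function.

```php
<?php
// Hypothetical in-process tally; a real application would forward these
// numbers to a log or stats service instead of keeping them in memory.
final class ErrorTally
{
    /** @var int */
    public static $warningsAndAbove = 0;
}

set_error_handler(function ($severity, $message, $file, $line) {
    // Count warnings and the catchable error types; notices and
    // deprecations are ignored.
    $counted = E_WARNING | E_USER_WARNING | E_USER_ERROR | E_RECOVERABLE_ERROR;
    if ($severity & $counted) {
        ErrorTally::$warningsAndAbove++;
    }
    return true; // returning false would also run PHP's built-in handler
});

register_shutdown_function(function () {
    // Fatal errors never reach set_error_handler(); they can only be
    // inspected after the fact, once the request is already dying.
    $last = error_get_last();
    $fatal = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
    if ($last !== null && ($last['type'] & $fatal)) {
        error_log('fatal: ' . $last['message']);
    }
});

trigger_error('cache miss storm', E_USER_WARNING); // counted
trigger_error('old API', E_USER_NOTICE);           // not counted
echo ErrorTally::$warningsAndAbove, "\n";
```
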
  10. Silent Killers
    • Does a caught exception count towards your error rate?
    • How many times do you drop the exception?
    • Would you even know if your password reset page was throwing a 500 error?
    • Even the best testing can’t fix stupid users
  11. APPLICATION LOGS

  12. Application Logs
    • Logs are your best source for debugging production errors
    • Log facts
    • Speak to your future self
    • Use a service or tool to aggregate logs
  13. Log Highlights
    • Be wordy, but avoid pointless words
    • Take advantage of log levels
    • Take advantage of different application environments
    • Keep your logs to “one-liners”
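
One way to keep every entry a "one-liner" is to collapse whitespace in the message and append the facts as compact JSON. This is a sketch with a hypothetical `formatLogLine()` helper and an assumed line layout; logging libraries ship their own formatters.

```php
<?php
// Sketch: build a single-line log entry from a level, a message and
// a context array of facts. The layout is an assumption for illustration.
function formatLogLine(string $level, string $message, array $context = []): string
{
    // Collapse any whitespace runs (including newlines) into single spaces.
    $message = trim(preg_replace('/\s+/', ' ', $message));
    $line = sprintf('[%s] %s: %s', date('c'), strtoupper($level), $message);
    if ($context !== []) {
        // json_encode() escapes newlines, so the context stays on one line.
        $line .= ' ' . json_encode($context);
    }
    return $line;
}

echo formatLogLine(
    'info',
    "Password reset requested\nfor user",
    ['user_id' => 42, 'env' => 'production']
), "\n";
```
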
  14. Log Levels
    • DEBUG
    • INFO
    • NOTICE
    • WARNING
    • ERROR
    • CRITICAL
    • ALERT
    • EMERGENCY
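
The levels are ordered, which is what makes a minimum threshold possible. A minimal sketch of that comparison, using the numeric severities Monolog assigns to its level constants; `LEVEL_SEVERITY` and `shouldLog()` are hypothetical names.

```php
<?php
// Numeric severities matching Monolog's Logger level constants.
const LEVEL_SEVERITY = [
    'DEBUG'     => 100,
    'INFO'      => 200,
    'NOTICE'    => 250,
    'WARNING'   => 300,
    'ERROR'     => 400,
    'CRITICAL'  => 500,
    'ALERT'     => 550,
    'EMERGENCY' => 600,
];

// True when a record at $level should pass a handler whose minimum is $minimumLevel.
function shouldLog(string $level, string $minimumLevel): bool
{
    return LEVEL_SEVERITY[$level] >= LEVEL_SEVERITY[$minimumLevel];
}

var_dump(shouldLog('DEBUG', 'INFO'));
var_dump(shouldLog('ERROR', 'INFO'));
```
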
  15. DEBUG
    • Most detailed and verbose level
    • Database queries
    • “per-item” information in a loop
    • Probably turn this off in production
  16. INFO
    • This is the “default” for most things
    • General events
      – user logins
      – application state changes
      – material domain object modifications
  17. NOTICE
    • Like INFO but slightly more important
    • You might actually care about these
    • Transactions with values that are normal but higher or lower than expected
    • Might review these weekly
  18. WARNING
    • Undesired behavior that isn’t necessarily wrong
    • Calling deprecated APIs
    • Unexpected null result sets
  19. ERROR
    • Runtime logic errors
    • Unexpected invalid arguments
    • Caught exceptions
    • Doesn’t require immediate attention
    • Look at these daily
  20. CRITICAL
    • First level where you should consider real-time notifications
    • Unable to connect to a 3rd party service
    • Connection timeouts
    • High latency
  21. ALERT
    • Application is partially down or non-functional
    • Failed to connect to a critical internal resource
    • This should send SMS messages, wake people up
    • Recommend a time threshold
  22. EMERGENCY
    • Everything has gone to hell
    • Hardware failures
    • Wake everyone up, keep calling until someone acknowledges
    • Rare to see this, because logging has probably also failed
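
The review cadences and notification rules described for each level can be summarized as a lookup. This is only a sketch: `escalationFor()` is a hypothetical helper, and the strings paraphrase the slides rather than name any real alerting integration.

```php
<?php
// Sketch: map a log level to the response the talk suggests for it.
// Levels without a stated cadence fall through to "log only".
function escalationFor(string $level): string
{
    switch ($level) {
        case 'EMERGENCY':
            return 'call everyone until someone acknowledges';
        case 'ALERT':
            return 'send SMS';
        case 'CRITICAL':
            return 'real-time notification';
        case 'ERROR':
            return 'daily review';
        case 'NOTICE':
            return 'weekly review';
        default:
            return 'log only';
    }
}

foreach (['INFO', 'ERROR', 'CRITICAL', 'ALERT'] as $level) {
    printf("%-9s %s\n", $level, escalationFor($level));
}
```
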
  23. PHP Logging Software
    • Monolog
      – Pretty much everyone uses this one
    • Log4PHP
      – Pretty much no one uses this one
    • The one that comes with your favorite framework
  24. Basic Monolog Setup
      <?php
      $loggerName = 'myapp';
      $logger = new \Monolog\Logger($loggerName);
  25. Useful Monolog Setup
      <?php
      $loggerName = 'myapp';
      $logger = new \Monolog\Logger($loggerName);
      // Create the log file up front and make it writable by every user that runs the app
      $file = __DIR__ . '/app.log';
      touch($file);
      chmod($file, 0666);
      $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
  26. SAPI-aware Monolog
      $sapi = php_sapi_name();
      $loggerName = $sapi == 'cli' ? 'myapp-cli' : 'myapp-web';
      $logger = new \Monolog\Logger($loggerName);
      if ($sapi == 'cli') {
          // Command-line runs log straight to the terminal
          $logger->pushHandler(new \Monolog\Handler\StreamHandler('php://stdout'));
      } else {
          // file setup here, touch, chmod, etc
          $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
      }
  27. Add extra info to your logs
      $logger->pushProcessor(function ($record) {
          // Add the hostname under its own key so other processors' extra data is kept
          $record['extra']['hostname'] = gethostname();
          return $record;
      });
  28. Logging to a Service
      $handler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
      $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
      $logger->pushHandler($handler);
  29. Environment-Aware Log Levels
      if (APPLICATION_ENV == 'production') {
          // The third constructor argument is the syslog facility; the minimum level comes fourth
          $udpHandler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, \Monolog\Logger::INFO);
          $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());
          // $swiftMailer and $message (a Swift_Message template) are assumed to be configured elsewhere
          $emailHandler = new \Monolog\Handler\SwiftMailerHandler($swiftMailer, $message, \Monolog\Logger::ALERT);
          $logger->pushHandler($udpHandler);
          $logger->pushHandler($emailHandler);
      }
  30. STATS

  31. Metrics Collection
    • Everyone likes graphs
    • Data visualizations help you spot outliers in real-time
    • Create a dashboard that displays them
  32. Example Baseline metrics
    • PHP execution time
    • PHP memory usage
    • Number of database queries per request
    • Job queue length
    • Time to process jobs
    • Emails sent
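
The first two baseline metrics can be read from PHP itself at the end of a request. A minimal sketch using only built-in functions; `collectBaselineMetrics()` and the array keys are hypothetical names.

```php
<?php
// Sketch: gather execution time and peak memory for the current request.
function collectBaselineMetrics(): array
{
    // REQUEST_TIME_FLOAT is set by PHP when the request (or CLI script) starts;
    // falling back to "now" simply yields an elapsed time of about zero.
    $start = $_SERVER['REQUEST_TIME_FLOAT'] ?? microtime(true);

    return [
        'execution_ms'      => (microtime(true) - $start) * 1000,
        'peak_memory_bytes' => memory_get_peak_usage(true),
    ];
}

print_r(collectBaselineMetrics());
```

Calling this from a shutdown function or at the bottom of the front controller would capture the whole request; the other metrics on the list need counters maintained by the application.
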
  33. Application Metrics
    • User logins / failed logins
    • Password resets
    • Page views for key pages
    • Deployments
    • Caught exceptions
    • Overall page views
  34. Statsd / Graphite
    • Statsd is a node.js tool that collects stats from your application
    • Graphite is a visualization tool that lets you access information from Statsd in graph form
  35. Graphite UI

  36. Graphite Example

  37. Types of Statsd Metrics
    • Counters
    • Timers
    • Gauges
    • Sets
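
On the wire, each of the four types is a short text line of the form `name:value|type`, normally sent as a UDP packet (port 8125 by convention). The sketch below only builds those lines; `formatMetric()` is a hypothetical helper and the metric names are made up, and a client library would normally do this for you.

```php
<?php
// Sketch: render one metric in StatsD's plain-text line format.
function formatMetric(string $name, $value, string $type): string
{
    // Type codes used by the StatsD line protocol.
    $codes = ['counter' => 'c', 'timer' => 'ms', 'gauge' => 'g', 'set' => 's'];
    if (!isset($codes[$type])) {
        throw new InvalidArgumentException("Unknown metric type: $type");
    }
    return sprintf('%s:%s|%s', $name, $value, $codes[$type]);
}

echo formatMetric('myapp.pageviews', 1, 'counter'), "\n";       // myapp.pageviews:1|c
echo formatMetric('myapp.render_time', 320, 'timer'), "\n";     // myapp.render_time:320|ms
echo formatMetric('myapp.queue_length', 17, 'gauge'), "\n";     // myapp.queue_length:17|g
echo formatMetric('myapp.unique_users', 'user42', 'set'), "\n"; // myapp.unique_users:user42|s
```
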
  38. Examples of Counters
    • Count every request
    • Count every transactional email sent
    • Count every job from your job queue by type
    • Count every caught exception
  39. Examples of Timers
    • Time your index.php at the top and bottom
    • Time your crontabs, especially overnight ones
    • You can even submit timers for multi-page events (conversion funnels, etc)
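
Timing "at the top and bottom" comes down to two clock reads around the work. A minimal sketch with a hypothetical `timeIt()` helper; the elapsed milliseconds are what would be submitted as a timer metric.

```php
<?php
// Sketch: run a callable and report how long it took in milliseconds.
function timeIt(callable $work): array
{
    $start = microtime(true);
    $result = $work();
    $elapsedMs = (microtime(true) - $start) * 1000;

    return [$result, $elapsedMs];
}

// Stand-in for a nightly cron job; the workload here is arbitrary.
list($total, $ms) = timeIt(function () {
    return array_sum(range(1, 100000));
});
printf("nightly_report took %.2f ms\n", $ms);
```
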
  40. Metric Naming
    • Dot-delimited (.) names
    • Think of it like namespaces
    • Use a top-level namespace per-app (client-side)
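
A small helper can keep names consistent by joining segments with dots and cleaning each segment first. `metricName()` and its sanitizing rule are assumptions for illustration; client libraries usually just prepend the per-app namespace.

```php
<?php
// Sketch: build a dot-delimited metric name from namespace segments.
function metricName(string ...$segments): string
{
    $clean = array_map(function (string $segment): string {
        // Dots separate namespaces, so they must not appear inside a segment.
        return strtolower(preg_replace('/[^A-Za-z0-9_-]+/', '_', $segment));
    }, $segments);

    return implode('.', $clean);
}

echo metricName('myapp', 'worker', 'success'), "\n";       // myapp.worker.success
echo metricName('myapp', 'email', 'Password Reset'), "\n"; // myapp.email.password_reset
```
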
  41. Send PHP data to Statsd
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      // The second argument is the top-level namespace prefixed to every metric
      $client = new Domnikl\Statsd\Client($connection, 'myapp');
  42. Time your “Page Render”
      <?php
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      $client = new Domnikl\Statsd\Client($connection, 'orca');
      // Start the timer before dispatching and stop it once the response is sent
      $client->startTiming('render_time');
      $application = new Application();
      $response = $application->dispatch();
      echo $response;
      $client->endTiming('render_time');
  43. Page Render Example

  44. Count Your Pageviews
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      $client = new Domnikl\Statsd\Client($connection, 'orca');
      $client->startTiming('render_time');
      $application = new Application();
      $response = $application->dispatch();
      echo $response;
      // One counter increment per rendered page
      $client->increment('pageviews');
      $client->endTiming('render_time');
  45. Pageview Count Example

  46. Job Queue Example
      class Worker
      {
          protected $statsd;

          public function run($job)
          {
              try {
                  $this->processJob($job);
                  $this->statsd->increment("worker.success");
              } catch (\Exception $e) {
                  // Bury the failed job and count it so failures show up on a graph
                  $this->buryJob($job);
                  $this->statsd->increment("worker.buried");
              }
          }
      }
  47. Job Queue Example

  48. Logs vs Stats
    • Why not both?
    • Logs are searchable
    • Stats are graph-able, visual
    • Make sure you can correlate logs and stats
  49. Make it Actionable
    • You have to actually look at this stuff
    • Identify problems with stats
    • Investigate problems with logs
    • Revisit your data collection when you encounter anything serious
    • Get tools to help you
  50. Got Budget?

  51. Anyone have QUESTIONS?

  52. I’d love your feedback: JOIND.IN/15514