
Things Your Application Does While You're Not Looking (Zendcon 2015)

Josh Butts
October 20, 2015

Transcript

  1. Things Your Application Does While You’re Not Looking • Josh Butts • ZendCon 2015
  2. About Me • VP of Engineering at Offers.com • Austin PHP Organizer • github.com/jimbojsb • @jimbojsb
  3. About Offers.com • We help people save money • Launched in 2009 • 100k lines of PHP across multiple apps • Millions of Uniques / Month
  4. Agenda • What is application health? • How can we collect data to determine if our application is healthy? • How can we make this data actionable?
  5. What is application health? • Depends on who you ask • Combination of performance and quality – Uptime – Response time – Error rate
  6. Uptime • Set realistic expectations - no one is up 100% of the time • How many 9’s can you tolerate? • Measure uptime monthly • Planned maintenance counts!
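
To make "how many 9’s" concrete, here is a rough sketch of the monthly downtime budget each uptime target implies. It assumes a 30-day month, and the function name is made up for illustration; planned maintenance would come out of the same budget.

```php
<?php
// Allowed downtime per month for a given uptime target.
// Assumption: a 30-day month (43,200 minutes). The function name is illustrative.
function allowedDowntimeMinutes(float $uptimePercent, int $daysInMonth = 30): float
{
    $minutesInMonth = $daysInMonth * 24 * 60;
    return $minutesInMonth * (1 - $uptimePercent / 100);
}

// Print the budget for two, three and four nines.
foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows about %.1f minutes of downtime per month\n",
        $target,
        allowedDowntimeMinutes($target)
    );
}
```

Three nines leaves roughly 43 minutes a month, which a single slow deployment can use up.
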
  7. Up isn’t good enough, but it’s a start • Ping monitoring is an absolute minimum • ICMP ping is not good enough • Need to at least check the status code • Should really check for a content snippet • You should outsource this – Pingdom – UptimeRobot
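
The difference between "responds" and "healthy" can be sketched as a small check on the status code plus a content snippet. This is only an illustration of the idea, not how Pingdom or UptimeRobot work internally; the function name and the sample snippet are hypothetical.

```php
<?php
// Decide whether a monitored page is healthy from its HTTP status code and body.
// Assumption: the caller has already fetched the page with some HTTP client.
// The function name and the expected snippet are hypothetical.
function isHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    // A non-200 status fails immediately, even if the server answered quickly.
    if ($statusCode !== 200) {
        return false;
    }
    // A 200 that renders an error page is still a failure, so look for
    // a piece of content that only appears when the page works.
    return strpos($body, $expectedSnippet) !== false;
}

var_dump(isHealthy(200, '<h1>Reset your password</h1>', 'Reset your password'));
var_dump(isHealthy(200, 'Could not connect to database', 'Reset your password'));
```

The second call shows the case an ICMP ping or a bare status-code check would miss: the server is up and returns 200, but the page is broken.
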
  8. Error Rate • Number of requests that generate an E_WARNING or above / total requests • Uncaught exceptions count too: they end in a fatal error (E_ERROR) • What’s acceptable? – 0% is not realistic – 1% is a good place to start – 0.5% is what we use
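
As a worked version of the ratio above, the sketch below computes the error rate as a percentage and compares it with a threshold. The request counts are invented for the example and the function names are illustrative; collecting the counts is the hard part and is not shown.

```php
<?php
// Error rate: requests that raised E_WARNING or above, divided by total
// requests, expressed as a percentage.
// Assumption: the two counts come from some collector that is not shown here.
function errorRatePercent(int $errorRequests, int $totalRequests): float
{
    if ($totalRequests <= 0) {
        return 0.0; // no traffic, nothing to measure
    }
    return $errorRequests / $totalRequests * 100;
}

// True when the measured rate is above the threshold (0.5% by default).
function exceedsErrorBudget(int $errorRequests, int $totalRequests, float $thresholdPercent = 0.5): bool
{
    return errorRatePercent($errorRequests, $totalRequests) > $thresholdPercent;
}

var_dump(exceedsErrorBudget(120, 10000)); // 1.2% against a 0.5% threshold
var_dump(exceedsErrorBudget(30, 10000));  // 0.3% against a 0.5% threshold
```
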
  9. Why Error Rate is Hard • PHP error handlers are terrible • You really need an extension • There are a few third-party tools that do this, but they aren’t cheap
  10. Silent Killers • Does a caught exception count towards your error rate? • How many times do you drop the exception? • Would you even know if your password reset page was throwing a 500 error? • Even the best testing can’t fix stupid users
  11. APPLICATION LOGS
  12. Application Logs • Logs are your best source for debugging production errors • Log facts • Speak to your future self • Use a service or tool to aggregate logs
  13. Log Highlights • Be wordy, but avoid pointless words • Take advantage of log levels • Take advantage of different application environments • Keep your logs to “one-liners”
  14. Log Levels • DEBUG • INFO • NOTICE • WARNING • ERROR • CRITICAL • ALERT • EMERGENCY
  15. DEBUG • Most detailed and verbose level • Database queries • “per-item” information in a loop • Probably turn this off in production
  16. INFO • This is the “default” for most things • General events – user logins – application state changes – material domain object modifications
  17. NOTICE • Like INFO but slightly more important • You might actually care about these • Transactions with values that are normal but higher or lower than expected • Might review these weekly
  18. WARN • Undesired behavior that isn’t necessarily wrong • Calling deprecated APIs • Unexpected null result sets
  19. ERROR • Runtime logic errors • Unexpected invalid arguments • Caught exceptions • Doesn’t require immediate attention • Look at these daily
  20. CRITICAL • First level where you should consider real-time notifications • Unable to connect to a 3rd party service • Connection timeouts • High latency
  21. ALERT • Application is partially down or non-functional • Failed to connect to a critical internal resource • This should send SMS messages, wake people up • Recommend a time threshold
  22. EMERGENCY • Everything has gone to hell • Hardware failures • Wake everyone up, keep calling until someone acknowledges • Rare to see this, because logging has probably also failed
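
The eight levels form an ordered scale, and the slides draw the line for real-time notification at CRITICAL. A minimal sketch of that ordering follows; the constant and function names are made up for illustration, and this is not part of Monolog.

```php
<?php
// Log levels from least to most severe, in the order used on the slides.
const LOG_LEVELS = ['debug', 'info', 'notice', 'warning', 'error', 'critical', 'alert', 'emergency'];

// Position of a level on the scale (0 = debug, 7 = emergency).
function severity(string $level): int
{
    $index = array_search(strtolower($level), LOG_LEVELS, true);
    if ($index === false) {
        throw new InvalidArgumentException("Unknown log level: $level");
    }
    return $index;
}

// CRITICAL is the first level where real-time notification is worth considering.
function needsRealTimeNotification(string $level): bool
{
    return severity($level) >= severity('critical');
}

var_dump(needsRealTimeNotification('error'));    // reviewed daily, nobody is paged
var_dump(needsRealTimeNotification('critical')); // at or above the notification line
```
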
  23. PHP Logging Software • Monolog – Pretty much everyone uses this one • Log4PHP – Pretty much no one uses this one • The one that comes with your favorite framework
  24. Basic Monolog Setup
      <?php
      $loggerName = 'myapp';
      $logger = new \Monolog\Logger($loggerName);
  25. Useful Monolog Setup
      <?php
      $loggerName = 'myapp';
      $logger = new \Monolog\Logger($loggerName);
      $file = __DIR__ . '/app.log';
      touch($file);
      chmod($file, 0666);
      $logger->pushHandler(new Monolog\Handler\StreamHandler($file));
  26. SAPI-aware Monolog
      $sapi = php_sapi_name();
      $loggerName = $sapi == 'cli' ? "myapp-cli" : "myapp-web";
      $logger = new \Monolog\Logger($loggerName);
      if ($sapi == 'cli') {
          // command-line runs log straight to the terminal
          $logger->pushHandler(new \Monolog\Handler\StreamHandler("php://stdout"));
      } else {
          // file setup here, touch, chmod, etc
          $logger->pushHandler(new Monolog\Handler\StreamHandler($file));
      }
  27. Add extra info to your logs
      $logger->pushProcessor(function($record) {
          $record["extra"] = array(gethostname());
          return $record;
      });
  28. Logging to a Service
      $handler = new Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
      $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
      $logger->pushHandler($handler);
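  29. Environment-Aware Log Levels
      if (APPLICATION_ENV == 'production') {
          // SyslogUdpHandler takes the syslog facility before the minimum level
          $udpHandler = new Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, \Monolog\Logger::INFO);
          $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());
          // SwiftMailerHandler expects a message template before the level;
          // $message is assumed to be a Swift_Message built elsewhere
          $emailHandler = new Monolog\Handler\SwiftMailerHandler($swiftMailer, $message, \Monolog\Logger::ALERT);
          $logger->pushHandler($udpHandler);
          $logger->pushHandler($emailHandler);
      }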
  30. STATS
  31. Metrics Collection • Everyone likes graphs • Data visualizations help you spot outliers in real-time • Create a dashboard that displays them
  32. Example Baseline metrics • PHP execution time • PHP memory usage • Number of database queries per request • Job queue length • Time to process jobs • Emails sent
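
The first two baseline metrics can be read from PHP itself at the end of a request. The sketch below is one way to gather them, with a hypothetical function name and array keys; the other metrics in the list depend on the application and are not shown.

```php
<?php
// Gather execution time and peak memory for the current request.
// Assumption: $requestStart was captured with microtime(true) as early as
// possible, e.g. on the first line of index.php. Names are illustrative.
function collectBaselineMetrics(float $requestStart): array
{
    return [
        // wall-clock time since the request started, in milliseconds
        'execution_ms' => (microtime(true) - $requestStart) * 1000,
        // peak memory allocated from the system by this PHP process, in bytes
        'peak_memory_bytes' => memory_get_peak_usage(true),
    ];
}

$requestStart = microtime(true);
// ... the application would handle the request here ...
print_r(collectBaselineMetrics($requestStart));
```
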
  33. Application Metrics • User logins / failed logins • Password resets • Page views for key pages • Deployments • Caught exceptions • Overall page views
  34. Statsd / Graphite • Statsd is a node.js tool that collects stats from your application • Graphite is a visualization tool that lets you access information from Statsd in graph form
  35. Graphite UI
  36. Graphite Example
  37. Types of Statsd Metrics • Counters • Timers • Gauges • Sets
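
On the wire, each of the four metric types is a single text line of the form `name:value|type`, usually sent over UDP. The sketch below formats such lines by hand to show what a client library does; the function name and the metric names are made up for illustration.

```php
<?php
// Build one StatsD line of the form "name:value|type".
// Type suffixes: c = counter, ms = timer, g = gauge, s = set.
// The function name and the friendly type names are illustrative.
function formatStatsdMetric(string $name, $value, string $type): string
{
    $suffixes = ['counter' => 'c', 'timer' => 'ms', 'gauge' => 'g', 'set' => 's'];
    if (!isset($suffixes[$type])) {
        throw new InvalidArgumentException("Unknown metric type: $type");
    }
    return sprintf('%s:%s|%s', $name, $value, $suffixes[$type]);
}

echo formatStatsdMetric('myapp.pageviews', 1, 'counter'), "\n";
echo formatStatsdMetric('myapp.render_time', 320, 'timer'), "\n";
echo formatStatsdMetric('myapp.queue_length', 17, 'gauge'), "\n";
echo formatStatsdMetric('myapp.unique_users', 42, 'set'), "\n";
```

In practice a client library builds and sends these lines for you.
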
  38. Examples of Counters • Count every request • Count every transactional email sent • Count every job from your job queue by type • Count every caught exception
  39. Examples of Timers • Time your index.php at the top and bottom • Time your crontabs, especially overnight ones • You can even submit timers for multi-page events (conversion funnels, etc)
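
Timing "at the top and bottom" comes down to two clock readings around the work. A small helper like the one below could wrap a cron task or any other callable; the helper name is hypothetical, and the elapsed value is what would be submitted as a timer metric.

```php
<?php
// Run a task and measure how long it took, in milliseconds.
// Returns the task's own result together with the elapsed time.
// The helper name is illustrative, not part of any library.
function timeCallable(callable $task): array
{
    $start = microtime(true);
    $result = $task();
    $elapsedMs = (microtime(true) - $start) * 1000;
    return [$result, $elapsedMs];
}

// Example: a stand-in for an overnight job that takes a few milliseconds.
list($result, $elapsedMs) = timeCallable(function () {
    usleep(5000);
    return 'done';
});
printf("task returned %s after %.1f ms\n", $result, $elapsedMs);
```
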
  40. Metric Naming • . delimited names • Think of it like namespaces • Use a top-level namespace per-app (client-side)
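
One way to keep names consistent is to build them from parts, with the application name first. The helper below is a hypothetical sketch: it lowercases each part and replaces anything outside `a-z`, `0-9` and `_` with an underscore, so stray dots, colons or pipes cannot break the namespace or the metric line.

```php
<?php
// Join name parts into a dot-delimited metric name, e.g. "orca.worker.success".
// Each part is lowercased and sanitized so that "." only ever appears as the
// namespace delimiter. The helper name is illustrative.
function metricName(string ...$parts): string
{
    $clean = array_map(function (string $part): string {
        return preg_replace('/[^a-z0-9_]+/', '_', strtolower(trim($part)));
    }, $parts);
    return implode('.', $clean);
}

echo metricName('orca', 'worker', 'success'), "\n";
echo metricName('orca', 'Email Sent', 'password-reset'), "\n";
```

With a client library that accepts a namespace when the client is constructed, the top-level part can be set once there instead of being repeated in every call.
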
  41. Send PHP data to Statsd
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      $client = new Domnikl\Statsd\Client($connection, 'myapp');
  42. Time your “Page Render”
      <?php
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      $client = new Domnikl\Statsd\Client($connection, 'orca');
      $client->startTiming('render_time');
      $application = new Application();
      $response = $application->dispatch();
      echo $response;
      $client->endTiming('render_time');
  43. Page Render Example
  44. Count Your Pageviews
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
      $client = new Domnikl\Statsd\Client($connection, 'orca');
      $client->startTiming('render_time');
      $application = new Application();
      $response = $application->dispatch();
      echo $response;
      $client->increment('pageviews');
      $client->endTiming('render_time');
  45. Pageview Count Example
  46. Job Queue Example
      class Worker
      {
          protected $statsd;

          public function run($job)
          {
              try {
                  $this->processJob($job);
                  $this->statsd->increment("worker.success");
              } catch (\Exception $e) {
                  $this->buryJob($job);
                  $this->statsd->increment("worker.buried");
              }
          }
      }
  47. Job Queue Example
  48. Logs vs Stats • Why not both? • Logs are searchable • Stats are graph-able, visual • Make sure you can correlate logs and stats
  49. Make it Actionable • You have to actually look at this stuff • Identify problems with stats • Investigate problems with logs • Revisit your data collection when you encounter anything serious • Get tools to help you
  50. Got Budget?
  51. Anyone have QUESTIONS?
  52. I’d love your feedback: JOIND.IN/15514