
Things Your Application Does When You're Not Looking (PHP Serbia 2017)


It’s 2:00am. Is your application working properly? How will you know for sure when you wake up tomorrow morning? In this talk we’ll look at strategies for effective logging, monitoring and metrics collection from within your application. We’ll look at what kind of data you should and shouldn’t log, and with what frequency, as well as some great open-source tools to use. We’ll also take some first steps with Statsd and Graphite to help see trends in performance, user behavior, and overall application health. We’ll also learn some useful ways to interpret the data we collect to get the best picture of what’s really going on when you’re not looking.

Josh Butts

May 28, 2017


Transcript

  1. Things Your Application Does
    While You’re Not Looking
    Josh Butts
    PHP Srbija 2017


  2. About Me
    • VP of Engineering, Ziff Davis Commerce
    • Austin PHP Organizer
    • github.com/jimbojsb
    • @jimbojsb

  3. View Slide

  4. Ⓧ ✔


  5. Agenda
    • What is application health?
    • How can we collect data to determine if our application is healthy?
    • How can we make this data actionable?

  6. What is application health?
    • Depends on who you ask
    • Combination of performance and quality
    – Uptime
    – Response time
    – Error rate

  7. Uptime
    • Set realistic expectations - no one is up 100% of the time
    • How many 9’s can you tolerate?
    • Measure uptime monthly
    • Planned maintenance counts!
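
To make "how many 9's" concrete, here is a minimal sketch that converts an availability target into a monthly downtime budget. The function name and the 30-day month are assumptions for illustration, not something from the talk.

```php
<?php
// Allowed downtime in minutes for a given availability target.
// The 30-day month is an assumption to keep the arithmetic simple.
function allowedDowntimeMinutes(float $availabilityPercent, int $daysInMonth = 30): float
{
    $totalMinutes = $daysInMonth * 24 * 60;
    return $totalMinutes * (1 - $availabilityPercent / 100);
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows %.1f minutes of downtime per month\n",
        $target,
        allowedDowntimeMinutes($target)
    );
}
```

Under these assumptions, 99.9% leaves roughly 43 minutes a month, and planned maintenance has to fit inside that budget too.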

  8. Up isn’t good enough, but it’s a start
    • Ping monitoring is an absolute minimum
    • ICMP ping is not good enough
    • Need to at least check the status code
    • Should really check for a content snippet
    • You should outsource this
    – Pingdom
    – UptimeRobot
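
The difference between "responds" and "responds correctly" can be sketched as a small check on the status code plus a content snippet. This is only an illustration of what a hosted monitor does for you; `looksHealthy` is a hypothetical helper, and fetching the page is left to whatever HTTP client you already use.

```php
<?php
// A response counts as "up" only if the status code is 200 AND the body
// contains a snippet we expect to see on a correctly rendered page.
function looksHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    if ($statusCode !== 200) {
        return false;
    }
    return strpos($body, $expectedSnippet) !== false;
}

// A 200 response that only contains an error message is still "down".
var_dump(looksHealthy(200, '<h1>Welcome back</h1>', 'Welcome'));
var_dump(looksHealthy(200, 'Database connection failed', 'Welcome'));
```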

  9. Error Rate
    • Number of requests that generate an E_WARNING or above / total requests
    • Uncaught exceptions: fatal error (E_ERROR)
    • What’s acceptable?
    – 0% is not realistic
    – 1% is a good place to start
    – 0.5% is what we use
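
The error-rate definition above can be sketched as a simple ratio checked against a threshold. The function names are hypothetical, and the counts are assumed to come from whatever collects them (logs, an extension, a third-party tool).

```php
<?php
// Error rate as a percentage of all requests in the same period.
function errorRatePercent(int $errorRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 0.0; // no traffic, nothing to report
    }
    return $errorRequests / $totalRequests * 100;
}

// True when the observed rate is above the acceptable threshold.
function exceedsErrorThreshold(int $errorRequests, int $totalRequests, float $thresholdPercent = 0.5): bool
{
    return errorRatePercent($errorRequests, $totalRequests) > $thresholdPercent;
}

var_dump(exceedsErrorThreshold(12, 1000)); // 1.2% of requests had errors
```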

  10. Why Error Rate is Hard
    • PHP error handlers are terrible
    • You really need an extension
    • There are a few third party tools that do this, but they aren’t cheap
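
For a sense of why this is awkward in userland, here is a rough sketch that flags a request as "errored" using only built-in hooks. The class name and the chosen severities are assumptions. Fatal errors never reach `set_error_handler`, so a shutdown function has to inspect `error_get_last()` separately, which is part of what makes an extension attractive.

```php
<?php
// Tracks whether the current request raised a warning-or-worse error.
final class RequestErrorFlag
{
    // Severities counted as "E_WARNING or above" in this sketch. PHP's E_*
    // constants are bit flags, not an ordered scale, so this set is a choice.
    const COUNTED = E_WARNING | E_USER_WARNING | E_USER_ERROR | E_RECOVERABLE_ERROR;
    const FATAL = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;

    public static bool $errored = false;
}

set_error_handler(function (int $severity, string $message): bool {
    if ($severity & RequestErrorFlag::COUNTED) {
        RequestErrorFlag::$errored = true;
    }
    return false; // let PHP's normal error handling continue
});

register_shutdown_function(function (): void {
    // Fatal errors bypass the handler above; check for one at shutdown.
    $last = error_get_last();
    if ($last !== null && ($last['type'] & RequestErrorFlag::FATAL)) {
        RequestErrorFlag::$errored = true;
    }
    // A real application would report the flag to its metrics system here.
});
```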

  11. Silent Killers
    • Does a caught exception count towards your error rate?
    • How many times do you drop the exception?
    • Would you even know if your password reset page was throwing a 500 error?
    • Even the best testing can’t fix stupid users

  12. You’re not the only one

  13. Application Logs


  14. Application Logs
    • Logs are your best source for debugging production errors
    • Log facts
    • Speak to your future self
    • Use a service or tool to aggregate logs

  15. Log Highlights
    • Be wordy, but avoid pointless words
    • Take advantage of log levels
    • Take advantage of different application environments
    • Keep your logs to “one-liners”
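
One way to keep entries to one-liners is to collapse whitespace in the message and append structured context as JSON. This is a minimal sketch; `formatLogLine` and its output layout are assumptions for illustration, not Monolog's formatter.

```php
<?php
// Build a single-line log entry: timestamp, level, message, JSON context.
function formatLogLine(string $level, string $message, array $context = []): string
{
    // Collapse newlines and runs of whitespace so the entry stays on one line.
    $message = trim(preg_replace('/\s+/', ' ', $message));
    $line = sprintf('[%s] %s: %s', date('Y-m-d H:i:s'), strtoupper($level), $message);
    if ($context !== []) {
        // json_encode escapes newlines inside values, so this stays one line too.
        $line .= ' ' . json_encode($context);
    }
    return $line;
}

echo formatLogLine('info', "User logged in\nfrom new device", ['user_id' => 42]), "\n";
```

One line per entry keeps logs friendly to grep and to most aggregation services.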

  16. Log Levels
    • DEBUG
    • INFO
    • NOTICE
    • WARNING
    • ERROR
    • CRITICAL
    • ALERT
    • EMERGENCY
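
The levels are ordered, which is what makes "log INFO and above" possible. Here is a small sketch of that threshold check. The numeric values mirror the constants Monolog uses; the helper functions themselves are hypothetical.

```php
<?php
// Numeric severity for each level, least to most severe.
function levelSeverity(string $level): int
{
    $severities = [
        'DEBUG' => 100, 'INFO' => 200, 'NOTICE' => 250, 'WARNING' => 300,
        'ERROR' => 400, 'CRITICAL' => 500, 'ALERT' => 550, 'EMERGENCY' => 600,
    ];
    $key = strtoupper($level);
    if (!isset($severities[$key])) {
        throw new InvalidArgumentException("Unknown log level: $level");
    }
    return $severities[$key];
}

// True when a record at $level should pass a handler configured at $minimum.
function meetsThreshold(string $level, string $minimum): bool
{
    return levelSeverity($level) >= levelSeverity($minimum);
}

var_dump(meetsThreshold('debug', 'info'));    // filtered out in production
var_dump(meetsThreshold('critical', 'info')); // passes
```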

  17. DEBUG
    • Most detailed and verbose level
    • Database queries
    • “per-item” information in a loop
    • Probably turn this off in production

  18. INFO
    • This is the “default” for most things
    • General events
    – user logins
    – application state changes
    – material domain object modifications

  19. NOTICE
    • Like INFO but slightly more important
    • You might actually care about these
    • Transactions with values that are acceptable but higher or lower than usual
    • Might review these weekly

  20. WARNING
    • Undesired behavior that isn’t necessarily wrong
    • Calling deprecated APIs
    • Unexpected null result sets

  21. ERROR
    • Runtime logic errors
    • Unexpected invalid arguments
    • Caught exceptions
    • Doesn’t require immediate attention
    • Look at these daily

  22. CRITICAL
    • First level where you should consider real-time notifications
    • Unable to connect to a 3rd party service
    • Connection timeouts
    • High latency

  23. ALERT
    • Application is partially down or non-functional
    • Failed to connect to a critical internal resource
    • This should send SMS messages, wake people up
    • Recommend a time or repeat threshold
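
A repeat threshold can be sketched as "only notify once the same alert has fired N times within a time window". The class below is a hypothetical illustration that keeps state in memory, so it only suits a long-running process such as a worker. A web application would need shared storage, since PHP requests do not share memory.

```php
<?php
// Suppresses notifications until an alert repeats often enough in a window.
final class AlertThrottle
{
    private int $threshold;
    private int $windowSeconds;
    private array $timestamps = [];

    public function __construct(int $threshold, int $windowSeconds)
    {
        $this->threshold = $threshold;
        $this->windowSeconds = $windowSeconds;
    }

    // Record one occurrence at time $now (Unix seconds). Returns true when
    // the number of occurrences inside the window reaches the threshold.
    public function shouldNotify(int $now): bool
    {
        $this->timestamps[] = $now;
        $cutoff = $now - $this->windowSeconds;
        $this->timestamps = array_values(
            array_filter($this->timestamps, fn (int $t): bool => $t > $cutoff)
        );
        return count($this->timestamps) >= $this->threshold;
    }
}

// Page someone only after three failures within one minute.
$throttle = new AlertThrottle(3, 60);
```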

  24. EMERGENCY
    • Everything has gone to hell
    • Hardware failures
    • Wake everyone up, keep calling until someone acknowledges
    • Rare to see this, because logging has probably also failed

  25. PHP Logging Software
    • Monolog
    • Just use this one
    • Your favorite framework probably has a wrapper around Monolog

  26. Basic Monolog Setup
    $loggerName = 'myapp';
    $logger = new \Monolog\Logger($loggerName);


  27. Useful Monolog Setup
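    $loggerName = 'myapp';
    $logger = new \Monolog\Logger($loggerName);
    $file = __DIR__ . '/app.log';
    // Create the log file up front and make it writable by every user
    // that runs the application (web server, CLI, cron).
    touch($file);
    chmod($file, 0666);
    $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));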

  28. SAPI-aware Monolog
    $sapi = php_sapi_name();
    $loggerName = $sapi == 'cli' ? 'myapp-cli' : 'myapp-web';
    $logger = new \Monolog\Logger($loggerName);
    if ($sapi == 'cli') {
        // Command-line scripts log straight to the terminal.
        $logger->pushHandler(new \Monolog\Handler\StreamHandler('php://stdout'));
    } else {
        // file setup here, touch, chmod, etc
        $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
    }

  29. Add extra info to your logs
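    $logger->pushProcessor(function ($record) {
        // Add a key to "extra" instead of replacing the whole array, so data
        // from other processors survives. Array-style records as in Monolog 1.x.
        $record['extra']['hostname'] = gethostname();
        return $record;
    });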

  30. Logging to a Service
    // Host and port come from your log service account.
    $handler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
    $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
    $logger->pushHandler($handler);

  31. Environment-Aware Log Levels
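    use Monolog\Handler\SwiftMailerHandler;
    use Monolog\Handler\SyslogUdpHandler;
    use Monolog\Logger;

    if (APPLICATION_ENV == 'production') {
        // In Monolog 1.x the third argument is the syslog facility; the level is the fourth.
        $udpHandler = new SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, Logger::INFO);
        $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());
        // $message is assumed to be a preconfigured Swift_Message used as the email template.
        $emailHandler = new SwiftMailerHandler($swiftMailer, $message, Logger::ALERT);
        $logger->pushHandler($udpHandler);
        $logger->pushHandler($emailHandler);
    }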

  32. Application Stats


  33. Metrics Collection
    • Everyone likes graphs
    • Data visualizations help you spot outliers in real-time
    • Create a dashboard that displays them

  34. Example Baseline metrics
    • PHP execution time
    • PHP memory usage
    • Number of database queries per request
    • Job queue length
    • Time to process jobs
    • Emails sent
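
The first two baseline metrics are available from PHP itself. A minimal sketch, assuming you capture a start time at the top of the request; `collectBaselineMetrics` is a hypothetical helper, and sending the values anywhere is left out.

```php
<?php
// Gather execution time and peak memory for the current request.
function collectBaselineMetrics(float $requestStart): array
{
    return [
        // Wall-clock time since the request started, in milliseconds.
        'execution_ms' => (microtime(true) - $requestStart) * 1000,
        // Peak memory allocated from the system, in bytes.
        'peak_memory_bytes' => memory_get_peak_usage(true),
    ];
}

$requestStart = microtime(true);
// ... handle the request ...
$metrics = collectBaselineMetrics($requestStart);
```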

  35. Application Metrics
    • User logins / failed logins
    • Password resets
    • Page views for key pages
    • Deployments
    • Caught exceptions
    • Overall page views

  36. Statsd / Graphite
    • Statsd is a node.js tool that collects stats from your application
    • Graphite is a visualization tool that lets you access information from Statsd in graph form

  37. Graphite UI

  38. Graphite Example

  39. Types of Statsd Metrics
    • Counters
    • Timers
    • Gauges
    • Sets
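
On the wire, each of these metric types is a short text line of the form `name:value|type`, normally sent over UDP. The sketch below only builds that line; `statsdLine` is a hypothetical helper, and in practice a client library does this and the sending for you.

```php
<?php
// Format one metric in the StatsD line protocol: "<name>:<value>|<type>".
function statsdLine(string $name, $value, string $type): string
{
    $suffixes = ['counter' => 'c', 'timer' => 'ms', 'gauge' => 'g', 'set' => 's'];
    if (!isset($suffixes[$type])) {
        throw new InvalidArgumentException("Unknown metric type: $type");
    }
    return sprintf('%s:%s|%s', $name, $value, $suffixes[$type]);
}

echo statsdLine('myapp.pageviews', 1, 'counter'), "\n";
echo statsdLine('myapp.render_time', 320, 'timer'), "\n";
echo statsdLine('myapp.queue_length', 17, 'gauge'), "\n";
echo statsdLine('myapp.unique_users', 'user42', 'set'), "\n";
```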

  40. Examples of Counters
    • Count every request
    • Count every transactional email sent
    • Count every job from your job queue by type
    • Count every caught exception

  41. Examples of Timers
    • Time your index.php at the top and bottom
    • Time your crontabs, especially overnight ones
    • You can even submit timers for multi-page events (conversion funnels, etc)
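
Timing any unit of work comes down to two clock readings. A minimal sketch with a hypothetical `timeCall` helper; the elapsed milliseconds are what you would submit as a timer value.

```php
<?php
// Run a callable and return its result together with the elapsed
// wall-clock time in milliseconds.
function timeCall(callable $work): array
{
    $start = microtime(true);
    $result = $work();
    $elapsedMs = (microtime(true) - $start) * 1000;
    return [$result, $elapsedMs];
}

// Stand-in for a nightly cron task.
[$result, $elapsedMs] = timeCall(function () {
    usleep(10000); // pretend to work for about 10 ms
    return 'done';
});
printf("cron task finished (%s) in %.1f ms\n", $result, $elapsedMs);
```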

  42. Metric Naming
    • Dot-delimited (.) names
    • Think of it like namespaces
    • Use a top-level namespace per-app (client-side)
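
As an illustration of the naming scheme, a small helper can join segments with dots and clean up anything that would create an accidental extra level. `metricName` and its sanitizing rules are assumptions, not part of any Statsd client.

```php
<?php
// Build a dot-delimited metric name from segments, most general first.
function metricName(string ...$parts): string
{
    $clean = [];
    foreach ($parts as $part) {
        // Keep letters, digits, underscores and hyphens; anything else
        // (including stray dots and spaces) becomes an underscore.
        $segment = preg_replace('/[^A-Za-z0-9_-]+/', '_', trim($part));
        if ($segment !== '') {
            $clean[] = strtolower($segment);
        }
    }
    return implode('.', $clean);
}

echo metricName('orca', 'worker', 'buried'), "\n";
echo metricName('orca', 'Password Reset'), "\n";
```

Most client libraries accept the top-level namespace once, when the client is constructed, so application code only supplies the rest of the name.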

  43. Send PHP data to Statsd
    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
    $client = new Domnikl\Statsd\Client($connection, 'myapp');


  44. Time your “Page Render”
    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
    $client = new Domnikl\Statsd\Client($connection, 'orca');
    // Start the timer before the application boots...
    $client->startTiming('render_time');
    $application = new Application();
    $response = $application->dispatch();
    echo $response;
    // ...and stop it once the response has been sent.
    $client->endTiming('render_time');

  45. Page Render Example

  46. Count Your Pageviews
    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
    $client = new Domnikl\Statsd\Client($connection, 'orca');
    $client->startTiming('render_time');
    $application = new Application();
    $response = $application->dispatch();
    echo $response;
    // One counter increment per request gives you pageviews over time.
    $client->increment('pageviews');
    $client->endTiming('render_time');

  47. Pageview Count Example

  48. Absence of Data is also Data

  49. Job Queue Example
    class Worker
    {
        // Statsd client; assumed to be injected elsewhere.
        protected $statsd;

        public function run($job)
        {
            try {
                $this->processJob($job);
                $this->statsd->increment("worker.success");
            } catch (\Exception $e) {
                // Count failures too, so a burst of buried jobs shows up on a graph.
                $this->buryJob($job);
                $this->statsd->increment("worker.buried");
            }
        }
    }

  50. Job Queue Example

  51. Logs vs Stats
    • Why not both?
    • Logs are searchable
    • Stats are graph-able, visual
    • Make sure you can correlate logs and stats

  52. Make it Actionable
    • You have to actually look at this stuff
    • Identify problems with stats
    • Investigate problems with logs
    • Revisit your data collection when you encounter anything serious
    • Get tools to help you

  53. Got Budget?

  54. Questions?


  55. https://joind.in/talk/0a751
