Slide 1

Slide 1 text

Things Your Application Does While You’re Not Looking
Josh Butts, VP of Engineering

Slide 2

Slide 2 text

About Me
• VP of Engineering at Offers.com
• Austin PHP Organizer
• github.com/jimbojsb
• @jimbojsb

Slide 3

Slide 3 text

About Offers.com
• We help people save money
• Launched in 2009
• 50k-line ZF1 application
• Millions of uniques per month

Slide 4

Slide 4 text

Agenda
• What is application health?
• How can we collect data to determine if our application is healthy?
• How can we make this data actionable?

Slide 5

Slide 5 text

What is application health?
• Depends on who you ask
• Combination of performance and quality
  – Uptime
  – Response time
  – Error rate

Slide 6

Slide 6 text

Uptime
• Set realistic expectations: no one is up 100% of the time
• How many 9s can you tolerate?
• Measure uptime monthly
• Planned maintenance counts!
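To make "how many 9s" concrete, here is a minimal sketch that converts an availability target into a monthly downtime budget. The function name and the 30-day month are assumptions for illustration, not something from the talk.

```php
<?php
// Allowed downtime per month for a given availability target.
// Hypothetical helper; a 30-day month is assumed unless told otherwise.
function allowedDowntimeMinutes(float $availabilityPercent, int $daysInMonth = 30): float
{
    $totalMinutes = $daysInMonth * 24 * 60; // minutes in the month
    return $totalMinutes * (1 - $availabilityPercent / 100);
}

foreach ([99.0, 99.9, 99.99] as $target) {
    printf(
        "%.2f%% uptime allows %.1f minutes of downtime per month\n",
        $target,
        allowedDowntimeMinutes($target)
    );
}
```

Under these assumptions, "three nines" leaves roughly 43 minutes per month, and planned maintenance has to fit inside that budget too.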

Slide 7

Slide 7 text

Up isn’t good enough, but it’s a start
• Ping monitoring is an absolute minimum
• ICMP ping is not good enough
• Need to at least check the status code
• Should really check for a content snippet
• You should outsource this
  – Pingdom
  – UptimeRobot
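For illustration only, a check that looks at both the status code and a content snippet might be sketched like this. The function names and the timeout are hypothetical, and the fetch assumes PHP's curl extension is installed; a hosted service such as the ones listed above would normally run this from outside your network.

```php
<?php
// A response counts as "up" only if the status is 200 AND the body
// contains a snippet we expect on a healthy page. Names are hypothetical.
function isHealthy(int $statusCode, string $body, string $expectedSnippet): bool
{
    return $statusCode === 200 && strpos($body, $expectedSnippet) !== false;
}

// Fetch a URL with the curl extension and apply the check above.
function checkUrl(string $url, string $expectedSnippet, int $timeoutSeconds = 10): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects before judging the page
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // curl_exec() returns false on network failure or timeout.
    return $body !== false && isHealthy($status, $body, $expectedSnippet);
}
```

A page that returns 200 with an error message in the body would pass a status-only check but fail this one, which is the point of checking for a snippet.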

Slide 8

Slide 8 text

Error Rate
• Number of requests that generate an E_WARNING or above / total requests
• Uncaught exceptions count as fatal errors (E_ERROR)
• What’s acceptable?
  – 0% is not realistic
  – 1% is a good place to start
  – 0.5% is what we use
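As a rough sketch of the arithmetic, assuming you already have per-period counts of failing and total requests (how you collect them is a separate problem), the rate and the 0.5% threshold could be expressed as follows. Both function names are hypothetical.

```php
<?php
// Error rate as a percentage of total requests. Hypothetical helper.
function errorRate(int $errorRequests, int $totalRequests): float
{
    if ($totalRequests === 0) {
        return 0.0; // no traffic, nothing to report
    }
    return $errorRequests * 100 / $totalRequests;
}

// True when the rate is strictly above the threshold (0.5% by default,
// the figure quoted on the slide).
function exceedsThreshold(int $errorRequests, int $totalRequests, float $thresholdPercent = 0.5): bool
{
    return errorRate($errorRequests, $totalRequests) > $thresholdPercent;
}

printf("%.2f%%\n", errorRate(37, 10000));
var_dump(exceedsThreshold(37, 10000));
```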

Slide 9

Slide 9 text

Why Error Rate is Hard
• PHP error handlers are terrible
• You really need an extension
• There are a few third-party tools that do this, but they aren’t cheap

Slide 10

Slide 10 text

Silent Killers
• Does a caught exception count towards your error rate?
• How many times do you drop the exception?
• Would you even know if your password reset page was throwing a 500 error?
• Even the best testing can’t fix stupid users

Slide 11

Slide 11 text

APPLICATION LOGS

Slide 12

Slide 12 text

Application Logs
• Logs are your best source for debugging production errors
• Log facts
• Speak to your future self
• Use a service or tool to aggregate logs

Slide 13

Slide 13 text

Log Highlights
• Be wordy, but avoid pointless words
• Take advantage of log levels
• Take advantage of different application environments
• Keep your logs to “one-liners”

Slide 14

Slide 14 text

Log Levels
• DEBUG
• INFO
• NOTICE
• WARNING
• ERROR
• CRITICAL
• ALERT
• EMERGENCY

Slide 15

Slide 15 text

DEBUG
• Most detailed and verbose level
• Database queries
• “Per-item” information in a loop
• Probably turn this off in production

Slide 16

Slide 16 text

INFO
• This is the “default” for most things
• General events
  – User logins
  – Application state changes
  – Material domain object modifications

Slide 17

Slide 17 text

NOTICE
• Like INFO but slightly more important
• You might actually care about these
• Transactions with values that are normal but higher or lower than expected
• Might review these weekly

Slide 18

Slide 18 text

WARNING
• Undesired behavior that isn’t necessarily wrong
• Calling deprecated APIs
• Unexpected null result sets

Slide 19

Slide 19 text

ERROR
• Runtime logic errors
• Unexpected invalid arguments
• Caught exceptions
• Doesn’t require immediate attention
• Look at these daily

Slide 20

Slide 20 text

CRITICAL
• First level where you should consider real-time notifications
• Unable to connect to a third-party service
• Connection timeouts
• High latency

Slide 21

Slide 21 text

ALERT
• Application is partially down or non-functional
• Failed to connect to a critical internal resource
• This should send SMS messages, wake people up
• Recommend a time threshold

Slide 22

Slide 22 text

EMERGENCY
• Everything has gone to hell
• Hardware failures
• Wake everyone up, keep calling until someone acknowledges
• Rare to see this, because logging has probably also failed
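The escalation policy described across the eight level slides can be summarized in a small lookup. This is a sketch, not the speaker's code: the severity numbers mirror Monolog's Logger constants, the returned strings are illustrative labels, and the WARNING row is an assumption because the slides give no review cadence for that level.

```php
<?php
// Severity values mirror Monolog's Logger constants (DEBUG = 100 ... EMERGENCY = 600).
const LEVELS = [
    'DEBUG' => 100, 'INFO' => 200, 'NOTICE' => 250, 'WARNING' => 300,
    'ERROR' => 400, 'CRITICAL' => 500, 'ALERT' => 550, 'EMERGENCY' => 600,
];

// Map a level name to the response suggested on the slides.
function responseFor(string $level): string
{
    $severity = LEVELS[strtoupper($level)] ?? null;
    if ($severity === null) {
        throw new InvalidArgumentException("Unknown log level: $level");
    }
    if ($severity >= LEVELS['EMERGENCY']) return 'call until acknowledged';
    if ($severity >= LEVELS['ALERT'])     return 'send SMS';
    if ($severity >= LEVELS['CRITICAL'])  return 'real-time notification';
    if ($severity >= LEVELS['ERROR'])     return 'review daily';
    if ($severity >= LEVELS['WARNING'])   return 'review daily'; // assumption, not from the slides
    if ($severity >= LEVELS['NOTICE'])    return 'review weekly';
    return 'no review'; // INFO and DEBUG are kept for searching, not for routine review
}

echo responseFor('critical'), "\n";
```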

Slide 23

Slide 23 text

PHP Logging Software
• Monolog
  – Pretty much everyone uses this one
• Log4PHP
  – Pretty much no one uses this one
• The one that comes with your favorite framework

Slide 24

Slide 24 text

Basic Monolog Setup

Slide 25

Slide 25 text

Useful Monolog Setup

    $logger->pushHandler(new Monolog\Handler\StreamHandler($file));

Slide 26

Slide 26 text

SAPI-aware Monolog

    $sapi = php_sapi_name();
    $loggerName = $sapi == 'cli' ? "myapp-cli" : "myapp-web";
    $logger = new \Monolog\Logger($loggerName);
    if ($sapi == 'cli') {
        // command-line scripts log to standard output
        $logger->pushHandler(new \Monolog\Handler\StreamHandler("php://stdout"));
    } else {
        // file setup here, touch, chmod, etc
        $logger->pushHandler(new \Monolog\Handler\StreamHandler($file));
    }

Slide 27

Slide 27 text

Add extra info to your logs

    $logger->pushProcessor(function ($record) {
        // attach the hostname to every record
        $record["extra"] = array(gethostname());
        return $record;
    });

Slide 28

Slide 28 text

Logging to a Service

    $handler = new Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345);
    $handler->setFormatter(new \Monolog\Formatter\LineFormatter());
    $logger->pushHandler($handler);

Slide 29

Slide 29 text

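Environment-Aware Log Levels

    if (APPLICATION_ENV == 'production') {
        // SyslogUdpHandler takes ($host, $port, $facility, $level): the level is the fourth argument
        $udpHandler = new \Monolog\Handler\SyslogUdpHandler('data.logentries.com', 12345, LOG_USER, \Monolog\Logger::INFO);
        $udpHandler->setFormatter(new \Monolog\Formatter\LineFormatter());
        // SwiftMailerHandler takes ($mailer, $message, $level); $message is a
        // Swift_Message template assumed to be built elsewhere (not shown on the slide)
        $emailHandler = new \Monolog\Handler\SwiftMailerHandler($swiftMailer, $message, \Monolog\Logger::ALERT);
        $logger->pushHandler($udpHandler);
        $logger->pushHandler($emailHandler);
    }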

Slide 30

Slide 30 text

Sample Log File

    [2015-03-27 14:49:21] orca-web.DEBUG: {"type":"view","data":{"buoy": 1496827310190429296,"path":"\/sears\/"
    [2015-03-29 22:30:05] orca-web.INFO: orca.pages.all.render_time: 0.12709784507751|ms|@1.000 [] ["vagrant-ubuntu
    [2015-03-29 22:30:05] orca-web.INFO: orca.pages.all.views.anonymous:1|c|@1.000 [] ["vagrant-ubuntu-trusty-64"]
    [2015-03-30 15:40:26] orca-web.INFO: Captcha failed for 10.0.2.2; Requested [] ["vagrant-ubuntu-trusty-64"]
    [2015-03-30 15:52:59] orca-web.INFO: Captcha Passed: 10.0.2.2; Requested [] ["vagrant-ubuntu-trusty-64"]
    [2015-03-30 15:53:00] orca-web.ALERT: ELASTICSEARCH ERROR With Path: / error: {"error":"IndexMissingException[[o

Slide 31

Slide 31 text

STATS

Slide 32

Slide 32 text

Metrics Collection
• Everyone likes graphs
• Data visualizations help you spot outliers in real time
• Create a dashboard that displays them

Slide 33

Slide 33 text

Example Baseline Metrics
• PHP execution time
• PHP memory usage
• Number of database queries per request
• Job queue length
• Time to process jobs
• Emails sent

Slide 34

Slide 34 text

Application Metrics
• User logins / failed logins
• Password resets
• Page views for key pages
• Deployments
• Caught exceptions
• Overall page views

Slide 35

Slide 35 text

Statsd / Graphite
• Statsd is a Node.js app that collects stats from your application
• Graphite is a visualization tool that lets you access information from Statsd in graph form

Slide 36

Slide 36 text

Graphite UI

Slide 37

Slide 37 text

Graphite Example

Slide 38

Slide 38 text

Types of Statsd Metrics
• Counters
• Timers
• Gauges
• Sets
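On the wire, each of these four types is a short text datagram of the form name:value|type, where the type code is c (counter), ms (timer), g (gauge) or s (set). The sketch below formats and sends such lines without any client library; the function names are hypothetical, and the host and port are assumptions (8125 is the usual Statsd UDP port).

```php
<?php
// Format one Statsd datagram: "<name>:<value>|<type>", with an optional
// "|@<rate>" suffix when the metric is sampled. Hypothetical helper.
function statsdLine(string $name, $value, string $type, float $sampleRate = 1.0): string
{
    $line = sprintf('%s:%s|%s', $name, $value, $type);
    if ($sampleRate < 1.0) {
        $line .= '|@' . $sampleRate;
    }
    return $line;
}

// Fire-and-forget UDP send. Failures are ignored on purpose so that
// metrics can never take the application down.
function statsdSend(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen("udp://$host", $port);
    if ($socket !== false) {
        fwrite($socket, $line);
        fclose($socket);
    }
}

echo statsdLine('myapp.pageviews', 1, 'c'), "\n";       // counter
echo statsdLine('myapp.render_time', 127, 'ms'), "\n";  // timer, in milliseconds
echo statsdLine('myapp.queue_length', 42, 'g'), "\n";   // gauge
echo statsdLine('myapp.visitors', 'user-17', 's'), "\n"; // set member
```

In practice a client library hides this format, but seeing it helps when reading raw metric lines in a log file.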

Slide 39

Slide 39 text

Examples of Counters
• Count every request
• Count every transactional email sent
• Count every job from your job queue by type
• Count every caught exception

Slide 40

Slide 40 text

Examples of Timers
• Time your index.php at the top and bottom
• Time your crontabs, especially overnight ones
• You can even submit timers for multi-page events (conversion funnels, etc.)

Slide 41

Slide 41 text

Metric Naming
• Dot-delimited names
• Think of it like namespaces
• Plan ahead
• Use a top-level namespace per app (client-side)
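One way to keep names consistent is to build them from segments rather than by hand. The helper below is a hypothetical sketch: it lowercases each segment and replaces anything that is not a letter, digit, underscore or hyphen (including stray dots) with an underscore, so a value such as a job type cannot accidentally create extra namespace levels.

```php
<?php
// Build a dot-delimited metric name from namespace segments.
// Hypothetical helper; the sanitizing rule is an assumption, not a Statsd requirement.
function metricName(string ...$segments): string
{
    $clean = array_map(function (string $segment): string {
        // Lowercase, then collapse any run of other characters into "_".
        return preg_replace('/[^a-z0-9_-]+/', '_', strtolower(trim($segment)));
    }, $segments);

    return implode('.', $clean);
}

echo metricName('orca', 'pages', 'all', 'render_time'), "\n";
echo metricName('myapp', 'Worker', 'job type A'), "\n";
```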

Slide 42

Slide 42 text

Send PHP data to Statsd

    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
    $client = new Domnikl\Statsd\Client($connection, 'myapp');

Slide 43

Slide 43 text

Time your “Page Render”

    $client->startTiming('render_time');
    $application = new Application();
    $response = $application->dispatch();
    echo $response;
    $client->endTiming('render_time');

Slide 44

Slide 44 text

Page Render Example

Slide 45

Slide 45 text

Count Your Pageviews

    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost');
    $client = new Domnikl\Statsd\Client($connection, 'orca');
    $client->startTiming('render_time');
    $application = new Application();
    $response = $application->dispatch();
    echo $response;
    $client->increment('pageviews');
    $client->endTiming('render_time');

Slide 46

Slide 46 text

Pageview Count Example

Slide 47

Slide 47 text

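Job Queue Example

    class Worker
    {
        protected $statsd;

        // the Statsd client is injected; processJob() and buryJob() are not shown on the slide
        public function __construct($statsd)
        {
            $this->statsd = $statsd;
        }

        public function run($job)
        {
            try {
                $this->processJob($job);
                $this->statsd->increment("worker.success");
            } catch (\Exception $e) {
                $this->buryJob($job);
                $this->statsd->increment("worker.buried");
            }
        }
    }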

Slide 48

Slide 48 text

Job Queue Example

Slide 49

Slide 49 text

Logs vs Stats
• Why not both?
• Logs are searchable
• Stats are graphable, visual
• Make sure you can correlate logs and stats

Slide 50

Slide 50 text

Make it Actionable
• You have to actually look at this stuff
• Identify problems with stats
• Investigate problems with logs
• Revisit your data collection when you encounter anything serious
• Get tools to help you

Slide 51

Slide 51 text

Got Budget?

Slide 52

Slide 52 text

Anyone have QUESTIONS?

Slide 53

Slide 53 text

I’d love your feedback: JOIND.IN/13548