What the logs don’t tell you… how averages hide the problems that hit users and developers

by David O'Neill @ APIStrat 2014 in Chicago

Transcript

  1. 1 © 2013 APImetrics, Inc. All Rights Reserved. Confidential. www.apimetrics.io

    What the logs don’t tell you… David O’Neill | david@apimetrics.com | +1 206 972 1140
  2.

    Disclaimer: I am not a statistician, nor do I play one in the movies.
  3.

    Testing… it’s for life… David’s ‘Laws’
      1. An API deployed doesn’t necessarily stay deployed.
      2. The reliability of an API is proportional to the entertaining ways its users can think of to break it, and to the number of users doing things.
      3. The use you get from your server logs is a function of the size of your user base and the amount of data actually in them.
  4.

    End-to-End or just ‘end’? What are you interested in? That your servers are serving? Or that your users can use it? Our focus is on the impact on users.
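
Measuring from the user’s side means timing the whole request as a client sees it: connection set-up, server work and body transfer, not just the time the server logs for itself. A minimal sketch in Python, assuming a plain HTTP GET against an endpoint of your choosing and using only the standard library; `probe` and the `ProbeResult` fields are hypothetical names for illustration, not part of any APImetrics product.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    url: str
    status: Optional[int]  # None when no HTTP response arrived (timeout, refused, ...)
    elapsed_ms: float      # wall-clock time as seen by the caller, not the server
    error: Optional[str] = None


def probe(url: str, timeout_s: float = 10.0) -> ProbeResult:
    """Time one GET end to end from the caller's side (hypothetical helper)."""
    start = time.perf_counter()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include the body transfer in the measurement
            status = resp.status
    except urllib.error.HTTPError as exc:
        # HTTPError must be caught before URLError because it is a subclass of it.
        status, error = exc.code, str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # Timeouts and connection failures: there is no HTTP status code at all.
        error = repr(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return ProbeResult(url, status, elapsed_ms, error)
```

Running the same probe on a schedule from the locations where your users actually are, and keeping every `elapsed_ms` value rather than only an average, gives the view that server logs cannot.
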
  5.

    Where are your primary users in relation to where the data is served from? Don’t rely on past trends… Where did I put that server farm anyway?
  6.

    Averages: what are they hiding?
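
The point is easy to check with a few lines of arithmetic. The sketch below uses two made-up services (illustrative numbers, not data from the talk) that have exactly the same mean latency; only the percentiles reveal that one of them makes 1 call in 20 wait three seconds. It uses the simple nearest-rank percentile, which is one of several common definitions.

```python
import math
import statistics
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Report the mean next to the median, the tail and the worst case."""
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Two hypothetical services, 100 calls each, with identical averages.
steady = [245] * 100             # every call takes 245 ms
spiky = [100] * 95 + [3000] * 5  # most calls are fast, 1 in 20 takes 3 s

print(summarize(steady))  # -> {'mean': 245.0, 'p50': 245, 'p99': 245, 'max': 245}
print(summarize(spiky))   # -> {'mean': 245.0, 'p50': 100, 'p99': 3000, 'max': 3000}
```

Both services average 245 ms, so a dashboard that shows only the mean would rate them the same, even though the median call on the second one is faster and its tail is more than ten times slower.
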
  7.

    Can a cache impact users? Yes…
      •  Before the cache: 2,000 ms+ responses to remote calls
      •  After: 250 ms, a 10x improvement on remote calls
      •  But the server response time remains the same
  8.

    When HTTP-200 is not all OK: the API returns an HTTP-200 code, but the JSON it returns is invalid
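
A monitor that stops at the status code will score this call as a success. A minimal sketch of a stricter check, assuming the API is meant to return a JSON object; `classify_response` and its `required_keys` parameter are hypothetical names for illustration, and which keys count as required depends on your own API contract.

```python
import json
from typing import Iterable, Tuple


def classify_response(status: int, body: bytes,
                      required_keys: Iterable[str] = ()) -> Tuple[bool, str]:
    """Treat a call as OK only if the status is 2xx AND the body is usable JSON."""
    if not 200 <= status < 300:
        return False, f"HTTP {status}"
    try:
        payload = json.loads(body)
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError are both ValueErrors
        return False, f"HTTP {status} but invalid JSON: {exc}"
    # Hypothetical contract check: the payload must be an object containing these keys.
    missing = [k for k in required_keys
               if not (isinstance(payload, dict) and k in payload)]
    if missing:
        return False, f"HTTP {status} but missing keys: {missing}"
    return True, "ok"
```

For example, `classify_response(200, b"<html>Service Unavailable</html>")` returns `False` with an "invalid JSON" reason, although a status-code check alone would have passed it.
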
  9.

    How many failures make an issue? And do you see them?
      -  These are timeouts… no HTTP code
      -  The outage took over a week to resolve
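
Timeouts are easy to miss because there is no status code to count: a tally of 4xx and 5xx responses can read zero while callers are failing. A minimal sketch, assuming each monitored call is recorded as its HTTP status, or `None` when no response arrived; `failure_summary` is a hypothetical helper and the 97/3 split is made up for illustration.

```python
from typing import Dict, Optional, Sequence


def failure_summary(statuses: Sequence[Optional[int]]) -> Dict[str, float]:
    """Count failures, treating 'no HTTP response' (None) as a failure too."""
    total = len(statuses)
    no_response = sum(1 for s in statuses if s is None)
    http_errors = sum(1 for s in statuses if s is not None and s >= 400)
    failures = no_response + http_errors
    return {
        "total": total,
        "http_errors": http_errors,  # what a status-code tally would show
        "no_response": no_response,  # e.g. timeouts and connection resets
        "failure_rate": failures / total if total else 0.0,
    }


# Hypothetical batch of monitoring calls: 97 successes, 3 timeouts.
calls = [200] * 97 + [None] * 3
print(failure_summary(calls))
# -> {'total': 100, 'http_errors': 0, 'no_response': 3, 'failure_rate': 0.03}
```
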
  10.

    Does slow matter? Not always… A more normal distribution here, with a very low frequency of slow responses
  11.

    Does slow matter? Well, sometimes…
      •  Flat profile across a wide range of times, with long latencies
      •  Lots of responses on the wrong side of the graph
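
Telling a tight distribution with a few stragglers apart from a flat one needs the shape of the data, not a single number. A minimal sketch that buckets latencies into a text histogram and reports the share of calls slower than a threshold; the sample timings, the 500 ms bucket width and the 1,000 ms threshold are illustrative assumptions, not values from the talk.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def histogram(latencies_ms: Sequence[float], bucket_ms: int) -> List[Tuple[int, int]]:
    """Return (bucket lower bound, count) pairs, including empty buckets in between."""
    counts = Counter(int(x // bucket_ms) * bucket_ms for x in latencies_ms)
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    return [(b, counts[b]) for b in range(lo, hi + bucket_ms, bucket_ms)]


def render(hist: List[Tuple[int, int]], width: int = 40) -> str:
    """Draw one '#' bar per bucket, scaled so the fullest bucket is `width` wide."""
    peak = max((count for _, count in hist), default=0)
    rows = []
    for lower, count in hist:
        bar = "#" * (round(count / peak * width) if peak else 0)
        rows.append(f"{lower:>6} ms | {bar} {count}")
    return "\n".join(rows)


def share_slower_than(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of calls on the slow side of a latency threshold."""
    if not latencies_ms:
        return 0.0
    return sum(1 for x in latencies_ms if x > threshold_ms) / len(latencies_ms)


# Hypothetical client-side timings in ms.
samples = [180, 210, 230, 260, 900, 1400, 1900, 2300, 2600, 3100]
print(render(histogram(samples, bucket_ms=500)))
print(f"slower than 1,000 ms: {share_slower_than(samples, 1000):.0%}")  # -> 50%
```

Keeping the empty buckets in the output matters: a long, flat run of small bars is exactly the profile this slide describes, and dropping the gaps would make it look tighter than it is.
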
  12.

    Top Tips
      •  Testing and monitoring are essential before, during and after deployment
      •  Server logs, especially if you have a lot of traffic, can bury issues in the noise
      •  Not all problems are equal
      •  APIs don’t behave like websites, and problems can be less obvious
      •  The first time you know about a bad API is often when the complaints start
  13.

    David O’Neill | +1 206 972 1140 | david@apimetrics.com