Embracing failure on the front-end

Clay Smith

July 25, 2014

Transcript

  1. 1/11/16 What this talk covers

    • The inevitability that JavaScript apps will break.
    • Borrowing good ideas about failure from operations teams.
    • A bit about the theory of complex systems failure.
    • Open-source tools and services that help make apps more resilient.
    • Why talking about failure in the front-end is important.

    NOT COVERED: MY RECIPE FOR TEXAS-STYLE BEEF CHILI. FIND ME AFTER TO TALK ABOUT IT.
  2. GOOGLE TRENDS ALL THE THINGS

    One trend is twice as popular as the other trend on average.
  3. SOME THEORY, PART 1

    “Complex systems are intrinsically hazardous systems.”
    RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
    DR. COOK IS MY HERO
  4. “Exception” tracking with window.onerror

    MAY YOU NEVER HAVE TO SEE THIS DIALOG AGAIN
    DANGER: THIS GETS PRETTY UGLY.
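
A minimal sketch of the window.onerror approach named on this slide, assuming all you want is to capture uncaught errors and hand them to a reporting function. The names `installErrorTracker` and `report` are hypothetical and not from the talk. The handler arguments are the standard window.onerror parameters; older browsers may not pass the column number or the Error object.

```javascript
// Hypothetical helper: route uncaught errors on `target` to a `report` callback.
function installErrorTracker(target, report) {
  var previous = target.onerror; // keep any handler that was already installed
  target.onerror = function (message, source, lineno, colno, error) {
    report({
      message: String(message),
      source: source || '',
      line: lineno || 0,
      column: colno || 0,
      // The Error object (fifth argument) is missing in older browsers.
      stack: error && error.stack ? String(error.stack) : null
    });
    if (typeof previous === 'function') {
      return previous.apply(this, arguments);
    }
    return false; // false lets the browser's default error handling run as well
  };
}

// In a browser, wire it up to the real window object.
if (typeof window !== 'undefined') {
  installErrorTracker(window, function (entry) {
    console.log('uncaught error', entry);
  });
}
```

Passing the target in, rather than touching `window` directly, keeps the sketch testable outside a browser.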
  5. So you want to use a 3rd party service…

    SERIOUSLY, PAUL IRISH APPEARS IN ALL MY TALKS.
    THERE ARE LOTS: HTTPS://PLUS.GOOGLE.COM/+PAULIRISH/POSTS/12BVL5EXFJN
  6. Example window.onerror output

    NS_TOO_MUCH_NOISE. NOT REALLY SURE WHY I REDACTED THE URLS.
    FURTHER READING: http://blog.meldium.com/home/2013/9/30/so-youre-thinking-of-tracking-your-js-errors
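
One way to cope with noisy window.onerror output is to filter entries before reporting them. The sketch below is an assumption about what such a filter could look like, not something shown in the talk: the names `isLikelyNoise` and `dropNoise` are hypothetical, and the patterns are illustrative only. It relies on two known behaviors: cross-origin scripts loaded without CORS report only "Script error.", and browser extensions load from their own URL schemes.

```javascript
// Illustrative patterns only; tune them against your own error logs.
var NOISY_SOURCES = [/^chrome-extension:\/\//, /^moz-extension:\/\//];

function isLikelyNoise(entry) {
  // Cross-origin scripts served without CORS headers report only "Script error."
  if (/^Script error\.?$/i.test(entry.message || '')) {
    return true;
  }
  // Errors raised inside browser extensions are not caused by the app itself.
  return NOISY_SOURCES.some(function (pattern) {
    return pattern.test(entry.source || '');
  });
}

function dropNoise(entries) {
  return entries.filter(function (entry) {
    return !isLikelyNoise(entry);
  });
}
```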
  7. SOME THEORY, PART 2

    "Change introduces new forms of failure."
    RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
    DOES THIS SOUND LIKE COMMON SENSE YET?
  8. Monitor change with phantomas

    Phantomas is “PhantomJS-based web performance metrics collector and monitoring tool”.
    https://github.com/macbre/phantomas

        phantomas --cookie '_session=<redacted>' \
          --reporter=statsd --statsd-host 127.0.0.1 --statsd-prefix stg \
          --runs 5 http://staging-web.com

    CREEPY PICTURE, NO? I BET HE WRITES ERLANG. I ALSO DON'T KNOW HOW TO SAY PHANTOMAS.
    JEAN MARAIS AS FANTÔMAS IN THE 1964 FILM.
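
The command on this slide sends metrics to StatsD. As a rough illustration of what ends up on the wire, here is a sketch of the plain-text StatsD line format, `<name>:<value>|<type>`. The function name `toStatsdLines`, the metric names, the dot after the prefix, and the choice of gauges (`g`) are all assumptions for illustration; how phantomas itself names and types each metric is not shown in the slides.

```javascript
// Format numeric metrics as StatsD gauge lines; non-numeric values are skipped.
function toStatsdLines(prefix, metrics) {
  return Object.keys(metrics)
    .filter(function (name) {
      return typeof metrics[name] === 'number' && isFinite(metrics[name]);
    })
    .map(function (name) {
      return prefix + '.' + name + ':' + metrics[name] + '|g';
    });
}

// Hypothetical sample values, using the "stg" prefix from the slide.
var lines = toStatsdLines('stg', {
  requests: 42,
  domContentLoaded: 1200,
  notANumber: 'n/a'
});
console.log(lines.join('\n'));
// prints:
// stg.requests:42|g
// stg.domContentLoaded:1200|g
```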
  9. How to get super-detailed site metrics… if you’re lazy and cheap.

    5 HABITS OF HIGHLY LAZY FRONT-END PERFORMANCE ENGINEERS

    1. Cloud server/your laptop with phantomas installed
    2. Cron job that runs phantomas with statsd output
    3. DataDog Lite Account + Install DataDog Agent on Server
    4. Configure Alerting (I recommend PagerDuty)
    5. Get woken up at 3am
  10. Make the metrics understandable and actionable

    THIS LOOKS IMPRESSIVE WHILE YOU READ HACKER NEWS ON YOUR OTHER MONITOR
    TESTING DASHBOARD FOR STAGING ENVIRONMENT IN DATADOG.
    EVEN FANCIER: INTEGRATE IT INTO YOUR WEB APP: https://github.com/blog/1252-how-we-keep-github-fast
  11. Get alerted as things happen

    CREATING A NEW METRIC ALERT IN DATADOG

    • Choose a phantomas metric
    • Define conditions

    YOU'LL BE ANGRY AT ME WHEN THIS WAKES YOU UP AT 3AM
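
On this slide the alert conditions live in DataDog. The same "choose a metric, define conditions" idea can be sketched locally as a threshold check. Everything here is hypothetical: the function name `findBreaches`, the metric names, and the limits are made up for illustration and do not reflect DataDog's API.

```javascript
// Return one entry per metric whose current value exceeds its configured limit.
function findBreaches(metrics, thresholds) {
  return Object.keys(thresholds)
    .filter(function (name) {
      // Metrics that are missing or non-numeric are ignored rather than flagged.
      return typeof metrics[name] === 'number' && metrics[name] > thresholds[name];
    })
    .map(function (name) {
      return { metric: name, value: metrics[name], limit: thresholds[name] };
    });
}

// Example: one metric over its limit, one under.
var breaches = findBreaches(
  { requests: 120, domContentLoaded: 900 },
  { requests: 100, domContentLoaded: 1500 }
);
console.log(breaches);
```

A check like this could run after each phantomas pass and fail a build, while the hosted alert covers the 3am case.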
  12. SOME THEORY, PART 3

    “Failure free operations require experience with failure.”
    RICHARD I. COOK, MD. HOW COMPLEX SYSTEMS FAIL.
    See also: https://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/
    SAY THIS THE NEXT TIME YOU BLOW SOMETHING UP.
  13. Inject chaos into your front-end

    ORIGINAL GRAPHIC SLIGHTLY REDACTED
    https://github.com/travis-hilterbrand/chaos-monkey-browser
    https://github.com/mikl/node-chaos-monkeyware
  14. EMBRACING FAILURE ON THE FRONT-END

    CONFIGURING CHAOS-MONKEY-BROWSER (*JQUERY REQUIRED)

        var props = {
          probability: 0.5,           // affect roughly half of matching requests
          allowedMethods: ['GET'],    // only GET requests are targeted
          mischiefTypes: [
            ChaosMonkey.MischiefTypes.delay,   // slow the request down
            ChaosMonkey.MischiefTypes.http403  // fail it with a 403
          ]
        };
        ChaosMonkey(props);

    With a 50% probability, this configuration will cause jQuery ajax GET requests to slowly fail with a 403 response.

    Prepares for:
    • CDN Failure
    • API Failure
    • Connection Failure
    • Bad SSL certificates
    • And more!
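
To make the idea behind this configuration concrete, here is a self-contained sketch of probability-based fault injection around a callback-style request function. It is not chaos-monkey-browser's implementation. The names `pickMischief` and `withChaos`, the string mischief types, the `delayMs` option, and the injectable `random` function are all assumptions made for illustration.

```javascript
// Decide whether to interfere with a request, and if so, how.
function pickMischief(method, props, random) {
  var rand = random || Math.random;
  if (props.allowedMethods.indexOf(method) === -1) {
    return null; // this HTTP method is not targeted
  }
  if (rand() >= props.probability) {
    return null; // the request is left alone this time
  }
  // Pick one of the configured mischief types uniformly at random.
  var index = Math.floor(rand() * props.mischiefTypes.length);
  return props.mischiefTypes[index];
}

// Wrap a callback-style request function: realRequest(method, url, callback).
function withChaos(realRequest, props, random) {
  return function (method, url, callback) {
    var mischief = pickMischief(method, props, random);
    if (mischief === 'http403') {
      // Fail with a fake 403 after a pause, without touching the network.
      setTimeout(function () {
        callback({ status: 403, url: url }, null);
      }, props.delayMs || 0);
      return;
    }
    if (mischief === 'delay') {
      // Let the real request through, but only after a pause.
      setTimeout(function () {
        realRequest(method, url, callback);
      }, props.delayMs || 0);
      return;
    }
    realRequest(method, url, callback);
  };
}
```

Taking `random` as a parameter makes the wrapper deterministic in tests, which is useful when checking that the UI handles each failure mode.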
  15. Other possible strategies

    HOW TO ANNOY PEOPLE DURING CODE REVIEW

    1. DISABLE/SLOW DOWN NETWORK CONNECTION (IN CHROME CANARY DEVTOOLS):
    2. WHAT HAPPENS WHEN YOU DISABLE JS? (USING PLUGIN RECOMMENDED):

    AMAZON.COM ISN’T HAPPY WITHOUT JAVASCRIPT
  16. Lessons learned in failure

    SERIOUSLY, REMEMBER ONE OF THESE THINGS

    • Measure errors and key performance metrics over time
    • Bad performance = failure
    • Annoy yourself to fix the broken things with alerting
    • Find remediation steps to make sure it doesn’t happen again
    • Get experience with failure before 7pm on a Friday
  17. Thanks!

    @smithclay [email protected]

    Additional resources (more reading):
    • https://info.aiaa.org/tac/SMG/SOSTC/Shared%20Documents/How%20Complex%20Systems%20Fail.pdf
    • http://blog.meldium.com/home/2013/9/30/so-youre-thinking-of-tracking-your-js-errors
    • https://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/