Building Resilient Front End Systems (Smashingconf)

Ianfeather
September 10, 2018

Transcript

  1. BUILDING RESILIENT FRONTEND SYSTEMS Ian Feather - BuzzFeed - @ianfeather

  2. None
  3. RESILIENCE IS FUNCTION IN A HOSTILE ENVIRONMENT

  4. GUARANTEE THE MOST BASIC LEVEL OF UX

  5. UNDERSTAND YOUR TIERS OF USER EXPERIENCE

  6. 1. HOW OUR SYSTEMS FAIL 2. DESIGNING FOR FAILURE 3. MITIGATING RISK 4. LEARNING
  7. HOW OUR SYSTEMS FAIL SECTION 1

  8. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE

  9. HTTPS IS TABLE STAKES

  10. None
  11. HTTPS IS TABLE STAKES

  12. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE

  13. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY
  14. CONTROL YOUR POINTS OF FAILURE

  15. 2016 DYN DNS 5 HRS AWS S3 9 HRS 2017 FASTLY CDN 1 HR AWS S3 2 HRS
  16. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY
  17. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR
  18. None
  19. ADD SLIDE ABOUT SENTRY

  20. SLACK ALERTS

  21. KNOWING IT’S BROKEN BEFORE TWITTER DOES

  22. None
  23. None
  24. THEORY VS PRACTICE

  25. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR
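
The alerting idea in slides 19 to 21 (error tracking feeding Slack alerts, so the team knows before Twitter does) can be sketched as a threshold check on the client-side error rate. This is a minimal sketch and not the setup described in the talk: `ErrorSample`, `shouldAlert`, and the 2% threshold used below are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical shape: counts collected over one monitoring interval.
interface ErrorSample {
  pageviews: number; // pages served in the interval
  errors: number;    // client-side errors reported in the interval
}

// Returns true when the error rate in the sample exceeds the threshold.
// thresholdPercent is an assumed, tunable value (2 means 2%).
function shouldAlert(sample: ErrorSample, thresholdPercent: number): boolean {
  if (sample.pageviews === 0) {
    // No traffic is not an error-rate breach, so this sketch stays quiet.
    return false;
  }
  const ratePercent = (sample.errors / sample.pageviews) * 100;
  return ratePercent > thresholdPercent;
}
```

With an assumed 2% threshold, a sample of 50 errors across 1,000 pageviews (5%) would trigger an alert, while 10 errors across 1,000 pageviews (1%) would not.
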
  26. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR 4. THE NETWORK
  27. THEORY VS PRACTICE

  28. THEORY VS PRACTICE

  29. ~1% OF REQUESTS FOR JAVASCRIPT WILL TIME OUT

  30. 13 MILLION REQUESTS FOR JAVASCRIPT WILL TIME OUT

  31. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR 4. THE NETWORK
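
Slides 29 and 30 pair a rate (roughly 1%) with an absolute count (13 million). The arithmetic linking the two can be sketched as below. The function names are hypothetical, and the slides do not state the period the count covers, so the implied total is only as precise as the rounded 1% figure.

```typescript
// Given a failure count and the percentage it represents, recover the total.
function impliedTotal(failures: number, failurePercent: number): number {
  if (failurePercent <= 0) {
    throw new RangeError("failurePercent must be positive");
  }
  return (failures * 100) / failurePercent;
}

// Given a total and a failure percentage, estimate how many requests fail.
function expectedFailures(total: number, failurePercent: number): number {
  return (total * failurePercent) / 100;
}

// 13 million timeouts at about 1% implies about 1.3 billion JavaScript requests.
const totalJsRequests = impliedTotal(13e6, 1);
```

The point of the pair of slides is that a small percentage becomes a large absolute number at scale, so a failure mode that looks negligible as a rate still affects millions of requests.
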
  32. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR 4. THE NETWORK 5. USER’S PRIVILEGE
  33. ~9% OF OUR USERS USE SOME FORM OF CONTENT BLOCKER

  34. ~4% WON’T SUCCESSFULLY DOWNLOAD OUR FONTS

  35. 40 MILLION PAGEVIEWS PER MONTH

  36. None
  37. HOW OUR SYSTEMS FAIL 1. MALICIOUS INTERFERENCE 2. 3RD PARTY AVAILABILITY 3. DEVELOPER ERROR 4. THE NETWORK 5. USER’S PRIVILEGE
  38. HOPE FOR THE BEST?

  39. “technical glitches… cost the e-commerce giant an estimated $1.2 million a minute”
  40. DESIGN FOR FAILURE SECTION 2

  41. DESIGN FOR FAILURE 1. PRIORITIZE CRITICAL PARTS OF THE PAGE

  42. User FONTS html IMAGES DATA (xhr) IMAGES CSS JS IMAGES Images HTML
  43. None
  44. None
  45. None
  46. DESIGN FOR FAILURE 1. PRIORITIZE CRITICAL PARTS OF THE PAGE

  47. DESIGN FOR FAILURE 1. PRIORITIZE CRITICAL PARTS OF THE PAGE 2. MAKE ERRORS A FIRST CLASS CITIZEN
  48. None
  49. SOMETHING BROKE! SHOULD I TELL THEM?

  50. IT BROKE. SHOULD I TELL THEM?

  51. None
  52. DESIGN FOR FAILURE 1. PRIORITIZE CRITICAL PARTS OF THE PAGE 2. MAKE ERRORS A FIRST CLASS CITIZEN
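
One way to treat errors as first-class values, and to answer the "should I tell them?" question from slides 49 and 50, is to make failure an explicit branch of the data and decide per feature whether the user needs to hear about it. The transcript does not record the talk's answer, so the policy below (report failures of core features, hide failed enhancements) is an assumption, and `Result`, `Tier`, `Feature`, and `userMessage` are hypothetical names.

```typescript
// A result type that forces callers to consider the failure branch.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string };

// Hypothetical tiers: "core" is what the user came for, "enhancement" is extra.
type Tier = "core" | "enhancement";

interface Feature<T> {
  label: string;
  tier: Tier;
  result: Result<T>;
}

// Decide what, if anything, to tell the user about a feature's state.
// Returns null when the page should stay silent.
function userMessage<T>(feature: Feature<T>): string | null {
  if (feature.result.ok) {
    return null; // nothing went wrong
  }
  if (feature.tier === "core") {
    return `Sorry, ${feature.label} is unavailable right now. Please try again.`;
  }
  // Assumed policy: a failed enhancement is hidden rather than reported.
  return null;
}
```
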
  53. MITIGATE RISK SECTION 3

  54. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES

  55. { "name": "my-project", "version": "1.0.0", "dependencies": { "node-fetch": "~2.2.0", "node-fetch": "^2.2.0", "node-fetch": "2.2.0" } }
  56. CONTROL YOUR POINTS OF FAILURE

  57. None
  58. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES
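
The three `node-fetch` entries on slide 55 are the three common npm range styles: tilde (`~2.2.0`, patch updates only), caret (`^2.2.0`, minor and patch updates), and an exact pin (`2.2.0`). The sketch below shows which installed versions each style accepts. It is a simplified illustration and not the `semver` package: `parseVersion`, `compareVersions`, and `satisfies` are hypothetical helpers that handle only plain `x.y.z` versions, with no pre-release tags.

```typescript
type Version = [number, number, number];

// Parse "x.y.z" into numbers. Pre-release and build tags are out of scope.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) {
    throw new Error(`Unsupported version: ${text}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Does `version` satisfy `range`, for the three range styles on the slide?
function satisfies(range: string, version: string): boolean {
  const v = parseVersion(version);
  if (range.startsWith("~")) {
    // Tilde: same major and minor, patch may move forward.
    const base = parseVersion(range.slice(1));
    return v[0] === base[0] && v[1] === base[1] && v[2] >= base[2];
  }
  if (range.startsWith("^")) {
    // Caret: the left-most non-zero part must not change.
    const base = parseVersion(range.slice(1));
    if (compareVersions(v, base) < 0) return false;
    if (base[0] > 0) return v[0] === base[0];
    if (base[1] > 0) return v[0] === 0 && v[1] === base[1];
    return compareVersions(v, base) === 0;
  }
  // No prefix: exact pin.
  return compareVersions(v, parseVersion(range)) === 0;
}
```

Under these rules `~2.2.0` accepts 2.2.5 but not 2.3.0, `^2.2.0` accepts 2.9.1 but not 3.0.0, and `2.2.0` accepts only 2.2.0, which is why the exact pin is the "locked" option.
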

  59. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES 2. BUILD IN REDUNDANCY
  60. HAVE TWO OF EVERYTHING

  61. ✖ Asset SERVER 1 Asset SERVER 2 www.asset-server-two.com/styles.css www.asset-server-one.com/styles.css

  62. Asset SERVER 1 Asset SERVER 2 www.asset-server.com/styles.css Proxy service

  63. CLOUD PROVIDER CDN STATIC ASSET SERVER IMAGE SERVICE POLYFILL SERVICE AB TEST SERVICE FONT PROVIDER 2 X ?
  64. PLAN Z

  65. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES 2. BUILD IN REDUNDANCY
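
The proxy arrangement on slide 62, where one public URL fronts two asset servers, can be sketched as a function that picks the first healthy origin. The host names come from slide 61. The `Health` table, the `resolveAsset` name, and the `https://` scheme are assumptions made for illustration, and a real proxy would get health information from its own checks.

```typescript
// Hypothetical health table, for example fed by periodic health checks.
type Health = Record<string, boolean>;

// Return the URL for `path` on the first healthy host, or null if none is.
function resolveAsset(path: string, hosts: string[], health: Health): string | null {
  for (const host of hosts) {
    if (health[host] === true) {
      return `https://${host}${path}`;
    }
  }
  return null; // Every host is down: time for "plan Z".
}

const assetHosts = ["www.asset-server-one.com", "www.asset-server-two.com"];
```

Because the choice is made behind a single address, pages keep requesting the same URL and never need to know which server answered.
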
  66. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES 2. BUILD IN REDUNDANCY 3. SERVE STALE CONTENT
  67. SERVER CDN

  68. CDN ✖ SERVICE WORKER SERVER

  69. MITIGATE RISK 1. LOCK YOUR RUNTIME DEPENDENCIES 2. BUILD IN REDUNDANCY 3. SERVE STALE CONTENT
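
Serving stale content means keeping the last good response and falling back to it when the layer behind you fails, whether that layer is the server behind a CDN (slide 67) or the CDN behind a service worker (slide 68). The sketch below shows the core logic in a synchronous, framework-free form. `StaleOnErrorCache` and `Served` are hypothetical names, and a real service worker would do the same thing asynchronously with `fetch` and the Cache API.

```typescript
interface Served<T> {
  value: T;
  stale: boolean; // true when the value came from the cache after a failure
}

class StaleOnErrorCache<T> {
  private store = new Map<string, T>();

  // Try the origin first; on failure fall back to the last good copy.
  get(key: string, fetchFresh: () => T): Served<T> {
    try {
      const value = fetchFresh();
      this.store.set(key, value); // remember the latest good response
      return { value, stale: false };
    } catch (error) {
      if (this.store.has(key)) {
        return { value: this.store.get(key) as T, stale: true };
      }
      throw error; // nothing cached, so the failure has to surface
    }
  }
}
```

The `stale` flag lets the caller decide whether to mark the content as possibly out of date, which ties back to treating errors as first-class information.
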
  70. LEARN FROM MISTAKES SECTION 4

  71. LEARN FROM MISTAKES 1. POSTMORTEMS

  72. BLAMELESS

  73. HOW DID WE HANDLE IT AS A TEAM?

  74. HOW COULD IT HAVE BEEN PREVENTED?

  75. LEARN FROM MISTAKES 1. POSTMORTEMS

  76. LEARN FROM MISTAKES 1. POSTMORTEMS 2. FIRE DRILLS & CHAOS TESTING
  77. FIRE DRILLS ARE A SAFE SPACE TO PRACTICE

  78. 1. LIMIT IMPACT 2. DIRECT COMMUNICATIONS 3. DELEGATE EARLY

  79. CHAOS TESTING

  80. DELIBERATELY INTRODUCE FAILURE TO ENSURE YOUR SYSTEMS ARE RESILIENT

  81. LEARN FROM MISTAKES 1. POSTMORTEMS 2. FIRE DRILLS & CHAOS TESTING
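
Chaos testing, as slide 80 puts it, deliberately introduces failure to check that a system is resilient. At the smallest scale this can be sketched as a wrapper that makes a fraction of calls fail. `withChaos` is a hypothetical helper and not a tool named in the talk, and the injectable `random` parameter is an assumption added so that tests can be deterministic.

```typescript
// Wrap `fn` so that a fraction of calls throw instead of running.
// `failureRate` is between 0 and 1; `random` defaults to Math.random.
function withChaos<A extends unknown[], R>(
  fn: (...args: A) => R,
  failureRate: number,
  random: () => number = Math.random
): (...args: A) => R {
  return (...args: A): R => {
    if (random() < failureRate) {
      throw new Error("chaos: injected failure");
    }
    return fn(...args);
  };
}
```

Wrapping a dependency this way in a test or staging environment shows whether the surrounding code degrades gracefully or takes the whole page down with it.
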
  82. IN SUMMARY

  83. KNOW WHAT’S IMPORTANT TO YOUR USERS

  84. IDENTIFY HOW YOUR SYSTEM WILL DEGRADE

  85. IDENTIFY POINTS OF FAILURE AND BUILD IN FAIL-SAFES

  86. LEARN FROM EVERY FAILURE

  87. THANK YOU IAN FEATHER - BUZZFEED - @IANFEATHER