Speeding up without slowing down

At FT we built one of the world's fastest media websites, and we release to production dozens of times a day. But the architectural and organisational decisions that allow us to deliver reliable features quickly and consistently don't always fit neatly with our desire to optimise performance.

In this warts-and-all talk, you'll learn:
- how we build FT.com
- how a highly componentised, microservices stack with a rapid release cycle can sometimes get in the way of performance
- some admissions of where we've gone wrong, and how we try to fix that

Rhys Evans

May 10, 2018

Transcript

  1. Speeding up without slowing down: Building a faster ft.com... fast
     Rhys Evans, Principal Engineer, Financial Times, @wheresrhys

  2. www.ft.com

  3. http://projects.hearstnp.com/performance/

  4. We’re hiring!

  5. 1. Our tech stack - what and why
     2. Resolving conflicts with perf
        i. Microservices in the critical path - the FT paywall
        ii. Loading and caching a distributed front end
     3. Group therapy

  6. Our tech stack - what and why

  7. The old ‘falcon’ site

  8. (image only)

  9. Slow and risky to develop
     • Multiple environments
     • One big bang, cluster-bug release a month
     • Proliferation of hacks to bypass releases
     Strategic Products team formed to bring about change

  10. If we ever need to start from scratch again, we’ve gone badly wrong

  11. 5 pillars of FT.com
      1. Take back control
      2. Straight to prod deployment
      3. Feature flags
      4. Microservices
      5. Componentisation

  12. Take back control
      Falcon was a free-for-all of:
      • Tag managers
      • Third-party bloatware and vulnerabilities
      • Unvalidated ideas nobody ever switched off
      Tech insisted on more control of what made it onto FT.com

  13. Straight to prod deployment
      2 environments: local and production
      From merged PR to production ≈ 10 minutes
      GitHub, CircleCI, Heroku

  14. Straight to prod deployment: why?
      • Smaller releases ⇒ bugs easier to find and fix
      • Few environment/config bugs
      • “Build the right it before you build it right”
      • Rewarding dev experience

  15. Feature flags
      • Hide work in progress
      • QA in production
      • Split big features into smaller releases
      Flags mean we don’t break too much stuff
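
A minimal TypeScript sketch of how flags let unfinished work ship to production switched off, with a per-request override for QA in production. The flag names (`newFooter`, `myFtRedesign`) and the `resolveFlags` and `renderFooter` helpers are hypothetical; the talk does not show FT's real flags system.

```typescript
// Illustrative only: flag names and helpers are hypothetical, not FT's API.
type Flags = Readonly<Record<string, boolean>>;

// Defaults ship with each release, so unfinished work can be merged and
// deployed to production while staying switched off.
const defaultFlags: Flags = { newFooter: false, myFtRedesign: false };

// Per-request overrides (e.g. set for a QA tester) let a feature be
// checked in production before it is switched on for everyone.
function resolveFlags(
  defaults: Flags,
  overrides: Readonly<Record<string, boolean>>,
): Flags {
  const resolved: Record<string, boolean> = { ...defaults };
  for (const name of Object.keys(overrides)) {
    // Ignore unknown names so a typo can't silently create a new flag.
    if (Object.prototype.hasOwnProperty.call(defaults, name)) {
      resolved[name] = overrides[name] === true;
    }
  }
  return resolved;
}

// The old code path stays in place until the flag is removed.
function renderFooter(flags: Flags): string {
  return flags.newFooter
    ? '<footer class="footer footer--new"></footer>'
    : '<footer class="footer"></footer>';
}
```

In this pattern, once a feature is fully switched on, the flag and the old branch would be deleted so the code doesn't accumulate dead paths.
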
  16. Microservices
      • Easy to comprehend and test
      • Confidence and speed deploying
      • Scaling is trivial
      • Fault tolerance through isolation

  17. Componentisation
      • Easy to comprehend and test
      • Avoids duplication of effort
      • Consistent branding across many sites
      • High standards, e.g. accessibility

  18. These practices free us to release quality software quickly, with confidence

  19. Resolving conflicts with perf, part 1: Microservices in the critical path - the FT paywall

  20. The critical path
      [Diagram: a user-facing app calling three dependencies]
      The critical path = every request needed to send a useful response to the user
      Microservices = more requests
      ...so, bad for perf?

  21. Caching the critical path
      [Diagram: a user-facing app calling three dependencies through a cache]
      Caching can greatly reduce the number of requests
      Optimising the critical path becomes a cache hit rate problem
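
To make “a cache hit rate problem” concrete: if a fraction h of requests to a dependency is answered by the cache in time t_cache, and the rest go to the origin service in time t_origin, the expected time that dependency adds to the critical path is roughly as below. This is a simplified model (it ignores the cache lookup on a miss and says nothing about tail latency), and the figures t_cache = 5 ms and t_origin = 100 ms are hypothetical.

```latex
\mathbb{E}[T] = h\,t_{\text{cache}} + (1 - h)\,t_{\text{origin}}

\begin{aligned}
h = 0.5 &: \quad 0.5 \times 5 + 0.5 \times 100 = 52.5\ \text{ms} \\
h = 0.9 &: \quad 0.9 \times 5 + 0.1 \times 100 = 14.5\ \text{ms}
\end{aligned}
```

Raising the hit ratio shrinks the origin term, which dominates whenever the origin is much slower than the cache.
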
  22. Can we cache a paywall?

  23. [Diagram: CDN (Fastly), Preflight and Application]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on
      Vary: Cookie
      Unique per user

  24. [Diagram: Preflight turns the cookies into headers]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on
      FT-Authorized: true
      FT-Edition: uk
      FT-AB-Tests: canIHaveAPuppy=on
      Families of users

  25. [Diagram: CDN (Fastly), Preflight and Application]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on
      FT-Authorized: true
      FT-Edition: uk
      FT-AB-Tests: canIHaveAPuppy=on
      Vary: FT-Authorized, FT-Edition, FT-AB-Tests

  26. [Diagram: CDN (Fastly), Preflight and Application]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on
      FT-Authorized: true
      FT-Edition: uk
      FT-AB-Tests: canIHaveAPuppy=on
      Vary: FT-Authorized, FT-Edition, FT-AB-Tests
      • Highly reusable cache
      • Perf bottleneck
      • Not in the critical path (when cache hit)

  27. Preflight
      • Session
      • Access
      • Barriers
      • Vanity URLs
      • A/B testing
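
A minimal TypeScript sketch of the preflight idea from the slides above: turn per-user cookies into a few shared headers and let the CDN vary on those instead of on `Cookie`. The cookie and header names come from the slides. The `isValidSession` callback stands in for a call to a session service, the `"uk"` fallback edition is assumed, and `cacheKey` is a simplified, hypothetical model of how a CDN keys cached responses on the headers listed in `Vary`. None of this is FT's actual code.

```typescript
// Cookie and header names come from the slides; everything else is a
// hypothetical illustration, not FT's preflight service.
type PreflightHeaders = Record<string, string>;

// Hypothetical stand-in for a call to a session service.
type SessionValidator = (sessionId: string) => boolean;

function parseCookies(cookieHeader: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    // Split on the first "=" only: cookie values may contain "=".
    const cookieName = part.slice(0, eq).trim();
    if (cookieName) cookies.set(cookieName, part.slice(eq + 1).trim());
  }
  return cookies;
}

// Collapse per-user cookies into a few low-cardinality headers, so that
// many users ("families of users") map to the same cached response.
function preflight(cookieHeader: string, isValidSession: SessionValidator): PreflightHeaders {
  const cookies = parseCookies(cookieHeader);
  const session = cookies.get("FT_Session");
  return {
    "FT-Authorized": String(session !== undefined && isValidSession(session)),
    "FT-Edition": cookies.get("FT_edition") ?? "uk", // assumed default
    "FT-AB-Tests": cookies.get("FT_AB_Tests") ?? "",
  };
}

// Simplified model of a CDN cache key: the URL plus only the headers named
// in Vary. Real CDNs match header names case-insensitively; this sketch
// assumes consistent casing.
function cacheKey(url: string, vary: string, headers: PreflightHeaders): string {
  const varied = vary
    .split(",")
    .map((h) => h.trim())
    .filter((h) => h.length > 0)
    .map((h) => `${h}=${headers[h] ?? ""}`);
  return [url, ...varied].join("|");
}
```

Two subscribers with different `FT_Session` cookies but the same edition and A/B test cohort get the same cache key, whereas `Vary: Cookie` would give each of them a separate cache entry.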

  28. Microservice whack-a-mole
      1. Find the slowest service
      2. Whack it
      3. Repeat

  29. Measure everything
      • Make measuring new things easy
      • Be as granular as you can
      • Medians and percentiles
      • Count timeouts
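
A sketch of per-service medians, percentiles and timeout counts, plus step 1 of the whack-a-mole loop (find the slowest service). The `Sample` record shape, the nearest-rank percentile method and ranking services by p95 are assumptions chosen for illustration.

```typescript
// Hypothetical record of one call from the app to a dependency.
interface Sample {
  service: string;
  durationMs: number;
  timedOut: boolean;
}

interface ServiceStats {
  count: number;
  p50: number;
  p95: number;
  timeouts: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: readonly number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)]!;
}

// Per-service medians and percentiles; timeouts are counted separately
// instead of being mixed into the latency distribution.
function summarise(samples: readonly Sample[]): Map<string, ServiceStats> {
  const grouped = new Map<string, Sample[]>();
  for (const s of samples) {
    const list = grouped.get(s.service) ?? [];
    list.push(s);
    grouped.set(s.service, list);
  }
  const stats = new Map<string, ServiceStats>();
  for (const [service, list] of grouped) {
    const durations = list
      .filter((s) => !s.timedOut)
      .map((s) => s.durationMs)
      .sort((a, b) => a - b);
    // A service whose calls all timed out gets Infinity, so it ranks slowest.
    stats.set(service, {
      count: list.length,
      p50: durations.length > 0 ? percentile(durations, 50) : Infinity,
      p95: durations.length > 0 ? percentile(durations, 95) : Infinity,
      timeouts: list.length - durations.length,
    });
  }
  return stats;
}

// Step 1 of whack-a-mole: pick the service with the worst p95.
function slowestService(stats: ReadonlyMap<string, ServiceStats>): string | undefined {
  let worst: string | undefined;
  let worstP95 = -Infinity;
  for (const [service, s] of stats) {
    if (s.p95 > worstP95) {
      worst = service;
      worstP95 = s.p95;
    }
  }
  return worst;
}
```

Medians show the typical experience while p95 exposes the slow tail that a single average hides, which is why both are tracked here.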
  30. Geography matters
      • Don’t leave the building
      • Multiple regions
      • Look out for DNS & routing bugs

  31. Code impatiently
      • Pool connections
        ◦ Node.js: HTTP agent keepAlive
      • Find commonalities
        ◦ Poll, memoize, hard code, ...
      • Timeout
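
A sketch of two of the “code impatiently” ideas: memoising a slow lookup for a short time, and giving up on a dependency that doesn't answer quickly. `memoizeWithTtl` and `withTimeout` are hypothetical helpers written for illustration. Connection pooling, the first bullet, is a configuration option in Node.js (`new http.Agent({ keepAlive: true })`) and is not repeated here.

```typescript
// Hypothetical helper: cache each result for ttlMs so repeated lookups
// skip the slow call. The clock is injectable to make expiry testable.
function memoizeWithTtl<R>(
  fn: (key: string) => R,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => R {
  const cache = new Map<string, { value: R; expiresAt: number }>();
  return (key: string): R => {
    const t = now();
    const hit = cache.get(key);
    if (hit !== undefined && hit.expiresAt > t) return hit.value;
    // If fn returns a Promise, the promise itself is cached, so concurrent
    // callers share one in-flight request (a rejected one is cached too).
    const value = fn(key);
    cache.set(key, { value, expiresAt: t + ttlMs });
    return value;
  };
}

// Hypothetical helper: reject if the dependency hasn't answered within ms,
// so one slow service can't hold up the whole response.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Caching a rejected promise until it expires is a deliberate trade-off in this sketch: it stops a failing dependency from being hammered, at the cost of serving the failure for up to `ttlMs`.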
  32. Don’t slow down
      • Impose a perf budget to avoid regressions
      • Persuade the business with data
      • The case we made: http://bit.ly/2zmGZ4H
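
One way a perf budget can guard against regressions is a check that runs against fresh measurements and fails the build when any limit is broken. The `Budget` and `Measurement` shapes, the `budgetViolations` function and the thresholds used to exercise it are assumptions for illustration, not the budget FT used.

```typescript
// Hypothetical budget shape; thresholds are supplied by the caller.
interface Budget {
  p50Ms: number;
  maxMs: number;
  minCacheHitRatio: number;
}

// Hypothetical summary of recent production measurements.
interface Measurement {
  p50Ms: number;
  maxMs: number;
  cacheHits: number;
  cacheRequests: number;
}

// Returns one message per broken limit; an empty array means within budget.
function budgetViolations(m: Measurement, b: Budget): string[] {
  const violations: string[] = [];
  if (m.p50Ms > b.p50Ms) {
    violations.push(`median ${m.p50Ms}ms exceeds ${b.p50Ms}ms`);
  }
  if (m.maxMs > b.maxMs) {
    violations.push(`max ${m.maxMs}ms exceeds ${b.maxMs}ms`);
  }
  // Treat "no traffic" as a 0% hit ratio rather than dividing by zero.
  const hitRatio = m.cacheRequests === 0 ? 0 : m.cacheHits / m.cacheRequests;
  if (hitRatio < b.minCacheHitRatio) {
    violations.push(`cache hit ratio ${hitRatio.toFixed(2)} is below ${b.minCacheHitRatio}`);
  }
  return violations;
}
```

A CI step would call `budgetViolations` on the latest numbers and exit with a non-zero status if the returned list is non-empty, so a regression blocks the release instead of slipping through.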
  33. The results
      • Median response < 20ms
      • Max response = 200ms
      • Cache hit ratio ≈ 90%
        ◦ Resilience
        ◦ Cost savings

  34. We pay a small complexity tax
      [Diagram: CDN (Fastly), Preflight and Application, alongside CDN (Fastly), Application and ‘Post-flight’]

  35. But we didn’t compromise on any of our development principles

  36. Resolving conflicts with perf, part 2: Loading and caching a distributed front end

  37. Front-end perf fundamentals
      • Cache assets between visits
      • Cache assets between pages
      • Implement modern best practices
        ◦ responsive images
        ◦ lazy loading
        ◦ inline critical CSS
        ◦ ...

  38. So, ft.com… ?
      • Shared JS/CSS across pages: 0%
      • Shared JS/CSS between visits: ≈ 0%
      • Inlined CSS: 0%

  39. Why so bad?
      • Frequent releases + components = asset churn
      • Hard to share between independent builds
      • Identifying CSS to inline is
      [Diagram: App1, App2, Build1, Assets1]
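
Caching assets between visits and pages usually relies on content-addressed file names: the name changes only when the bytes change, so the file can be served with a long-lived, immutable `Cache-Control` header. The sketch below shows why asset churn hurts. When shared component code is bundled together with app code, every app release produces a new file name and the cached copy is thrown away, whereas a separately built shared bundle keeps its name across apps and releases. `hashedName` and `buildAssets` are hypothetical helpers, and 32-bit FNV-1a is used only for brevity; production build tools typically use longer hashes.

```typescript
// 32-bit FNV-1a over UTF-16 code units; fine for illustration, but a real
// build should use a longer hash to make collisions negligible.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: name a build output after its contents.
function hashedName(base: string, ext: string, contents: string): string {
  return `${base}.${fnv1a(contents)}.${ext}`;
}

// Hypothetical helper: the JS file names one app release would produce,
// either as a single bundle or with shared component code split out.
function buildAssets(sharedCode: string, appCode: string, splitShared: boolean): string[] {
  if (!splitShared) {
    // Shared code is baked into a per-app bundle, so any app change renames it.
    return [hashedName("main", "js", sharedCode + appCode)];
  }
  // The shared bundle's name depends only on the shared code.
  return [hashedName("shared", "js", sharedCode), hashedName("app", "js", appCode)];
}
```

With the split, a browser that has already downloaded `shared.<hash>.js` from one app or release can reuse it until the shared components themselves change.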
  40. Pinned versions?
      • But we semver
      • In-house teams release responsibly
      • Rewarded with consistency and efficiency
      Pinning versions would make it harder to release software
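
The “we semver” point rests on range semantics. An app that depends on `^1.4.0` of a component accepts any later 1.x release, so a compatible fix reaches every app on its next build without anyone editing a pinned version. Below is a simplified sketch of npm's caret rule that handles plain `x.y.z` versions only, with no prerelease tags or other range operators. The helper names are hypothetical; the `semver` package on npm implements the full rules.

```typescript
type Version = readonly [number, number, number];

function parseVersion(text: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(text.trim());
  if (m === null) throw new Error(`Unsupported version: ${text}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Exclusive upper bound of a caret range: the left-most non-zero component
// may not change (which is why ^0.x ranges are narrower).
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, range: string): boolean {
  if (range[0] !== "^") throw new Error(`Only caret ranges are supported: ${range}`);
  const floor = parseVersion(range.slice(1));
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, floor) >= 0 &&
    compareVersions(candidate, caretUpperBound(floor)) < 0
  );
}
```

The same mechanism also delivers a buggy minor release to every app on its next build, which is why the slide pairs it with teams releasing responsibly.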
  41. Monolith front-end?
      • Combines all gremlins into an unruly mass
      • Increases the impact of
        ◦ Blocked builds
        ◦ Buggy releases
      A monolith would make it harder to release software

  42. github.com/Financial-Times/n-ui

  43. n-ui
      [Diagram: App1 and App2 both using n-ui]
      • Bundle of preconfigured components used in all our apps
      • npm and Bower component
      • Server with knowledge of all relevant assets and tools
      • Build tool with rudimentary JS and stylesheet splitting
      • Templates and asset loading tools running in the browser
      • Deploy tool for delivering assets to the CDN
      • CDN serving shared assets
      • CDN serving assets unique to each app
      • Inlined critical CSS

  44. Still no compromise

  45. “Has anyone managed to recently successfully bower link/npm link n-ui in an app?”
      “problem started occurring after a n-ui update”
      “that’s presumably from an n-ui update?”
      “just pushing a bug fix in n-ui right now”
      “what I did see was n-ui as a dev-dependency, but I guess that is still going to lead to mind-melting”
      “what is going on with n-ui?”
      “Apologies on the n-ui issues all, am on it. Also added to my todo list to stop us breaking stuff”
      “I’ve *really* had enough of trying to get n-ui updates to work”
      “how to fix this motherf***ing `Projects using n-ui must maintain parity between versions` error?!”
      “could someone get a grip on whoever is f**king with the styling on the site this week”
      “If I can get the build to pass! Damn n-ui!!!”

  46. Group therapy

  47. We allowed a performance bubble to develop, and lost sight of SO many things

  48. Complexity tax vs value delivered
      [Chart: preflight and n-ui]

  49. [Chart: speed vs difficulty for Shared JS bundle, Inlined CSS, Preflight, Lazy loaded fonts, Service worker and Lazy loaded images]

  50. Performance is not a core front-end skill
      It’s weird, complex and optional

  51. “Empathy and experience is what enables you to choose the right abstractions for your application”
      – Malte Ubl, Designing very large (JavaScript) applications, https://t.co/q7Lplkthnr

  52. Before n-ui | After n-ui

  53. • Optimisation is opt-in
      • “Most straightforward” way still works
      • Simple, low-level API more flexible and powerful

  54. Bad                   | Good
      Hide complexity       | Teach skills
      ‘Clever’ abstractions | Syntactic sugar
      Speak to ninjas       | Speak to noobs
      Big bang              | Bit by bit

  55. Weigh up the pros and cons for your whole team
      [Chart: speed vs difficulty for Shared JS bundle, Lazy loaded images, Lazy loaded fonts and Service worker]

  56. Conclusion

  57. Releasing small and often has great benefits

  58. Perf optimisation can clash with this

  59. Work with your team to find the right balance

  60. You don’t have to be perfect to get great results

  61. Thanks! Rhys Evans, Principal Engineer, Financial Times, @wheresrhys