Speeding up without slowing down

At FT we built one of the world's fastest media websites, and release to production dozens of times a day. But the architectural and organisational decisions aimed at allowing us to deliver reliable features quickly and consistently don't always fit neatly with our desire to optimise performance.

In this warts-and-all talk, you'll learn:
- how we build FT.com
- how a highly componentised, microservices stack with a rapid release cycle can sometimes get in the way of performance
- some admissions of where we've gone wrong, and how we try to fix that

Rhys Evans

May 10, 2018

Transcript

  1. Speeding up without slowing down
     Building a faster ft.com... fast
     Rhys Evans, Principal Engineer, Financial Times, @wheresrhys
  2. 1. Our tech stack - what and why
     2. Resolving conflicts with perf
        i. Microservices in the critical path - the FT paywall
        ii. Loading and caching a distributed front end
     3. Group therapy
  3. • Multiple environments
     • One big bang, cluster-bug release a month
     • Proliferation of hacks to bypass releases
     Strategic Products team formed to bring about change
     Slow and risky to develop
  4. 5 pillars of FT.com
     1. Take back control
     2. Straight to prod deployment
     3. Feature flags
     4. Microservices
     5. Componentisation
  5. Take back control
     Falcon was a free for all of:
     • Tag managers
     • Third party bloatware and vulnerabilities
     • Unvalidated ideas nobody ever switched off
     Tech insisted on more control of what made it on to FT.com
  6. Straight to prod deployment
     2 environments – local and production. From merged PR to production ≈ 10 minutes.
     [Diagram: GitHub, CircleCI, Heroku]
  7. Why?
     • Smaller releases ⇒ bugs easier to find and fix
     • Few environment/config bugs
     • “Build the right it before you build it right”
     • Rewarding dev experience
     Straight to prod deployment
  8. Feature flags
     • Hide work in progress
     • QA in production
     • Split big features into smaller releases
     Flags mean we don’t break too much stuff
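The feature-flag idea on slide 8 can be sketched roughly as follows. This is a minimal illustration, not FT's flag system: the in-memory `flags` store, the `isEnabled` and `renderArticle` helpers, and the `newArticleLayout` flag are all hypothetical names. The per-request `overrides` argument stands in for whatever mechanism lets a tester switch a flag on for their own session while it stays off for everyone else.

```javascript
// Hypothetical flag store. In a real system this would be loaded from a
// flags service or config; the names here are illustrative only.
const flags = { newArticleLayout: false };

// A flag is on if a per-request override says so, otherwise fall back to
// the global default. Unknown flags are treated as off.
function isEnabled(name, overrides = {}) {
  if (Object.prototype.hasOwnProperty.call(overrides, name)) {
    return overrides[name] === true;
  }
  return flags[name] === true;
}

// Work in progress ships to production but stays hidden behind the flag.
function renderArticle(overrides) {
  return isEnabled('newArticleLayout', overrides)
    ? 'new layout'
    : 'current layout';
}
```

With this shape, unfinished work can be merged and deployed in small pieces, and "QA in production" amounts to calling the same code path with an override for one session.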
  9. Microservices
     • Easy to comprehend and test
     • Confidence and speed deploying
     • Scaling is trivial
     • Fault tolerance through isolation
  10. Componentisation
      • Easy to comprehend and test
      • Avoids duplication of effort
      • Consistent branding across many sites
      • High standards e.g. accessibility
  11. The critical path
      [Diagram: User-facing app, Dependency, Dependency, Dependency]
      The critical path = every request needed to send a useful response to the user
      Microservices = more requests ...so, bad for perf?
  12. Caching the critical path
      Caching can greatly reduce the number of requests
      Optimising the critical path becomes a cache hit rate problem
      [Diagram: User-facing app, Dependency, Dependency, Dependency, Cache]
  13. [Diagram: CDN (Fastly), Application, Preflight]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk;FT_AB_Tests=canIHaveAPuppy=on;
      FT-Authorized: true
      FT-Edition: uk
      FT-AB-Tests: canIHaveAPuppy=on
      Vary: FT-Authorized, FT-Edition, FT-AB-Tests
  14. [Diagram: CDN (Fastly), Application, Preflight]
      Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk;FT_AB_Tests=canIHaveAPuppy=on;
      FT-Authorized: true
      FT-Edition: uk
      FT-AB-Tests: canIHaveAPuppy=on
      Vary: FT-Authorized, FT-Edition, FT-AB-Tests
      Highly reusable cache
      Perf bottleneck
      Not in the critical path (when cache hit)
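One way to read slides 13 and 14: a preflight step turns the per-user cookie into a few low-cardinality headers, and the cache varies only on those headers. The sketch below illustrates that idea under stated assumptions. The `preflight`, `parseCookies` and `cacheKey` helpers, the `isValidSession` callback and the `'uk'` default edition are hypothetical, and real CDN behaviour (Fastly included) is configured at the edge rather than written like this.

```javascript
// Headers the application's response varies on, as listed on the slide.
const VARY = ['FT-Authorized', 'FT-Edition', 'FT-AB-Tests'];

// Parse a Cookie header into a plain object. Only the first "=" splits
// name from value, so values such as "canIHaveAPuppy=on" survive intact.
function parseCookies(header = '') {
  const cookies = {};
  for (const part of header.split(';')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    cookies[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return cookies;
}

// Hypothetical preflight: reduce a unique-per-user cookie to a small set
// of header values. isValidSession is an assumed session check.
function preflight(cookieHeader, isValidSession) {
  const c = parseCookies(cookieHeader);
  return {
    'FT-Authorized': String(Boolean(c.FT_Session && isValidSession(c.FT_Session))),
    'FT-Edition': c.FT_edition || 'uk', // assumed default edition
    'FT-AB-Tests': c.FT_AB_Tests || '',
  };
}

// The cache key is built from the URL and the Vary headers only, never
// from the raw cookie, so many users can share one cached response.
function cacheKey(url, headers, vary = VARY) {
  return [url, ...vary.map((h) => `${h}=${headers[h]}`)].join('|');
}
```

Two subscribers with different `FT_Session` values, the same edition and the same A/B test allocation end up with the same cache key, which is what makes the cache "highly reusable".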
  15. Measure everything
      • Make measuring new things easy
      • Be as granular as you can
      • Medians and percentiles
      • Count timeouts
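As a rough illustration of "medians and percentiles" and "count timeouts" on slide 15, the sketch below summarises a batch of response times using the nearest-rank percentile method. The `percentile` and `summarise` helpers and the sample numbers are assumptions for illustration; the slides do not say which metrics tooling or percentile definition was used.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Returns undefined for no samples.
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarise timings in milliseconds. Timeouts are counted separately so
// that requests which never completed are not silently dropped.
function summarise(timingsMs, timeoutCount) {
  return {
    median: percentile(timingsMs, 50),
    p95: percentile(timingsMs, 95),
    p99: percentile(timingsMs, 99),
    timeouts: timeoutCount,
  };
}
```

A mean would hide the slow tail: for the made-up samples `[12, 15, 18, 20, 200]` the median is 18 while the 95th percentile is 200.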
  16. Code impatiently
      • Pool connections
        ◦ Node.js - HTTP agent keepAlive
      • Find commonalities
        ◦ Poll, memoize, hard code, ...
      • Timeout
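The three techniques on slide 16 could look something like this in Node.js. It is a sketch under assumptions, not the code behind the talk: `withTimeout` and `memoize` are hypothetical helpers, and the TTL-based cache is one possible reading of "memoize". The only library call used is `http.Agent` with its `keepAlive` option, which is part of Node's core `http` module.

```javascript
const http = require('http');

// Pool connections: a keep-alive agent reuses TCP sockets across requests
// instead of opening a new one each time. Pass it per request, e.g.
// http.get(url, { agent: keepAliveAgent }, callback).
const keepAliveAgent = new http.Agent({ keepAlive: true });

// Timeout: reject if the wrapped promise has not settled within `ms`.
// The timer is cleared either way so it cannot keep the process alive.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Memoize: cache the result for each key for ttlMs milliseconds, so
// answers shared by many requests are computed or fetched only once.
function memoize(fn, ttlMs) {
  const cache = new Map();
  return (key) => {
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = fn(key);
    cache.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}
```

Note that `withTimeout` only stops waiting; it does not cancel the underlying work, so a real implementation would also abort the outgoing request.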
  17. Don’t slow down
      • Impose a perf budget to avoid regressions
      • Persuade the business with data
      • The case we made http://bit.ly/2zmGZ4H
  18. The results
      • Median response < 20ms
      • Max response = 200ms
      • Cache hit ratio ≈ 90%
        ◦ Resilience
        ◦ Cost savings
  19. We pay a small complexity tax
      [Diagram: CDN (Fastly), Application, Preflight]
      [Diagram: CDN (Fastly), Application, ‘Post-flight’]
  20. Front-end perf fundamentals
      • Cache assets between visits
      • Cache assets between pages
      • Implement modern best practices
        ◦ responsive images
        ◦ lazy loading
        ◦ inline critical CSS
        ◦ ...
  21. Why so bad?
      • Frequent releases + components = asset churn
      • Hard to share between independent builds
      • Identifying CSS to inline is
      [Diagram: App1, App2, Build1, Assets1]
  22. Pinned versions?
      • But we semver
      • In-house teams release responsibly
      • Rewarded with consistency and efficiency
      Pinning versions would make it harder to release software
  23. Monolith front-end?
      • Combines all gremlins into an unruly mass
      • Increases the impact of
        ◦ Blocked builds
        ◦ Buggy releases
      A monolith would make it harder to release software
  24. n-ui
      • CDN serving assets unique to each app
      • Bundle of preconfigured components used in all our apps
      • npm and Bower component
      • Server with knowledge of all relevant assets and tools
      • Build tool with rudimentary JS and stylesheet splitting
      • App1
      • CDN serving shared assets
      • App2
      • Templates and asset loading tools running in the browser
      • Deploy tool for delivering assets to the CDN
      • Inlined critical CSS
  25. • Has anyone managed to recently successfully bower link/npm link n-ui in an app?
      • problem started occurring after a n-ui update
      • that’s presumably from an n-ui update?
      • just pushing a bug fix in n-ui right now
      • what I did see was n-ui as a dev-dependency, but I guess that is still going to lead to mind-melting
      • what is going on with n-ui?
      • Apologies on the n-ui issues all, am on it. Also added to my todo list to stop us breaking stuff
      • I’ve *really* had enough of trying to get n-ui updates to work
      • how to fix this motherf***ing `Projects using n-ui must maintain parity between versions` error?!
      • could someone get a grip on whoever is f**king with the styling on the site this week
      • If I can get the build to pass! Damn n-ui!!!
  26. [Chart: Speed vs Difficulty]
      • Shared JS bundle
      • Inlined CSS
      • Preflight
      • Lazy loaded fonts
      • Service worker
      • Lazy loaded images
  27. “Empathy and experience is what enables you to choose the right abstractions for your application”
      – Malte Ubl, Designing very large (JavaScript) applications https://t.co/q7Lplkthnr
  28. • Optimisation is opt-in
      • “Most straightforward” way still works
      • Simple, low-level API more flexible and powerful
  29. Bad | Good
      Hide complexity | Teach skills
      ‘Clever’ abstractions | Syntactic sugar
      Speak to ninjas | Speak to noobs
      Big bang | Bit by bit
  30. Weigh up the pros and cons for your whole team
      [Chart: Speed vs Difficulty]
      • Shared JS bundle
      • Lazy loaded images
      • Lazy loaded fonts
      • Service worker