Speeding up without slowing down

At FT we built one of the world's fastest media websites, and release to production dozens of times a day. But the architectural and organisational decisions aimed at allowing us to deliver reliable features quickly and consistently don't always fit neatly with our desire to optimise performance.

In this warts-and-all talk, you'll learn:
- how we build FT.com
- how a highly componentised, microservices stack with a rapid release cycle can sometimes get in the way of performance
- some admissions of where we've gone wrong, and how we try to fix that

Rhys Evans

May 10, 2018

Transcript

  1. Speeding up without slowing down
    Building a faster ft.com... fast
    Rhys Evans
    Principal Engineer, Financial Times
    @wheresrhys

  2. www.ft.com

  3. http://projects.hearstnp.com/performance/

  4. We’re hiring!

  5. 1. Our tech stack - what and why
    2. Resolving conflicts with perf
      i. Microservices in the critical path - the FT paywall
      ii. Loading and caching a distributed front end
    3. Group therapy

  6. Our tech stack - what and why

  7. The old ‘falcon’ site

  8. @wheresrhys

  9. Slow and risky to develop
    ● Multiple environments
    ● One big bang, cluster-bug release a month
    ● Proliferation of hacks to bypass releases
    Strategic Products team formed to bring about change

  10. If we ever need to start from scratch again, we’ve gone badly wrong

  11. 5 pillars of FT.com
    1. Take back control
    2. Straight to prod deployment
    3. Feature flags
    4. Microservices
    5. Componentisation

  12. Take back control
    Falcon was a free-for-all of:
    ● Tag managers
    ● Third-party bloatware and vulnerabilities
    ● Unvalidated ideas nobody ever switched off
    Tech insisted on more control of what made it on to FT.com

  13. Straight to prod deployment
    2 environments – local and production.
    From merged PR to production ≈ 10 minutes.
    GitHub → CircleCI → Heroku

  14. Straight to prod deployment
    Why?
    ● Smaller releases ⇒ bugs easier to find and fix
    ● Few environment/config bugs
    ● “Build the right it before you build it right”
    ● Rewarding dev experience

  15. Feature flags
    ● Hide work in progress
    ● QA in production
    ● Split big features into smaller releases
    Flags mean we don’t break too much stuff
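
A minimal sketch of the feature-flag idea on this slide, assuming flags arrive as a simple name-to-boolean map. The flag name `newCommentsUi` and both function names are hypothetical illustrations, not taken from the FT codebase.

```typescript
// Hypothetical shape: flag name -> on/off.
type Flags = Record<string, boolean>;

function isEnabled(flags: Flags, name: string): boolean {
  // Unknown flags default to off, so unfinished work stays hidden in production.
  return flags[name] === true;
}

function renderArticle(flags: Flags): string {
  // Work in progress ships to production behind the flag and is only
  // shown to requests that have it switched on (e.g. for QA).
  return isEnabled(flags, "newCommentsUi")
    ? "article with new comments UI"
    : "article with current comments UI";
}
```

Defaulting to off is the design choice that lets a half-finished feature be merged and deployed without users seeing it.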

  16. Microservices
    ● Easy to comprehend and test
    ● Confidence and speed deploying
    ● Scaling is trivial
    ● Fault tolerance through isolation

  17. Componentisation
    ● Easy to comprehend and test
    ● Avoids duplication of effort
    ● Consistent branding across many sites
    ● High standards e.g. accessibility

  18. These practices free us to release quality software quickly, with confidence

  19. Resolving conflicts with perf, part 1:
    Microservices in the critical path - the FT paywall

  20. The critical path
    The critical path = every request needed to send a useful response to the user
    Microservices = more requests
    ...so, bad for perf?
    [Diagram labels: User-facing app; Dependency; Dependency; Dependency]

  21. Caching the critical path
    Caching can greatly reduce the number of requests
    Optimising the critical path becomes a cache hit rate problem
    [Diagram labels: User-facing app; Dependency; Dependency; Dependency; Cache]
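
To illustrate why the slide calls this "a cache hit rate problem", here is a rough model that treats every request as either a cache hit or a miss. The function names and any timings fed into them are hypothetical; they are not measurements from the talk.

```typescript
// Hypothetical model: average latency as a weighted mix of hits and misses.
function expectedLatencyMs(hitRatio: number, hitMs: number, missMs: number): number {
  if (hitRatio < 0 || hitRatio > 1) {
    throw new RangeError("hitRatio must be between 0 and 1");
  }
  return hitRatio * hitMs + (1 - hitRatio) * missMs;
}

// Share of requests that still reach the dependencies behind the cache.
function originShare(hitRatio: number): number {
  return 1 - hitRatio;
}
```

With made-up timings of 5 ms for a hit and 200 ms for a miss, a 0.9 hit ratio gives an average of about 24.5 ms, and raising the ratio to 0.95 brings it to about 14.75 ms, so the hit ratio dominates the result.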

  22. Can we cache a paywall?

  23. Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on;
    [Diagram labels: CDN (Fastly); Application; Preflight]
    Vary: Cookie
    Unique per user

  24. Preflight
    Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on;
    Families of users
    FT-Authorized: true
    FT-Edition: uk
    FT-AB-Tests: canIHaveAPuppy=on

  25. Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on;
    [Diagram labels: CDN (Fastly); Application; Preflight]
    FT-Authorized: true
    FT-Edition: uk
    FT-AB-Tests: canIHaveAPuppy=on
    Vary: FT-Authorized, FT-Edition, FT-AB-Tests

  26. Cookie: FT_Session=f12a87eb00e7ec94; FT_edition=uk; FT_AB_Tests=canIHaveAPuppy=on;
    [Diagram labels: CDN (Fastly); Application; Preflight]
    FT-Authorized: true
    FT-Edition: uk
    FT-AB-Tests: canIHaveAPuppy=on
    Vary: FT-Authorized, FT-Edition, FT-AB-Tests
    Highly reusable cache
    Perf bottleneck
    Not in the critical path (when cache hit)
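
The slides above describe Preflight turning a per-user cookie into a few low-cardinality headers that the CDN can vary on. A rough sketch of that normalisation follows. The function names are hypothetical, treating any `FT_Session` cookie as authorised is a simplification (a real check would validate the session), and the `uk` default edition is an assumption.

```typescript
// Parse a Cookie header into name/value pairs.
function parseCookies(header: string): Record<string, string> {
  const cookies: Record<string, string> = {};
  for (const part of header.split(";")) {
    const trimmed = part.trim();
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed segments
    // Split on the first "=" only: values like "canIHaveAPuppy=on" contain one.
    cookies[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  return cookies;
}

// Reduce a unique-per-user cookie to headers shared by "families of users".
function preflightHeaders(cookieHeader: string): Record<string, string> {
  const cookies = parseCookies(cookieHeader);
  return {
    // Assumption: presence of a session cookie stands in for a real access check.
    "FT-Authorized": String(cookies["FT_Session"] !== undefined),
    "FT-Edition": cookies["FT_edition"] ?? "uk",
    "FT-AB-Tests": cookies["FT_AB_Tests"] ?? "",
  };
}

// What a cache keyed on "Vary: FT-Authorized, FT-Edition, FT-AB-Tests" distinguishes.
function cacheKey(url: string, headers: Record<string, string>): string {
  const varied = ["FT-Authorized", "FT-Edition", "FT-AB-Tests"];
  return [url, ...varied.map((name) => `${name}=${headers[name] ?? ""}`)].join("|");
}
```

Two subscribers with different session ids but the same edition and test allocation produce the same key, which is what makes the cached response reusable across users.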

  27. Preflight
    [Diagram labels: Session; Access; Barriers; Vanity urls; A/B testing]

  28. Microservice Whack-a-mole
    1. Find the slowest service
    2. Whack it
    3. Repeat

  29. Measure everything
    ● Make measuring new things easy
    ● Be as granular as you can
    ● Medians and percentiles
    ● Count timeouts
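
As one way to picture "medians and percentiles" plus "count timeouts", the sketch below summarises a batch of response-time samples. The `Sample` shape and function names are hypothetical, and the nearest-rank percentile is just one common definition, not necessarily the one the FT uses.

```typescript
interface Sample {
  ms: number;        // response time in milliseconds
  timedOut: boolean; // true when the caller gave up waiting
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1] as number;
}

function summarise(samples: Sample[]): { median: number; p95: number; timeouts: number } {
  const times = samples.map((s) => s.ms);
  return {
    median: percentile(times, 50),
    p95: percentile(times, 95),
    timeouts: samples.filter((s) => s.timedOut).length,
  };
}
```

Reporting the median alongside a high percentile keeps a single slow outlier visible without letting it distort the typical figure, which an average would do.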

  30. Geography matters
    ● Don’t leave the building
    ● Multiple regions
    ● Look out for DNS & routing bugs

  31. Code impatiently
    ● Pool connections
      ○ Node.js - HTTP agent keepAlive
    ● Find commonalities
      ○ Poll, memoize, hard code, ...
    ● Timeout
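
A small sketch of two of the tactics on this slide, memoizing a shared lookup and refusing to wait past a deadline. Both helpers (`memoizeFor`, `withTimeout`) are hypothetical names written for illustration, and answering with a fallback on timeout is an assumption about how a caller might degrade.

```typescript
// Cache a function's result for ttlMs so hot paths skip repeated lookups.
// `now` is injectable so the expiry behaviour can be exercised without waiting.
function memoizeFor<T>(ttlMs: number, fn: () => T, now: () => number = Date.now): () => T {
  let cached: { value: T; expires: number } | undefined;
  return () => {
    const t = now();
    if (cached === undefined || t >= cached.expires) {
      cached = { value: fn(), expires: t + ttlMs };
    }
    return cached.value;
  };
}

// Resolve with `fallback` if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  const stop = (): void => {
    if (timer !== undefined) clearTimeout(timer);
  };
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).then(
    (value) => { stop(); return value; },
    (error) => { stop(); throw error; },
  );
}
```

A caller could wrap a slow dependency as `withTimeout(fetchSomething(), 200, defaultValue)`, where `fetchSomething` and the 200 ms limit are placeholders.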

  32. Don’t slow down
    ● Impose a perf budget to avoid regressions
    ● Persuade the business with data
    ● The case we made http://bit.ly/2zmGZ4H
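
One possible shape for a perf budget check, sketched under the assumption that budgets and measurements are both plain name-to-number maps. The metric names and limits used with it are hypothetical, not the FT's actual budget.

```typescript
type Metrics = Record<string, number>;

// Return one message per metric that is missing or exceeds its budget.
function overBudget(budget: Metrics, measured: Metrics): string[] {
  const failures: string[] = [];
  for (const name of Object.keys(budget)) {
    const limit = budget[name] as number;
    const actual = measured[name];
    if (actual === undefined) {
      failures.push(`${name}: no measurement`);
    } else if (actual > limit) {
      failures.push(`${name}: ${actual} > ${limit}`);
    }
  }
  return failures;
}
```

A build step could fail whenever the returned list is non-empty, which is how a budget catches regressions before they ship.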

  33. The results
    ● Median response < 20ms
    ● Max response = 200ms
    ● Cache hit ratio ≈ 90%
      ○ Resilience
      ○ Cost savings

  34. We pay a small complexity tax
    [Diagram labels, first flow: CDN (Fastly); Application; Preflight]
    [Diagram labels, second flow: CDN (Fastly); Application; ‘Post-flight’]

  35. But we didn’t compromise on any of our development principles

  36. Resolving conflicts with perf, part 2:
    Loading and caching a distributed front end

  37. Front-end perf fundamentals
    ● Cache assets between visits
    ● Cache assets between pages
    ● Implement modern best practices
      ○ responsive images
      ○ lazy loading
      ○ inline critical CSS
      ○ ...

  38. So, ft.com… ?
    Shared js/css across pages: 0%
    Shared js/css between visits: ≈ 0%
    Inlined css: 0%

  39. Why so bad?
    ● Frequent releases + components = asset churn
    ● Hard to share between independent builds
    ● Identifying CSS to inline is
    [Diagram labels: App1; App2; Build1; Assets1]

  40. Pinned versions?
    ● But we semver
    ● In-house teams release responsibly
    ● Rewarded with consistency and efficiency
    Pinning versions would make it harder to release software

  41. Monolith front-end?
    ● Combines all gremlins into an unruly mass
    ● Increases the impact of
      ○ Blocked builds
      ○ Buggy releases
    A monolith would make it harder to release software

  42. github.com/Financial-Times/n-ui

  43. n-ui
    [Diagram labels:]
    CDN serving assets unique to each app
    Bundle of preconfigured components used in all our apps
    npm and Bower component
    Server with knowledge of all relevant assets and tools
    Build tool with rudimentary JS and stylesheet splitting
    App1
    CDN serving shared assets
    App2
    Templates and asset loading tools running in the browser
    Deploy tool for delivering assets to the CDN
    Inlined critical CSS

  44. Still no compromise

  45. “Has anyone managed to recently successfully bower link/npm link n-ui in an app?”
    “problem started occurring after a n-ui update”
    “that’s presumably from an n-ui update?”
    “just pushing a bug fix in n-ui right now”
    “what I did see was n-ui as a dev-dependency, but I guess that is still going to lead to mind-melting”
    “what is going on with n-ui?”
    “Apologies on the n-ui issues all, am on it. Also added to my todo list to stop us breaking stuff”
    “I’ve *really* had enough of trying to get n-ui updates to work”
    “how to fix this motherf***ing `Projects using n-ui must maintain parity between versions` error?!”
    “could someone get a grip on whoever is f**king with the styling on the site this week”
    “If I can get the build to pass! Damn n-ui!!!”

  46. Group therapy

  47. We allowed a performance bubble to develop, and lost sight of SO many things

  48. Complexity tax vs value delivered
    [Chart labels: preflight; n-ui]

  49. [Chart: axes Speed and Difficulty; plotted items: Shared JS bundle, Inlined CSS, Preflight, Lazy loaded fonts, Service worker, Lazy loaded images]

  50. Performance is not a core front-end skill
    It’s weird, complex and optional

  51. “Empathy and experience is what enables you to choose the right abstractions for your application”
    – Malte Ubl, Designing very large (JavaScript) applications
    https://t.co/q7Lplkthnr

  52. Before n-ui
    After n-ui

  53. @wheresrhys
    ● Optimisation is opt-in
    ● “Most straightforward” way still works
    ● Simple, low-level API more flexible and powerful

  54. Bad | Good
    Hide complexity | Teach skills
    ‘Clever’ abstractions | Syntactic sugar
    Speak to ninjas | Speak to noobs
    Big bang | Bit by bit

  55. Weigh up the pros and cons for your whole team
    [Chart: axes Speed and Difficulty; plotted items: Shared JS bundle, “Lazy lo image” and “Lazy loade fonts” (labels cut off), Service worker]

  56. Conclusion

  57. Releasing small and often has great benefits

  58. Perf optimisation can clash with this

  59. Work with your team to find the right balance

  60. You don’t have to be perfect to get great results

  61. Thanks!
    Rhys Evans
    Principal Engineer, Financial Times
    @wheresrhys
