A Field Guide to the Financial Times

Rhys Evans
March 26, 2019

The FT was a microservices pioneer, and our teams had a lot of freedom to pick the tools and processes they wanted. Five years on, many people have moved on and those innovative projects are now legacy code. I’ll tell you about our journey towards keeping track of it all, using Neo4j and GraphQL.

Transcript

  1. A field guide to the
    Financial Times
    Rhys Evans
    Principal Engineer, Financial Times
    @wheresrhys

  2. Who I am
    ● Worked in tech 10+ years
    ● Gradually moved into tooling
    ● Co-lead the FT’s Reliability Engineering team
    ● Lifelong birdwatcher

  3. What is a field guide
    From Wikipedia:
    A book designed to help the reader identify wildlife (plants or animals) or other objects of natural occurrence (e.g. minerals).

  4. ● Why the FT needs a field guide
    ● Organising our guide with neo4j and GraphQL
    ● Filling in the details

  5. Why the FT needs a field guide

  6. Insert non dramatic screenshot

  7. (no text on slide)

  8. (no text on slide)

  9. (no text on slide)

  10. (no text on slide)

  11. (no text on slide)

  12. “A tool dating from before the trees that built the ark. Unowned, unknown, and worth £250k of business. One day it fell over. We found docs dated 1999... which helped”
    Greg Cope, Tech Director, FT

  13. Starting about 5 years ago, the range of tech we have to support exploded

  14. Previously
    Centralised decision making
    Monolithic architectures
    Data centres
    Infrequent releases

  15. Move slow and achieve little

  16. Microservices
    FT were early adopters of microservices architecture
    Lots of independently deployed services make it easier to
    ● Pick the right tool for the job
    ● Release and iterate
    ● Replace and decommission

  17. Liberalisation
    “[...] follow the mechanics of free-market economy. Teams are allowed and encouraged to pick the best value tools for the job at hand”
    Matt Chadburn
    http://matt.chadburn.co.uk/notes/teams-as-services.html

  18. OUT | IN
    Data Centre | Your favourite cloud
    ‘The FT Platform’ | Pick your own SaaS
    Java, Java, Java | I hear Rust’s good...
    Ivory tower | What works

  19. “The upside of this is teams, left to their own devices, and trusted to make responsible decisions will choose what is best for themselves and the business in the long-term.”
    Matt Chadburn
    http://matt.chadburn.co.uk/notes/teams-as-services.html

  20. Build stuff and disappear

  21. Legacy is sooner than you think
    ● All images appearing on our websites relied on 1 person... who left
    ● A vanity url service built by a feature team that disbanded shortly after
    ● Part of our membership platform built in a niche language
    ● And many, many more

  22. 5 years is a long time in tech
    Long enough for
    ● Shiny new things to become legacy
    ● Budgets and business priorities to move on
    ● People to leave

  23. In summary
    ● Have to keep lots of tech ticking over
    ● Generating more new stuff than ever before to keep track of
    ● Liberalising the tech department leads to ownership & maintenance problems
    Need a field guide to help us navigate the space

  24. Unowned & unknown

  25. Owned & known

  26. Organising our guide with neo4j and GraphQL

  27. 3 priorities to improve reliability
    ● Reaffirm who owns the various bits of FT tech
    ● Improve information about what is actually running and why
    ● Determine what state it’s in at any given time

  28. Who is our audience?
    Operations team
    ● Active 24/7
    ● Broad knowledge of our tech platforms
    ● Need to know which approaches can be applied to incident X
    ● If nothing works, who to call

  29. Not the first attempt
    CMDB versions 1 - 3 were:
    ● Too inert - Enter once and forget about it
    ● Too brittle - Chains of responsibility easily lost
    ● Too discrete - Hard to make important connections

  30. Who can help me with system X?
    ● The natural question to ask when addressing a problem
    ● Links between people and things dotted all over our previous CMDBs
    ● Intuitive but brittle

  31. Relational databases constrain
    ● Hard to connect data, so get overly simplified models of reality
    ● Several degrees of separation is modelled as a systemOwner field
    ● Simple, but inaccurate and hard to maintain

  32. Graph databases liberate
    ● Designed to model complex relationships
    ● No need to simplify and abstract away details that actually matter
    ● If person X is a stakeholder via 4 degrees of separation, represent them as such

  33. A graph restatement of the problem
    ‘How can I ensure systems are assigned to the right people’
    ‘How can I ensure systems are connected somehow to the right people’
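
The restated question, "connected somehow to the right people", amounts to a graph traversal. Below is a minimal sketch of that idea using a breadth-first search over a small in-memory graph. Every node name, relationship and the `people_connected_to` helper are invented for illustration; the talk does not show the real Biz Ops data or queries, and in neo4j this would more likely be a Cypher pattern match.

```python
from collections import deque

# Hypothetical directed edges: node -> neighbours. None of these names come
# from the talk; they only illustrate several degrees of separation.
EDGES = {
    "System:vanity-urls": ["Repository:vanity-urls", "Team:growth"],
    "Repository:vanity-urls": ["Person:alice"],
    "Team:growth": ["Group:customer-products", "Person:bob"],
    "Group:customer-products": ["Person:carol"],
}


def people_connected_to(start, max_hops):
    """Breadth-first search returning {person: hops} for people within max_hops."""
    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node.startswith("Person:"):
            found[node] = hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in EDGES.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(people_connected_to("System:vanity-urls", 3))
```

With a hop limit of 2 the search stops at the group node, so the person who is three hops away is only reported once the limit is raised to 3.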

  34. System
    ? ? ? ? ? ? ? ?

  35. Model the stable stuff first

  36. When systems are created we:
    ● Pick a unique, human readable code
    ● Kill infrastructure not tagged with it
    ● In our graph, the System record must be connected to a Team
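
The three rules above could be checked mechanically. The following is a rough sketch under stated assumptions: the code format (lowercase letters, digits and hyphens), the `systemCode` tag name, and both helper functions are hypothetical, since the talk does not specify how the FT enforces these rules.

```python
import re

# Assumed format for a "human readable" code; the real convention is not
# given in the talk.
CODE_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")


def validate_system(system, known_codes):
    """Return a list of problems with a new system record (empty if valid)."""
    problems = []
    code = system.get("code", "")
    if not CODE_PATTERN.match(code):
        problems.append("code is not human readable")
    if code in known_codes:
        problems.append("code is not unique")
    if not system.get("team"):
        problems.append("system is not connected to a team")
    return problems


def untagged_resources(resources, known_codes):
    """Return ids of resources whose systemCode tag is missing or unknown."""
    return [
        resource["id"]
        for resource in resources
        if resource.get("tags", {}).get("systemCode") not in known_codes
    ]


print(validate_system({"code": "vanity-urls", "team": "growth"}, {"next-article"}))
```

A scheduled job could feed the output of `untagged_resources` into whatever process removes unowned infrastructure; that step is deliberately left out of the sketch.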

  37. (no text on slide)

  38. Our tech connected to
    ● Stable, manageable subdivisions of the organisation
    ● Tech director who is ultimately responsible
    On top of this stable foundation we can add the more ephemeral things

  39. (no text on slide)

  40. BIZ-OPS MAN

  41. Data warehouse free
    ● Self-service
    ● No such thing as a power user
    ● Extensible
    ● API first, but UI a close second

  42. Some poor API options
    REST API
    ● OK when fetching a single record type
    ● Painful to traverse
    ‘Canned query’ endpoints
    ● Less generic
    ● Limited by our imagination

  43. GraphQL to the rescue
    “GraphQL is a query language for APIs [...] gives clients the power to ask for exactly what they need [...] not just the properties of one resource but also smoothly follows references between them”
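
To make "ask for exactly what they need" and "follows references between them" concrete, here is a toy resolver that walks a nested selection over in-memory records. It is a sketch of the idea only, not how a GraphQL server or neo4j-graphql-js works internally, and the record types and field names (`deliveredBy`, `techLeads` and so on) are hypothetical rather than the real Biz Ops schema.

```python
# Hypothetical records keyed by (type, code). Field names are invented for
# illustration and are not the real Biz Ops schema.
RECORDS = {
    ("System", "vanity-urls"): {
        "code": "vanity-urls",
        "serviceTier": "bronze",
        "deliveredBy": ("Team", "growth"),
    },
    ("Team", "growth"): {
        "code": "growth",
        "name": "Growth",
        "techLeads": [("Person", "alice")],
    },
    ("Person", "alice"): {
        "code": "alice",
        "name": "Alice Example",
        "email": "alice@example.com",
    },
}

# Roughly the shape of a query a GraphQL client might send:
#   { System(code: "vanity-urls") { code deliveredBy { name techLeads { name } } } }
# represented as nested dicts, where None marks a plain (scalar) field.
QUERY = {"code": None, "deliveredBy": {"name": None, "techLeads": {"name": None}}}


def resolve(key, selection):
    """Return only the selected fields, following references to other records."""
    record = RECORDS[key]
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value  # scalar field: copy as-is
        elif isinstance(value, list):
            result[field] = [resolve(ref, sub_selection) for ref in value]
        else:
            result[field] = resolve(value, sub_selection)
    return result


result = resolve(("System", "vanity-urls"), QUERY)
print(result)
```

Fields that were not selected, such as `serviceTier` and `email`, never appear in the result, and a single query reaches from the system through its team to the people attached to it.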

  44. neo4j-graphql-js
    ● GraphQL normally talks to multiple APIs and combines the results
    ● neo4j-graphql-js converts GraphQL queries to cypher, and talks to neo4j directly

  45. (no text on slide)

  46. GraphQL big wins
    ● User friendly: Single, grokable query to get unlimited connected info
    ● Future proof: Mirrors the neo4j graph as its complexity grows
    ● More efficient: Fewer API calls and fewer and faster DB calls

  47. Pitfalls of GraphQL
    ● Hungry users: Allows unwitting construction of very expensive queries
    ● Caching: Not obvious what caching behaviour to implement
    ● To write or not to write: Not persuaded to move away from REST yet
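
One common guard against the "hungry users" pitfall is to reject queries that nest too deeply before running them. The talk does not say whether Biz Ops does this, so treat the sketch below as an illustration of the general technique; the nested-dict query representation and both function names are assumptions.

```python
def selection_depth(selection):
    """Nesting depth of a selection; None (a scalar field) or {} counts as 0."""
    if not selection:
        return 0
    return 1 + max(selection_depth(sub) for sub in selection.values())


def check_depth(selection, max_depth):
    """Return the query depth, or raise ValueError if it exceeds max_depth."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit of {max_depth}")
    return depth


# Hypothetical query: system -> team -> tech leads, three levels deep.
deep_query = {"code": None, "deliveredBy": {"name": None, "techLeads": {"name": None}}}
print(check_depth(deep_query, max_depth=3))
```

Depth is only a rough proxy for cost. A shallow query that fans out over many relationships can still be expensive, so a real deployment might combine this with timeouts or result limits.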

  48. An extensible UI

  49. (no text on slide)

  50. (no text on slide)

  51. #GRANDstack
    GraphQL + React + Apollo + Neo4j Database
    https://grandstack.io/

  52. In summary
    ● Some confidence that Biz Ops won’t degrade into a data graveyard
    ● Unlimited access to data for any person or machine
    But is the data actually any good?

  53. Filling in the details

  54. Not the first attempt
    CMDB versions 1 - 3 were:
    ● Too inert - Enter once and forget about it
    ● Too brittle - Chains of responsibility easily lost
    ● Too discrete - Hard to make important connections

  55. Don’t rely on good behaviour
    ● Automate
    ● More carrot, less stick
    ● Gamify
    ● UX

  56. Automate
    ● Machines don’t forget to update information
    ● Restrict write access for certain records/types to privileged clients
      ○ people-api → Writes details of FT staff
      ○ github-importer → Writes details of repositories
      ○ …
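
Restricting writes to privileged clients can be as simple as a lookup from record type to the clients allowed to write it. The sketch below assumes that shape. The client names `people-api` and `github-importer` come from the slide, while the record type names, the `biz-ops-admin` client and the `can_write` helper are hypothetical.

```python
# Record types that only specific clients may write. The type names are
# assumptions; the slide names the clients but not the types.
RESTRICTED_TYPES = {
    "Person": {"people-api"},
    "Repository": {"github-importer"},
}


def can_write(client_id, record_type):
    """Unrestricted types are writable by anyone; restricted ones only by listed clients."""
    allowed = RESTRICTED_TYPES.get(record_type)
    return allowed is None or client_id in allowed


print(can_write("people-api", "Person"))
print(can_write("biz-ops-admin", "Person"))
```

Keeping the rule in data rather than code means a new importer can be granted ownership of a record type without touching the write path.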

  57. More carrot, less stick

  58. Gamify
    Teams respond well to seeing how they compare, and how they can improve

  59. UX

  60. (no text on slide)

  61. Not just visual design
    ● Understand your users
    ● Uncover sources of friction
    ● Learn about their existing/ideal workflow
    ● Don’t expect them to come to you
    ● “Good design is invisible”

  62. Example: runbook authorship
    ● System source code changes in GitHub
    ● But runbook authorship in Biz Ops
    ● Bound to get out of step
    ● What if they happened concurrently?

  63. Example: runbook authorship
    ● Runbooks written in RUNBOOK.md with front matter metadata
    ● Content pulled into Biz Ops when production code release detected
    ● GitHub PR integrations to follow
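
As a rough illustration of the ingestion step, the sketch below splits a RUNBOOK.md into front matter metadata and body. It assumes YAML-style front matter delimited by `---` lines and handles only flat `key: value` pairs; the talk does not specify the format, the field names in the sample are made up, and real YAML would need a proper parser.

```python
def split_front_matter(text):
    """Split a document into (metadata dict, body); ({}, text) if no front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated front matter: treat it all as body
    metadata = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


# Hypothetical runbook; the field names are illustrative only.
RUNBOOK = """---
systemCode: vanity-urls
primaryURL: https://example.com/vanity
---
# Vanity URLs

Restart the app if redirects fail.
"""

metadata, body = split_front_matter(RUNBOOK)
print(metadata)
```

The metadata could then be written to the matching System record, keyed on the system code, each time a production release is detected.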

  64. Beyond operational info
    ● Underpinning how we handle GDPR requests
    ● Quicker triaging of security incidents
    ● Integrating with leavers process
    More benefits → more incentives to improve data

  65. What have we learned today?

  66. Legacy code comes to us all

  67. Documented legacy is good legacy

  68. Graphs enable more powerful modelling

  69. Using #GRANDstack is like being the film version of Mark Zuckerberg

  70. Your data won’t update itself

  71. UX and other feedback loops can keep it fresh

  72. Thank you
    The team:
    Geoff Thorpe, Laura Carvajal, Charlie Briggs,
    Katie Koschland, Simon Legg, Maggie Allen,
    Courtney Osborn, Kat Downes, Sentayhu
    Mekoonnali, David Balfour
    Images from:
    https://www.audubon.org/birds-of-america/
    @wheresrhys www.ft.com/dev/null
