A Field Guide to the Financial Times

The FT was a microservices pioneer, and our teams had a lot of freedom to pick the tools & processes they wanted. 5 years on, many people have moved on and those innovative projects are now legacy code. I’ll tell you about our journey, using neo4j & GraphQL, towards keeping track of it all.

Rhys Evans

March 26, 2019

Transcript

  1. 1.

    A field guide to the Financial Times
    Rhys Evans, Principal Engineer, Financial Times
    @wheresrhys
  2. 2.

    Who I am
    • Worked in tech 10+ years
    • Gradually moved into tooling
    • Co-lead the FT’s Reliability Engineering team
    • Lifelong birdwatcher
  3. 3.

    What is a field guide
    From Wikipedia: A book designed to help the reader identify wildlife (plants or animals) or other objects of natural occurrence (e.g. minerals).
  4. 4.

    • Why the FT needs a field guide
    • Organising our guide with neo4j and GraphQL
    • Filling in the details
  5. 8.
  6. 12.

    “A tool dating from before the trees that built the ark. Unowned, unknown, and worth £250k of business. One day it fell over. We found docs dated 1999... which helped”
    Greg Cope, Tech Director, FT
  7. 16.

    Microservices
    FT were early adopters of microservices architecture
    Lots of independently deployed services, so it is easier to
    • Pick the right tool for the job
    • Release and iterate
    • Replace and decommission
  8. 17.

    Liberalisation
    “[...] follow the mechanics of free-market economy. Teams are allowed and encouraged to pick the best value tools for the job at hand”
    Matt Chadburn http://matt.chadburn.co.uk/notes/teams-as-services.html
  9. 18.

    OUT                  IN
    Data Centre          Your favourite cloud
    ‘The FT Platform’    Pick your own SaaS
    Java, Java, Java     I hear Rust’s good...
    Ivory tower          What works
  10. 19.

    “The upside of this is teams, left to their own devices, and trusted to make responsible decisions will choose what is best for themselves and the business in the long-term.”
    Matt Chadburn http://matt.chadburn.co.uk/notes/teams-as-services.html
  11. 21.

    Legacy is sooner than you think
    • All images appearing on our websites relied on 1 person... who left
    • A vanity url service built by a feature team that disbanded shortly after
    • Part of our membership platform built in a niche language
    • And many, many more
  12. 22.

    5 years is a long time in tech
    Long enough for
    • Shiny new things to become legacy
    • Budgets and business priorities to move on
    • People to leave
  13. 23.

    In summary
    • Have to keep lots of tech ticking over
    • Generating more new stuff than ever before to keep track of
    • Liberalising the tech department leads to ownership & maintenance problems
    Need a field guide to help us navigate the space
  14. 27.

    3 priorities to improve reliability
    • Reaffirm who owns the various bits of FT tech
    • Improve information about what is actually running and why
    • Determine what state it’s in at any given time
  15. 28.

    Who is our audience?
    Operations team
    • Active 24/7
    • Broad knowledge of our tech platforms
    • Need to know which approaches can be applied to incident X
    • If nothing works, who to call
  16. 29.

    Not the first attempt
    CMDB versions 1 - 3 were:
    • Too inert - Enter once and forget about it
    • Too brittle - Chains of responsibility easily lost
    • Too discrete - Hard to make important connections
  17. 30.

    Who can help me with system X?
    • The natural question to ask when addressing a problem
    • Links between people and things dotted all over our previous CMDBs
    • Intuitive but brittle
  18. 31.

    Relational databases constrain
    • Hard to connect data, so get overly simplified models of reality
    • Several degrees of separation is modelled as a systemOwner field
    • Simple, but inaccurate and hard to maintain
  19. 32.

    Graph databases liberate
    • Designed to model complex relationships
    • No need to simplify and abstract away details that actually matter
    • If person X is a stakeholder via 4 degrees of separation, represent them as such
  20. 33.

    A graph restatement of the problem
    ‘How can I ensure systems are assigned to the right people’
    → ‘How can I ensure systems are connected somehow to the right people’
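A minimal sketch of the graph restatement above: instead of reading a single owner field, walk the graph outwards from a system and collect every person reachable within a few hops. The node names, the `person:` prefix convention and the 4-hop default are assumptions made for illustration; the real Biz Ops graph lives in neo4j, not in Python.

```python
from collections import deque

# Hypothetical records and relationships, invented for illustration only.
EDGES = [
    ("system:vanity-urls", "team:feature-x"),
    ("team:feature-x", "group:customer-products"),
    ("group:customer-products", "person:tech-director"),
    ("team:feature-x", "person:dev-a"),
    ("system:orphan", "repo:orphan-code"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_people(graph, start, max_hops=4):
    """Return {person: hops} for people reachable from start within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    people = {}
    while queue:
        node, hops = queue.popleft()
        if node.startswith("person:") and node != start:
            people[node] = hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return people


graph = build_graph(EDGES)
print(connected_people(graph, "system:vanity-urls"))
print(connected_people(graph, "system:orphan"))
```

A system for which this walk finds nobody is one that is not "connected somehow to the right people", which is the condition the slide asks us to guard against.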
  21. 36.

    When systems are created we:
    • Pick a unique, human readable code
    • Kill infrastructure not tagged with it
    • In our graph, the System record must be connected to a Team
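The three rules above could be checked roughly as follows. This is a sketch under stated assumptions, not the FT's implementation: the code pattern, the `teams` field, the `systemCode` tag name and the shape of the resource records are all hypothetical.

```python
import re

# Assumed convention for a "human readable" code: lowercase words joined by hyphens.
SYSTEM_CODE = re.compile(r"^[a-z][a-z0-9-]*$")


def validate_system(record, existing_codes):
    """Return a list of problems with a new System record (empty if valid)."""
    problems = []
    code = record.get("code", "")
    if not SYSTEM_CODE.match(code):
        problems.append("code must be human readable (lowercase, hyphenated)")
    if code in existing_codes:
        problems.append("code must be unique")
    if not record.get("teams"):
        problems.append("system must be connected to a team")
    return problems


def untagged_resources(resources, known_codes, tag="systemCode"):
    """List ids of infrastructure resources not tagged with a known system code."""
    return [
        resource["id"]
        for resource in resources
        if resource.get("tags", {}).get(tag) not in known_codes
    ]


known = {"vanity-urls"}
resources = [
    {"id": "i-1", "tags": {"systemCode": "vanity-urls"}},
    {"id": "i-2", "tags": {}},
]
print(validate_system({"code": "image-service", "teams": ["feature-x"]}, known))
print(untagged_resources(resources, known))
```

The second function only reports candidates; whether and how untagged infrastructure is actually killed is a policy decision outside this sketch.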
  22. 37.
  23. 38.

    Our tech connected to
    • Stable, manageable subdivisions of the organisation
    • Tech director who is ultimately responsible
    On top of this stable foundation we can add the more ephemeral things
  24. 39.
  25. 41.

    Data warehouse free
    • Self-service
    • No such thing as a power user
    • Extensible
    • API first, but UI a close second
  26. 42.

    Some poor API options
    REST API
    • OK when fetching a single record type
    • Painful to traverse
    ‘Canned query’ endpoints
    • Less generic
    • Limited by our imagination
  27. 43.

    GraphQL to the rescue
    “GraphQL is a query language for APIs [...] gives clients the power to ask for exactly what they need [...] not just the properties of one resource but also smoothly follows references between them”
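To make the quoted idea concrete, here is a toy resolver that returns only the requested fields and follows references between records. It is not GraphQL and uses no GraphQL library; the nested-dict "selection" format, the record types and every field name are assumptions invented for this sketch.

```python
# Hypothetical in-memory records. A tuple value is a reference: (type, code).
RECORDS = {
    "System": {
        "vanity-urls": {
            "code": "vanity-urls",
            "name": "Vanity URLs",
            "serviceTier": "bronze",
            "team": ("Team", "feature-x"),
        }
    },
    "Team": {
        "feature-x": {
            "code": "feature-x",
            "name": "Feature X",
            "techDirector": ("Person", "jane"),
        }
    },
    "Person": {"jane": {"code": "jane", "name": "Jane Doe"}},
}


def resolve(record_type, code, selection):
    """Return only the selected fields, recursing into referenced records."""
    record = RECORDS[record_type][code]
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value  # plain property
        else:
            ref_type, ref_code = value  # follow the reference
            result[field] = resolve(ref_type, ref_code, sub_selection)
    return result


selection = {"name": None, "team": {"name": None, "techDirector": {"name": None}}}
print(resolve("System", "vanity-urls", selection))
```

One request describes the whole shape of the answer, where a REST client would typically make one call per record type.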
  28. 44.

    neo4j-graphql-js
    • GraphQL normally talks to multiple APIs and combines the results
    • neo4j-graphql-js converts GraphQL queries to cypher, and talks to neo4j directly
  29. 46.

    GraphQL big wins
    • User friendly: Single, grokable query to get unlimited connected info
    • Future proof: Mirrors the neo4j graph as its complexity grows
    • More efficient: Fewer API calls and fewer and faster DB calls
  30. 47.

    Pitfalls of GraphQL
    • Hungry users: Allows unwitting construction of very expensive queries
    • Caching: Not obvious what caching behaviour to implement
    • To write or not to write: Not persuaded to move away from REST yet
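One common mitigation for the "hungry users" pitfall is to reject queries that nest too deeply before they reach the database. The talk does not say the FT does this; the sketch below is an assumption-laden illustration in which a query is modelled as a nested dict (leaf fields map to `None`) and the limit of 3 is arbitrary.

```python
def selection_depth(selection):
    """Depth of a nested selection; leaf fields map to None, nested ones to dicts."""
    if not selection:
        return 0
    return 1 + max(
        selection_depth(child) if isinstance(child, dict) else 0
        for child in selection.values()
    )


def check_query(selection, max_depth=3):
    """Return the query depth, or raise ValueError if it exceeds max_depth."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit of {max_depth}")
    return depth


# Hypothetical queries: one modest, one that bounces between teams and systems.
shallow = {"code": None, "team": {"name": None}}
deep = {"team": {"systems": {"team": {"systems": {"code": None}}}}}

print(check_query(shallow))
try:
    check_query(deep)
except ValueError as error:
    print(error)
```

Depth is only a rough proxy for cost; a wide, shallow query over many records can still be expensive.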
  31. 49.
  32. 52.

    In summary
    • Some confidence that Biz Ops won’t degrade into a data graveyard
    • Unlimited access to data for any person or machine
    But is the data actually any good?
  33. 54.

    Not the first attempt
    CMDB versions 1 - 3 were
    • Too inert - Enter once and forget about it
    • Too brittle - Chains of responsibility easily lost
    • Too discrete - Hard to make important connections
  34. 55.
  35. 56.

    Automate
    • Machines don’t forget to update information
    • Restrict write access for certain records/types to privileged clients
      ◦ people-api → Writes details of FT staff
      ◦ github-importer → Writes details of repositories
      ◦ …
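Restricting writes by record type can be as simple as an allowlist consulted before each write. In this sketch the client names `people-api` and `github-importer` come from the slide, while the record type names, the rule that unlisted types stay open to everyone, and the function itself are assumptions.

```python
# Hypothetical mapping of record type -> client ids allowed to write it.
WRITE_ALLOWLIST = {
    "Person": {"people-api"},
    "Repository": {"github-importer"},
}


def can_write(client_id, record_type):
    """Return True if client_id may write records of record_type."""
    allowed = WRITE_ALLOWLIST.get(record_type)
    # Types without an entry are open to any client in this sketch.
    return allowed is None or client_id in allowed


for client, record_type in [
    ("people-api", "Person"),
    ("some-user", "Person"),
    ("some-user", "System"),
]:
    print(client, record_type, can_write(client, record_type))
```

Keeping the automated sources as the only writers for their types means a well-meaning manual edit cannot drift away from the system of record.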
  36. 61.

    Not just visual design
    • Understand your users
    • Uncover sources of friction
    • Learn about their existing/ideal workflow
    • Don’t expect them to come to you
    • “Good design is invisible”
  37. 62.

    Example: runbook authorship
    • System source code changes in Github,
    • But runbook authorship in Biz Ops
    • Bound to get out of step
    • What if they happened concurrently?
  38. 63.

    Example: runbook authorship
    • Runbooks written in RUNBOOK.md with front matter metadata
    • Content pulled into Biz Ops when production code release detected
    • Github PR integrations to follow
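A rough sketch of the first step of that ingestion: splitting a RUNBOOK.md into its front matter metadata and its markdown body. The `---` delimiters and flat `key: value` lines are an assumed, simplified format (a real importer would more likely use a YAML parser), and the field names in the example are hypothetical.

```python
def split_front_matter(text):
    """Split a markdown document into (metadata dict, body string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter at all
    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:])
            return metadata, body.lstrip("\n")
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    # No closing delimiter: treat the whole file as body.
    return {}, text


# Hypothetical runbook; the field names are invented for illustration.
RUNBOOK = """---
systemCode: vanity-urls
serviceTier: bronze
---
# Vanity URLs

Restart the app if redirects fail.
"""

metadata, body = split_front_matter(RUNBOOK)
print(metadata)
print(body)
```

Because the runbook lives beside the code, the same pull request can change both, which addresses the "bound to get out of step" problem from the previous slide.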
  39. 64.

    Beyond operational info
    • Underpinning how we handle GDPR requests
    • Quicker triaging of security incidents
    • Integrating with leavers process
    More benefits → more incentives to improve data
  40. 69.
  41. 72.

    Thank you
    The team: Geoff Thorpe, Laura Carvajal, Charlie Briggs, Katie Koschland, Simon Legg, Maggie Allen, Courtney Osborn, Kat Downes, Sentayhu Mekoonnali, David Balfour
    Images from: https://www.audubon.org/birds-of-america/
    @wheresrhys
    www.ft.com/dev/null