Scale Summit: Microservices and Scale

Microservices and scale. Sarah Wells, Principal Engineer, Financial Times. @sarahjwells

BUT - I’m not here to talk about that

I’m not dealing with this kind of scaling challenge

The FT’s Universal Publishing Platform

Not publishing at huge scale: around 7000 publishes a day

14 million concepts in our graph database, or ~20GB


Around 180,000 API requests an hour

We’re not doing microservices to help with scale

They let us move faster

Deploys to production last year

Deploys to production of the monolith

Releasing nearly 190 times as often

So - why am I talking about scale at all?

*Operating* microservices is a scale challenge

150+ microservices: we need to automate things

The challenges:
1. Provisioning and deployment
2. Monitoring and alerting
3. Logging
4. Service documentation

Provisioning and deployment

Provisioning time scale

Provisioning needs to take minutes

Deployment must be (almost entirely) automated

Our old process was very manual…

Setting up new deployment pipelines has to be quick

Need to be able to make global changes to them

Monitoring and alerting can be very noisy

With resilience, we have 568 instances

If we checked each service every minute…

817,920 checks per day

One service per VM, 20 system checks, running every minute…

16,358,400 checks per day

“One-in-a-million” issues would hit us 16 times every day

Which is why we don’t have one service per VM…

Running containers on shared VMs reduces this to 92,160 system checks per day

Still a total of 910,080 checks per day
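
The check counts on these slides follow from simple arithmetic, reproduced here as a quick sketch. The 568 instances, 20 system checks and one-minute interval come from the slides; the variable names are illustrative, and the shared-VM figure is taken as quoted because the slides do not say how many VMs or checks produce it.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

instances = 568            # service instances, from the slides
system_checks_per_vm = 20  # from the slides

# One healthcheck per instance per minute.
service_checks = instances * MINUTES_PER_DAY

# One service per VM: every instance brings its own VM and its 20 system checks.
system_checks_one_per_vm = instances * system_checks_per_vm * MINUTES_PER_DAY

# Shared VMs: figure quoted on the slide, not derived here.
system_checks_shared = 92_160

total_shared = service_checks + system_checks_shared

# Expected daily hits from a one-in-a-million failure per system check.
one_in_a_million_hits = system_checks_one_per_vm / 1_000_000

print(service_checks, system_checks_one_per_vm, total_shared,
      round(one_in_a_million_hits, 1))
```

At that volume, even very rare failure modes show up daily, which is the argument for reducing the number of VMs being checked.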

Logging

~50,000 log lines per minute

Service documentation

The service registry… who owns what

Lots of information per service

Our GDPR process meant receiving 150 Google Forms…

How can you solve the operational scale issues?

1. Provisioning and deployment
2. Monitoring and alerting
3. Logging
4. Service documentation

Provisioning and deployment: automation and tooling

Invest in automation of provisioning

Deployment: move away from Jenkins

To set up deployment for a new service…

1. Configure CircleCI
2. Configure Docker Hub
3. Add service files to a services repo

Not perfect…

Looking at templated pipelines…

Monitoring and alerting: focus on what matters

It’s the business functionality you should care about

Logging: log aggregation and transaction ids

Effective log aggregation needs a way to find all related logs

Transaction ids tie all microservices together
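
As a rough illustration of the idea, the sketch below reuses an incoming transaction id or mints one, then includes it in every log line so an aggregator can pull up all related logs. The header name `X-Request-Id`, the `tid_` prefix and both function names are assumptions for the example; the slides do not specify them.

```python
import logging
import uuid

# Hypothetical header name; the slides do not say which header is used.
TRANSACTION_HEADER = "X-Request-Id"


def ensure_transaction_id(headers: dict) -> dict:
    """Return a copy of headers that is guaranteed to carry a transaction id."""
    outgoing = dict(headers)
    if not outgoing.get(TRANSACTION_HEADER):
        # No id yet: this service is the first hop, so it mints one.
        outgoing[TRANSACTION_HEADER] = "tid_" + uuid.uuid4().hex
    return outgoing


def log_with_transaction_id(logger: logging.Logger, headers: dict, message: str) -> str:
    """Log a message tagged with the transaction id and return the logged line."""
    line = f"transaction_id={headers[TRANSACTION_HEADER]} {message}"
    logger.info(line)
    return line


# Usage: the first service mints an id, downstream services reuse it.
incoming = ensure_transaction_id({})
downstream = ensure_transaction_id(incoming)
```

Searching the aggregated logs for a single `transaction_id=` value then returns the lines from every service that handled that request. A real service would also need case-insensitive header handling, which this sketch skips.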


Documentation: standards, templates, automation, tooling

Executable documentation

Healthchecks

The FT healthcheck standard
GET http://{service}/__health
returns 200 if the service can run the healthcheck
each check will return "ok": true or "ok": false
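
A minimal sketch of the behaviour described on this slide, not the FT's actual implementation: the endpoint answers 200 whenever the healthcheck itself ran, and each check reports its own `"ok"` value. Only `"ok"` comes from the slide; the `checks` and `name` fields, the function name and the example checks are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple


def run_healthcheck(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every check and return (status_code, json_body).

    The status is 200 whenever the healthcheck itself could run, even if
    individual checks fail; failures are reported per check via "ok".
    """
    results = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that raises is reported as failing, not as an HTTP error.
            ok = False
        results.append({"name": name, "ok": ok})
    # "checks" and "name" are hypothetical field names; only "ok" is from the slide.
    return 200, json.dumps({"checks": results})


def _broken_check() -> bool:
    raise ConnectionError("database unreachable")


status, body = run_healthcheck({
    "can-reach-database": _broken_check,
    "has-disk-space": lambda: True,
})
```

Because the function takes plain callables and returns plain values, it can be unit tested without starting an HTTP server.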

Healthchecks are unit tested

Keeping information near to the code

Update automatically on deploy

Other teams need to adapt too

Change and release management

2256 releases = 53 working days doing CRs
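
To get a feel for that figure, the sketch below works backwards to the time each change request (CR) would take. The release and day counts are from the slide; the 8-hour working day is an assumption, since the slide states neither the day length nor the time per CR.

```python
releases = 2256     # from the slide
working_days = 53   # from the slide
hours_per_day = 8   # assumption: the slide does not state the day length

total_minutes = working_days * hours_per_day * 60
minutes_per_cr = total_minutes / releases

print(f"{minutes_per_cr:.1f} minutes per change request")
```

Under that assumption each CR costs roughly 11 minutes, which is small per release but adds up across thousands of releases.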

Automation, again

GitHub webhook for our CRs

First line support

There are many different technologies for them to understand now

Our development teams don’t know the whole system either…

Operating microservices *is* a challenge

The benefits can be worth it…

Deploys to production last year

Deploys to production of the monolith

But you have to be prepared to pay the cost

Thank you!
