The great microservices migration

Charles-Axel Dein, Uber. DevFest, Nantes, September 2017

What will you get from this talk?

Who am I?
• Charles-Axel Dein - [email protected]
• Payments Engineering Manager at Uber in Amsterdam
• Born and raised in Nantes :)

Joined Uber in July 2012
An incredible growth... (July 2012 → Oct 2017)
• Uber's age (years): 2 → 7
• Cities: 10 → 600+
• Engineers: 20 → 2,000+

Uber's simple architecture in 2012

Today we'll be focusing on "API"

During this period, Uber grew from 2 to 1,000+ services

What are microservices?

This "great migration" was a 5-year adventure

This talk is:
• Not exhaustive
• Not from an expert

Why did we split the monolith?

Reason #1 A large monolithic app slows down developers

Commits per day barely increased

Reason #2 A monolithic app suffers from the tragedy of the commons

Reason #3 A monolithic app is difficult to scale

API's scaling difficulties, circa 2015
• Running out of PostgreSQL master DB connections
• Running out of memory on machines (≈ 1.5 GB RAM)
• Translations growing and using ≈ 1 GB RAM

I. Starting µservices
II. Scaling µservices

How to start a µservices migration

Step 0: make a rough plan

You don't want to move from one monolith to a distributed monolith

"Any piece of software reflects the organizational structure that produced it." (Conway's law)

Design your architecture, then design your organization

⚠ "Too many plans look like [launching] a rocket ship. [Yet] tiny errors in assumptions can lead to catastrophic outcomes." (Eric Ries, The Lean Startup)

Three prerequisites
• Business monitoring
• Feature flags
• Repository layer

Prerequisite 1: business monitoring and alerting
• ❌ CPU utilization
• ❌ RAM
• ✅ Number of signups per device
• ✅ Number of signups per channel
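Here is a minimal Python sketch of alerting on a business metric rather than on CPU or RAM. It assumes that signups arrive as events tagged with a channel. It also assumes that a channel should alert when its count drops below half of an expected baseline. `SignupEvent`, `channels_to_alert`, the baseline values and the 50% threshold are all hypothetical, chosen only for illustration:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupEvent:
    # Hypothetical event shape: one record per successful signup.
    user_uuid: str
    channel: str  # e.g. "ios", "android", "web"


def signups_per_channel(events):
    """Count signups per channel for the current time window."""
    return Counter(event.channel for event in events)


def channels_to_alert(events, baseline, min_ratio=0.5):
    """Return the channels whose signup count fell below min_ratio * baseline.

    `baseline` maps channel -> expected signups for the same window
    (for instance, the same hour last week). A channel with no signups
    at all is still reported, because Counter returns 0 for it.
    """
    counts = signups_per_channel(events)
    return sorted(
        channel
        for channel, expected in baseline.items()
        if counts[channel] < min_ratio * expected
    )
```

Take `baseline = {"ios": 2, "web": 10, "android": 5}` and three signups, two on iOS and one on web. In that case `channels_to_alert` returns `["android", "web"]`. A migration bug that silently breaks signups on one platform shows up here even when CPU and RAM look perfectly normal.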

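Prerequisite 2: fast config rollout (or feature flags)

    import random

    def get_user(user_uuid):
        # `config`, `use_new_flow` and `use_old_flow` are provided by the app;
        # the probability can be changed at runtime without a deploy.
        if random.random() < config.get('use_new_flow_probability'):
            return use_new_flow(user_uuid)
        else:
            return use_old_flow(user_uuid)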
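Below is a self-contained version of the slide's flag. It assumes that a plain dict (`CONFIG`, hypothetical) stands in for the runtime config store. It also shows a variant that is not on the slide. That variant hashes the user UUID into a bucket, so a given user consistently gets the same flow instead of a fresh coin flip on every call:

```python
import hashlib
import random

# Hypothetical in-memory stand-in for a dynamic config store that can be
# changed at runtime without a deploy.
CONFIG = {"use_new_flow_probability": 0.0}


def use_new_flow(user_uuid):
    return ("new", user_uuid)


def use_old_flow(user_uuid):
    return ("old", user_uuid)


def get_user_random(user_uuid):
    # As on the slide: each call independently picks a flow.
    if random.random() < CONFIG["use_new_flow_probability"]:
        return use_new_flow(user_uuid)
    return use_old_flow(user_uuid)


def in_rollout(user_uuid, probability):
    # Variant: map the UUID to a stable bucket in [0, 1), so a given user
    # stays on the same flow for a given probability.
    digest = hashlib.sha256(user_uuid.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < probability


def get_user_sticky(user_uuid):
    if in_rollout(user_uuid, CONFIG["use_new_flow_probability"]):
        return use_new_flow(user_uuid)
    return use_old_flow(user_uuid)
```

To ramp up the rollout, raise `use_new_flow_probability` from 0.0 towards 1.0. To roll back, set it to 0.0 again, with no deploy needed. The sticky variant can be easier to debug, since one user's requests do not alternate between flows.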

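Prerequisite 3: abstract storage layer

    class UsersSQLRepository:
        @staticmethod
        def create(*args, **kwargs):
            ...  # elided on the slide

        @staticmethod
        def get(user_uuid):
            # Read directly from the monolith's SQL database.
            user = sql.connect(...).execute("select ...")
            return user

    class UsersServiceRepository:
        @staticmethod
        def get(user_uuid):
            # Same interface, but read from the new service over HTTP.
            user = http.connect(...).get("/users/...")
            return user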
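A repository layer lets calling code depend on one interface while the storage behind it changes. Here is a minimal sketch using `typing.Protocol`. Two in-memory classes, which are hypothetical stand-ins for the SQL and HTTP repositories, satisfy the same `get` method:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class User:
    uuid: str
    name: str


class UsersRepository(Protocol):
    """The interface callers depend on, whatever the storage behind it."""

    def get(self, user_uuid: str) -> Optional[User]: ...


class InMemorySQLUsersRepository:
    # Hypothetical stand-in for the monolith's SQL-backed repository.
    def __init__(self, rows: Dict[str, User]):
        self._rows = rows

    def get(self, user_uuid: str) -> Optional[User]:
        return self._rows.get(user_uuid)


class FakeServiceUsersRepository:
    # Hypothetical stand-in for a repository that calls the new service.
    def __init__(self, responses: Dict[str, User]):
        self._responses = responses

    def get(self, user_uuid: str) -> Optional[User]:
        return self._responses.get(user_uuid)


def display_name(repo: UsersRepository, user_uuid: str) -> str:
    # Business code only calls `repo.get`; it knows nothing about SQL or HTTP.
    user = repo.get(user_uuid)
    return user.name if user else "unknown"
```

`display_name` works unchanged with either repository. That property is what later lets the monolith switch its reads to the new service in one place.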

Step 1: build a rope bridge

Start with one microservice and one use case

Let's take an example: Our Customer rope bridge

Step 2: migrate the data and keep it up-to-date

Migrate the data in batch and keep it up to date
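The slides do not say how the data is kept up to date. Dual writes from the monolith are one common option, and the annexes mention other ways. Here is a minimal in-memory sketch under that assumption: a batch backfill, plus a mirrored write on every save. The `version` field, `NewUsersStore` and the function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class UserRecord:
    uuid: str
    email: str
    version: int  # hypothetical: incremented on every write in the monolith


class NewUsersStore:
    """Hypothetical stand-in for the new service's storage."""

    def __init__(self):
        self.rows: Dict[str, UserRecord] = {}

    def upsert(self, record: UserRecord) -> None:
        # Ignore stale writes, so a slow backfill batch cannot overwrite a
        # newer record that arrived through the dual-write path.
        current = self.rows.get(record.uuid)
        if current is None or record.version >= current.version:
            self.rows[record.uuid] = record


def batches(records: List[UserRecord], size: int) -> Iterator[List[UserRecord]]:
    for start in range(0, len(records), size):
        yield records[start:start + size]


def backfill(monolith_rows: Dict[str, UserRecord], new_store: NewUsersStore,
             batch_size: int = 100) -> int:
    """Copy existing rows in batches; safe to re-run thanks to upserts."""
    copied = 0
    snapshot = sorted(monolith_rows.values(), key=lambda r: r.uuid)
    for batch in batches(snapshot, batch_size):
        for record in batch:
            new_store.upsert(record)
            copied += 1
    return copied


def save_user(monolith_rows: Dict[str, UserRecord], new_store: NewUsersStore,
              uuid: str, email: str) -> UserRecord:
    """Monolith write path: the monolith stays the source of truth, and
    every write is mirrored to the new store to keep it up to date."""
    previous = monolith_rows.get(uuid)
    record = UserRecord(uuid, email, previous.version + 1 if previous else 1)
    monolith_rows[uuid] = record
    new_store.upsert(record)
    return record
```

The version check matters because the backfill and the live writes overlap in time. Without it, a backfill batch read before an update could land after that update and overwrite newer data. A real system would also need to retry or reconcile mirrored writes that fail. That is one of the distributed-transaction concerns that come up in step 3.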

Results after step 2
1. ✅ Data is migrated
2. ✅ Data is kept up to date

Step 3: migrate the storage layer to read from the new service

Shadowing reads

    # In the monolith
    def get_user(user_uuid):
        monolith_user = UsersSQLRepository.get(user_uuid)
        new_user = UsersNewServiceRepository.get(user_uuid)
        verify(monolith_user, new_user)  # Verify that they match
        return monolith_user  # ✅ we are returning the "safe" user
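Here is a minimal sketch of what `verify` might do. It assumes that mismatches are logged, and in practice they would likely also feed a metric. It also assumes that a failure in the new service must never affect the response. `shadow_read` and its parameters are hypothetical names:

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("shadow_reads")


def shadow_read(user_uuid: str,
                read_old: Callable[[str], T],
                read_new: Callable[[str], Optional[T]]) -> T:
    """Return the monolith's value; compare the new service's value on the side."""
    old_value = read_old(user_uuid)
    try:
        new_value = read_new(user_uuid)
    except Exception:
        # A failing new service must not break the production read path.
        logger.exception("shadow read failed for %s", user_uuid)
        return old_value
    if new_value != old_value:
        logger.warning("shadow mismatch for %s: %r != %r",
                       user_uuid, old_value, new_value)
    return old_value
```

Calling both stores sequentially adds the new service's latency to every request. A common refinement is to do the shadow read asynchronously, or only for a sample of requests.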

Reverse shadowing reads

    # In the monolith
    def get(user_uuid):
        ...  # read from both, verify
        if should_use_new_service():  # feature flag
            return new_user
        else:
            return monolith_user
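Here is a self-contained sketch of reverse shadowing. It assumes the same random-probability flag as in prerequisite 2 (`CONFIG` and `serve_from_new_service_probability` are hypothetical names). Both stores are still read and compared; only the returned value depends on the flag. Error handling for the new service is omitted for brevity:

```python
import logging
import random
from typing import Callable, Dict

logger = logging.getLogger("reverse_shadow")

# Hypothetical runtime config: fraction of reads served from the new service.
CONFIG: Dict[str, float] = {"serve_from_new_service_probability": 0.0}


def should_use_new_service() -> bool:
    return random.random() < CONFIG["serve_from_new_service_probability"]


def get(user_uuid: str,
        read_monolith: Callable[[str], object],
        read_new_service: Callable[[str], object]) -> object:
    # Read from both and verify, as with shadowing reads...
    monolith_user = read_monolith(user_uuid)
    new_user = read_new_service(user_uuid)
    if new_user != monolith_user:
        logger.warning("mismatch for %s", user_uuid)
    # ...but the feature flag now decides which value is returned, so the
    # rollout can be ramped up, or rolled back, by changing the config.
    if should_use_new_service():
        return new_user
    return monolith_user
```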

This requires productionization
• Testing the new storage layer
• Distributed transactions
• Data analytics
• ...

Results after step 3
1. ✅ Data is migrated
2. ✅ Data is kept up to date
3. ✅ All reads are going to the new service
4. ➡ We can delete the old data

Step 4: migrate the consumers to the new service

Migrating consumers is an opportunity to redesign
• Fix some tech/product debt
• Bring a fresh viewpoint
• E.g. move to event sourcing
• E.g. better separate offline/online queries
• Make the interface microservices-aware

Results after step 4
1. ✅ Data is migrated
2. ✅ Data is kept up to date
3. ✅ All reads are going to the new service
4. ✅ All consumers are going to the new service
5. ➡ We can delete the old code

Summary: a bottom-up approach
• Step 0: rough plan
• Step 1: rope bridge
• Step 2: migrate the data (writes)
• Step 3: migrate the storage layer (reads)
• Step 4: migrate consumers
• Iterate for all services!

How to scale a µservices architecture

There are so many decisions to make...
1. RPC (transport, interface, sync/async, etc.)
2. Debugging (logs, tracing, etc.)
3. Security (authN, authZ, logging sensitive data, etc.)
4. ... too many topics, so we'll only chat about testing

Uber's testing strategies
1. Unit, integration, component testing
2. Staging environment (few, very costly)
3. End-to-end tests (very few, anti-pattern)
4. Testing on production: canary deploys
5. Tenancies on production!

The usual testing-on-production method does not work with microservices
• ❌ Requires awareness of side effects
• ❌ Difficult to share with other teams

A better way: tenancies

Test tenancies example

    import time

    def charge_trip(rider_uuid, trip):
        """Charge a rider for a trip."""
        if trip.tenancy == "test":
            time.sleep(0.5)  # Mimics external call
            return ...
        # continue charge flow for non-test users
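On the slide, the tenancy lives on the trip object. Here is a slightly more general sketch. It assumes the tenancy is set once per request from a hypothetical `x-tenancy` header and stored in a `contextvars.ContextVar`, so any code on the request path can check it. The header name, the `Trip` fields and the payment stand-in are illustrative only:

```python
import contextvars
import time
from dataclasses import dataclass

# Hypothetical: the tenancy of the current request, set once at the edge
# and readable by any code running for that request.
current_tenancy = contextvars.ContextVar("current_tenancy", default="production")


@dataclass
class Trip:
    uuid: str
    fare_cents: int


def call_payment_provider(rider_uuid: str, amount_cents: int) -> str:
    # Stand-in for the real, side-effecting external call.
    return f"charged {rider_uuid} {amount_cents}"


def charge_trip(rider_uuid: str, trip: Trip) -> str:
    """Charge a rider for a trip, without real side effects for test traffic."""
    if current_tenancy.get() == "test":
        time.sleep(0.01)  # mimic the latency of the external call
        return f"fake charge {rider_uuid} {trip.fare_cents}"
    return call_payment_provider(rider_uuid, trip.fare_cents)


def handle_request(headers: dict, rider_uuid: str, trip: Trip) -> str:
    # Edge of the service: read the (hypothetical) tenancy header and
    # scope it to this request only.
    token = current_tenancy.set(headers.get("x-tenancy", "production"))
    try:
        return charge_trip(rider_uuid, trip)
    finally:
        current_tenancy.reset(token)
```

Forwarding the same header on outgoing calls (not shown) is what lets a test request flow through many services. Each service then decides locally how to avoid real side effects such as charging a card.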

Benefits of using a test tenancy
• ✅ All the advantages of testing on production
• ✅ Allows teams to test autonomously
• ❌ Not suitable for all testing

... this is just one example of learning!

What to learn and how to learn it
• What: speed AND quality
• What: resilience > intelligence
• How: standardize!
• How: schedule learning time

What: speed and quality, not speed vs. quality

What: focus your learning on resilience

How: standardization speeds up learning
• Counter analysis paralysis!
• Example: programming languages
• Example: RFC process

How: schedule time for learning!
• Chaos testing
• Blameless incident reviews
• External and internal blog
• Informal "brown bag" lunch & learn
• ...

Summary: scaling a microservices architecture means building a learning organization

New services tend to become monoliths, so... this never ends!

Thank you!
• Feedback welcome at [email protected]
• Slides will be on blog.d3in.org

Annexes & references

Book recommendations
• Release It!, Michael T. Nygard (lots of great patterns, great discussions)
• Scalability Rules, Martin Lee Abbott, Michael T. Fisher (super concise)
• Building Microservices, Sam Newman (quite complete discussion of microservices)

List of references
• Service-Oriented Architecture: Scaling the Uber Engineering Codebase As We Grow, Uber Engineering Blog
• Lessons Learned from Scaling Uber to 2,000 Engineers, 1,000 Services, and 8,000 Git repositories, High Scalability
• MonolithFirst, Martin Fowler
• Testing Strategies in a Microservice Architecture, Toby Clemson, ThoughtWorks
• charlax/professional-programming: a collection of full-stack resources for programmers

Annexes: some topics I did not talk about
• How to create components within the monolith
• Infra challenges: how to abstract the architecture away from developers
• Org: SRE vs. development/operations team
• Safe deployment: staging, canarying, prod
• Other ways to keep the data consistent between the two services

Annexes: some topics I did not talk about (cont.)
• Resource requirements and capacity planning
• Service discovery
• Multiple repos vs. mono repo
• Managing configuration at scale
• Hardware efficiency and resource quotas
• Application platform: build and release, etc.
• MTBR > MTBF

Image credits (cont.)
• Rope bridge: Carrick-a-rede, Rope Bridge, Ballintoy, Antrim | La salvaje … | Flickr
• Fischli/Weiss, Installation view, Rock on Top of Another Rock 2013, Serpentine Gallery, London, © Peter Fischli David Weiss, Photo: 2013 Morley von Sternberg
• Cheetah: File:Sarah (cheetah).jpg, Wikimedia Commons, Gregory Wilson
• Bent tree: Resilience | Captured at Inks Lake State Park View On Black | Anne Worner | Flickr

Colophon Slides made with Markdown and Deckset, Titillium theme.
