Slide 1

Slide 1 text

@mipsytipsy Perils, Pitfalls and Pratfalls of Platform Engineering QCon NYC 2023

Slide 2

Slide 2 text

@mipsytipsy engineer/cofounder/CTO https://charity.wtf real observability for complex systems new!

Slide 3

Slide 3 text

Platform engineering isn’t “new” 🙄
Heroku, Facebook, and others had “platform engineering” teams a decade ago.
Platform engineering is actually quite new. 🤓
Until recently, “platform engineering” could mean ~anything. Now it has been defined.

Slide 4

Slide 4 text

Platform Engineering, Platform Org, Platform Team … what’s the difference?

Slide 5

Slide 5 text

Platform Engineering: Formerly known as “infrastructure”. The software you have to run in order to run the software you want to run.
Platform Org: Umbrella org for security, devex, SRE… engineering teams that don’t work on core product.
Platform Team: The team most responsible for enabling product engineering teams to own their code in production.

Slide 6

Slide 6 text

1. An idea whose time has come
2. A company trying to sell you things

Slide 7

Slide 7 text

“DevOps is dead” is just a stupid thing to say… clickbait marketing 🤮
But there is a kernel of truth there.
DevOps is not eternal. It will be superseded.

Slide 8

Slide 8 text

Developing software is eternal.
Operating that software is eternal.
But “DevOps”? 🤔
What happens when there are no more “dev” teams and “ops” teams?

Slide 9

Slide 9 text

The long arc of software careers
1990: Write code and run what you write
1995: Devs write code, Ops runs code. Friction ensues.
2007: DevOps emerges; devs + ops. Empathy, #hugops, blah blah
2023: Write code and run what you write

Slide 10

Slide 10 text

These days:
1. Every engineer writes code.
2. Every engineer runs the code that they write, and operates it in production.

Slide 11

Slide 11 text

Systems are becoming rapidly more complex. They can’t really be operated like black boxes anymore. You need to build them to run them. And you can’t do a good job of building them unless you are regularly exposed to the feedback loops of operating them.

Slide 12

Slide 12 text

Two big trends are converging:
1. Software ownership (you write it, you run it)
2. We are all moving up the stack. Infrastructure is becoming boring.

Slide 13

Slide 13 text

We are decoupling “infrastructure” from “operating software”.
Standalone ops teams are spinning down.
But operational expertise is more critical than ever before.

Slide 14

Slide 14 text

Platform engineering has emerged from this realignment.

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No. You have a platform engineering org, which wraps & packages your infrastructure needs by running as little infra as possible.

Slide 17

Slide 17 text

Infrastructure is a cost center. It may be a competitive differentiator, but it is still a cost center, which means you want as little of it as possible.
Infrastructure (n): the code you have to run, in order to run the code you want to run.

Slide 18

Slide 18 text

Within your platform organization, you may have a platform team, which composes architecture (not “builds infra”) as a product.

Slide 19

Slide 19 text

⛔ Infrastructure Org
✅ Platform Org
• SRE
• Deep subsystem teams
• “Pure” platform teams
• Security
• Release engineering
• Developer tools
• Front-end developer experience

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Pitfall #1: Running too much software.
Doing less is much harder than doing more.

Slide 22

Slide 22 text

Be crystal clear on what “infra” means to you. Leverage vendors as much as possible. You will NEVER be able to outsource your core differentiators.

Slide 23

Slide 23 text

The best code is the code that doesn’t exist. The second best code is code someone else writes and maintains, and you get to use. The worst code is… literally anything else.

Slide 24

Slide 24 text

Pitfall #2: Writing too much software.
You cannot afford to own too much software surface area, or you will grind to a halt.

Slide 25

Slide 25 text

If your platform team spends a lot of time writing software, something is wrong.

Slide 26

Slide 26 text

Pitfall #3: Not letting product teams own their own reliability.
Software engineers need to own their code in production. This means being on call for it, too.

Slide 27

Slide 27 text

You are NOT a rebranded SRE team. Don’t let this become your reality.

Slide 28

Slide 28 text

Pitfall #4: Not giving engineers enough tooling to understand their code as well as operate it.
Or giving them “ownership” without empowerment.

Slide 29

Slide 29 text

Pitfall #5: Being confused about who your customer is.
Your customer is the internal software engineering teams who work on the core product.

Slide 30

Slide 30 text

Pitfall #6: Not running your team like a product team.
The Promised Land beyond firefighting is … working like a product org. ALL engineering teams.

Slide 31

Slide 31 text

Your platform team *should* spend time on:
• Doing discovery
• Building champions
• Baking in feedback cycles
• Working with product managers
• Working with design (!)
• Figuring out the golden path
• Practicing change management
• Building a roadmap
• Talking with focus groups
• Building internal APIs

Slide 32

Slide 32 text

Pitfall #7: Not paying enough attention to cost & spend as part of architecture & planning.
Educating others about cost counts too :)

Slide 33

Slide 33 text

Cost is an essential part of architecture. Build vs Buy is not the only time we need to think about this!!

Slide 34

Slide 34 text

Pitfall #8: Not constantly looking for ways to deprecate, delete, and shed responsibilities.
Managing your workload is like being a juggler. Success is in managing capacity.

Slide 35

Slide 35 text

How to tell if your “platform team” is really a platform team or not:
Is the team responsible for SLOs, service uptime, and a reliable customer experience?
NO: ✅ platform team
YES: ⛔ not a platform team

Slide 36

Slide 36 text

Platform teams are responsible for developer productivity. Product engineering teams and SREs are responsible for customer experience.

Slide 37

Slide 37 text

“If you build it, they will come”?? No, they fucking won’t. Make sure you are building a platform that people actually want and need!

Slide 38

Slide 38 text

“Vendor engineering” is a large share of any platform team’s remit.
Cost is part of architecture.
Platform teams are super high leverage.

Slide 39

Slide 39 text

If you’re an infra/devops/ops engineer, and you haven’t learned to work on product: Learn. Once you dig your way out of firefighting, product is what comes next.

Slide 40

Slide 40 text

The hardest part of software is operating it. Always has been. Always will be.

Slide 41

Slide 41 text

In conclusion, computers are terrible.
Everything dies.
Have fun!

Slide 42

Slide 42 text

Charity Majors @mipsytipsy