Slide 1

Slide 1 text

Engineering Resilience into the Foundations: Key Strategies for Building Robust Internal Development Platforms (IDPs) Daniel Bryant Platform Engineer and PMM

Slide 2

Slide 2 text

tl;dr
A well-designed and implemented platform enables resilience throughout the SDLC:
● Simplifying platform interactions reduces errors
● Graceful degradation supports business goals
● Shared responsibility promotes resilience

Slide 3

Slide 3 text

linktr.ee/danielbryantuk @danielbryantuk (he/him)

Slide 4

Slide 4 text

The “what” of platforms 🏗

Slide 5

Slide 5 text

A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination. Evan Bottcher martinfowler.com/articles/talk-about-platforms.html What is a platform, anyway?

Slide 6

Slide 6 text

A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination. Evan Bottcher martinfowler.com/articles/talk-about-platforms.html What is a platform, anyway?

Slide 7

Slide 7 text

Gartner: What is platform engineering?
“Platform engineering improves developer experience and productivity by providing self-service capabilities with automated infrastructure operations. It is trending because of its promise to optimise the developer experience and accelerate product teams’ delivery of customer value.”
gartner.com/en/articles/what-is-platform-engineering

Slide 8

Slide 8 text

Building on resilient foundations 🏗

Slide 9

Slide 9 text

My hypothesis for today
A well-designed and implemented platform enables resilience throughout the SDLC:
● Simplifying platform interactions reduces errors
  ○ Developer-centric design (focus on customer)
● Graceful degradation supports business goals
  ○ Built-in policy and fault tolerance
● Shared responsibility promotes resilience
  ○ Strong collaboration and governance
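As a minimal sketch of what "graceful degradation" with built-in fault tolerance might look like in code: the helper below retries a primary call a few times and then serves a degraded but still useful response. The function name `call_with_fallback`, the `retries` parameter, and the example services are all hypothetical and not taken from the talk.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 2,
) -> T:
    """Try `primary` up to `retries` + 1 times, then degrade to `fallback`.

    Hypothetical helper: a real platform would likely add backoff,
    timeouts, and metrics around each attempt.
    """
    for _ in range(retries + 1):
        try:
            return primary()
        except Exception:
            # Swallow the failure and try again; the caller never sees it.
            continue
    # All attempts failed: serve the degraded response instead of an error.
    return fallback()


def personalised_recommendations() -> list:
    # Stand-in for a downstream service that is currently unavailable.
    raise TimeoutError("recommendation service unavailable")


def static_bestsellers() -> list:
    # Degraded response: less relevant, but the page still renders.
    return ["bestseller-1", "bestseller-2"]


if __name__ == "__main__":
    print(call_with_fallback(personalised_recommendations, static_bestsellers))
```

The design choice here is that the business goal (the customer can still browse and buy) is preserved even when a non-critical dependency is down.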

Slide 10

Slide 10 text

Our case studies
● Large software company headquartered in the UK
  ○ Provides business management software to SMBs across the UK, USA, and APAC
  ○ Aimed to improve resilience for developers by reducing cognitive load
● Not on the High Street
  ○ Two-sided ecommerce marketplace focusing on crafts and gifts
  ○ Focused on ensuring resilient tech and processes for Black Friday
● NatWest Group
  ○ British banking and insurance holding company, based in Edinburgh, Scotland
  ○ Desire to promote resiliency through continuous improvement of platform

Slide 11

Slide 11 text

Large UK business management software company

Slide 12

Slide 12 text

Large UK business management software company
● Context
  ○ Business growing rapidly
  ○ Looking to scale software delivery across teams and geographies
● Challenge
  ○ Diverse range of technologies due to M&A
  ○ Software delivery teams experiencing cognitive overload with cloud native tech
  ○ Small platform team

Slide 13

Slide 13 text

Large UK business management software company
● Platform solution
  ○ Created “golden path” for delivery
  ○ Used Kubernetes + Kratix platform
  ○ Devs interact with abstraction rather than K8s
● Learnings
  ○ APIs, abstraction, and automation reduce developer cognitive load
  ○ Hackathons can be valuable for driving adoption and experimentation
docs.kratix.io/main/quick-start
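To illustrate the idea of developers interacting with an abstraction rather than with Kubernetes directly, here is a rough sketch in Python. It is not Kratix's API and not the case-study company's implementation; `AppRequest`, `render_manifests`, and the default values are hypothetical names chosen for this example. The point is only that a team fills in a handful of fields and the platform expands them into the Kubernetes objects it owns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AppRequest:
    """Hypothetical developer-facing request: the only fields a team fills in."""

    name: str
    image: str
    replicas: int = 2
    port: int = 8080


def render_manifests(req: AppRequest) -> list[dict]:
    """Expand a small request into a Deployment and a Service (as plain dicts)."""
    if req.replicas < 1:
        raise ValueError("replicas must be at least 1")

    labels = {"app": req.name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": req.name, "labels": labels},
        "spec": {
            "replicas": req.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": req.name,
                            "image": req.image,
                            "ports": [{"containerPort": req.port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": req.name, "labels": labels},
        "spec": {
            "selector": labels,
            "ports": [{"port": 80, "targetPort": req.port}],
        },
    }
    return [deployment, service]


if __name__ == "__main__":
    # Illustrative values only; the registry host is a placeholder.
    request = AppRequest(name="invoices", image="registry.example/invoices:1.0")
    for manifest in render_manifests(request):
        print(manifest["kind"], manifest["metadata"]["name"])
```

Keeping the request small is the design choice that matters: the labels, selectors, and port wiring that are easy to get wrong live in one place that the platform team maintains.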

Slide 14

Slide 14 text

Not on the High Street 🛍

Slide 15

Slide 15 text

Not on the High Street
● Context
  ○ Struggling to meet “Black Friday” demand
  ○ Recently migrated to the cloud
  ○ Looking to automate scaling and DR/BC
● Challenge
  ○ Software delivery teams experiencing cognitive overload with cloud native tech
  ○ Knowledge in static runbooks
  ○ Ops not involved with testing
youtube.com/watch?v=g-1oAKSBBJM

Slide 16

Slide 16 text

Not on the High Street
● Platform solution
  ○ Module architecture and platform
  ○ Mesos-based platform with CLI tools and Jenkins
  ○ Load testing scenarios with previous year’s data
● Learnings
  ○ Form a “guiding coalition” with clear goals/KPIs
  ○ Automate failover and DR/BC into the platform
  ○ “Platform team” is more resilient than “ops people”

Slide 17

Slide 17 text

NatWest Group 🏦

Slide 18

Slide 18 text

NatWest Group
● Context
  ○ Time to market increasingly important in FinTech space
  ○ Desire to increase contributions to the platform
● Challenge
  ○ Developer environment provisioning a limiting factor
  ○ Lots of manual processes and patterns
  ○ Limited resources (and need to “keep the lights on”)
youtube.com/watch?v=RgAutqzxw5U

Slide 19

Slide 19 text

NatWest Group
● Platform solution
  ○ Implement “platform as a product” with Kubernetes, GitLab, Flux, Kratix, and more
  ○ Enabling teams facilitate platform contributions
● Learnings
  ○ A composable platform enabled flexibility
  ○ Automating manual processes improved resilience
  ○ “Inner sourcing” enabled scalability and promoted ownership

Slide 20

Slide 20 text

Wrapping up

Slide 21

Slide 21 text

Conclusion
● Well-designed platforms enable resilience throughout the SDLC
● APIs, abstraction, and automation reduce developer cognitive load
● Software architecture and platform architecture are symbiotic
● Form a “guiding coalition” with KPIs: Platform team > Ops people
● Inner sourcing pools knowledge and increases ownership

Slide 22

Slide 22 text

Thank you!
syntasso.io/post/platform-engineering-orchestrating-applications-platforms-and-infrastructure
docs.kratix.io/main/quick-start
speakerdeck.com/syntasso