
Deliver Results, Not Just Releases (Manage, Monitor, Experiment)

Dave Karow
November 19, 2019


Here is the deck I presented at #GOTOcph on the Global Supertrends track.

How do companies like Netflix, LinkedIn and Booking.com crush it year after year? Yes, they release early and often—but they also build control and observability into their CD pipeline to turn releases into results.

Progressive delivery and the statistical observation of real users (sometimes known as “shift right testing” or “feature experimentation”) are essential CD practices. They free teams up to move fast, control risk and focus engineering cycles on work that delivers results, not just releases.



Transcript

  1. Deliver Results, Not Just Releases: How Leaders Do Continuous Delivery to Get Ahead, Not Just Go Faster @davekarow
  2. "The future is already here — it's just not very evenly distributed." William Gibson @davekarow
  3. Manage is about control of exposure: ...grant early access ...limit the blast radius ...define surface area for learning. How do we decouple deploy from release? And decouple revert from rollback? @davekarow
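
To make the deploy/release decoupling concrete, here is a minimal sketch (my own illustration, not from the deck; the flag client and flag name are hypothetical). The new code path ships to production dark, release is a flag flip, and "revert" is flipping it back rather than rolling back a deploy:

# Deploy ships the code dark; release is flipping a flag, not a deploy.
class FeatureFlags:
    def __init__(self, enabled=None):
        self.enabled = enabled or set()

    def is_enabled(self, flag_name):
        return flag_name in self.enabled

def search(query, flags):
    if flags.is_enabled("new-search-ranking"):  # release = enable the flag
        return f"new ranking for {query!r}"     # new code, already deployed
    return f"old ranking for {query!r}"         # revert = disable flag, no rollback

flags = FeatureFlags(enabled={"new-search-ranking"})
print(search("hotels in copenhagen", flags))
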
  4. Feature flag: managing exposure like a dimmer or light board. 0% 10% 20% 50% 100% @davekarow
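
One common way to implement the dimmer (a sketch under assumptions; real flag systems do the bucketing for you) is to hash each user into a stable bucket from 0 to 99 and compare it against the rollout percentage:

import hashlib

def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    # Deterministic bucket 0..99: the same user gets the same answer at a
    # given dial setting, so turning 10% up to 50% only ever adds users.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percentage

for pct in (0, 10, 20, 50, 100):
    print(pct, in_rollout("user-42", "new-search-ranking", pct))
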
  5. Booking.com’s experience with “asynchronous feature release” • Deploying has no impact on user experience • Deploy more frequently with less risk to business and users • The big win is agility
  6. Booking.com monitoring: “Experimentation as a safety net” • Each new feature is wrapped in its own experiment • Allows monitoring and stopping of individual changes • The developer or team responsible for the feature can enable and disable it... • ...regardless of who deployed the new code that contained it. @davekarow
  7. Monitoring the needle in the haystack: if you roll out a change to just 5% of your population, and 20% (1 in 5) of the exposed users get an error, that’s a HUGE problem! But what % of your total user population is getting that error? Just 1%. @davekarow
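
The arithmetic behind that slide, spelled out: the error is severe inside the exposed slice but nearly invisible in whole-population metrics.

exposed_share = 0.05       # change rolled out to 5% of users
error_rate_exposed = 0.20  # 1 in 5 exposed users hit the error

overall = exposed_share * error_rate_exposed
print(f"{overall:.0%} of ALL users are erroring")  # -> 1%
# A global error dashboard sees a ~1% blip and may never alert;
# a per-feature view of the exposed 5% sees a 20% failure rate.
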
  8. Booking.com safety net, automated: the “circuit breaker” • Active for the first three minutes of a feature release • Severe degradation → automatic abort of that feature • An acceptable divergence from the core value of local ownership and responsibility where it’s a “no brainer” that users are being negatively impacted @davekarow
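
A rough sketch of that circuit breaker (the three-minute window is from the slide; the threshold, names, and structure are illustrative assumptions, not Booking.com's code):

import time

ABORT_WINDOW_SECONDS = 3 * 60  # breaker armed for the first 3 minutes (per the slide)
MAX_ERROR_RATE = 0.05          # "severe degradation" threshold (illustrative)

def check_circuit_breaker(release_started_at, error_rate, disable_flag):
    # Auto-abort a feature release that is clearly hurting users.
    armed = time.time() - release_started_at < ABORT_WINDOW_SECONDS
    if armed and error_rate > MAX_ERROR_RATE:
        disable_flag()  # flip the feature's own flag off; no rollback deploy
        return "aborted"
    return "healthy"

print(check_circuit_breaker(time.time() - 60, error_rate=0.20,
                            disable_flag=lambda: None))  # -> "aborted"
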
  9. Booking.com experimentation: “Experimentation as a way to validate ideas” • Measure (in a controlled manner) the impact changes have on user behaviour • Every change has a clear objective (an explicitly stated hypothesis for how it will improve the user experience) • Measuring allows validation that the desired outcome is achieved @davekarow
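
As a sketch of what "measuring in a controlled manner" typically means in practice, here is a standard two-proportion z-test on a conversion metric (the numbers and metric are made up, and this is a textbook technique, not necessarily Booking.com's machinery):

from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    # Two-sided p-value for H0: control and treatment convert at the same rate.
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # via normal CDF
    return p_b - p_a, p_value

# Hypothesis: the new checkout flow lifts conversion.
lift, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"observed lift {lift:+.2%}, p = {p:.3f}")
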
  10. “The quicker we manage to validate new ideas, the less time is wasted on things that don’t work, and the more time is left to work on things that make a difference.” https://medium.com/booking-com-development/moving-fast-breaking-things-and-fixing-them-as-quickly-as-possible-a6c16c5a1185 @davekarow
  11. LinkedIn early days: a modest start for XLNT • Built a targeting engine that could “split” traffic between existing and new code • Impact analysis was by hand (~2 weeks), so nobody did it :-( • Essentially just feature flags without automated feedback @davekarow
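
The core of such a targeting engine is deterministic assignment: hash the user and experiment together so each user consistently lands in the same variant. A sketch (names are hypothetical, not LinkedIn's implementation):

import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("existing", "new")) -> str:
    # Same user + same experiment -> same variant, no coordination needed.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("member-1001", "homepage-v2"))
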
  12. LinkedIn XLNT today: a controlled release (with built-in observability) every 5 minutes • 100 releases per day • 6,000 metrics that can be “followed” by any stakeholder: “What releases are moving the numbers I care about?”
  13. Lessons learned at LinkedIn • Build for scale: no more coordinating over email • Make it trustworthy: targeting & analysis must be rock solid • Design for diverse teams, not just data scientists (Ya Xu, Head of Data Science, LinkedIn; Decisions Conference, 10/2/2018) @davekarow
  14. Experimentation platforms are pervasive in modern development, but homegrown and costly. The typical progression: Step 1: feature flags → Step 2: sensors (correlation) → Step 3: stats engine (causation) → the “Holy Grail”: management console, system of record, alerting. The cost to build and maintain rises with increasing functionality and company adoption, with cited figures of > $50M, > $30M, and > $25M in annual cost. @davekarow
  15. That’s why we created Split. Adil’s story: 1. Used XLNT at LinkedIn 2. Left LinkedIn & missed XLNT 3. Built MVP @ RelateIQ 4. Saved white screen of death 5. October 2015: Founded Split @davekarow
  16. “Whatever you are, try to be a good one.” William Makepeace Thackeray @davekarow
  17. “Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.”
  18. Leaders innovate in a fast, safe, and smart iteration loop • Iterate fast: code deployment is fast, with code commits deployed to prod within the day (1 deploy per second across hundreds of services) • Iterate safe: releases inevitably fail; they invest in detecting and recovering quickly from failure (a 3-minute window to detect and kill bad code) • Iterate smart: many new features are ineffective; they measure which features work and focus there (20% of features move the business forward)