Slide 1

Slide 1 text

Reliably Absorbing A Go Release: Learnings From The Kubernetes Community Madhav Jivrajani

Slide 2

Slide 2 text

$ whoami ● From India, work @ VMware. ● I help maintain parts of the Kubernetes project. ● Mostly involved with Architecture, API Machinery, Scalability and Contributor Experience.

Slide 3

Slide 3 text

Slide 4

Slide 4 text

Agenda ● Why are we talking about this? ● What does “absorbing” a Go release mean for Kubernetes? ● What goes into reliably absorbing a Go release?

Slide 5

Slide 5 text

“Knowledge Is The Dual of Possibility.” J. Halpern et al. Knowledge and Common Knowledge In A Distributed Environment

Slide 6

Slide 6 text

“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody.” https://www.hyrumslaw.com/

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

What does absorbing a Go release mean for Kubernetes?

Slide 11

Slide 11 text

What Does Absorbing A Go Release Mean For Kubernetes?

Slide 12

Slide 12 text

What Does Absorbing A Go Release Mean For Kubernetes? 1. Working towards making sure the CI is happy: builds and tests pass.

Slide 13

Slide 13 text

What Does Absorbing A Go Release Mean For Kubernetes? 1. Working towards making sure the CI is happy: builds and tests pass. 2. Trying to make sure users don’t break!

Slide 14

Slide 14 text

What goes into reliably absorbing a Go release?

Slide 15

Slide 15 text

What Goes Into Reliably Absorbing A Go Release?

Slide 16

Slide 16 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break.

Slide 17

Slide 17 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”?

Slide 18

Slide 18 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles.

Slide 19

Slide 19 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour.

Slide 20

Slide 20 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release.

Slide 21

Slide 21 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release.

Slide 22

Slide 22 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release. For CI

Slide 23

Slide 23 text

What Goes Into Reliably Absorbing A Go Release? 1. Gauging the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release. For users For CI

Slide 24

Slide 24 text

1. Gauging The Surface Area of What Can Break.

Slide 25

Slide 25 text

What does the “Go surface area” of Kubernetes look like?

Slide 26

Slide 26 text

Some Stats 1. Kubernetes is ~2.2 million lines of Go code and has ~240 dependencies on other modules (direct + indirect). a. And then some more for our CI. https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph

Slide 27

Slide 27 text

Some Stats 1. Kubernetes is ~2.2 million lines of Go code and has ~240 dependencies on other modules (direct + indirect). a. And then some more for our CI. 2. Surface area categories: static analysis tooling, dependency management tooling, tests (unit, integration, e2e, scale etc). https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph

Slide 28

Slide 28 text

Different Ways Things Break

Slide 29

Slide 29 text

Different Ways Things Break 1. Code in dependencies can break

Slide 30

Slide 30 text

Different Ways Things Break 1. Code in dependencies can break

Slide 31

Slide 31 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break

Slide 32

Slide 32 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break

Slide 33

Slide 33 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break 3. Static analysis tooling can break

Slide 34

Slide 34 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break 3. Static analysis tooling can break

Slide 35

Slide 35 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break 3. Static analysis tooling can break 4. The runtime behaviour of existing programs can change

Slide 36

Slide 36 text

Different Ways Things Break 1. Code in dependencies can break 2. Your code itself can break 3. Static analysis tooling can break 4. The runtime behaviour of existing programs can change

Slide 37

Slide 37 text

A release is only as backwards compatible as its least backwards compatible change.

Slide 38

Slide 38 text

2. What’s The Best Way To Mitigate A Breaking Change?

Slide 39

Slide 39 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes.

Slide 40

Slide 40 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase.

Slide 41

Slide 41 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. You have control over the timeline of when these fixes happen!

Slide 42

Slide 42 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change.

Slide 43

Slide 43 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go.

Slide 44

Slide 44 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. You may not have control over the timelines of these fixes!

Slide 45

Slide 45 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. The best way to insulate against any of these scenarios is to try and start testing Go versions really early! go1.Xrc1, go1.Xrc2…

Slide 46

Slide 46 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Opportunity to establish timely feedback loops leads to increased reliability.

Slide 47

Slide 47 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Testing early gives your changes enough soak time in the CI.

Slide 48

Slide 48 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Testing early gives you much-needed time to collaborate and work with other communities.

Slide 49

Slide 49 text

Mitigating A Breaking Change 1. Some breaking changes are isolated enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. go1.21 now makes it easier for users to pull different versions of the Go toolchain on the fly!

Slide 50

Slide 50 text

Mitigating A Breaking Change ❯ go version go version go1.21.1 linux/amd64 ❯ GOTOOLCHAIN=go1.22rc2 make test-integration ❯ GOTOOLCHAIN=local go test ./...

Slide 51

Slide 51 text

3. Understanding how the release and support cycles of Go align with your release and support cycles. The Misalignment Alignment

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

But hold on… here’s an idea – why don’t we ship K8s 1.X.Y on a newer Go major version?

Slide 73

Slide 73 text

Historically, Kubernetes release branches have stayed on a single Go major version.

Slide 74

Slide 74 text

Historically, Kubernetes release branches have stayed on a single Go major version. But why?

Slide 75

Slide 75 text

To answer this, we first need to look at what a Kubernetes patch release should NOT be.

Slide 76

Slide 76 text

A Kubernetes Patch Release No “de-stabilising” changes: ● No regressions. ● No new features. ● No new bugs. ● Should not require excessive user intervention to upgrade successfully.

Slide 77

Slide 77 text

How can a Go major release bring about de-stabilising changes?

Slide 78

Slide 78 text

Kubernetes Release Branches Staying On A Single Major Go Version

Slide 79

Slide 79 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out.

Slide 80

Slide 80 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1

Slide 81

Slide 81 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0

Slide 82

Slide 82 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0 go1.14: Removed GODEBUG tls13

Slide 83

Slide 83 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. If K8s 1.X.Y is on go1.13 and K8s 1.X.Y+1 is bumped to go1.14, users reliant on the opt-out will break within 1 Kubernetes patch release! De-stabilising. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0 go1.14: Removed GODEBUG tls13

Slide 84

Slide 84 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change.

Slide 85

Slide 85 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change.

Slide 86

Slide 86 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Possible to set using os.Setenv(), but you’re polluting the execution environment of the user and default values of GODEBUGs can change! De-stabilising.

Slide 87

Slide 87 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out.

Slide 88

Slide 88 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out.

Slide 89

Slide 89 text

Kubernetes Release Branches Staying On A Single Major Go Version 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. The runtime reads vars before user programs start. Cannot set in func init() or using os.Setenv(), too late! Users need to intervene and set env var. De-stabilising.

Slide 90

Slide 90 text

How Does go1.21 Help?

Slide 91

Slide 91 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. “GODEBUG settings added for compatibility will be maintained for a minimum of two years (four Go releases).” https://go.dev/blog/compat

Slide 92

Slide 92 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire support period.

Slide 93

Slide 93 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire support period. Stabilised.

Slide 94

Slide 94 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. “A program’s GODEBUG settings are configured to match the Go version listed in the main package’s go.mod file.” https://go.dev/blog/compat

Slide 95

Slide 95 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Users don’t need to intervene if the value of a GODEBUG setting changes.

Slide 96

Slide 96 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Users don’t need to intervene if the value of a GODEBUG setting changes. Stabilised.

Slide 97

Slide 97 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. “A program can change individual GODEBUG settings by using //go:debug lines in package main.” https://go.dev/blog/compat

Slide 98

Slide 98 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. “[...] it’s not okay to make end users set an environment variable to run a program and setting the variable in main.main or even main’s init can be too late. The //go:debug lines provide a clear way to set those specific GODEBUGs” https://go.googlesource.com/proposal/+/master/design/56986-godebug.md#rationale

Slide 99

Slide 99 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. We now have a way of granularly toggling GODEBUG settings at build time.

Slide 100

Slide 100 text

How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. We now have a way of granularly toggling GODEBUG settings at build time. Stabilised.
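
The //go:debug mechanism quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from Kubernetes: it uses the panicnil setting (added in go1.21) only because its effect is easy to observe, and the helper names recovered and describe are hypothetical. The directive sits in a package main source file, above the package clause, so the setting is recorded in the binary at build time and the end user does not have to set an environment variable.

```go
//go:debug panicnil=1
package main

import (
	"fmt"
	"runtime"
)

// recovered panics with a nil value and returns whatever recover() yields.
// (Hypothetical helper, used only to make the GODEBUG effect visible.)
func recovered() (r any) {
	defer func() { r = recover() }()
	panic(nil)
}

// describe names the behaviour that was observed.
func describe(r any) string {
	switch r.(type) {
	case nil:
		return "nil (pre-go1.21 behaviour)"
	case *runtime.PanicNilError:
		return "*runtime.PanicNilError (go1.21 default behaviour)"
	default:
		return fmt.Sprintf("unexpected %T", r)
	}
}

func main() {
	fmt.Println("recover() returned:", describe(recovered()))
}
```

Built as a main package with go1.21 or newer, the directive restores the older behaviour, so recover() yields nil. Without the directive, and with a go.mod that declares go 1.21 or later, recover() would yield a *runtime.PanicNilError instead.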

Slide 101

Slide 101 text

We can now bump Go versions on release branches! 🎉

Slide 102

Slide 102 text

4. Help users reconcile with default Go behaviour.

Slide 103

Slide 103 text

Let’s take an example.

Slide 104

Slide 104 text

No content

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

No content

Slide 107

Slide 107 text

But wait… how does the user know when a GODEBUG setting (like x509sha1) is going to be removed?

Slide 108

Slide 108 text

GODEBUG History “This section documents the GODEBUG settings introduced and removed in each major Go release for compatibility reasons.” https://go.dev/doc/godebug#history

Slide 109

Slide 109 text

How do you know if you’re relying on non-default behaviour?

Slide 110

Slide 110 text

How do you know if you’re relying on non-default behaviour? Need to sprinkle some observability ✨

Slide 111

Slide 111 text

Helping Users Reconcile With Default Go Behaviour For the x509sha1 example, we added our own observability in terms of metrics and Kubernetes audit logging annotations.

Slide 112

Slide 112 text

Helping Users Reconcile With Default Go Behaviour For the x509sha1 example, we added our own observability in terms of metrics and Kubernetes audit logging annotations. ❯ kubectl get --raw '/metrics' | prom2json \ | jq '.[] | select(.name | test("x509_insecure_sha1_total"))'

Slide 113

Slide 113 text

A consideration with this approach is that these are metrics that the project now has to maintain and evolve.

Slide 114

Slide 114 text

A consideration with this approach is that these are metrics that the project now has to maintain and evolve. Lucky for us…

Slide 115

Slide 115 text

Starting go1.21, Go programs can monitor their own non-default behaviour!

Slide 116

Slide 116 text

“When possible, each GODEBUG setting has an associated runtime/metrics counter named /godebug/non-default-behavior/<name>:events that counts the number of times a particular program’s behavior has changed based on a non-default value for that setting.” https://go.dev/doc/godebug

Slide 117

Slide 117 text

The Kubernetes /metrics endpoint by default exports all Go runtime metrics!

Slide 118

Slide 118 text

❯ kubectl get --raw '/metrics' \ | prom2json \ | jq '.[] | select(.name=="go_godebug_non_default_behavior_x509sha1_events_total")'
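
The same counters can also be read from inside a Go program through the runtime/metrics package. The sketch below is illustrative and not taken from Kubernetes: the function name nonDefaultCounters and the constant godebugPrefix are hypothetical, and it assumes a go1.21 or newer toolchain (on older toolchains the result is simply empty).

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// godebugPrefix is the common prefix of the per-setting counters.
const godebugPrefix = "/godebug/non-default-behavior/"

// nonDefaultCounters returns the current value of every GODEBUG
// non-default-behaviour counter that the running toolchain exposes.
func nonDefaultCounters() map[string]uint64 {
	var samples []metrics.Sample
	for _, d := range metrics.All() {
		if strings.HasPrefix(d.Name, godebugPrefix) {
			samples = append(samples, metrics.Sample{Name: d.Name})
		}
	}
	metrics.Read(samples)

	out := make(map[string]uint64, len(samples))
	for _, s := range samples {
		// Only uint64 counters are collected; other kinds are skipped.
		if s.Value.Kind() == metrics.KindUint64 {
			out[s.Name] = s.Value.Uint64()
		}
	}
	return out
}

func main() {
	for name, n := range nonDefaultCounters() {
		if n > 0 {
			fmt.Printf("%s = %d\n", name, n)
		}
	}
}
```

A non-zero counter means the program actually took the non-default code path for that setting, which is the signal a user needs before the setting is removed.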

Slide 119

Slide 119 text

5. Actually Absorbing A Go Release

Slide 120

Slide 120 text

Let’s assume: currently development and release branches are on go1.N and we’d like to move to go1.N+1

Slide 121

Slide 121 text

Actually Absorbing A Go Release

Slide 122

Slide 122 text

Actually Absorbing A Go Release

Slide 123

Slide 123 text

Actually Absorbing A Go Release

Slide 124

Slide 124 text

Actually Absorbing A Go Release

Slide 125

Slide 125 text

Actually Absorbing A Go Release Come back to “different ways things can break”. Fix dependencies, code and behaviours.

Slide 126

Slide 126 text

Actually Absorbing A Go Release Most importantly: ensure any fix you do is validated against both go1.N and go1.N+1.

Slide 127

Slide 127 text

Actually Absorbing A Go Release At this point, the development branch is ready to be bumped to go1.N+1

Slide 128

Slide 128 text

Actually Absorbing A Go Release

Slide 129

Slide 129 text

Actually Absorbing A Go Release Give preference to collaborating with dependency maintainers and scoping the fix as much as possible.

Slide 130

Slide 130 text

Actually Absorbing A Go Release

Slide 131

Slide 131 text

Actually Absorbing A Go Release Update release branches to go1.N+1 iff: ● go1.N+1 has been released for ~3 months (go-release-cycle / 2).

Slide 132

Slide 132 text

Actually Absorbing A Go Release Update release branches to go1.N+1 iff: ● go1.N+1 has been released for ~3 months (go-release-cycle / 2). ● A released Kubernetes version uses go1.N+1 for at least a month.

Slide 133

Slide 133 text

Actually Absorbing A Go Release Update release branches to go1.N+1 iff: ● go1.N+1 has been released for ~3 months (go-release-cycle / 2). ● A released Kubernetes version uses go1.N+1 for at least a month. ● Backported changes continue to pass compatibility checks between go1.N and go1.N+1.
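
The three conditions above can be written down as a single predicate. This is only a sketch of the rule as stated on the slide, not tooling the Kubernetes project uses: the type bumpCandidate, the function readyForReleaseBranches, and the helper daysBefore are hypothetical, the dates are made up, and "~3 months" and "at least a month" are approximated as 90 and 30 days.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed thresholds: the slide says "~3 months" and "at least a month".
const (
	minGoReleaseAge = 90 * 24 * time.Hour
	minK8sSoakTime  = 30 * 24 * time.Hour
)

// bumpCandidate describes the state of go1.N+1 relative to a release branch.
type bumpCandidate struct {
	GoReleasedAt        time.Time // when go1.N+1 was released
	K8sOnNewGoAt        time.Time // when a Kubernetes release built with go1.N+1 shipped; zero if none
	BackportsCompatible bool      // backports pass checks on both go1.N and go1.N+1
}

// readyForReleaseBranches reports whether all three conditions hold at time now.
func readyForReleaseBranches(c bumpCandidate, now time.Time) bool {
	if c.K8sOnNewGoAt.IsZero() {
		return false
	}
	return now.Sub(c.GoReleasedAt) >= minGoReleaseAge &&
		now.Sub(c.K8sOnNewGoAt) >= minK8sSoakTime &&
		c.BackportsCompatible
}

// daysBefore returns the instant n days before t.
func daysBefore(t time.Time, n int) time.Time {
	return t.Add(-time.Duration(n) * 24 * time.Hour)
}

// exampleNow is an arbitrary, made-up reference date for the example.
var exampleNow = time.Date(2024, time.June, 1, 0, 0, 0, 0, time.UTC)

func main() {
	c := bumpCandidate{
		GoReleasedAt:        daysBefore(exampleNow, 100),
		K8sOnNewGoAt:        daysBefore(exampleNow, 45),
		BackportsCompatible: true,
	}
	fmt.Println("bump release branches:", readyForReleaseBranches(c, exampleNow))
}
```

Because the rule is an "iff", failing any single condition keeps the release branches on go1.N.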

Slide 134

Slide 134 text

We’ve successfully absorbed a Go release!

Slide 135

Slide 135 text

Acknowledgements Huge shoutout to Jordan Liggitt and folks over at SIGs Architecture, Release and Testing who make this happen release after release!

Slide 136

Slide 136 text

References 1. KEP-3744: Stay on supported go versions 2. Design Proposal: Extended backwards compatibility for Go 3. Backward Compatibility, Go 1.21, and Go 2 4. Design Proposal: Extended forwards compatibility for Go 5. Go, Backwards Compatibility, and GODEBUG

Slide 137

Slide 137 text

Thank you!