Reliably Absorbing A Go Release: Learnings From The Kubernetes Community

Madhav Jivrajani

October 09, 2023

Transcript

  1. $ whoami • From India, work @ VMware. • I

    help maintain parts of the Kubernetes project. • Mostly involved with Architecture, API Machinery, Scalability and Contributor Experience.
  2. Agenda • Why are we talking about this? • What

    does “absorbing” a Go release mean for Kubernetes? • What goes into reliably absorbing a Go release?
  3. “Knowledge Is The Dual of Possibility.” J. Halpern et al.

    Knowledge and Common Knowledge In A Distributed Environment
  4. “With a sufficient number of users of an API, it

    does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody.” https://www.hyrumslaw.com/
  5. What Does Absorbing A Go Release Mean For Kubernetes? 1.

    Working towards making sure the CI is happy: builds and tests pass.
  6. What Does Absorbing A Go Release Mean For Kubernetes? 1.

    Working towards making sure the CI is happy: builds and tests pass. 2. Trying to make sure users don’t break!
  7. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”?
  8. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles.
  9. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour.
  10. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release.
  11. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release.
  12. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release. For CI
  13. What Goes Into Reliably Absorbing A Go Release? 1. Gauging

    the surface area of what can break. 2. Answering the question: what’s the best way to “mitigate a breaking change”? 3. Understanding how the release and support cycles of Go align with your release and support cycles. 4. Help users reconcile with default Go behaviour. 5. Actually absorbing a Go release. For users For CI
  14. Some Stats 1. Kubernetes is ~2.2 million lines of Go

    code and about ~240 dependencies on other modules (direct + indirect). a. And then some more for our CI. https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph
  15. Some Stats 1. Kubernetes is ~2.2 million lines of Go

    code and about ~240 dependencies on other modules (direct + indirect). a. And then some more for our CI. 2. Surface area categories: static analysis tooling, dependency management tooling, tests (unit, integration, e2e, scale etc). https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph
  16. Different Ways Things Break 1. Code in dependencies can break

    2. Your code itself can break 3. Static analysis tooling can break
  17. Different Ways Things Break 1. Code in dependencies can break

    2. Your code itself can break 3. Static analysis tooling can break
  18. Different Ways Things Break 1. Code in dependencies can break

    2. Your code itself can break 3. Static analysis tooling can break 4. The runtime behaviour of existing programs can change
  19. Different Ways Things Break 1. Code in dependencies can break

    2. Your code itself can break 3. Static analysis tooling can break 4. The runtime behaviour of existing programs can change
  20. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough with minimally invasive fixes to mitigate.
  21. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough with minimally invasive fixes to mitigate. 2. Some breaking changes require invasive changes to your codebase.
  22. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. You have control over the timeline of when these fixes happen!
  23. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change.
  24. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go.
  25. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. You may not have control over the timelines of these fixes!
  26. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. The best way to insulate against any of these scenarios is to try and start testing Go versions really early! go1.Xrc1, go1.Xrc2…
  27. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Opportunity to establish timely feedback loops leads to increased reliability.
  28. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Testing early gives your changes enough soak time in the CI.
  29. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. Testing early gives you much-needed time to collaborate and work with other communities.
  30. Mitigating A Breaking Change 1. Some breaking changes are isolated

    enough, needing only minimally invasive fixes. 2. Some breaking changes require invasive changes to your codebase. 3. Your code is fine, but a dependency you rely on suffers from a breaking change. 4. Sometimes there’s a regression in Go. go1.21 now makes it easier for users to pull different versions of the Go toolchain on the fly!
  31. Mitigating A Breaking Change ❯ go version go version go1.21.1

    linux/amd64 ❯ GOTOOLCHAIN=go1.22rc2 make test-integration ❯ GOTOOLCHAIN=local go test ./…
  32. 3. Understanding how the release and support cycles of Go

    align with your release and support cycles. The Misalignment Alignment
  33. But hold on… here’s an idea – why don’t we

    ship K8s 1.X.Y on a newer Go major version?
  34. To answer this, we first need to look at what

    a Kubernetes patch release should NOT be.
  35. A Kubernetes Patch Release No “de-stabilising” changes: • No regressions.

    • No new features. • No new bugs. • Should not require excessive user intervention to upgrade successfully.
  36. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out.
  37. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1
  38. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0
  39. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0 go1.14: Removed GODEBUG tls13
  40. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. If K8s 1.X.Y is on go1.13 and K8s 1.X.Y+1 is bumped to go1.14, users reliant on the opt-out will break within 1 Kubernetes patch release! De-stabilising. go1.12: Added GODEBUG=tls13=1 go1.13: Added GODEBUG=tls13=0 go1.14: Removed GODEBUG tls13
  41. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change.
  42. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change.
  43. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Possible to set using os.Setenv(), but you’re polluting the execution environment of the user and default values of GODEBUGs can change! De-stabilising.
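
The os.Setenv() route mentioned on this slide might look roughly like the sketch below. It is only an illustration: the helper name mergeGodebug is hypothetical, and tls13=0 is borrowed from the earlier slides as an example value. The point is that whatever the program writes into GODEBUG is then visible to the rest of the process and inherited by any child process it starts.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// mergeGodebug returns existing with key=value appended, unless the user
// already set key themselves. Hypothetical helper, for illustration only.
func mergeGodebug(existing, key, value string) string {
	for _, kv := range strings.Split(existing, ",") {
		if strings.HasPrefix(kv, key+"=") {
			return existing // respect the user's explicit choice
		}
	}
	if existing == "" {
		return key + "=" + value
	}
	return existing + "," + key + "=" + value
}

func main() {
	merged := mergeGodebug(os.Getenv("GODEBUG"), "tls13", "0")

	// From here on, this process and every child process it starts
	// with the default environment see the modified GODEBUG value.
	if err := os.Setenv("GODEBUG", merged); err != nil {
		fmt.Fprintln(os.Stderr, "setenv:", err)
		os.Exit(1)
	}
	fmt.Println("GODEBUG =", os.Getenv("GODEBUG"))
}
```

Even when the helper is careful not to override a value the user chose, the program is still editing the user's environment on their behalf, which is the concern the slide raises.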
  44. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out.
  45. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out.
  46. Kubernetes Release Branches Staying On A Single Major Go Version

    1. Breaking stdlib changes without sufficiently long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. The runtime reads vars before user programs start. Cannot set in func init() or using os.Setenv(), too late! Users need to intervene and set env var. De-stabilising.
  47. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. “GODEBUG settings added for compatibility will be maintained for a minimum of two years (four Go releases).” https://go.dev/blog/compat
  48. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire support period.
  49. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire support period. Stabilised.
  50. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. “A program’s GODEBUG settings are configured to match the Go version listed in the main package’s go.mod file.” https://go.dev/blog/compat
  51. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Users don’t need to intervene if the value of a GODEBUG setting changes.
  52. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. Users don’t need to intervene if the value of a GODEBUG setting changes. Stabilised.
  53. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. “A program can change individual GODEBUG settings by using //go:debug lines in package main.” https://go.dev/blog/compat
  54. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. “[...] it’s not okay to make end users set an environment variable to run a program and setting the variable in main.main or even main’s init can be too late. The //go:debug lines provide a clear way to set those specific GODEBUGs” https://go.googlesource.com/proposal/+/master/design/56986-godebug.md#rationale
  55. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. We now have a way of granularly toggling GODEBUG settings at build time.
  56. How Does go1.21 Help? 1. Breaking stdlib changes without sufficiently

    long GODEBUG opt-out. 2. Breaking stdlib changes with GODEBUG opt-out which is subject to change. 3. Breaking Go runtime changes with GODEBUG opt-out. We now have a way of granularly toggling GODEBUG settings at build time. Stabilised.
  57. But wait… how does the user know when a GODEBUG

    setting (like x509sha1) is going to be removed?
  58. GODEBUG History “This section documents the GODEBUG settings introduced and

    removed in each major Go release for compatibility reasons.” https://go.dev/doc/godebug#history
  59. How do you know if you’re relying on non-default behaviour?

    Need to sprinkle some observability ✨
  60. Helping Users Reconcile With Default Go Behaviour For the x509sha1

    example, we added our own observability in terms of metrics and Kubernetes audit logging annotations.
  61. Helping Users Reconcile With Default Go Behaviour For the x509sha1

    example, we added our own observability in terms of metrics and Kubernetes audit logging annotations. ❯ kubectl get --raw '/metrics' | prom2json \ | jq '.[] | select(.name | test("x509_insecure_sha1_total"))'
  62. A consideration with this approach is that these are metrics

    that the project now has to maintain and evolve.
  63. A consideration with this approach is that these are metrics

    that the project now has to maintain and evolve. Lucky for us…
  64. “When possible, each GODEBUG setting has an associated runtime/metrics counter

    named /godebug/non-default-behavior/<name>:events that counts the number of times a particular program’s behavior has changed based on a non-default value for that setting.” https://go.dev/doc/godebug
  65. ❯ kubectl get --raw '/metrics' \ | prom2json \ |

    jq '.[] | select(.name=="go_godebug_non_default_behavior_x509sha1_events_total")'
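
The same counter can also be read in-process through the runtime/metrics package, without going through a Prometheus endpoint. The sketch below is a minimal example under stated assumptions: the function name nonDefaultEvents is hypothetical, and x509sha1 is simply the setting used on the slides, which the toolchain you build with may or may not expose. When no counter exists for a name, the sample comes back with kind KindBad and the helper reports that instead of a count.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// nonDefaultEvents reads the non-default-behavior counter for one GODEBUG
// setting. ok is false when the running toolchain exposes no such counter.
func nonDefaultEvents(setting string) (count uint64, ok bool) {
	samples := []metrics.Sample{
		{Name: "/godebug/non-default-behavior/" + setting + ":events"},
	}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	const setting = "x509sha1" // the example setting from the slides
	if n, ok := nonDefaultEvents(setting); ok {
		fmt.Printf("%s: %d non-default events\n", setting, n)
	} else {
		fmt.Printf("%s: no counter exposed by this toolchain\n", setting)
	}
}
```

A non-zero count means the program actually took the non-default code path at least once, which is the signal a user needs before a setting is removed.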
  66. Actually Absorbing A Go Release Come back to “different ways

    things can break”. Fix dependencies, code and behaviours.
  67. Actually Absorbing A Go Release Most importantly: ensure any fix

    you do is validated against both go1.N and go1.N+1.
  68. Actually Absorbing A Go Release At this point, the development

    branch is ready to be bumped to go1.N+1
  69. Actually Absorbing A Go Release Give preference to collaborating with

    dependency maintainers and scoping the fix as much as possible.
  70. Actually Absorbing A Go Release Update release branches to go1.N+1

    iff: • go1.N+1 has been released for ~3 months (go-release-cycle / 2).
  71. Actually Absorbing A Go Release Update release branches to go1.N+1

    iff: • go1.N+1 has been released for ~3 months (go-release-cycle / 2). • A released Kubernetes version uses go1.N+1 for at least a month.
  72. Actually Absorbing A Go Release Update release branches to go1.N+1

    iff: • go1.N+1 has been released for ~3 months (go-release-cycle / 2). • A released Kubernetes version uses go1.N+1 for at least a month. • Backported changes continue to pass compatibility checks between go1.N and go1.N+1.
  73. Acknowledgements Huge shoutout to Jordan Liggitt and folks over at

    SIGs Architecture, Release and Testing who make this happen release after release!
  74. References 1. KEP-3744: Stay on supported go versions 2. Design

    Proposal: Extended backwards compatibility for Go 3. Backward Compatibility, Go 1.21, and Go 2 4. Design Proposal: Extended forwards compatibility for Go 5. Go, Backwards Compatibility, and GODEBUG