Reliably Absorbing A Go Release: Learnings From The Kubernetes Community

Madhav Jivrajani

October 09, 2023

Transcript

  1. Reliably Absorbing A Go Release: Learnings From The
    Kubernetes Community
    Madhav Jivrajani

    View full-size slide

  2. $ whoami
    ● From India, work @ VMware.
    ● I help maintain parts of the Kubernetes project.
    ● Mostly involved with Architecture, API Machinery, Scalability and Contributor
    Experience.

    View full-size slide

  3. Agenda
    ● Why are we talking about this?
    ● What does “absorbing” a Go release mean for Kubernetes?
    ● What goes into reliably absorbing a Go release?

    View full-size slide

  4. “Knowledge Is The Dual of Possibility.”
    J. Halpern et al. Knowledge and Common Knowledge In A Distributed Environment

    View full-size slide

  5. “With a sufficient number of users of an API, it does not matter what you
    promise in the contract: all observable behaviours of your system will be
    depended on by somebody.”
    https://www.hyrumslaw.com/

    View full-size slide

  6. What does absorbing a Go release mean for Kubernetes?

    View full-size slide

  7. What Does Absorbing A Go Release Mean For Kubernetes?

    View full-size slide

  8. What Does Absorbing A Go Release Mean For Kubernetes?
    1. Working towards making sure the CI is
    happy: builds and tests pass.

    View full-size slide

  9. What Does Absorbing A Go Release Mean For Kubernetes?
    1. Working towards making sure the CI is
    happy: builds and tests pass.
    2. Trying to make sure users don’t break!

    View full-size slide

  10. What goes into reliably absorbing a Go release?

    View full-size slide

  11. What Goes Into Reliably Absorbing A Go Release?

    View full-size slide

  12. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.

    View full-size slide

  13. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?

    View full-size slide

  14. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.

    View full-size slide

  15. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.
    4. Help users reconcile with default Go behaviour.

    View full-size slide

  16. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.
    4. Help users reconcile with default Go behaviour.
    5. Actually absorbing a Go release.

    View full-size slide

  17. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.
    4. Help users reconcile with default Go behaviour.
    5. Actually absorbing a Go release.

    View full-size slide

  18. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.
    4. Help users reconcile with default Go behaviour.
    5. Actually absorbing a Go release.
    For CI

    View full-size slide

  19. What Goes Into Reliably Absorbing A Go Release?
    1. Gauging the surface area of what can break.
    2. Answering the question: what’s the best way to “mitigate a breaking change”?
    3. Understanding how the release and support cycles of Go align with your release and support
    cycles.
    4. Help users reconcile with default Go behaviour.
    5. Actually absorbing a Go release.
    For users
    For CI

    View full-size slide

  20. 1. Gauging The Surface Area of What Can Break.

    View full-size slide

  21. What does the “Go surface area” of Kubernetes look like?

    View full-size slide

  22. Some Stats
    1. Kubernetes is ~2.2 million lines of Go code
    and about ~240 dependencies on other
    modules (direct + indirect).
    a. And then some more for our CI.
    https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph

    View full-size slide

  23. Some Stats
    1. Kubernetes is ~2.2 million lines of Go code
    and about ~240 dependencies on other
    modules (direct + indirect).
    a. And then some more for our CI.
    2. Surface area categories: static analysis tooling,
    dependency management tooling, tests (unit,
    integration, e2e, scale etc).
    https://deps.dev/go/k8s.io%2Fkubernetes/v1.22.0-alpha.2/dependencies/graph

    View full-size slide

  24. Different Ways Things Break

    View full-size slide

  25. Different Ways Things Break
    1. Code in dependencies can break

    View full-size slide

  26. Different Ways Things Break
    1. Code in dependencies can break

    View full-size slide

  27. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break

    View full-size slide

  28. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break

    View full-size slide

  29. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break
    3. Static analysis tooling can break

    View full-size slide

  30. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break
    3. Static analysis tooling can break

    View full-size slide

  31. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break
    3. Static analysis tooling can break
    4. The runtime behaviour of existing programs
    can change

    View full-size slide

  32. Different Ways Things Break
    1. Code in dependencies can break
    2. Your code itself can break
    3. Static analysis tooling can break
    4. The runtime behaviour of existing programs
    can change

    View full-size slide

  33. A release is only as backwards
    compatible as its least backwards
    compatible change.

    View full-size slide

  34. 2. What’s The Best Way To Mitigate A Breaking Change?

    View full-size slide

  35. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough with minimally invasive fixes to mitigate.

    View full-size slide

  36. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough with minimally invasive fixes to mitigate.
    2. Some breaking changes require invasive changes to your codebase.

    View full-size slide

  37. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    You have control over the timeline of when these fixes happen!

    View full-size slide

  38. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.

    View full-size slide

  39. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.

    View full-size slide

  40. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    You may not have control over the timelines of these fixes!

    View full-size slide

  41. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    The best way to insulate against any of these scenarios is to try and start testing Go versions really early!
    go1.Xrc1, go1.Xrc2…

    View full-size slide

  42. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    Opportunity to establish timely feedback loops leads to increased reliability.

    View full-size slide

  43. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    Testing early gives your changes enough soak time in the CI.

    View full-size slide

  44. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    Testing early gives you much-needed time to collaborate and work with other communities.

    View full-size slide

  45. Mitigating A Breaking Change
    1. Some breaking changes are isolated enough, needing only minimally invasive fixes.
    2. Some breaking changes require invasive changes to your codebase.
    3. Your code is fine, but a dependency you rely on suffers from a breaking change.
    4. Sometimes there’s a regression in Go.
    go1.21 now makes it easier for users to pull different versions of the Go toolchain on the fly!

    View full-size slide

  46. Mitigating A Breaking Change
    ❯ go version
    go version go1.21.1 linux/amd64
    ❯ GOTOOLCHAIN=go1.22rc2 make test-integration
    ❯ GOTOOLCHAIN=local go test ./...

    View full-size slide

  47. 3. Understanding how the release and support cycles of Go
    align with your release and support cycles.
    The Misalignment Alignment

    View full-size slide

  48. But hold on… here’s an idea – why don’t we ship K8s 1.X.Y
    on a newer Go major version?

    View full-size slide

  49. Historically, Kubernetes release branches have stayed on a
    single Go major version.

    View full-size slide

  50. Historically, Kubernetes release branches have stayed on a
    single Go major version.
    But why?

    View full-size slide

  51. To answer this, we first need to look at what a Kubernetes patch
    release should NOT be.

    View full-size slide

  52. A Kubernetes Patch Release
    No “de-stabilising” changes:
    ● No regressions.
    ● No new features.
    ● No new bugs.
    ● Should not require excessive user intervention to upgrade successfully.

    View full-size slide

  53. How can a Go major release bring about de-stabilising changes?

    View full-size slide

  54. Kubernetes Release Branches Staying On A Single Major Go Version

    View full-size slide

  55. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.

    View full-size slide

  56. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    go1.12:
    Added GODEBUG=tls13=1

    View full-size slide

  57. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    go1.12:
    Added GODEBUG=tls13=1
    go1.13:
    Added GODEBUG=tls13=0

    View full-size slide

  58. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    go1.12:
    Added GODEBUG=tls13=1
    go1.13:
    Added GODEBUG=tls13=0
    go1.14:
    Removed GODEBUG tls13

    View full-size slide

  59. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    If K8s 1.X.Y is on go1.13 and K8s 1.X.Y+1 is
    bumped to go1.14, users reliant on the opt-out will
    break within 1 Kubernetes patch release!
    De-stabilising.
    go1.12:
    Added GODEBUG=tls13=1
    go1.13:
    Added GODEBUG=tls13=0
    go1.14:
    Removed GODEBUG tls13

    View full-size slide

  60. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.

    View full-size slide

  61. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.

    View full-size slide

  62. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    Possible to set using os.Setenv(), but you’re
    polluting the execution environment of the user and
    default values of GODEBUGs can change! De-stabilising.

    View full-size slide

  63. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with
    GODEBUG opt-out.

    View full-size slide

  64. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with
    GODEBUG opt-out.

    View full-size slide

  65. Kubernetes Release Branches Staying On A Single Major Go Version
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with GODEBUG
    opt-out.
    The runtime reads vars before user programs start. Cannot set in
    func init() or using os.Setenv(), too late! Users need to
    intervene and set env var. De-stabilising.

    View full-size slide

  66. How Does go1.21 Help?

    View full-size slide

  67. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    “GODEBUG settings added for compatibility will be maintained for a minimum of two years (four Go releases).”
    https://go.dev/blog/compat

    View full-size slide

  68. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire
    support period.

    View full-size slide

  69. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    Min. 2 years means each Kubernetes version is guaranteed to have the GODEBUG setting for its entire
    support period.
    Stabilised.

    View full-size slide

  70. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    “A program’s GODEBUG settings are configured to match the Go version listed in the main
    package’s go.mod file.”
    https://go.dev/blog/compat

    View full-size slide

  71. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    Users don’t need to intervene if the value of a GODEBUG setting changes.

    View full-size slide

  72. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    Users don’t need to intervene if the value of a GODEBUG setting changes.
    Stabilised.

    View full-size slide
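
One way to see which GODEBUG defaults were baked into a binary is to read its embedded build information. The sketch below assumes a binary built by the go command from go1.21 or later, which records a "DefaultGODEBUG" build setting when the effective defaults are non-empty; the helper name defaultGODEBUG is hypothetical.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// defaultGODEBUG returns the GODEBUG defaults recorded in this binary at
// build time. ok is false when no build info or no such setting is present.
func defaultGODEBUG() (value string, ok bool) {
	info, available := debug.ReadBuildInfo()
	if !available {
		return "", false
	}
	for _, s := range info.Settings {
		if s.Key == "DefaultGODEBUG" {
			return s.Value, true
		}
	}
	return "", false
}

func main() {
	if v, ok := defaultGODEBUG(); ok {
		fmt.Println("compiled-in GODEBUG defaults:", v)
	} else {
		fmt.Println("no DefaultGODEBUG build setting recorded")
	}
}
```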

  73. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with GODEBUG
    opt-out.
    “A program can change individual GODEBUG settings by using //go:debug lines in package main.”
    https://go.dev/blog/compat

    View full-size slide

  74. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with GODEBUG
    opt-out.
    “[...] it‘s not okay to make end users set an environment variable to run a program and setting the variable in main.main or even main’s init can
    be too late. The //go:debug lines provide a clear way to set those specific GODEBUGs”
    https://go.googlesource.com/proposal/+/master/design/56986-godebug.md#rationale

    View full-size slide

  75. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with GODEBUG
    opt-out.
    We now have a way of granularly toggling GODEBUG settings at build time.

    View full-size slide

  76. How Does go1.21 Help?
    1. Breaking stdlib changes without sufficiently
    long GODEBUG opt-out.
    2. Breaking stdlib changes with GODEBUG opt-
    out which is subject to change.
    3. Breaking Go runtime changes with GODEBUG
    opt-out.
    We now have a way of granularly toggling GODEBUG settings at build time.
    Stabilised.

    View full-size slide
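
As a rough illustration of a build-time toggle, the sketch below uses a //go:debug line in package main. The panicnil setting is an assumption chosen for the example because it was introduced in go1.21 and is easy to observe; it is not a setting discussed in the deck. The directive has to sit before the package clause of a file in package main.

```go
//go:debug panicnil=1
package main

import "fmt"

// recovered panics with a nil value and reports what recover() returned.
func recovered() (v any) {
	defer func() { v = recover() }()
	panic(nil)
}

func main() {
	// With panicnil=1 compiled in, recover() yields nil (the pre-go1.21
	// behaviour) rather than a *runtime.PanicNilError.
	fmt.Printf("recover() returned %v\n", recovered())
}
```

No environment variable is involved, so nothing leaks to child processes and end users do not have to set anything.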

  77. We can now bump Go versions on release branches! 🎉

    View full-size slide

  78. 4. Help users reconcile with default Go behaviour.

    View full-size slide

  79. Let’s take an example.

    View full-size slide

  80. But wait… how does the user know when a GODEBUG setting
    (like x509sha1) is going to be removed?

    View full-size slide

  81. GODEBUG History
    “This section documents the GODEBUG settings introduced and removed in each major Go release
    for compatibility reasons.”
    https://go.dev/doc/godebug#history

    View full-size slide

  82. How do you know if you’re relying on non-default behaviour?

    View full-size slide

  83. How do you know if you’re relying on non-default behaviour?
    Need to sprinkle some observability ✨

    View full-size slide

  84. Helping Users Reconcile With Default Go Behaviour
    For the x509sha1 example, we added our own
    observability in terms of metrics and Kubernetes
    audit logging annotations.

    View full-size slide

  85. Helping Users Reconcile With Default Go Behaviour
    For the x509sha1 example, we added our own
    observability in terms of metrics and Kubernetes
    audit logging annotations.
    ❯ kubectl get --raw '/metrics' | prom2json \
    | jq '.[] |
    select(.name | test("x509_insecure_sha1_total"))'

    View full-size slide

  86. A consideration with this approach is that these are metrics
    that the project now has to maintain and evolve.

    View full-size slide

  87. A consideration with this approach is that these are metrics
    that the project now has to maintain and evolve.
    Lucky for us…

    View full-size slide

  88. Starting go1.21, Go programs can monitor their own non-
    default behaviour!

    View full-size slide

  89. “When possible, each GODEBUG setting has an associated runtime/metrics counter
    named /godebug/non-default-behavior/<name>:events that counts the
    number of times a particular program’s behavior has changed based on a non-default value
    for that setting.”
    https://go.dev/doc/godebug

    View full-size slide

  90. The Kubernetes /metrics endpoint by default exports all Go
    runtime metrics!

    View full-size slide

  91. ❯ kubectl get --raw '/metrics' \
    | prom2json \
    | jq '.[] | select(.name=="go_godebug_non_default_behavior_x509sha1_events_total")'

    View full-size slide
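
The Prometheus-style name in the query above follows from the runtime/metrics name. The function below is a simplified approximation of that naming convention for cumulative counters, written for illustration only; it is not the Prometheus client library's own conversion code, and promName is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// promName approximates how a cumulative runtime/metrics name appears on a
// Prometheus endpoint: a "go" prefix, the path and unit joined with
// underscores, and a "_total" suffix.
func promName(runtimeMetric string) string {
	path, unit, _ := strings.Cut(runtimeMetric, ":")
	sanitize := func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		}
		return '_' // "/", "-" and anything else become underscores
	}
	name := "go" + strings.Map(sanitize, path)
	if unit != "" {
		name += "_" + strings.Map(sanitize, unit)
	}
	return name + "_total"
}

func main() {
	fmt.Println(promName("/godebug/non-default-behavior/x509sha1:events"))
}
```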

  92. 5. Actually Absorbing A Go Release

    View full-size slide

  93. Let’s assume: currently development and release branches are on go1.N
    and we’d like to move to go1.N+1

    View full-size slide

  94. Actually Absorbing A Go Release

    View full-size slide

  95. Actually Absorbing A Go Release

    View full-size slide

  96. Actually Absorbing A Go Release

    View full-size slide

  97. Actually Absorbing A Go Release

    View full-size slide

  98. Actually Absorbing A Go Release
    Come back to “different ways things can break”.
    Fix dependencies, code and behaviours.

    View full-size slide

  99. Actually Absorbing A Go Release
    Most importantly: ensure any fix you do is
    validated against both go1.N and go1.N+1.

    View full-size slide

  100. Actually Absorbing A Go Release
    At this point, the development branch is ready to
    be bumped to go1.N+1

    View full-size slide

  101. Actually Absorbing A Go Release

    View full-size slide

  102. Actually Absorbing A Go Release
    Give preference to collaborating with
    dependency maintainers and scoping the fix as
    much as possible.

    View full-size slide

  103. Actually Absorbing A Go Release

    View full-size slide

  104. Actually Absorbing A Go Release
    Update release branches to go1.N+1 iff:
    ● go1.N+1 has been released for ~3 months
    (go-release-cycle / 2).

    View full-size slide

  105. Actually Absorbing A Go Release
    Update release branches to go1.N+1 iff:
    ● go1.N+1 has been released for ~3 months
    (go-release-cycle / 2).
    ● A released Kubernetes version uses
    go1.N+1 for at least a month.

    View full-size slide

  106. Actually Absorbing A Go Release
    Update release branches to go1.N+1 iff:
    ● go1.N+1 has been released for ~3 months
    (go-release-cycle / 2).
    ● A released Kubernetes version uses
    go1.N+1 for at least a month.
    ● Backported changes continue to pass
    compatibility checks between go1.N and
    go1.N+1.

    View full-size slide

  107. We’ve successfully absorbed a Go release!

    View full-size slide

  108. Acknowledgements
    Huge shoutout to Jordan Liggitt and folks over at SIGs Architecture, Release and Testing who
    make this happen release after release!

    View full-size slide

  109. References
    1. KEP-3744: Stay on supported go versions
    2. Design Proposal: Extended backwards compatibility for Go
    3. Backward Compatibility, Go 1.21, and Go 2
    4. Design Proposal: Extended forwards compatibility for Go
    5. Go, Backwards Compatibility, and GODEBUG

    View full-size slide