OpenShift Updates and Release Process

Rob
August 17, 2020

Transcript

  1. OpenShift Updates and Release Process
    Understanding over-the-air capabilities
    Rob Szumski, OpenShift Product Management, @robszumski
    Scott Dodson, OpenShift Engineering, @sdodson

  2. Each OpenShift release
    is a collection of Operators
    ● 30 Operators run every major part of the
    platform:
    ○ Console, Monitoring, Authentication,
    Machine management, Kubernetes
    Control Plane, etcd, DNS, and more.
    ● Operators constantly strive to meet the
    desired state, merging admin config and Red
    Hat recommendations
    ● CI testing is constantly running install,
    upgrade and stress tests against groups of
    Operators

  3. OpenShift release cadence
    Stream of updates that transitions from full feature development to critical bugs
    Timeline figure for release x.1 (MAY through the following MAY):
    N release: full support, RFEs, bugfixes, security
    N-2 release: OTA pathway to N release, critical bugs and security
    ● Z-stream releases weekly
    ○ New installer binary
    ○ Over-the-air upgrade package
    ○ Release notes published
    ○ Errata notice published
    ● Active development happens while
    the Y-stream is the latest
    ● Critical bug and security fixes continue
    for the entire duration, including
    backports
    ● Each release includes Kubernetes
    software and RHCOS node software
    Z-stream markers on the timeline: x.1.2, x.1.24

  4. Upgrading weekly z-stream releases
    What to expect when maintaining your clusters with the latest security patches
    ● Updates can be driven by Console or
    programmatically through API
    ● Upgrades happen in place, there is no
    re-provisioning of Nodes
    ● Apps using Kubernetes HA features
    should not have downtime
    ● Pods typically do not need to be
    rescheduled, although all Nodes will
    reboot in a serial fashion
    ● All user sessions will be reset
    ● Update duration is dependent on the
    size of the cluster and how long Pods
    take to evict themselves from your
    Nodes
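
The "programmatically through API" option can be sketched as a patch against the cluster's ClusterVersion resource. The sketch below only builds a JSON merge-patch body and never contacts a cluster. The helper name `build_update_patch` is hypothetical, and the `spec.channel` and `spec.desiredUpdate.version` field names are an assumption based on the `config.openshift.io/v1` ClusterVersion API, so verify them against the documentation for your release.

```python
import json


def build_update_patch(version, channel=None):
    """Return a JSON merge-patch body that requests an update to `version`.

    Assumption: the ClusterVersion resource accepts `spec.desiredUpdate.version`
    and `spec.channel`. Nothing here talks to a cluster.
    """
    spec = {"desiredUpdate": {"version": version}}
    if channel is not None:
        # Optionally switch channels in the same patch.
        spec["channel"] = channel
    return json.dumps({"spec": spec}, sort_keys=True)


print(build_update_patch("4.4.11", channel="fast-4.4"))
```

One way to apply such a body would be a merge patch against the ClusterVersion object with `oc patch`; treat that as an assumption to check, and prefer `oc adm upgrade` or the Console when in doubt.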

  5. Connected Clusters
    Clusters are given a set of happy paths through different versions
    Diagram labels:
    ● Admin: select desired version from available options
    ● Connected OpenShift Cluster
    ● Quay.io Container Registry: Red Hat sourced update image
    ● OpenShift Update Service: Red Hat sourced update graph (Cincinnati protocol)
  6. OpenShift release channels
    Gain control over the pace of over-the-air updates
    candidate-4.5
    ● Best mechanism for testing compatibility with bleeding edge versions of OpenShift
    ● Can include versions for which there is no recommended update path
    fast-4.5
    ● Always contains GA versions of OpenShift
    ● Fastest pace channel
    ● Use on at least 1 production cluster to catch issues specific to you
    stable-4.5
    ● Always contains GA versions of OpenShift
    ● Slower paced channel
    ● Released after stability looks good on fast
    ● May lag fast during the first weeks of a new y-stream release by design
    Read more: documentation on updating your cluster
    GitHub: Look at the channel source data

  7. OpenShift release process
    Phases: Red Hat & Partner Testing, Pre-Release, General Availability
    Stage labels: Release Candidates, GA Update, Signing, Final Testing, GA Build, Dev & CI
    Channel promotions: promote to candidate channel, promote to fast channel, promote to stable channel
    Feedback sources: CI, telemetry, support cases, bugs
    Notes from the diagram:
    ● Extra focus on upgrade testing
    ● Version number born here
    ● Release pulled if tests fail
    ● Extra focus on real-world envs
    ● Pulled for real-world errors found outside CI
    ● Errata & docs published
    ● Extra focus on upgrade errors & platform stability
    ● Edges blocked for bug count, upgrade error rates or degraded Operator health
    ● Extra focus on workload stability
    ● Promotions for Z-stream = ~2 days, Y-stream = ~weeks
  8. Overlap of OCP support lifecycles
    A rolling N-2 support window keeps you secure and up to date with Kubernetes
    Timeline figure spanning Year 1 to Year 3 (MAY of Year 1 through AUG of Year 3), with one bar per release: x.1, x.2, x.3 (EUS; extended support period for an EUS), x.4, x.5, x.6
    N release: full support, RFEs, bugfixes, security
    N-2 release: OTA pathway to N release, critical bugs and security
    Upgrade window
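
The rolling N-2 window can be illustrated with a few lines of arithmetic. This is a minimal sketch, assuming the window simply covers the latest y-stream and the two before it; it ignores EUS releases and calendar dates, and the helper name `support_window` is hypothetical.

```python
def support_window(latest, window=2):
    """List the y-streams inside a rolling N-`window` support window.

    `latest` is a "major.minor" string such as "4.5". EUS handling and
    real lifecycle dates are deliberately out of scope for this sketch.
    """
    major, minor = (int(part) for part in latest.split("."))
    oldest = max(minor - window, 0)
    return [f"{major}.{m}" for m in range(oldest, minor + 1)]


print(support_window("4.5"))
```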

  9. Upgrading 4.4 to 4.5 using channels
    Cluster admins are always in control of when clusters update
    fast-4.4
    fast-4.5
    stable-4.5
    Current Channel
    Running 4.4.11
    Do I want to remain on fast or
    go to stable for 4.5?
    Next Channel
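
The channel decision on this slide can be sketched by parsing the channel name into its pace and y-stream and listing the candidates for the next y-stream. The `pace-major.minor` naming is taken from the channel names shown in this deck; the helper names `parse_channel` and `next_channel_options` are hypothetical.

```python
from typing import NamedTuple

KNOWN_PACES = ("candidate", "fast", "stable")


class Channel(NamedTuple):
    pace: str
    major: int
    minor: int


def parse_channel(name):
    """Split a channel name such as "fast-4.4" into pace and y-stream."""
    pace, sep, version = name.partition("-")
    if not sep or pace not in KNOWN_PACES:
        raise ValueError(f"unrecognized channel name: {name!r}")
    major, minor = (int(part) for part in version.split("."))
    return Channel(pace, major, minor)


def next_channel_options(current):
    """Return the GA channels for the next y-stream after `current`."""
    channel = parse_channel(current)
    return [f"{pace}-{channel.major}.{channel.minor + 1}" for pace in ("fast", "stable")]


print(next_channel_options("fast-4.4"))
```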

  10. How is this done safely?
    Red Hat curates the best sequence of updates through a graph database
    $ curl -sH 'Accept: application/json' \
        'https://api.openshift.com/api/upgrades_info/v1/graph?channel=fast-4.4' \
        | jq -r '[.nodes[].version] | sort | unique[]'
    4.3.12
    4.3.13
    4.3.18
    4.3.19
    4.3.21
    4.3.22
    4.3.23
    4.3.25
    4.3.26
    4.3.27
    4.3.28
    4.3.29
    4.4.10
    4.4.11
    4.4.12
    4.4.3
    4.4.4
    4.4.5
    4.4.6
    4.4.8
    4.4.9
    ● Paths are constantly being tweaked
    to get the best experience
    ● Feedback through telemetry, bugs
    and automated testing
    ● Paths can skip entire sections if safe
    ● Admins can force their way through
    the cluster’s protections, if desired
    Simple view Full update graph
    Versions that can upgrade to 4.4
    Choices to upgrade to within 4.4
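
The version list above is sorted as plain strings, which is why 4.4.10 appears before 4.4.3. A minimal sketch of numeric sorting and of reading direct update targets out of a graph document follows. It assumes the graph JSON has a `nodes` list of objects with a `version` field and an `edges` list of `[from_index, to_index]` pairs; the sample edges are invented for illustration and are not real update paths, and the helper names are hypothetical.

```python
def version_key(version):
    """Sort key comparing versions numerically (plain x.y.z strings only)."""
    return tuple(int(part) for part in version.split("."))


def update_targets(graph, current):
    """Return the versions reachable from `current` in a single update."""
    versions = [node["version"] for node in graph["nodes"]]
    start = versions.index(current)
    targets = {versions[dst] for src, dst in graph["edges"] if src == start}
    return sorted(targets, key=version_key)


# Hypothetical sample data in the assumed graph shape.
SAMPLE_GRAPH = {
    "nodes": [{"version": v} for v in ("4.3.28", "4.4.3", "4.4.9", "4.4.10", "4.4.11")],
    "edges": [[0, 3], [0, 4], [1, 2], [2, 3], [2, 4], [3, 4]],
}

print(sorted(["4.4.10", "4.4.3", "4.4.9"], key=version_key))
print(update_targets(SAMPLE_GRAPH, "4.4.9"))
```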

  11. During the upgrade
    Control plane
    upgrades first
    Mixed versions
    of Operators
    are expected

  12. Blocked edges
    Paths that are unsafe can be “blocked” to force clusters through a safer alternative
    Skip Bug in 4.4.11
    Issue identified, no fix
    X
    ● Goal 1: route around the versions with
    bugs
    ● Goal 2: provide remediation for
    impacted clusters
    ● Threshold for blocking an edge can
    be low if the issue is widespread, or
    rare but severe
    ● OpenShift fleet gets smarter every
    day
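
Routing around a blocked edge can be pictured as a path search that ignores the blocked edges. The sketch below is a plain breadth-first search over invented edges, not the Update Service's actual logic; the function name `shortest_path` and the sample data are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal, blocked=frozenset()):
    """Breadth-first search for a shortest update path that avoids blocked edges."""
    adjacency = {}
    for src, dst in edges:
        if (src, dst) not in blocked:
            adjacency.setdefault(src, []).append(dst)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no safe path is currently published


# Hypothetical edges for illustration only.
EDGES = [
    ("4.4.9", "4.4.11"),
    ("4.4.11", "4.4.12"),
    ("4.4.9", "4.4.10"),
    ("4.4.10", "4.4.12"),
]

print(shortest_path(EDGES, "4.4.9", "4.4.12"))
print(shortest_path(EDGES, "4.4.9", "4.4.12", blocked={("4.4.9", "4.4.11")}))
```

With the edge into 4.4.11 blocked, the search falls back to the route through 4.4.10; if every route is blocked it returns `None`, which corresponds to waiting for a fix and a newly published path.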

  13. Threshold for blocking edges
    You’ll see a common set of questions on Bugzilla
    Who is impacted? If we have to block upgrade edges based on this issue, which edges would need blocking?
    example: Customers upgrading from 4.y.z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of
    the subscribed fleet
    example: All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
    What is the impact? Is it serious enough to warrant blocking edges?
    example: Up to 2 minute disruption in edge routing
    example: Up to 90 seconds of API downtime
    example: etcd loses quorum and you have to restore from backup
    How involved is remediation (even moderately serious impacts might be acceptable if they are easy to mitigate)?
    example: Issue resolves itself after five minutes
    example: Admin uses oc to fix things
    example: Admin must SSH to hosts, restore from backups, or perform other non-standard admin activities
    Is this a regression (if all previous versions were also vulnerable, updating to the new, vulnerable version does not
    increase exposure)?
    example: No, it’s always been like this, we just never noticed
    example: Yes, from 4.y.z to 4.y+1.z or 4.y.z to 4.y.z+1

  14. Encountering a blocked edge
    What should I do if I am faced with a blocked edge?
    Already running version X
    ● You’re running a supported release.
    ● Red Hat is committed to supporting any debugging,
    recovery, and mitigation which may be required to
    get you through the update.
    ● Not every [cluster × platform × workload × config]
    hits every issue
    ● Typical: bugs are fixed and a new path is published
    for you to follow
    ● Less common: mechanisms in place to force your
    way through, after testing to understand the
    ramifications. When in doubt, ask Red Hat.
    Desire to run version X
    ● Blocked edges don’t affect upgrades once started
    ● Always test it out in your test environment
    ● Understand bugs, errata and content within the
    release
    ● Once ready, find the image pull spec, e.g.:
    https://access.redhat.com/errata/RHBA-2020:1393
    $ oc adm upgrade --help
    ...
    Options:
    --allow-explicit-upgrade=false
    --allow-upgrade-with-warnings=false
    # use CLI to grab release info
    $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.3.12-x86_64 \
        | grep 'Name:\|OS/Arch:\|Pull From:'
    Name: 4.3.12
    OS/Arch: linux/amd64
    Pull From: quay.io/openshift-release-dev/ocp-release@sha256:75e8f20e9d5a8fcf5bba4b8f7d17057463e222e350bcfc3cf7ea2c47f7d8ba5d
    # upgrade to the content within the image
    $ oc adm upgrade --allow-explicit-upgrade --to-image \
        quay.io/openshift-release-dev/ocp-release@sha256:75e8f20e9d5a8fcf5bba4b8f7d17057463e222e350bcfc3cf7ea2c47f7d8ba5d
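
The two CLI steps above (read the release info, then upgrade to the digest) can be glued together in a script. This is a minimal sketch that parses the filtered `oc adm release info` output shown on this slide and assembles the upgrade command as an argument list without running it. The helper names are hypothetical, and the exact whitespace of real `oc` output is an assumption.

```python
# Output copied from the slide; real output may be spaced differently.
RELEASE_INFO = """\
Name:      4.3.12
OS/Arch:   linux/amd64
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:75e8f20e9d5a8fcf5bba4b8f7d17057463e222e350bcfc3cf7ea2c47f7d8ba5d
"""


def parse_release_info(text):
    """Turn "Key: value" lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def upgrade_command(pull_spec):
    """Build (but do not run) the explicit upgrade command for a digest pull spec."""
    if "@sha256:" not in pull_spec:
        raise ValueError("expected a pull spec pinned by sha256 digest")
    return ["oc", "adm", "upgrade", "--allow-explicit-upgrade", "--to-image", pull_spec]


info = parse_release_info(RELEASE_INFO)
print(" ".join(upgrade_command(info["Pull From"])))
```

Pinning by digest rather than by tag means the cluster upgrades to exactly the content that was inspected.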

  15. Disconnected Clusters
    Designed to give you the same automation as connected clusters
    Admin
    Local Container Registry
    Local Copy of Update Image
    Disconnected OpenShift Cluster
    Red Hat sourced Update Image
    Mirrored to local registry
    Cluster updated locally
    Same as connected
    Release images & signatures
    Release notes & bugs
    Click button in GUI or upgrade via API
    Monitoring progress
    Debugging issues
    Mirroring commands
    Point CRI-O at internal registry instead of quay.io
    Same as connected
    Unique to disconnected
    Quay.io Container Registry
    OpenShift Update Service
  16. Choosing a release to qualify for your clusters
    Disconnected clusters require a human to provide input alongside Update Service
    Check the upgrade paths in OpenShift Update Service
    ● Narrow your selection of versions to those that have upgrade paths from your current OpenShift version
    ● Coming Soon! A webpage to guide you through this
    Understand any bugs that may be open against candidate versions
    ● Ask your Technical Account Manager for advice and bugs they may be tracking for you specifically
    ● Use Bugzilla advanced search for your desired y-stream version
    Review roadmap for your compute platforms, storage providers and networking plugins
    ● Enhancements may be coming that would make sense to integrate into an upgrade cycle, especially if it takes a
    longer amount of time to qualify a release
    $ git clone https://github.com/openshift/cincinnati.git && cd cincinnati/hack
    $ curl -sH 'Accept:application/json' 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=fast-4.4' |
    ./graph.sh | dot -Tpng >graph-fast-4.4.png
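
For readers who want to see what feeds `dot` in the pipeline above, here is a minimal sketch that turns a graph document into Graphviz DOT text. It is not a reimplementation of the repository's `graph.sh`; it assumes the same `nodes` / `edges` JSON shape (edges as index pairs), and the function name `graph_to_dot` and the sample data are hypothetical.

```python
def graph_to_dot(graph):
    """Render an update graph as Graphviz DOT text, one node per version."""
    lines = ["digraph upgrades {"]
    for index, node in enumerate(graph["nodes"]):
        lines.append(f'  {index} [label="{node["version"]}"];')
    for src, dst in graph["edges"]:
        lines.append(f"  {src} -> {dst};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-node graph for illustration.
sample = {"nodes": [{"version": "4.3.28"}, {"version": "4.4.11"}], "edges": [[0, 1]]}
print(graph_to_dot(sample))
```

The printed text can be saved to a file and rendered with `dot -Tpng`, as in the command above.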

  17. Pinch points between Y-streams
    Typically you must be on the last few releases to move to another y-stream
    4.3 4.4 4.5
    Simple view of upgrade paths Collapsing down to 4.3.28
    Early z-streams are
    typically serial,
    creating wider
    branches
    Later z-streams
    collapse back down as
    graph is enhanced
    with feedback
    Narrow paths between
    y-streams increase
    quality; these are highly
    tested
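
The pinch point can be made concrete by listing which versions have an edge into the next y-stream. This is a minimal sketch over invented sample data, assuming the `nodes` / `edges` graph shape with edges as index pairs; the helper names `minor_of` and `crossing_sources` are hypothetical.

```python
def minor_of(version):
    """Return (major, minor) for a version string such as "4.3.28"."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def crossing_sources(graph, target_minor):
    """List versions outside `target_minor` that can update directly into it."""
    versions = [node["version"] for node in graph["nodes"]]
    sources = set()
    for src, dst in graph["edges"]:
        if minor_of(versions[dst]) == target_minor and minor_of(versions[src]) != target_minor:
            sources.add(versions[src])
    return sorted(sources, key=lambda v: tuple(int(p) for p in v.split(".")))


# Hypothetical graph: only the last 4.3 release crosses into 4.4.
SAMPLE = {
    "nodes": [{"version": v} for v in ("4.3.26", "4.3.27", "4.3.28", "4.4.10", "4.4.11")],
    "edges": [[0, 1], [0, 2], [1, 2], [2, 3], [2, 4], [3, 4]],
}

print(crossing_sources(SAMPLE, (4, 4)))
```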

  18. FAQ
    Sometimes builds don’t make it into a channel. Why?
    ● Once a version number is minted, we don’t ever reuse it
    ● If a build has issues that are found immediately after it is built, it will not be promoted to any channel
    ● This can happen with the first release of a new y-stream
    ○ 4.4.0-4.4.2 had issues, so 4.4.3 became the first GA release on that channel
    ○ 4.5.1 had issues discovered related to RHCOS
    4.4.4 is in stable and 4.4.5 isn't, is 4.4.5 safe for us to start using?
    ● All releases in fast are GA, just like stable. The only difference is timing.
    ● You should be testing out newer releases on test and staging clusters.
    ● This is how you will find issues specific to your environment before it rolls out more widely.
    Why do we have to upgrade to 4.3.18 before we can go to 4.4?
    ● The later releases on a channel are required to upgrade to the next y-stream
    ● This reduces the paths, which increases focus and quality
    ● Later, more paths may be added as the upgrade looks healthy and more testing is done
    Why is there no release at the expected time?
    ● Rarely, a release is skipped for build issues
    ● Red Hat maintains at least 3 different y-streams that are all shipping upgrades. Delaying one typically
    delays another.
    ○ If there is no critical security content, we would rather skip than delay
    Can I skip a release during an upgrade? Go from 4.3 to 4.5?
    ● No, you will need to go through a 4.4 release even if it is run for a short amount of time
    ● Kubernetes will be making several changes that require migrations
    ○ Many APIs moving to stable that require migrations
    ○ Storage and other plugins moving from in-tree to out-of-tree with migrations
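
The "no skipping a y-stream" answer can be sketched as a small planning helper that lists every y-stream a cluster has to pass through. This assumes exactly one hop per y-stream within a single major version, as described in the answer above; the function name `minor_hops` is hypothetical.

```python
def minor_hops(current, target):
    """List the y-streams to pass through when moving from `current` to `target`.

    Both arguments are "major.minor" strings. Only forward moves within one
    major version are covered by this sketch.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward hops within one major version are sketched here")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


print(minor_hops("4.3", "4.5"))
```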

  19. More Resources
    ● Blog posts about upgrades
    ○ https://www.openshift.com/blog/red-hat-openshift-cluster-upgrades-and-application-operator-updates
    ● OpenShift Continuous Integration and Testing
    ○ https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/
    ● Production Cincinnati endpoint
    ○ https://api.openshift.com/api/upgrades_info/v1/graph

  20. Thank you
    Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.
    linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHat