
Ask the Product Manager: Top 5 Bugs in September with Mike Barrett, Senior Director of Product Management

Top 5 problems with Kubernetes and how we are fixing them with Mike Barrett

OpenShift is used by over 1,000 customers. Those customers call Red Hat support when they have questions. I'm going to take you through the top 5 issues that come up the most.

Red Hat Livestreaming

September 21, 2020

Transcript

  1. Ask the Product Manager:
    Top 5 Bugs in September
    Mike Barrett, Senior Director of Product Management
    September 21, 2020

  2. Envoy | Layer7 Inside Apps | mTLS | Cert Lifecycle | Application Patterns
    Quarkus | gRPC | Cloud Native Business Rules | Kafka | AI toolkits | Structureless Data
    Density | Complex Scheduling | Vertical Scaling | Eviction and Limit Automations | Problem
    Detection | Groups V2 | KMS to Vaults
    Self Compliance | Artifact Freshness | Tenant Level Observability | Storage to Backup Automations
    KubeVirt | RHCOS | Kata Containers | Bare Metal | Edge Form Factor | High Performance Networking
    Deeper Automations with Vendored Clouds Service Access | Networking | Routing | Machine Scaling
    Amazon Web Services Microsoft Azure Google Cloud
    IBM Cloud
    OpenStack
    Serverless Code & Event Based Merger with Integration Services
    Multi-Cluster & Multi-Vendor | Placement Policy | Configuration Enforcement | Governance | Compliance | Recovery
    API Management
    The Next 24 Months
    AWS OutPost
    Azure Arc
    Google Anthos
    IBM Cloud
    RHT Open Hybrid Cloud
    VMware Tanzu
    Extending IaaS via Network and Remote Control Points
    Pipelines | GitOps | Builds | Workspaces
    Autonomous Platform with Connected Intelligence

  3. Supported Releases for Binary Fixing (Patching)
    Kubernetes release            Lag       OpenShift release            Supported
    June, 2018   Kubernetes 1.11  4 months  Oct, 2018    OpenShift 3.11  Until June, 2022
    Sept, 2019   Kubernetes 1.16  4 months  Jan, 2020    OpenShift 4.3   Until 4.6
    Dec, 2019    Kubernetes 1.17  4 months  April, 2020  OpenShift 4.4   Until 4.7
    March, 2020  Kubernetes 1.18  4 months  July, 2020   OpenShift 4.5   Until 4.8
    Aug, 2020    Kubernetes 1.19  2 months  Oct, 2020    OpenShift 4.6   Until May 2022
    https://kubernetes.io/docs/setup/release/version-skew-policy/
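
The lag column above is plain month arithmetic between the upstream release month and the OpenShift release month. A minimal sketch that recomputes it from the dates on the slide; the names `RELEASES`, `months_between`, and `release_lags` are illustrative, and the day of month is a placeholder because the slide lists only months:

```python
from datetime import date

# (Kubernetes version, upstream release month, OpenShift version, OpenShift release month)
# Months are taken from the slide; the day is fixed to 1 as a placeholder.
RELEASES = [
    ("1.11", date(2018, 6, 1), "3.11", date(2018, 10, 1)),
    ("1.16", date(2019, 9, 1), "4.3", date(2020, 1, 1)),
    ("1.17", date(2019, 12, 1), "4.4", date(2020, 4, 1)),
    ("1.18", date(2020, 3, 1), "4.5", date(2020, 7, 1)),
    ("1.19", date(2020, 8, 1), "4.6", date(2020, 10, 1)),
]


def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def release_lags(releases=RELEASES):
    """Map each OpenShift version to its lag, in months, behind upstream."""
    return {ocp: months_between(kube_date, ocp_date)
            for _kube, kube_date, ocp, ocp_date in releases}


if __name__ == "__main__":
    for ocp_version, lag in release_lags().items():
        print(f"OpenShift {ocp_version}: {lag} months after upstream")
```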

  4. What it takes to create an OpenShift Product Release
    Kubernetes Release | 1-4 months hardening | OpenShift Release
    Security fixes
    100s of defect and performance fixes
    200+ validated integrations
    Middleware & Storage integrations
    (container images, storage, networking, cloud services, etc)
    Enterprise lifecycle management
    Certified Kubernetes
    https://bugzilla.redhat.com/
    https://issues.redhat.com/

  5. Why Trail the Upstream
    The Sweet Spot is 1 Release Behind for Production Level Support

  6. Sprint Releases
    [Timeline diagram of one release cycle, divided into sprints]
    Every sprint: Sprint Start | Deploy to dev-prev-prod
    Milestones shown: Release Start | Stage 1 Dependencies Due | Kube rebase delivered |
    Kube rebase #2 delivered (if needed) | Feature Complete | Code Freeze / Begin Final Regression |
    Begin RCM Process | OCP GA | Begin Next Release
    Phase label: No New Features/Bug Burn Down
    Note: Total number of sprints may vary by release

  7. OpenShift Release and Example Fix
    [Timeline diagram; labels listed as they appear on the slide]
    Sept: Kube 1.19.2 | Kube 1.20 branch
    Oct: OCP 4.6
    Nov: Kube 1.20
    Nov: OKD 4.7 Nightlies
    Dec: Kube 1.20 | Kube 1.21 branch
    Jan: Kube 1.20.z
    Feb: OCP 4.7
    Mar: Kube 1.21 | Kube 1.22 branch
    Apr: Kube 1.21.z
    May: OCP 4.8
    OpenShift Dedicated on OCP 4.7
    Example fix (steps numbered 1 to 4 on the slide): Bug Hits 4.6! | Upstream Fix |
    Backport Fix: Kube 1.19.z (if allowed) | Downstream Fix: OKD 4.7 Nightlies |
    Cherry Pick Back | Backport & Ship fix: OCP 4.6.z z-stream | Every 1 week
    https://github.com/kubernetes/sig-release/blob/master/releases/patch-releases.md
    https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/release.md

  8. Bug 1: After installation infra and audit index pattern not available in Kibana
    https://bugzilla.redhat.com/show_bug.cgi?id=1866619
    https://bugzilla.redhat.com/show_bug.cgi?id=1877414
    https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#subjectaccessreviewspec-v1-authorization-k8s-io
    $ oc adm groups new test quicklab
    group.user.openshift.io/test created
    $ oc get group test
    NAME USERS
    test quicklab
    $ oc adm policy add-cluster-role-to-group cluster-admin test
    clusterrole.rbac.authorization.k8s.io/cluster-admin added: "test"
    $ oc whoami
    quicklab
    $ oc auth can-i get pods --subresource=log
    yes
    $ oc auth can-i get pods --subresource=log --as=quicklab
    no
    $ oc auth can-i get pods --subresource=log --as=quicklab --as-groups=test
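
The SubjectAccessReviewSpec linked above carries both a `user` and a `groups` field, which is why the answer changes once the group is supplied. A minimal sketch of a request body that asks the same question explicitly (can user `quicklab`, as a member of group `test`, get the `log` subresource of pods?). The helper name `subject_access_review` is hypothetical, and the sketch only builds the body; it does not send it to a cluster or claim to reproduce what `oc auth can-i` does internally:

```python
import json


def subject_access_review(user, groups, verb, resource, subresource="", namespace=""):
    """Build a SubjectAccessReview (authorization.k8s.io/v1) request body.

    Hypothetical helper for illustration; field names follow the
    SubjectAccessReviewSpec and ResourceAttributes API types.
    """
    attributes = {"verb": verb, "resource": resource}
    if subresource:
        attributes["subresource"] = subresource
    if namespace:
        attributes["namespace"] = namespace
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user,
            # Omitting the groups means the review cannot see role
            # bindings that were granted to a group.
            "groups": list(groups),
            "resourceAttributes": attributes,
        },
    }


if __name__ == "__main__":
    body = subject_access_review("quicklab", ["test"], "get", "pods", subresource="log")
    print(json.dumps(body, indent=2))
```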

  9. Bug 2: KubeAPIErrorsHigh firing on daily base but at random times
    https://bugzilla.redhat.com/show_bug.cgi?id=1748434
    https://github.com/kubernetes/enhancements/pull/1878
    https://bugzilla.redhat.com/show_bug.cgi?id=1877346
    https://github.com/kubernetes/kubernetes/issues/91073
    Had to do with better handling of an API server reboot or network outage from the
    kubelet's point of view.

  10. Bug 3: Machine Config Daemon DaemonSet does not set universal Toleration
    (and therefore gets booted if taints are set on a node)
    https://bugzilla.redhat.com/show_bug.cgi?id=1780318
    Had to do with remembering to place a toleration on your Kubernetes Operator's
    operand so that its pods keep running on tainted nodes.
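
In Kubernetes, a toleration with an empty key and `operator: Exists` matches every taint, which is what "universal toleration" means here. A minimal sketch of the matching rule, written as plain Python over dicts shaped like the pod spec fields; the names `UNIVERSAL_TOLERATION` and `tolerates` are illustrative and this is not the scheduler's own code:

```python
# A toleration with no key and operator "Exists" tolerates every taint.
UNIVERSAL_TOLERATION = {"operator": "Exists"}


def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect", ""):
        return False
    # An empty key matches any taint key (only meaningful with "Exists").
    key = toleration.get("key", "")
    if key and key != taint.get("key", ""):
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def pod_survives(tolerations, taints):
    """True if every taint on the node is matched by some toleration."""
    return all(any(tolerates(t, taint) for t in tolerations) for taint in taints)


if __name__ == "__main__":
    node_taints = [{"key": "dedicated", "value": "infra", "effect": "NoExecute"}]
    print(pod_survives([], node_taints))
    print(pod_survives([UNIVERSAL_TOLERATION], node_taints))
```

With no tolerations the pod does not survive a `NoExecute` taint; with the universal toleration it does, whatever key, value, or effect the taint carries.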

  11. Bug 4: Etcd cluster "etcd": 100% of requests for Watch failed on etcd instance :2379.
    grpc_service="etcdserverpb.Watch"
    https://bugzilla.redhat.com/show_bug.cgi?id=1677689
    https://github.com/etcd-io/etcd/pull/11375
    https://github.com/etcd-io/etcd/pull/12196
    The fix is to determine more conclusively that a leader has actually been lost before
    propagating an ErrGRPCNoLeader error.

  12. Bug 5: Unable to provision vSphere volume
    https://bugzilla.redhat.com/show_bug.cgi?id=1821280
    https://github.com/kubernetes/kubernetes/pull/93971
    https://github.com/kubernetes/kubernetes/pull/90836
    When the vSphere Kubernetes secret used by the dynamic storage provider is updated,
    the provider doesn't pick up the new secret and fails to create the PV.
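
The general shape of this bug class is a session built once from a secret and never rebuilt. A minimal sketch of one way to avoid it, by logging in again whenever the secret's `resourceVersion` changes. This is an illustration only, not the upstream patch: `SessionCache`, `fetch_secret`, and `login` are hypothetical names, and the secret is modelled as a plain dict:

```python
class SessionCache:
    """Reuse a login session until the backing secret changes.

    Hypothetical sketch: fetch_secret returns the current secret as a dict
    with metadata.resourceVersion and data; login turns the secret data
    into a session object.
    """

    def __init__(self, fetch_secret, login):
        self._fetch_secret = fetch_secret
        self._login = login
        self._version = None
        self._session = None

    def session(self):
        secret = self._fetch_secret()
        version = secret["metadata"]["resourceVersion"]
        # Log in again only when the secret was updated since the last login.
        if self._session is None or version != self._version:
            self._session = self._login(secret["data"])
            self._version = version
        return self._session
```

A caller would ask the cache for a session before each provisioning request, so an updated secret takes effect on the next request instead of requiring a restart.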

  13. Thank you
    linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHat
