Ask an OpenShift Admin Episode 61: Service Mesh

These slides were used during episode 61 of the Ask an OpenShift Admin livestream. For more information, see the stream here: https://www.youtube.com/watch?v=Mxcp1F4bJNE.

Do you need better insight into how your application services are communicating with each other? Do you wish you had the ability to fully implement blue/green deployment scenarios with little administrative overhead?

Red Hat Service Mesh gives OpenShift administrators insight into their clusters with built-in observability and traceability features. They can control traffic distribution to their applications and operate in a true blue/green deployment scenario. Red Hat Service Mesh provides all of these capabilities without the need to change application code!

Red Hat Livestreaming

March 17, 2022

Transcript

  1. OpenShift Service Mesh
    Ortwin Schneider
    Principal Technical Marketing Manager, Red Hat

  2. OpenShift Service Mesh
    Contents
    ● What is a Service Mesh?
    ● Why do I need a Service Mesh at all?
    ● Key Capabilities and Usage Scenarios
    ● OpenShift Service Mesh and how can I get it?
    ● What type of apps is it for?
    ● Should I care about a Service Mesh? Personas
    ● What is the Overhead?
    ● Service Mesh across clusters - Mesh Federation
    ● Roadmap & FAQ

  3. What is a Service Mesh?
    “Proxies and a Control Plane”

  4. What is a Service Mesh?
    A programmable network?!
    ● ... a bunch of userspace proxies deployed as sidecars next to your services
    ● ... a control plane with management components that manage the proxies and provide an API
    ● ... proxies intercept calls and “do” something with them
    ● … the proxies are Layer 7-aware and act as proxies and reverse proxies

  5. Istio Service Mesh - Architecture

  6. Connecting Services within the Mesh
    ● All service pods are given an Envoy proxy as a sidecar
    container. Together, these form the Data Plane.
    ● All communications occur through these proxies.
    ● This creates a mesh of communication that has full visibility
    and control of all traffic.
    ● The proxies - and thus the mesh - are configured and
    managed by a central Control Plane.
    [Diagram: Services A, B and C, each with an Envoy proxy sidecar, all managed by a central Control Plane]

  7. What is a Service Mesh?
    An Abstraction of Microservice Connectivity
    ● ... a strict control mechanism over the communication of a set of microservices
    ● ... a ‘firewall’ and a ‘router’ for incoming requests
    ● ... it is completely abstracted and invisible to the microservices themselves
    ● … it helps transition monolithic applications to a distributed microservice architecture

  8. Key Capabilities
    of a Service Mesh

  9. Service Mesh - Key capabilities
    ● Traffic Management
    ○ Control the flow of traffic and API calls between services
    ○ Make calls more reliable
    ○ Make the network more robust in the face of adverse conditions
    ○ Give applications greater flexibility for deployment
    ● Observability
    ○ Understand the dependencies between services
    ○ Identify the nature and flow of traffic between those services
    ○ Quickly identify issues
    ○ Observe and demonstrate traffic flow and communication timing

  10. Service Mesh - Key capabilities
    ● Policy Enforcement
    ○ Apply organizational policy to the interaction between services
    ○ Ensure access policies are enforced and resources are fairly distributed among consumers
    ○ Policy changes are made by configuring the mesh, not by changing application code
    ● Service Identity and Security
    ○ Provide services in the mesh with a verifiable identity
    ○ Protect service traffic as it flows over networks of varying degrees of trust
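    To make these bullets concrete, a hedged sketch of such a policy using Istio's AuthorizationPolicy API (project, workload and service account names are hypothetical):
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: allow-service-a
      namespace: foo                 # hypothetical mesh member project
    spec:
      selector:
        matchLabels:
          app: service-b             # hypothetical workload to protect
      action: ALLOW
      rules:
      - from:
        - source:
            principals: ["cluster.local/ns/foo/sa/service-a"]   # only Service A's mesh identity may call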

  11. Why do I need Service Mesh?

  12. Developing Microservices
    A Common Pattern
    ● A common pattern when developing microservices.
    ● In Development:
    ○ New services are written.
    ○ They are tested locally - looks good!
    ○ They are tested in a staging cluster - looks good!
    ● Ship it!
    [Diagram: a Gateway routing to Services A, B and C]
  13. Microservice in Production
    A Common Pattern
    [Diagram: the same Gateway and Services A, B and C, now with a failure (X) and question marks on the connections]
    ● In production, things become less predictable:
    ○ Sporadic delays and failures are seen.
    ○ Performance is not as expected.
    ○ Security holes may be discovered.
    ○ Fixes are made, but upgrades cause further issues.
    ● Microservices are distributed systems and troubleshooting
    distributed systems is hard.

  14. The Fallacies of Distributed Computing
    Microservices are Distributed Systems
    [Diagram: Gateway and Services A, B and C with question marks on every connection]
    ● These challenges are a result of the fallacies of distributed
    computing:
    ○ The network is reliable.
    ○ Latency is zero.
    ○ Bandwidth is infinite.
    ○ The network is secure.
    ○ Topology doesn't change.
    ○ There is one administrator.
    ○ Transport cost is zero.
    ○ The network is homogeneous.

  15. Why Service Mesh?
    Solving Microservices Challenges with Code
    ● These challenges are often mitigated with:
    ○ Code to handle failures between services.
    ○ Logs, metrics and traces in source code.
    ○ 3rd party libraries for managing deployments,
    security and more.
    ● A wide range of open source libraries exists to
    manage these challenges (Netflix’s are the best known).
    ● This results in:
    ○ Different solutions in different services.
    ○ Boilerplate code.
    ○ New dependencies to keep up to date.
    [Diagram: every service carries its own traffic management, failure handling, metrics & tracing and security code - and more boilerplate - on top of the container platform]

  16. Why Service Mesh?
    An Abstraction for Microservice Challenges
    ● Service Mesh solves distributed systems
    challenges at a common infrastructure
    layer.
    ● This reduces boilerplate code and
    copy/paste errors across services.
    ● Enforces common policies across all
    services.
    ● Removes the obligation to implement
    cross-cutting concerns from developers.
    [Diagram: without Service Mesh, every service embeds its own traffic management, failure handling, metrics & tracing and security code; with Service Mesh, these concerns move into a shared mesh layer on the container platform]

  17. What type of app is it for?
    “Microservices, Monolithic, Serverless …”

  18. What type of apps?
    ● Microservices? Yes! That is what it’s “made” for
    ● Monolithic? Yes, but …
    ● Serverless? Yes
    ● Jobs? Yes, but …
    ● Event based, Kafka, Message Brokers? Yes, but …
    ● Non containerized? VMs? Bare Metal? Yes

  19. OpenShift Service Mesh and
    how can I get it?

  20. OpenShift Service Mesh
    OpenShift Service Mesh
    ● Based on the upstream Istio.io project, and maintained as the
    downstream Maistra.io project.
    ● Built on the upstream Istio.io, though not the bleeding edge:
    ○ Red Hat performs validation and QA on upstream Istio
    releases to ensure they are ready for production support.
    ○ Fixes and enhancements are contributed to upstream Istio.
    ○ Maistra.io maintains a unique set of features for OpenShift
    Service Mesh customers.
    ○ OpenShift Service Mesh 2.1 is based on Istio 1.9.

  21. OpenShift Service Mesh
    Next generation service management through open source software
    The service mesh - traffic management and control
    User interface for service communication visualisation
    Analytics and timing information for service communication

  22. Management, Monitoring & Observability
    ● OpenShift Service Mesh includes a baked-in stack for
    management, monitoring and observability:
    ○ Kiali, with its topology view can be used to observe,
    manage and troubleshoot the mesh.
    ○ Grafana and Prometheus provide out of the box
    metrics and monitoring for all services.
    ○ Jaeger and ElasticSearch capture distributed
    traces providing a “per request” view for isolating
    bottlenecks between services.

  23. What's New in OpenShift 4.10
    [Diagram: the Red Hat open hybrid cloud platform - Linux and Kubernetes at the base, running on physical, virtual, private cloud, public cloud and edge footprints; Kubernetes cluster services (install, over-the-air updates, networking, ingress, storage, monitoring, log forwarding, registry, authorization, containers, VMs, Operators, Helm); platform, developer, application* and data* services, including service mesh, serverless, builds, CI/CD pipelines, GitOps and distributed tracing; plus multicluster management, cluster security, global registry and cluster data management**]
    * Red Hat OpenShift® includes supported runtimes for popular languages/frameworks/databases. Additional capabilities listed are from the Red Hat Application Services and Red Hat Data Services portfolios.
    ** Disaster recovery, volume and multicloud encryption, key management service, and support for multiple clusters and off-cluster workloads requires OpenShift Data Foundation Advanced

  24. Connect, Secure, Control and Observe Services on OpenShift
    ● A software infrastructure layer between
    Kubernetes and your services for
    managing communications.
    ● Handles common “microservice”
    challenges, so that developers don’t
    have to:
    ○ Security
    ○ Monitoring & Observability
    ○ Application Resilience
    ○ Upgrades, Rollouts & A/B Testing
    ○ And more...
    Product Managers: Jamie Longmuir and Mauricio "Maltron" Leal
    [Diagram: OpenShift Service Mesh (Istio, Jaeger, Kiali, Envoy proxies) layered on OpenShift and Red Hat Enterprise Linux CoreOS, across physical, virtual, private cloud and public cloud footprints]

  25. Installation & Management
    ● OpenShift Service Mesh is Operator driven, installed and
    upgraded via OpenShift’s OperatorHub.
    ● A custom resource called ServiceMeshControlPlane
    is used for configuring control plane components, including:
    ○ Number of replicas (for a highly available Control Plane)
    ○ Resource requests
    ○ Node affinity
    ○ and more...
    ● ServiceMeshMemberRoll and ServiceMeshMember
    resources configure which projects are part of the mesh.
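    As a minimal sketch (resource names per the OSSM 2.x docs; exact fields vary by release, and the project names here are hypothetical), a control plane and member roll might look like:
    apiVersion: maistra.io/v2
    kind: ServiceMeshControlPlane
    metadata:
      name: basic
      namespace: istio-system       # project hosting this mesh's control plane
    spec:
      version: v2.1
      tracing:
        type: Jaeger
      addons:
        kiali:
          enabled: true
        grafana:
          enabled: true
    ---
    apiVersion: maistra.io/v1
    kind: ServiceMeshMemberRoll
    metadata:
      name: default                 # the member roll must be named "default"
      namespace: istio-system
    spec:
      members:                      # projects (namespaces) that join this mesh
      - foo
      - bar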

  26. ● OpenShift Service Mesh provides a multi-tenant
    topology where multiple service meshes are deployed
    within a single OpenShift cluster.
    ● A mesh consists of one or more projects (namespaces).
    ● Each mesh is isolated and managed independently.
    ● Communication between meshes involves configuring
    one or more Gateways, as you would for accessing
    external services.
    [Diagram: two isolated meshes in one cluster - mesh foo.com with its control plane in project foo-istio-system and Services A and B in project foo, and mesh bar.com with its control plane in project bar-istio-system and Services C and D in project bar]
    Multi-Tenant Service Mesh

  27. Service Mesh with OpenShift Routes
    OpenShift Service Mesh
    ● In Service Mesh, an Ingress Gateway is used for
    accessing services within the Mesh.
    ● The Ingress Gateway is a standalone Envoy proxy
    that acts as an entry point into the mesh.
    ● In OpenShift, a route* acts as an entry point into
    the cluster, backed by HAProxy.
    ● OpenShift Service Mesh automatically creates and
    configures routes when Ingress Gateways are
    created.
    * OpenShift also supports Kubernetes Ingress (which was inspired by routes),
    and Red Hat is an active contributor to the next generation of Ingress - the
    Service APIs.
    [Diagram: an OpenShift Route on the cluster network forwarding traffic to the mesh Ingress Gateway, which routes to Services A, B and C]

  28. Security & Compliance
    OpenShift Service Mesh
    ● Reduced Permissions for Service Mesh administration:
    ○ Upstream Istio requires users to have elevated
    privileges to manage a Service Mesh.
    ○ In OpenShift, the Service Mesh Operator performs
    privileged operations on behalf of individual mesh
    installations.
    ○ Control Plane and Data Plane components require no
    elevated permissions to be granted to users.
    ○ Service Mesh components only have visibility within
    their mesh namespaces.
    ○ This reduces the level of permissions required to
    manage a service mesh, controlled using Kubernetes
    RBAC.
    [Diagram: a cluster admin manages the Service Mesh Operator, which provisions multiple isolated meshes in the cluster, each administered by its own mesh users]

  29. Security & Compliance
    OpenShift Service Mesh
    ● OpenSSL Encryption:
    ○ OpenShift Service Mesh uses RHEL’s OpenSSL
    library in place of the BoringSSL library used by
    upstream Istio.
    ○ OpenSSL is the standard cryptographic library
    within Red Hat, supported by the RHEL team at
    Red Hat.
    ○ Facilitates FIPS compliance, taking advantage
    of the OpenSSL FIPS Object Module

  30. Service Mesh with API Management
    OpenShift Service Mesh
    ● 3Scale is Red Hat’s API Management solution that
    makes it easy to share, secure, distribute, control and
    monetize your APIs.
    ● Available both as a hosted SaaS offering and on
    premises.
    ● 3Scale integrates directly with OpenShift Service
    Mesh:
    ○ As of 2.0, this integration uses Istio’s Mixer
    component (deprecated)
    ○ As of Service Mesh 2.1, this will use a
    WebAssembly Extension plugin.

  31. Difference to Upstream Istio?

  32. OpenShift Service Mesh vs Istio
    Additions for Red Hat’s Enterprise & Public Sector Customers
    ● OpenShift Integrations - integrations with OpenShift
    components such as OperatorHub, OpenShift Routes and
    3Scale API Management.
    ● Multi-Tenant Architecture - multiple meshes securely
    deployed within the same cluster, with each mesh isolated
    and managed independently.
    ● Management, Monitoring & Observability - pre-configured
    Kiali, Jaeger and Grafana for simplified management,
    monitoring and observability.
    ● Security & Compliance Focus - Control Plane and Data
    Plane components execute with standard privileges;
    OpenSSL for FIPS compliance.

  33. What is the difference between Istio and OSSM?
    ○ For automatic injection, we use annotation on the deployment instead of the
    namespace label that upstream uses. All services to be included in the mesh must have
    these annotations.
    ○ We add network policies which change the network behavior - restricting traffic from
    outside of the mesh, and opening traffic inside the mesh (to be managed by Mesh
    policies). This feature can optionally be disabled.
    ○ We replaced MeshPolicy with ServiceMeshPolicy and ClusterRbacConfig with
    ServiceMeshRbacConfig.
    ○ We are multi-tenant by default. We have a servicemesh member roll/service mesh
    control plane that would need to be configured with the projects that are to be included
    in the mesh. We are exploring a cluster-wide installation option similar to upstream for
    2.3/2.4.
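    As an illustrative sketch of that injection annotation (the workload and image names are hypothetical):
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service              # hypothetical workload
      namespace: foo                # must be listed in the ServiceMeshMemberRoll
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"   # OSSM injects per workload, not per namespace label
          labels:
            app: my-service
        spec:
          containers:
          - name: my-service
            image: quay.io/example/my-service:latest   # hypothetical image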

  34. Service Mesh Scenarios
    “Connect, secure, observe and control traffic”

  35. Service Mesh in operation
    [Diagram: without service mesh, an external user reaches the application container through a Route and the service FQDN; with service mesh, traffic flows through a Gateway (G), Virtual Service (VS) and Destination Rule (DR) to a sidecar-proxied application container, all configured by Istiod]

  36. Gateway and virtual service
    ● Gateway
    ○ Ingress controller provided as part of the Istio control plane
    ○ Standalone Envoy proxy at the edge of the mesh
    ○ Load balancing for incoming traffic (layer 4 - 6)
    ● Virtual Service
    ○ Configuration of routing requirements
    ○ Operates alongside destination rules
    ○ Distributed to the sidecar proxy containers by istiod (control plane)
    ○ Fine grained control of traffic management
    [Diagram: the OSI stack from Layer 1 (physical) through Layer 7 (application/HTTP), with the Gateway operating at layers 4-6 and the Virtual Service at layer 7]

  37. Virtual service example - Traffic control
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: layer2-a
    spec:
      hosts:
      - layer2-a
      http:
      - match:
        - uri:
            prefix: /call-layers
        - uri:
            exact: /get-info
        route:
        - destination:
            host: layer2-a
            port:
              number: 8080
            subset: inst-1
          weight: 80
        - destination:
            host: layer2-a
            port:
              number: 8080
            subset: inst-2
          weight: 20
        timeout: 1.500s
    ● HTTP URI matching using ‘prefix’ and ‘exact’
    ○ ‘prefix’ is required for URIs with query parameters, for example /call-layers?key=value
    ● Route destination rules:
    ○ 80% to layer2-a inst-1
    ○ 20% to layer2-a inst-2
    ● Timeout of 1.5 seconds, after which communication is abandoned

  38. Releasing Services
    ● Controlling all communications allows for fine-grained traffic
    control between services without source code changes or
    restarting services.
    ● Create multiple “subsets” of a service (e.g. different versions)
    to enable:
    ○ Canary Deployments - apply a small amount of traffic
    to a new subset using weights.
    ○ A/B Testing - Apply a fraction of traffic to a different
    service using weights.
    ○ Mirrored Launches - duplicate live traffic loads across
    services to see how a new service handles the real
    world.
    ○ Header based routing.
    [Diagram: the Control Plane and Ingress & Egress gateways routing traffic to Services B and C and splitting it between Service A v1 and Service A v2]
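    A sketch of a mirrored launch, using hypothetical service and subset names (the subsets would be defined in a companion DestinationRule):
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: service-a
    spec:
      hosts:
      - service-a
      http:
      - route:
        - destination:
            host: service-a
            subset: v1
          weight: 100            # v1 serves all live traffic
        mirror:
          host: service-a
          subset: v2             # v2 receives a copy of each request
        mirrorPercentage:
          value: 100.0           # mirror everything; mirrored responses are discarded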

  39. Securing Services
    ● As all communication is via the proxies, we can enforce and
    manage security policies across services without source
    code changes.
    ○ Enforce the use of mTLS encryption across all
    services.
    ○ Authenticate requests using JSON Web Token (JWT)
    validation.
    ○ Define service-to-service and user-to-service
    authorization policies.
    ■ Facilitate Zero-trust networking.
    ○ Secure the Service Mesh Control plane with RBAC
    policies.
    [Diagram: Control Plane, Ingress & Egress gateways and Services A, B and C, with security policies enforced at each proxy]
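    For instance, a hedged sketch of JWT validation with Istio's RequestAuthentication (issuer, JWKS URL and labels are hypothetical):
    apiVersion: security.istio.io/v1beta1
    kind: RequestAuthentication
    metadata:
      name: jwt-auth
      namespace: foo                 # hypothetical mesh member project
    spec:
      selector:
        matchLabels:
          app: service-b             # hypothetical workload label
      jwtRules:
      - issuer: "https://issuer.example.com"                          # hypothetical issuer
        jwksUri: "https://issuer.example.com/.well-known/jwks.json"   # keys used to validate tokens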

  40. Monitoring & Observing Services
    ● As communication is via proxies, we have full visibility of all
    traffic without source code changes*.
    ○ Metrics and dashboards - request volumes, duration,
    success/failure rates, etc.
    ○ Distributed Tracing - identify bottlenecks in slow
    request paths.
    ○ OpenShift Service Mesh includes Kiali, Jaeger,
    Grafana, Prometheus, ElasticSearch.
    *Trace context propagation within services will require minor code changes.
    [Diagram: Control Plane, Ingress & Egress gateways and Services A, B and C, with telemetry collected at every proxy]

  41. Securing Services - mTLS encryption
    ● Enforce and manage security policies across services without source code changes.
    POD
    SERVICE
    A
    ENVOY
    POD
    SERVICE
    B
    ENVOY
    POD
    SERVICE
    C
    ENVOY
    TLS TLS
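    A minimal sketch of enforcing strict mTLS for one mesh project (the namespace name is hypothetical):
    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: foo        # hypothetical mesh member project
    spec:
      mtls:
        mode: STRICT        # reject any plain-text traffic to workloads in this project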

  42. Building Resilient Services
    ● Timeout
    ○ Allow more time for the application to respond before the
    Envoy proxy abandons the request
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: layer2
    spec:
      hosts:
      - layer2
      http:
      - match:
        - uri:
            exact: /call-layers
        route:
        - destination:
            host: layer2
            port:
              number: 8080
            subset: v1
        retries:
          attempts: 5
          perTryTimeout: 10s
    [Diagram: Service A’s Envoy calls Service B with a 10 second timeout and 5 retries; Service B’s Envoy calls Service C with a 15 second timeout and 5 retries]

  43. Testing Service Resilience
    ● Circuit Breaker configuration:
    ○ Set threshold limits beyond which the circuit breaker "trips"
    ○ Further traffic is prevented by the service mesh
    ○ Build resilience into applications
    ○ Provide a microservice with the time required to recover
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: userprofile
    spec:
      host: userprofile
      subsets:
      - name: v3
        labels:
          version: '3.0'
        trafficPolicy:
          connectionPool:
            http:
              http1MaxPendingRequests: 1
              maxRequestsPerConnection: 1
          outlierDetection:
            consecutiveErrors: 1
            interval: 1s
            baseEjectionTime: 10m
            maxEjectionPercent: 100
    [Diagram: pods for Services A, B and C with Envoy sidecars; the circuit breaker trips at the proxy in front of the failing service]

  44. Testing Service Resilience
    ● Fault injection provides the ability to validate how
    services will perform when failures inevitably occur.
    ○ Examples:
    ■ 50% of requests to a service will fail with error code
    503.
    ○ Enables chaos engineering.
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: userprofile
    spec:
      hosts:
      - userprofile
      http:
      - fault:
          abort:
            httpStatus: 503
            percentage:
              value: 50
        route:
        - destination:
            host: userprofile
            subset: v3
    [Diagram: pods for Services A, B and C with Envoy sidecars; the proxy returns a 503 status in 50% of responses]

  45. Testing Service Resilience
    ● Fault injection provides the ability to validate how
    services will perform when failures inevitably occur.
    ○ Examples:
    ■ 20% of requests will be delayed by 5 seconds.
    ○ Enables chaos engineering.
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: userprofile
    spec:
      hosts:
      - userprofile
      http:
      - fault:
          delay:
            fixedDelay: 5s
            percentage:
              value: 20
        route:
        - destination:
            host: userprofile
            subset: v3
    [Diagram: pods for Services A, B and C with Envoy sidecars; the proxy injects a 5 second delay into 20% of requests]

  46. Extending Service Mesh
    ● The power of Envoy is that it is highly extensible with WebAssembly
    Extensions.
    ● WebAssembly is a format that allows extensions to be written in
    more than 15 programming languages.
    ● This will allow mesh operators to incorporate custom cross-cutting
    functionality at the proxy level.
    [Diagram: Control Plane, Ingress & Egress gateways and mesh services, with custom WebAssembly extensions running in each proxy]
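    As an illustrative sketch using the upstream Istio WasmPlugin API (introduced with Istio 1.12, the base of Service Mesh 2.2; the image URL and labels are hypothetical):
    apiVersion: extensions.istio.io/v1alpha1
    kind: WasmPlugin
    metadata:
      name: custom-filter
      namespace: foo                                    # hypothetical mesh member project
    spec:
      selector:
        matchLabels:
          app: service-a                                # hypothetical workload label
      url: oci://quay.io/example/custom-filter:latest   # hypothetical OCI image holding the Wasm module
      phase: AUTHN                                      # run before Istio's authentication filters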

  47. Service Mesh Across Clusters
    OpenShift Service Mesh 2.1

  48. Upstream Istio offers multiple deployment models for multicluster service meshes.
    ● Multi-Primary, Primary-Remote, External Control Plane, etc.
    ● These assume that the mesh and cluster admins are part of the same administrative
    boundary - no multi-tenancy.
    ● These topologies connect the IstioD control planes and the Kubernetes API servers
    of all involved meshes and clusters - a security risk.
    Istio Multicluster Topologies
    [Diagram: Istiod control planes A, B and C interconnected with the Kubernetes API servers of all three clusters]

  49. ● OpenShift Service Mesh’s Multi-Cluster Strategy aims to put security first and
    support multi-tenant environments.
    ● Multi-Tenant support has been a pillar of OpenShift Service Mesh from day 1.
    ● We have divided our multi-cluster approach into two categories:
    ○ Service Mesh Federation - securely connecting distinct service meshes
    across multiple clusters to enable sharing, load balancing and failover
    scenarios.
    ○ Multi-cluster Service Mesh - a single service mesh stretched across multiple
    clusters managed by a single control plane.
    OpenShift Service Mesh Multi-Cluster

  50. Service Mesh Federation - Topology
    ● Each mesh remains distinct with its own
    control plane.
    ● Federated meshes may be in the same or
    different OpenShift clusters.
    ● All traffic between meshes is via
    configurable Ingress/Egress Gateways
    ○ Connectivity between Gateways is a
    prerequisite
    ● For multi-cluster, there is no need to
    connect with the Kubernetes API
    server.
    [Diagram: two federated meshes, each with its own Istiod control plane and services, connected only through their gateways]

  51. Service Mesh Federation - Administration
    ● Provides a “need to know” model for
    multi-cluster service mesh
    ● Decisions around exposing services
    between meshes are delegated to
    mesh administrators.
    ● Services must explicitly be configured to
    be exported and visible to other meshes.
    ○ Including configuring trust domains
    between meshes.
    [Diagram: Service Mesh foo.com (Services A and B) and Service Mesh bar.com (Services C and D), each with its own Istiod control plane, connected through their gateways]

  52. Federation: New Configuration
    ● ServiceMeshPeer - meshes are federated in
    pairs, and a ServiceMeshPeer is configured for
    each side of the pair.
    ○ Gateway Configuration
    ○ Root Trust Configuration
    ● ExportedServiceSet - configures which
    services to export for a given federation.
    ● ImportedServiceSet - configures which
    services to import from a given federation
    [Diagram: federated meshes foo.com and bar.com exchanging exported and imported services through their gateways]
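    A hedged sketch of the export side, following the Service Mesh 2.1 federation resources (mesh, project and service names are hypothetical):
    apiVersion: federation.maistra.io/v1
    kind: ExportedServiceSet
    metadata:
      name: bar-mesh                  # matches the ServiceMeshPeer name
      namespace: foo-istio-system     # control plane project of the exporting mesh
    spec:
      exportRules:
      - type: NameSelector
        nameSelector:
          namespace: foo
          name: service-a             # hypothetical service exposed to the peer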

  53. Federation: Imported Services
    ● Once a service has been imported, it can be managed as
    if it were a local service.
    ● By default, services will be identified by the remote
    mesh trust zone and namespace.
    ● Policies can then be created using the remote service
    identity, for example:
    ○ Authorization Policies
    ○ mTLS encryption
    ○ Routing rules for canary deployments, a/b testing,
    etc.
    ○ Observability (to the egress only)
    ○ Resilience & testing - timeouts, retries, circuit
    breakers, fault injection, etc.
    [Diagram: Service Mesh foo.com (Services A and B, Istiod control plane) consuming an imported Service C identified under the bar.com trust domain and namespace]

  54. Federation: Multi-Mesh Services
    ● Services can be configured to be imported as
    if they were actually local services using
    importAsLocal.
    ● If the service already exists, the endpoints of
    both services will be aggregated together
    as a single service.
    ● This can be used to load balance traffic
    between meshes and mitigate local failures
    with available remote endpoints.
    [Diagram: Service Mesh foo.com where local Service A and imported Service A’ are aggregated behind a single service]
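    A matching sketch of the import side with importAsLocal (again hypothetical names; field layout per the Service Mesh 2.1 federation docs):
    apiVersion: federation.maistra.io/v1
    kind: ImportedServiceSet
    metadata:
      name: foo-mesh                  # matches the ServiceMeshPeer name
      namespace: bar-istio-system     # control plane project of the importing mesh
    spec:
      importRules:
      - type: NameSelector
        importAsLocal: true           # aggregate imported endpoints with the local service
        nameSelector:
          namespace: foo
          name: service-a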

  55. Service Mesh Federation in Kiali
    ● Kiali will display the local meshes as well as
    services imported from other meshes.
    ● Federated meshes may be in the same or
    different OpenShift clusters.
    ● Services in different namespaces and
    clusters are given different boxes.

  56. Service Mesh Personas?
    “Who is using the Mesh?”

  57. Service Mesh Personas
    OpenShift Service Mesh
    ● “Need to know” permissions for administration:
    ○ Cluster Admins:
    ■ Manage clusters and infrastructure.
    ■ Often central ops/infra group.
    ○ Mesh Admin(s):
    ■ Manage one or more Service Mesh(es) -
    application connectivity and security within
    the mesh.
    ■ Does not require cluster admin.
    ○ Service Admin(s):
    ■ Responsible for one or more services, though
    may not manage service mesh resources.
    [Diagram: cluster admin(s) manage the Service Mesh Operator and the cluster; mesh admin(s) manage their own meshes; service admin(s) manage services within a mesh]

  58. ● Software engineer, focus on business logic
    ● Software Architect / Platform architect
    ● Software Architect / Platform architect not using Kubernetes but
    Microservices
    ● Platform Administrator
    ● Cluster Administrator
    Personas

  59. What is the Overhead?
    “Proxies and a Control Plane”

  60. ● The Envoy proxy uses 0.5 vCPU and 50 MB memory per 1000
    requests per second going through the proxy.
    ● Istiod uses 1 vCPU and 1.5 GB of memory.
    ● The Envoy proxy adds 3.12 ms to the 90th percentile latency.
    https://istio.io/latest/docs/ops/deployment/performance-and-scalability/
    Load test (1,000 services, 2,000 sidecars)

  61. CPU consumption scales with the following factors:
    ● The rate of deployment changes.
    ● The rate of configuration changes.
    ● The number of proxies connecting to Istiod.
    Control plane performance

  62. ● Number of client connections
    ● Target request rate
    ● Request size and response size
    ● Number of proxy worker threads
    ● Protocol
    ● CPU cores (proxy 0.6 vCPU per 1000 requests per second)
    ● Number and types of proxy filters, specifically telemetry v2 related
    filters. (A large number of listeners, clusters, and routes can
    increase memory usage.)
    ● Inside the mesh, a request traverses the client-side proxy and then
    the server-side proxy. In the default configuration of Istio 1.6.8 (that
    is, Istio with telemetry v2), the two proxies add about 3.12 ms and
    3.13 ms to the 90th and 99th percentile latency, respectively, over
    the baseline data plane latency.
    Data plane performance

  63. P90 latency vs client connections (1.13)

  64. P99 latency vs client connections (1.13)

  65. Roadmap & FAQ

  66. What's New in OpenShift 4.10
    OpenShift Service Mesh
    ▸ OpenShift Service Mesh 2.2 (ETA: April 2022) will be
    based on Istio 1.12 and Kiali 1.47+.
    ▸ Istio 1.12 introduces the WasmPlugin API, which will
    deprecate the ServiceMeshExtensions API introduced
    in 2.0.
    ▸ Service Mesh 2.1.1+ and 2.2 allow users to override and
    customize Kubernetes NetworkPolicy creation.
    ▸ Kiali updates in Service Mesh 2.2:
    ▸ Enhancements to improve viewing and navigating
    large service meshes
    ▸ View internal certificate information
    ▸ Set Envoy proxy log levels
    ▸ New Service Mesh Federation demo

  67. Is OpenShift Service Mesh FIPS compliant?
    OpenShift Service Mesh is FIPS compliant and supported on FIPS-enabled OpenShift
    clusters. OpenShift Service Mesh achieves FIPS compliance by ensuring that all encryption is
    performed (via dynamic linking) using the FIPS-validated OpenSSL module:
    https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/3781
    Note that as newer versions of RHEL are released, newer OpenSSL modules will need to go
    through NIST's extensive validation process, which can take up to 16 months.
    Thus, there may occasionally be lag between the latest version of OpenSSL being used with
    service mesh and full FIPS validation of the module.

  68. When will OSSM support IPv6 Dual Stack?
    ○ Q1 2022 Status: We understand that IPv6 and dual stack are particularly desirable for large
    meshes and in particular telco use cases.
    ○ Upstream Istio provides “alpha” support for IPv6 on Kubernetes, but not dual-stack. To
    support Service Mesh with IPv6, we will need to be able to validate and document
    Service Mesh on an OpenShift cluster with IPv6 configured.
    ○ As of OCP 4.10, IPv6/dual-stack is supported on installer-provisioned bare metal clusters
    with OVN-Kubernetes. We will explore this feature in the second half of 2022.

  69. Federation on ARO, ROSA, OSD, etc.
    ● When will Service Mesh federation be supported on managed OpenShift environments such as ARO,
    ROSA, OSD, etc.?
    ○ Q1 2022 Status: To date, the service mesh team has not been able to test federation across
    these environments, nor do we have a time frame for doing so. The main area of concern is the
    configuration of the load balancers attached to the federation ingress gateways. Those need
    to be able to support raw TLS traffic. Obviously our aim is to support federation between
    meshes in any OpenShift environment though. If a customer wants to federate meshes across
    managed OpenShift clusters, they should proceed to attempt it and document the steps they
    took to get there. These notes could then be helpful for us to build out our own docs and
    sufficient QE testing to declare support across these environments. This is not an issue that is
    currently scheduled, and customer requests and field assistance will be needed to drive this
    forward.
    ● Product Issue: https://issues.redhat.com/browse/OSSM-693

  70. Single Control Plane Multi-cluster Topology
    ● When will Service Mesh support a single control plane managing a multi-cluster data plane?
    ○ Q1 2022 Status: Service Mesh 2.1 introduced federation of meshes across clusters, but does not
    provide a central control plane for managing data planes across clusters. Upstream Istio
    provides this functionality by opening communication between clusters’ Kubernetes API
    servers - creating a significant security opening, unless all of the involved clusters are trusted as
    part of the same admin domain. This is not something we are able to facilitate in the context of
    a multi-tenant service mesh. Red Hat’s Advanced Cluster Management has begun work on a
    solution for managing multiple federated meshes from a single control plane, and this may in
    the future evolve into support for a multi-cluster service mesh between trusted clusters (i.e.
    single tenant). ACM will be the channel for OpenShift’s multi-cluster service mesh
    management. For a single control plane, multi-tenant, multi-cluster mesh solution, customers
    may need to consider a partner solution - such as Solo.io’s Gloo Mesh or Kong’s service mesh.
    ● Product Issue:

  71. Roadmap
    Product Manager: Jamie Longmuir
    Near Term - 2022 Q1
    OSSM 2.2
    Mid Term - 2022 Q2
    OSSM 2.3
    Long Term - 2022 Q3+
    OSSM 2.4+
    ● Continue to evolve federation for
    multi-cluster service mesh use cases
    ● More flexible integration with Network Policies
    ● Internal improvements to increase release
    cadence - keeping closer to upstream Istio.
    ● Kiali enhancements for large meshes and
    federation
    ● OpenShift Console multi-cluster mesh admin
    ● Support (unmanaged) Service Mesh on Red
    Hat OpenShift on AWS (ROSA)
    ● Support for Service Mesh with OpenShift
    Virtualization (OCP 4.10)
    ● Update to Istio 1.12
    ● Continue to evolve Federation for multi-cluster
    service mesh use cases
    ● Cluster-wide installation option
    ● Service Mesh troubleshooting guide with Kiali
    ● Kiali enhancements for managing and validating
    federated Service Meshes.
    ● Service Mesh on external VMs
    ● IPv6 support
    ● Multi-cluster service mesh with ACM
    ● Continue to optimize performance and
    scalability of Istio and Envoy
    ● Kiali support for centralized multi-cluster service
    mesh
    ● Enhanced CLI support for Service Mesh
    ● Support Service Mesh and OpenShift
    Multi-Cluster management
    ● Gateway API (Ingress v2)
    ● Keep within 1 release of latest Istio
    ● Update to Istio 1.14+

  72. Service Mesh 2.2
    ● Upgrade Istio to 1.12
    ● Internal enhancements to stay closer to upstream Istio over time
    ○ Release OSSM at most 2 releases behind Istio
    ● Minor customer driven feature enhancements
    ● Target: Late Q1 2022

  73. Service Mesh 2.3
    ● Upgrade Istio to 1.14+
    ● Candidate features:
    ○ Service Mesh on external VMs
    ○ Additional multi-cluster use cases
    ● Target: Late Q2 2022

  74. Service Mesh with External VMs
    ● Sometimes there are services outside of
    Kubernetes that you want to include in the
    Service Mesh.
    ● Examples:
    ○ Legacy services running on VMs or bare
    metal.
    ○ External datastores.
    ● Including these services in the mesh can
    provide the same security, observability and
    traffic management features as services
    within the cluster.
    [Diagram: Service Mesh foo.com (Services A and B plus Control Plane) inside an OpenShift cluster, extended to Service C running on a legacy virtual machine]

  75. Late 2022: Additional Multi-Cluster Use Cases
    ● Additional use cases - such as a central
    logical control plane to manage a service
    mesh dataplane across multiple clusters.
    ● In conjunction with OpenShift’s Advanced
    Cluster Management (ACM).
    [Diagram: a single logical Service Mesh control plane managing mesh foo.com (Services A, B, C and D) stretched across Cluster 1 and Cluster 2]

  76. linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHat
    Thank You

  77. Backup Slides

  78. Connecting Services Outside the Mesh
    ● External communication occurs via Gateway proxies, which
    are also part of the mesh.
    ● Ingress Gateways manage traffic entering the mesh.
    ○ An alternative to Kubernetes Ingress, with additional
    mesh features.
    ● Egress Gateways manage traffic exiting the mesh.
    ○ Can require all external services to be registered.
    ● On OpenShift, Service Mesh Ingress Gateways can be used
    in conjunction with an OpenShift route or on their own.
    [Diagram: Envoy gateway(s) handling ingress & egress for Services A, B and C, each with an Envoy proxy sidecar, under a central Control Plane]
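    For example, a sketch of registering an external service with an Istio ServiceEntry (the hostname is hypothetical):
    apiVersion: networking.istio.io/v1alpha3
    kind: ServiceEntry
    metadata:
      name: external-api
    spec:
      hosts:
      - api.example.com        # hypothetical external service
      location: MESH_EXTERNAL  # the endpoint lives outside the mesh
      ports:
      - number: 443
        name: https
        protocol: TLS
      resolution: DNS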

  79. Gateway
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: layer1-gateway
    spec:
      selector:
        istio: ingressgateway   # use istio default controller
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - "*"
    ● Deployed to the Ingress envoy proxy pod within the istio control plane
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        openshift.io/scc: restricted
      labels:
        app: istio-ingressgateway
        istio: ingressgateway
      name: istio-ingressgateway-6f8cf6c85f-5989d
      namespace: istio-system
    ● Listening on port 80 for HTTP traffic
    ● Accepting connections from any host
    ● Access can be restricted to specific hosts with a configuration such as:
        hosts:
        - myserver1.com
        - anotherserver.co.uk

  80. Virtual Service
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: layers
    spec:
      hosts:
      - "*"
      gateways:
      - layer1-gateway
      http:
      - route:
        - destination:
            host: layer1
            port:
              number: 8080
    ● Defines a set of routing rules applied to a specific host
    ● Associated with a gateway into which the rules are applied
    ● spec.hosts - the destination host to which traffic is being sent
    ○ Matches the hosts specification of the gateway when a gateway is
    referenced
    ● Directs traffic to the identified Kubernetes destination service
    ○ host: layer1
    ● The http section may contain matching rules for the desired conditions:
    ○ URI match
    ○ Timeout / retry / redirect / fault injection
    ○ URI rewrites

  81. Virtual service and Destination rule
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: layer2-a
    spec:
      hosts:
      - layer2-a
      http:
      - route:
        - destination:
            host: layer2-a
            port:
              number: 8080
            subset: inst-1
          weight: 80
        - destination:
            host: layer2-a
            port:
              number: 8080
            subset: inst-2
          weight: 20
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: layer2-a
    spec:
      host: layer2-a
      subsets:
      - name: inst-1
        labels:
          instance: instance1
      - name: inst-2
        labels:
          instance: instance2
    apiVersion: v1
    kind: Service
    metadata:
      name: layer2-a
      labels:
        app: layer2-a
    spec:
      ports:
      - name: http
        port: 8080
      selector:
        app: layer2-a
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deployment-1-inst1
      labels:
        app: layer2-a
        instance: instance1
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deployment-1-inst2
      labels:
        app: layer2-a
        instance: instance2
    [Diagram: the UI service sends 80% of traffic to subset inst-1 (instance 1) and 20% to subset inst-2 (instance 2), as configured by the Virtual Service (VS) and Destination Rule (DR)]

  82. Kiali service visualisation
    ● Graphical representation of service connectivity
    ● Display options
    ○ Applications, services, versioned applications, workload (no application grouping)
    ○ Label connectors by percentage of traffic, requests per second or response time
    ○ Annotation options
    ○ Find / hide content
    ○ Wide range of query logic

  83. Kiali service visualisation

  84. Kiali visualisation of mesh resources
    ● View resources
    ○ Virtual services
    ○ Gateway
    ○ Destination rules
    ● Analysis of errors and helpful
    annotation

  85. Jaeger analytics
