
Considerations for running Distributed SQL Databases on Kubernetes


Kubernetes has hit a home run for stateless workloads, but can it do the same for stateful services such as distributed databases? Before we can answer that question, we need to understand the challenges of running stateful workloads on, well, anything. In this talk, we will first look at which stateful workloads, specifically databases, are ideal for running inside Kubernetes. Secondly, we will explore the various concerns around running databases in Kubernetes for production environments, such as:

- The production-readiness of Kubernetes for stateful workloads in general
- The pros and cons of the various deployment architectures
- The failure characteristics of a distributed database inside containers

In this session, we will demonstrate what Kubernetes brings to the table for stateful workloads and what database servers must provide to fit the Kubernetes model. This talk will also highlight some of the modern databases that take full advantage of Kubernetes and offer a peek into what's possible if stateful services can meet Kubernetes halfway. We will go into the details of deployment choices, how the different cloud-vendor managed container offerings differ in what they offer, and compare the performance and failure characteristics of a Kubernetes-based deployment with an equivalent VM-based deployment.

- How different kinds of databases work on Kubernetes
- The production-readiness of Kubernetes for stateful workloads in general
- The pros and cons of the various deployment architectures
- The failure characteristics of a distributed database inside containers

AMEY BANARSE

August 03, 2021

Transcript

  1. YugabyteDB – Distributed SQL Database on Kubernetes
     Amey Banarse
     VP of Data Engineering, Yugabyte, Inc.
     Aug 3rd, 2021

  2. Introductions
     Amey Banarse
     VP of Data Engineering, Yugabyte
     Pivotal • FINRA • NYSE
     University of Pennsylvania (UPenn)
     @ameybanarse
     about.me/amey

  3. Kubernetes Is Massively Popular in Fortune 500s
     ● Walmart – Edge Computing
       KubeCon 2019: https://www.youtube.com/watch?v=sfPFrvDvdlk
     ● Target – Data @ Edge
       https://tech.target.com/2018/08/08/running-cassandra-in-kubernetes-across-1800-stores.html
     ● eBay – Platform Modernization
       https://www.ebayinc.com/stories/news/ebay-builds-own-servers-intends-to-open-source/

  4. The State of Kubernetes 2020
     ● Substantial growth in large enterprises
       ○ Being used in production environments
     ● On-premises deployments are still the most common
     ● There are pain points, but most developers and executives feel the k8s investment is worth it

  5. Data on K8s Ecosystem Is Evolving Rapidly

  6. Why run a DB in K8s?

  7. Better resource utilization
     ● Reduce cost with better packing of DBs
     ● Useful when running a large number of DBs
       ○ Multi-tenant applications with a DB per tenant
       ○ Self-service private DBaaS
     ● But watch out for noisy neighbors
       ○ Perf issues when running critical production workloads
     [Diagram: DB pods packed across Node #1, Node #2, Node #3]

  8. Resize pod resources dynamically
     ● Dynamically change CPU and memory
     ● Embrace automation – resize without incurring downtime
       ○ Scale DB with workload
       ○ Automate to scale up automatically
     $ kubectl apply -f cpu-request-limit.yaml
     $ kubectl apply -f memory-request-limit.yaml
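     For illustration, a minimal version of a file like cpu-request-limit.yaml might look like the sketch below; the pod name, image tag, and values are assumptions, not taken from the deck. With a StatefulSet, changing these values in the pod template triggers a rolling restart, which a replicated DB can absorb without downtime.

     # cpu-request-limit.yaml – illustrative contents only
     apiVersion: v1
     kind: Pod
     metadata:
       name: yb-tserver-sample            # hypothetical pod name
     spec:
       containers:
       - name: yb-tserver
         image: yugabytedb/yugabyte:latest   # hypothetical image tag
         resources:
           requests:
             cpu: "2"          # what the scheduler guarantees to the pod
             memory: 4Gi
           limits:
             cpu: "4"          # hard ceiling enforced by the kubelet
             memory: 8Gi

     $ kubectl apply -f cpu-request-limit.yaml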

  9. Portability between clouds and on-premises
     ● Infrastructure as code
     ● Works in a similar fashion on any cloud
       ○ Cloud-provider managed k8s (AKS, EKS, GKE)
       ○ Self-managed k8s (public/private cloud)
     ● But not perfectly portable
       ○ Need to understand some cloud-specific constructs (example: volume types, load balancers)

  10. Out-of-box infrastructure orchestration
      ● Pods that fail are automatically restarted
      ● Pods are rescheduled across nodes in the cluster
        ○ Optimal resource utilization
        ○ Specify policies in code (example: anti-affinity, see the sketch below)
      ● Loss of some flexibility
        ○ Cannot make permanent changes on pods
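      As a sketch of the anti-affinity policy mentioned above (label names and image are assumptions), a StatefulSet can ask the scheduler to spread its pods across nodes:

      # anti-affinity sketch: prefer one yb-tserver pod per node
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: yb-tserver
      spec:
        serviceName: yb-tservers
        replicas: 3
        selector:
          matchLabels:
            app: yb-tserver
        template:
          metadata:
            labels:
              app: yb-tserver
          spec:
            affinity:
              podAntiAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        app: yb-tserver
                    topologyKey: kubernetes.io/hostname   # spread across nodes
            containers:
            - name: yb-tserver
              image: yugabytedb/yugabyte:latest           # hypothetical image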

  11. Automating day 2 operations
      ● Robust automation with CRDs (Custom Resource Definitions), commonly referred to as a 'K8s Operator'
      ● Easy to build an operator for ops (see the sketch below)
        ○ Periodic backups
        ○ DB software upgrades
      ● Automating failover of a traditional RDBMS can be dangerous
        ○ Potential for data loss?
        ○ Mitigation: use a distributed DB
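      A full operator is out of scope here, but even a plain CronJob conveys the idea of automating periodic backups. The image and backup command below are placeholders, not the deck's actual tooling:

      # illustrative nightly backup job; a real K8s operator would reconcile
      # a custom resource (CRD) instead of a static CronJob
      apiVersion: batch/v1
      kind: CronJob
      metadata:
        name: yb-nightly-backup
      spec:
        schedule: "0 2 * * *"            # every night at 02:00
        jobTemplate:
          spec:
            template:
              spec:
                restartPolicy: OnFailure
                containers:
                - name: backup
                  image: yugabytedb/yugabyte:latest            # hypothetical image
                  command: ["/bin/sh", "-c", "/scripts/run-backup.sh"]   # placeholder script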

  12. Why NOT run a DB in K8s?

  13. Greater chance of pod failures
      ● Pods fail more often than VMs or bare metal
      ● Many reasons for the increased failure rate
        ○ Process failures – config issues or bugs
        ○ Out of memory and the OOM killer
        ○ Transparent rescheduling of pods
      ● Will pod failures cause disruption of the service or data loss?
        ○ Mitigation: use a distributed DB
      [Diagram: a pod on one of three nodes fails – data loss likely if the pod used local storage]

  14. Local vs persistent storage
      ● Local storage = use a local disk on the node
        ○ Not replicated, but higher performance
        ○ Data not present in the new pod location
      ● Persistent storage = use replicated storage
        ○ Data visible to the pod after it moves to a new node (see the PVC sketch below)
        ○ What to do for on-prem? Use a software solution (additional complexity)
      ● Mitigation: use a distributed DB
      [Diagram: with local storage, a pod rescheduled to another node sees a new, empty disk (Disk 3) after the move]
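      To make the distinction concrete, a pod gets replicated storage that survives rescheduling by claiming it through a PersistentVolumeClaim; the storage class name below is an assumption (it varies by cloud):

      # persistent storage sketch: data outlives the pod because the volume
      # is network-attached rather than tied to one node's disk
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: datadir-yb-tserver-0       # hypothetical claim name
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard-rwo   # e.g. a cloud block-storage class
        resources:
          requests:
            storage: 100Gi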

  15. Need for a load balancer
      ● Restricted cluster ingress in k8s
        ○ If the app is not on the same k8s cluster, it needs an LB
      ● Needs a load balancer to expose the DB externally (see the Service sketch below)
        ○ Not an issue on public clouds – use cloud-provider network LBs
        ○ But there may be per-cloud limits on NLBs and on public IP addresses
      ● Bigger problem on-prem with hardware-based load balancers (example: F5)
      [Diagram: load balancer in front of Node #1–3 to access any DB service]
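      A minimal sketch of exposing the DB through a cloud network load balancer; 5433 is YugabyteDB's YSQL port, while the service name and selector are assumptions:

      # LoadBalancer service sketch: the cloud provider provisions an NLB
      apiVersion: v1
      kind: Service
      metadata:
        name: yb-tserver-service
      spec:
        type: LoadBalancer
        selector:
          app: yb-tserver
        ports:
        - name: ysql
          port: 5433          # PostgreSQL-compatible YSQL endpoint
          targetPort: 5433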

  16. Networking complexities
      ● Two k8s clusters cannot “see” each other
      ● Network discovery and reachability issues
        ○ Pods of one k8s cluster cannot refer and replicate to pods in another k8s cluster by default
      ● Mitigation #1: use DNS chaining today (operational complexity, depends on env; see the sketch below)
      ● Mitigation #2: use a service mesh like Istio (but lower performance – HTTP layer vs TCP)
      [Diagram: replication between two k8s clusters that cannot reach each other]
      Video: KubeCon EU 2021 – Building the Multi-Cluster Data Layer
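      One hedged sketch of DNS chaining: configure this cluster's CoreDNS to forward lookups for the other cluster's DNS domain to that cluster's DNS endpoint. The domain and IP are illustrative, and the exact mechanism differs per environment (GKE, for example, uses kube-dns stub domains):

      # CoreDNS forwarding sketch (fragment of the kube-system/coredns ConfigMap)
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: coredns
        namespace: kube-system
      data:
        Corefile: |
          cluster2.local:53 {
              errors
              cache 30
              forward . 10.2.0.10   # illustrative DNS endpoint of cluster 2
          }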

  17. Running a Distributed SQL DB in k8s (YugabyteDB)
      Why run a DB in k8s:
      ● Better resource utilization
      ● Resize pod resources dynamically
      ● Portability (cloud and on-premises)
      ● Out-of-box infrastructure orchestration
      ● Automate day 2 DB operations
      Why NOT – the challenges:
      ● Greater chance of pod failures
      ● Local storage vs persistent storage
      ● Need for a load balancer
      ● Networking complexities
      ● Operational maturity curve

  18. Transactional, distributed SQL database designed for resilience and scale
      100% open source, PostgreSQL compatible, enterprise-grade RDBMS
      …built to run across all your cloud environments

  19. A Brief History of Yugabyte
      Part of Facebook’s cloud native DB evolution
      ● Yugabyte team dealt with this growth first hand
      ● Massive geo-distributed deployment given global users
      ● Worked with a world-class infra team to solve these issues
      Builders of multiple popular databases: 1+ trillion ops/day, 100+ petabytes in data set sizes
      Yugabyte founding team ran Facebook’s public cloud scale DBaaS

  20. Designing the perfect Distributed SQL Database
      Amazon Aurora: a highly available MySQL and PostgreSQL-compatible relational database service
      ● Not scalable, but HA
      ● All RDBMS features (PostgreSQL & MySQL)
      Google Spanner: the first horizontally scalable, strongly consistent, relational database service
      ● Scalable and HA
      ● Missing RDBMS features; new SQL syntax
      Aurora is much more popular than Spanner, and adoption of PostgreSQL for cloud-native applications is skyrocketing.
      bit.ly/distributed-sql-deconstructed

  21. Designed for cloud native microservices
      YugabyteDB layered design:
      ● Yugabyte Query Layer: YSQL and YCQL APIs
      ● DocDB Distributed Document Store: sharding & load balancing, Raft consensus replication, distributed transaction manager & MVCC, document storage layer, custom RocksDB storage engine
      How the systems compare:
                           PostgreSQL          Google Spanner        YugabyteDB
      SQL Ecosystem        Massively adopted   New SQL flavor        Reuses PostgreSQL
      RDBMS Features       Advanced, complex   Basic, cloud-native   Advanced, complex and cloud-native
      Highly Available     ✘                   ✓                     ✓
      Horizontal Scale     ✘                   ✓                     ✓
      Distributed Txns     ✘                   ✓                     ✓
      Data Replication     Async               Sync                  Sync + Async

  22. Design Goal: support all RDBMS features
      What’s supported today (Yugabyte v2.7)
      Impossible without reusing PostgreSQL code! Amazon Aurora uses this strategy. Other distributed SQL databases do not support most of these features, because building them from the ground up:
      ● Makes it hard to produce a robust functional spec
      ● Takes a lot of time to implement
      ● Takes even longer for users to adopt and mature

  23. Layered Architecture
      Extensible query layer – supports multiple APIs
      ○ YSQL: a fully PostgreSQL compatible relational API (microservices requiring relational integrity)
      ○ YCQL: a Cassandra compatible semi-relational API (microservices requiring massive scale or geo-distribution of data)
      DocDB storage layer – distributed, transactional document store with sync and async replication support
      ○ Transactional, ACID compliant
      ○ Resilient and scalable
      ○ Document storage

  24. Resilient and strongly consistent across failure domains
      1. Single Region, Multi-Zone: consistent across Availability Zones 1–3; no WAN latency, but no region-level failover/repair
      2. Single Cloud, Multi-Region: consistent across Regions 1–3 with auto region-level failover/repair
      3. Multi-Cloud, Multi-Region: consistent across Clouds 1–3 with auto cloud-level failover/repair

  25. YugabyteDB on K8s Architecture

  26. YugabyteDB Deployed as StatefulSets
      [Diagram: four K8s worker nodes]
      ● yb-master StatefulSet: pods yb-master-0, yb-master-1, yb-master-2
      ● yb-tserver StatefulSet: pods yb-tserver-0 through yb-tserver-3, each holding tablet replicas
      ● Each pod mounts a local/remote persistent volume
      ● yb-masters and yb-tservers headless services front the two StatefulSets (see the sketch below)
      ● App clients connect via the yb-tservers service; admin clients via the yb-masters service
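      The headless services above give each pod a stable DNS identity instead of a single virtual IP. A minimal sketch (7100 is yb-master's RPC port; the labels are assumptions):

      # headless service sketch: clusterIP: None means DNS returns the pod
      # addresses directly, e.g. yb-master-0.yb-masters.default.svc.cluster.local
      apiVersion: v1
      kind: Service
      metadata:
        name: yb-masters
      spec:
        clusterIP: None
        selector:
          app: yb-master
        ports:
        - name: rpc
          port: 7100          # yb-master RPC port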

  27. Under the Hood – 3 Node Cluster
      [Diagram: three worker nodes]
      ● YB-Master (yb-master1–3): manages shard metadata & coordinates cluster-wide ops
      ● YB-TServer (yb-tserver1–3): stores/serves data in/from tablets (shards); each tablet has one leader and followers spread across the tservers
      ● Raft consensus replication: highly resilient, used for both data replication & leader election
      ● Global transaction manager: tracks ACID txns across multi-row ops, incl. clock skew mgmt.
      ● DocDB storage engine: purpose-built for ever-growing data, extended from RocksDB
      YB Helm Charts at charts.yugabyte.com
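      Installing from the chart repo above looks roughly like this; the release and namespace names are arbitrary, so check the chart's docs for current flags:

      $ helm repo add yugabytedb https://charts.yugabyte.com
      $ helm repo update
      $ helm install yb-demo yugabytedb/yugabyte \
          --namespace yb-demo --create-namespace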

  28. Deployed on all popular Kubernetes platforms

  29. YugabyteDB on K8s Demo
      Single YB universe deployed on 3 separate GKE clusters

  30. YugabyteDB Universe on 3 GKE Clusters
      Deployment:
      ● 3 GKE clusters, each with 3 x n1-standard-8 nodes
      ● 3 pods in each cluster, using 4 cores each
      ● Cores: 4 per pod
      ● Memory: 7.5 GB per pod
      ● Disk: ~500 GB total for the universe

  31. [Demo screenshot: yb-tserver1, yb-tserver2, yb-tserver3]

  32. Cloud Native – cncf.io definition
      “Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”

  33. Introducing the DBRE model
      ● Database Reliability Engineering
        ○ Inspired by Google’s SRE model
        ○ Blending DevOps culture with DBA teams
        ○ Infrastructure as code
        ○ Automation is the key

  34. DBRE Guiding Principles
      ● Responsibility for data is shared by cross-functional teams
      ● Provide patterns and knowledge that support other teams’ processes and facilitate their work
      ● Define reference architectures and configurations for data stores that are approved for operations and can be deployed by teams

  35. YugabyteDB on K8s Multi-Region Requirements
      ● Pod-to-pod communication over TCP ports using RPC calls across n K8s clusters
      ● Global DNS resolution system
        ○ Across all the K8s clusters, so that pods in one cluster can connect to pods in other clusters
      ● Ability to create load balancers in each region/DB
      ● RBAC: ClusterRole and ClusterRoleBinding (see the sketch below)
      ● Reference: Deploy YugabyteDB on multi-cluster GKE
        https://docs.yugabyte.com/latest/deploy/kubernetes/multi-cluster/gke/helm-chart/
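      For the RBAC requirement, a minimal sketch of a ClusterRole/ClusterRoleBinding pair. The names, service account, and rule list are assumptions; the real set of resources and verbs depends on the chart or operator being deployed:

      # RBAC sketch: cluster-scoped access to the core resources the DB needs
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: yugabyte-multiregion        # hypothetical
      rules:
      - apiGroups: [""]
        resources: ["pods", "services", "persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "create", "delete"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: yugabyte-multiregion
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: yugabyte-multiregion
      subjects:
      - kind: ServiceAccount
        name: yugabyte-sa                 # hypothetical service account
        namespace: yb-demo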

  36. Ensuring High Performance
      Local storage:
      ● Lower latency, higher throughput
      ● Recommended for workloads that do their own replication
      ● Pre-provision outside of K8s
      ● Use SSDs for latency-sensitive apps
      Remote storage (most used):
      ● Higher latency, lower throughput
      ● Recommended for workloads that do not perform any replication on their own
      ● Provision dynamically in K8s
      ● Use alongside local storage for cost-efficient tiering

  37. Configuring Data Resilience
      Pod anti-affinity:
      ● Pods of the same type should not be scheduled on the same node
      ● Keeps the impact of node failures to an absolute minimum
      Multi-zone / regional / multi-region pod scheduling:
      ● Multi-zone – tolerate zone failures for K8s worker nodes
      ● Regional – tolerate zone failures for both K8s worker and master nodes
      ● Multi-region / multi-cluster – requires network discovery between the clusters

  38. Automating Day 2 Operations
      Backup & restore:
      ● Backups and restores are a database-level construct
      ● YugabyteDB can perform a distributed snapshot and copy it to a target for a backup
      ● Restore the backup into an existing cluster, or into a new cluster with a different number of TServers
      Rolling upgrades:
      ● Supports two update strategies: OnDelete (the default here) and RollingUpdate (see the sketch below)
      ● Pick the rolling upgrade strategy for DBs that support zero-downtime upgrades, such as YugabyteDB
      ● A new instance of the pod is spawned with the same network id and storage
      Handling failures:
      ● Pod failure is handled by K8s automatically
      ● Node failure has to be handled manually, by adding a new worker node to the K8s cluster
      ● Local storage failure has to be handled manually, by mounting a new local volume to K8s
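      The update strategy lives on the StatefulSet spec; a minimal sketch, where the image tag is a hypothetical new version that would trigger the rollout:

      # rolling upgrade sketch: RollingUpdate replaces pods one at a time,
      # each new pod keeping its predecessor's network identity and volume;
      # OnDelete instead waits for pods to be deleted manually
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: yb-tserver
      spec:
        serviceName: yb-tservers
        replicas: 3
        selector:
          matchLabels:
            app: yb-tserver
        updateStrategy:
          type: RollingUpdate
        template:
          metadata:
            labels:
              app: yb-tserver
          spec:
            containers:
            - name: yb-tserver
              image: yugabytedb/yugabyte:2.7.2.0   # hypothetical upgrade target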

  39. Auto-scaling with K8s operators
      https://github.com/yugabyte/yugabyte-platform-operator
      ● Based on custom controllers that have direct access to lower-level K8s APIs
      ● Excellent fit for stateful apps requiring human operational knowledge to correctly scale, reconfigure and upgrade while simultaneously ensuring high performance and data resilience
      ● Complementary to Helm for packaging
      Example policy: watch CPU usage in the yb-tserver StatefulSet and scale pods when CPU > 80% for 1 min and max_threshold is not exceeded (see the sketch below)
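      The scaling rule on the slide maps naturally onto a HorizontalPodAutoscaler. This is a simplified stand-in for what the operator does (replica bounds are assumptions), since adding tservers also requires the DB to rebalance data – which is exactly why a CRD-based controller is used rather than a bare HPA:

      # autoscaling sketch: scale the yb-tserver StatefulSet at >80% average CPU
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: yb-tserver
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: StatefulSet
          name: yb-tserver
        minReplicas: 3
        maxReplicas: 9            # cap, akin to the slide's max_threshold guard
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80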

  40. Performance parity across Kubernetes and VMs – TPCC workloads
      TPCC Workload   VMs (AWS)                             Kubernetes (GKE)
      Topology        3 x c5.4xlarge (16 vCPUs, 32 GiB      3 x TServer pods (16 vCPUs, 15 GB
                      RAM, 400 GB SSD)                      RAM, 400 GB SSD)
      tpmC            12,597.63                             12,299.60
      Efficiency      97.96%                                95.64%
      Throughput      469.06 requests/sec                   462.32 requests/sec
      Latency (avg / P99):
      New Order       33.313 ms / 115.446 ms                59.66 ms / 478.89 ms
      Payment         24.735 ms / 86.051 ms                 33.53 ms / 248.07 ms
      OrderStatus     14.357 ms / 43.475 ms                 14.65 ms / 100.48 ms
      Delivery        66.522 ms / 205.065 ms                148.36 ms / 838.42 ms
      StockLevel      212.180 ms / 670.487 ms               99.99 ms / 315.38 ms

  41. Target Use Cases
      Systems of Record and Engagement – resilient, business critical data
      ● Identity management
      ● User/account profile
      ● eCommerce apps – checkout, shopping cart
      ● Real time payment systems
      Event Data and IoT – handling massive scale
      ● Vehicle telemetry
      ● Stock bids and asks
      ● Shipment information
      ● Credit card transactions
      Geo-Distributed Workloads – sync, async, and geo replication
      ● Vehicle telemetry
      ● Stock bids and asks
      ● Shipment information
      ● Credit card transactions

  42. A Classic Enterprise App Scenario

  43. Modern Cloud Native Application Stack
      https://github.com/yugabyte/yugastore-java
      [Diagram: stack components and the platforms it is deployed on]

  44. Yugastore – Kronos Marketplace

  45. Microservices Architecture
      [Diagram: the UI app calls REST APIs through an API Gateway to the Cart, Product, and Checkout microservices, which connect to the Yugabyte cluster over YSQL and YCQL]

  46. Istio Traffic Management for Microservices
      [Diagram: traffic enters through the Istio edge gateway (Envoy edge proxy) and is routed, via Istio route configuration using Envoy proxies, to the UI app, API gateway, and the Cart, Product, and Checkout microservices; the Istio control plane (Pilot, Galley, Citadel) provides service discovery and configuration]

  47. The fastest growing Distributed SQL Database
      ● 3K+ Slack users – join our community: yugabyte.com/slack
      ● 600K+ clusters deployed
      We 💛 stars! Give us one: github.com/YugaByte/yugabyte-db

  48. Thank You
      Join us on Slack: yugabyte.com/slack
      Star us on GitHub: github.com/yugabyte/yugabyte-db