Distributed SQL Operational Best Practices with YugabyteDB

Amey Banarse, Principal Data Architect at YugabyteDB, walked through an overview of YugabyteDB's architecture and how to apply operational best practices to its administration. Topics covered included:

* Self-service deployments
* Deployment topologies
* Automation
* Data recoverability
* Config management
* Monitoring & health checks
* Security integration

Recorded on 12/18/19
video webinar: https://vimeo.com/380552852

AMEY BANARSE

December 28, 2020

Transcript

  1. YugabyteDB Operational Best Practices
    Amey Banarse
    Principal Data Architect, Yugabyte

  2. Introduction
    © 2019 All rights reserved.

    Amey Banarse
    Principal Data Architect, YugabyteDB

    Pivotal
    FINRA
    University of Pennsylvania
    @ameybanarse
    http://about.me/amey

  3. PostgreSQL-compatible, high-performance, open-source, cloud-native distributed SQL database
    © 2020 All Rights Reserved
    100% Apache 2.0
    Low Latency & High Throughput
    Built for Kubernetes & Cloud Native Ecosystem

  4. Introducing DBRE model
    Yugabyte Confidential © 2019 All rights reserved.
    ● Database Reliability Engineering
    ○ Inspired by Google’s SRE model
    ○ Blending DevOps culture with DBA teams
    ○ Infrastructure as code
    ○ Automation is the key

  5. DBRE Guiding Principles
    ● Responsibility for the data is shared by cross-functional teams
    ● Provide patterns and knowledge that support other teams’ processes and facilitate their work
    ● Define reference architectures and configurations for data stores that are approved for operations and can be deployed by teams

  6. Cloud Native - cncf.io definition
    “Cloud native technologies empower organizations to build and run
    scalable applications in modern, dynamic environments such as public,
    private and hybrid clouds. Containers, service meshes, microservices,
    immutable infrastructure and declarative APIs exemplify this approach.
    These techniques enable loosely coupled systems that are resilient,
    manageable and observable. Combined with robust automation, they allow
    engineers to make high-impact changes frequently and predictably with
    minimal toil.”

  7. Designed for Cloud Native Microservices
    Yugabyte Query Layer
    ● YCQL
    ● YSQL
    DocDB Distributed Document Store
    ● Sharding & Load Balancing
    ● Raft Consensus Replication
    ● Distributed Transaction Manager & MVCC
    ● Document Storage Layer
    ● Custom RocksDB Storage Engine

                        PostgreSQL          Google Spanner        YugabyteDB
    SQL Ecosystem       Massively adopted   New SQL flavor        Reuse PostgreSQL
    RDBMS Features      Advanced, complex   Basic, cloud-native   Advanced, complex and cloud-native
    Highly Available    ✘                   ✓                     ✓
    Horizontal Scale    ✘                   ✓                     ✓
    Distributed Txns    ✘                   ✓                     ✓
    Data Replication    Async               Sync                  Sync + Async

  8. Yugabyte offers flexible consumption models
    Self or Yugabyte managed
    YugabyteDB
    ● Transactional Distributed SQL DB
    ● 100% Open Source, Apache 2.0
    ● PostgreSQL compatible
    ● Enterprise grade RDBMS
    ● Cloud Native
    ● Self managed
    ● Community supported
    ● https://download.yugabyte.com
    Yugabyte Platform
    ● DBaaS out of the box
    ● Operational Excellence
    ● Self Service UI (Yugabyte Platform UI)
    ● 24 x 7 Enterprise Support
    Yugabyte Cloud
    ● Fully Managed DBaaS
    ● Yugabyte managed Public DBaaS

  9. Yugabyte Platform
    ● Comprehensive operations lifecycle manager
    ● Cloud Native deployments on public & private cloud
    ● Robust automation for Day 2 ops with self healing
    ● Built-in monitoring and alerts
    ● Configuration & Change Management

  10. Self-Service Deployment
    ● Choosing the right topology for the business
    ○ Business Continuity or data regulations
    ○ Single or Multi-Regional cluster
    ● Multi & Hybrid Cloud deployments
    ● Self Service deployment on any form factor: Containers, Virtual Machines (VMs), Bare Metal
    ● Ability to expand to new Regions

  11. Resilient and strongly consistent across failure domains
    1. Single Region, Multi-Zone (Availability Zones 1, 2, 3)
       Consistent across zones. No WAN latency, but no region-level failover/repair.
    2. Single Cloud, Multi-Region (Regions 1, 2, 3)
       Consistent across regions, with auto region-level failover/repair.
    3. Multi-Cloud, Multi-Region (Clouds 1, 2, 3)
       Consistent across clouds, with auto cloud-level failover/repair.

  12. YugabyteDB Customers & Sample Deployment Topologies
    © 2018 All rights reserved.
    Customer A
    ● On AWS: 7 production clusters, 3 of which have 15 nodes each, plus ~4 non-production clusters.
    ● Deployed across 3 AZs with 5 nodes per AZ and rf=3. Can handle AZ failures.
    ● In production, performed a zero-downtime upgrade of YB from v1.1.9 to v1.2.6.
    ● ~400 GB compressed data per node = 1 TB per node
    ● Write-heavy clusters: 60K to 80K writes/sec at peak with CPU utilization < 20%.
    Customer B
    ● 18-node production cluster with 2 TB of compressed data per node.
    ● 300+ days of uninterrupted uptime and availability
    ● Zero-downtime node repair
    ● Cluster expansion with zero downtime, completed in minutes
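
The 15-node, 3-AZ, rf=3 deployment described on this slide can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming standard Raft majority quorums and replicas spread as evenly as possible across AZs. The function names are illustrative and are not part of any YugabyteDB API.

```python
def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be lost while a Raft majority remains."""
    return (replication_factor - 1) // 2


def survives_az_outage(replication_factor: int, num_azs: int) -> bool:
    """True if losing one AZ still leaves a majority of replicas.

    Assumption: replicas are spread as evenly as possible across AZs.
    """
    # Largest number of replicas any single AZ holds under an even spread
    # (ceiling division).
    max_replicas_in_one_az = -(-replication_factor // num_azs)
    return max_replicas_in_one_az <= tolerated_failures(replication_factor)


if __name__ == "__main__":
    rf, azs, nodes_per_az = 3, 3, 5
    print("total nodes:", azs * nodes_per_az)
    print("replica failures tolerated:", tolerated_failures(rf))
    print("survives one AZ outage:", survives_az_outage(rf, azs))
```

With rf=3 over 3 AZs, each AZ holds one replica of a given piece of data, so a single AZ outage removes one of three replicas and the remaining two still form a majority. With the same rf over only 2 AZs, one AZ would hold two replicas and its loss would break the quorum.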

  13. Automation
    ● Changes incorporated into deployment and infrastructure automation, with focus on testing, fallback, and impact mitigation
    ● Rolling Upgrades, Scale Up/Down & CI tools
    ● Self healing
    ● Operational Efficiency with minimal toil
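
The rolling-upgrade and fallback practice on this slide can be sketched generically. This is a minimal illustration, not the Yugabyte Platform implementation: the `upgrade`, `is_healthy`, and `rollback` callables are hypothetical hooks that an operator's own tooling would supply.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one at a time; stop and roll back on the first failure.

    Returns the nodes that were upgraded and passed their health check.
    """
    upgraded: List[str] = []
    for node in nodes:
        upgrade(node)
        if not is_healthy(node):
            # Fallback: restore the failed node and leave the rest untouched.
            rollback(node)
            break
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    versions = {"node-1": "old", "node-2": "old", "node-3": "old"}
    done = rolling_upgrade(
        nodes=list(versions),
        upgrade=lambda n: versions.__setitem__(n, "new"),
        is_healthy=lambda n: n != "node-3",  # simulate a failed health check
        rollback=lambda n: versions.__setitem__(n, "old"),
    )
    print(done, versions)
```

Touching only one node at a time keeps the other replicas available, which is what allows an upgrade to proceed without downtime when the replication factor tolerates a single node being out.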

  14. Data Recoverability
    ● Standardized and automated backup and recovery processes
    ● Business Continuity and Disaster Recovery
    ○ Defining RPO and RTO
    ● Defining the right backup strategy with data validation pipelines
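
One way to make the RPO and RTO definitions on this slide concrete is to check a backup schedule against them. The sketch below assumes recovery from periodic backups only, so the worst-case data loss is one backup interval and the recovery time is the measured restore time. The `BackupPolicy` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPolicy:
    backup_interval: timedelta    # time between successive backups
    estimated_restore: timedelta  # measured time to restore and validate


def meets_objectives(policy: BackupPolicy, rpo: timedelta, rto: timedelta) -> bool:
    """Check a backup policy against recovery point and recovery time objectives."""
    return policy.backup_interval <= rpo and policy.estimated_restore <= rto


if __name__ == "__main__":
    policy = BackupPolicy(
        backup_interval=timedelta(hours=1),
        estimated_restore=timedelta(minutes=30),
    )
    print(meets_objectives(policy, rpo=timedelta(hours=4), rto=timedelta(hours=1)))
```

A check like this belongs in the same pipeline that validates restored data, so that the restore time is a measured value and not an estimate.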

  15. Data Recoverability
    ● Auto Failover
    ● Resilient to AZ/Rack outages
    ● No downtime for client apps

  16. Config Management
    ● IaaS Configurations
    ● Centralized Store for Cluster Config
    ● Security - KMS or SmartKey integration
    ● Backup/Restore - Object Store or NFS

  17. Monitoring & Health Checks
    ● Monitor KPIs at various levels
    ○ Node, Network, Memory usage & Storage layer
    ● Build alerting and integrate with incident response platforms such as PagerDuty
    ● Automated health checks for regular activities
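
An automated health check of the kind listed on this slide can be as simple as comparing per-node metrics with thresholds and emitting alert messages. This is a minimal sketch: the metric names and threshold values are placeholders chosen for illustration, and forwarding the alerts to an incident response platform is left out.

```python
from typing import Dict, List

# Hypothetical thresholds; real values depend on the workload and hardware.
THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,
    "disk_percent": 75.0,
}


def check_node(node: str, metrics: Dict[str, float]) -> List[str]:
    """Return one alert message per missing or out-of-range metric."""
    alerts: List[str] = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{node}: metric {name} missing")
        elif value > limit:
            alerts.append(f"{node}: {name}={value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    sample = {"cpu_percent": 20.0, "memory_percent": 50.0, "disk_percent": 90.0}
    for alert in check_node("node-1", sample):
        print(alert)
```

Treating a missing metric as an alert is a deliberate choice here: a node that stops reporting is often the first sign of a problem.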

  18. Enterprise Security Hardening
    • Third-party security audit
    • Audit Logging for DB operations
    • YSQL Kerberos and GSSAPI (in development)
    • LDAP support for better user management
    ○ YSQL support is GA since YB v2.4
    ○ YCQL support will be available in YB v2.6.x
    • Data in Transit Encryption Enhancements
    ○ TLS v1.2 across the database
    ○ Simplified TLS workflows in YB Platform
    • Data at Rest Encryption with Vormetric TDE and AWS KMS integration
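
On the client side, an application can refuse to negotiate anything older than the TLS v1.2 level mentioned on this slide. The sketch below uses only Python's standard `ssl` module and says nothing about how the database server itself is configured. The function name is illustrative.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and rejects TLS below 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_tls_context()
    print(ctx.minimum_version, ctx.verify_mode)
```

A context built this way can be handed to any client library that accepts an `ssl.SSLContext`; whether a particular database driver does so is something to confirm in that driver's documentation.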

  19. Case Study

  20. Plume Design
    ● 27 billion operations per day with 30-40 ms of latency
    ● 35 TB of data managed by a single cluster
    ● Rolling upgrades with zero downtime
    ● Less time spent managing databases and more time spent on core business
    ● YugabyteDB can support Plume’s next phase of growth, which is predicted to reach 75+ billion ops per day
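
The daily operation counts on this slide are easier to compare with per-second throughput figures once converted. A small sketch of the arithmetic, assuming load is spread evenly over the day (real peaks will be higher than the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_ops_per_second(ops_per_day: float) -> float:
    """Average operations per second for a given daily operation count."""
    return ops_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(average_ops_per_second(27_000_000_000))         # current load
    print(round(average_ops_per_second(75_000_000_000)))  # projected load
```

27 billion operations per day averages 312,500 operations per second, and 75 billion per day averages roughly 868,000 per second.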

  21. Demo
