Distributed SQL Operational Best Practices with YugabyteDB

Amey Banarse, Principal Data Architect at YugabyteDB, walked through an overview of YugabyteDB's architecture and how to apply operational best practices to its administration. Topics covered included:

* Self-service deployments
* Deployment topologies
* Automation
* Data recoverability
* Config management
* Monitoring & health checks
* Security integration

Recorded on 12/18/19
video webinar: https://vimeo.com/380552852

AMEY BANARSE

December 28, 2020

Transcript

  1. Introduction
    Amey Banarse
    Principal Data Architect, YugabyteDB ♦ Pivotal ♦ FINRA
    University of Pennsylvania
    @ameybanarse
    http://about.me/amey
    © 2019 All rights reserved.
  2. PostgreSQL-compatible, high-performance, open-source, cloud-native distributed SQL database
    100% Apache 2.0
    Low Latency & High Throughput
    Built for Kubernetes & Cloud Native Ecosystem
    © 2020 All Rights Reserved
  3. Introducing DBRE model
    • Database Reliability Engineering
      ◦ Inspired by Google’s SRE model
      ◦ Blending DevOps culture with DBA teams
      ◦ Infrastructure as code
      ◦ Automation is the key
    Yugabyte Confidential © 2019 All rights reserved.
  4. DBRE Guiding Principles
    • Responsibility for the data is shared by cross-functional teams
    • Provide patterns and knowledge to support other teams’ processes and facilitate their work
    • Define reference architectures and configurations for data stores that are approved for operations and can be deployed by teams
  5. Cloud Native - cncf.io definition
    “Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure and declarative APIs exemplify this approach. These techniques enable loosely coupled systems that are resilient, manageable and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”
  6. Designed for Cloud Native Microservices
    • Yugabyte Query Layer
      ◦ YCQL
      ◦ YSQL
    • DocDB Distributed Document Store
      ◦ Sharding & Load Balancing
      ◦ Raft Consensus Replication
      ◦ Distributed Transaction Manager & MVCC
      ◦ Document Storage Layer
      ◦ Custom RocksDB Storage Engine

    |                  | PostgreSQL         | Google Spanner       | YugabyteDB                          |
    |------------------|--------------------|----------------------|-------------------------------------|
    | SQL Ecosystem    | ✓ Massively adopted | ✘ New SQL flavor     | ✓ Reuse PostgreSQL                  |
    | RDBMS Features   | ✓ Advanced Complex  | ✘ Basic cloud-native | ✓ Advanced Complex and cloud-native |
    | Highly Available | ✘                  | ✓                    | ✓                                   |
    | Horizontal Scale | ✘                  | ✓                    | ✓                                   |
    | Distributed Txns | ✘                  | ✓                    | ✓                                   |
    | Data Replication | Async              | Sync                 | Sync + Async                        |
  7. Yugabyte offers flexible consumption models
    Yugabyte Cloud: Fully Managed DBaaS
    24 x 7 Enterprise Support
    Yugabyte Platform UI
    Operational Excellence
    DBaaS out of the box
    Yugabyte Platform
    YugabyteDB
    Self managed Self Service UI
    Yugabyte managed Public DBaaS
    Community supported
    Yugabyte DB: Transactional Distributed SQL DB
    100% Open Source Apache 2.0
    PostgreSQL compatible
    Enterprise grade RDBMS
    Cloud Native
    Self or Yugabyte managed
    https://download.yugabyte.com
  8. Yugabyte Platform
    • Comprehensive operations lifecycle manager
    • Cloud Native deployments on public & private cloud
    • Robust automation for Day 2 ops with self healing
    • Built-in monitoring and alerts
    • Configuration & Change Management
  9. Self-Service Deployment
    • Choosing the right topology for business
      ◦ Business Continuity or data regulations
      ◦ Single or Multi Regional cluster
    • Multi & Hybrid Cloud deployments
    • Self Service deployment on any form factor - Containers, Virtual Machine (VM), Bare Metal
    • Ability to expand to new Regions
  10. Resilient and strongly consistent across failure domains
    1. Single Region, Multi-Zone (Availability Zone 1, Availability Zone 2, Availability Zone 3)
      Consistent Across Zones
      No WAN Latency But No Region-Level Failover/Repair
    2. Single Cloud, Multi-Region (Region 1, Region 2, Region 3)
      Consistent Across Regions with Auto Region-Level Failover/Repair
    3. Multi-Cloud, Multi-Region (Cloud 1, Cloud 2, Cloud 3)
      Consistent Across Clouds with Auto Cloud-Level Failover/Repair
  11. YugabyteDB Customers & Sample Deployment Topologies
    Customer A
      • On AWS, 7 Production clusters, 3 of which are 15 Nodes each. ~4 non-Prod.
      • Deployed across 3 AZs, 5 Nodes per AZ, rf=3. Can handle AZ failures.
      • In Prod, they did a zero-downtime upgrade of YB from v1.1.9 to v1.2.6
      • ~400 GB compressed data per node = 1 TB per node
      • Write-heavy clusters, 60K to 80K writes/sec at peak with CPU utilization < 20%.
    Customer B
      • 18-node Production cluster with 2 TB of compressed data per node.
      • 300+ days of uninterrupted uptime and availability
      • Zero-downtime node repair
      • Cluster expansion with zero downtime and in minutes
    © 2018 All rights reserved.
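The statement that an rf=3 cluster spread over 3 AZs can handle AZ failures follows from majority-quorum arithmetic in Raft, the replication protocol named earlier in the deck. A minimal sketch, assuming the replicas of each tablet are spread as evenly as possible across zones; the function names are illustrative and are not YugabyteDB APIs.

```python
def tolerated_failures(rf: int) -> int:
    """Replicas a Raft group of size rf can lose while still keeping a majority."""
    return (rf - 1) // 2


def survives_zone_loss(rf: int, zones: int) -> bool:
    """True if losing any one zone still leaves a majority of replicas.

    Assumption: replicas are spread as evenly as possible across zones,
    so the fullest zone holds ceil(rf / zones) replicas.
    """
    worst_zone = -(-rf // zones)  # ceiling division
    return worst_zone <= tolerated_failures(rf)


if __name__ == "__main__":
    print(tolerated_failures(3))     # 1
    print(survives_zone_loss(3, 3))  # True: one replica per AZ, two remain
    print(survives_zone_loss(3, 2))  # False: one AZ would hold two of three replicas
```

With rf=3 and three zones, each zone holds one replica, so an AZ outage removes one replica and the remaining two still form a majority.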
  12. Automation
    • Changes incorporated into deployment and infrastructure automation, with focus on testing, fallback, and impact mitigation
    • Rolling Upgrades, Scale Up/Down & CI tools
    • Self healing
    • Operational Efficiency with minimal toil
  13. Data Recoverability
    • Standardized and automated backup and recovery processes
    • Business Continuity and Disaster Recovery
      ◦ Defining RPO and RTO
    • Defining the right backup strategy with data validation pipelines
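Defining RPO and RTO becomes concrete once a backup schedule is checked against them. A minimal sketch, assuming periodic backups that capture the state as of their start time and no continuous log shipping, so worst-case data loss is one backup interval plus the backup's duration. All names and numbers here are hypothetical and not taken from the talk.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Age of the newest usable backup if a failure hits just before the next one completes."""
    return interval + backup_duration


def estimated_restore_time(data_gb: float, restore_gb_per_hour: float) -> timedelta:
    """Rough restore time, assuming restore throughput is the only bottleneck."""
    return timedelta(hours=data_gb / restore_gb_per_hour)


def meets_objectives(interval, backup_duration, data_gb, restore_gb_per_hour, rpo, rto) -> bool:
    """Check a backup plan against the recovery point and recovery time objectives."""
    loss_ok = worst_case_data_loss(interval, backup_duration) <= rpo
    restore_ok = estimated_restore_time(data_gb, restore_gb_per_hour) <= rto
    return loss_ok and restore_ok


if __name__ == "__main__":
    # Hypothetical plan: backup every 6 h taking 30 min, 400 GB restored at 200 GB/h.
    ok = meets_objectives(
        interval=timedelta(hours=6),
        backup_duration=timedelta(minutes=30),
        data_gb=400,
        restore_gb_per_hour=200,
        rpo=timedelta(hours=8),
        rto=timedelta(hours=4),
    )
    print(ok)  # True: 6.5 h <= 8 h and 2 h <= 4 h
```

A check like this could run in the same pipeline that validates backups, so that a schedule change that breaks the RPO is caught before an outage does.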
  14. Data Recoverability
    • Auto Failover
    • Resilient to AZ/Rack outages
    • No downtime for client apps
  15. Config Management
    • IaaS Configurations
    • Centralized Store for Cluster Config
    • Security - KMS or SmartKey integration
    • Backup/Restore - Object Store or NFS
  16. Monitoring & Health Checks
    • Monitoring KPIs on various levels
      ◦ Node, Network, Memory usage & Storage layer
    • Build alerting and integration with Incident Response platforms like PagerDuty
    • Automated health checks for regular activities
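An automated health check over node-level KPIs can be as small as a table of thresholds applied to each metrics sample. A minimal sketch, assuming metrics arrive as a plain dictionary per node; the metric names and limits are hypothetical, and forwarding the failures to an incident-response platform is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str        # hypothetical metric name, e.g. "cpu_pct"
    max_value: float   # the check fails when the sample exceeds this


def failing_checks(sample: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return the metrics that are missing from the sample or above their limit."""
    failures = []
    for threshold in thresholds:
        value = sample.get(threshold.metric)
        # A missing metric is treated as a failure so silent gaps get noticed.
        if value is None or value > threshold.max_value:
            failures.append(threshold.metric)
    return failures


if __name__ == "__main__":
    thresholds = [
        Threshold("cpu_pct", 80.0),
        Threshold("memory_pct", 90.0),
        Threshold("disk_used_pct", 85.0),
    ]
    node_sample = {"cpu_pct": 20.0, "disk_used_pct": 91.0}
    print(failing_checks(node_sample, thresholds))  # ['memory_pct', 'disk_used_pct']
```

Treating a missing metric as a failure is a design choice: it keeps a broken collector from looking like a healthy node.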
  17. Enterprise Security Hardening
    • Third-party security audit
    • Audit Logging for DB operations
    • YSQL Kerberos and GSSAPI (in development)
    • LDAP support for better user management
    • YSQL support is GA since YB v2.4
    • YCQL support will be available in YB v2.6.x
    • Data in Transit Encryption Enhancements
    • TLS v1.2 across the database
    • Simplified TLS workflows in YB Platform
    • Data at Rest Encryption with Vormetric TDE and AWS KMS Intg.
  18. Plume Design
    • 27 billion operations per day with 30-40 ms of latency
    • 35 TB of data managed by a single cluster
    • Rolling upgrades with zero downtime
    • Less time spent managing databases and more time spent on core business
    • YugabyteDB can support Plume’s next phase of growth, which is predicted to reach 75+ billion ops per day
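The daily operation counts quoted for Plume are easier to compare with per-second figures elsewhere in the deck once converted to an average rate. A short worked calculation, assuming load is averaged evenly over 24 hours (real peaks will be higher); the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_ops_per_second(ops_per_day: float) -> float:
    """Average operation rate implied by a daily total."""
    return ops_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    current = average_ops_per_second(27e9)    # today's 27 billion ops/day
    projected = average_ops_per_second(75e9)  # predicted 75 billion ops/day
    print(round(current))                 # 312500
    print(round(projected))               # 868056
    print(round(projected / current, 2))  # 2.78
```

The predicted growth is therefore roughly a 2.8x increase in average throughput, from about 312,500 to about 868,000 operations per second.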