KubeCon CloudNativeCon 2019 SanDiego Recap 〜Vitess〜 English.ver

cyberblack28
December 11, 2019

Cloud Native Meetup Tokyo #11 KubeCon Recap

Transcript

  1. Cloud Native Meetup Tokyo #11 KubeCon + CloudNativeCon Recap 2019.12.10

    @CyberAgent © cyberblack28 KubeCon + CloudNativeCon 2019 San Diego Recap ~ Vitess ~
  2. Profile Name : Yutaka Ichikawa Twitter : cyberblack28 Hatena Blog

    : https://cyberblack28.hatenablog.com/ SpeakerDeck : https://speakerdeck.com/cyberblack28 Job Educational Solution Architect Developer Advocate / Technical Evangelist Infrastructure Engineer Frontend Engineer Community & Certification Publications #deepcn #rancherjp CKA KCM100 CKAD 2018 2019
  3. Agenda

    1. Overview 2. What’s Vitess 3. Case Studies & Maintainer Track & Storage Sessions 4. Summary
  4. Overview KubeCon + CloudNativeCon NA 2019 Vitess Sessions 1.Keynote Sessions

    Tuesday, November 19 • 9:20am - 9:45am Keynote: CNCF Project Updates - Bryan Liles, KubeCon + CloudNativeCon North America 2019 Co-Chair & Senior Staff Engineer, VMware http://bit.ly/kubecon2019na_vitess1 http://bit.ly/kubecon2019na_vitess_m1 Wednesday, November 20 • 9:27am - 9:32am Sponsored Keynote: Network, Please Evolve – Chapter 2 - Vijoy Pandey, Vice President/CTO Cloud, Cisco http://bit.ly/kubecon2019na_vitess1_2 http://bit.ly/kubecon2019na_vitess_m1_2
  5. Overview KubeCon + CloudNativeCon NA 2019 Vitess Sessions 2.Case Studies

    Tuesday, November 19 • 11:50am - 12:25pm Scaling Resilient Systems: A Journey into Slack's Database Service - Rafael Chacon & Guido Iaquinti, Slack Thursday, November 21 • 2:25pm - 3:00pm Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle http://bit.ly/kubecon2019na_vitess2 http://bit.ly/kubecon2019na_vitess_m2 http://bit.ly/kubecon2019na_vitess_m3
  6. Overview KubeCon + CloudNativeCon NA 2019 Vitess Sessions 3.Maintainer Track

    Sessions Tuesday, November 19 • 11:50am - 12:25pm How to Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale Wednesday, November 20 • 2:25pm - 3:00pm Geo-partitioning with Vitess - Deepthi Sigireddi & Jitendra Vaidya, PlanetScale http://bit.ly/kubecon2019na_vitess3 http://bit.ly/kubecon2019na_vitess_m4 http://bit.ly/kubecon2019na_vitess4 http://bit.ly/kubecon2019na_vitess_m5
  7. Overview KubeCon + CloudNativeCon NA 2019 Vitess Sessions 4.Storage Sessions

    Tuesday, November 19 • 3:20pm - 3:55pm Vitess: Stateless Storage in the Cloud - Sugu Sougoumarane, PlanetScale http://bit.ly/kubecon2019na_vitess5 http://bit.ly/kubecon2019na_vitess_m6
  8. Overview Keynote Sessions Keynote: CNCF Project Updates - Bryan Liles,

    KubeCon + CloudNativeCon North America 2019 Co-Chair & Senior Staff Engineer, VMware
  9. Overview November 5, 2019: Cloud Native Computing Foundation Announces Vitess

    Graduation. Vitess is the eighth project to graduate, following Kubernetes, Prometheus, Envoy, CoreDNS, containerd, Fluentd, and Jaeger. The version is Vitess 4.0.1. Announcement: http://bit.ly/vitess_graduation Vitess graduated one year and nine months after becoming a CNCF incubating project in February 2018. 1. Adoption “Mission-critical production workloads running in real companies” 2. Maintainer Diversity “Identify long-term contributions from multiple organizations, then drill down into project details and test how to do for your design strategy.” 3. Project Health “Determining the appropriateness of project health”
  10. Overview “Slack adopted Vitess because its business needs were changing

    very rapidly and it needed a system flexible enough to accommodate those changes.” “Slack currently has a goal of migrating about 35% to Vitess, and 100% next year.”
  11. Overview “JD.com is China's largest online shopping site. During China's

    equivalent of the Black Friday sale, it reached a huge scale of about 4,000 keyspaces, over 30,000 pods, and a peak of 35 million QPS.”
  12. Overview KubeCon + CloudNativeCon China 2019 Vitess Sessions Tuesday, June

    25 • 11:00 - 11:35 Two Years with Vitess: How JD.com Runs the World's Largest Vitess - Xuhaihua & Jin Ke Xie , JD.com http://bit.ly/kubecon2019china_vitess1 http://bit.ly/kubecon2019china_vitess_m1
  13. Overview “A startup called Nozzle, which uses Vitess, was introduced. All of its

    applications run on Kubernetes, and it moved from AKS to GKE, realizing "No Vendor Lock-in" with Kubernetes and Vitess.”
  14. Overview Abstracting the L2 / L3 network part and realizing

    communication functions according to Kubernetes functions
  15. Overview “Until now, technical complexity, organizational complexity, and process

    complexity have emerged.” “You want these DB pods to be able to securely communicate for DB sharding & replication.”
  16. What’s Vitess

    A cloud-native database clustering system that achieves high availability and large-scale scalability by sharding MySQL
  17. What’s Vitess: Technical Terms

    • vtgate: A proxy server that routes queries from the application to vttablet and returns the results to the client
    • tablet: A pair of mysqld and vttablet
    • vttablet: A proxy server placed in front of MySQL (mysqld); it rewrites and deduplicates queries and protects MySQL from harmful queries
    • vtctld: An HTTP server that serves as the entry point (GUI) for management operations on a Vitess cluster
    • vtctl: A command-line tool (CLI) for managing a Vitess cluster
    • Topology: A metadata store that holds the configuration information of a Vitess cluster; etcd is supported on Kubernetes, and ZooKeeper is supported besides etcd
  18. What’s Vitess

    Sharding
    • Store data divided across two or more databases
    • Scale out and improve performance by adding shards
    Sharding in Vitess
    • Vertical sharding: distribute tables across multiple databases, table by table
    • Horizontal sharding: divide one table into multiple shards and store them in multiple databases
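The difference between the two sharding styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, database names, shard names, and the MD5-modulo placement are all hypothetical stand-ins and are not how Vitess itself places rows.

```python
import hashlib

# Hypothetical layout for vertical sharding: each table lives, as a whole,
# in exactly one database.
VERTICAL_LAYOUT = {"users": "db_accounts", "orders": "db_commerce"}

# Hypothetical databases for horizontal sharding: the rows of ONE table are
# spread across all of them.
HORIZONTAL_SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def vertical_target(table: str) -> str:
    """Return the database that holds the whole table."""
    return VERTICAL_LAYOUT[table]


def horizontal_target(sharding_key: int) -> str:
    """Return the shard that holds the row with this sharding key.

    MD5 modulo the shard count is an illustrative stand-in only.
    """
    digest = hashlib.md5(str(sharding_key).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(HORIZONTAL_SHARDS)
    return HORIZONTAL_SHARDS[index]


if __name__ == "__main__":
    print("orders table ->", vertical_target("orders"))
    for user_id in (1, 2, 3):
        print("user", user_id, "->", horizontal_target(user_id))
```

With vertical sharding the lookup depends only on the table name; with horizontal sharding it depends on a value inside each row, which is why a sharding key has to be chosen per table.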
  19. What’s Vitess Table Sharding

    • VSchema: the sharding definition and routing information
    • vtworker: refers to the VSchema and executes the shard split processing
    • Routing: queries are routed to the appropriate shard by referring to the VSchema
    • Keyspace: a logical database that combines multiple shards; the application sees it as one database
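As a rough illustration of routing by sharding definition, the sketch below keeps a VSchema-like dictionary and sends each row to the shard whose key range covers its keyspace ID. The shard names `-80` and `80-` follow the Vitess key-range naming convention; everything else (the dictionary layout, the `customer` table, and the SHA-256 hash) is an assumption made for the example and not the real VSchema format or hash function.

```python
import hashlib

# Simplified, hypothetical stand-in for a VSchema: it records which column
# of each table is used as the sharding key.
VSCHEMA = {"tables": {"customer": {"sharding_column": "customer_id"}}}

# Two shards of one keyspace. Each owns a half-open range [start, end) of
# the first byte of the keyspace ID.
SHARD_RANGES = {"-80": (0x00, 0x80), "80-": (0x80, 0x100)}


def keyspace_id(value: int) -> bytes:
    """Map a sharding-key value to an 8-byte keyspace ID.

    SHA-256 is a stand-in hash chosen for this sketch.
    """
    return hashlib.sha256(str(value).encode()).digest()[:8]


def route(table: str, row: dict) -> str:
    """Return the name of the shard that should store or serve this row."""
    column = VSCHEMA["tables"][table]["sharding_column"]
    first_byte = keyspace_id(row[column])[0]
    for shard, (start, end) in SHARD_RANGES.items():
        if start <= first_byte < end:
            return shard
    raise LookupError(f"no shard covers keyspace ID byte {first_byte:#04x}")


if __name__ == "__main__":
    for customer_id in (1, 2, 3, 4):
        print(customer_id, "->", route("customer", {"customer_id": customer_id}))
```

Because the application only names the table and supplies the row, it can keep treating the keyspace as a single database while the routing layer picks the shard.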
  20. What’s Vitess Reference Docs

    • Vitess is a database clustering system for horizontal scaling of MySQL https://vitess.io/ • Vitess Twitter https://twitter.com/vitessio • CrashAcademy "CNDJP Study Meetup #8: Testing the Performance and Operability of Vitess" (「CNDJP 勉強会 #8 Vitessのパフォーマンスと運用性を検証してみた」) https://crash.academy/ng/video/412/1736 • Vitess Slack https://vitess.slack.com/ • Vitess GitHub https://github.com/vitessio/vitess
  21. Case Studies & Maintainer Track & Storage Sessions In this

    talk, Rafael and Guido will share an overview of how Slack designed, built, scaled and then iterated to improve its distributed database service built on top of Vitess, now a CNCF project. The Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 million queries per second. This journey has taught us how to operate a database cluster with more than 2000 nodes, which is expected to grow to more than 3500 in the next 12 months.
  22. Case Studies & Maintainer Track & Storage Sessions 1.Databases at

    Slack Current status Legacy Shards Vitess Shards In progress migration of our entire dataset to Vitess.
  23. Case Studies & Maintainer Track & Storage Sessions

    Legacy Shards: Application-level, team-sharded, active master-master MySQL setup. Vitess Shards: Master-replica MySQL setup fully managed by Vitess.
  24. Case Studies & Maintainer Track & Storage Sessions Why are

    we migrating? • “Migrating to Vitess at (Slack) Scale” - Mike Demmer (https://www.percona.com/live/18/sessions/migrating-to-vitess-at-slack-scale) • “Designing and launching the next-generation database system at Slack: from whiteboard to production” - Guido Iaquinti (https://www.percona.com/live/18/sessions/designing-and-launching-the-next-generation-database-system-slack-from-whiteboard-to-production) • “Smooth scaling: Slack’s journey toward a new database” - Ameet Kotian (https://conferences.oreilly.com/velocity/vl-ny/public/schedule/detail/69885) For more details please see the presentations on the slide.
  25. Case Studies & Maintainer Track & Storage Sessions Why are we migrating?

    tl;dr: shard size limits, inefficient resource distribution, operational overhead, single sharding model. “While the number of Slack users keeps rising, the legacy setup cannot scale quickly and flexibly enough to meet business needs.”
  26. Case Studies & Maintainer Track & Storage Sessions Why Vitess?

    • Scaling and sharding flexibility without changing SQL (much) • MySQL core maintains operator and developer know-how • Proven at scale at YouTube and more recently others • Active developer community and approachable code base
  27. Case Studies & Maintainer Track & Storage Sessions Stats •

    Queries per day: 53+ billion • Storage provisioned: 7.5+ PB • Served by legacy infrastructure: ~60% • Served by Vitess: ~40% • Target: 70% served by Vitess by EOY Aim to complete the transition to Vitess within 2020 !!
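To put the daily figure into per-second terms, here is a small back-of-the-envelope calculation. It assumes the slide's rounded numbers (53 billion queries per day, about 40% served by Vitess) and, as a further assumption, that the 40% share applies evenly to query volume; the slide does not say whether the share refers to queries or to data.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 53e9   # "53+ billion" from the slide, taken as a lower bound
vitess_share = 0.40      # "~40%" from the slide; assumed to apply to queries

# Average load over a whole day; peaks will be higher than this.
average_qps = queries_per_day / SECONDS_PER_DAY
vitess_average_qps = average_qps * vitess_share

if __name__ == "__main__":
    print(f"average QPS overall:   {average_qps:,.0f}")
    print(f"average QPS on Vitess: {vitess_average_qps:,.0f}")
```

The overall average comes out at a little over 600,000 queries per second, which is consistent with the session abstract describing spikes of 2.7 million queries per second on the Vitess cluster.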
  28. Case Studies & Maintainer Track & Storage Sessions 2.Running databases

    in the cloud Immutable infrastructure Instance failure Durability through replication
  29. Case Studies & Maintainer Track & Storage Sessions How we

    run Vitess EC2 Percona MySQL5.7 ASG for stateless components Ephemeral NVMe (no EBS)
  30. Case Studies & Maintainer Track & Storage Sessions 3.Fault tolerance

    & isolation Slack cloud infrastructure • Amazon EC2 is hosted in multiple locations world-wide. • These locations are composed of Regions and Availability Zones (AZ’s). • Each Region is a separate geographic area. • AZ’s in a Region are connected through low-latency links.
  31. Case Studies & Maintainer Track & Storage Sessions Vitess initial

    deployment • A single cell across multiple AZ’s (fundamental). • Global and local topology using the same Consul cluster (circumstantial). Topology : Vitess Key-Value Store Consul : Service Discovery
  32. Case Studies & Maintainer Track & Storage Sessions Resilient systems

    • Minimize the blast radius. • Isolation is key. • Understand your dependencies.
  33. Case Studies & Maintainer Track & Storage Sessions Current deployment

    • Isolated topologies (one dc for each AZ and one for the global topo). • Blast radius is mapped to physical infrastructure.
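The idea of mapping the blast radius to physical infrastructure can be sketched as follows. The availability zone names, cell names, and tablet aliases are hypothetical; the only point is that when each AZ has its own cell, an AZ failure removes exactly the tablets of that one cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    alias: str
    cell: str


# Hypothetical mapping: one Vitess cell per availability zone.
CELL_BY_AZ = {"az-a": "cell_a", "az-b": "cell_b", "az-c": "cell_c"}

# Hypothetical tablets of one shard, one per cell.
TABLETS = [
    Tablet("cell_a-0000000100", "cell_a"),
    Tablet("cell_b-0000000200", "cell_b"),
    Tablet("cell_c-0000000300", "cell_c"),
]


def blast_radius(failed_az: str, tablets: list[Tablet]) -> list[Tablet]:
    """Return the tablets lost when the given availability zone fails."""
    failed_cell = CELL_BY_AZ[failed_az]
    return [t for t in tablets if t.cell == failed_cell]


def survivors(failed_az: str, tablets: list[Tablet]) -> list[Tablet]:
    """Return the tablets that keep running when the given AZ fails."""
    failed_cell = CELL_BY_AZ[failed_az]
    return [t for t in tablets if t.cell != failed_cell]


if __name__ == "__main__":
    print("lost:     ", [t.alias for t in blast_radius("az-b", TABLETS)])
    print("surviving:", [t.alias for t in survivors("az-b", TABLETS)])
```

With a single cell spanning all AZs, the same failure could not be attributed to one cell, which is the contrast the initial and current deployments draw.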
  34. Case Studies & Maintainer Track & Storage Sessions We have

    benefited already • AZ failure during backup time. • Single cell was affected!
  35. Case Studies & Maintainer Track & Storage Sessions 4.Key Lessons

    Complex system failures • Complex systems are intrinsically dangerous systems. • Complex systems are heavily and successfully defended against failure. • Catastrophe is always just around the corner. • Complex systems contain changing mixtures of failures latent within them. How Complex Systems Fail – MIT (https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf)
  36. Case Studies & Maintainer Track & Storage Sessions Complex system

    failures Humility towards complexity. Reach out to other fields and learn from their experience.
  37. Case Studies & Maintainer Track & Storage Sessions Gone in

    60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle • The holy grail of Cloud Native tech is to have zero vendor lock-in • migrate a high throughput production workload of 20 TB from Azure (AKS) to Google (GKE) in under an hour
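A quick calculation shows what "20 TB in under an hour" implies for throughput. It assumes decimal terabytes and that the full 20 TB is transferred inside the one-hour window; the number of parallel streams is a hypothetical parameter, since the transcript does not state how the copy was parallelized.

```python
TB = 10**12                      # decimal terabyte (assumption)

data_bytes = 20 * TB             # "20 TB" from the session title
window_seconds = 60 * 60         # "in an hour"

# Aggregate throughput needed to move everything inside the window.
bytes_per_second = data_bytes / window_seconds
gigabits_per_second = bytes_per_second * 8 / 1e9


def per_stream_gbps(parallel_streams: int) -> float:
    """Throughput each stream must sustain if the copy is split evenly."""
    return gigabits_per_second / parallel_streams


if __name__ == "__main__":
    print(f"aggregate: {bytes_per_second / 1e9:.2f} GB/s "
          f"({gigabits_per_second:.1f} Gbit/s)")
    print(f"per stream with 16 streams: {per_stream_gbps(16):.2f} Gbit/s")
```

The aggregate rate is in the tens of gigabits per second, so a migration of this size in that window is only practical when the work is spread over many shards or nodes in parallel.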
  38. Case Studies & Maintainer Track & Storage Sessions Zero vendor lock-in

    is preferable, but there are cases where lock-in is a good choice. What matters is making the right judgment.
  39. Case Studies & Maintainer Track & Storage Sessions AKS GKE

    GCS Backup Restore cross-cluster networking for zero downtime
  40. Case Studies & Maintainer Track & Storage Sessions AKS GCS

    Node Pool Internal App Deploy all internal applications GKE Deploy cert-manager external dns nginx ingress Set up node pools for dedicated Vitess tablets
  41. Case Studies & Maintainer Track & Storage Sessions AKS GCS

    Internal App Scale down Node Pool Internal App GKE Shut down Backup Start
  42. Case Studies & Maintainer Track & Storage Sessions "Google Cloud

    Platform drives our analytics and machine learning needs. With BigQuery and Cloud Machine Learning Engine on Google Kubernetes Engine, we have an insights platform that's customized for our performance, IT, and cost requirements." —Derek Perkins, Founder & CEO, Nozzle https://cloud.google.com/customers/nozzle/ Cloud Tasks GKE Bigquery
  43. Case Studies & Maintainer Track & Storage Sessions How to

    Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale • Vitess basics • a demo of live-migrating an existing MySQL installation into Vitess. → No Demo !
  44. Case Studies & Maintainer Track & Storage Sessions Geo-partitioning with

    Vitess - Deepthi Sigireddi & Jitendra Vaidya, PlanetScale • Problems and solutions in GDPR • Vitess approach to Geo-partitioning based on GDPR • Vitess custom sharding scheme demo
  45. Case Studies & Maintainer Track & Storage Sessions GDPR (General Data

    Protection Regulation): rules aimed at strengthening and unifying data protection for all individuals within the European Union. A custom sharding scheme is one of the ways Vitess responds to GDPR's requirement to “localize data storage in the users' country of residence”. Similar rules are likely to appear outside the EU as well.
  46. Case Studies & Maintainer Track & Storage Sessions Custom Sharding

    Scheme Demo in Four regions & Eight countries
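A geography-based custom sharding scheme can be sketched as a lookup from a user's country to the shard for that region. The transcript only says the demo used four regions and eight countries without listing them, so the countries, region names, and row layout below are hypothetical, and this is not the Vitess implementation.

```python
# Hypothetical assignment of eight countries to four regional shards.
REGION_BY_COUNTRY = {
    "US": "americas", "CA": "americas",
    "DE": "europe",   "FR": "europe",
    "JP": "asia",     "IN": "asia",
    "AU": "oceania",  "NZ": "oceania",
}


def shard_for_user(user: dict) -> str:
    """Return the regional shard that must store this user's data."""
    country = user["country"]
    try:
        return REGION_BY_COUNTRY[country]
    except KeyError:
        # Refuse to guess: storing data in the wrong region is the very
        # problem geo-partitioning is meant to prevent.
        raise ValueError(f"no region configured for country {country!r}") from None


if __name__ == "__main__":
    for user in ({"id": 1, "country": "DE"}, {"id": 2, "country": "JP"}):
        print(user["id"], "->", shard_for_user(user))
```

Unlike hash-based sharding, the placement here is decided by an attribute with legal meaning, so an unknown country is treated as an error rather than being assigned to an arbitrary shard.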
  47. Case Studies & Maintainer Track & Storage Sessions Vitess: Stateless

    Storage in the Cloud - Sugu Sougoumarane, PlanetScale Design principles for making Vitess Cloud Native
  48. Summary • Vitess graduated, with v4 released !! • The

    number of Vitess adopters has increased over the past year • Case studies covered both Vitess without Kubernetes and Vitess on Kubernetes • Learned that it is necessary to think about GDPR