Overview
KubeCon + CloudNativeCon NA 2019 Vitess Sessions
2. Case Studies
Tuesday, November 19 • 11:50am - 12:25pm
Scaling Resilient Systems: A Journey into Slack's Database Service - Rafael Chacon & Guido Iaquinti, Slack
Thursday, November 21 • 2:25pm - 3:00pm
Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle
http://bit.ly/kubecon2019na_vitess2
http://bit.ly/kubecon2019na_vitess_m2
http://bit.ly/kubecon2019na_vitess_m3
KubeCon + CloudNativeCon NA 2019 Vitess Sessions
3. Maintainer Track Sessions
Tuesday, November 19 • 11:50am - 12:25pm
How to Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale
Wednesday, November 20 • 2:25pm - 3:00pm
Geo-partitioning with Vitess - Deepthi Sigireddi & Jitendra Vaidya, PlanetScale
http://bit.ly/kubecon2019na_vitess3
http://bit.ly/kubecon2019na_vitess_m4
http://bit.ly/kubecon2019na_vitess4
http://bit.ly/kubecon2019na_vitess_m5
November 5, 2019
Cloud Native Computing Foundation Announces Vitess Graduation
Vitess is the eighth project to graduate, following Kubernetes, Prometheus, Envoy, CoreDNS, containerd, Fluentd, and Jaeger. Version: Vitess 4.0.1.
Announcement: http://bit.ly/vitess_graduation
Vitess graduated one year and nine months after becoming a CNCF incubating project in February 2018.
1. Adoption
“Mission-critical production workloads running in real companies”
2. Maintainer Diversity
“Identify long-term contributions from multiple organizations, then drill down into project details and test how to do for your design strategy.”
3. Project Health
“Determining the appropriateness of project health”
“Slack adopted Vitess because its business needs were changing very rapidly and it needed a system flexible enough to accommodate those changes.”
“Slack's current goal is about 35% migration to Vitess, and 100% next year.”
“JD.com is China's largest online shopping site. During China's equivalent of the Black Friday sale, it reached a huge scale: about 4,000 keyspaces, over 30,000 pods, and a peak of 35 million QPS.”
KubeCon + CloudNativeCon China 2019 Vitess Sessions
Tuesday, June 25 • 11:00 - 11:35
Two Years with Vitess: How JD.com Runs the World's Largest Vitess - Xuhaihua & Jin Ke Xie, JD.com
http://bit.ly/kubecon2019china_vitess1
http://bit.ly/kubecon2019china_vitess_m1
“Nozzle is a startup that runs on Vitess. All of its applications run on Kubernetes, and it moved from AKS to GKE, achieving "No Vendor Lock-in" with Kubernetes and Vitess.”
“Until now, Technical Complexity & Organizational Complexity & Process Complexity are born.”
“You want these DB pods to be able to securely communicate for DB sharding & replication.”
What’s Vitess
Technical Terms
• vtgate: A proxy server that routes queries from the application to vttablet and returns the results to the client.
• tablet: The pair of mysqld and vttablet.
• vttablet: A proxy server placed in front of MySQL (mysqld). It also rewrites and deduplicates queries and protects MySQL from harmful queries.
• vtctld: An HTTP server that serves as the entry point for management operations on a Vitess cluster (GUI).
• vtctl: A command-line tool for managing a Vitess cluster (CLI).
• Topology: A metadata store that manages the configuration information of a Vitess cluster. etcd is used on Kubernetes; besides etcd, ZooKeeper is also supported.
Sharding
• Store data divided across two or more databases
• Scale out and improve performance by adding shards
Sharding in Vitess
• Vertical sharding: Store each table in its own database, so that tables are spread across multiple databases
• Horizontal sharding: Divide one table into multiple shards and store them in multiple databases
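The difference between the two sharding styles can be sketched in a few lines of Python. This is only an illustration: the table names, database names, and shard names are hypothetical, and MD5 is a stand-in hash, not the function Vitess uses.

```python
import hashlib

# Hypothetical table-to-database map for vertical sharding:
# each table lives, whole, in exactly one database.
VERTICAL_MAP = {"users": "db_accounts", "messages": "db_messaging"}

# Hypothetical shard list for horizontal sharding of a single table.
HORIZONTAL_SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def vertical_target(table: str) -> str:
    """Vertical sharding: the database is chosen by table name."""
    return VERTICAL_MAP[table]


def horizontal_target(sharding_key: int) -> str:
    """Horizontal sharding: the shard is chosen from a hash of the row's key.

    MD5 is an assumption made for this sketch, not what Vitess uses.
    """
    digest = hashlib.md5(str(sharding_key).encode()).digest()
    return HORIZONTAL_SHARDS[digest[0] % len(HORIZONTAL_SHARDS)]


if __name__ == "__main__":
    print(vertical_target("messages"))   # all rows of the table go to one database
    print(horizontal_target(42))         # rows of one table spread across shards
```

With vertical sharding every row of a table lands in the same place; with horizontal sharding the same key always maps to the same shard, but different keys spread across shards.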
Table Sharding
• VSchema: the sharding definition and routing information
• vtworker: refers to the VSchema and executes the shard split processing
• Queries are routed to the appropriate shard by referring to the VSchema
• Keyspace: a logical database that combines multiple shards. The application sees it as one database.
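As a rough sketch of VSchema-driven routing, the Python below looks up a table's sharding column in a VSchema-like dictionary and maps the hashed value onto a shard's key range. The dictionary layout is modeled loosely on a Vitess VSchema, but the table, the column, the two shards, and the MD5 stand-in hash are all assumptions made for illustration.

```python
import hashlib

# VSchema-like definition (assumed layout): which column shards each table.
VSCHEMA = {
    "sharded": True,
    "vindexes": {"hash": {"type": "hash"}},
    "tables": {
        "messages": {"column_vindexes": [{"column": "user_id", "name": "hash"}]},
    },
}

# Hypothetical shards of one keyspace, as [start, end) ranges over the
# first byte of the keyspace ID.
SHARDS = {"-80": (0x00, 0x80), "80-": (0x80, 0x100)}


def keyspace_id(value) -> bytes:
    """Map a sharding-column value to an 8-byte ID (MD5 is a stand-in)."""
    return hashlib.md5(str(value).encode()).digest()[:8]


def route(table: str, row: dict) -> str:
    """Return the shard whose key range contains the row's keyspace ID."""
    column = VSCHEMA["tables"][table]["column_vindexes"][0]["column"]
    first_byte = keyspace_id(row[column])[0]
    for shard, (start, end) in SHARDS.items():
        if start <= first_byte < end:
            return shard
    raise ValueError("no shard covers this keyspace ID")


if __name__ == "__main__":
    print(route("messages", {"user_id": 1001, "body": "hello"}))
```

The application only names the keyspace and the table; the routing layer decides which shard receives the query.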
Case Studies & Maintainer Track & Storage Sessions
In this talk, Rafael and Guido will share an overview of how Slack designed, built, scaled, and then iterated to improve its distributed database service built on top of Vitess, now a CNCF project. The Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 million queries per second. This journey has taught us how to operate a database cluster with more than 2000 nodes, which we expect to grow to more than 3500 in the next 12 months.
1. Databases at Slack
Current status
• Legacy shards
• Vitess shards
Migration of our entire dataset to Vitess is in progress.
Why are we migrating?
For more details, see the presentations listed on this slide.
• “Migrating to Vitess at (Slack) Scale” - Mike Demmer (https://www.percona.com/live/18/sessions/migrating-to-vitess-at-slack-scale)
• “Designing and launching the next-generation database system at Slack: from whiteboard to production” - Guido Iaquinti (https://www.percona.com/live/18/sessions/designing-and-launching-the-next-generation-database-system-slack-from-whiteboard-to-production)
• “Smooth scaling: Slack’s journey toward a new database” - Ameet Kotian (https://conferences.oreilly.com/velocity/vl-ny/public/schedule/detail/69885)
Why are we migrating?
tl;dr: shard size limits, inefficient resource distribution, operational overhead, single sharding model
“While Slack users are on the rise, they are unable to scale quickly and flexibly and cannot meet business needs.”
Why Vitess?
• Scaling and sharding flexibility without changing SQL (much)
• A MySQL core preserves operator and developer know-how
• Proven at scale at YouTube and, more recently, at others
• Active developer community and approachable code base
Stats
• Queries per day: 53+ billion
• Storage provisioned: 7.5+ PB
• Served by legacy infrastructure: ~60%
• Served by Vitess: ~40%
• Target: 70% served by Vitess by EOY
Aim to complete the transition to Vitess within 2020!
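To put the daily total in perspective, a quick back-of-the-envelope calculation converts it to an average rate and compares it with the 2.7 million QPS spikes quoted in the talk abstract. It treats "53+ billion" as exactly 53 billion, so the average is a lower bound and the ratio an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 53e9   # "53+ billion" from the slide, taken as a lower bound
peak_qps = 2.7e6         # spike figure quoted in the talk abstract

# Average rate if the load were spread evenly over the day.
average_qps = queries_per_day / SECONDS_PER_DAY

# How far the quoted spikes sit above that average.
peak_to_average = peak_qps / average_qps

if __name__ == "__main__":
    print(f"average: {average_qps:,.0f} queries per second")
    print(f"peak / average: {peak_to_average:.1f}x")
```

The spikes come out at roughly four to five times the daily average, which is the headroom the cluster has to absorb.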
2. Running databases in the cloud
• Immutable infrastructure
• Instance failure
• Durability through replication
3. Fault tolerance & isolation
Slack cloud infrastructure
• Amazon EC2 is hosted in multiple locations worldwide.
• These locations are composed of Regions and Availability Zones (AZs).
• Each Region is a separate geographic area.
• AZs in a Region are connected through low-latency links.
Vitess initial deployment
• A single cell across multiple AZs (fundamental).
• Global and local topology using the same Consul cluster (circumstantial).
Topology: Vitess key-value store
Consul: service discovery
Current deployment
• Isolated topologies (one dc for each AZ and one for the global topo).
• Blast radius is mapped to physical infrastructure.
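The effect of moving from one shared cell to one cell per AZ can be illustrated with a toy model. The tablet names, AZ names, and the blast-radius measure (the fraction of tablets that lose their local topology when one cell's topology fails) are all hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical tablet inventory: (tablet name, availability zone).
TABLETS = [
    ("tablet-1", "az-a"), ("tablet-2", "az-a"),
    ("tablet-3", "az-b"), ("tablet-4", "az-b"),
    ("tablet-5", "az-c"), ("tablet-6", "az-c"),
]


def single_cell(tablets):
    """Initial layout: one cell spans every AZ."""
    return {"cell-all": [name for name, _ in tablets]}


def cell_per_az(tablets):
    """Current layout: one cell, with its own local topology, per AZ."""
    cells = defaultdict(list)
    for name, az in tablets:
        cells[f"cell-{az}"].append(name)
    return dict(cells)


def blast_radius(cells, failed_cell):
    """Fraction of tablets affected when one cell's local topology fails."""
    total = sum(len(members) for members in cells.values())
    return len(cells[failed_cell]) / total


if __name__ == "__main__":
    print(blast_radius(single_cell(TABLETS), "cell-all"))
    print(blast_radius(cell_per_az(TABLETS), "cell-az-a"))
```

In this model a topology failure in the single-cell layout touches every tablet, while in the per-AZ layout it touches only the tablets in that AZ.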
4. Key Lessons
Complex system failures
• Complex systems are intrinsically dangerous systems.
• Complex systems are heavily and successfully defended against failure.
• Catastrophe is always just around the corner.
• Complex systems contain changing mixtures of failures latent within them.
How Complex Systems Fail - MIT (https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf)
Complex system failures
• Humility towards complexity.
• Reach out to other fields and learn from their experience.
Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle
• The holy grail of cloud native tech is to have zero vendor lock-in
• Migrate a high-throughput production workload of 20 TB from Azure (AKS) to Google (GKE) in under an hour
[Diagram: AKS, GCS, Node Pool, Internal App, GKE]
• Deploy all internal applications
• Deploy cert-manager, external-dns, nginx ingress
• Set up node pools for dedicated Vitess tablets
"Google Cloud Platform drives our analytics and machine learning needs. With BigQuery and Cloud Machine Learning Engine on Google Kubernetes Engine, we have an insights platform that's customized for our performance, IT, and cost requirements."
- Derek Perkins, Founder & CEO, Nozzle
https://cloud.google.com/customers/nozzle/
[Diagram: Cloud Tasks, GKE, BigQuery]
How to Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale
• Vitess basics
• A demo of live-migrating an existing MySQL installation into Vitess
→ No demo!
GDPR (General Data Protection Regulation)
Rules aimed at strengthening and integrating data protection for all individuals within the European Union.
A custom sharding scheme is one of the ways Vitess responds to GDPR's requirement to “Localize data storage locations in the country of residence of users”.
Similar rules are likely to appear outside the EU.
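A custom sharding scheme of this kind can be sketched as a lookup from a user's country of residence to a region-specific shard. The country codes, region names, and shard names below are hypothetical, and the sketch says nothing about how Vitess itself implements geo-partitioning.

```python
# Hypothetical mapping from country of residence to a storage region.
COUNTRY_REGION = {"DE": "eu", "FR": "eu", "US": "us", "JP": "apac"}

# Hypothetical mapping from region to the shard hosted in that region.
REGION_SHARD = {"eu": "shard-eu", "us": "shard-us", "apac": "shard-apac"}


def shard_for_user(country_code: str) -> str:
    """Pick the shard located in the region where the user resides."""
    region = COUNTRY_REGION.get(country_code.upper())
    if region is None:
        # Fail loudly rather than silently storing data in the wrong region.
        raise KeyError(f"no region configured for country {country_code!r}")
    return REGION_SHARD[region]


if __name__ == "__main__":
    print(shard_for_user("de"))
    print(shard_for_user("US"))
```

Unlike hash-based sharding, the placement here is driven by a business attribute of the row, which is what keeps a user's data inside the required geography.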
Summary
• Vitess graduated, with v4!
• The number of Vitess adoptions has increased over the past year
• Case studies of both non-Kubernetes + Vitess and Kubernetes + Vitess
• Learned that GDPR needs to be taken into account