KubeCon CloudNativeCon 2019 SanDiego Recap 〜Vitess〜 English.ver

cyberblack28
December 11, 2019

Cloud Native Meetup Tokyo #11 KubeCon Recap

Transcript

  1. Cloud Native Meetup Tokyo #11 KubeCon + CloudNativeCon Recap
    2019.12.10 @CyberAgent
    © cyberblack28
    KubeCon + CloudNativeCon 2019
    San Diego Recap ~ Vitess ~

  2. Profile
    Name : Yutaka Ichikawa
    Twitter : cyberblack28
    Hatena Blog : https://cyberblack28.hatenablog.com/
    SpeakerDeck : https://speakerdeck.com/cyberblack28
    Job
    Educational Solution Architect
    Developer Advocate / Technical Evangelist
    Infrastructure Engineer
    Frontend Engineer
    Community & Certification
    Publications
    #deepcn
    #rancherjp
    CKA KCM100
    CKAD
    2018 2019

  3. Let’s Start
    Cloud Native
    AP Communications Co., Ltd

  4. Information
    http://bit.ly/kubecon2018na_recap

  5. Agenda
    1. Overview
    2. What’s Vitess
    3. Case Studies & Maintainer Track & Storage Sessions
    4. Summary

  6. Overview

  7. Overview
    KubeCon + CloudNativeCon NA 2019 Vitess Sessions
    1. Keynote Sessions
    Tuesday, November 19 • 9:20am - 9:45am
    Keynote: CNCF Project Updates - Bryan Liles, KubeCon + CloudNativeCon North America 2019 Co-Chair & Senior Staff Engineer, VMware
    http://bit.ly/kubecon2019na_vitess1 http://bit.ly/kubecon2019na_vitess_m1
    Wednesday, November 20 • 9:27am - 9:32am
    Sponsored Keynote: Network, Please Evolve – Chapter 2 - Vijoy Pandey, Vice President/CTO Cloud, Cisco
    http://bit.ly/kubecon2019na_vitess1_2 http://bit.ly/kubecon2019na_vitess_m1_2

  8. Overview
    KubeCon + CloudNativeCon NA 2019 Vitess Sessions
    2. Case Studies
    Tuesday, November 19 • 11:50am - 12:25pm
    Scaling Resilient Systems: A Journey into Slack's Database Service - Rafael Chacon & Guido Iaquinti, Slack
    Thursday, November 21 • 2:25pm - 3:00pm
    Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle
    http://bit.ly/kubecon2019na_vitess2 http://bit.ly/kubecon2019na_vitess_m2
    http://bit.ly/kubecon2019na_vitess_m3

  9. Overview
    KubeCon + CloudNativeCon NA 2019 Vitess Sessions
    3. Maintainer Track Sessions
    Tuesday, November 19 • 11:50am - 12:25pm
    How to Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale
    Wednesday, November 20 • 2:25pm - 3:00pm
    Geo-partitioning with Vitess - Deepthi Sigireddi & Jitendra Vaidya, PlanetScale
    http://bit.ly/kubecon2019na_vitess3 http://bit.ly/kubecon2019na_vitess_m4
    http://bit.ly/kubecon2019na_vitess4 http://bit.ly/kubecon2019na_vitess_m5

  10. Overview
    KubeCon + CloudNativeCon NA 2019 Vitess Sessions
    4. Storage Sessions
    Tuesday, November 19 • 3:20pm - 3:55pm
    Vitess: Stateless Storage in the Cloud - Sugu Sougoumarane, PlanetScale
    http://bit.ly/kubecon2019na_vitess5 http://bit.ly/kubecon2019na_vitess_m6

  11. Overview
    Keynote Sessions
    Keynote: CNCF Project Updates - Bryan Liles, KubeCon + CloudNativeCon North America 2019
    Co-Chair & Senior Staff Engineer, VMware

  12. Overview
    November 5, 2019
    Cloud Native Computing Foundation Announces Vitess Graduation
    Vitess is the eighth project to graduate, following Kubernetes, Prometheus, Envoy, CoreDNS, containerd, Fluentd, and Jaeger.
    Version : Vitess 4.0.1
    Announcement : http://bit.ly/vitess_graduation
    Vitess graduated one year and nine months after becoming a CNCF incubating project in February 2018.
    1. Adoption
    “Mission-critical production workloads running in real companies”
    2. Maintainer Diversity
    “Identify long-term contributions from multiple organizations, then drill down into project details and test how to do for your design strategy.”
    3. Project Health
    “Determining the appropriateness of project health”

  13. Overview
    2019 San Diego 2018 Seattle

  14. Overview
    GitHub statistics (as of December 2019)

  15. Overview
    “Slack adopted Vitess because its business needs were changing very rapidly and it needed a system flexible enough to accommodate those changes.”
    “Slack's current goal is about 35% migration to Vitess, reaching 100% next year.”

  16. Overview
    “JD.com is China's largest online shopping site. During China's Black Friday sale, it reached a huge scale of about 4,000 keyspaces, over 30,000 pods, and a peak of 35 million QPS.”

  17. Overview
    KubeCon + CloudNativeCon China 2019 Vitess Sessions
    Tuesday, June 25 • 11:00 - 11:35
    Two Years with Vitess: How JD.com Runs the World's Largest Vitess - Xuhaihua & Jin Ke Xie, JD.com
    http://bit.ly/kubecon2019china_vitess1 http://bit.ly/kubecon2019china_vitess_m1

  18. Overview
    “Nozzle is a startup that runs on Vitess. All of its applications ran on Kubernetes and were moved from AKS to GKE, realizing "no vendor lock-in" with Kubernetes and Vitess.”

  19. Overview
    Keynote Sessions
    Sponsored Keynote: Network, Please Evolve – Chapter 2 - Vijoy Pandey, Vice President/CTO
    Cloud, Cisco

  20. Overview
    Abstracting the L2/L3 network layer and providing communication functions that match Kubernetes functions

  21. Overview
    “Until now, technical complexity, organizational complexity, and process complexity have arisen.”
    “You want these DB pods to be able to securely communicate for DB sharding & replication.”

  22. Overview
    NSM-driven DB sharding & replication enables comprehensive, efficient communication,
    security and observability.

  23. What’s Vitess

  24. What’s Vitess

  25. What’s Vitess
    A cloud-native database clustering system
    that achieves high availability and
    large-scale scalability
    by sharding MySQL

  26. What’s Vitess
    Architecture

  27. What’s Vitess
    Vitess on Kubernetes

  28. What’s Vitess
    Technical Terms
    vtgate : Proxy server that routes queries from the application to vttablet and returns the results to the client
    tablet : A set of mysqld and vttablet
    vttablet : Proxy server placed in front of MySQL (mysqld); it also performs query rewriting and deduplication and protects MySQL from harmful queries
    vtctld : HTTP server that serves as the entry point (GUI) for management operations on a Vitess cluster
    vtctl : Command-line tool (CLI) for managing a Vitess cluster
    Topology : Metadata store that manages the configuration information of a Vitess cluster; etcd is supported on Kubernetes, and ZooKeeper is supported in addition to etcd

  29. What’s Vitess
    Sharding
    • Store data divided across two or more databases
    • Scale out and improve performance by adding shards
    Sharding of Vitess
    • Vertical Sharding
    Store tables in separate databases, table by table
    • Horizontal Sharding
    Divide one table into multiple shards and store them in multiple databases
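    The difference between the two sharding styles can be illustrated with a minimal sketch. This is a toy model and not Vitess code: the table names, database names, shard names, and the modulo rule are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy routing functions, not Vitess code.

# Vertical sharding: each table lives, as a whole, in its own database.
# The table-to-database mapping below is hypothetical.
VERTICAL_MAP = {"users": "db_accounts", "orders": "db_commerce"}


def route_vertical(table: str) -> str:
    """Return the database that holds the whole table."""
    return VERTICAL_MAP[table]


# Horizontal sharding: the rows of one table are spread over several shards.
HORIZONTAL_SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def route_horizontal(sharding_key: int) -> str:
    """Pick a shard for a row from its sharding key (simple modulo, an assumption)."""
    return HORIZONTAL_SHARDS[sharding_key % len(HORIZONTAL_SHARDS)]


if __name__ == "__main__":
    print(route_vertical("orders"))
    for user_id in (1, 2, 7):
        print(user_id, route_horizontal(user_id))
```

    In the vertical case the routing decision depends only on the table name; in the horizontal case it depends on a value in each row.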

  30. What’s Vitess
    Table Sharding
    VSchema : Sharding definition and routing information
    VTworker : Refer to VSchema and execute sharding split processing
    Refer to VSchema and route to the appropriate shard
    Keyspace : A logical database that combines multiple shards; recognized as one database from the application
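    As a rough sketch of how a sharding definition can drive routing, the snippet below keeps a VSchema-like dictionary and routes a row to a shard by key range. The table and column names are hypothetical, SHA-256 is only a stand-in for the hash (it is not the function Vitess uses), and the two shard names follow the key-range naming style ("-80", "80-").

```python
import hashlib

# Dictionary shaped like a VSchema for a sharded keyspace.
# Table and column names are hypothetical.
VSCHEMA = {
    "sharded": True,
    "vindexes": {"hash": {"type": "hash"}},
    "tables": {
        "customer": {
            "column_vindexes": [{"column": "customer_id", "name": "hash"}],
        },
    },
}

# Two shards, each owning a range of the first byte of the keyspace ID.
SHARDS = {"-80": (0x00, 0x80), "80-": (0x80, 0x100)}


def keyspace_id(value) -> bytes:
    """Stand-in hash (SHA-256 prefix, an assumption; not the Vitess hash)."""
    return hashlib.sha256(str(value).encode("utf-8")).digest()[:8]


def sharding_column(table: str) -> str:
    """Look up which column the sharding definition uses for this table."""
    return VSCHEMA["tables"][table]["column_vindexes"][0]["column"]


def route(table: str, row: dict) -> str:
    """Return the shard whose key range covers the row's keyspace ID."""
    first_byte = keyspace_id(row[sharding_column(table)])[0]
    for shard, (low, high) in SHARDS.items():
        if low <= first_byte < high:
            return shard
    raise ValueError("no shard covers this keyspace ID")


if __name__ == "__main__":
    for customer_id in (1, 2, 3, 4):
        print(customer_id, route("customer", {"customer_id": customer_id}))
```

    The application only names the table and the row; the shard is derived from the definition, which is why the keyspace can look like a single database.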

  31. What’s Vitess
    Reference Docs
    • Vitess is a database clustering system for horizontal scaling of MySQL
    https://vitess.io/
    • Vitess Twitter
    https://twitter.com/vitessio
    • CrashAcademy 「CNDJP 勉強会 #8 Vitessのパフォーマンスと運用性を検証してみた」
    https://crash.academy/ng/video/412/1736
    • Vitess Slack
    https://vitess.slack.com/
    • Vitess Github
    https://github.com/vitessio/vitess

  32. Case Studies & Maintainer Track &
    Storage Sessions

  33. Case Study Slack

  34. Case Studies & Maintainer Track & Storage Sessions
    In this talk, Rafael and Guido will share an overview of how Slack designed, built, scaled and then
    iterated to improve its distributed database service built on top of Vitess, now a CNCF project. The
    Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 million queries per second.
    This journey has taught us how to operate a database cluster with more than 2000 nodes, which is
    expected to grow to more than 3500 in the next 12 months.

  35. Case Studies & Maintainer Track & Storage Sessions

  36. Case Studies & Maintainer Track & Storage Sessions

  37. Case Studies & Maintainer Track & Storage Sessions
    1. Databases at Slack
    Current status
    Legacy Shards / Vitess Shards
    Migration of our entire dataset to Vitess is in progress.

  38. Case Studies & Maintainer Track & Storage Sessions
    Legacy Shards : Application-level, team-sharded, active master-master MySQL setup.
    Vitess Shards : Master-replica MySQL setup fully managed by Vitess.

  39. Case Studies & Maintainer Track & Storage Sessions
    Why are we migrating?
    • “Migrating to Vitess at (Slack) Scale” - Mike Demmer
    (https://www.percona.com/live/18/sessions/migrating-to-vitess-at-slack-scale)
    • “Designing and launching the next-generation database system at Slack:
    from whiteboard to production” - Guido Iaquinti
    (https://www.percona.com/live/18/sessions/designing-and-launching-the-next-generation-database-system-slack-from-whiteboard-to-production)
    • “Smooth scaling: Slack’s journey toward a new database” - Ameet Kotian
    (https://conferences.oreilly.com/velocity/vl-ny/public/schedule/detail/69885)
    For more details please see the presentations on the slide.

  40. Case Studies & Maintainer Track & Storage Sessions
    Why are we migrating?
    tl;dr: shard size limits, inefficient resource distribution, operational overhead, single sharding model
    “While the number of Slack users is rising, the legacy databases cannot scale quickly and flexibly enough to meet business needs.”

  41. Case Studies & Maintainer Track & Storage Sessions
    Why Vitess?
    • Scaling and sharding flexibility without changing SQL (much)
    • MySQL at the core preserves operator and developer know-how
    • Proven at scale at YouTube and more recently others
    • Active developer community and approachable code base

  42. Case Studies & Maintainer Track & Storage Sessions
    Stats
    • Queries per day: 53+ billion
    • Storage provisioned: 7.5+ PB
    • Served by legacy infrastructure: ~60%
    • Served by Vitess: ~40%
    • Target: 70% served by Vitess by EOY
    Aiming to complete the transition to Vitess within 2020!
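    To put these figures in perspective, the arithmetic below converts the daily total into an average rate. It is a back-of-the-envelope sketch: it treats "53+ billion" as exactly 53 billion and assumes the ~40% and 70% shares apply to query volume, which the slide does not state explicitly.

```python
# Back-of-the-envelope arithmetic based on the figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 53_000_000_000  # "53+ billion", taken as a lower bound
vitess_share_pct = 40             # "~40%" served by Vitess (assumed to mean queries)
target_share_pct = 70             # end-of-year target (same assumption)

# Average queries per second across the whole fleet.
avg_qps = queries_per_day / SECONDS_PER_DAY

# Daily query volume implied by the current and target Vitess shares.
vitess_queries_per_day = queries_per_day * vitess_share_pct // 100
target_queries_per_day = queries_per_day * target_share_pct // 100

if __name__ == "__main__":
    print(f"average QPS (all traffic): {avg_qps:,.0f}")
    print(f"Vitess queries/day at 40%: {vitess_queries_per_day:,}")
    print(f"Vitess queries/day at 70%: {target_queries_per_day:,}")
```

    The average works out to roughly 613 thousand queries per second, well below the 2.7 million per second spikes mentioned in the session abstract, which is what one would expect when peaks are compared with a daily mean.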

  43. Case Studies & Maintainer Track & Storage Sessions

  44. Case Studies & Maintainer Track & Storage Sessions
    2. Running databases in the cloud
    Immutable infrastructure
    Instance failure
    Durability through replication

  45. Case Studies & Maintainer Track & Storage Sessions
    How we run Vitess
    EC2
    Percona MySQL 5.7
    ASG for stateless components
    Ephemeral NVMe (no EBS)

  46. Case Studies & Maintainer Track & Storage Sessions
    Not Kubernetes

  47. Case Studies & Maintainer Track & Storage Sessions

  48. Case Studies & Maintainer Track & Storage Sessions
    3. Fault tolerance & isolation
    Slack cloud infrastructure
    • Amazon EC2 is hosted in multiple locations world-wide.
    • These locations are composed of Regions and Availability Zones
    (AZ’s).
    • Each Region is a separate geographic area.
    • AZ’s in a Region are connected through low-latency links.

  49. Case Studies & Maintainer Track & Storage Sessions
    Vitess initial deployment
    • A single cell across multiple AZ’s (fundamental).
    • Global and local topology using the same Consul cluster
    (circumstantial).
    Topology : Vitess Key-Value Store
    Consul : Service Discovery

  50. Case Studies & Maintainer Track & Storage Sessions

  51. Case Studies & Maintainer Track & Storage Sessions
    Resilient systems
    • Minimize the blast radius.
    • Isolation is key.
    • Understand your dependencies.

  52. Case Studies & Maintainer Track & Storage Sessions
    Current deployment
    • Isolated topologies (one dc for each AZ and one for the global
    topo).
    • Blast radius is mapped to physical infrastructure.
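    The effect of mapping the blast radius to physical infrastructure can be shown with a toy model. This is a sketch under assumptions: the AZ and cell names are hypothetical, and a cell is simply considered affected if it has any presence in the failed AZ.

```python
# Toy model: which Vitess cells are affected when one availability zone (AZ) fails.
# AZ and cell names are hypothetical.

# Initial deployment: a single cell spanning every AZ.
INITIAL = {"cell1": {"az-a", "az-b", "az-c"}}

# Current deployment: one isolated cell per AZ.
CURRENT = {"cell-a": {"az-a"}, "cell-b": {"az-b"}, "cell-c": {"az-c"}}


def affected_cells(layout: dict, failed_az: str) -> list:
    """Return the cells that have any presence in the failed AZ."""
    return sorted(cell for cell, azs in layout.items() if failed_az in azs)


def blast_radius(layout: dict, failed_az: str) -> float:
    """Fraction of cells touched by the failure."""
    return len(affected_cells(layout, failed_az)) / len(layout)


if __name__ == "__main__":
    for name, layout in (("initial", INITIAL), ("current", CURRENT)):
        print(name, affected_cells(layout, "az-a"), blast_radius(layout, "az-a"))
```

    With a single cell across all AZs, any AZ failure touches the only cell; with one cell per AZ, the same failure is confined to one cell out of three.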

  53. Case Studies & Maintainer Track & Storage Sessions
    We have benefited already
    • AZ failure during backup time.
    • Single cell was affected!

  54. Case Studies & Maintainer Track & Storage Sessions
    Performance wins

  55. Case Studies & Maintainer Track & Storage Sessions

  56. Case Studies & Maintainer Track & Storage Sessions
    4. Key Lessons
    Complex system failures
    • Complex systems are intrinsically dangerous systems.
    • Complex systems are heavily and successfully defended against failure.
    • Catastrophe is always just around the corner.
    • Complex systems contain changing mixtures of failures latent within them.
    How Complex Systems Fail – MIT
    (https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf)

  57. Case Studies & Maintainer Track & Storage Sessions
    Complex system failures
    Humility towards complexity.
    Reach out to other fields and learn
    from their experience.

  58. Case Studies & Maintainer Track & Storage Sessions

  59. Case Study Nozzle

  60. Case Studies & Maintainer Track & Storage Sessions
    Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess -
    Derek Perkins, Nozzle
    • The holy grail of Cloud Native tech is to have zero vendor lock-in
    • Migrate a high-throughput production workload of 20 TB from Azure (AKS) to Google (GKE) in under an hour
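    For a sense of scale, the arithmetic below shows the sustained rate that would be needed if all 20 TB were copied inside the one-hour window. This is only a worked calculation under that assumption (and decimal terabytes); it does not describe how Nozzle actually carried out the migration.

```python
# Worked arithmetic: sustained rate needed to move 20 TB in one hour,
# assuming every byte is copied inside the window (an assumption).
TB = 10**12  # decimal terabyte

data_bytes = 20 * TB
window_seconds = 60 * 60

bytes_per_second = data_bytes / window_seconds
gigabits_per_second = bytes_per_second * 8 / 10**9

if __name__ == "__main__":
    print(f"{bytes_per_second / 10**9:.2f} GB/s")
    print(f"{gigabits_per_second:.1f} Gbit/s")
```

    That is roughly 5.6 GB/s, or about 44 Gbit/s, sustained for the full hour.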

  61. Case Studies & Maintainer Track & Storage Sessions
    Zero vendor lock-in is preferable, but
    there are cases where lock-in is a good choice.
    Making the right judgment is what matters.

  62. Case Studies & Maintainer Track & Storage Sessions

  63. Case Studies & Maintainer Track & Storage Sessions
    Provisioning / Autoscaling Database

  64. Case Studies & Maintainer Track & Storage Sessions
    Queues

  65. Case Studies & Maintainer Track & Storage Sessions
    Network Egress

  66. Case Studies & Maintainer Track & Storage Sessions

  67. Case Studies & Maintainer Track & Storage Sessions

  68. Case Studies & Maintainer Track & Storage Sessions

  69. Case Studies & Maintainer Track & Storage Sessions

  70. Case Studies & Maintainer Track & Storage Sessions

  71. Case Studies & Maintainer Track & Storage Sessions

  72. Case Studies & Maintainer Track & Storage Sessions

  73. Case Studies & Maintainer Track & Storage Sessions

  74. Case Studies & Maintainer Track & Storage Sessions

  75. Case Studies & Maintainer Track & Storage Sessions
    [Diagram labels: AKS, GKE, GCS, Backup, Restore]
    Cross-cluster networking for zero downtime

  76. Case Studies & Maintainer Track & Storage Sessions

  77. Case Studies & Maintainer Track & Storage Sessions
    [Diagram labels: AKS, GCS, GKE, Node Pool, Internal App]
    • Deploy all internal applications
    • Deploy cert-manager, external dns, nginx ingress
    • Set up node pools for dedicated Vitess tablets

  78. Case Studies & Maintainer Track & Storage Sessions

  79. Case Studies & Maintainer Track & Storage Sessions
    [Diagram labels: AKS, GCS, GKE, Node Pool, Internal App]
    • Scale down
    • Shut down
    • Backup Start

  80. Case Studies & Maintainer Track & Storage Sessions

  81. Case Studies & Maintainer Track & Storage Sessions

  82. Case Studies & Maintainer Track & Storage Sessions

  83. Case Studies & Maintainer Track & Storage Sessions
    "Google Cloud Platform drives our analytics
    and machine learning needs. With BigQuery
    and Cloud Machine Learning Engine on Google
    Kubernetes Engine, we have an insights
    platform that's customized for our performance,
    IT, and cost requirements."
    —Derek Perkins, Founder & CEO, Nozzle
    https://cloud.google.com/customers/nozzle/
    Cloud Tasks, GKE, BigQuery

  84. Maintainer Track

  85. Case Studies & Maintainer Track & Storage Sessions
    How to Migrate a MySQL Database to Vitess - Sugu Sougoumarane & Morgan Tocker, PlanetScale
    • Vitess basics
    • A demo of live-migrating an existing MySQL installation into Vitess → No demo!

  86. Case Studies & Maintainer Track & Storage Sessions
    Geo-partitioning with Vitess - Deepthi Sigireddi & Jitendra Vaidya, PlanetScale
    • Problems and solutions in GDPR
    • Vitess approach to geo-partitioning based on GDPR
    • Vitess custom sharding scheme demo

  87. Case Studies & Maintainer Track & Storage Sessions
    GDPR (General Data Protection Regulation)
    Rules aimed at strengthening and integrating data protection for all individuals within the European Union.
    A custom sharding scheme is one of the ways Vitess responds to GDPR's requirement to “Localize data storage
    locations in the country of residence of users”.
    Similar rules are likely to appear outside the EU.
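    The idea of placing data by the user's country of residence can be sketched as a two-step lookup. This is an illustrative toy and not the sharding scheme shown in the talk: the country codes, region names, and shard names are all hypothetical.

```python
# Toy geo-partitioning sketch: choose a shard from the user's country of residence.
# All mappings below are hypothetical.
COUNTRY_TO_REGION = {
    "DE": "eu", "FR": "eu",
    "US": "americas", "CA": "americas",
    "JP": "asia", "IN": "asia",
}

REGION_TO_SHARD = {
    "eu": "shard-eu",
    "americas": "shard-americas",
    "asia": "shard-asia",
}


def shard_for_user(user: dict) -> str:
    """Return the shard that stores this user's rows, based on country."""
    region = COUNTRY_TO_REGION[user["country"]]
    return REGION_TO_SHARD[region]


if __name__ == "__main__":
    for user in ({"id": 1, "country": "DE"}, {"id": 2, "country": "JP"}):
        print(user["id"], shard_for_user(user))
```

    Because the shard is derived from the country rather than from a hash of the ID, every row of a given user stays in the region tied to that user's residence.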

  88. Case Studies & Maintainer Track & Storage Sessions
    Custom sharding scheme demo across four regions and eight countries

  89. Storage Sessions

  90. Case Studies & Maintainer Track & Storage Sessions
    Vitess: Stateless Storage in the Cloud - Sugu Sougoumarane, PlanetScale
    Design principles for making Vitess Cloud Native

  91. Bonus

  92. Bonus
    Vitess as a Service
    https://planetscale.com/

  93. Summary

  94. Summary
    • Vitess has graduated, with v4!
    • The number of Vitess adopters has increased over the past year
    • Case studies covered Vitess both without Kubernetes and on Kubernetes
    • Learned that GDPR needs to be taken into account

  95. Thank you !!
