
MySQL and Vitess (and Kubernetes) at HubSpot

In 2017-2018, to simplify deployment, allow easier automation, and have better scaling options, HubSpot MySQL databases, previously run on RDS and EC2 instances, were migrated to Vitess and Kubernetes. In this talk, we share our experience of 4+ years running Percona Server with Vitess on Kubernetes in AWS: things that worked well, others that we had to adapt, and some open problems we are still facing. We will cover sharding, the automation tooling, and our journey to multi-region. The audience will be presented with a real-world, large MySQL/Vitess/Kubernetes deployment: you will learn from what we built, and hopefully avoid some of our mistakes. Come to this talk to see how HubSpot operates 800+ MySQL replication clusters / keyspaces with Vitess and Kubernetes.

Jean-François Gagné

May 25, 2023

Transcript

  1. by Jean-François Gagné <jgagne AT hubspot DOT com> and Swetha

    Narayanaswamy <swetha AT hubspot DOT com> (presented at Percona Live Denver 2023) MySQL and Vitess (and Kubernetes) at HubSpot
  2. The Speakers Jean-François Gagné System / Infrastructure Engineer and MySQL

    Expert Swetha Narayanaswamy Director, Engineering, Data Infrastructure
  3. Agenda Quick Introduction to Vitess and Kubernetes Presentation of HubSpot

    MySQL Infrastructure Zoom in Percona Server and MySQL at HubSpot Zoom in Kubernetes and Vitess at HubSpot
  4. HubSpot Organisation “We run thousands of MySQL databases, supporting 1500

    engineers and major portions of the product, and we sleep at night !” • What we strive for ◦ High-class developer experience enabling self-service provisioning and management ◦ Enable clean, safe, high-velocity development of backend microservices that depend on MySQL • How we operate ◦ Small autonomous teams ◦ Split into Orchestration (resiliency), Services (developer self-service), Performance (scaling) ◦ Engineers from all three teams balance the role of guiding the rest of the organization
  5. HubSpot Organisation “We run thousands of MySQL databases, supporting 1500

    engineers and major portions of the product, and we sleep at night !” • Things we got right ◦ Ease and reliability of provisioning a new keyspace / database ◦ Tools built to support development and operation ◦ Workflows we’ve built to safely upgrade the system • Things we’re working on ◦ Automation around sharding (like self-service resharding) ◦ Client connection primitives (No 1:1 client to database connection, breaks circuit breakers) ◦ Improvements to our operator
  6. Vitess: middleware for MySQL Vitess Overview Vitess provides all the

    nuts and bolts to run MySQL at scale, including: • Scalability and reliability data-plane features (like sharding, replica reads, and query killing) • Data optimisation features (like query buffering and caching, and connection pooling) • Operation features (like reparent, backups, resharding, and move-table) • Generic vReplication feature for change-data-capture • Service Discovery (vtGate) backed by a service registry (Topology Server) • MySQL sidecar (vtTablet) providing both data and control plane functions • UIs (vtAdmin, vtCtld and vtOrc) for administration and automation (including failover) • Probably others (above not exhaustive)
  7. Vitess Overview - Terminology • Shard (Keyspace-Shard): keyspace subset: MySQL

    primary and its replicas (a replica-set) • Keyspace: a Vitess logical schema / database ◦ The simplest keyspace (not sharded) is a single keyspace-shard ◦ A sharded keyspace contains many keyspace-shards • Reparent: the action of changing the primary for a keyspace-shard ◦ There are two types of reparent: planned and emergency (also called switchover and failover in a non-Vitess context) (omitting details about reparentShard and reparentTablet)
  8. Kubernetes Kubernetes Overview Kubernetes is a platform for managing containerized workloads

    • It provides basic mechanisms for the deployment, maintenance, and scaling of applications • Containers: OS-level virtualization / Linux cgroups • Pod: the scheduling unit of Kubernetes, contains one or more co-located containers (Similar to a vm, a pod is assigned an ip address) • Replica-Set: stable set of pods running at any given time (Kubernetes guarantees a specified number of pods, with self-healing when losing one)
  9. HubSpot MySQL Infrastructure HubSpot MySQL Infrastructure • Inventory of what

    we have • How we got there: migration to Vitess and Kubernetes (from RDS and AWS/EC2) • Hublets: our current multi-region deployment
  10. HubSpot MySQL Infrastructure - What we have - Environment Overview

    of our environment: • 2 active AWS regions (others for backups): US East and Germany • 2 production Kubernetes clusters, 2 development / QA clusters, 2 test clusters • 2 production Vitess clusters, 2 development / QA clusters, 15+ clusters for tests • Leverage EBS for storage, and snapshot for data history (we also do backups in S3) (snapshots are quick to use – restores need copying data – but are slow to “hydrate” in AWS)
  11. HubSpot MySQL Infrastructure - What we have - Vitess &

    MySQL (Percona Server) Overview of our Vitess & MySQL installation: • Vitess version 14 (was v5, we just finished upgrading in Q1 of 2023) • Percona Server 5.7 (we will eventually go to 8.0 or 8.1) • 800+ Production keyspaces in US East (+75% for Germany, x2 for QA) • 10+ sharded keyspaces, with up to 32 shards • 1000+ primaries / keyspace-shards (each primary has at least two replicas, sometimes more) • 3100+ mysqld (primaries and replicas) • We do not persist / flush trx to disk, but we persist to a replica when committing on the primary ◦ We run with sync_binlog = 0 and innodb_flush_log_at_trx_commit = 2 (for replication throughput with remote storage / EBS, the MTR solution is still painful) ◦ We use lossless semi-sync for data persistence (that is why we have min. two replicas)
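
A minimal sketch of what the durability trade-off described on this slide looks like in MySQL / Percona Server 5.7; the semi-sync variable names are the standard ones from the semisync plugins, and the timeout value is illustrative, not our actual configuration:

      -- Relaxed disk durability, for replication throughput on remote storage (EBS)
      SET GLOBAL sync_binlog = 0;
      SET GLOBAL innodb_flush_log_at_trx_commit = 2;

      -- Lossless semi-sync: the primary waits for a replica ACK before making the
      -- transaction visible (hence the minimum of two replicas per keyspace-shard)
      SET GLOBAL rpl_semi_sync_master_enabled = ON;
      SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';
      SET GLOBAL rpl_semi_sync_master_timeout = 4294967295;  -- illustrative: (almost) never fall back to async
      -- and on each replica: SET GLOBAL rpl_semi_sync_slave_enabled = ON;
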
  12. HubSpot MySQL Infrastructure - What we have - Tooling 100+

    Schema Changes / Migrations per week (counting all envs; most of them with gh-ost) Overview of our tooling: • We have our own fork of gh-ost and Vitess • We use upstream Percona Server, XtraBackup, and Percona Toolkit • A lot of developer tooling: ◦ SQL Cop, Explain tool, Slow log viewer, … ◦ Query Report based on the digest table (more about this in MySQL-dedicated section) (if you want to know more, Mali will talk about our tooling at 3:30pm - Evergreen D) (Mali’s talk title: How To Not Gatekeep The Database - Safely) (Mali’s talk subject: How 1500 developers do most of their work without dba involvement)
  13. HubSpot MySQL Infrastructure - How we got there - Migration

    from RDS and AWS/EC2 Why did we migrate to Vitess and Kubernetes • Goals ◦ Create a self-service platform: a DB in minutes & zero manual actions ◦ Resource efficiency: Right-size containers, and better backups (not exclusively on snaps) ◦ Scale: Data size agnostic way to choose DB to use • Results ◦ Order of magnitude of scale vs previous system ◦ Eliminate manual actions for operational work and adopt K8s API driven automation ◦ Vitess for operation (backup/restore), replication topology management (failovers) • Post Migration ◦ Running in AWS US East, replicas in different availability zones ◦ Backup replicated to a secondary region ◦ We have our own Kubernetes operator and adjacent automation tooling
  14. HubSpot MySQL Infrastructure - Vitess Hublets Our Journey to Multi-Region:

    An Introduction https://product.hubspot.com/blog/developing-an-eu-data-center • In July 2021, HubSpot becomes multi-region by launching Hublets • Legacy becomes the na1 Hublet, and the eu1 Hublet is added → x2 Vitess clusters, x2 Kubernetes clusters, +75% keyspaces (for Prod and QA) • To provide unique ids across Hublets, we designed and built vTickets in our fork of Vitess • To provide global data, we implement Cross-Hublet Replication (na1 → eu1)
  15. HubSpot MySQL Infrastructure - vTickets Our Journey to Multi-Region: vTickets,

    Hublet-Compatible MySQL Auto-Increment https://product.hubspot.com/blog/our-journey-to-multiregion-vtickets • 75% of the keyspaces have at least one table with an auto-increment • vTickets is a HubSpot extension to Vitess providing unique id generation across Hublets • vTickets is based on the vSequence feature of Vitess (for unique ids across shards) • Allocation of vSequences happens at the vtGate level; vTickets works at the vtTablet level • vTickets is more resilient and “quicker” (less latency) than vSequences • vTickets might allocate out-of-order ids at the global level (never at the Hublet or Shard level) (id 105 might be allocated on a shard while 110 is already allocated on another shard or Hublet)
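
vTickets itself is HubSpot-internal, but the vSequence feature it extends is standard Vitess: a sequence is backed by a single-row table in an unsharded keyspace, from which vtGate reserves ranges of ids. A minimal sketch of that backing table (the table name is hypothetical):

      -- Backing table of a Vitess sequence, in an unsharded keyspace
      CREATE TABLE user_seq (
        id      INT,
        next_id BIGINT,
        cache   BIGINT,        -- how many ids are reserved per round-trip
        PRIMARY KEY (id)
      ) COMMENT 'vitess_sequence';
      INSERT INTO user_seq (id, next_id, cache) VALUES (0, 1, 100);

The sharded table's vschema maps its auto-increment column to this sequence; vTickets moves the allocation (plus caching and prefetching) from vtGate down to vtTablet, as shown on the next slides.
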
  16. HubSpot MySQL Infrastructure - vTickets - High Level Design •

    Next id in Zookeeper • vTickets Service • Service called by vtTablet
  17. HubSpot MySQL Infrastructure - vTickets - Full Design with Caching

    and Prefetching • One shard shown, but there can be many • One vTickets Service per Hublet (local) • One Global vTicket Service (called by Local Services) • Cache in Local Services (resilience to wide area network outages) • Cache in vtTablet (reducing insert latency) (resilience to Local Srv. transient outages) • Prefetching in vtTablet and Local Services (more resilience and fewer tail latencies)
  18. HubSpot MySQL Infrastructure - Cross Datacenter Replication Our Journey to

    Multi-Region: Cross Datacenter MySQL Data Replication https://product.hubspot.com/blog/mysql-data-replication • Binlog Processor as a na1 replica with replication filters • Cron job pushing binary logs to S3, and pulling and applying them in eu1 • Bug#91812: mysqlbinlog | mysql fails with GTID and rpl filters on ts default (unfixed in May 2023)
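
A hedged sketch of the replication-filter part of this setup, using standard MySQL 5.7 syntax; the table pattern is purely illustrative, not our actual filter list:

      -- On the na1 Binlog Processor replica: only keep the globally-replicated tables
      STOP SLAVE SQL_THREAD;
      CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('global_%.%');
      START SLAVE SQL_THREAD;
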
  19. Percona Server at HubSpot Percona Server at HubSpot • We

    are running Percona Server 5.7 • Percona Server is a close fork of Oracle MySQL (MySQL is covered in the next section) • Percona Server has improvements over Oracle MySQL, some of which are super useful ! (These improvements come with a minor drawback)
  20. Percona Server at HubSpot - HubSpot is happy with Percona

    Server • The Extended Slow Log gives good InnoDB insight into slow queries (we wish these were in P_S) • XtraBackup and lock-free hot backup allow non-intrusive backups (not lock-free with MySQL) (even if these are plan B, plan A being cold backups, hot backups are sometimes needed) • Percona Server also has bug fixes not in MySQL (more about this in the MySQL section) • We had to work around a crashing bug in STOP SLAVE SQL_THREAD with MTR enabled (Percona fixed Bug#96400 - still unfixed in MySQL - but regression fixed later in PS-8030) https://jfg-mysql.blogspot.com/2023/05/avoiding-stop-slave-crash-with-mtr-in-ps-5.7.html • There were deadlock problems with Multi-Threaded Replication (MTR) and the backup lock (PS-3345 fixed in 5.7.28-31; but many other problems in MySQL 5.7 with MTR) • We had problems with the backoff Free List algorithm (easily fixable in configuration) https://jfg-mysql.blogspot.com/2022/11/tail-latencies-in-percona-server-because-innodb-stalls-on-empty-free-list.html Good job Percona for Making MySQL Better !
  21. MySQL at HubSpot MySQL at HubSpot • HubSpot runs with

    the slow and general logs enabled • Comments in queries for traceability and analysis (I wish there was something better) • Look forward to setting long_query_time in query hints in 8.0 ! • A lot of things that are good to know • But first…
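
A minimal, hypothetical example of the query-comment convention mentioned above (the fields are illustrative, not our actual format); the comment travels with the query and shows up in the slow and general logs:

      SELECT /* service=contacts-api, request_id=abc123, hublet=na1 */
             id, email
      FROM   contacts
      WHERE  portal_id = 42;
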
  22. MySQL at HubSpot - Oracle is a good steward of

    MySQL ! New features in 8.0 point releases are what users asked for (take advantage of new things without a major upgrade) • Breaking binary in-place downgrade is a logical consequence of this (I wish replicating from newer sources to older replicas was supported, for upgrade rollback) The new LTS & Innovation Releases Model https://twitter.com/MySQL/status/1641581668220772353 • Looking for more information about this, especially with 8.0 being EOL in 2025/26
  23. MySQL at HubSpot - #loveSemiSync There is uncertainty about the

    future of Semi-Synchronous Replication: • I understood at FOSDEM 2023 that Oracle is not improving Semi-Sync • I am disappointed: HubSpot runs Semi-Sync without major pain, and it is great for lossless HA (IMHO, Group Replication is too complicated, and simpler is better because more robust) (simpler also allows easier troubleshooting when something goes wrong) • Two unfixed bugs from June 2021 (there are probably others): Bug#104012 & Bug#104015 • Join me in supporting Semi-Sync with #loveSemiSync (below on Twitter and LinkedIn)
  24. MySQL at HubSpot - A few things that are good

    to know Multi-threaded replication (MTR) is painful in 5.7 • Bug#89247: deadlocks, fixed in 5.7.31 and 8.0.21: patch by Percona, thanks ! • Bug#107574: deadlocks [...] when setting read_only (wrongfully closed, maybe Percona will fix in PS-8411) • Because of deadlocks, we do not run with MTR always on, we have an enable/disable script (sketch below) • Also, the replication load metric does not work with MTR (another reason to disable) http://www.markleith.co.uk/2012/07/24/a-mysql-replication-load-average-with-performance-schema/ Other things that are good to know: • The SUPER_READ_ONLY feature is useful to avoid writing to a replica by mistake ◦ When having to write to a replica, it needs disabling at the global level, which is risky ◦ And disabling SUPER_READ_ONLY deadlocks MTR ◦ A session override would be useful: Bug#111108
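
A rough sketch of what such an MTR enable/disable script boils down to in 5.7 (the worker count is illustrative), plus the risky global SUPER_READ_ONLY toggle that Bug#111108 asks to make session-scoped:

      -- Enable MTR: the replication SQL thread must be restarted for this to apply
      STOP SLAVE SQL_THREAD;
      SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
      SET GLOBAL slave_parallel_workers = 8;   -- illustrative
      START SLAVE SQL_THREAD;

      -- Disable MTR (back to single-threaded apply)
      STOP SLAVE SQL_THREAD;
      SET GLOBAL slave_parallel_workers = 0;
      START SLAVE SQL_THREAD;

      -- Writing to a replica today means a risky global toggle (no session override yet)
      SET GLOBAL super_read_only = OFF;
      -- ... do the write ...
      SET GLOBAL super_read_only = ON;
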
  25. MySQL at HubSpot - More good to know More things

    that are good to know: • Commit stalls on binary logs rotation with sync_binlog = 0 Nice finding by Vadim from Percona: https://twitter.com/jfg956/status/991149562722070528 (with auto-commit, shows as a slow INSERT, UPDATE or DELETE) (in the cloud and for replication speed: sync_binlog = 0 and trx_commit = 2) (to be “cloud ready”, MySQL needs to work well with these settings) • Care with Foreign Key: Bug#102586: FK CASCADE breaks with RBR and multiple-table DELETE • Care with Generated Columns: Bug#107438: Generated Columns are NULL on replicas (in 5.7) (open June 2022, 8.0 not affected, not fixed in May 2023 - this is a data corruption bug !) • Online ALTER had been failing with Duplicate Entry for 10 years ! https://jfg-mysql.blogspot.com/2021/11/duplicate-entry-in-alter-table-and-in-optimize-table.html
  26. MySQL at HubSpot - Performance Schema Performance Schema is super

    useful • Love the query observability provided by events_statements_summary_by_digest ◦ ps-collect-digest: takes a copy of this table, run regularly by cron (every minute for us) ◦ Delta-ing copies shows what happened during a time interval ◦ We store these copies in MySQL for querying with SQL: a use-case for writing to a replica ! • Annoyances about digests and other metrics: ◦ where … in (?) vs … in (...): 2 digests for many tuples, or just one, in the IN-clause ◦ where a=:a0 OR a=:a1 vs … OR a=:a2 …: as many digests as there are OR variations ◦ where b=false vs … b=true: 2 different digests (instead of true/false being a ?) ◦ Rows examined blindspot because of index condition pushdown https://jfg-mysql.blogspot.com/2022/03/rows-examined-not-trustworthy-because-index-condition-pushdown.html ◦ Another rows examined blindspot (I have not blogged about it yet, involves 0 rows sent) (where a in (a1,a2...an) AND b in (b1,b2...bm) scans n x m rows, but “examines” 0 !) • We have plans for ps-collect-{table,file,mutex} ! • We miss InnoDB Metrics, and accounting features in P_S (query by host, user, cnx attributes)
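
A simplified sketch of the digest-snapshot approach described above; the history table and the column subset are illustrative, not the actual ps-collect-digest schema:

      -- History table populated by cron (simplified column subset)
      CREATE TABLE IF NOT EXISTS digest_history (
        collected_at      DATETIME NOT NULL,
        schema_name       VARCHAR(64),
        digest            VARCHAR(64),
        digest_text       LONGTEXT,
        count_star        BIGINT UNSIGNED,
        sum_timer_wait    BIGINT UNSIGNED,
        sum_rows_examined BIGINT UNSIGNED,
        KEY (collected_at, digest)
      );

      -- Every minute: append a snapshot of the digest table
      INSERT INTO digest_history
      SELECT NOW(), SCHEMA_NAME, DIGEST, DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT, SUM_ROWS_EXAMINED
      FROM performance_schema.events_statements_summary_by_digest;

      -- Delta two snapshots to see what ran during the interval
      SELECT a.schema_name, a.digest_text,
             a.count_star     - COALESCE(b.count_star, 0)     AS exec_delta,
             a.sum_timer_wait - COALESCE(b.sum_timer_wait, 0) AS latency_delta
      FROM digest_history a
      LEFT JOIN digest_history b
        ON b.digest = a.digest AND b.collected_at = '2023-05-25 10:00:00'
      WHERE a.collected_at = '2023-05-25 10:01:00'
      ORDER BY latency_delta DESC
      LIMIT 10;
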
  27. MySQL at HubSpot - InnoDB Table Compression (ROW_FORMAT=COMPRESSED) Choice made many years ago at HubSpot: all tables are

    created with KEY_BLOCK_SIZE=8 • Probably for saving disk space (but sometimes it does not, or a smaller kbs would be better) • But it hides bad design (excessive use of JSON, VARCHAR as ENUMs, …) • And it violates standard database scaling best-practice: be biased for doing things client-side Using KEY_BLOCK_SIZE=8 by default might be a sub-optimal optimization: • It makes things more complicated, and might consume CPU on a CPU-bound workload • It is useful, but should be used on a case-by-case basis with optimized block size https://jfg-mysql.blogspot.com/2017/08/adventure-innodb-table-compression.html InnoDB Table Compression is also complicated (fully understanding it is non-trivial): • It stresses the InnoDB Free List (and no good metrics) https://jfg-mysql.blogspot.com/2022/12/free-pages-consumption-by-innodb-table-compression.html • It has a weird IO behavior (I need to blog about this) • And it has many unsolved bugs…
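
For reference, the HubSpot-wide default discussed above corresponds to DDL like the following (the table is hypothetical); I_S.INNODB_CMP, mentioned on the next slides, exposes the server-wide compression counters per page size:

      CREATE TABLE contacts (
        id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        data JSON,
        PRIMARY KEY (id)
      ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

      -- Compression / decompression counters, per KEY_BLOCK_SIZE
      SELECT * FROM information_schema.INNODB_CMP;
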
  28. MySQL at HubSpot - InnoDB Table Compression Bugs Some bugs

    (list non-exhaustive): • Bug#59550: Innodb_buffer_pool_pages_misc goes wrong (from 2011 !) • Bug#84439: Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1 • Bug#84467: ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs • Bug#107074: The column uncompress_ops capped at 2^31-1 in I_S.INNODB_CMP And Valerii Kravchuk has a full blog post about them: http://mysqlentomologist.blogspot.com/2018/06/on-innodb-data-compression-in-mysql.html • “If you hit some problem with this feature [InnoDB Compression] you have to live with it” • “[...] compression options do not get much attention from Oracle developers recently”
  29. MySQL at HubSpot - InnoDB Table Compression Bugs - not

    fixed by Oracle but fixed by Percona ! Bug#107059: Bad compression stats in I_S.INNODB_CMP starting in 5.7.27 (8.0 not affected) • Regression introduced in July 2019, bug opened in April 2022 (these are not used by many) • Patch provided literally hours after the bug was opened (thanks huahua xu !) • Still not fixed in May 2023: seriously ! – So I opened PS-8749 while preparing this talk, and… Thanks again Percona ! You make MySQL better ! https://twitter.com/jfg956/status/1655949959953547270
  30. MySQL at HubSpot - InnoDB Table Compression Bugs - Cascading

    Crashing Bug ! We also hit a crashing bug with InnoDB Table Compression which can lead to cascading failure • Bug#111063: Assert: !dummy_big_rec on rollback of delete or update of row_format=compressed • This is exacerbated by the Vitess query killer: a killed query rolls back, which crashes MySQL • Retrying this query after failover will probably lead to another kill, and another crash ! • Unsatisfactory fix: rebuilding the table (ALTER TABLE … FORCE) removes the crashing condition Even if uncommon, this bug had us rethink our usage of InnoDB Table Compression • This bug only happens on very specific rows (only a few), but in many of our tables • We are currently working on decompressing most/all of our tables (this might save us CPU) • We had to patch gh-ost because of a bug around removing compression (Bug#70534: Removing table compression leaves compressed keys) Learning: only use InnoDB Table Compression when needed More learning: test your cascading failure prevention mechanism
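
The two remediations mentioned on this slide roughly correspond to the statements below (table name hypothetical; in practice the decompression is done with gh-ost, not a blocking ALTER):

      -- Rebuild the table in place, which removes the crashing condition
      ALTER TABLE contacts FORCE;

      -- Remove compression entirely (both clauses, to avoid leftovers like Bug#70534 describes)
      ALTER TABLE contacts ROW_FORMAT=DYNAMIC KEY_BLOCK_SIZE=0;
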
  31. Kubernetes at HubSpot Kubernetes at HubSpot • Historically, HubSpot was

    using Mesos and a home-developed scheduler (Singularity) • Kubernetes is slowly replacing Singularity, and our MySQL DBaaS Platform runs on Kubernetes • Running MySQL in Kubernetes is great: all the common tasks are automated • But uncommon things not taken into account by automation are tedious (testing replicas, upgrades, or other things you would creatively do in a less constrained env.) • The container world is a different world than the vm or bare-metal world ◦ Thinking in a “vm way” in a container world generates technical debt ◦ There are new “problems” in a container world (like CPU throttling)
  32. Kubernetes at HubSpot - A Great Success for 5 Years

    ! We DO NOT agree with this ! • HubSpot has been successfully running MySQL in Kubernetes since 2018 ! • In Production ! • It has been a great success for 5 years ! • And it is a smooth ride ! • (We also run Zookeeper in Kubernetes) • (We even run HBase on ARM in Kubernetes)
  33. Kubernetes at HubSpot - Getting used to the Container /

    Kubernetes World An example of a “vm concept” we brought into our Kubernetes Operator: • Performance class for keyspaces: x-small (for tests), small (default), medium, large, xl, 2xl… (controls CPU and RAM allocation, and other MySQL params) • From RAM, Max Cnx, and other MySQL params, we back-compute the InnoDB Buffer Pool Size → this is sub-optimal, BP size should be a 1st-level param and RAM should be computed from it • So we design medium “vms” / pods, not medium “MySQL Instances”: we will fix this soon (in the new design, allocate BP size and forward-compute RAM for the MySQL container) (no more BP size shrinkage when increasing Max Cnx or other MySQL params) Kubernetes allows the storage of configuration in Custom Resources • Works well with everything in a single Kubernetes Cluster • But when a MySQL deployment spans 2 clusters, keeping things in sync is complicated • Hublets also duplicated all this (not only do we manage QA and Prod, but now also na1 and eu1) • An external / centralized asset database would probably make sense
  34. Kubernetes at HubSpot - CPU Throttling Another new thing in

    the Container World is CPU Throttling • Linux cgroups enforce CPU limits on containers by pausing their scheduling, 100 ms in our env. • If throttled, a Primary Key lookup usually taking 1 ms might take 101 ms • A MySQL Container with 10 CPUs and a Max Cnx of 100 can easily get throttled ◦ Setting Max Cnx to 10 when a container has 10 CPUs works for CPU-bound workloads ◦ This does not work for IO-bound: 100 Max Cnx are needed to use all IOPS ◦ Tuning mixed workloads is very complicated (more complicated as workloads change) • Maybe an opportunity for a Thread Pool, not tested yet at HubSpot (sketch below) (More about CPU Throttling: https://danluu.com/cgroup-throttling/)
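
If the Thread Pool idea mentioned above were tested, in Percona Server it would be configured roughly like this (values illustrative; thread_handling is not dynamic, so it goes in my.cnf):

      -- my.cnf (thread_handling cannot be changed at runtime):
      --   thread_handling         = pool-of-threads
      --   thread_pool_size        = 10    -- roughly the CPUs given to the container
      --   thread_pool_max_threads = 100
      -- Observe the pool at runtime:
      SHOW GLOBAL STATUS LIKE 'Threadpool%';
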
  35. Kubernetes at HubSpot - Uncommon Operations Challenge Automation The good

    thing about Kubernetes is having everything automated • But there is always something you need that will deviate from or challenge automation • One thing we have had for a long time: a Stop File in container Startup Scripts (If the file exists, pause startup – sleep infinity – and allow troubleshooting) • Something we added 12 months ago: testing replicas (to make sure a replica is not restarted while testing, or, worse, reparented to) (These days, we do not use this much, instead we spawn snap pods from EBS snapshots) • Something we did not have 2 years ago and was a heavy lift: MySQL Upgrades
  36. Kubernetes at HubSpot - MySQL Upgrades Two years ago, our

    MySQL version management was a variable in our docker image • All pods were running the same MySQL version • Upgrading was changing this variable and “deploying” the new image on all pods • Hopefully, when doing a version change, things went well ◦ If not, hopefully we could rollback by re-deploying with the older version ◦ This does not work with 8.0, and some downgrades do not work with 5.7 either Since then, we built full automation for upgrades, allowing upgrading one keyspace at a time • A set of upgraded pods running an upgraded docker image is spawned • Reparent to an upgraded pod to switch version, reparent to an un-upgraded pod for rolling-back • And we needed such a rollback during our 5.7 upgrades (IODKU becoming super slow) • This assumes replication from vN+1 to vN works, not officially supported, but can be tested • And make sure a failover / reparent does not upgrade or downgrade ;-) Learning: always have a rollback plan, especially at scale !
  37. Vitess at HubSpot Vitess at HubSpot • Vitess is a

    Great Middleware for running MySQL ◦ MySQL Operations made easier by standardizing many functions (reparent, failover automation, backups, connection pool, …) ◦ Vitess also provides useful data-plane features (like cnx pooling, query killing, “sharding”) • Some features are missing (like Quality of Service and Smart Circuit Breakers) • Upgrades are complicated • And easy, transparent, efficient, resilient, and general purpose sharding is a myth !
  38. Vitess at HubSpot - General Advice / Need to Know

    Strong advice: if deploying Vitess, plan for using your own fork (even if you do not have patches yet) • A very common answer to Vitess feature requests: “would you like to contribute this?” • If you end up contributing something, your fork allows you to take advantage of it earlier • Also, your needs might not be “acknowledged” by Vitess, so you will need to fix things yourself OOM of vtTablet (we hoped the upgrade from v5 to v14 would solve it, it is still there) • Hypothesis: growing RAM usage because of unbounded buffering, exacerbated by slow consumers • Happens with XtraBackup, and with large result-sets for cross-shard queries The Vitess data-plane is not a lean stack: • Query parsing and Result-Set handling happen in vtGates and vtTablets • This requires a lot of CPU (and RAM for result buffering) • But one could use Vitess without vtGates (and still take advantage of administration features)
  39. Vitess at HubSpot - Sharding Easy, transparent, efficient, resilient, and

    general purpose sharding is a myth ! • Converting an unsharded keyspace to sharded needs effort (probably an initial split) ◦ some tables in an unsharded keyspace will probably stay unsharded ◦ manual work was needed as MoveTable was unsatisfactory in v5: no rollback • Resharding is easier-ish (broken in Vitess v5, we fixed it in our fork, did some resharding, no attempt in v14 yet) • Devs do not have a good mental model of sharding: they write cross-shard queries by mistake (the query optimizer is not well understood, and it is worse for a distributed query optimizer) Sharding abstraction and reliability / performance are “orthogonal” • Batching Primary Key reads needs to be shard-aware (to avoid problems with X-shard queries) • To be efficient, batch jobs need to be shard-aware (to avoid hot-spotting one shard after another) • To be resilient, applications need to be shard-aware (to limit the blast radius of a problematic shard) • There are other performance / reliability problems when an application is not shard-aware (illustration below)
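
A hedged illustration of the cross-shard pitfall, assuming a hypothetical keyspace sharded on portal_id with a vindex on that column: a query that includes the sharding key is routed to a single shard, while one that omits it is scattered by vtGate to every shard:

      -- Single-shard: the portal_id vindex routes this to exactly one shard
      SELECT id, email FROM contacts WHERE portal_id = 42 AND email = 'a@example.com';

      -- Cross-shard by mistake: no sharding key, so vtGate scatters to all shards
      SELECT id, email FROM contacts WHERE email = 'a@example.com';

      -- Shard-aware batching: group primary-key lookups by sharding key
      SELECT id, email FROM contacts WHERE portal_id = 42 AND id IN (1, 2, 3);
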
  40. Vitess at HubSpot - Quality of Service and Circuit Breaking

    HubSpot is currently working on implementing Query Priority and Load Shedding • When a vtTablet Connection Pool reaches a certain queue size, fast-fail low priority queries • Goal: allow high priority queries to succeed (instead of timing-out) • We have a working POC on this, and it is being discussed in the Vitess Slack https://vitess.slack.com/archives/C04N16YRWDC/p1682090035972359 Smart Circuit Breaking with Vitess • Circuit breakers in the application allow alleviating load when something goes wrong • When one shard of a sharded keyspace is impaired, our circuit breakers are tripping • Making client-side circuit breakers shard-aware is not easy (applications send queries to a keyspace, not a shard → the shard abstraction is a double-edged sword) • No good solution to this yet (we disable circuit breaking on sharded keyspaces)
  41. Vitess at HubSpot - Vitess needs a better Upgrade Story

    ! Aggressive release schedule, restricted support timeframe, and limited compatibility → complicated ! • Vitess releases 3 versions a year, with an official upgrade path only to n+1 (no LTS) • The support timeframe of only 1 year leaves little time for stable operations • Vitess users either have to operate “unsupported”, or have to run a continuous upgrade race → neither option is appealing ! At the beginning of 2022, HubSpot was still running Vitess v5 (released in 2020) • In 2022, we started a major “in a single step” upgrade to v13 (released in February 2022) • On the way, we pivoted to upgrading to v14 (released in June 2022) • We completed the upgrade a few weeks ago (in Q1 of 2023) • Since then: v15 released in October 2022, v16 in February 2023 (and v17 scheduled June 2023) • We will probably do another upgrade project in the next 12 months Vitess needs a better Upgrade Story !
  42. Vitess at HubSpot - HubSpot Upgrade from Vitess v5 to

    v14 The foundation for our upgrade “in a single step” is the Cell concept of Vitess • From our v5 Cell, we built a new v14 Cell, using MySQL repl. from v5, with its vtGates, etc… • First upgrade step: a reparent to a v14 vtTablet (replica reads are still in v5) • We had to patch the v5 vtGates for compatibility with v14 vtTablets: our fork was essential here ! • Rolling-back: reparent back to v5 • Rolling-forward: point the application to the v14 vtGates (we used DNS for this) • Rolling-back once on v14 vtGates: back to the v5 vtGates, and then reparent to v5 vtTablets At each step of the process, we can roll back, and we needed this • A cross-shard query went from being super fast to super slow (SELECT DISTINCT) (regression in the vtGate query optimizer: making things right made them slower) (solution: enabling Schema Tracking in vtGate, a beta feature in v11, stable in v14) • A DELETE query, on which the vtGate rewrite changed, also went from fast to slow Repeated learning: always have a rollback plan, especially at scale !
  43. Conclusion Conclusion • After 5 years, we are happy with

    running MySQL with Vitess in Kubernetes • Automation is a never-finished project ! • Kubernetes is mostly handled / mastered; but our Operator is continuously improved • MySQL and Vitess are good, but they could be better (neither fully fits our needs yet) • We are committed to Vitess and we continue investing in it
  44. Learnings • What is working well ◦ 3100+ mysqld and

    we don’t think about them individually ◦ Failover automation: Topology and built-in methods to reparent, promote, create new replicas ◦ Vitess Slack: for getting help, discussing new features, opening bugs, and monthly meetings ◦ Backups • Areas of improvement ◦ Upgrade cycle: a year from v5 to v14, now we have automation, it should be faster next time ◦ Vitess not 100% compatible with MySQL 8: Window functions and CTEs • Best practices ◦ Adhere as closely as possible to Vitess best practices (avoid unofficial workarounds) ◦ Have a good understanding of when you want to be sharded vs unsharded, preferably before you create a keyspace and start using it ◦ Upgrade: Upfront tests, Query replay, duplicate cluster, then shift traffic (not an in-place upgrade)
  45. Links - General • Some historical links: ◦ Mesos, HubSpot,

    and the Singularity https://product.hubspot.com/blog/mesos-hubspot-and-the-singularity ◦ Vitess at HubSpot: How We Moved Hundreds of MySQL Databases into Kubernetes https://kccna18.sched.com/event/Gras/vitess-at-hubspot-how-we-moved-hundreds-of-mysql-databases-into-kubernetes-tom-petr-alexander-charis-hubspot • Introducing HubSpot’s EU Data Center: https://www.hubspot.com/eu-data-centre • Our Journey to Multi-Region: An Introduction https://product.hubspot.com/blog/developing-an-eu-data-center • Our Journey to Multi-Region: VTickets, Hublet-Compatible MySQL Auto-Increment https://product.hubspot.com/blog/our-journey-to-multiregion-vtickets • Our Journey to Multi-Region: Cross Datacenter MySql Data Replication https://product.hubspot.com/blog/mysql-data-replication • HubSpot Quality of Service Discussion in Vitess Slack https://vitess.slack.com/archives/C04N16YRWDC/p1682090035972359
  46. Links - Speaker’s Blog • Question about Semi-Synchronous Replication: the

    Answer with All the Details https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/ • The consequences of sync_binlog != 1 https://archive.fosdem.org/2020/schedule/event/sync_binlog/ • Duplicate Entry in ALTER TABLE (and in OPTIMIZE TABLE) https://jfg-mysql.blogspot.com/2021/11/duplicate-entry-in-alter-table-and-in-optimize-table.html • Rows Examined not Trustworthy because of Index Condition Pushdown https://jfg-mysql.blogspot.com/2022/03/rows-examined-not-trustworthy-because-index-condition-pushdown.html • Tail Latencies in Percona Server because of InnoDB Stalls on Empty Free List https://jfg-mysql.blogspot.com/2022/11/tail-latencies-in-percona-server-because-innodb-stalls-on-empty-free-list.html • Avoiding a STOP SLAVE Crash with MTR in Percona Server older than 5.7.37-40 https://jfg-mysql.blogspot.com/2023/05/avoiding-stop-slave-crash-with-mtr-in-ps-5.7.html
  47. Links - Percona Server and MySQL Bugs • LP#1527463: Waiting

    for binlog lock (became PS-3345) • Bug#96400: MTS STOP SLAVE takes over a minute [...] (open Aug. 1st 2019, still not fixed) Percona fixed in PS-5824 (5.7.28-31 and 8.0.17-8) but introduced a crash fixed in PS-7147 (8.0.19-10) 5.7 forgotten from above, fixed later in PS-8030 (reported in 2022-01-11, releases on 3-31 in 5.7.37-40) • Bug#102586: Foreign Key ON DELETE CASCADE breaks with RBR and multiple-table DELETE • Bug#102988: mysqlbinlog | mysql breaks with GTIDs and [...] (test case for Bug#91812) • Bug#104012: FLUSH BINARY LOGS blocks when a client is waiting for semi-sync ACK • Bug#104015: Losing semi-sync acks could stall the primary • Bug#104626: Remove failure of Online ALTER because concurrent Duplicate entry • Bug#107438: JSON generated column values are NULL on replicas • Bug#107574: MTR deadlocks [...] changing read_only (wrongfully closed, Percona might fix PS-8411) • Bug#111108: Provide a way to override SUPER_READ_ONLY at the session level
  48. Links - InnoDB Table Compression - Bugs and Posts •

    Bug#59550: Innodb_buffer_pool_pages_misc goes wrong (from 2011 !) • Bug#70534: Removing table compression leaves compressed keys • Bug#84439: Table of row size of ~800 bytes does not compress with KEY_BLOCK_SIZE=1 • Bug#84467: ALTERing KEY_BLOCK_SIZE keeps the old kbs in KEYs • Bug#107059: Bad cmp. stats in I_S.INNODB_CMP [...] (sister bug, opened later, quickly fixed: PS-8749) • Bug#107074: The column uncompress_ops capped at 2^31-1 in I_S.INNODB_CMP • Bug#111063: Assert: !dummy_big_rec on rollback of delete or update of row_format=compressed • An Adventure in InnoDB Table Compression (for read-only tables) https://jfg-mysql.blogspot.com/2017/08/adventure-innodb-table-compression.html • Free Page Consumption by InnoDB Table Compression (Percona backoff Algo. part #2) https://jfg-mysql.blogspot.com/2022/12/free-pages-consumption-by-innodb-table-compression.html • And Valerii Kravchuk post: On InnoDB Data Compression in MySQL http://mysqlentomologist.blogspot.com/2018/06/on-innodb-data-compression-in-mysql.html
  49. Links - Other Good References • The problem with MySQL

    foreign key constraints in Online Schema Changes https://code.openark.org/blog/mysql/the-problem-with-mysql-foreign-key-constraints-in-online-schema-changes • Things that don’t work well with MySQL’s FOREIGN KEY implementation https://code.openark.org/blog/mysql/things-that-dont-work-well-with-mysqls-foreign-key-implementation • A MySQL Replication Load Average with Performance Schema http://www.markleith.co.uk/2012/07/24/a-mysql-replication-load-average-with-performance-schema/ • The container throttling problem: https://danluu.com/cgroup-throttling/
  50. Links - Other HubSpot Data Infrastructure • Embrace and Replace:

    Migrating ZooKeeper into Kubernetes https://product.hubspot.com/blog/zookeeper-to-kubernetes-migration • Improving Reliability: Building a Vitess Balancer to Minimize MySQL Downtime https://product.hubspot.com/blog/improving-reliability • Healing HBase Locality At Scale https://product.hubspot.com/blog/healing-hbase-locality-at-scale • How to Get Better at Updating Your Data Infrastructure https://product.hubspot.com/blog/updating-data-infrastructure • Our Journey to Multi-Region: Supporting Cross-Region Kafka Messaging https://product.hubspot.com/blog/kafka-aggregation • Saving Millions on Logging: Finding Relevant Savings & Delivering Savings https://product.hubspot.com/blog/savings-logging-part1 https://product.hubspot.com/blog/savings-logging-part2 • Launching HBase on ARM https://product.hubspot.com/blog/hbase-on-arm