Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Vitess: Online Schema Migration Automation

Vitess: Online Schema Migration Automation

For many, running an online schema migration operation is still a manual job: from building the correct command, through identifying where the migration should run and which servers are to be affected, to auditing progress and completing the migration. Sharded environment poses an additional burden, as any logical migration must be applied multiple times, once for each shard.

What if you could just issue an ALTER TABLE ... statement, and have all that complexity automated away?

Vitess, an open source sharding framework for MySQL, is in a unique position to do just that. This session shows how Vitess's proxy/agent/topology architecture, together with gh-ost or pt-online-schema-change, are used to hide schema change complexity, and carefully schedule and apply schema migrations.

168ccec72eee0530b818d44f3fedaacf?s=128

Shlomi Noach

October 21, 2020
Tweet

Transcript

  1. PerconaLive EU 2020 Vitess: online schema migration automation Shlomi Noach

  2. Engineer at PlanetScale Author of orchestrator, gh-ost, freno and others

    Blog at openark.org github.com/shlomi-noach @ShlomiNoach About me
  3. • Founded in February 2018 • Venture backed: a16z, SignalFire

    • ~40 employees distributed • Authors Vitess • Offers PlanetScaleDB for cloud offering and PlanetScaleDB BYOK: Bring Your Own Kubernetes PlanetScale
  4. Vitess • CNCF graduated project • Maintained by the community

    • PlanetScale is a major contributor A database clustering system for horizontal scaling of MySQL
  5. Vitess architecture Consider a simple replication topology

  6. Vitess architecture vttablet • A daemon/sidecar • Controls the mysqld

    process • Interacts with the mysqld server • Typically on same host as mysqld
  7. Vitess architecture

  8. Vitess architecture vtgate vtgate • A smart proxy • Speaks

    mysql protocol • Impersonates the database • Relays queries to tablets
  9. Vitess architecture vtgate • Stateless • Scale out

  10. Vitess architecture commerce shard 0 commerce shard 1 internal unsharded

  11. Vitess architecture commerce shard 0 commerce shard 1 internal unsharded

    USE commerce SELECT * FROM orders WHERE customer_id=4
  12. Vitess architecture commerce shard 0 commerce shard 1 internal unsharded

    topo topo • Vitess state backend • etcd/zk/consul • Info: • Sharding schemes • Schemas • Clusters • Tablets, servers, • Operations • ...
  13. Vitess architecture commerce shard 0 commerce shard 1 internal unsharded

    topo vtctld • Daemon • API server • Operations
  14. Vitess architecture commerce shard 0 commerce shard 1 internal unsharded

    topo
  15. Mystery card ? A hidden truth

  16. • ALTER TABLE is blocking Issues with schema migrations

  17. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives Installing, formalizing, command line options
  18. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives • Discovery commerce shard 0 commerce shard 1 internal unsharded
  19. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives • Discovery • Running commerce shard 0 commerce shard 1
  20. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives • Discovery • Running • Throttling commerce shard 0 commerce shard 1
  21. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives • Discovery • Running • Throttling • Visibility commerce shard 0 commerce shard 1
  22. Issues with schema migrations • ALTER TABLE is blocking •

    gh-ost, pt-online-schema-change, fb-osc as alternatives • Discovery • Running • Throttling • Visibility • Control commerce shard 0 commerce shard 1
  23. Vitess’ architecture has the knowledge and the capability to address

    the above. Issues with schema migrations
  24. Vitess Online DDL ALTER WITH ‘gh-ost’ TABLE orders ADD COLUMN

    ts TIMESTAMP ZeroDependenciesNoReally™
  25. Vitess Online DDL ALTER WITH ‘pt-osc’ TABLE orders ADD COLUMN

    ts TIMESTAMP ZeroDependenciesNoReally™
  26. Vitess Online DDL DEMO

  27. Vitess Online DDL What just happened?

  28. Online DDL flow commerce shard 0 commerce shard 1 internal

    unsharded topo vtgate • Intercepts • Parses • Persists
  29. Online DDL flow commerce shard 0 commerce shard 1 internal

    unsharded topo topo • Stores
  30. Online DDL flow commerce shard 0 commerce shard 1 internal

    unsharded topo vtctld • Observes • Discovers • Distributes • Validates
  31. Online DDL flow commerce shard 0 commerce shard 1 internal

    unsharded topo vttablet • Has all the fun
  32. vttablet on primary • Schedules • Prepares script(s) • Runs

    gh-ost or pt-osc • Tracks • Throttles • Cleans up • Tracks state in local _vt meta schema Online DDL flow
  33. Vitess Online DDL DEMO, continued

  34. Vitess Online DDL ZeroDependenciesNoReally™

  35. Q: Which MySQL account and privileges do I need for

    the migration user? How do I tell Vitess the credentials? A: Vitess creates a migration user with random password for each migration, on your behalf. You do not need to do anything.
  36. Q: How do I setup the gh-ost hooks? A: Vitess

    creates gh-ost hooks directory and scripts for you. Vitess creates a pt-osc plugin for you. You do not need to do anything.
  37. Q: Where do I install gh-ost? A: On linux amd64

    and with recent glibc, Vitess comes with gh-ost binary pre-compiled and embedded, and will extract the binary on your behalf. You do not need to do anything.
  38. Q: Where do I install gh-ost? (Cont.) Otherwise use -gh-ost-path.

    pt-online-schema-change is not bundled. Install it on all MySQL servers. Use -pt-osc-path
  39. Q: How do I configure the throttling replicas? A: Each

    vttablet runs a throttler service, which automatically and dynamically detects the throttling replicas. It evaluates replication lag autonomically. Both gh-ost and pt-osc are configured to use it. You do not need to do anything (but you can configure -throttle_tablet_types="replica,rdonly" )
  40. Q: How do I drop pt-osc leftover triggers in case

    of failure? A: Vitess will identify a failed migration, even if vttablet itself fails, and will drop the triggers on your behalf. You do not need to do anything.
  41. Q: How do I drop the leftover tables? A: Each

    primary vttablet runs a table lifecycle process. It will collect the leftover tables (whether migration is successful or failed) and will safely and slowly (using the throttler mechanism) purge and drop them. You do not need to do anything.
  42. Mystery card reveal ? A hidden truth, In something not

    said
  43. Vitess as an infrastructure framework None of the above requires

    you to run traffic through Vitess. You may keep running your production traffic as normal, if you wish, and only use Vitess for infrastructure.
  44. • GitHub and GitHub team: • github.com/github/gh-ost • github.com/github/freno •

    Lessons learned • Percona • percona.com/doc/percona-toolkit • Community • Vitess maintainers • PlanetScale team Acknowledgements
  45. Resources • Docs: vitess.io/docs • Code: github.com/vitessio/vitess • Slack: vitess.slack.com

    • Demo: asciinema play https://git.io/JUhMa
  46. Questions? github.com/shlomi-noach @ShlomiNoach Thank you!