DevOpsDaysPortugal 2019 - Pierre Vincent - Changing tyres on a moving car: our journey to zero-downtime deployments

Applications built over the years carry historical design assumptions, such as the idea that it is acceptable to take a system offline for a few hours of upgrade maintenance every 6 months.

In today’s world, embracing continuous delivery practices means more frequent releases and, as long as every release needs a maintenance window, more downtime. Finding a good maintenance window also becomes a struggle when users are spread across the world, and a burden for the operators who run the upgrade outside business hours.

In this talk, I want to demonstrate that mapping out complex deployment processes makes it possible to prioritise work and progressively reduce the impact of deployments. I will also give practical advice on how to tackle blockers to zero-downtime deployments, such as:

Migrating database schemas while keeping an application running
Ensuring backward compatibility of messages and APIs
Dealing with long-running background jobs
Mitigating user session loss

Deploying without the comfort of a maintenance window also means that stability during the upgrade is a critical concern. I will go through how to achieve it with systematic pipeline automation and good system visibility that helps operators during the upgrade.

This talk comes directly from my personal experience: our core product used to need a 3-hour blackout for upgrades every month, with somebody staying up at night to carry it out. Today, we can deploy during working hours without users noticing, and we are finally able to break away from long release cycles. This was achieved through strong collaboration between developers, SREs and infrastructure engineers, applying the techniques from this talk.

DevOpsDaysPortugal

June 04, 2019

Transcript

  1. @PierreVincent DevOpsDays Portugal 2019 Changing tyres on a moving car

    Changing tyres on a moving car: Our journey to zero-downtime deployments. June 4th, 2019, Lisbon. @PierreVincent, pvincent.io
  2. “There has been a massive earthquake in New Zealand and I need to use Poppulo for regular updates. Please can you advise when it will be back online.” – Poppulo customer
  3. Pierre Vincent, Infra. & Reliability Manager. @PierreVincent, pvincent.io
  4. Core Monolith (est. 2007)
       2009: Deploying every 3 to 6 months, 4 hours downtime, on Sunday at 5PM
       2015: Deploying every 4 weeks, 2 hours downtime, on Sunday at 8PM
     Microservices (est. 2015)
       Deploying 10+ times/day, zero downtime, deploy on-demand, anytime
  5. How can we hope to achieve Continuous Delivery when more frequent deploys mean more downtime?
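A quick worked calculation makes the question concrete. It uses the core monolith figures from slide 4: 4 hours of downtime every 3 to 6 months in 2009, and 2 hours every 4 weeks in 2015. It assumes 12 months and 52 weeks per year. The function name is illustrative.

```python
def yearly_downtime_hours(deploys_per_year: float, hours_per_deploy: float) -> float:
    """Planned downtime per year when every deploy needs a maintenance window."""
    return deploys_per_year * hours_per_deploy


# 2009: one deploy every 3 to 6 months, i.e. 2 to 4 deploys a year, 4 hours each.
downtime_2009_min = yearly_downtime_hours(12 / 6, 4)
downtime_2009_max = yearly_downtime_hours(12 / 3, 4)

# 2015: one deploy every 4 weeks, i.e. 13 deploys a year (assuming 52 weeks), 2 hours each.
downtime_2015 = yearly_downtime_hours(52 / 4, 2)

if __name__ == "__main__":
    print(f"2009: {downtime_2009_min:.0f} to {downtime_2009_max:.0f} hours/year")  # 2009: 8 to 16 hours/year
    print(f"2015: {downtime_2015:.0f} hours/year")  # 2015: 26 hours/year
```

Halving each window did not compensate for deploying more than three times as often. The result is about 26 hours of planned downtime a year instead of 8 to 16.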
  6. Mapping the deployment process, and its impact on users
  7. Deployment steps: enable maintenance mode, wait for queued jobs to complete, shut down services, run database migrations, upgrade services, start services, wait for services startup, disable maintenance mode.
     Step durations shown: 15-60 mins, 5-30 mins, 15 mins.
     User impact: limited functionality and downtime.
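Waiting for queued jobs is easier when background workers can be drained on request. A draining worker stops taking new jobs, finishes the one already in flight, then exits. Below is a minimal sketch built on Python's standard `queue` and `threading` modules. The `Worker` class and the string job format are hypothetical, and appending to a list stands in for real job processing. This is not the implementation used in the talk.

```python
import queue
import threading
from typing import List


class Worker:
    """Background worker that can be drained before an upgrade (hypothetical sketch)."""

    def __init__(self, jobs: "queue.Queue[str]") -> None:
        self.jobs = jobs
        self.completed: List[str] = []
        self._draining = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Take jobs until a drain is requested; the short timeout lets the loop
        # re-check the drain flag even while the queue is empty.
        while not self._draining.is_set():
            try:
                job = self.jobs.get(timeout=0.05)
            except queue.Empty:
                continue
            # A job already taken is always finished, even if a drain arrives meanwhile.
            self.completed.append(job)  # stand-in for real processing
            self.jobs.task_done()

    def drain(self, timeout: float = 5.0) -> bool:
        """Stop picking up new jobs, wait for the in-flight one, report whether we stopped."""
        self._draining.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

With this design, jobs still queued when a worker drains stay in the queue. Another worker picks them up, possibly one already running the new version.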
  8. Keeping the application up and running while applying database schema migrations
  9. Decouple schema version from application version
       Application [N] must work with schema [N+1]
       Use expand/contract to split breaking changes
     Ensure backward compatibility with non-breaking changes only
       No destructive operations to tables/columns in use
     Limit impact to live traffic
       Online database migration
       Detect changes likely to cause locking problems
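The rules "no destructive operations" and "detect changes likely to cause locking problems" can be partly automated as a pipeline check. Below is a naive sketch that flags risky statements in a migration script using regular expressions. The rule list is an assumption for illustration. Which operations actually lock depends on the database engine and version, and `CONCURRENTLY` is PostgreSQL syntax. A real check would use a SQL parser.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (pattern, reason). Real locking behaviour depends on the engine.
RISKY_PATTERNS: List[Tuple[str, str]] = [
    (r"\bDROP\s+(TABLE|COLUMN)\b",
     "destructive: drops a table or column that may still be in use"),
    (r"\bRENAME\s+(COLUMN|TO)\b",
     "breaking: the running version still uses the old name; use expand/contract"),
    (r"\bALTER\s+COLUMN\b.*\bTYPE\b",
     "type change may rewrite the table while holding a lock"),
    (r"^(?!.*\bDEFAULT\b).*\bADD\s+COLUMN\b.*\bNOT\s+NULL\b",
     "NOT NULL column without DEFAULT: fails on existing rows or breaks old inserts"),
    (r"\bCREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY)",
     "index build may block writes unless built online (e.g. PostgreSQL CONCURRENTLY)"),
]


def lint_migration(sql: str) -> List[str]:
    """Return one finding per risky statement/rule pair; an empty list means no flags."""
    findings = []
    # Splitting on ';' is good enough for simple DDL scripts, not for procedures.
    for statement in filter(None, (s.strip() for s in sql.split(";"))):
        for pattern, reason in RISKY_PATTERNS:
            if re.search(pattern, statement, flags=re.IGNORECASE | re.DOTALL):
                findings.append(f"{statement[:60]}: {reason}")
    return findings
```

Running `lint_migration` on each new migration in CI gives reviewers a prompt to split a change with expand/contract, rather than a hard guarantee of safety.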
  10. Expand/Contract example: renaming a column
      Create new column, write to both columns, migrate historical records, read from new column, remove old column.
      The steps are spread across releases N+1, N+2 and N+3.
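The column rename from this slide can be sketched with Python's built-in `sqlite3` module. The table and column names (`users.name` renamed to `users.full_name`) are hypothetical, and each function stands for one step on the slide. Which release ships which step is left open, as it is on the slide. This illustrates the expand/contract pattern; it is not the talk's tooling.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    # Starting point: application [N] only knows the `name` column.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])


def expand(conn: sqlite3.Connection) -> None:
    # Create new column: additive, so code using only `name` keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def save_user(conn: sqlite3.Connection, user_id: int, value: str) -> None:
    # Write to both columns, so old and new readers see the same data.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, full_name) VALUES (?, ?, ?)",
        (user_id, value, value),
    )


def backfill(conn: sqlite3.Connection) -> None:
    # Migrate historical records written before dual writes started.
    conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")


def load_user(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    # Read from the new column only.
    row = conn.execute("SELECT full_name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def contract(conn: sqlite3.Connection) -> None:
    # Remove the old column once no running version reads or writes it.
    # ALTER TABLE ... DROP COLUMN needs SQLite 3.35.0 or later.
    if sqlite3.sqlite_version_info < (3, 35, 0):
        raise RuntimeError("this SQLite build cannot DROP COLUMN; rebuild the table instead")
    conn.execute("ALTER TABLE users DROP COLUMN name")
```

Between `expand` and `contract`, queries from the previous version such as `SELECT name FROM users` keep working. That is what lets both versions run side by side during the upgrade.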
  11. More on schema migrations
      Baron Schwartz, DevOps for the database, chapter "Loosening the Application/Database coupling": www.vividcortex.com/resources/devops-for-the-database-ebook
      Michiel Rook, Database Schema Migrations with Zero Downtime: speakerdeck.com/mrook/database-schema-migrations-with-zero-downtime-continuous-lifecycle-london-2019
  12. Keeping the application up and running with rolling upgrades
  13. Full upgrade vs. rolling upgrade, with two nodes (1 and 2), each going from Up [N] through Drain, Stop, Upgrade and Start to Up [N+1]:
      Full upgrade: both nodes are upgraded at the same time, causing feature downtime.
      Rolling upgrade: the nodes are upgraded one after the other, so the feature is continuously available.
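The difference between the two diagrams can be checked with a small simulation. This sketch records the state of every node after each step. It uses hypothetical node names and state labels, and it counts a draining node as no longer serving new traffic. A rolling upgrade never drops to zero serving nodes. However, old and new versions do serve traffic side by side, which is why application [N] must work with schema [N+1].

```python
from typing import Dict, Iterable, Iterator, List

PHASES = ["drain", "stop", "upgrade", "start"]
Snapshot = Dict[str, str]


def full_upgrade(nodes: List[str]) -> Iterator[Snapshot]:
    """All nodes go through each phase together."""
    state = {node: "up[N]" for node in nodes}
    yield dict(state)
    for phase in PHASES:
        state.update({node: phase for node in nodes})
        yield dict(state)
    state.update({node: "up[N+1]" for node in nodes})
    yield dict(state)


def rolling_upgrade(nodes: List[str], batch_size: int = 1) -> Iterator[Snapshot]:
    """Nodes are upgraded batch by batch while the others keep serving."""
    state = {node: "up[N]" for node in nodes}
    yield dict(state)
    for i in range(0, len(nodes), batch_size):
        batch = nodes[i:i + batch_size]
        for phase in PHASES:
            state.update({node: phase for node in batch})
            yield dict(state)
        state.update({node: "up[N+1]" for node in batch})
        yield dict(state)


def min_serving(snapshots: Iterable[Snapshot]) -> int:
    """Fewest nodes serving traffic at any point during the upgrade."""
    return min(sum(s.startswith("up") for s in snap.values()) for snap in snapshots)


def runs_mixed_versions(snapshots: Iterable[Snapshot]) -> bool:
    """True if old and new versions serve traffic at the same time."""
    return any({"up[N]", "up[N+1]"} <= set(snap.values()) for snap in snapshots)
```

For two nodes, `min_serving(full_upgrade(...))` is 0 (feature downtime), while `min_serving(rolling_upgrade(...))` is 1 (feature continuously available).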
  14. Focusing on operability to confidently run upgrades while serving live traffic
  15. Entire deployment pipeline in source control
      ✓ Consistent and repeatable deployments
      ✓ No more manual operations
      ✓ Any change is code-reviewed
  16. Observable deployments
      ✓ Rolling-upgrade progress
      ✓ Core healthchecks
      ✓ Synthetic journey monitoring
      ✓ Error rates & queue saturation
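The signals on this slide can be turned into an automated check that runs between rolling-upgrade batches. Below is a minimal sketch. How the signals are collected is out of scope. The `Signals` fields, the `gate` function and the thresholds are assumptions, not values from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signals:
    """Snapshot of what operators watch during an upgrade (fields are hypothetical)."""
    healthchecks_ok: bool
    synthetic_journeys_ok: bool
    error_rate: float  # share of failed requests, e.g. 0.02 means 2%
    queue_depth: int   # messages waiting across work queues


@dataclass(frozen=True)
class Limits:
    max_error_rate: float = 0.01  # assumed threshold, tune per system
    max_queue_depth: int = 1_000  # assumed threshold, tune per system


def gate(signals: Signals, limits: Limits = Limits()) -> List[str]:
    """Return reasons to pause the rolling upgrade; an empty list means continue."""
    reasons = []
    if not signals.healthchecks_ok:
        reasons.append("core healthchecks failing")
    if not signals.synthetic_journeys_ok:
        reasons.append("synthetic user journeys failing")
    if signals.error_rate > limits.max_error_rate:
        reasons.append(f"error rate {signals.error_rate:.1%} above {limits.max_error_rate:.1%}")
    if signals.queue_depth > limits.max_queue_depth:
        reasons.append(f"queue depth {signals.queue_depth} above {limits.max_queue_depth}")
    return reasons
```

A pipeline would call `gate` after each batch and pause the upgrade when it returns any reason. Returning readable reasons, rather than a bare boolean, tells the operator which signal tripped.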
  17. 2009: Deploying every 3 to 6 months, 4 hours downtime, on Sunday at 8PM
      2015: Deploying every 4 weeks, 2 hours downtime, on Sunday at 8PM
      2019: Deploying anytime, zero downtime, during working hours
  18. Zero-downtime deployments don’t mean everything stays up or that everything is immediately running the latest version. They simply mean users don’t notice a thing while all this is happening.
      Thank you! @PierreVincent pvincent.io