Deploying Java Web Applications With Zero Downtime

DEPLOYING JAVA WEB APPLICATIONS WITH ZERO DOWNTIME
Booster 2013
Stein Inge Morisbak

Me: Practice lead for Continuous Delivery and DevOps. Tech lead on the team developing Digipost, Norway’s new digital mailbox, developed by the Norwegian Postal Service. At Digipost we deliver continuously, and for the time being we deliver to production a couple of times every week. We have about a quarter of a million users, so it’s not a good idea to have downtime a couple of times every week.

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

When you are doing agile you should deliver continuously. It’s the first principle of the agile manifesto.

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
... and rapidly!

The more often, the better. The part in the middle (“from a couple of weeks to a couple of months”) is outdated and should be removed. But we have to understand the historical context in which this was written. People were used to huge waterfall projects and a delivery every half year or so, or even once a year.

Working software is the primary measure of progress.

Progress is measured, without exception, by working software in production. And the longer the time between releases, the harder it is to measure progress.

Continuous delivery is about putting the release schedule in the hands of the business, not in the hands of IT. ... any build could potentially be released to users at the touch of a button using a fully automated process in a matter of seconds or minutes.
- Jez Humble (http://continuousdelivery.com/)

Continuous delivery is all about reducing lead time from idea to production. It’s the business that decides what goes into production, and when. So the business can say that they want a change and see it in production soon after. And then we should not have to say no.

Hence, putting things into production should be as easy as pushing a button, and we cannot have downtime when we release often and have a lot of users.

This is my customer and his button.

Simple and lightweight technology is essential. That’s why we use embedded Jetty as our servlet container.

    public class MySessionHandler extends SessionHandler {

        public MySessionHandler(final Server server, final DataSource dataSource) {
            super();
            // Use a session ID manager backed by the given data source.
            setSessionIdManager(new MyJDBCSessionIdManager(server, dataSource));
        }
    }

    public class MyJDBCSessionIdManager extends JDBCSessionIdManager implements SessionIdManager {

        public MyJDBCSessionIdManager(final Server server, final DataSource ds) {
            super(server);
            setDatasource(ds);
        }
    }

The way we have solved zero downtime deploys is by using session replication in the database. Jetty supports this out of the box. You simply add a JDBC-backed session handler, as in the code above, which persists the user sessions in a database. If a session does not exist in memory, the database is checked to see if it holds a session from another server. If it does, the session is loaded into memory on the new server. Simple, but we have to do one more thing.
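The lookup order described above (memory first, then the database) can be sketched without Jetty. This is a minimal illustration only, not Jetty’s implementation: the class `SessionLookup` and its methods are hypothetical, and a shared `Map` stands in for the session table in the database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: "sharedStore" stands in for the session table in the database.
final class SessionLookup {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> sharedStore;

    SessionLookup(Map<String, String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    // Saves a session in memory and in the shared store so other nodes can find it.
    void save(String sessionId, String data) {
        localCache.put(sessionId, data);
        sharedStore.put(sessionId, data);
    }

    // Looks in memory first; on a miss, loads the session that another node persisted.
    Optional<String> find(String sessionId) {
        String local = localCache.get(sessionId);
        if (local != null) {
            return Optional.of(local);
        }
        String persisted = sharedStore.get(sessionId);
        if (persisted != null) {
            localCache.put(sessionId, persisted); // now held in memory on this node
        }
        return Optional.ofNullable(persisted);
    }

    boolean isInMemory(String sessionId) {
        return localCache.containsKey(sessionId);
    }
}
```

Two `SessionLookup` instances sharing one store behave like two nodes: a session saved on the first is found by the second on its first request and is served from memory afterwards.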

[Diagram: a BIG-IP load balancer in front of Node 1, Node 2 and Node 3, each node labelled with its /online or /offline status.]

In front of our application servers we have a load balancer which directs the users to the same server that they last used. It can take a while before the load balancer notices that a node is down, so it will wait for quite a long time before it directs the user to the next server. The user then has to wait for a long time to get an answer. Therefore we have to deal with this some other way.

[Diagram: the same load balancer and three nodes; one node now reports /offline.]

It starts with selecting one of the servers, which withdraws from the load balancer pool by changing its return status from online to offline. Then we wait for a while and continue serving users until the load balancer has pinged us and removed the server from the pool.
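A minimal sketch of the status switch described above, assuming the load balancer polls a status URL and reads the body. The class `NodeStatus` and its method names are hypothetical; the source does not show how the status is implemented.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the status a node reports to the load balancer's health check.
final class NodeStatus {
    private final AtomicBoolean online = new AtomicBoolean(true);

    // Called before a deploy: the node keeps serving users but reports "offline".
    void goOffline() {
        online.set(false);
    }

    // Called once the new version is up and verified.
    void goOnline() {
        online.set(true);
    }

    // Body returned by the status URL that the load balancer polls.
    String statusBody() {
        return online.get() ? "online" : "offline";
    }
}
```

The point of the design is that reporting "offline" does not stop the node. It only tells the load balancer to stop sending new users, so nobody hits a dead server.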

[Diagram: the same load balancer and three nodes; the offline node is being upgraded.]

    > ./jetty-deploy.sh web-app 1.7

Then we deploy to the server.

[Diagram: the same load balancer and three nodes; the upgraded node is still out of the pool.]

After that, you may want to run a test suite against the new application.

[Diagram: the same load balancer and three nodes; all nodes report /online again.]

Then we let the server into the cluster again, and we do the same for the rest of the servers. Automated, of course.
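The node-by-node sequence from the last few slides can be sketched as a loop. This is an illustration under assumptions, not the automation used at Digipost: the `Node` interface and all of its methods are hypothetical, and stopping at the first failed test run is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingDeploy {
    // Hypothetical abstraction over one application server.
    interface Node {
        String name();
        void takeOffline();          // report "offline" and wait until the load balancer drops the node
        void deploy(String version); // install and start the new version
        boolean smokeTest();         // optional test suite against the new version
        void bringOnline();          // report "online" so the node rejoins the pool
    }

    // Upgrades nodes one at a time. Stops at the first node whose tests fail,
    // leaving that node out of the pool. Returns the names of the upgraded nodes.
    static List<String> run(List<Node> nodes, String version) {
        List<String> upgraded = new ArrayList<>();
        for (Node node : nodes) {
            node.takeOffline();
            node.deploy(version);
            if (!node.smokeTest()) {
                break;
            }
            node.bringOnline();
            upgraded.add(node.name());
        }
        return upgraded;
    }
}
```

Because only one node is out of the pool at any time, the remaining nodes keep serving users throughout the rollout.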

    #!/bin/bash
    # Usage: jetty-deploy.sh <artifact> <version>
    artifact=$1
    version=$2

    # Fetch the artifact from the artifact repository and unpack it.
    wget "https://nexus.bekk.no/${artifact}/${version}/${artifact}-${version}.zip"
    unzip "${artifact}-${version}.zip"

    # Stop the application, move the soft link to the new version and start again.
    /etc/init.d/${artifact} stop
    rm "${artifact}" # softlink
    ln -s "${artifact}-${version}" "${artifact}"
    /etc/init.d/${artifact} start

    # Wait until the application reports that it is online.
    while ! curl http://localhost:8080/status 2>/dev/null | grep online
    do
        echo "Waiting for web-app to come online."
        sleep 10
    done

We have implemented the deploy button with a bash script. We send in artifact and version as parameters one and two. The script gets the artifact from the artifact repo, unpacks the new version, stops the application, moves the soft link from the old application to the new one and starts up the new application. The while loop at the end waits for the application to come online before the script finishes. So if you call this script from another script for each server you want to deploy to, you are certain that each server is online before moving on to the next one.
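The wait loop at the end of the script polls forever. A variant with an upper bound on the number of attempts can be sketched as follows. This is an assumption-laden illustration: `OnlineWaiter` is hypothetical, and `statusCheck` stands in for fetching the status URL, so the sketch runs without a server.

```java
import java.util.function.Supplier;

final class OnlineWaiter {
    // Polls the status until its body contains "online" or the attempts run out.
    // "statusCheck" is a hypothetical stand-in for fetching http://localhost:8080/status.
    static boolean waitUntilOnline(Supplier<String> statusCheck, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String body = statusCheck.get();
            if (body != null && body.contains("online")) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller that deploys to several servers can treat a `false` result as a signal to stop the rollout instead of hanging on a node that never comes up.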

    #!/bin/bash
    # Usage: rollback.sh <artifact> <version>
    artifact=$1
    version=$2

    # Stop the application, point the soft link at the previous version and start again.
    /etc/init.d/${artifact} stop
    rm "${artifact}" # softlink
    ln -s "${artifact}-${version}" "${artifact}"
    /etc/init.d/${artifact} start

    # Wait until the application reports that it is online.
    while ! curl http://localhost:8080/status 2>/dev/null | grep online
    do
        echo "Waiting for web-app to come online."
        sleep 10
    done

Rollback is even simpler. You stop the app, move the symlink from the new to the previous version, and start up again.

    upstream my_webapp_upstream {
        server 127.0.0.1:8080;
        server 127.0.0.1:8081;
        server 127.0.0.1:8082;
        server 127.0.0.1:8083;
        keepalive 64;
    }
    ...
    location @webapp {
        ...
        proxy_next_upstream error timeout http_502;
        ...
        proxy_pass http://my_webapp_upstream;
    }
    ...

Not everyone has BIG-IP as a load balancer. It’s an insanely expensive box. But the principle can still be applied using other reverse proxies in front of your applications. This is an example with Nginx. If Nginx discovers an error, in this case a refused connection or a timeout on an upstream server, it fails over to the next upstream server. The application returns a 502 to indicate that a graceful shutdown is ongoing, and Nginx then fails over to the next upstream server in the same way.
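The failover rule in this configuration (skip an upstream on error, timeout or 502) can be modelled in a few lines. This is a simplified model for illustration only and says nothing about how Nginx is implemented or how it balances load. `UpstreamFailover` and the `call` function are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class UpstreamFailover {
    // Tries each upstream in order and returns the first one that answers with
    // something other than a 502. "call" is a hypothetical function that returns
    // the HTTP status for an upstream, or throws a RuntimeException to model a
    // refused connection or a timeout.
    static Optional<String> firstAvailable(List<String> upstreams, Function<String, Integer> call) {
        for (String upstream : upstreams) {
            try {
                int status = call.apply(upstream);
                if (status != 502) {
                    return Optional.of(upstream);
                }
            } catch (RuntimeException connectionProblem) {
                // Treated like "error" or "timeout": move on to the next upstream.
            }
        }
        return Optional.empty();
    }
}
```

With this rule, a node that answers 502 during a graceful shutdown looks the same to the proxy as a node that is down, so the user’s request lands on a healthy node.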

    <VirtualHost *:80>
        ServerName example.com
        ProxyRequests Off
        ProxyPreserveHost On
        ProxyPass / http://localhost:8080/
        ProxyPassReverse / http://localhost:8080/
    </VirtualHost>

An example with Apache.

    ProxyPass / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/

    $ jetty-deploy.sh -port=8081

We deploy a new version of our application on a different port. Here 8081.

    ProxyPass / http://localhost:8081/
    ProxyPassReverse / http://localhost:8081/

Then we change the reverse proxy’s port to 8081.

    <VirtualHost *:80>
        ServerName example.com
        ProxyRequests Off
        ProxyPreserveHost On
        ProxyPass / http://localhost:8081/
        ProxyPassReverse / http://localhost:8081/
    </VirtualHost>

    $ /etc/init.d/apache reload

Finally, we reload the Apache config.
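The port switch above can be sketched as two small helpers: one that picks the port the next version should use, and one that renders the two proxy directives for a port. This is an illustration only. `ProxyPortSwitch` is hypothetical, and alternating between exactly two ports is an assumption of the sketch; the source only shows a move from 8080 to 8081.

```java
final class ProxyPortSwitch {
    // Returns the port that is not currently live, assuming two ports are alternated.
    static int idlePort(int livePort, int portA, int portB) {
        return livePort == portA ? portB : portA;
    }

    // Renders the two proxy directives for the given backend port.
    static String directives(int port) {
        return "ProxyPass / http://localhost:" + port + "/\n"
             + "ProxyPassReverse / http://localhost:" + port + "/\n";
    }
}
```

The old version keeps running on its port until the proxy has been reloaded, which is what makes the switch invisible to users.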

[Diagram: a client communicating with the server using REST and JSON.]

Another great thing about having plain web servers in front of your applications is that it enables a clean separation between your client and your server. You can deploy your pure JavaScript/HTML5 application and communicate, for instance, with JSON against the server’s REST APIs. No need for a servlet container on the client. When you do this you can deploy your frontend client whenever you want without having to deploy your server. You don’t have to do anything extra to enable zero downtime deploys of your frontend. An architecture like this is great because implementing new clients is really easy with a clean and separate REST API.

DATABASE MIGRATION (AND ROLLBACK)

Finally, database migration and rollback. It’s kind of a chicken-and-egg problem: database changes cannot be applied without breaking existing functionality, and the new version doesn’t work without the changes you’re about to apply. Not good! Well, there are solutions to this.

EXPAND/CONTRACT-PATTERN
EXPAND:
CONTRACT:

For example, we have what we call the expand/contract pattern. Expansion scripts are database changes which are safe to apply without breaking backwards compatibility with the existing application. Changes like adding new tables and tweaking indexes. Contraction scripts are changes which clean up things in the database that are no longer needed after the upgrade. Deleting columns, tables or constraints, or adding constraints, are examples of this. The expansion scripts are run before the upgrade, and the contraction scripts are run after the application is upgraded and stable.

It’s very seldom that we make changes which cannot be kept compatible with two versions of the application, for instance renaming a column or moving it to a different table. In those rare situations you may use, for instance, triggers and a migration job. But writing rollback is often a nightmare when you do this.
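The ordering rule described above (expansion scripts, then the application upgrade, then contraction scripts) can be sketched as a small planner. This is a minimal illustration with hypothetical names (`ExpandContractPlan`, `Change`, `Phase`); it only orders steps and does not run any migration.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpandContractPlan {
    enum Phase { EXPAND, CONTRACT }

    // Hypothetical description of one database change and the phase it belongs to.
    static final class Change {
        final String description;
        final Phase phase;

        Change(String description, Phase phase) {
            this.description = description;
            this.phase = phase;
        }
    }

    // Orders a release: expansion scripts first, then the application upgrade,
    // then contraction scripts once the new version is in place.
    static List<String> order(List<Change> changes, String appVersion) {
        List<String> steps = new ArrayList<>();
        for (Change change : changes) {
            if (change.phase == Phase.EXPAND) {
                steps.add("migrate (expand): " + change.description);
            }
        }
        steps.add("deploy: " + appVersion);
        for (Change change : changes) {
            if (change.phase == Phase.CONTRACT) {
                steps.add("migrate (contract): " + change.description);
            }
        }
        return steps;
    }
}
```

In practice the contraction step would also wait until the new version has proven stable; the sketch leaves that decision to the caller.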

    TIME
      |           db v. 3
      |  deploy   app v. 14   compatible with db v. 3
      |  migrate  db pre. 4
      |  deploy   app v. 15   compatible with db pre 4 and 4
      |  migrate  db v. 4
      |  deploy   app v. 17   compatible with db 4 and 5
      |  migrate  db v. 5
      v  deploy   app v. 18   compatible with db v. 5

Here are some examples of typical patterns for the evolution of databases and applications. We start with database version 3. Then we deploy an application compatible with version 3. Then we deploy a database which works with both the previous and the next version of the application (expansion). Then we deploy an application compatible with versions pre 4 and 4. We clean up and deploy the complete version 4 (contraction). The next application is compatible with both database version 4 and the next version, 5. We migrate to version 5, and deploy a new application compatible with just version 5.
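The compatibility bookkeeping in this timeline can be expressed as a check that runs before each deploy or migration. This is a sketch under assumptions: `SchemaCompatibility` and `App` are hypothetical, and the version labels are plain strings taken from the timeline.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SchemaCompatibility {
    // Hypothetical model: each application version lists the database versions it works with.
    static final class App {
        final String version;
        final Set<String> compatibleDbVersions;

        App(String version, String... compatibleDbVersions) {
            this.version = version;
            this.compatibleDbVersions = new HashSet<>(Arrays.asList(compatibleDbVersions));
        }
    }

    // An application may be deployed only if it works with the current database version.
    static boolean canDeploy(App candidate, String currentDbVersion) {
        return candidate.compatibleDbVersions.contains(currentDbVersion);
    }

    // The database may be migrated only if the running application also works with the target version.
    static boolean canMigrate(App runningApp, String targetDbVersion) {
        return runningApp.compatibleDbVersions.contains(targetDbVersion);
    }
}
```

For example, an app 17 declared compatible with versions 4 and 5 permits the migration to version 5, while an app 18 declared compatible with version 5 only must wait until that migration has run.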

http://www.liquibase.org

For our database scripts we use Liquibase.

    <databaseChangeLog>
        <changeSet id="Adding column for account status" author="Stein Inge Morisbak">
            <addColumn tableName="ACCOUNT">
                <column type="VARCHAR(20)" name="STATUS" />
            </addColumn>
        </changeSet>
        <changeSet id="Migrate account status" author="Stein Inge Morisbak">
            <sql>
                update ACCOUNT set STATUS = 'ACTIVE' where ACTIVE = 1;
                update ACCOUNT set STATUS = 'DEACTIVATED' where ACTIVE = 0;
            </sql>
            <rollback>
                update ACCOUNT set ACTIVE = 1 where STATUS = 'ACTIVE';
                update ACCOUNT set ACTIVE = 0 where STATUS = 'DEACTIVATED';
                update ACCOUNT set ACTIVE = 0 where STATUS = 'CLOSED';
            </rollback>
        </changeSet>
        <changeSet id="Removing old column for active" author="Stein Inge Morisbak">
            <dropColumn tableName="ACCOUNT" columnName="ACTIVE"/>
            <rollback>
                <addColumn tableName="ACCOUNT">
                    <column type="NUMBER(1,0)" name="ACTIVE"/>
                </addColumn>
            </rollback>
        </changeSet>
    </databaseChangeLog>

Liquibase is XML-based, and when you run it, it tells you if you haven’t written code for rollback. It also tells you whether something is wrong with your scripts before you even try them against the database. Every release is tagged with a version number in metadata tables, and you can roll back and forth between versions with a simple command.

SUMMING UP

You are not agile if you do not deliver continuously!
You cannot deliver continuously if you have downtime!
Business decides what goes into production, and when.
Deploy should be as easy as pushing a button.
Zero downtime deploy can be achieved by:
- Replicating sessions in the database.
- Migrating backwards- and forwards-compatible database changes (expand).
- Removing a node from the load balancer/reverse proxy.
- Deploying to the node.
- Repeating for the rest of the nodes.
- Cleaning up the database (contract).
A clean separation between the client(s) and the server makes things even more agile.

If you want to say you’re doing agile, you have to deliver continuously. You cannot deliver continuously if you have downtime. We developers should not say no if business wants to see something in production within the week or even faster. Deployment should not be a heavy process with lots of manual steps.

Zero downtime deploy can be achieved relatively simply by replicating sessions in the database, migrating database changes without breaking compatibility with the existing or the new version of the application, removing a node from the load balancer pool, and deploying to that node. Repeat for the rest of the servers. Clean up the database with migrations that can be done after the new version is stable in production. And of course, make sure you script, and test, rollback.

Finally: if you separate the client from the server, you can deploy each of them separately, and new clients are easier to add.

THANK YOU!
Stein Inge Morisbak
Practice lead, Continuous Delivery and DevOps
@steinim
http://open.bekk.no/
