Technology for Future
Sunday, 17 March 2013

Kicking unix servers, swearing at websites and running Ops teams for the last 15 years. Particularly interested in the culture of teams and organisations, so a lot of this talk will be about that: why we were in hell, and how we got out. It’s told from an Ops perspective, because that’s where I come from.
• Online: 50 million users from across the globe visiting our web properties monthly
• Tablets: over $1m gross revenue per month via all digital editions; over 3m paid iPad editions sold in FY12
• Licenses: Future has 225 licensed properties available in 89 overseas partnerships
• Print: over 24m printed copies sold in the financial year, that’s 45 every minute of every day
• Video: last year our content had over 120m video views, that’s 228 every minute of every day
• Social media: Future has 4.1m followers across all the main social networks

Future have been publishing for a long time: 28 years in print publishing, and now digital, with websites and digital editions. Future specialise in content for enthusiasts: if you have a hobby, we probably make a magazine and website for it. Operations in the UK, US and Australia. 50 million unique users, 300 million page impressions a month. The nature of publishing leads to a company structure that’s not like the average web company: publishers, a magazine structure, lots of little silos.
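The per-minute figures are the annual totals spread over the minutes in a year. A quick check, assuming a 365-day year, shows that the slide’s 45 and 228 are these rates rounded down:

```latex
\begin{aligned}
365 \times 24 \times 60 &= 525\,600 \text{ minutes per year}\\
24\,000\,000 / 525\,600 &\approx 45.7 \text{ printed copies per minute}\\
120\,000\,000 / 525\,600 &\approx 228.3 \text{ video views per minute}
\end{aligned}
```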
I lied a bit in the title: there are about 40 platforms. It just feels like a thousand different platforms when you try and run it all :) 80 major sites and 200+ smaller ones. Different languages, dev teams, offices, countries. One ops team.
Manual config, manual deploys, manual rollbacks, manual pulling your hair out while sobbing and rocking in the corner. We still managed to build some very successful websites like this. Doing 250 deploys a month is nothing in the land of continuous deployment, but by hand it’s a hell of a lot of work: over 10 deploys a day, plus failures, rollbacks and database changes. Two people’s full-time jobs! Hence the sobbing.

What else makes it stressful? Geographical and cultural separation between devs and ops. Geographical separation makes all of the bad bits worse. You get all of the worst things you’d expect: silos, throw-it-over-the-wall deploys, blamestorming. Ops tend to see devs as careless destroyers of their shiny servers. Devs see Ops as grumpy old unix sysadmins who never want anything to change, ever. The business just wants the sites to get better more quickly, and doesn’t want to referee.
Another source of stress was being parental. Stereotypical Dev/Ops interactions can be like bad parent/child interactions:
“Do as I tell you, you’re not to be trusted.”
“I hate you and you smell and I’m going to do it anyway.”
“Well, now you’ve broken everything. I’m not angry, I’m just very disappointed. I’m going to go and clean everything up after you now so you can do it all again.”
This is dysfunctional! Cultural norms for Ops folk can be very unhelpful. The BOFH is supposed to be comedy, not a manual for interpersonal relationships at work. Parental cleanup doesn’t encourage anyone to own their own messes and fix them. Being a parent is exhausting and doesn’t scale.
So what do we do to fix it? Things not to try: adding more process, adding more rules, adding more restrictions (e.g. complex signoffs, deploy windows, heavyweight process). That’s like the legions of the damned trying to improve their lot by asking for better, sharper pitchforks to be used on them. We probably tried all of these at one time or another, and they universally made things worse: the tighter the controls, the worse everyone’s interactions were. It’s an illusion of control.
When we looked for an answer, DevOps sounded great. The information out there, however, seemed to assume you were working on a single platform at a VC-funded startup in the heart of SF. Strangely, the company wasn’t keen to relocate us all there, so we had to find some other answers. Popular DevOps tools, processes and patterns weren’t a natural fit for us. It was also hard to find a business case to justify a big-bang move to new structures and processes, given the hit to schedules it would involve.
We had to pick out the vital parts of DevOps that we wanted to adopt. The core, for us, was recognising that DevOps is about people, collaborating together, on the same goals: deliver more value to the business, more quickly, and actually be able to sleep at night.
• Sell the value after you’ve generated some
• Bottom-up adoption is easier to drive than top-down
• “It’s easier to ask forgiveness than to get permission”

It’s not easy to sell DevOps to management from a cold start. If we couldn’t sell it to the technical teams, it would be impossible to sell to the business. With so many different teams and managers, it’s hard to impose from above. Businesses are (mostly) rational: if we could show how it really helped, it would be a much easier sell.
• Eat your own dogfood
• Find solvable problems
• Form partnerships
• Evolve solutions together to make a toolbox

So how do we get out of hell, apart from “Well, I wouldn’t start from here if I were you”? Plan of attack: accept that things will break. That’s why you start small, because breaking things ‘narrows the problem space’. The earliest adopters were Ops, eating our own dogfood: we tested a lot of the toolset and ideas on our own Puppet stuff. Pick things that can be solved and will really improve things for people. Evolve solutions piecemeal, with Dev and Ops together. Make a toolbox of things for other people to use and build on.
Finding people who really wanted something different was the key to making this work. Who do you pick for DevOps? Cyclists! cyclingnews.com and bikeradar.com had a mature platform, enthusiastic developers and supportive management. What we had to offer: more control for the devs, quicker delivery for management, less pain for Operations. We started working with them to improve both culture and toolset.
• Lunch and learn
• Failcake
• Sitting in the same office!
• IRC

The biggest and most effective changes are about communication and teamwork. Ops had to learn to ‘relax and let go’ of the infrastructure. Knowledge sharing matters: sitting in the same place helps enormously, but good use of IRC or similar really helps geographically separated teams. Failcake is blame-free post-mortem encouragement: if you cause a breakage that affects others, be honest, own up, and bring cake. The situation is discussed, but no one is angry, because, hey, there’s cake!
• Statsd
• Vagrant
• Puppet Environments
• Git
• Liquibase
• Jenkins

We’ve used a lot of the usual culprits. ActiveMQ is worth picking out because of the effect it’s had on us, both technically and culturally.
• One standard way for everything to communicate
• Makes building bespoke tools out of smaller distributed components easier
• Makes it much easier for different teams to write tools that work together

MCollective meant we accidentally gained a message queue: ActiveMQ for us, but any queue will do really. Once we had that on every server, we realised just how powerful it was, and how many new things you could build on that base.
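“One standard way for everything to communicate” is essentially publish/subscribe over named topics. Below is a minimal in-memory sketch of that pattern. The `Broker` class is hypothetical and stands in for ActiveMQ (or any queue). It has no networking, persistence or delivery guarantees.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Message = dict
Handler = Callable[[Message], None]


class Broker:
    """Toy in-memory publish/subscribe broker (illustration only, not ActiveMQ's API)."""

    def __init__(self) -> None:
        # topic name -> handlers subscribed to it
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler; any team can add one without touching anyone else's code."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a copy of the message to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(dict(message))  # copy, so one subscriber can't alter another's view
        return len(handlers)
```

Publishers never need to know who is listening. Each team just calls `broker.subscribe("events", handler)` with its own handler, and that decoupling is what lets tools from different teams work together.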
[Diagram: Events topic; Commit Made; EventBot on IRC]

John Hawkes-Reed did most of the work; any mistakes in the explanation are mine! Every push to the git repo triggers a post-commit hook that sends a message to a topic. Subscribers to the topic get the message and do things. The simplest subscriber is EventBot, which tells everyone on IRC about the events.
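Here is a sketch of the publishing side and of the simplest subscriber. One assumption needs flagging: in stock git, the hook that runs on the server after a push is `post-receive`. It receives one `<old-sha> <new-sha> <ref-name>` line per updated ref on standard input, and the first function parses that input. The event fields, the `commit_made` name and the IRC line format are all hypothetical. The actual send to the broker is left out.

```python
import json
from typing import List


def events_from_post_receive(stdin_text: str, repo: str) -> List[dict]:
    """Parse post-receive input ('<old-sha> <new-sha> <ref-name>' per line) into events."""
    events = []
    for line in stdin_text.splitlines():
        if not line.strip():
            continue
        old_sha, new_sha, ref = line.split()
        if set(new_sha) == {"0"}:
            continue  # an all-zero new SHA means the ref was deleted: nothing to announce
        events.append({"type": "commit_made", "repo": repo, "ref": ref,
                       "old": old_sha, "new": new_sha})
    return events


def eventbot_line(body: str) -> str:
    """Turn one JSON message from the events topic into a line for IRC."""
    event = json.loads(body)
    prefix = "refs/heads/"
    ref = event["ref"]
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref
    return f"[{event['repo']}] {event['type']} on {branch}: {event['new'][:7]}"
```

In the hook itself, each event would be serialised with `json.dumps` and handed to whatever broker client you use. EventBot then only needs `eventbot_line` plus an IRC connection.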
[Diagram: Events topic; Commit Made; Tests Passed; EventBot on IRC]

Jenkins gets the message from the queue, brings its copy of the repo into sync, and runs the unit tests, build or whatever else is needed.
[Diagram: Events topic; Commit Made; Tests Passed; EventBot on IRC]

Jenkins then sends a message back to the queue about success or failure; in this case, success.
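The Jenkins step is just another subscriber that turns one event into another. The minimal sketch below makes a few assumptions. `sync_repo` and `run_tests` are hypothetical callables standing in for the real job, and the event field names (`type`, `repo`, `new`) are hypothetical too. This is not Jenkins’ own API.

```python
from typing import Callable, Optional


def jenkins_on_event(message: dict,
                     sync_repo: Callable[[str, str], None],
                     run_tests: Callable[[str], bool]) -> Optional[dict]:
    """Handle one message from the events topic.

    Returns the result event to publish back, or None for messages this worker ignores.
    """
    if message.get("type") != "commit_made":
        return None
    repo, rev = message["repo"], message["new"]
    sync_repo(repo, rev)      # bring the local copy up to this revision
    passed = run_tests(repo)  # unit tests / build / whatever
    return {"type": "tests_passed" if passed else "tests_failed",
            "repo": repo, "rev": rev}
```

Success and failure are published as events rather than acted on here. That way IRC, metrics or the deploy servers can each decide what to do with them.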
[Diagram: OMGWTFBBQ; Events topic; Commit Made; Tests Passed; EventBot on IRC]

Servers whose Puppet/Hiera data says they carry that codebase do a git fetch to update their local repo. This means we’re not dependent on the git master for deploys.
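Each server can decide for itself whether an event concerns it. The sketch below rests on several assumptions:
• The fetch is triggered by a “tests passed” event, which is our reading of the diagram.
• Each server keeps a local mirror clone (`git clone --mirror`) under a hypothetical `/srv/repos/<repo>.git`.
• `my_codebases` stands in for the host’s Hiera data.

```python
from typing import List, Optional


def fetch_command(event: dict, my_codebases: List[str],
                  repo_root: str = "/srv/repos") -> Optional[List[str]]:
    """Run on each server: return the git command to refresh the local mirror, or None.

    my_codebases stands in for this host's Hiera data; repo_root and the
    '<repo>.git' layout are hypothetical.
    """
    if event.get("type") != "tests_passed":
        return None
    repo = event["repo"]
    if repo not in my_codebases:
        return None  # this server doesn't carry that codebase
    return ["git", f"--git-dir={repo_root}/{repo}.git", "fetch", "--prune", "origin"]
```

A subscriber on the server would run the returned command with `subprocess.run(cmd, check=True)`. Deploys then read only from this local mirror, so an unavailable git master doesn’t block them.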
[Diagram: Deploy TR-471]

For deploys, we have a thin web interface wrapping an MCollective client and agent. Developers use the web interface to pick which tag to deploy where: stage, live, whatever.
[Diagram: Deploy TR-471; Run pre-deploy script; Checkout git work tree; Run post-deploy script; Deploy Success]

The MCollective agent triggers the deployment process on the appropriate servers, and the deploy runs entirely from the local git repo. First, the pre-deploy script runs sanity checks and whatever else is needed. If it passes, we deploy by checking out the git repo with the --work-tree flag, which puts the code in the right place without any git metadata in the directory tree. Then the post-deploy script handles database migrations, post-deploy cleanup, etc. Finally, success is signalled back to the queue.
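The agent-side steps map onto a short sequence that stops at the first failure. This is a plain-Python sketch, not MCollective agent code, and the script and directory paths in the usage are hypothetical. The checkout uses git’s global `--git-dir` and `--work-tree` options, so the files land in the document root while the repository metadata stays elsewhere.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Runner = Callable[[Sequence[str]], int]  # runs a command, returns its exit code


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command for real."""
    return subprocess.run(list(cmd)).returncode


def deploy(tag: str, git_dir: str, work_tree: str,
           pre_deploy: Sequence[str], post_deploy: Sequence[str],
           runner: Runner = run) -> dict:
    """Deploy `tag` from the local repo into `work_tree`; stop at the first failing step.

    Returns the event to signal back to the queue.
    """
    steps: List[Tuple[str, List[str]]] = [
        ("pre-deploy", list(pre_deploy)),    # sanity checks, whatever else is needed
        # Check files out into work_tree; git metadata stays in git_dir.
        ("checkout", ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}",
                      "checkout", "-f", tag]),
        ("post-deploy", list(post_deploy)),  # database migrations, cleanup, etc.
    ]
    for name, cmd in steps:
        if runner(cmd) != 0:
            return {"type": "deploy_failed", "tag": tag, "failed_step": name}
    return {"type": "deploy_success", "tag": tag}
```

For example, `deploy("TR-471", "/srv/repos/site.git", "/var/www/site", ["/opt/deploy/pre-deploy"], ["/opt/deploy/post-deploy"])` returns the event to publish back on the queue. Note that `checkout -f` discards local modifications in the work tree, which is usually what you want on a deploy target.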
[Diagram: Deploy TR-471; Deploy Success]

Rinse and repeat on the other servers. We’re doing them one at a time, but MCollective lets you do them all in parallel, or in batches with delays between them; it’s very versatile. Topic subscribers mark the deploy in metrics, send messages to stakeholders, report in IRC, anything you want really. Anyone can add other listeners and do things based on these events. We ended up throwing pretty much everything onto the queue, and this had more interesting cultural effects.
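MCollective’s batching is built in. The sketch below only illustrates the idea of a rolling deploy in plain Python: split hosts into batches, deploy each batch, pause between batches, and stop before the next batch if anything failed. `deploy_one` is a hypothetical callable that returns True on success. Hosts within a batch are handled one after another here, not in parallel.

```python
import time
from typing import Callable, Dict, List, Sequence


def batches(hosts: Sequence[str], size: int) -> List[List[str]]:
    """Split hosts into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    hosts = list(hosts)
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def rolling_deploy(hosts: Sequence[str], deploy_one: Callable[[str], bool],
                   batch_size: int = 1, delay_s: float = 0.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, object]:
    """Deploy batch by batch, pausing between batches and halting after a failed batch."""
    groups = batches(hosts, batch_size)
    attempted: List[str] = []
    for i, group in enumerate(groups):
        failed = [h for h in group if not deploy_one(h)]
        attempted.extend(group)
        if failed:
            return {"ok": False, "attempted": attempted, "failed": failed}
        if i < len(groups) - 1:
            sleep(delay_s)  # give metrics and alerts a chance to show problems
    return {"ok": True, "attempted": attempted, "failed": []}
```

With `batch_size=1` this is the one-server-at-a-time rollout described above. Raising `batch_size` trades speed against how many servers a bad release can reach before the rollout stops.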
The effect on culture came mostly from visibility: radical transparency about commits, deploys, metrics, failures, etc. It makes it harder to hide mistakes, and it makes responsibility more shared. It reduces ‘because I say so’ and turns it into a discussion about a visible, shared bit of information. There’s less siloing, because it avoids everyone having their own special snowflake of every service. It helps management too, by showing the business which sites and behaviours are impacting on others. And people make interesting things you don’t anticipate once they have access to this kind of information and a common toolset.
It’s also about ownership: the things you ‘own’ own you. ‘Owning things’, being ‘the person who does X’, is self-siloing: the mail admin, the DBA, etc. Not useful. We want to be toolmakers, not tools :) So we must let go more. Ops stop being the mystical holders of the information about the servers. Transparency and visibility again. Inverting the organisation, with its guts on display. Solidarity.
Should we all join the Socialist Workers Party? Again, JHR piped up: anarcho-syndicalism. Its principles are 1) workers’ self-management, 2) direct action, and 3) workers’ solidarity. Good principles for DevOps teams.
In Hell, we found that some useful ‘sins’ helped adoption (credit to the Perl virtues here, really):
• Envy: why do their deploys take less time than ours? Why is their site more reliable?
• Pride: people want their projects to be better than other people’s.
• Sloth: better use of the tools and processes means less boring work for more effect.
• Greed: how do I make my project more profitable?
• Gluttony: Failcake is really tasty, but fattening ;)
• Wrath: people are angry and frustrated when things don’t work; direct that at making more useful changes.
• Lust: you must maintain a good work/life balance, so saving at least one sin for outside work is good.
Are we out of Hell yet?
Purgatory: a place of purification and temporary punishment on the way to Heaven. How has DevOps been adopted, and how has it affected the business? Central, cross-functional teams build services and APIs: search, mobile sites, user management, commenting, etc. We provide different tools built on the same principles, such as an svn deployer. Corp Java came for the message queues and got git, Jenkins, RESTful services, etc.
• Show how teams that make these cultural and technical changes deliver these goals
• Make the costs and benefits transparent
• Deliver first, then evangelise

With visible successes in place, it was much easier to approach the wider business about making these kinds of changes. The business needs more content, more functionality, more quickly and more reliably: in other words, more revenue. This is what DevOps adoption offers them. We were able to show how the cultural and technical changes from DevOps practices provide what the business wants, not just what the technical teams want. Radical transparency makes the costs and benefits visible to all. We don’t have to drive adoption so much. The developers on other sites wanted the same tools and privileges, and the project managers and publishers wanted the same increase in speed and decrease in cost. Evangelising is done for you by the people you’d otherwise have to persuade to try it.
That’s how we work our way out of hell. It’s not finished yet; there’s still lots of work to do. It’s not an easy road: it starts from the bottom of the pit, and the only way is upwards. For us it was about the culture and the people. Give people the chance to try something better and see it working for others. Give them the tools and flexibility to climb out themselves. Let the early adopters lead the rest.
• Future: http://futureplc.com/
• Future DevOps Blog: http://ops.failcake.net/
• Contact me: [email protected], @thesamoth

To help others out of hell, we’ve open sourced a lot of the tools we’ve built along the road. I hope seeing some of our journey has been useful for you. Thank you. Any questions?