Why continuous delivery needs devops, and why devops needs infrastructure-as-code

Continuous delivery and devops have gone mainstream, at least in terms of mindshare. As a result, a lot of vendors have jumped onto the bandwagon. Most products that have anything to do with deployment now try to associate themselves with devops and continuous delivery. In this webinar sponsored by ThoughtWorks Studios, I try to clear the air in a product-independent manner. I also cover common devops anti-patterns and explain the idea of infrastructure as code.
As it turns out, this is as much a talk on the design of an effective Agile IT organization as it is a talk on the stated topic. Things are interrelated.

Sriram Narayan

October 25, 2012

Transcript

  3. This webinar is brought to you by Studios, the products division of ThoughtWorks (TW). Studios currently has a portfolio of three products. Mingle is our Agile project management tool. Go is our CD tool. Twist is an automated functional testing tool that can work with test drivers such as Selenium, Sahi and WebDriver. Dave Farley and Jez Humble from TW wrote the award-winning book on continuous delivery that has now become a de facto industry reference. Studios also offers a variety of training programs, the details of which are available from www.thoughtworks-studios.com/services/agile-workshops
  4. ThoughtWorks also has a group of technologists called the Technology Advisory Board; members include the CTO Rebecca Parsons and Martin Fowler. This group helps with tech strategy for TW and also publishes a radar of trends and recommendations. The latest version was published just two days ago and is available from thoughtworks.com/radar. The anthology is a compilation of essays on software technology and innovation. The first anthology was published in March 2008. Its first chapter, called ‘Solving the business software last mile’, is quite relevant to my topic today. The second anthology has just been released this month.
  5. I’ll quickly summarize what I am going to cover in detail. Why does CD need devops? Because you can’t have CD with a siloed org, and devops helps blur the strict boundary between dev and ops. So I’m going to talk about what a silo is, how silos hinder CD, what causes silos, how tools encourage silos and how devops is meant to break down the dev and ops silos. Next I’ll explain what infrastructure as code (IaC) means. Devops needs IaC because IaC helps create a common currency between dev and ops so that transactions between them are more fluid. IaC helps skill crossovers happen in both directions.
  6. Before we go further, it is useful to quickly review definitions of CD and devops.
  7. The first definition is from conversations within TW. Continuous delivery is an approach to delivering software that reduces the cost, time, and risk of delivering incremental changes to users. I think of it as agile software delivery or release, as opposed to agile software development. Agile software development stops with the handover of a tested build to a team responsible for deploying it into production. Agile software delivery addresses the last mile of software development as described in the first TW Anthology. Unfortunately, even the term delivery has now come to mean delivery of software from IT vendor to client, without regard for whether it is actually deployed into production and made available to users. In the context of continuous delivery, delivery means release into production. Some people say this definition is incomplete because it doesn’t say a word about the customer or the business. I think the customer and the business are to be understood as the context within which this definition makes sense. Agile is all about customer centricity and business value, and CD is nothing but agile software delivery.
  8. So if you are doing CD, your software is releasable throughout its lifecycle. Note that it says releasable, not just deployable. A release is a deployment to production. A deployment to production often differs from a deployment to any other environment in a number of ways: we may need a tested rollback plan, we may need to do the rollout within strict downtime windows, we may have to execute and verify data migration, and we may need to update our infrastructure inventory, change firewall and DNS config, and so on.
  9. This is a tough ask because we always have time-to-market pressures. Some examples:

    - Data migration scripts up to date
    - Automated tests up to date
    - Deployment scripts up to date with any new components and libraries
    - Note that software is not in a releasable state if it is being developed in a branch and needs to be merged into trunk in order to push to production

    This often calls for automation.
  10. For every commit, the production readiness check is:

    - Automated
    - Fast
    - Available to anybody on the team
  11. This implies the ability to automatically provision the target infrastructure, deploy, configure and validate.
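
As a rough illustration of such a per-commit readiness check, here is a minimal Ruby sketch. It is not from the talk: the step names mirror the four activities listed above, and each lambda is a hypothetical stand-in for real automation that returns true on success.

```ruby
# Hypothetical stand-ins for real automation; each returns true on success.
STEPS = {
  provision: -> { true },
  deploy:    -> { true },
  configure: -> { true },
  validate:  -> { true }
}.freeze

# Runs the steps in order and stops at the first failure, so anybody on
# the team can see how far a given commit got and where it broke.
def readiness_check(steps)
  completed = []
  steps.each do |name, step|
    return { ready: false, completed: completed, failed: name } unless step.call
    completed << name
  end
  { ready: true, completed: completed, failed: nil }
end

result = readiness_check(STEPS)
puts result[:ready]   # true, because every stub above succeeds
```

In practice a CD tool would run something like this for every commit; the point of the sketch is only that the check is a single automated entry point rather than a manual checklist.
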
  12. Yes, it is a high bar and it isn’t easy. We can’t get there in one leap. The continuous delivery book suggests a five-stage maturity model that progresses from regressive to repeatable to consistent to quantitatively managed to optimizing. I won’t go into further detail on the maturity model as it is not relevant for this talk.
  13. Next we come to the definition of devops. Devops is culture and practices, and we’ll see that culture is influenced by the presence of organizational silos.
  15. It is also useful to understand what devops is not. A typical agile setup looks like this. From an IT point of view, we have three broad silos: business, development and operations. A lot of places have many more silos, but this picture is enough to understand what devops is not. What happens in a number of places is that the VP of operations reads about devops and continuous delivery and decides that his team should now acquire devops capability. Accordingly, they evaluate and buy some product claiming to be a devops enabler, do a bit of research on tools like Chef and Puppet, start version controlling their scripts and then rename their department to devops.
  16. It isn’t devops if you still have a dev silo and an ops silo. Jez recently reiterated this point in his post called ‘There is no such thing as a devops team’. The whole point of devops is to merge the dev and ops silos.
  17. Let’s dig deeper into silos. Is it a silo only if there are different departments? What is the problem with silos? After all, they seem to be a sensible enough way of organizing labour. Once we recognize the problem, we’ll explore various causes of silo formation. Once we appreciate the causes, it should be possible to take preventive or remedial action.
  18. I guess we all recognize this to a greater or lesser extent in our own organizations. By delivery value stream, I mean the end-to-end chain of interactions and value additions from concept to cash or from requirements to release.
  19. So what is the problem with silos? N silos require N-1 handoffs for a work item to pass through the value stream. In our context, this work item is a build. If the testing team is separate from the development team, they will not accept builds on a continuous basis but will rather have their own calendar by which to take new builds. Different teams mean a communication protocol enforced by a work tracking tool or a single point of contact. It means meetings between team representatives with documented minutes. Feedback loops lengthen. Team managers try to showcase their team’s performance with team-level metrics. This means incoming work gets queued and prioritized based on some centrally conceived criteria. The dependent teams get frustrated with turnaround times and attempt priority escalations. This is not a healthy collaborative climate.

    But the biggest problem with handoffs is that they are only feasible with large batch sizes. A separate database team will not entertain piecemeal requests for query optimization. They’d rather own the data model and enforce indexing conventions across the board. They won’t review or help with unit-level database migration scripts. They’d rather review the whole set of migrations when the application is ready for UAT or some other similar state of maturity. But large batch sizes are a problem.
  20. We all want to take our features to the market faster than ever, and this means shorter cycle times in the delivery value stream. Short cycles require small batch sizes. In his book The Principles of Product Development Flow, Donald Reinertsen argues that reducing batch size helps reduce cycle time, helps prevent scope creep, helps reduce risk and increases team motivation. Now, you can’t have small batches with too many silos because that multiplies the number of handoffs. Small batches with many silos make the system unresponsive. This is analogous to the problems associated with chatty service design in a service-oriented architecture. So we have to tear down silos. But this does not just mean addressing the design of teams and organizations. We need to address all the things that contribute to silo-like behaviour. http://www.informit.com/articles/article.aspx?p=1833567&seqNum=3
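
The handoff arithmetic can be made concrete with a small Ruby sketch. The model is a simplifying assumption for illustration, not something from the talk: every batch crosses every silo boundary exactly once, and the numbers (7 silos, 20 batches) are made up.

```ruby
# Toy model: a work item crosses each boundary between adjacent silos once,
# so N silos mean N-1 handoffs.
def handoffs(silos)
  silos - 1
end

# Splitting a release into smaller batches multiplies the handoffs,
# because each batch has to travel the whole value stream.
def total_handoffs(silos, batches)
  handoffs(silos) * batches
end

puts handoffs(7)             # 6 handoffs for one big batch through 7 silos
puts total_handoffs(7, 20)   # 120 handoffs for 20 small batches through 7 silos
puts total_handoffs(2, 20)   # 20 handoffs for the same 20 batches with only 2 silos
```

Under this model, small batches only stay cheap when the number of silos is small, which is the argument for tearing silos down rather than simply shrinking batches.
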
  21. Org structure and reporting hierarchy are a clear and visible cause of silos. Some places have a VP of dev, a VP of QA and a VP of ops, each owning their resources and renting them out to projects. This creates unnecessary politics and handoffs. Geographic separation also contributes to silos because it encourages batching of work. It doesn’t matter if people are separated across buildings or continents.

    - Specialty tools: different work tracking/planning systems and VCS; separate tools for CI and deploy; ill effects of commercial tool licensing
    - Specialty teams: database team, build & deploy team, frameworks team, architecture team

    A good way to address specialty teams and org structure is to move towards cross-functional teams.
  22. So here we move from a structure that has 7 silos, each with its own VP, to a number of cross-functional teams, each with a delivery manager and a product owner.
  23. So if you are an e-commerce shop with business divisions such as sourcing, marketing and fulfillment, then it makes sense to have IT teams along these verticals rather than around technology horizontals. Cycle time is all about responsiveness. When specialists start contributing outside their specialty, we call them generalizing specialists.
  24. Product watch-out: products that claim to be devops products are likely to be just rebranded ops products. A separate ops product is likely to encourage an ops silo. The latest TW radar has specific advice about tools pretending to be general-purpose CD tools.
  25. How can you make out if a tool supports a pipeline as a first-class feature? It should be possible to trigger a pipeline as a unit, and it should be possible to make one pipeline depend on another. It should be possible for artifacts to flow through pipelines. It should be possible to have some access control at the level of a pipeline. It should be possible to associate pipelines with environments. As an example, here is a screenshot of a delivery value stream modeled in Go.
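
To make the five criteria tangible, here is a toy Ruby model of a pipeline as a first-class object. It is only a sketch: the class, field and role names are assumptions made up for illustration and do not reflect the data model of Go or any other tool.

```ruby
# An artifact carries its own trail, so its origin can always be traced.
Artifact = Struct.new(:id, :trail)

Pipeline = Struct.new(:name, :upstream, :environment, :allowed_roles, keyword_init: true) do
  # Triggered as a unit, subject to pipeline-level access control.
  # Consumes the upstream artifact (or creates one) and records this stage.
  def trigger(role, artifact = nil)
    raise ArgumentError, "#{role} may not trigger #{name}" unless allowed_roles.include?(role)
    artifact ||= Artifact.new("#{name}-1", [])
    Artifact.new(artifact.id, artifact.trail + ["#{name}@#{environment}"])
  end

  # The dependency chain leading up to this pipeline.
  def chain
    (upstream ? upstream.chain : []) + [name]
  end
end

build = Pipeline.new(name: "build", upstream: nil, environment: "ci", allowed_roles: [:team])
uat   = Pipeline.new(name: "uat", upstream: build, environment: "uat", allowed_roles: [:team])
prod  = Pipeline.new(name: "prod", upstream: uat, environment: "production", allowed_roles: [:team])

artifact = build.trigger(:team)
artifact = uat.trigger(:team, artifact)
artifact = prod.trigger(:team, artifact)
puts artifact.trail.join(" -> ")   # build@ci -> uat@uat -> prod@production
```

Each criterion maps to one element of the model: `trigger` is the unit of execution, `upstream` expresses dependency, the `Artifact` flows between pipelines, `allowed_roles` is the access control, and `environment` ties a pipeline to where it deploys.
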
  26. What to look for:

    - End to end: artifact traceability, deployment orchestration, audit trail
    - Flexible modeling of your value stream
    - One tool for all specialists, no silos

    When you have a specialized deployment tool, you often wonder where the artifact came from, but the tool can’t help you because the artifact came from another universe called the build tool or the CI server. In the universe of the deployment tool, the birth of the artifact is a singularity and it is invalid to ask what happened before the Big Bang.
  27. Now we come to the second part of this presentation. Before we go further, let’s see what we mean by infrastructure as code.
  28. What is infrastructure? At a basic level, it is physical and virtualized hardware. Then we have different flavours of server operating systems. Next we have specific roles such as DNS servers, firewalls, load balancers, caches, web servers, database servers and so on. To manage all this, what we have today is a class of mostly open source tools like Chef, Puppet and Ansible that call themselves infrastructure configuration management tools. They provide a domain model of infrastructure and a way to declaratively describe a deployment topology. We’ll see some examples of this shortly. Once you have your deployments described using these tools, it becomes possible to pass what is called the Phoenix test: how long does it take after a server crash to recreate another server from scratch? Assuming you have data backups and the server configuration is completely described in the language of these tools and available in version control, it should be possible to re-create a server in about an hour. Contrast this with the state of many operations teams today, where every server is a unique snowflake that may not be re-creatable at all. http://blog.csanchez.org/2012/03/13/infrastructure-as-code/
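
The idea behind a declarative description and the Phoenix test can be sketched in a few lines of plain Ruby. This is a toy model, not how Chef, Puppet or Ansible are implemented: the desired state is a hash that would live in version control, a server is a hash describing its actual state, and all package and service names are made up.

```ruby
# Desired state, as it might be kept in version control.
DESIRED = {
  packages: %w[nginx ntp],
  services: { "nginx" => :running, "ntp" => :running }
}.freeze

# Brings `server` in line with `desired` and returns the changes it made.
# Running it again makes no further changes, which is what distinguishes
# a declarative description from a script that blindly repeats its steps.
def converge(server, desired)
  changes = []
  (desired[:packages] - server[:packages]).each do |pkg|
    server[:packages] << pkg
    changes << "install #{pkg}"
  end
  desired[:services].each do |svc, state|
    next if server[:services][svc] == state
    server[:services][svc] = state
    changes << "set #{svc} to #{state}"
  end
  changes
end

fresh_server = { packages: [], services: {} }   # a blank replacement after a crash
puts converge(fresh_server, DESIRED).inspect    # four changes on the first run
puts converge(fresh_server, DESIRED).inspect    # [] on the second run
```

A snowflake server is one whose actual state exists only on the machine itself; here the replacement is rebuilt entirely from the versioned description.
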
  30. I’m not going to describe Chef in detail; I’m just using its domain model as an example. A resource is an abstraction that represents a particular thing that needs to be configured, such as a package or a service. A recipe is a Ruby DSL configuration file that you write to encapsulate resources that should be configured by Chef. The Chef client communicates with a Chef server to download the cookbooks (the packages in which recipes are distributed) that it needs to compile and run its configuration.
  31. Notes on the three snippets:

    1. This is a simple example of declaratively provisioning a service. Chef provides a Ruby-based domain-specific language to do this. This snippet is part of a recipe. Package and service are Chef primitives that let you declaratively install packages and control services.
    2. This shows how to accommodate variations in configuration across environments and yet avoid duplication in deployment scripts.
    3. Finally, we see how to script a master-slave config for a database without a static specification of the slave IP addresses.

    Note that this is not what is called ad-hoc scripting. First, these snippets are quite declarative and therefore very unlike scripts. Second, they are written on top of a commonly understood domain model. Third, they are meant to be version-controlled and subject to continuous integration just like application code. CI in this case means triggering early-stage deployments for every commit of infrastructure code. Fourth, the base recipes are quite reusable. Chef and Puppet provide a community-supported repository of base recipes for a wide variety of infrastructure software. It is common today for software authors to provide the base recipes for their software on these repositories.

    Ad-hoc scripts still exist in many organizations. Typically they are long, hard-to-understand shell scripts written by sysadmins. They reside on the author’s laptop, not in version control, and only the author knows what servers have been configured with them. What we have here is far from ad-hoc scripting.

    I’d like to call out the messaging of deployment tool vendors. They muddy the waters by referring to this as labour-intensive ad-hoc scripting. They might as well refer to a well-factored application codebase as ad-hoc coding.
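
The snippets shown on the slide are not part of this transcript. As a stand-in, here is a hedged sketch of what three such recipe fragments could look like in Chef’s Ruby DSL. It uses the standard `package`, `service` and `template` resources and the `search` helper; the package name, file paths, attribute keys, template names and role name are all hypothetical, and the recipe only runs inside a Chef client run, not under plain `ruby`.

```ruby
# 1. Declaratively install a package and keep its service enabled and running.
package 'nginx'

service 'nginx' do
  action [:enable, :start]
end

# 2. Vary configuration per environment without duplicating the recipe:
#    the port comes from a node attribute (hypothetical key) that each
#    environment can override.
template '/etc/nginx/conf.d/myapp.conf' do
  source 'myapp.conf.erb'
  variables(port: node['myapp']['port'])
  notifies :restart, 'service[nginx]'
end

# 3. Discover database replicas (the slaves, in the talk's terms) at run time
#    instead of hard-coding their IP addresses. The role name is hypothetical.
replicas = search(:node, "role:myapp_db_replica AND chef_environment:#{node.chef_environment}")

template '/etc/myapp/replicas.conf' do
  source 'replicas.conf.erb'
  variables(replica_ips: replicas.map { |n| n['ipaddress'] })
end
```

Because fragments like these are plain text, they can sit in the same version control and CI flow as application code, which is the contrast with ad-hoc scripting drawn above.
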

  33. When build and deployment people use a programming language similar to the one application developers use, it helps bridge the gap between them. When we use the same source code repository for application and infrastructure code, we create opportunities to seamlessly address deployment as part of application development. We thus establish a common currency for transactions between devs and ops. Specialized deployment tools don’t provide this common currency. They are often not even designed with this sort of text-descriptor-based versioning in mind. They provide graphical user interfaces to describe the deployment of a Java webapp to a Tomcat server or some such simple landscape. You are out of luck if your landscape is diverse. Some proprietary tools even offer doodleware by which you can visually map a package to its target node and so on. No more hand-coded scripts, they say. Those of us who have been around for a while know where we’ve heard this story before: business process execution language, visual rules engines, visual programming. In due course, people realize that when it comes to programming, text is deceptively powerful.
  34. Text is a great example of common currency. There is so much good free tooling around text that it makes collaboration and automation much easier. Independent version control + text is a powerful combination.
  35. Once you have common currency and cross-functional teams, skill crossover becomes possible.
  36. Notes on the sample value stream:

    - 5 pipelines in this sample value stream
    - The last 3 deployment pipelines have their own environments and target nodes
    - The input to this value stream is via the app source repo or the infra code repo
    - Of course this is simplistic; a more realistic example will include an integration pipeline
  37. I’ve tried to sum it up with this illustration. Continuous delivery and devops span the entire value stream from requirements to monitoring a release in production. But it is not a linear progression from left to right. It has to be iterative. Different roles in a team can iterate together if there are no silos and if specialists are also open to generalizing a bit. All this is facilitated by the right organization and team design and also by the right choice of what I call silo-discouraging tools. Establishing a common currency in the form of version-controlled text descriptors helps with collaboration and cross-pollination of skills. Finally, it is valuable to have one overarching tool that lets you visualize, orchestrate and trace through your entire value stream.