DevOps - Transforming the way you think about IT

IT is the business. DevOps is the cultural and professional movement that embodies the new approaches to IT.

This was presented at the Wharton Web Conference. #wwc2013

Nathen Harvey

July 30, 2013
Transcript

  1. DevOps - Transforming the way you think about IT. Nathen Harvey, @nathenharvey
  2. Nathen Harvey ‣ Technical Community Manager at Opscode ‣ Co-host of the Food Fight Show Podcast ‣ Meetup Organizer ‣ Formerly: Web Operations ‣ @nathenharvey
  3. Who are you? ‣ Developers? ‣ Systems Administrators? ‣ DevOps? ‣ “Business” People? ‣ Executives?
  4. Are you sure?
  5. DevOps ‣ Cultural and Professional movement ‣ Development and Operations working together
  6. Aligning Objectives. Photos: http://www.flickr.com/photos/amylovesyah/5042999235, http://www.flickr.com/photos/maryamandathompson/4810162947/
  7. DevOps ‣ Cultural and Professional movement ‣ Development and Operations working together ‣ Leveraging ideas & processes from other industries ‣ To enable the business
  8. Technology – The Way Business Engages Customers

  9. Globalization ‣ 40 years for container ships to move 70% of seaborne trade ‣ 22 years for internet access to reach 78% penetration in North America. Source: WTO, Trends in Globalization, http://www.wto.org/english/res_e/booksp_e/anrep_e/wtr08-2b_e.pdf. Photo: http://www.flickr.com/photos/duke_raoul/2261478794/sizes/l/in/photostream/
  10. Globalization ‣ Online retail sales are 7% of all retail sales ‣ 75% of 2011 Thanksgiving shoppers did so online ‣ 42% of all retail purchases were influenced by online research, accounting for ~50% of total retail spending. Source: WTO, Trends in Globalization, http://www.wto.org/english/res_e/booksp_e/anrep_e/wtr08-2b_e.pdf. Photo: http://www.flickr.com/photos/duke_raoul/2261478794/sizes/l/in/photostream/
  11. 95% of the western world owns cell phones ‣ 42% are smartphones ‣ 58% will be smartphones after the next purchase ‣ 4.2 billion phones globally for 7.09 billion people (USCB). Sources: http://ssiknowledgewatch.com/2012/05/09/cell-phones-approach-total-penetration-globally-with-smartphones-moving-toward-market-dominance-2/, http://www.brightsideofnews.com/news/2011/1/26/digital-divide-global-household-penetration-rates-for-technology.aspx?pageid=1
  12. Software is the interface for consumption
  19. The Result: The Coded Business ‣ How: Redefinition of how to use technology to create business value ‣ Why: To rapidly deliver experiences, goods and services to customers ‣ What: Consumer-facing businesses, the drivers of IT innovation
  20. The Rise of the Coded Business ‣ Changes are outpacing skills development ‣ IT is moving from the back office to the front office ‣ Customers prefer digital consumption ‣ Technology directly supports customer interactions ‣ Accelerated pace of change ‣ Companies must move faster to compete
  21. Patterns of the Coded Business ‣ Business Agility ‣ Development Velocity and Consistency ‣ Continuous Delivery ‣ IT Automation. IT enables business agility and becomes a strategic advantage rather than a cost center.
  22. The Coded Business Is Coming to Every Enterprise ‣ Manufacturing ‣ Financial Services ‣ Retail ‣ Media and Entertainment ‣ High Technology ‣ Healthcare
  23. Scale x Complexity > Skills
  24. DevOps ‣ Is the cultural and professional movement that grew directly from the collective experience of the pioneers of this transition ‣ Its application to traditional IT is 1:1 ‣ The business adaptations encapsulated in DevOps will eventually be ubiquitous... at least, if you want to be great at the next couple of decades of global economic growth
  25. DevOps - CAMS ‣ Culture ‣ Automation ‣ Measurement ‣ Sharing
  26. Open Communication ‣ Developers & Operations talk and listen to one another ‣ Production & build metrics are available to all ‣ Current infrastructure is documented (Walls, Mandi. O’Reilly Media. 2013)
  27. Incentive & Responsibility Alignment ‣ Create awesome customer experiences ‣ Focus on responsibility & accountability, not authority ‣ You are responsible for your own uptime (Walls, Mandi. O’Reilly Media. 2013)
  28. Respect ‣ You don’t have to like each other, but you do need to recognize contributions and treat each other well (Walls, Mandi. O’Reilly Media. 2013)
  29. Trust ‣ Trust that everyone is competent and working toward the common goals ‣ Without trust, the tools don’t matter (Walls, Mandi. O’Reilly Media. 2013)
  30. Trust Others to Behave Responsibly
  31. Stop Tolerating Assholes
  32. You’re an Asshole if: ‣ After encountering you, people feel oppressed, humiliated, or otherwise worse about themselves ‣ You target people less powerful than you (Sutton, Robert. Business Plus. 2007)
  33. Assholes are incompetent - let’s shun them
  34. Effective Communication ‣ Lead with questions, not statements ‣ Understand the effort and time others have invested ‣ Avoid the passive-aggressive snark. Photo: http://www.flickr.com/photos/aloha75/4753674243/sizes/l/in/photostream/
  35. DevOps - CAMS ‣ Culture ‣ Automation ‣ Measurement ‣ Sharing
  36. Scale x Complexity > Skills
  37. Managing Complexity Then. Diagram: Web Servers, Application Servers, Database; add 1 server. 20+ Changes To Add a New Server: • 2x Web Server Configurations • 2 Web Server Restarts • 4x Database Configurations • 8x Firewall Configurations • DNS Service • Network Configuration • Deployer • 8x Monitoring Changes. The Bottom Line: • 20+ Changes • 12+ New Infrastructure Dependencies • 4+ Hours
  38. Managing Complexity Later. We added: • Load Balancers • MemCache • Search Appliances • Lots of VMs • More Scale. Exponential Increase In: • Configuration Changes • Infrastructure Dependencies • Skills Needed • Risk
  39. Managing Complexity Today: How Do We Manage This at Cloud Scale? • Thousands of infrastructure dependencies and configurations needed for each change • Huge Amounts of Time • Increased Cost of Correcting Manual Errors • Huge Need for Talent • Risk of Critical Skills Shortage
  40. The Path to the Coded Business ‣ Common Automation Tasks: Scripts, OS Compliance, Updates & Patches, Configuration Management, Discovery and Visibility, Application Management, Continuous Deployment ‣ Full Automation ‣ Automation is a People, Process, and Technology Journey
  41. Chef is Infrastructure as Code • Programmatically provision and configure • Treat like any other code base • Reconstruct business from code repository, data backup, and bare metal resources. Photo: http://www.flickr.com/photos/louisb/4555295187/
  42. Programs • Chef generates configurations directly on nodes from their run list • Reduce management complexity through abstraction • Store the configuration of your programs in version control. Photo: http://www.flickr.com/photos/ssoosay/5126146763/
  43. Declarative Interface to Resources • Define Policy • Say what, not how • Pull not Push. Photo: http://www.flickr.com/photos/bixentro/2591838509/
  44. That Looks Like This

        package "apache2"

        template "/etc/apache2/apache2.conf" do
          source "apache2.conf.erb"
          owner "root"
          group "root"
          mode "0644"
          variables(:allow_override => "All")
          notifies :reload, "service[apache2]"
        end

        service "apache2" do
          action [:enable, :start]
          supports :reload => true
        end
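The recipe above declares desired state ("what"), and Chef works out whether anything needs to change. The plain-Ruby sketch below models that idea under stated assumptions; it is not Chef's implementation, and the names `DesiredFile` and `ToyHost` are hypothetical. A declared file is changed only when it differs from the current state, and the reload notification fires only when a change actually happened, which is why running the same policy twice is safe.

```ruby
# Hypothetical model of declarative, idempotent convergence (not Chef's API).
DesiredFile = Struct.new(:path, :content, :mode)

class ToyHost
  attr_reader :files, :actions

  def initialize
    @files = {}   # path => { content:, mode: } as it currently exists
    @actions = [] # log of changes and notifications actually performed
  end

  # Bring one declared file into its desired state. When something changed
  # and a notification target is given, record a reload of that service.
  def converge(resource, notifies: nil)
    wanted = { content: resource.content, mode: resource.mode }
    return :up_to_date if @files[resource.path] == wanted

    @files[resource.path] = wanted
    @actions << "updated #{resource.path}"
    @actions << "reload #{notifies}" if notifies
    :updated
  end
end

conf = DesiredFile.new("/etc/apache2/apache2.conf", "AllowOverride All\n", "0644")
host = ToyHost.new
first  = host.converge(conf, notifies: "service[apache2]")
second = host.converge(conf, notifies: "service[apache2]")
```

The first run returns `:updated` and logs one reload; the second returns `:up_to_date` and does nothing, because the declared state already holds.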
  45. Ohai (sample output; the excerpt is cut off on the slide)

        "languages": {
          "ruby": { },
          "perl": { "version": "5.14.2", "archname": "x86_64-linux-gnu-thread-multi" },
          "python": { "version": "2.7.3", "builddate": "Aug 1 2012, 05:14:39" },
          "php": { "version": "5.3.10-1ubuntu3.6", "builddate": "(cli) (built: Mar" }
        },
        "kernel": {
          "name": "Linux",
          "release": "3.2.0-32-virtual",
          "version": "#51-Ubuntu SMP Wed Sep 26 21:53:42 UTC 2012",
          "machine": "x86_64",
          "modules": {
            "isofs": { "size": "40257", "refcount": "0" },
            "acpiphp": { "size": "24231", "refcount": "0" }
          },
          "os": "GNU/Linux"
        },
        "os": "linux",
        "os_version": "3.2.0-32-virtual",
        "ohai_time": 1369328621.3456137,
        "network": {
          "interfaces": {
            "lo": {
              "mtu": "16436",
              "flags": [ "LOOPBACK", "UP", "LOWER_UP" ],
              "encapsulation": "Loopback",
              "addresses": {
                "127.0.0.1": { "family": "inet", "prefixlen": "8", "netmask": "255.0.0.0", "scope": "Node" },
                "::1": { "family": "inet6", "prefixlen": "128", "scope": "Node" }
              },
              "state": "unknown"
            },
            "eth0": { "type": "eth", "number": "0", "mtu": "1500",
  46. Decide what to declare

        execute "load sysctl" do
          command "/sbin/sysctl -p"
          action :nothing
        end

        # one third of total memory (Ohai reports e.g. "6291456kB"), in bytes
        bytes = node['memory']['total'].split("kB")[0].to_i * 1024 / 3
        # the same amount divided by 2048, used as the shmall page count
        pages = node['memory']['total'].split("kB")[0].to_i * 1024 / 3 / 2048

        # adjust shared memory and semaphores
        template "/etc/sysctl.conf" do
          source "sysctl.conf.erb"
          variables(
            :shmmax_in_bytes => bytes,
            :shmall_in_pages => pages
          )
          notifies :run, "execute[load sysctl]", :immediately
        end
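The sysctl recipe derives two kernel settings from Ohai's `memory.total` value, which Ohai reports as a string with a `kB` suffix. The sketch below repeats that arithmetic in plain Ruby so it can be checked outside Chef; the sample JSON and the `shm_settings` helper are hypothetical. The `shmall` divisor stays at the slide's 2048 but is exposed as a parameter, since x86_64 Linux pages are commonly 4096 bytes (`getconf PAGE_SIZE` reports the real value). Note that writing the two assignments on one line as `bytes = ..., pages = ...` would make `bytes` an Array in Ruby, so they need to be separate statements.

```ruby
require "json"

# Hypothetical Ohai excerpt: 6291456 kB is 6 GiB of RAM.
ohai_json = '{"memory": {"total": "6291456kB"}}'
node = JSON.parse(ohai_json)

# Mirror the recipe: shmmax is one third of total memory in bytes, and
# shmall is that size divided by a page-size divisor (2048 on the slide).
def shm_settings(total, divisor = 2048)
  total_bytes = total.split("kB")[0].to_i * 1024
  shmmax = total_bytes / 3
  { shmmax_in_bytes: shmmax, shmall_in_pages: shmmax / divisor }
end

settings = shm_settings(node["memory"]["total"])
```

Passing `4096` as the second argument gives the page count for 4 KiB pages.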
  49. Recipes and Cookbooks • Recipes are collections of Resources • Cookbooks contain recipes, templates, files, custom resources, etc. • Code re-use and modularity. Photo: http://www.flickr.com/photos/shutterhacks/4474421855/
  50. Roles. Diagram: several servers run chef-client against the chef-server API. ‣ Node with “role[webserver]”: ntp client.rb, openssh server.rb, apache default.rb, php default.rb ‣ Node with “role[database]”: ntp client.rb, openssh server.rb, mysql server.rb
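A role is a named run list, and chef-client expands a node's run list into the recipes it will execute. The sketch below models that expansion for the two roles on the slide; `ROLES` and `expand_run_list` are hypothetical names, and real Chef roles also carry attributes and per-environment run lists that this ignores.

```ruby
# Hypothetical role definitions mirroring the slide (cookbook::recipe names).
ROLES = {
  "webserver" => ["recipe[ntp::client]", "recipe[openssh::server]",
                  "recipe[apache::default]", "recipe[php::default]"],
  "database"  => ["recipe[ntp::client]", "recipe[openssh::server]",
                  "recipe[mysql::server]"]
}.freeze

# Expand entries like "role[webserver]" (recursively, so roles may include
# roles) into an ordered list of recipes, keeping the first occurrence of each.
def expand_run_list(run_list, roles = ROLES)
  run_list.flat_map do |entry|
    case entry
    when /\Arole\[(.+)\]\z/   then expand_run_list(roles.fetch(Regexp.last_match(1)), roles)
    when /\Arecipe\[(.+)\]\z/ then [Regexp.last_match(1)]
    else raise ArgumentError, "unrecognized run list entry: #{entry}"
    end
  end.uniq
end

web = expand_run_list(["role[webserver]", "recipe[ntp::client]"])
db  = expand_run_list(["role[database]"])
```

Both roles share the ntp and openssh recipes, so the common baseline is defined once and each role adds only what its servers need.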
  51. Search • IP addresses • Hostnames • FQDNs • Search for nodes with Roles • Find configuration data. Photo: http://www.flickr.com/photos/kathycsus/2686772625
  52. Search for Nodes

        pool_members = search("node", "role:webserver")

        template "/etc/haproxy/haproxy.cfg" do
          source "haproxy-app_lb.cfg.erb"
          owner "root"
          group "root"
          mode 0644
          variables :pool_members => pool_members.uniq
          notifies :restart, "service[haproxy]"
        end
  53. Pass results into Templates

        # Set up application listeners here.
        listen application 0.0.0.0:80
          balance roundrobin
        <% @pool_members.each do |member| -%>
          server <%= member[:hostname] %> <%= member[:ipaddress] %>:> weight 1 maxconn 1 check
        <% end -%>
        <% if node["haproxy"]["enable_admin"] -%>
        listen admin 0.0.0.0:22002
          mode http
          stats uri /
        <% end -%>
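The template receives `@pool_members` from the `search` call and loops over it to emit one `server` line per web node. The sketch below renders a trimmed copy of that template with Ruby's standard `ERB` class, using hard-coded node hashes as stand-ins for search results. `MEMBER_PORT` is a hypothetical value: the port expression after `<%= member[:ipaddress] %>:` is garbled in the slide text, so this is not a reconstruction of the original.

```ruby
require "erb"

# Stand-ins for search(:node, "role:webserver") results (hypothetical hosts).
found = [
  { hostname: "web1", ipaddress: "10.0.0.11" },
  { hostname: "web2", ipaddress: "10.0.0.12" },
  { hostname: "web1", ipaddress: "10.0.0.11" } # duplicate hit, dropped by uniq
]

MEMBER_PORT = 8080 # hypothetical backend port

TEMPLATE = <<~ERB
  listen application 0.0.0.0:80
    balance roundrobin
  <% @pool_members.each do |member| -%>
    server <%= member[:hostname] %> <%= member[:ipaddress] %>:<%= MEMBER_PORT %> weight 1 maxconn 1 check
  <% end -%>
ERB

# Exposes @pool_members to the template, as Chef's `variables` does.
class HaproxyConfig
  def initialize(pool_members)
    @pool_members = pool_members
  end

  def render
    # trim_mode "-" lets "-%>" swallow the newline after control tags.
    ERB.new(TEMPLATE, trim_mode: "-").result(binding)
  end
end

config = HaproxyConfig.new(found.uniq).render
puts config
```

Adding a third web node to the search results adds a third `server` line on the next run, with no hand edits to the load balancer configuration.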
  56. munin::server example

        node.set[:munin][:server] = true
        munin_clients = search(:node, "munin_client:true")

        cookbook_file "/etc/cron.d/munin" do
          source "munin-cron"
          mode "0644"
          owner "root"
          group "root"
        end

        template "/etc/munin/munin.conf" do
          source "munin.conf.erb"
          mode 0644
          variables(:munin_clients => munin_clients)
        end
  57. Diagram: JBoss App, Memcache, Postgres Slaves, Postgres Master, Nagios, Graphite. So when this...
  58. Diagram: JBoss App, Memcache, Postgres Slaves, Postgres Master, Nagios, Graphite. ...becomes this...
  59. Diagram: JBoss App, Memcache, Postgres Slaves, Postgres Master, Nagios, Graphite. ...this can happen automatically
  60. Count the Resources. Diagram: Nagios, Graphite, JBoss App, Memcache, Postgres Slaves. • Load balancer config • Nagios host ping • Nagios host ssh • Nagios host HTTP • Nagios host app health • Graphite CPU • Graphite Memory • Graphite Disk • Graphite SNMP • Memcache firewall • Postgres firewall • Postgres authZ config • 12+ resource changes for 1 node addition
  61. Continuous Delivery ‣ Business needs to deliver a better customer experience as quickly and safely as possible. Image: http://www.thoughtworks.com/imgs/continuous-delivery.jpg
  62. Continuous Delivery ‣ Version Control System
  63. Not Version Control ‣ cp foo{,.bak} ‣ cp foo{,.`date "+%Y%m%d%H%M%S"`}
  64. Continuous Delivery ‣ Distributed Version Control System ‣ Dependency management ‣ Software Configuration ‣ Environments ‣ Continuous Integration
  65. Deployment Pipeline
  66. Commit Code. Diagram: Application Devs and Infrastructure Devs commit to Software Configuration Management (SCM).
  67. Automated Build. Diagram: Application Devs and Infrastructure Devs commit to Software Configuration Management (SCM); the Build step (Pulling, Tag) produces Payload 1, Payload 2, Payload 3, … Payload N.
  68. Deployment Pipeline. Diagram labels: Software Configuration Management (SCM); Build (Pulling, Tag); Payload 1, 2, 3, … N; Create Data (#); Upload Cookbook; Autodeploy to Local Host; Update DEV; Request Portal; Chef Server; Bootstrap & Autodeploy; Infrastructure Devs; environments DEV, QA, …, PROD; Promote; Promote.
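The pipeline slides show one tagged build payload moving from DEV toward PROD through explicit promotion steps. The sketch below models only that promotion logic: the same payload is deployed to each environment in turn and promoted onward only if that environment's checks pass. `Payload`, `run_pipeline`, and the check lambdas are hypothetical; the environment names come from the slide, with its elided intermediate stages left out.

```ruby
# Environments in promotion order (the slide also elides stages between QA and PROD).
STAGES = %w[DEV QA PROD].freeze

Payload = Struct.new(:tag, :deployed_to)

# Deploy the same payload to each stage in order; stop promoting at the
# first stage whose check fails.
def run_pipeline(payload, checks)
  STAGES.each do |env|
    payload.deployed_to << env
    return "stopped in #{env}" unless checks.fetch(env).call(payload)
  end
  "released #{payload.tag}"
end

all_pass = STAGES.to_h { |env| [env, ->(_payload) { true }] }
qa_fails = all_pass.merge("QA" => ->(_payload) { false })

good = Payload.new("build-42", [])
bad  = Payload.new("build-43", [])
good_result = run_pipeline(good, all_pass)
bad_result  = run_pipeline(bad, qa_fails)
```

Promoting the same artifact, rather than rebuilding it per environment, means what reaches PROD is exactly what passed the earlier stages.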
  69. DevOps - CAMS ‣ Culture ‣ Automation ‣ Measurement ‣ Sharing
  70. What to measure? ‣ Measure everything! ‣ Performance metrics ‣ Process metrics ‣ People metrics
  71. Availability ‣ A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR) ‣ MTTD - Mean time to Diagnose ‣ MTTR - Mean time to Repair ‣ MTTF - Mean time to Failure ‣ MTBF - Mean time between Failures
  72. Availability ‣ A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR) ‣ MTTD - Mean time to Diagnose ‣ MTTR - Mean time to Repair
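Because MTBF = MTTF + MTTD + MTTR, availability can be raised either by failing less often or by diagnosing and repairing faster. The short Ruby function below evaluates the formula; the hour values are made-up illustrations, not figures from the talk.

```ruby
# Availability A = MTTF / MTBF, where MTBF = MTTF + MTTD + MTTR.
# All arguments must use the same time unit (hours here).
def availability(mttf:, mttd:, mttr:)
  mttf.to_f / (mttf + mttd + mttr)
end

# Hypothetical service: a failure every 720 h on average,
# 1 h to diagnose and 2 h to repair.
baseline = availability(mttf: 720, mttd: 1, mttr: 2)

# Halving diagnosis and repair time, with the same failure rate.
faster_recovery = availability(mttf: 720, mttd: 0.5, mttr: 1)

puts format("baseline: %.4f%%, faster recovery: %.4f%%",
            baseline * 100, faster_recovery * 100)
```

With these numbers, halving MTTD + MTTR from 3 h to 1.5 h raises A exactly as much as doubling MTTF to 1440 h would, since 720 / 721.5 = 1440 / 1443.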
  73. Failure ‣ It’s not “if” but “when” ‣ Focus on: MTTD - Mean time to Diagnose; MTTR - Mean time to Repair ‣ MTTR > MTBF!
  74. Measure the right things! ‣ Is CPU usage important enough to measure? ‣ Should you care about an individual host / server? ‣ Are the web servers responding quickly? ‣ How many deploys were completed today? ‣ Are customers able to checkout?
  75. Measurement & Monitoring ‣ Service availability > Server availability ‣ Measure everything, alert on the important metrics ‣ Monitors and measurements are code, treat them as such
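One way to apply "measure everything, alert on the important metrics" and "monitors are code" is to keep alert rules in version control as data that can be reviewed and tested like any other code. The sketch below is a hypothetical illustration (the metric names, thresholds, and `pages_for` helper are invented): a customer-facing check can page someone, while a single-host metric is recorded but never pages on its own.

```ruby
# Hypothetical alert rules; every metric is collected, only some can page.
Rule = Struct.new(:metric, :minimum, :page, keyword_init: true)

RULES = [
  # Service availability: are customers able to check out?
  Rule.new(metric: "checkout_success_rate", minimum: 0.98, page: true),
  # Single-host metric: graphed for diagnosis, not worth waking anyone up.
  Rule.new(metric: "web01_cpu_idle_fraction", minimum: 0.10, page: false)
].freeze

# Return the metrics that are below their minimum and allowed to page.
def pages_for(samples, rules = RULES)
  rules.select { |rule| rule.page && samples.fetch(rule.metric) < rule.minimum }
       .map(&:metric)
end

busy_host       = { "checkout_success_rate" => 0.995, "web01_cpu_idle_fraction" => 0.02 }
broken_checkout = { "checkout_success_rate" => 0.90,  "web01_cpu_idle_fraction" => 0.50 }
```

A busy host with healthy checkouts pages nobody, while failing checkouts page even when every host looks idle.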
  76. Drill

  77. DevOps - CAMS ‣ Culture ‣ Automation ‣ Measurement ‣ Sharing
  78. Internally ‣ Successes ‣ Failures ‣ Metrics ‣ Ideas ‣ Code
  79. Externally ‣ Successes ‣ Failures ‣ Metrics ‣ Ideas ‣ Code
  80. Sharing ‣ Conferences ‣ Blogs, papers, articles, etc. ‣ Podcasts
  81. DevOps - CAMS ‣ Culture ‣ Automation ‣ Measurement ‣ Sharing
  82. DevOps Practices ‣ Value Chain Mapping ‣ Virtualization ‣ Configuration Management ‣ Disposable Infrastructure
  83. DevOps Practices ‣ Developers on call ‣ System Administrator Coders
  84. When is it done? ‣ Committed to version control ‣ Tests are passing ‣ Deployed to production ‣ Monitored in production ‣ Customers getting value
  85. Blameless Post Mortems ‣ Include all stakeholders ‣ Agree on timeline ‣ Identify the conditions that led to the failure ‣ Create tickets
  86. IT - Extended Family ‣ Infrastructure as a Service ‣ Platform as a Service ‣ Software as a Service
  87. DevOps ‣ Is the cultural and professional movement that grew directly from the collective experience of the pioneers of this transition ‣ Its application to traditional IT is 1:1 ‣ The business adaptations encapsulated in DevOps will eventually be ubiquitous... at least, if you want to be great at the next couple of decades of global economic growth
  88. Continuous Delivery ‣ Businesses must deliver a better customer experience as quickly and safely as possible. Safety matters! Failure to do so will have serious impacts on customer satisfaction and loyalty, just like it did when Sam Walton was the Genghis Khan of rural retail. Photo: http://www.flickr.com/photos/huffstutterrobertl/5088855119/lightbox/
  89. The map is not the territory ‣ DevOps is a response to a shift in the functional meaning of IT ‣ Continuous Delivery is a response to a shift in the pace of innovation. Photo: http://www.flickr.com/photos/huffstutterrobertl/4209372378/sizes/l/in/photostream/
  90. Build a culture of personal empowerment and accountability ‣ Focus on responsibility and accountability, rather than authority ‣ Functional teams have responsibility for design, implementation, and administration of their products and services, cradle to grave ‣ Architecture, Security, Systems Administration, and QA become universal responsibilities, with experts who set standards and build tools to enable the business to do the right thing ‣ Business leaders set priorities and direction, and have close communication loops with teams doing implementation work
  91. Companies that get this wrong… ‣ Have a strong reliance on centralized decision making and environmental gates ‣ Cannot ever point at individuals who are responsible for outcomes ‣ Have few, if any, capable “full stack” engineers ‣ Have “Architects” responsible for high-level design, but with no real commitment to implementation
  92. Treat failure as a learning opportunity ‣ No blame post-mortems ‣ “Progress on safety coincides with learning from failure. This makes punishment and learning two mutually exclusive activities: Organizations can either learn from an accident or punish the individuals involved in it, but hardly do both at the same time. ... Learning challenges and potentially changes the belief about what creates safety. Moreover, punishment emphasizes that failures are deviant, that they do not naturally belong in the organization...” (Sidney W. A. Dekker, Ten Questions About Human Error: A New View of Human Factors and System Safety)
  93. Become allergic to things that make you slow ‣ “The number 1 thing we can’t do is get in people’s way.” Phil Dibowitz, Facebook. Photo: http://www.flickr.com/photos/lighttable/4981112645/sizes/o/in/photostream/
  94. Stop arguing. Start measuring! ‣ Metrics are collected obsessively ‣ Decisions are based on data rather than emotions ‣ Measure, evaluate, tweak, and iterate based on observable outcomes. Photo: http://www.flickr.com/photos/stevenharris/4775722590/sizes/z/in/photostream/
  95. Understand the full scope of the transition ‣ Successfully navigating this transition means changing the fundamental workflows by which the business operates ‣ How much or how little depends on the shape of the company, but all disciplines are deeply impacted
  96. Do not confuse existing structures for hard business requirements ‣ Existing business structures and technology choices are reflections of the problems of their era ‣ A fundamental shift in the problem requires a reconsideration of structural and technological choices ‣ Are choices made because of solid technical reasons, or faux business requirements?
  97. Confine the blast radius, but don’t limit the magnitude of the explosion ‣ Accept that you cannot transform the entire organization at once ‣ Undertaking smaller changes organization-wide often leads to mediocrity ‣ Successful transitions happen in sections of the business
  98. Take a whole-systems view of your technology platform ‣ Example: Choice of source code control system deeply impacts the development workflows and continuous integration platform. These impact asset creation and storage, which can impact production deployment methodologies, which impact audit and remediation, etc. Photo: http://www.flickr.com/photos/usnavy/7494170678/sizes/l/in/photostream/
  99. Reinforce culture with technology, and vice versa ‣ “Tooling is culture institutionalized” - Adam Jacob
  100. Essential Reading ‣ "In ten years, I'm certain every COO worth their salt will have come from IT. Any COO who doesn't intimately understand the IT systems that actually run the business is just an empty suit, relying on someone else to do their job." (Kim, Gene; Behr, Kevin; Spafford, George (2013-01-10). The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win (Kindle Locations 5805-5807). IT Revolution Press. Kindle Edition.)
  101. Essential Reading ‣ “The Web is changing the way we live and touches every person alive. As more and more people depend on the Web, they depend on us. Web Operations is work that matters” (John Allspaw & Jesse Robbins, Web Operations: Keeping the Data on Time, O’Reilly, 2010)
  102. Essential Reading
  103. Thank You! ‣ What questions do you have? ‣ Nathen Harvey ‣ nharvey@opscode.com ‣ @nathenharvey