DevOps - Transforming the way you think about IT

IT is the business. DevOps is the cultural and professional movement that embodies the new approaches to IT.

This was presented at the Wharton Web Conference. #wwc2013

Nathen Harvey

July 30, 2013

Transcript

  1. Nathen Harvey ‣ Technical Community Manager at Opscode ‣ Co-host

    of the Food Fight Show Podcast ‣ Meetup Organizer ‣ Formerly: Web Operations ‣ @nathenharvey
  2. Who are you? ‣ Developers? ‣ Systems Administrators? ‣ DevOps?

    ‣ “Business” People? ‣ Executives?
  3. DevOps ‣ Cultural and Professional movement ‣ Development and

    Operations working together ‣ Leveraging ideas & processes from other industries ‣ To enable the business
  4. Globalization ‣ 40 years for container ships to move 70%

    of seaborne trade ‣ 22 years for internet access to reach 78% penetration in North America (WTO Trends in Globalization, http://www.wto.org/english/res_e/booksp_e/anrep_e/wtr08-2b_e.pdf, http://www.flickr.com/photos/duke_raoul/2261478794/sizes/l/in/photostream/)
  5. Globalization ‣ Online retail sales are 7% of all retail

    sales ‣ 75% of 2011 Thanksgiving shoppers did so online ‣ 42% of all retail purchases were influenced by online research – accounting for ~50% of total retail spending. (WTO Trends in Globalization, http://www.wto.org/english/res_e/booksp_e/anrep_e/wtr08-2b_e.pdf, http://www.flickr.com/photos/duke_raoul/2261478794/sizes/l/in/photostream/)
  6. 95% of the western world own cell phones ‣ 42%

    are smartphones ‣ 58% will be on next purchase ‣ 4.2 Billion phones globally for 7.09 Billion people (USCB) (http://ssiknowledgewatch.com/2012/05/09/cell-phones-approach-total-penetration-globally-with-smartphones-moving-toward-market-dominance-2/, http://www.brightsideofnews.com/news/2011/1/26/digital-divide-global-household-penetration-rates-for-technology.aspx?pageid=1)
  7. The Result: The Coded Business How: Redefinition of how to

    use technology to create business value Why: To rapidly deliver experiences, goods and services to customers What: Consumer-facing businesses Drivers of IT Innovation
  8. The Rise of the Coded Business ‣ Changes are outpacing

    skills development ‣ IT is moving from the back office to the front office ‣ Customers prefer digital consumption ‣ Technology directly supports customer interactions ‣ Accelerated pace of change ‣ Companies must move faster to compete
  9. Patterns of the Coded Business Business Agility Development Velocity and

    Consistency Continuous Delivery IT Automation IT enables Business Agility and becomes a strategic advantage rather than a cost center.
  10. DevOps is the cultural and professional movement that grew directly from

    the collective experience of the pioneers of this transition. Its application to traditional IT is 1:1. The business adaptations encapsulated in DevOps will eventually be ubiquitous ... at least, if you want to be great at the next couple decades of global economic growth.
  11. Open Communication ‣ Developers & Operations talk and listen to

    one another ‣ Production & build metrics are available to all ‣ Current infrastructure is documented (Walls, Mandi. O’Reilly Media. 2013)
  12. Incentive & Responsibility Alignment ‣ Create awesome customer experiences ‣

    Focus on responsibility & accountability, not authority ‣ You are responsible for your own uptime (Walls, Mandi. O’Reilly Media. 2013)
  13. Respect ‣ You don’t have to like each other but

    you do need to recognize contributions and treat each other well (Walls, Mandi. O’Reilly Media. 2013)
  14. Trust ‣ Trust that everyone is competent and working toward

    the common goals ‣ Without trust, the tools don’t matter (Walls, Mandi. O’Reilly Media. 2013)
  15. You’re an Asshole if: ‣ After encountering you, people

    feel oppressed, humiliated, or otherwise worse about themselves ‣ You target people less powerful than you (Sutton, Robert. Business Plus. 2007)
  16. Effective Communication ‣ Lead with questions, not statements ‣

    Understand the effort and time others have invested ‣ Avoid the passive-aggressive snark http://www.flickr.com/photos/aloha75/4753674243/sizes/l/in/photostream/
  17. Managing Complexity Then Web Servers Application Servers Database Add 1

    server 20+ Changes To Add a New Server… • 2x Web Server Configurations • 2 Web Server Restarts • 4x Database Configurations • 8x Firewall Configurations • DNS Service • Network Configuration • Deployer • 8x Monitoring Changes The Bottom Line… 20+ Changes 12+ New Infrastructure Dependencies 4+ Hours
  18. Managing Complexity Later We added: • Load Balancers • MemCache

    • Search Appliances • Lots of VMs • More Scale Exponential Increase In: • Configuration Changes • Infrastructure Dependencies • Skills Needed • Greater Risk
  19. How Do we Manage This at Cloud Scale? • Thousands

    of infrastructure dependencies and configurations needed for each change. • Huge Amounts of Time • Increased Cost of Correction of Manual Errors • Huge Need for Talent • Risk of Critical Skills Shortage Managing Complexity Today
  20. Full Automation Common Automation Tasks: Scripts, OS Compliance, Updates &

    Patches Configuration Management Discovery and Visibility Application Management Continuous Deployment Automation is a People, Process, and Technology Journey The Path to the Coded Business
  21. Chef is Infrastructure as Code •Programmatically provision and configure •Treat

    like any other code base •Reconstruct business from code repository, data backup, and bare metal resources. http://www.flickr.com/photos/louisb/4555295187/
  22. Programs •Chef generates configurations directly on nodes from their run

    list •Reduce management complexity through abstraction •Store the configuration of your programs in version control http://www.flickr.com/photos/ssoosay/5126146763/
  23. Declarative Interface to Resources •Define Policy •Say what, not how

    •Pull not Push http://www.flickr.com/photos/bixentro/2591838509/
  24. That Looks Like This

        package "apache2"

        template "/etc/apache2/apache2.conf" do
          source "apache2.conf.erb"
          owner "root"
          group "root"
          mode "0644"
          variables(:allow_override => "All")
          notifies :reload, "service[apache2]"
        end

        service "apache2" do
          action [:enable, :start]
          supports :reload => true
        end
  25. Ohai

        "languages": {
          "ruby": { },
          "perl": { "version": "5.14.2", "archname": "x86_64-linux-gnu-thread-multi" },
          "python": { "version": "2.7.3", "builddate": "Aug 1 2012, 05:14:39" },
          "php": { "version": "5.3.10-1ubuntu3.6", "builddate": "(cli) (built: Mar" }
        },
        "kernel": {
          "name": "Linux",
          "release": "3.2.0-32-virtual",
          "version": "#51-Ubuntu SMP Wed Sep 26 21:53:42 UTC 2012",
          "machine": "x86_64",
          "modules": {
            "isofs": { "size": "40257", "refcount": "0" },
            "acpiphp": { "size": "24231", "refcount": "0" }
          },
          "os": "GNU/Linux"
        },
        "os": "linux",
        "os_version": "3.2.0-32-virtual",
        "ohai_time": 1369328621.3456137,
        "network": {
          "interfaces": {
            "lo": {
              "mtu": "16436",
              "flags": [ "LOOPBACK", "UP", "LOWER_UP" ],
              "encapsulation": "Loopback",
              "addresses": {
                "127.0.0.1": { "family": "inet", "prefixlen": "8", "netmask": "255.0.0.0", "scope": "Node" },
                "::1": { "family": "inet6", "prefixlen": "128", "scope": "Node" }
              },
              "state": "unknown"
            },
            "eth0": {
              "type": "eth",
              "number": "0",
              "mtu": "1500",
  26. Decide what to declare

        execute "load sysctl" do
          command "/sbin/sysctl -p"
          action :nothing
        end

        bytes = node['memory']['total'].split("kB")[0].to_i * 1024 / 3
        pages = node['memory']['total'].split("kB")[0].to_i * 1024 / 3 / 2048

        # adjust shared memory and semaphores
        template "/etc/sysctl.conf" do
          source "sysctl.conf.erb"
          variables(
            :shmmax_in_bytes => bytes,
            :shmall_in_pages => pages
          )
          notifies :run, "execute[load sysctl]", :immediately
        end
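The sizing arithmetic in the recipe above can be checked in plain Ruby, outside of Chef. This is a minimal sketch, not part of the talk: the method name `shared_memory_settings` and the sample value `"4049104kB"` are made up for illustration, and the divisor 2048 is copied from the slide as-is rather than derived from a page size.

```ruby
# Plain-Ruby sketch of the sizing math from the recipe; no Chef required.
# Ohai reports total memory as a string such as "4049104kB".
def shared_memory_settings(memory_total)
  kilobytes = memory_total.split("kB")[0].to_i
  bytes = kilobytes * 1024 / 3   # one third of RAM, in bytes (integer division)
  pages = bytes / 2048           # the slide divides by 2048 to get a page count
  { :shmmax_in_bytes => bytes, :shmall_in_pages => pages }
end

settings = shared_memory_settings("4049104kB") # hypothetical sample value
puts settings.inspect
```

Computing both values in one place keeps the template's two variables consistent with each other, which is the point of deciding what to declare from node data instead of hard-coding numbers.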
  29. Recipes and Cookbooks •Recipes are collections of Resources •Cookbooks contain

    recipes, templates, files, custom resources, etc •Code re-use and modularity http://www.flickr.com/photos/shutterhacks/4474421855/
  30. Roles Server Server Server Server chef-server API chef-client “role[webserver]” node

    ntp client.rb openssh server.rb apache default.rb php default.rb chef-client “role[database]” node ntp client.rb openssh server.rb mysql server.rb
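The role diagram above can be read as follows: a role is a named, ordered list of recipes, and a node's run list expands to the recipes of its roles. The sketch below is plain Ruby, not the Chef API. The `ROLES` hash and the `expand_run_list` helper are hypothetical, and the recipe names are read off the slide on the assumption that each "cookbook file.rb" pair names a recipe.

```ruby
# Hypothetical role definitions, transcribed from the slide's diagram.
ROLES = {
  "webserver" => ["ntp::client", "openssh::server", "apache::default", "php::default"],
  "database"  => ["ntp::client", "openssh::server", "mysql::server"]
}

# Expand run-list entries such as "role[webserver]" into recipe names.
# Entries that are not roles are passed through unchanged.
def expand_run_list(run_list)
  run_list.flat_map do |entry|
    match = entry.match(/\Arole\[(.+)\]\z/)
    match ? ROLES.fetch(match[1]) : [entry]
  end.uniq
end

puts expand_run_list(["role[webserver]"]).inspect
puts expand_run_list(["role[database]"]).inspect
```

Both roles share the ntp and openssh recipes, which is where the reuse comes from: the common baseline is written once and pulled into every role that needs it.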
  31. Search • IP addresses • Hostnames • FQDNs •

    Search for nodes with Roles • Find configuration data http://www.flickr.com/photos/kathycsus/2686772625
  32. Search for Nodes

        pool_members = search("node", "role:webserver")

        template "/etc/haproxy/haproxy.cfg" do
          source "haproxy-app_lb.cfg.erb"
          owner "root"
          group "root"
          mode 0644
          variables :pool_members => pool_members.uniq
          notifies :restart, "service[haproxy]"
        end
  33. Pass results into Templates

        # Set up application listeners here.
        listen application 0.0.0.0:80
          balance roundrobin
          <% @pool_members.each do |member| -%>
          server <%= member[:hostname] %> <%= member[:ipaddress] %>:> weight 1 maxconn 1 check
          <% end -%>
        <% if node["haproxy"]["enable_admin"] -%>
        listen admin 0.0.0.0:22002
          mode http
          stats uri /
        <% end -%>
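To see how search results turn into configuration lines, the template loop above can be rendered with Ruby's standard `erb` library, without Chef. This is a sketch under stated assumptions: the two pool members are invented, the port `8080` is a placeholder because the port is garbled in the transcript, and the template uses a local variable `pool_members` where Chef would expose `@pool_members`. It assumes Ruby 2.6 or later for the `trim_mode:` keyword.

```ruby
require "erb"

# Stand-in for search results; in Chef these would be node objects.
pool_members = [
  { :hostname => "web1", :ipaddress => "10.0.0.11" },
  { :hostname => "web2", :ipaddress => "10.0.0.12" }
]

# Port 8080 is an assumption; the port is garbled in the slide.
template = <<~TEMPLATE
  listen application 0.0.0.0:80
    balance roundrobin
  <% pool_members.each do |member| -%>
    server <%= member[:hostname] %> <%= member[:ipaddress] %>:8080 weight 1 maxconn 1 check
  <% end -%>
TEMPLATE

# trim_mode "-" lets "-%>" swallow the newline after a control tag.
config = ERB.new(template, trim_mode: "-").result_with_hash(:pool_members => pool_members)
puts config
```

Adding a third web server means adding one element to the search results; the template and the recipe stay untouched.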
  36. munin::server example

        node.set[:munin][:server] = true
        munin_clients = search(:node, "munin_client:true")

        cookbook_file "/etc/cron.d/munin" do
          source "munin-cron"
          mode "0644"
          owner "root"
          group "root"
        end

        template "/etc/munin/munin.conf" do
          source "munin.conf.erb"
          mode 0644
          variables(:munin_clients => munin_clients)
        end
  37. Nagios Graphite Jboss App Memcache Postgres Slaves • Load balancer

    config • Nagios host ping • Nagios host ssh • Nagios host HTTP • Nagios host app health • Graphite CPU • Graphite Memory • Graphite Disk • Graphite SNMP • Memcache firewall • Postgres firewall • Postgres authZ config •12+ resource changes for 1 node addition Count the Resources
  38. Continuous Delivery ‣ Business needs to deliver a better

    customer experience as quickly and safely as possible. http://www.thoughtworks.com/imgs/continuous-delivery.jpg
  39. Continuous Delivery ‣ Distributed Version Control System ‣ Dependency management

    ‣ Software Configuration ‣ Environments ‣ Continuous Integration
  40. Automated Build (diagram labels: Application Devs, Infrastructure Devs, Software Configuration Management (SCM), Build, Pulling, Tag, Payload 1, Payload 2, Payload 3, Payload N)
  41. Deployment Pipeline (diagram labels: Software Configuration Management (SCM), Build, Pulling, Tag, Payload 1, Payload 2, Payload 3, Payload N, Create Data (#), Upload Cookbook, Autodeploy to Local Host, Update DEV, Request Portal, Chef Server, Bootstrap & Autodeploy, Infrastructure Devs, DEV, QA, PROD, Promote)
  42. Availability ‣ A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR) ‣ MTTD - Mean time to Diagnose ‣ MTTR - Mean time to Repair ‣ MTTF - Mean time to Failure ‣ MTBF - Mean time between Failures
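The availability formula above is easy to experiment with. The sketch below is an illustration only: the `availability` method name and all of the figures are hypothetical, chosen to show how shortening diagnosis and repair raises availability when the failure rate stays the same.

```ruby
# Availability as defined on the slide: A = MTTF / (MTTF + MTTD + MTTR).
# All three arguments must use the same time unit (hours here).
def availability(mttf, mttd, mttr)
  mtbf = mttf + mttd + mttr
  mttf.to_f / mtbf
end

# Hypothetical figures: one failure every 1000 hours on average.
slow = availability(1000, 2.0, 4.0)   # 2 h to diagnose, 4 h to repair
fast = availability(1000, 0.25, 0.5)  # 15 min to diagnose, 30 min to repair

puts format("slow recovery: %.4f", slow)
puts format("fast recovery: %.4f", fast)
```

MTTF is identical in both calls, so any difference in the result comes entirely from MTTD and MTTR.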
  43. Availability ‣ A = MTTF / MTBF = MTTF / (MTTF + MTTD + MTTR) ‣ MTTD - Mean time to Diagnose ‣ MTTR - Mean time to Repair
  44. Failure ‣ It’s not “if” but “when” ‣ Focus on

    ‣ MTTD - Mean time to Diagnose ‣ MTTR - Mean time to Repair ‣ MTTR > MTBF!
  45. Measure the right things! ‣ Is CPU usage important enough

    to measure? ‣ Should you care about an individual host / server? ‣ Are the web servers responding quickly? ‣ How many deploys were completed today? ‣ Are customers able to check out?
  46. Measurement & Monitoring ‣ Service availability > Server availability ‣

    Measure everything, alert on the important metrics ‣ Monitors and measurements are code, treat them as such
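One way to read "Service availability > Server availability" is that an alert should fire on the health of the pool, not on any single machine. The sketch below is a hypothetical illustration in plain Ruby: the server names, the health snapshot, and the `service_available?` helper with its `minimum_healthy` threshold are all assumptions, not part of the talk.

```ruby
# Hypothetical health snapshot: server name => did its last check pass?
checks = { "web1" => true, "web2" => false, "web3" => true }

# The service counts as available while enough servers are healthy,
# even though an individual server (web2) is down.
def service_available?(checks, minimum_healthy)
  checks.values.count(true) >= minimum_healthy
end

down = checks.reject { |_name, healthy| healthy }.keys
puts "servers down: #{down.join(', ')}"
puts "service available: #{service_available?(checks, 2)}"
```

Because the check is ordinary code, it can live in version control and be reviewed and tested like the rest of the infrastructure.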
  47. When is it done? ‣ Committed to version control ‣

    Tests are passing ‣ Deployed to production ‣ Monitored in production ‣ Customers getting value
  48. Blameless Post Mortems ‣ Include all stakeholders ‣ Agree on

    timeline ‣ Identify the conditions that led to the failure ‣ Create tickets
  49. IT - Extended Family ‣ Infrastructure as a Service ‣

    Platform as a Service ‣ Software as a Service
  50. DevOps is the cultural and professional movement that grew directly from

    the collective experience of the pioneers of this transition. Its application to traditional IT is 1:1. The business adaptations encapsulated in DevOps will eventually be ubiquitous ... at least, if you want to be great at the next couple decades of global economic growth.
  51. Businesses must deliver better customer experience as quickly and safely

    as possible. Safety matters! Failure to do so will have serious impacts on customer satisfaction and loyalty – just like it did when Sam Walton was the Genghis Khan of rural retail. http://www.flickr.com/photos/huffstutterrobertl/5088855119/lightbox/ Continuous Delivery
  52. DevOps is a response to a shift in the functional

    meaning of IT Continuous Delivery is a response to a shift in the pace of innovation The map is not the territory http://www.flickr.com/photos/huffstutterrobertl/4209372378/sizes/l/in/photostream/
  53. Focus on responsibility and accountability, rather than authority Functional teams

    have responsibility for design, implementation, and administration of their products and services – cradle to grave. Architecture, Security, Systems Administration, and QA become universal responsibilities, with experts who set standards and build tools to enable the business to do the right thing. Business leaders set priorities and direction, and have close communication loops with teams doing implementation work. Build a culture of personal empowerment and accountability
  54. Have a strong reliance on centralized decision making and environmental

    gates. Cannot ever point at individuals who are responsible for outcomes Have few, if any, capable “full stack” engineers “Architects” responsible for high level design, but no real commitment to implementation Companies that get this wrong…
  55. “Progress on safety coincides with learning from failure. This makes

    punishment and learning two mutually exclusive activities: Organizations can either learn from an accident or punish the individuals involved in it, but hardly do both at the same time. ... Learning challenges and potentially changes the belief about what creates safety. Moreover, punishment emphasizes that failures are deviant, that they do not naturally belong in the organization...” Sidney W.A. Dekker, Ten Questions about Human Error: A New View of No blame post-mortems Treat failure as a learning opportunity
  56. http://www.flickr.com/photos/lighttable/4981112645/sizes/o/in/photostream/ “The number 1 thing we can’t do is get

    in people’s way.” Phil Dibowitz, Facebook Become allergic to things that make you slow
  57. Metrics are collected obsessively Decisions are based on data rather

    than emotions Measure, evaluate, tweak, and iterate based on observable outcomes. http://www.flickr.com/photos/stevenharris/4775722590/sizes/z/in/photostream/ Stop arguing. Start measuring!
  58. Successfully navigating this transition means changing the fundamental workflows by

    which the business operates Understand the full scope of the transition How much or how little depends on the shape of the company - but all disciplines are deeply impacted
  59. Existing business structures and technology choices are reflections of the

    problems of their era A fundamental shift in the problem requires a re-consideration of structural and technological choices Are choices made because of solid technical reasons, or faux business requirements? Do not confuse existing structures for hard business requirements
  60. Accept that you cannot transform the entire organization at once

    Undertaking smaller changes organization wide often leads to mediocrity Successful transitions happen in sections of the business Confine the blast radius, but don’t limit the magnitude of the explosion
  61. Example: Choice of source code control system deeply impacts the

    development workflows and continuous integration platform. These impact asset creation and storage, which can impact production deployment methodologies, which impact audit and remediation, etc. http://www.flickr.com/photos/usnavy/7494170678/sizes/l/in/photostream/ Take a whole-systems view of your technology platform
  62. ‣ "In ten years, I'm certain every COO worth their

    salt will have come from IT. Any COO who doesn't intimately understand the IT systems that actually run the business is just an empty suit, relying on someone else to do their job." Essential Reading: Kim, Gene; Behr, Kevin; Spafford, George (2013-01-10). The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win (Kindle Locations 5805-5807). IT Revolution Press. Kindle Edition.
  63. ‣ “The Web is changing the way we live and

    touches every person alive. As more and more people depend on the Web, they depend on us. Web Operations is work that matters” Essential Reading: John Allspaw & Jesse Robbins. Web Operations: Keeping the Data on Time. O’Reilly. 2010.