Human Factors And DevOps

DevOpsDays Kansas City 2016
Human Factors applies knowledge of human performance to the design of technology. DevOps changes the way we create and deliver software and infrastructure through the development of a myriad of new tools. Are the tools making the best use of how people work? What will it take to make more progress?

Kevin O'Brien

October 20, 2016

Transcript

  1. Human Factors and DevOps Kevin O’Brien Ph.D. O’Brien Consulting, Inc.

    Kevin O’Brien - Human Factors Engineer at O’Brien Consulting, Inc. Extensive experience designing systems monitoring and administration tools for NASA, Pacific Bell, Sun Microsystems, Brocade, and Hewlett Packard. Hands-on experience migrating the IT operations of a web services company to a Cloud-based DevOps stack (AWS, Git, Jenkins, Puppet for PHP/MySQL applications). http://obrien-consulting.com https://www.linkedin.com/in/kevinobrienhfe
  2. Thanks DevOps Kansas City • Great Meetup group • Great

    Leadership • Aaron Blythe • Dan Barker • Conference Organizers • Sponsors Thanks to the DevOps Kansas City Meet-up group, the team that put together the DevOpsDays KC conference, and the conference sponsors.
  3. Thanks Kansas City KC - great jazz and especially great

    sax players. Left to right: - Coleman Hawkins (St Joe, MO) - Ben Webster (KC, MO) - Charlie Parker (KC, KS) - Kerry Strayer (Nebraska, but long time leader in KC jazz community)
  4. DevOps “Adaptation” DevOps is the result of a lot of

    smart people ADAPTING to the hard problem of delivering computing technology. Kohsuke Kawaguchi, on the development side, made an end-run around an operational constraint - a lack of available testbed system time at Sun - by automating the deployment of the OS, code, and code dependencies on a VM using an abandoned workstation he found in the hallway.
  5. DevOps “Boundaries” How things change… Remove Boundaries Between Development &

    Production. Suddenly, Development is in Operations’ business. Operations is in Development’s business. Better sharing, better response, improved practices, broader perspectives.
  6. DevOps “Practice”[1] A team from Brazil conducted a literature research

    project and generated this graphic describing DevOps. Note the emphasis (bottom right) on quality improvement and the central role of “Principles and Practices” as opposed to a list of tools.
  7. DevOps “System” We start the day with Dev and Ops

    as disjoint sets in the universe of service delivery and we end the day…
  8. DevOps “System” New sets intertwined and overlapping. Whose work is

    in what set and who has final responsibility?
  9. DevOps 
 “Integration” Now comes the task of putting everything

    together, or of “re-tooling” if you already had it together.
  10. DevOps “Integration” Relying on big open source services - Git,

    Vagrant, Puppet, Jenkins… that depend on supporting services - SSH, SSL, iptables, Maven, Composer…
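The dependency fan-out on this slide can be made concrete with a quick audit script. A minimal sketch in Python - the tool-to-service mapping below is my own illustration of the point, not a complete or authoritative inventory:

```python
#!/usr/bin/env python3
"""Sketch: verify that a DevOps toolchain's supporting services are present."""
import shutil

# Top-level tools named in the talk, and lower-level commands they lean on.
# This mapping is illustrative; real dependencies are broader (SSL, iptables...).
TOOLCHAIN = {
    "git": ["ssh"],        # remotes usually ride on SSH
    "vagrant": ["ssh"],    # provisions VMs over SSH
    "puppet": [],          # agent/master traffic uses SSL internally
    "jenkins": ["java"],   # Jenkins runs on the JVM
}

def command_available(name: str) -> bool:
    """True if the command is found on PATH."""
    return shutil.which(name) is not None

def audit(toolchain: dict[str, list[str]]) -> dict[str, bool]:
    """Report whether each tool AND its supporting commands are present."""
    report = {}
    for tool, supports in toolchain.items():
        report[tool] = command_available(tool) and all(
            command_available(s) for s in supports
        )
    return report

if __name__ == "__main__":
    for tool, ok in audit(TOOLCHAIN).items():
        print(f"{tool:10s} {'ok' if ok else 'MISSING'}")
```

Running this on a fresh host makes the "supporting services" point visible: a tool can be installed and still be unusable because something beneath it is absent.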
  11. DevOps 
 “Integration” AND from one perspective the situation is

    nothing new, because putting all that stuff together is what we do. BUT the new world seems like a GIFT compared to where we have been (no structure, or the structure that is in the project notebook of the coworker who just left for a six-month no-electronics retreat in Bali). In my experience, development improved dramatically because developers had better control over code and dependencies. Operations improved dramatically because needed services worked, “mystery” services went away, and I could readily adapt configuration for new applications, domains, or service loads. As a team we became much more effective at delivering service and service upgrades to customers. Which means it is time to take aim at the dark, ugly corners of code and configuration that keep you up at night. Right?
  12. DevOps “Opportunity” And the boss observes “now we can really

    go fast.” The promise of Faster, Better, Cheaper seems to be at hand. Jump to light speed… and wonder what happens next.
  13. Human Factors Engineering Meanwhile, I am still a Human Factors

    Engineer. If I understand how we make sense out of the world, I can design technology to assist, not burden. So I need to look at things from a different perspective.
  14. Human Information Processing [2] What does it mean to be

    a Human Factors Engineer? First and foremost it means taking a solid, science-based view of how people process information: Gather - Process - Respond. People gather information through the senses, process information using rules stored in memory, and respond with words and actions. We learn what people can reliably see and hear, what conceptual categories they form and what generalizations they make, how they store and retrieve information, how they make decisions, and how they perfect their responses (Food Pyramid). Now we can proceed to people executing tasks and design technology that helps.
  15. Design For Performance Effective Efficient Satisfactory (ISO 9241-210) We design

    for performance. We place the people who use the system at the center of the system. If the user can’t drive the car, fly the plane, or monitor the power grid, then the “system” isn’t much good. Good design means taking advantage of human information processing AND being explicit AND public about how the system will work and what level of performance is expected WITH a person using it. Setting performance expectations in terms of “how effective”, “how efficient”, and “how satisfactory” is both an industry standard and a good way to prime the project for the need to verify the design.
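The three ISO 9241-210 dimensions above can be turned into numbers per task. A hedged sketch - the trial fields and the 1-to-5 satisfaction scale are illustrative assumptions, not values taken from the standard:

```python
"""Sketch: quantifying effectiveness, efficiency, and satisfaction per task."""
from dataclasses import dataclass

@dataclass
class TaskTrial:
    completed: bool      # did the user finish the task?
    seconds: float       # time on task
    satisfaction: int    # post-task rating, 1 (worst) to 5 (best)

def usability_summary(trials: list[TaskTrial]) -> dict[str, float]:
    """Summarize a set of trials along the three ISO 9241-210 dimensions."""
    n = len(trials)
    done = [t for t in trials if t.completed]
    return {
        # Effective: fraction of trials where the task was completed
        "effectiveness": len(done) / n,
        # Efficient: mean time on task for successful trials
        "efficiency_s": sum(t.seconds for t in done) / len(done),
        # Satisfactory: mean rating across all trials
        "satisfaction": sum(t.satisfaction for t in trials) / n,
    }

trials = [
    TaskTrial(True, 40.0, 4),
    TaskTrial(True, 60.0, 5),
    TaskTrial(False, 120.0, 2),
    TaskTrial(True, 50.0, 4),
]
print(usability_summary(trials))
# effectiveness 0.75, mean time 50.0 s, mean satisfaction 3.75
```

Numbers like these are what make the performance expectation "explicit AND public": the team can state targets up front and verify the design against them.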
  16. Verify the Design • Tie Design to Performance • Quantify

    Usability • Opportunity To Fail Good design had better produce good performance. Think of design as a bet: the design is a bet that the development team knows how to build a tool that supports a person executing a task. To carry the betting analogy a step further, you don’t get to grab the pot and walk away just because you think you have a good hand - you have to finish the game. The same research methods used to learn how people process information are effective at testing the performance of systems with users - the methods demand the chance that the design can fail. The design team learns from the outcome.
  17. Status Check… • I see the functionality • I recognize

    the concepts • I anticipate the relationships • I know the possible outcomes Here is a list of some basic expectations we could have for the usability of an application, and the icons for some of the configuration management tools. Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate each tool on each of these objectives - True/False, a one-to-five scale, whatever. How do they stand up? My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation.
  18. Status Check… • I see the functionality • I recognize

    the concepts • I anticipate the relationships • I know the possible outcomes Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate each tool on each of these objectives - True/False, a one-to-five scale, whatever. How do they stand up? My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation. Some users see Puppet as a single-node configuration management tool sucking up existing service config files. Comments by Kohsuke K. at Jenkins World 2014: Jenkins as a big open marketplace where interested vendors bring their wares. Visitors to the market vote on quality by usage frequency. Jenkins supplies the marketplace, much like the Sun-Netscape B2B project.
  19. Design For Resilience [3] • Rebound • Robustness • Graceful

    Extensibility • Sustained Adaptability Let’s take on a further Human Factors Engineering challenge. At the recent Human Factors and Ergonomics Society annual conference I had the chance to attend a series of talks chaired by David Woods (Cognitive Systems Engineering Laboratory at Ohio State University). Dr. Woods has been at the forefront of examining failures of complex systems since Three Mile Island. Woods and his colleagues emphasize looking at the context of use - how people adapt using systems that inevitably fail to meet their needs at some point (because things will fail!). The adaptive actions of the people who gave us Jenkins and Puppet and Git and all the rest created a new BIG system and now present us with new complexity: the integration and interdependence of all these services. As the system becomes more formalized (optimized), we need to question whether the system is resilient. And the cornerstones of Operations - rebound, robustness, and extensibility - are in focus again. How will Lego land handle unexpected events? How steep or graceful is the slope we find ourselves on when things start to go south? And where on the curve are we operating at any given time? Simply put, we have a curve that represents system degradation in the face of unexpected events. Steep or graceful slope?
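The "steep or graceful slope" question can be pictured as a toy curve. A minimal sketch - performance is modeled as a function of load beyond the comfort zone, and the shape parameter is a made-up illustration of the point, not a formula from the resilience literature:

```python
"""Sketch: steep vs. graceful degradation as a toy performance curve."""

def performance(load: float, capacity: float = 1.0, steepness: float = 4.0) -> float:
    """Delivered performance for a given offered load.

    Below capacity the system keeps up; above it, performance falls off
    at a rate set by `steepness` (higher = cliff, lower = graceful slope).
    """
    if load <= capacity:
        return 1.0
    overload = load - capacity
    return max(0.0, 1.0 - steepness * overload)

# A brittle system (steep slope) vs. a more resilient one (graceful slope)
for load in (0.8, 1.0, 1.1, 1.2):
    brittle = performance(load, steepness=4.0)
    graceful = performance(load, steepness=1.0)
    print(f"load={load:.1f}  brittle={brittle:.2f}  graceful={graceful:.2f}")
```

The interesting operational questions are exactly the ones on the slide: where on this curve are we running right now, and how fast do we fall once we pass the knee?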
  20. Search For Surprises [4] • Reduce Complexity • Reveal Effects

    • Focus Attention A colleague of Woods, Emily Patterson points back to some critical thinking on work that requires coordination among co-workers. Since we can’t know the unexpected failure, how can we protect our services? By actively looking for surprises. Working to reduce complexity, we confront the irrelevant distraction built into the system AND we are surprised to find excess complexity is hiding poorly understood relationships By intentionally displaying the effects of system events, we invite broader understanding AND we are surprised to find some development or operations expert “Terrified” by seeing something they view as a red flag By finding ways to focus attention on DevOps status, we publicly state where we think the risks are - where the current state of operations is too close to a steep drop off - AND we are surprised when a new event adds to the risk list
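"Focus attention" can be as simple as publicly ranking the risk list. A minimal sketch - the risk entries and the likelihood-times-impact scoring are invented for illustration, not a scheme from Patterson's work:

```python
"""Sketch: a tiny risk register that focuses attention on the biggest exposures."""
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int   # 1 (rare) to 5 (expected)
    impact: int       # 1 (minor) to 5 (severe)

    @property
    def exposure(self) -> int:
        """Simple exposure score: likelihood times impact."""
        return self.likelihood * self.impact

def top_risks(risks: list[Risk], n: int = 3) -> list[Risk]:
    """Publicly state where we think the risks are: highest exposure first."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)[:n]

register = [
    Risk("undocumented iptables rules", 3, 4),
    Risk("single Jenkins master", 2, 5),
    Risk("stale Puppet module pins", 4, 2),
]
for r in top_risks(register):
    print(f"{r.exposure:2d}  {r.name}")
```

Posting a list like this where the whole team sees it invites exactly the surprises the slide describes: someone will object to a score, and the disagreement itself reveals a poorly understood relationship.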
  21. Status Check… • I know the comfort zone • I

    recognize signs of degradation • I see how things are going • I track known risks • I can still adapt when needed Here is a list of some basic expectations we could have for the RESILIENCE of a system.
  22. Status Check… • I know the comfort zone • I

    recognize signs of degradation • I see how things are going • I track known risks • I can still adapt when needed And the icons for some of the configuration management tools. Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate each tool on each of these objectives - True/False, a one-to-five scale, whatever. How do they stand up? My experience has been that I was always learning new signs of degradation, always learning new risks, and that I was concerned about the speed with which I could walk away from some of the new infrastructure if things went south - User Adaptation.
  23. Summary • DevOps Is Adaptive Behavior • DevOps Still Relies

    On the Expertise of Practitioners • DevOps Systems Need Better Representation of Objects, Actions, Outcomes, Status • DevOps Systems Need Resilience Checks
  24. References 1. Breno B. Nicolau de França, Helvio Jeronimo, Junior,

    and Guilherme Horta Travassos. 2016. Characterizing DevOps by Hearing Multiple Voices. In Proceedings of the 30th Brazilian Symposium on Software Engineering (SBES '16), Eduardo Santanda de Almeida (Ed.). ACM, New York, NY, USA, 53-62. DOI: http://dx.doi.org/10.1145/2973839.2973845 2. Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive control processes and human multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3-65. 3. Woods, D. D. (2015). Four concepts for resilience and the implications for the future of resilience engineering. Reliability Engineering and System Safety. http://dx.doi.org/10.1016/j.ress.2015.03.018 4. Patterson, E. S. (2007). Communication Strategies From High-reliability Organizations: Translation is Hard Work. Annals of Surgery, 245(2), 170-172. doi:10.1097/01.sla.0000253331.27897.fe
  25. HFE - What Matters • See - early and reliably

    • Generalize - probabilistic similarity judgments • Attend - capacity varies with workload • Form Concepts - learned expectation of objects and characteristics See - (David Marr) - Primal Sketch. Generalize - (Roger Shepard) - probabilistic similarity judgments. Attend - (Posner, Kahneman) - capacity to attend depends on workload. Concept Formation - (William Estes, Josh Tenenbaum) - learned expectations of objects, classes, and characteristics.
  26. Resilience - What Matters • Look for Adaptive Behavior •

    Reduce Complexity • Reveal Effects • Focus Attention
  27. First Hand Experience • High Level of Effort • Big

    Learning Curve • Incremental Return • Improved Development • Improved Deployment • Improved Delivery