Human Factors And DevOps

DevOpsDays Kansas City 2016
Human Factors applies knowledge of human performance to the design of technology. DevOps changes the way we create and deliver software and infrastructure through the development of a myriad of new tools. Are the tools making the best use of how people work? What will it take to make more progress?

Kevin O'Brien

October 20, 2016
Transcript

  1. Human Factors
    and DevOps
    Kevin O’Brien Ph.D.
    O’Brien Consulting, Inc.
    Kevin O’Brien - Human Factors Engineer at O’Brien Consulting, Inc. Extensive experience designing systems-monitoring and administration tools for NASA, Pacific Bell,
    Sun Microsystems, Brocade, and Hewlett Packard. Hands-on experience migrating the IT operations of a web services company to a cloud-based DevOps environment (AWS, Git,
    Jenkins, Puppet) for PHP/MySQL applications.

    http://obrien-consulting.com

    https://www.linkedin.com/in/kevinobrienhfe


  2. Thanks DevOps Kansas City
    • Great Meetup group
    • Great Leadership
    • Aaron Blythe
    • Dan Barker
    • Conference Organizers
    • Sponsors
    Thanks to the DevOps Kansas City Meet-up group, the team that put together the DevOpsDays KC conference, and the conference sponsors.


  3. Thanks Kansas City
    KC - great jazz and especially great sax players. Left to right:

    - Coleman Hawkins (St Joe, MO)

    - Ben Webster (KC, MO)

    - Charlie Parker (KC, KS)

    - Kerry Strayer (Nebraska, but long time leader in KC jazz community)


  4. DevOps “Adaptation”
    DevOps is the result of a lot of smart people ADAPTING to the hard problem of delivering computing technology.

    Kohsuke Kawaguchi, responsible for development, made an end-run around an operational constraint - a lack of available testbed system time at Sun - by automating the
    deployment of the OS, code, and code dependencies on a VM using an abandoned workstation he found in the hallway.


  5. DevOps
    “Boundaries”
    How things change…

    Remove Boundaries Between Development & Production.

    Suddenly, Development is in Operations’ business.

    Operations is in Development’s business.

    Better sharing, better response, improved practices, broader perspectives.


  6. DevOps “Practice”[1]
    A team from Brazil conducted a literature review and generated this graphic describing DevOps. Note the emphasis (bottom right) on quality improvement and
    the central role of “Principles and Practices” as opposed to a list of tools.


  7. DevOps “System”
    We start the day with Dev and Ops as disjoint sets in the universe of service delivery and we end the day…


  8. DevOps “System”
    New sets intertwined and overlapping. Whose work is in what set and who has final responsibility?


  9. DevOps 

    “Integration”
    Now comes the task of putting everything together - or “re-tooling” if you already had it together.


  10. DevOps “Integration”
    Relying on big open source services - Git, Vagrant, Puppet, Jenkins…

    that depend on supporting services - SSH, SSL, iptables, Maven, Composer…


  11. DevOps “Integration”
    that depend on code base variations & requirements…


  12. DevOps “Integration”
    that depend on OS variations & requirements.
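
    The layering described across these slides - tools on supporting services, on code-base variations, on OS variations - behaves like a dependency graph. A minimal Python sketch; the dependency map below is a hypothetical illustration, not an inventory from the talk:

```python
# Hypothetical dependency map: big open source services on top,
# supporting services and OS underneath.
deps = {
    "jenkins": ["ssh", "maven"],
    "puppet": ["ssl", "iptables"],
    "git": ["ssh"],
    "maven": ["os"],
    "composer": ["php"],
    "php": ["os"],
}

def transitive_deps(tool, deps):
    """Collect everything a tool transitively relies on."""
    seen = set()
    stack = [tool]
    while stack:
        for d in deps.get(stack.pop(), []):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# transitive_deps("jenkins", deps) surfaces the whole chain a single
# tool drags in - the inter-reliance the next slides worry about.
```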


  13. DevOps 

    “Integration”
    AND from one perspective the situation is nothing new, because putting all that stuff together is what we do.

    BUT the new world seems like a GIFT compared to where we have been (no structure, or the structure that lives in the project notebook of the coworker who just left for a
    six-month no-electronics retreat in Bali).

    In my experience, development improved dramatically because developers had better control over code and dependencies.

    Operations improved dramatically because needed services worked, “mystery” services went away, and I could readily adapt configuration for new applications,
    domains, or service loads

    As a team we became much more effective at delivering service and service upgrades to customers

    Which means it is time to take aim at the dark, ugly corners of code and configuration that keep you up at night. Right?


  14. DevOps “Opportunity”
    And the boss observes “now we can really go fast”

    The promise of Faster, Better, Cheaper seems to be at hand

    Jump to light speed… and wonder what happens next.


  15. Human Factors Engineering
    Meanwhile, I am still a Human Factors Engineer.

    If I understand how we make sense out of the world, I can design technology to assist, not burden.

    So I need to look at things from a different perspective.


  16. Human Information Processing [2]
    What does it mean to be a Human Factors Engineer?

    First and foremost it means taking a solid, science-based view of how people process information

    Gather - Process - Respond

    People gather information through the senses, process information using rules stored in memory, and respond with words and actions - we learn what people can reliably
    see and hear, what conceptual categories they form and what generalizations they make, how they store and retrieve information, how they make decisions and how they
    perfect their responses (Food Pyramid)

    Now we can proceed to people executing tasks and design technology that helps.


  17. Design For
    Performance
    Effective
    Efficient
    Satisfactory
    (ISO 9241-210)
    We design for performance. We focus on the people who use the system as the center of the system. If the user can’t drive the car, fly the plane, monitor the power grid
    then the “system” isn’t much good

    Good design means taking advantage of human information processing AND also being explicit AND public about how the system will work and what level of
    performance is expected WITH a person using it

    Setting performance expectations in terms of “how effective”, “how efficient”, and “how satisfactory” is both an industry standard and a good way to prime the project for
    the need to verify the design.
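
    The three ISO 9241-210 dimensions can be made concrete as simple metrics over usability-test sessions. A minimal Python sketch; the session fields and format are assumptions for illustration:

```python
from statistics import mean

def usability_summary(sessions):
    """Summarize test sessions on the three ISO 9241-210 dimensions.

    sessions: list of dicts with 'completed' (bool), 'seconds' (float
    time on task), and 'satisfaction' (a 1-5 rating). These field
    names are assumptions, not from the talk.
    """
    return {
        # Effectiveness: share of users who completed the task
        "effectiveness": mean(1.0 if s["completed"] else 0.0 for s in sessions),
        # Efficiency: mean time on task for successful attempts
        "efficiency_s": mean(s["seconds"] for s in sessions if s["completed"]),
        # Satisfaction: mean subjective rating
        "satisfaction": mean(s["satisfaction"] for s in sessions),
    }
```

    Stating targets for each number up front is what makes the design verifiable later.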


  18. Verify the Design
    • Tie Design to Performance
    • Quantify Usability
    • Opportunity To Fail
    Good design had better produce good performance. Think of design as a bet.

    The design is a bet that the development team knows how to build a tool that supports a person executing a task.

    To carry the betting analogy a step further, you don’t get to grab the pot and walk away just because you think you have a good hand - you have to finish the game

    The same research methods used to learn how people process information are effective at testing the performance of systems with users - the methods demand the
    chance that the design can fail. The design team learns from the outcome
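
    One way to give the design a genuine opportunity to fail is to test an explicit completion-rate target. A hedged sketch; the target and trial counts are hypothetical:

```python
from math import comb

def p_at_most(successes, trials, p_target):
    """Probability of seeing <= `successes` completions in `trials`
    attempts if the true completion rate really were `p_target`
    (binomial tail). A small value is evidence the design misses
    its target - the bet is lost, and the team learns from it."""
    return sum(
        comb(trials, k) * p_target**k * (1 - p_target)**(trials - k)
        for k in range(successes + 1)
    )

# Hypothetical: 12 of 20 users completed the task; the stated target
# was a 90% completion rate. The tiny tail probability says the
# design almost certainly fails the bet.
evidence = p_at_most(12, 20, 0.90)
```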


  19. Status Check…
    • I see the functionality
    • I recognize the concepts
    • I anticipate the relationships
    • I know the possible outcomes
    Here is a list of some basic expectations we could have for the usability of an application, alongside the icons for some of the configuration management tools. Hopefully you
    have had experience with at least one of these. I’d like you to think about how you would rate the tool on each of these objectives - True/False, a one-to-five scale, whatever. How
    do they stand up?

    My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation


  20. Status Check…
    • I see the functionality
    • I recognize the concepts
    • I anticipate the relationships
    • I know the possible outcomes
    Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate the tool on each of these objectives - True/False, a one-to-five
    scale, whatever. How do they stand up?

    My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation

    Some users see Puppet as a single node configuration management tool sucking up existing service config files.

    Comments by Kohsuke Kawaguchi at Jenkins World 2014 - Jenkins as a big open marketplace where interested vendors bring their wares. Visitors to the market vote on quality by
    usage frequency. Jenkins supplies the marketplace, much like the Sun-Netscape B2B project.


  21. Design For Resilience [3]
    • Rebound
    • Robustness
    • Graceful Extensibility
    • Sustained Adaptability
    Let’s take up a further Human Factors Engineering challenge. At the recent Human Factors and Ergonomics Society annual conference I had the chance to attend a series of talks
    chaired by David Woods (Cognitive Systems Engineering Laboratory at Ohio State University). Dr. Woods has been at the forefront of examining failures of complex
    systems since Three Mile Island. Woods and his colleagues emphasize looking at the context of use - how people adapt using systems that inevitably fail to meet their
    needs at some point (because things will fail!).

    The adaptive actions of the people who gave us Jenkins and Puppet and Git and all the rest created a new BIG system, and now present us with new complexity: the
    integration and inter-reliance of all these services. As the system becomes more formalized (optimized), we need to question whether the system is resilient. The
    cornerstones of Operations - rebound, robustness, and extensibility - are in focus again.

    How will Lego-land handle unexpected events? How steep or graceful is the slope we find ourselves on when things start to go south? And where on the curve are we
    operating at any given time? Simply put, we have a curve that represents system degradation in the face of unexpected events. Steep or graceful slope?
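
    The degradation-curve idea can be sketched numerically. A minimal Python sketch; the logistic shape and the parameters are my own illustrative assumptions, not from Woods:

```python
import math

def capacity(load, brittleness):
    """Fraction of capacity remaining at a given load.

    Load is normalized so 1.0 marks the edge of the comfort zone.
    Higher `brittleness` means a steeper collapse past that edge;
    lower values degrade gracefully. Both the logistic form and the
    parameter names are assumptions for illustration.
    """
    return 1.0 / (1.0 + math.exp(brittleness * (load - 1.0)))

# A graceful system (brittleness=2) still retains capacity at
# load=1.5; a brittle one (brittleness=10) has fallen off a cliff.
graceful = capacity(1.5, 2.0)
brittle = capacity(1.5, 10.0)
```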


  22. Search For Surprises [4]
    • Reduce Complexity
    • Reveal Effects
    • Focus Attention
    Emily Patterson, a colleague of Woods, points back to some critical thinking on work that requires coordination among co-workers. Since we can’t know what the unexpected
    failure will be, how can we protect our services? By actively looking for surprises.

    Working to reduce complexity, we confront the irrelevant distraction built into the system AND we are surprised to find excess complexity is hiding poorly understood
    relationships

    By intentionally displaying the effects of system events, we invite broader understanding AND we are surprised to find some development or operations expert “Terrified”
    by seeing something they view as a red flag

    By finding ways to focus attention on DevOps status, we publicly state where we think the risks are - where the current state of operations is too close to a steep drop off
    - AND we are surprised when a new event adds to the risk list


  23. Status Check…
    • I know the comfort zone
    • I recognize signs of
    degradation
    • I see how things are going
    • I track known risks
    • I can still adapt when needed
    Here is a list of some basic expectations we could have for the RESILIENCE of a system.


  24. Status Check…
    • I know the comfort zone
    • I recognize signs of
    degradation
    • I see how things are going
    • I track known risks
    • I can still adapt when needed
    Here are the icons for some of the configuration management tools. Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate the
    tool on each of these objectives - True/False, a one-to-five scale, whatever. How do they stand up?

    My experience has been that I was always learning new signs of degradation, always learning new risks, and that I was concerned about the speed with which I could
    walk away from some of the new infrastructure if things went south - User Adaptation


  25. Summary
    • DevOps Is Adaptive Behavior
    • DevOps Still Relies On the
    Expertise of Practitioners
    • DevOps Systems Need Better
    Representation of Objects,
    Actions, Outcomes, Status
    • DevOps Systems Need
    Resilience Checks


  26. References
    1. Breno B. Nicolau de França, Helvio Jeronimo Junior, and Guilherme Horta Travassos. 2016. Characterizing DevOps by Hearing Multiple Voices. In Proceedings of the 30th Brazilian Symposium on Software Engineering (SBES '16), Eduardo Santana de Almeida (Ed.). ACM, New York, NY, USA, 53-62. DOI: http://dx.doi.org/10.1145/2973839.2973845
    2. Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive control processes and human multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3-65.
    3. Woods, D. D. (2015). Four concepts for resilience and the implications for the future of resilience engineering. Reliability Engineering and System Safety. http://dx.doi.org/10.1016/j.ress.2015.03.018
    4. Patterson, E. S. (2007). Communication strategies from high-reliability organizations: Translation is hard work. Annals of Surgery, 245(2), 170-172. doi:10.1097/01.sla.0000253331.27897.fe

  27. HFE - What Matters
    • See - early and reliably
    • Generalize - probabilistic
    similarity judgments
    • Attend - capacity varies with
    workload
    • Form Concepts - learned
    expectation of object and
    characteristics
    See - (David Marr) - Primal Sketch

    Generalize - (Roger Shepard) - probabilistic similarity judgments

    Attend - (Posner, Kahneman) - capacity to attend varies with workload

    Concept Formation - (William Estes, Josh Tenenbaum) - learned expectations of objects, classes, and characteristics


  28. Resilience - What Matters
    • Look for Adaptive Behavior
    • Reduce Complexity
    • Reveal Effects
    • Focus Attention
    Reduce Complexity

    Reveal Effects

    Focus Attention


  29. First Hand Experience
    • High Level of Effort
    • Big Learning Curve
    • Incremental Return
    • Improved Development
    • Improved Deployment
    • Improved Delivery
