Human Factors And DevOps

DevOpsDays Kansas City 2016
Human Factors applies knowledge of human performance to the design of technology. DevOps changes the way we create and deliver software and infrastructure through the development of a myriad of new tools. Are the tools making the best use of how people work? What will it take to make more progress?

Kevin O'Brien

October 20, 2016
Transcript

  1. Human Factors
    and DevOps
    Kevin O’Brien Ph.D.
    O’Brien Consulting, Inc.
    Kevin O’Brien - Human Factors Engineer at O’Brien Consulting, Inc. Extensive experience designing systems-monitoring and administration tools for NASA, Pacific Bell,
    Sun Microsystems, Brocade, and Hewlett Packard. Hands-on experience migrating the IT operations of a web services company to a cloud-based DevOps environment (AWS, Git,
    Jenkins, Puppet) for PHP/MySQL applications.

    http://obrien-consulting.com

    https://www.linkedin.com/in/kevinobrienhfe


  2. Thanks DevOps Kansas City
    • Great Meetup group
    • Great Leadership
    • Aaron Blythe
    • Dan Barker
    • Conference Organizers
    • Sponsors
    Thanks to the DevOps Kansas City Meet-up group, the team that put together the DevOpsDays KC conference, and the conference sponsors.


  3. Thanks Kansas City
    KC - great jazz and especially great sax players. Left to right:

    - Coleman Hawkins (St Joe, MO)

    - Ben Webster (KC, MO)

    - Charlie Parker (KC, KS)

    - Kerry Strayer (Nebraska, but long time leader in KC jazz community)


  4. DevOps “Adaptation”
    DevOps is the result of a lot of smart people ADAPTING to the hard problem of delivering computing technology.

    Kohsuke Kawaguchi, responsible for development, made an end-run around an operational constraint - a lack of available testbed system time at Sun - by automating the
    deployment of the OS, code, and code dependencies on a VM using an abandoned workstation he found in the hallway.


  5. DevOps
    “Boundaries”
    How things change…

    Remove Boundaries Between Development & Production.

    Suddenly, Development is in Operations’ business.

    Operations is in Development’s business.

    Better sharing, better response, improved practices, broader perspectives.


  6. DevOps “Practice”[1]
    A team from Brazil conducted a literature review and generated this graphic describing DevOps. Note the emphasis (bottom right) on quality improvement and
    the central role of “Principles and Practices” as opposed to a list of tools.


  7. DevOps “System”
    We start the day with Dev and Ops as disjoint sets in the universe of service delivery and we end the day…


  8. DevOps “System”
    New sets intertwined and overlapping. Whose work is in what set and who has final responsibility?


  9. DevOps 

    “Integration”
    Now comes the task of putting everything together - or “re-tooling” if you already had it together.


  10. DevOps “Integration”
    Relying on big open source services - Git, Vagrant, Puppet, Jenkins…

    that depend on supporting services - SSH, SSL, iptables, Maven, Composer…


  11. DevOps “Integration”
    that depend on code base variations & requirements…


  12. DevOps “Integration”
    that depend on OS variations & requirements.
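
    The layering described across these slides - tools on supporting services, on code-base variations, on OS variations - behaves like a dependency graph. A minimal Python sketch; the dependency map below is a hypothetical illustration, not an inventory from the talk:

```python
# Hypothetical dependency map: big open source services on top,
# supporting services and OS underneath.
deps = {
    "jenkins": ["ssh", "maven"],
    "puppet": ["ssl", "iptables"],
    "git": ["ssh"],
    "maven": ["os"],
    "composer": ["php"],
    "php": ["os"],
}

def transitive_deps(tool, deps):
    """Collect everything a tool transitively relies on."""
    seen = set()
    stack = [tool]
    while stack:
        for d in deps.get(stack.pop(), []):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# transitive_deps("jenkins", deps) surfaces the whole chain a single
# tool drags in - the inter-reliance the next slides worry about.
```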


  13. DevOps 

    “Integration”
    AND from one perspective the situation is nothing new, because putting all that stuff together is what we do.

    BUT the new world seems like a GIFT compared to where we have been (no structure, or the structure that lives in the project notebook of the coworker who just left for a
    six-month no-electronics retreat in Bali).

    In my experience, development improved dramatically because developers had better control over code and dependencies.

    Operations improved dramatically because needed services worked, “mystery” services went away, and I could readily adapt configuration for new applications,
    domains, or service loads

    As a team we became much more effective at delivering service and service upgrades to customers

    Which means it is time to take aim at the dark, ugly corners of code and configuration that keep you up at night. Right?


  14. DevOps “Opportunity”
    And the boss observes “now we can really go fast”

    The promise of Faster, Better, Cheaper seems to be at hand

    Jump to light speed… and wonder what happens next.


  15. Human Factors Engineering
    Meanwhile, I am still a Human Factors Engineer.

    If I understand how we make sense out of the world, I can design technology to assist, not burden.

    So I need to look at things from a different perspective.


  16. Human Information Processing [2]
    What does it mean to be a Human Factors Engineer?

    First and foremost it means taking a solid, science-based view of how people process information

    Gather - Process - Respond

    People gather information through the senses, process information using rules stored in memory, and respond with words and actions - we learn what people can reliably
    see and hear, what conceptual categories they form and what generalizations they make, how they store and retrieve information, how they make decisions and how they
    perfect their responses (Food Pyramid)

    Now we can proceed to people executing tasks and design technology that helps.


  17. Design For
    Performance
    Effective
    Efficient
    Satisfactory
    (ISO 9241-210)
    We design for performance. We focus on the people who use the system as the center of the system. If the user can’t drive the car, fly the plane, monitor the power grid
    then the “system” isn’t much good

    Good design means taking advantage of human information processing AND also being explicit AND public about how the system will work and what level of
    performance is expected WITH a person using it

    Setting performance expectations in terms of “how effective”, “how efficient”, and “how satisfactory” is both an industry standard and a good way to prime the project for
    the need to verify the design.
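
    The three ISO 9241-210 dimensions can be made concrete as simple metrics over usability-test sessions. A minimal Python sketch; the session fields and format are assumptions for illustration:

```python
from statistics import mean

def usability_summary(sessions):
    """Summarize test sessions on the three ISO 9241-210 dimensions.

    sessions: list of dicts with 'completed' (bool), 'seconds' (float
    time on task), and 'satisfaction' (a 1-5 rating). These field
    names are assumptions, not from the talk.
    """
    return {
        # Effectiveness: share of users who completed the task
        "effectiveness": mean(1.0 if s["completed"] else 0.0 for s in sessions),
        # Efficiency: mean time on task for successful attempts
        "efficiency_s": mean(s["seconds"] for s in sessions if s["completed"]),
        # Satisfaction: mean subjective rating
        "satisfaction": mean(s["satisfaction"] for s in sessions),
    }
```

    Stating targets for each number up front is what makes the design verifiable later.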


  18. Verify the Design
    • Tie Design to Performance
    • Quantify Usability
    • Opportunity To Fail
    Good design had better produce good performance. Think of design as a bet.

    The design is a bet that the development team knows how to build a tool that supports a person executing a task.

    To carry the betting analogy a step further, you don’t get to grab the pot and walk away just because you think you have a good hand - you have to finish the game

    The same research methods used to learn how people process information are effective at testing the performance of systems with users - the methods demand the
    chance that the design can fail. The design team learns from the outcome
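
    One way to give the design a genuine opportunity to fail is to test an explicit completion-rate target. A hedged sketch; the target and trial counts are hypothetical:

```python
from math import comb

def p_at_most(successes, trials, p_target):
    """Probability of seeing <= `successes` completions in `trials`
    attempts if the true completion rate really were `p_target`
    (binomial tail). A small value is evidence the design misses
    its target - the bet is lost, and the team learns from it."""
    return sum(
        comb(trials, k) * p_target**k * (1 - p_target)**(trials - k)
        for k in range(successes + 1)
    )

# Hypothetical: 12 of 20 users completed the task; the stated target
# was a 90% completion rate. The tiny tail probability says the
# design almost certainly fails the bet.
evidence = p_at_most(12, 20, 0.90)
```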


  19. Status Check…
    • I see the functionality
    • I recognize the concepts
    • I anticipate the relationships
    • I know the possible outcomes
    Here is a list of some basic expectations we could have for the usability of an application, alongside the icons for some of the configuration management tools. Hopefully you
    have had experience with at least one of these. I’d like you to think about how you would rate the tool on each of these objectives - True/False, a one-to-five scale, whatever. How
    do they stand up?

    My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation


  20. Status Check…
    • I see the functionality
    • I recognize the concepts
    • I anticipate the relationships
    • I know the possible outcomes
    Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate the tool on each of these objectives - True/False, a one-to-five
    scale, whatever. How do they stand up?

    My experience has been that while I could quickly see the promise, a clear picture came only with hard work - User Adaptation

    Some users see Puppet as a single node configuration management tool sucking up existing service config files.

    Comments by Kohsuke Kawaguchi at Jenkins World 2014 - Jenkins as a big open marketplace where interested vendors bring their wares. Visitors to the market vote on quality by
    usage frequency. Jenkins supplies the marketplace, much like the Sun-Netscape B2B project.


  21. Design For Resilience [3]
    • Rebound
    • Robustness
    • Graceful Extensibility
    • Sustained Adaptability
    Let’s take up a further Human Factors Engineering challenge. At the recent Human Factors and Ergonomics Society annual conference I had the chance to attend a series of talks
    chaired by David Woods (Cognitive Systems Engineering Laboratory at Ohio State University). Dr. Woods has been at the forefront of examining failures of complex
    systems since Three Mile Island. Woods and his colleagues emphasize looking at the context of use - how people adapt using systems that inevitably fail to meet their
    needs at some point (because things will fail!).

    The adaptive actions of the people who gave us Jenkins and Puppet and Git and all the rest created a new BIG system, and now present us with new complexity: the
    integration and inter-reliance of all these services. As the system becomes more formalized (optimized), we need to question whether the system is resilient. The
    cornerstones of Operations - rebound, robustness, and extensibility - are in focus again.

    How will Lego-land handle unexpected events? How steep or graceful is the slope we find ourselves on when things start to go south? And where on the curve are we
    operating at any given time? Simply put, we have a curve that represents system degradation in the face of unexpected events. Steep or graceful slope?
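
    The degradation-curve idea can be sketched numerically. A minimal Python sketch; the logistic shape and the parameters are my own illustrative assumptions, not from Woods:

```python
import math

def capacity(load, brittleness):
    """Fraction of capacity remaining at a given load.

    Load is normalized so 1.0 marks the edge of the comfort zone.
    Higher `brittleness` means a steeper collapse past that edge;
    lower values degrade gracefully. Both the logistic form and the
    parameter names are assumptions for illustration.
    """
    return 1.0 / (1.0 + math.exp(brittleness * (load - 1.0)))

# A graceful system (brittleness=2) still retains capacity at
# load=1.5; a brittle one (brittleness=10) has fallen off a cliff.
graceful = capacity(1.5, 2.0)
brittle = capacity(1.5, 10.0)
```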


  22. Search For Surprises [4]
    • Reduce Complexity
    • Reveal Effects
    • Focus Attention
    Emily Patterson, a colleague of Woods, points back to some critical thinking on work that requires coordination among co-workers. Since we can’t know what the unexpected
    failure will be, how can we protect our services? By actively looking for surprises.

    Working to reduce complexity, we confront the irrelevant distraction built into the system AND we are surprised to find excess complexity is hiding poorly understood
    relationships

    By intentionally displaying the effects of system events, we invite broader understanding AND we are surprised to find some development or operations expert “Terrified”
    by seeing something they view as a red flag

    By finding ways to focus attention on DevOps status, we publicly state where we think the risks are - where the current state of operations is too close to a steep drop off
    - AND we are surprised when a new event adds to the risk list


  23. Status Check…
    • I know the comfort zone
    • I recognize signs of
    degradation
    • I see how things are going
    • I track known risks
    • I can still adapt when needed
    Here is a list of some basic expectations we could have for the RESILIENCE of a system.


  24. Status Check…
    • I know the comfort zone
    • I recognize signs of
    degradation
    • I see how things are going
    • I track known risks
    • I can still adapt when needed
    Here are the icons for some of the configuration management tools. Hopefully you have had experience with at least one of these. I’d like you to think about how you would rate the
    tool on each of these objectives - True/False, a one-to-five scale, whatever. How do they stand up?

    My experience has been that I was always learning new signs of degradation, always learning new risks, and that I was concerned about the speed with which I could
    walk away from some of the new infrastructure if things went south - User Adaptation


  25. Summary
    • DevOps Is Adaptive Behavior
    • DevOps Still Relies On the
    Expertise of Practitioners
    • DevOps Systems Need Better
    Representation of Objects,
    Actions, Outcomes, Status
    • DevOps Systems Need
    Resilience Checks


  26. References
    1. Breno B. Nicolau de França, Helvio Jeronimo Junior, and Guilherme Horta Travassos. 2016. Characterizing DevOps by Hearing Multiple Voices. In Proceedings of the 30th Brazilian Symposium on Software Engineering (SBES '16), Eduardo Santana de Almeida (Ed.). ACM, New York, NY, USA, 53-62. DOI: http://dx.doi.org/10.1145/2973839.2973845
    2. Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive control processes and human multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3-65.
    3. Woods, D. D. (2015). Four concepts for resilience and the implications for the future of resilience engineering. Reliability Engineering and System Safety. http://dx.doi.org/10.1016/j.ress.2015.03.018
    4. Patterson, E. S. (2007). Communication strategies from high-reliability organizations: Translation is hard work. Annals of Surgery, 245(2), 170-172. doi:10.1097/01.sla.0000253331.27897.fe

  27. HFE - What Matters
    • See - early and reliably
    • Generalize - probabilistic
    similarity judgments
    • Attend - capacity varies with
    workload
    • Form Concepts - learned
    expectation of object and
    characteristics
    See - (David Marr) - Primal Sketch

    Generalize - (Roger Shepard) - probabilistic similarity judgments

    Attend - (Posner, Kahneman) - capacity to attend varies with workload

    Concept Formation - (William Estes, Josh Tenenbaum) - learned expectations of objects, classes, and characteristics


  28. Resilience - What Matters
    • Look for Adaptive Behavior
    • Reduce Complexity
    • Reveal Effects
    • Focus Attention
    Reduce Complexity

    Reveal Effects

    Focus Attention


  29. First Hand Experience
    • High Level of Effort
    • Big Learning Curve
    • Incremental Return
    • Improved Development
    • Improved Deployment
    • Improved Delivery
