DODDFW2017 - I'm Hunting Sasquatch – Finding Intermittent Issues Using Periodic Automation

Most test automation approaches with continuous integration are event-based: when a build is pushed, run the automated tests. By supplementing this approach with non-event-based automation, we increase our chances of reproducing intermittent issues, provided we are judicious in how we apply it.

In American pop culture, Sasquatch (also known as Bigfoot) is a likely nonexistent, ape-like creature infrequently sighted in the Pacific Northwest of North America. In the software realm, we have our own version of Sasquatch: that irritating “intermittent issue” occurring in the system. These kinds of issues are typically difficult to find and are often blamed on anything other than a product defect.

We typically run our automated tests on event boundaries, i.e., when we have a successful build and deployment; we look for problems when we think we may have introduced them. Logically, these points of change are when we expect to have injected new issues, so we only look for issues at those times. This approach alone, however, gives us only limited opportunities to reproduce our intermittent issues. If we also ran our automation periodically, we would have additional opportunities to reproduce these types of issues; we simply call this approach periodic automation.
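As a minimal sketch of the idea (the command, interval, and function names here are illustrative, not from the talk), periodic automation can be a small wrapper that re-runs the existing suite on a timer instead of waiting for a build event:

```python
import subprocess
import time

INTERVAL_SECONDS = 30 * 60  # illustrative: re-run every 30 minutes

def due(last_run: float, now: float, interval: float = INTERVAL_SECONDS) -> bool:
    """True when enough time has passed to warrant another periodic run."""
    return (now - last_run) >= interval

def run_periodically(command: list[str]) -> None:
    """Re-run the same automated suite on a fixed period.

    `command` is whatever already runs the event-triggered tests,
    e.g. ["pytest", "tests/"]; the suite itself does not change --
    only the trigger does.
    """
    last_run = 0.0
    while True:
        now = time.time()
        if due(last_run, now):
            subprocess.run(command)  # results still go through normal triage
            last_run = now
        time.sleep(60)  # coarse polling is fine at this granularity
```

The point is that no new tests are written; the existing scripts simply get more opportunities to catch an issue that only appears occasionally.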

Using a real-world example from his own experience, the speaker will explain how this periodic automation can help hunt down these elusive targets. For additional context, he will explain how this approach relates to High-Volume Automated Testing (HiVAT). He will also explore some considerations of which we need to be mindful when implementing periodic automation in order to avoid desensitization to failures.

Though we may never find “the real” Sasquatch, applying periodic automation increases our chances of finding our own intermittent issues.

DevOpsDays DFW

August 30, 2017

Transcript

  1. Slide 2: Who Is This Guy? » Paul Grizzaffi » Principal Automation Architect at Magenic » “Software Pediatrician” » Career focused on automation » Advisor to Software Test Professionals (STP) » paulg@magenic.com » http://www.linkedin.com/in/paulgrizzaffi » @pgrizzaffi » http://responsibleautomation.wordpress.com
  2. Slide 4: What’s a Sasquatch? » “…name given to an ape-like creature that some people believe inhabits forests, mainly in the Pacific Northwest region of North America”, per Wikipedia » An issue that never happens when you are looking for it » An issue that the developers don’t believe exists
  3. Slide 6: Integration Hell » Developers A and B write features in separate branches » Common code is modified – more divergent over time » Lots of time spent merging code » Then testing begins…Issues! › Merge issues › Functional issues › Resolve issues › Lather, rinse, repeat » Imagine 10 developers on 5 branches! » Make “Sweet Mother of Merge” release
  4. Slide 8: Salvation? » What if we merge and integrate earlier? » Break the problem into smaller chunks › Then we could test earlier › And find defects earlier › Can require less effort to resolve » “Vote Integrate early and often” » “Continuous integration”
  5. Slide 9: Mr. Peabody, what’s “continuous integration”? Oh… Well, Sherman, these good people are at a DevOps conference; they don’t need an explanation of CI and CD.
  6. Slide 10: Just To Set The Stage, Though… » As we mature and evolve our CI/CD practice, we add things › Automated deploy, provision, etc. › Automated testing › Alert on any failure » More issues found sooner, easier to address » Success!!! T-Shirts For Everyone, Right?
  7. Slide 11: Well… OK… Maybe not a COMPLETE fail… …but there are some issues that this approach has difficulty catching.
  8. Slide 13: What Am I Worried About? » Configuration changes › Network-related › Tool › System Under Test (SUT) related » Intermittent issues › Periodic maintenance activities › Quirky test environments › Race conditions
  9. Slide 14: Gee Mr. Peabody, what’s a race condition? Well Sherman, it’s a condition that occurs when events can be received in non-deterministic order. Huh?
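To make that definition concrete, here is a contrived illustration (not from the talk): a handler that silently assumes events arrive in a fixed order gives different answers depending on arrival order, which is exactly the kind of bug that only some interleavings expose.

```python
from itertools import permutations

def count_purchases(events: list[str]) -> int:
    """Buggy handler: silently assumes 'login' always arrives first."""
    logged_in = False
    purchases = 0
    for event in events:
        if event == "login":
            logged_in = True
        elif event == "purchase" and logged_in:
            purchases += 1  # purchases seen before login are silently dropped
    return purchases

# The same three events, in every arrival order the system might see:
orders = permutations(["login", "purchase", "purchase"])
results = {count_purchases(list(order)) for order in orders}
# More than one distinct result means the outcome depends on event
# ordering -- a race condition that only surfaces on some runs.
```

When the usual ordering happens to hold, the handler looks correct, which is why the bug hides from event-triggered test runs.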
  10. Slide 17: Wait A Minute… Wouldn’t you have found me during your functional or feature testing?
  11. Slide 21: Why Is This So Difficult? » Humans are flawed creatures » High number of combinations and permutations » Enumerating is tedious and error prone » Limitations of available people and resources » 3rd party interaction can be unpredictable
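A quick illustration of the combinatorial point (the dimensions and values below are hypothetical): even four small configuration axes multiply into more cases than anyone wants to enumerate by hand.

```python
from itertools import product

# Hypothetical configuration dimensions for a system under test
browsers = ["chrome", "firefox", "edge"]
operating_systems = ["windows", "macos", "linux"]
locales = ["en-US", "de-DE", "ja-JP", "fr-FR"]
network_profiles = ["fast", "slow", "flaky"]

# Every combination a tester would have to enumerate to be exhaustive
combos = list(product(browsers, operating_systems, locales, network_profiles))
# 3 * 3 * 4 * 3 = 108 cases from just four small lists
```

Add a few more dimensions (device, screen size, data shape) and the count runs into the thousands, which is part of why intermittent issues slip past human-driven enumeration.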
  12. Slide 23: In Our Sights… » Keep running the “on every event” scripts » “Periodic Automation” » Periodically rerun the scripts » Increases chance of seeing the elusive beast » Investigate every sighting » Development/Test partnership initiative
  13. Slide 25: An Actual Sighting! » Issue: return to login screen during purchase » Dev says: It’s fixed now, I can’t reproduce it » Periodic automation reproduces » It must be an automation problem » Gnashing of teeth and rending of cloth » Surprise! Unhandled race condition! » Periodic automation helped
  14. Slide 27: Beware Failure Fatigue » It’s “extra work” » Triage effort › ALL results need to be reviewed › Focus on failures, don’t ignore successes › Time is of the essence » Trust issues › “We only see this with automation” › “Did you reproduce this manually?” › “Oh, that scenario fails sometimes, just ignore it” › “Run it again, if it passes, we’re good”
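One sketch of the “don’t ignore successes” point (the window size and status strings are invented for illustration): a run that passes today but failed recently still deserves a look, because a pass after intermittent failures may mean the Sasquatch is hiding, not gone.

```python
def needs_review(history: list[str], window: int = 5) -> bool:
    """Flag a script for human review.

    `history` is a list of run outcomes, oldest first, e.g.
    ["pass", "fail", "pass"]. Review if the latest run failed, OR if it
    passed but any of the previous `window` runs failed -- an
    intermittent failure that "went away" is still a sighting.
    """
    if not history:
        return False
    latest = history[-1]
    recent = history[-(window + 1):-1]
    return latest == "fail" or "fail" in recent
```

A rule like this keeps triage effort focused without quietly training the team to look only at red results.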
  15. Slide 29: Desensitization » Are trust issues THAT bad? » They can be » Less likely to › Review all the results › Investigate or report failures › Actually find these issues
  16. Slide 31: Keep Up Your Strength » Keep noise to a minimum › Only alert “usual suspects” › Reports non-abrasive but prominent › Better error/log messages » Fix it › Fix the tool › Fix the scripts › Fix the product » Only test what will be fixed
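As a sketch of the noise-control advice (the record shape and field names are assumptions, not from the talk): route a failure only to the people who own it, and make the message say what, where, and in which environment instead of a bare “test failed”.

```python
def should_alert(result: dict, usual_suspects: set[str]) -> bool:
    """Alert only when a failure belongs to someone who can act on it.

    `result` is a hypothetical run record such as:
        {"script": "checkout_flow", "status": "fail", "owner": "team-cart"}
    Passing runs and failures owned by other teams still appear in the
    periodic report; they just don't page anyone.
    """
    return result["status"] == "fail" and result["owner"] in usual_suspects

def failure_message(result: dict) -> str:
    """A descriptive message beats 'test failed': say what, where, when."""
    return (f"{result['script']} failed at step '{result.get('step', '?')}' "
            f"on {result.get('environment', 'unknown env')}")
```

Keeping alerts scarce and informative is what protects the triage effort from the failure fatigue described above.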
  17. Slide 33: Stop Testing?!?! » Fixing issues requires effort » Testing fixes requires effort » Business may not want a fix » This can lead to… › Constant or intermittent failures › Which leads to more triage effort › Which leads to more failure fatigue
  18. Slide 35: Confession » I sort of backed into this » I’m lazy (the good kind) » Needed a CI-like environment, not ready yet » Script to run scripts every 30 minutes » Noticed it was good at finding Sasquatch » Based in High Volume Automated Testing (HiVAT)
  19. Slide 37: Outside Of Continuous Integration » Interesting facets of HiVAT › Long running › Random › Human investigation of results » “Scud” » Two implementations › GameStop – Randomly clicks web page links › MedAssets – Randomly clicks menu items » Reports on things that “don’t seem right” » Testers audit results » Does not lend itself to traditional “pass/fail” » Takes a “long time”
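A toy model of the HiVAT-style “random clicker” (the graph-as-dict representation is a stand-in for driving a real browser, and the dead-end heuristic is invented for illustration): wander the link graph at random, seed the generator so any sighting can be replayed, and collect anything that “doesn’t seem right” for a human to audit.

```python
import random

def monkey_walk(links_by_page: dict[str, list[str]], start: str,
                steps: int, seed: int = 0) -> list[str]:
    """Randomly follow links, collecting pages that look suspicious.

    `links_by_page` maps a page to the links it exposes; here a page
    with no outgoing links is treated as "doesn't seem right". The
    returned list is not pass/fail -- it's a queue for testers to audit.
    """
    rng = random.Random(seed)  # fixed seed so a sighting is reproducible
    suspicious = []
    page = start
    for _ in range(steps):
        links = links_by_page.get(page, [])
        if not links:
            suspicious.append(page)  # dead end: flag it, restart the walk
            page = start
        else:
            page = rng.choice(links)
    return suspicious
```

Because the walk is random rather than scripted, long runs visit paths no enumerated suite would, which is why this style of automation does not map onto a traditional pass/fail report.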
  20. Slide 39: Um…Where’s The DevOps Part? » Um…pretty much all of it » “Automated functional testing” is a common component » Few teams take advantage of automation “corner cases” » Development Operations Testing ?
  21. Slide 41: Takeaways » Some issues are intermittent…they just are » Keep “run on deploy”, add periodic automation » You must trust your automation (the truth is out there) » Look at results – fail and pass » Beware failure fatigue