Testing Distributed Systems with Fuzzy Monkey Testing

One of the keys to good software is good testing. There are well-known test frameworks for back-end code, such as JUnit and py.test, and good front-end testing tools, such as Selenium. For testing distributed systems there are far fewer well-known tools, because the problem is quite different, and harder. These slides cover the “Fuzzy Monkey” methodology used for testing three different successful distributed systems (including the Assimilation Suite): its history, and how and why it works.

http://bit.ly/FuzzyMonkey

Alan Robertson

October 03, 2016

Transcript

  1. Resilience-Testing Distributed Systems with “Fuzzy Monkey” testing

    #AssimProj @OSSAlanR
    Alan Robertson <[email protected]>
    Assimilation Systems Limited
    owasp.org/index.php/OWASP_Assimilation_Project
    http://AssimilationSystems.com
  2. Biography

    • 35+ years in IT/development
      – 10 years in system management (SysAdmin)
    • Founded Linux-HA project - led 1998-2007
      – aka “Heartbeat” - now called Pacemaker
    • Founded Assimilation Project in 2010
    • Founded Assimilation Systems Limited in 2013
    • Alumnus of Bell Labs, SuSE, IBM
  3. What Is Fuzzy Monkey Testing?

    • A method of testing distributed systems
    • Specializes in resilience testing
      – testing for robustness in the presence of failures
  4. Fuzzy Monkey History

    • First conceived in Fall of 2001
    • Initially implemented by CTS as part of the Linux-HA project
      – CTS == Cluster Testing System
    • Continued into the Pacemaker Project
    • CTS adopted by Corosync
    • Fuzzy Monkey method re-implemented for Assimilation Project in 2014
    • Came up with the Fuzzy Monkey name in May 2016
  5. Why create a unique testing method?

    • Testing distributed systems is hard
    • Manual testing is rarely successful
    • No good tools out there
    • Eliminate Embarrassment:
      – I was tired of having egg on my face when I put out a release of Linux-HA with bugs that I should have caught
      – I hated doing manual testing – and I was bad at it
  6. Why is Automated Testing Important?

    • Automated testing speeds up product releases
    • Automated testing available to developers decreases end-user-visible bugs
    • Continuous Integration needs automated testing
    • Continuous Deployment requires automated testing

    Modern software development cannot rely on manual testing
  7. How do normal automated tests work?

    • Fixed list of tests which do fixed things
    • Tests are typically synchronous
    • Each test tests one thing – often just calls a function and looks at the result
    • Each test expects one correct answer
    • When tests complete, they leave things like when they started
    • It’s easy to tell when a test is complete – when the function returns
    • Tests are not subject to timing problems
    • Tests are complete when the last test completes
  8. Why is Automated Distributed System Testing Hard?

    • Tests are always asynchronous
    • If you run the same test twice, you might get two different correct answers
    • The results of the test depend on the current distributed configuration
    • Events happen when they do, and are observed at a random time later
    • You are specifically trying to create timing issues
    • You need randomness in the tests to make it more likely you hit all the timing windows
    • It’s hard to tell when a test is “done”
    • It’s hard to know when you’ve tested enough
  9. What does this mean for Distributed testing?

    • Tests need to be run multiple times
    • Tests need to be selected at random
    • Tests need to include randomness in them
    • Some tests will deliberately make configuration changes
    • Tests need to expect all the correct possible outcomes
    • Tests need to allow for the fact that things might not happen in a particular order
    • Often need white box testing to aim at timing windows
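The first three points on this slide (repeat, select at random, randomize inside the test) can be sketched as a small driver loop. This is a minimal illustration, not the CTS or Assimilation code: `run_random_tests`, `NODES`, and the two placeholder tests are hypothetical names, and the seeded generator is an assumption added here so that a failing sequence can be replayed.

```python
import random

# Hypothetical node names; a real run would discover these from the cluster.
NODES = ["node1", "node2", "node3"]


def restart_service(rng):
    """Placeholder test: pick one random node to act on."""
    victim = rng.choice(NODES)
    return victim in NODES  # a real test would exercise the node and check logs


def partition_network(rng):
    """Placeholder test: pick a random, non-empty, proper subset of nodes."""
    victims = rng.sample(NODES, k=rng.randint(1, len(NODES) - 1))
    return 0 < len(victims) < len(NODES)


TESTS = {
    "restart_service": restart_service,
    "partition_network": partition_network,
}


def run_random_tests(tests, iterations, seed):
    """Run `iterations` randomly chosen tests and return (name, passed) pairs."""
    rng = random.Random(seed)  # seeded so the same sequence can be re-run
    results = []
    for _ in range(iterations):
        name = rng.choice(sorted(tests))  # random test selection
        passed = tests[name](rng)         # the test draws its own randomness
        results.append((name, passed))
    return results


if __name__ == "__main__":
    for name, passed in run_random_tests(TESTS, iterations=10, seed=2016):
        print(name, "PASS" if passed else "FAIL")
```

Passing the same generator into each test keeps the whole run reproducible from a single seed, which helps when a failure only shows up after a particular sequence of tests.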
  10. How does “Fuzzy Monkey” testing work?

    • Direct all syslogs to a central machine (using TCP)
    • Key results come through syslog to the central machine
    • Tests exercise the systems over ssh or similar
    • Each test typically randomly picks one or more systems to use in the test
    • Each test expects certain syslog messages indicating success, allowing for possible alternatives and typically ignoring ordering
    • “Oh, darn!” messages cause tests to be marked as failed
    • Each test must wait for systems to stabilize before marking it complete
    • Audits of system state are performed after each test
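The log-matching idea on this slide (expected messages with alternatives, order ignored, “Oh, darn!” messages flagged) can be sketched as follows. This is an assumption-laden illustration and not the LogWatcher class from CTS: `check_log`, `BAD_PATTERNS`, and the sample log lines are all made up for the example.

```python
import re

# Hypothetical "Oh, darn!" markers; a real application defines its own.
BAD_PATTERNS = [re.compile(r"\b(ERROR|CRIT|ASSERT)\b")]


def check_log(lines, expected, bad_patterns=BAD_PATTERNS):
    """Scan log lines and return (missing, bad_lines).

    `expected` is a list of groups. Each group is a list of alternative
    regexes, and the group is satisfied when any one of them matches any
    line. Lines may arrive in any order.
    """
    pending = [[re.compile(p) for p in group] for group in expected]
    bad_lines = []
    for line in lines:
        if any(p.search(line) for p in bad_patterns):
            bad_lines.append(line)
        # Drop every group that this line satisfies.
        pending = [g for g in pending if not any(p.search(line) for p in g)]
    missing = [[p.pattern for p in g] for g in pending]
    return missing, bad_lines


if __name__ == "__main__":
    log = [
        "node2 cma: INFO: node1 reported dead",
        "node1 nanoprobe: INFO: shutting down",
    ]
    expected = [
        [r"node1 .*shutting down", r"node1 .*killed"],  # either outcome is fine
        [r"node1 reported dead"],
    ]
    print(check_log(log, expected))
```

Grouping alternatives lets one test accept several correct outcomes, which matters when the same action can legitimately produce different messages depending on timing.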
  11. When does a test succeed?

    • It finds all the regular expressions it expects in syslog before the timeout
    • It doesn’t find any “Oh Darn!” messages
    • It passes the post-test system sanity audit
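The three success conditions can be combined into a single verdict, sketched below under simplifying assumptions. `wait_for_patterns` and `verdict` are hypothetical names, `read_new_lines` stands in for whatever delivers new syslog lines from the central machine, and `audit` stands in for the post-test sanity check. The sketch stops reading as soon as every pattern has matched, so it does not model the wait for the systems to stabilize.

```python
import re
import time


def wait_for_patterns(read_new_lines, patterns, timeout, poll=0.01):
    """Poll for log lines until every regex has matched or the timeout expires.

    Returns (found_all, seen_lines).
    """
    remaining = [re.compile(p) for p in patterns]
    deadline = time.monotonic() + timeout
    seen = []
    while True:
        for line in read_new_lines():
            seen.append(line)
            remaining = [p for p in remaining if not p.search(line)]
        if not remaining or time.monotonic() >= deadline:
            return not remaining, seen
        time.sleep(poll)


def verdict(read_new_lines, patterns, bad_pattern, audit, timeout):
    """Return True only if all three success conditions hold."""
    found_all, seen = wait_for_patterns(read_new_lines, patterns, timeout)
    no_oh_darn = not any(re.search(bad_pattern, line) for line in seen)
    return bool(found_all and no_oh_darn and audit())


if __name__ == "__main__":
    batches = iter([["a: starting"], ["b: service up"]])
    ok = verdict(
        read_new_lines=lambda: next(batches, []),
        patterns=[r"starting", r"service up"],
        bad_pattern=r"ERROR",
        audit=lambda: True,  # placeholder for a real system-state audit
        timeout=1.0,
    )
    print("PASS" if ok else "FAIL")
```

Using a monotonic clock for the deadline avoids false timeouts if the wall clock is adjusted during a long test run.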
  12. What does this require from applications?

    • They have to be willing to log important things to syslog
    • They need to be defensive – and output “Oh Darn!” messages
    • Need to be consistent in “Oh Darn!” messages
    • You will likely have to log a few more things than originally planned
    • You need to know what kinds of things are likely to stress the application
    • You need to know what kinds of sanity audits you can perform quickly, and others that maybe you only run infrequently
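One way an application might keep its “Oh Darn!” messages consistent is to route every defensive check through one helper and one log format, so a single regular expression on the test side catches them all. The sketch below uses Python's standard `logging` module; the format string, `make_logger`, and `check_invariant` are assumptions for illustration, and an in-memory stream stands in for syslog so the example runs anywhere.

```python
import io
import logging

# One format for every message, so tests can match ": ERROR: " reliably.
LOG_FORMAT = "%(name)s: %(levelname)s: %(message)s"


def make_logger(name, handler):
    """Build a logger that writes uniformly formatted lines to `handler`.

    In production the handler could be logging.handlers.SysLogHandler;
    any logging.Handler works here.
    """
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep output confined to our handler
    return logger


def check_invariant(logger, condition, description):
    """Defensive check: log a uniform ERROR line when `condition` is false."""
    if not condition:
        logger.error("invariant violated: %s", description)
    return condition


buffer = io.StringIO()
log = make_logger("demo-daemon", logging.StreamHandler(buffer))
log.info("peer node2 joined")
check_invariant(log, False, "heartbeat from unknown peer node9")
print(buffer.getvalue(), end="")
```

Funnelling the checks through one helper means a new failure mode is logged in the expected shape without anyone having to remember the convention.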
  13. How does this work out?

    • Basically once we test a certain kind of interaction – and pass the test – it never fails in the field
    • If used with good unit testing, you likely need dozens of tests, not hundreds
    • Expect your product to work more reliably than any competitor’s
      – Red Hat left the business of producing their own unique HA solution
    • Expect to learn things about how the application works – maybe some you didn’t know even as author
    • Expect that some tests will need tweaking to make sure they’re really “done” before going on to the next test
    • Syslog regex testing can be a little fragile – you will have to update your tests as the code changes
    • If you use Docker, you may run into Docker bugs or unexpected interactions – which will change over time...
    • These kinds of tests can take days to run
  14. Why “Re-Invent” CTS for Assimilation?

    • CTS assumed you set up the syslog and real/virtual machines “somehow”
      – Distributing software and setting up syslog correctly is a pain
      – In Assimilation system testing, that’s all automated
      – We used Docker, which has lower overhead than virtual machines and was designed from the ground up for automation
    • I added test-specific post-queries to validate the database
      – in CTS all post-query audits were identical
    • I was concerned about intellectual property issues
    • I did reuse (and improve) the LogWatcher class from CTS
  15. Why call it “Fuzzy Monkey” Testing?

    • It’s a tribute to two other related testing methods
      – Fuzz testing – having tests not always give the same input
      – Chaos Monkey testing – which also tries to break the system – in production (we predate Chaos Monkey by several years)
    • I liked it :-D
    • It’s unrelated to the vodka-based cocktail ;-)
  16. Where to find this online?

    © 2015 Assimilation Systems Limited
    • http://assimilationsystems.com/2016/05/24/testing-distributed-systems-with-fuzzy-monkey-testing/
      OR http://bit.ly/FuzzyMonkey
    • Assimilation Source Code:
      – https://github.com/assimilation/assimilation-official/tree/master/cma/systemtests
    • These Slides:
      – https://bit.ly/FuzzyMonkeySlides2016
  17. Get Involved (Assimilation Project)!

    • Get Assimilated!
    • Contribute!
      – Users – give it a try
      – Security best practice experts
      – Testers, System Management, Continuous Integration
      – Designers
      – Developers (C, Python, Shell, PowerShell, JavaScript)
      – Porters (esp Windows)
      – Promoters, Publicists, Packagers, etc.
  18. Resistance Is Futile!

    These slides: bit.ly/FuzzyMonkeySlides16
    Mailing List: bit.ly/AssimML
    @OSSAlanR
    #assimilation on irc.freenode.net
    Assimilation Web Site: assimproj.org
    https://www.owasp.org/index.php/OWASP_Assimilation_Project
    Company Web Site: assimilationsystems.com
    Download: assimilationsystems.com/download