Non-Functional Testing and Tuning - A Scientific Approach

Slides from the presentation I gave with Kevin Rudland at the 2016 Devoxx UK Conference

Andrew Harmel-Law

June 08, 2016

Transcript

  1. Non-Functional Testing and Tuning: A Scientific Approach. Kevin Rudland & Andrew Harmel-Law, Capgemini UK, http://capgemini.github.io

    @kevinrudland / @al94781 #test_nfrs_with_science
  2. Who Are We? • Andrew Harmel-Law • Kevin Rudland
  3. ABSTRACT The bit you always write at the end after you’ve done all the work which summarises your findings.
  4. Meta-Hypothesis: H1: We can give ourselves confidence that our hyper-distributed systems will meet our Non-Functional Requirements by applying the same method that scientists use to understand and predict the world around us: Science; specifically the Scientific Method.
  5. INTRODUCTION (Pt. 1) The bit where we try and get you to agree with us about the nature of the problem we need to solve.
  6. Have we lost, or are we in the process of losing, predictability in the systems we build?
  7. But Wait... “...if you are trying to break new ground and be really innovative, that's where you have to apply first-principle thinking and try to identify the most fundamental truths in any particular arena and you reason up from there.” Elon Musk, talking about applying the methods of Physics to problems [Profiles in versatility - http://www.aps.org/publications/apsnews/201310/profiles.cfm]

    Image: Musk at the 2015 Tesla Motors Annual Meeting. Photo by Steve Jurvetson - https://www.flickr.com/photos/jurvetson/18659265152/, CC BY 2.0, https://commons.wikimedia.org/w/index.php?curid=40974345
  8. COUNTER-PROPOSAL: This proliferation is a good thing • Scalability? - Up! • Resilience? - Up! • Service Availability? - Up! • Maintainability? - Up! • Monitorability? - Up! • Deployment Frequency? - Up! • Time-to-change? - Down! • Flexibility? - Up! • Cost? - Focussed!
  9. Rather than having lost predictability, did we ever really have it in the first place? And haven’t we actually gained as a result?
  10. So then how do we know we have these qualities? (Given that we’re agreed we've fundamentally relinquished *control* of them.)
  11. from “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win” by Gene Kim, Kevin Behr & George Spafford
  12. 1. Testing simply to validate is no longer enough

    Image: 16th-century illustration of Archimedes in the bath, with Hiero's crown at bottom right. By Unknown - Historia N° 767 - Novembre 2010 - page 38, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12080104
  13. 1. Testing simply to validate is no longer enough 2. We must test to investigate and learn
  14. 1. Testing simply to validate is no longer enough 2. We must test to investigate and learn 3. And the best way to do that is to hypothesise, experiment, measure and adapt…
  15. 1. Testing simply to validate is no longer enough 2. We must test to investigate and learn 3. And the best way to do that is to hypothesise, experiment, measure and adapt… Repeatedly.
  16. Meta-Hypothesis #1: H1: We can give ourselves confidence that our hyper-distributed systems will meet our Non-Functional Requirements by applying the same method that scientists use to understand and predict the world around us: Science; specifically the Scientific Method.
  17. INTRODUCTION (Pt. 2) The bit after we agree what the problem is and have come to a possible solution, but still need some details about the approach.
  18. Tip: Acquire Knowledge “The scientific method is a body of techniques for investigating phenomena [&] acquiring new knowledge”

    Image: Ibn al-Haytham (Alhazen), considered the father of the modern scientific methodology. By Unknown - http://islamicrenaissance.tumblr.com/post/38457296805/ibn-al-haytham-father-of-fiber-optics-born-in, Public Domain, https://commons.wikimedia.org/w/index.php?curid=47132269
  19. Tip: Theories == NFRs Science is driven by theories, and theories are detailed statements about what we believe to be true.

    Image: Roger Bacon, sometimes credited as being one of the earliest European advocates of the modern scientific method. By Jan Verhas - Britannica upload from a 1867 original, Public Domain, https://commons.wikimedia.org/w/index.php?curid=44315585
  20. Example: NFR-Driven “Theories”

    Example 1: “The System will be able to support peak volumes of 100,000 requests per minute. These requests are to have a response time of 0.5 seconds or less, with less than 1% errors per request reported, considering 2000 requests per second.”
    Example 2: “Scaling Out - Adding additional instances of the API Service on additional servers increases the throughput near-linearly.”
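
A theory such as Example 2 only becomes testable once "near-linearly" is given a number. The sketch below is one possible way to do that, not something taken from the talk: the function names, the 0.9 efficiency threshold and the throughput figures are all illustrative assumptions.

```python
def scaling_efficiency(throughput_by_instances):
    """Per-instance throughput relative to the smallest deployment measured.

    throughput_by_instances maps instance count -> measured requests/second.
    A value of 1.0 means perfectly linear scaling from the baseline.
    """
    counts = sorted(throughput_by_instances)
    base_n = counts[0]
    base_per_instance = throughput_by_instances[base_n] / base_n
    return {
        n: (throughput_by_instances[n] / n) / base_per_instance
        for n in counts
    }


def is_near_linear(throughput_by_instances, min_efficiency=0.9):
    """True if every deployment keeps at least min_efficiency (an assumed cut-off)."""
    efficiencies = scaling_efficiency(throughput_by_instances)
    return all(e >= min_efficiency for e in efficiencies.values())


# Made-up measurements: instance count -> requests/second.
measured = {1: 500.0, 2: 980.0, 4: 1900.0, 8: 3000.0}
```

With these invented numbers the theory holds up to four instances and is falsified at eight, which is exactly the kind of counterexample worth chasing.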
  21. Tip: Falsifiability "A scientific hypothesis must be falsifiable [...] otherwise, [it] cannot be meaningfully tested"

    Image: Johannes Kepler, believed by some to be the archetype of the inductive scientific genius. By Unknown - Kopie eines verlorengegangenen Originals von 1610 im Benediktinerkloster in Kremsmünster, Public Domain, https://commons.wikimedia.org/w/index.php?curid=470711
  22. Example: Hypotheses

    Example 1:
    H1: WHEN xxx requests per second are sent to the ABC_SERVICE THEN the average will respond successfully within 0.5 seconds and less than 1% will result in an error
    H0: WHEN … THEN the average will respond in greater than 0.5 seconds OR more than 1% will result in an error
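
Because H1 and H0 are written as measurable conditions, choosing between them can be automated. A minimal sketch, assuming each request is recorded as a hypothetical `(elapsed_seconds, succeeded)` pair; the function name and the defaults simply mirror the figures on the slide.

```python
from statistics import mean


def decide(samples, max_mean_s=0.5, max_error_rate=0.01):
    """Return "H1" if the slide's H1 conditions hold for the samples, else "H0".

    samples: list of (elapsed_seconds, succeeded) tuples, one per request.
    """
    if not samples:
        raise ValueError("no samples: the experiment produced no evidence")
    ok_times = [elapsed for elapsed, ok in samples if ok]
    errors = len(samples) - len(ok_times)
    error_rate = errors / len(samples)
    # With no successful responses at all, the mean is treated as unbounded.
    mean_ok = mean(ok_times) if ok_times else float("inf")
    h1_holds = mean_ok <= max_mean_s and error_rate < max_error_rate
    return "H1" if h1_holds else "H0"
```

Note that H0 follows from either condition failing, which is why the null hypothesis is phrased with OR.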
  23. Tip: Prioritise Counterexamples "no number of positive outcomes [...] can confirm a scientific theory, but a single counterexample is logically decisive"

    Image: Karl Popper. By LSE library - http://www.flickr.com/photos/lselibrary/3833724834/in/set-72157623156680255/, No restrictions, https://commons.wikimedia.org/w/index.php?curid=9694262
  24. Example: A Scary Hypothesis: Tolerance of Downstream Failures

    Example 3:
    H1: Failure of the ABC service to connect to the XYZ service is handled without manual intervention, logged, and reported back to the ABC service within 100ms of the call to the XYZ service being made. If there are > X errors in a 10 second window, the circuit-breaker in the ABC service opens
    H0: Failure of the ABC service to connect to the XYZ service causes failures which need manual intervention, which are not logged, and the circuit-breaker in the ABC service does not open, no matter how many, or how frequently, errors are seen
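
To test the circuit-breaker clause, it helps to be precise about what "> X errors in a 10 second window" means. The sketch below is an illustrative sliding-window model, not the implementation of any particular circuit-breaker library; the class name and the use of caller-supplied timestamps are assumptions made so that the behaviour can be checked deterministically.

```python
from collections import deque


class SlidingWindowBreaker:
    """Opens once more than max_errors errors fall inside one time window.

    Illustrative model only: it never closes again and has no half-open state.
    """

    def __init__(self, max_errors, window_s=10.0):
        self.max_errors = max_errors
        self.window_s = window_s
        self.is_open = False
        self._error_times = deque()

    def record_error(self, now_s):
        """Record an error at time now_s (seconds); return whether the breaker is open."""
        self._error_times.append(now_s)
        # Forget errors that are older than the window.
        while now_s - self._error_times[0] > self.window_s:
            self._error_times.popleft()
        if len(self._error_times) > self.max_errors:
            self.is_open = True
        return self.is_open
```

An experiment against the real service would inject downstream failures at a known rate and compare the moment the breaker opens with what this model predicts for the same timestamps.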
  25. Tip: Apply Occam’s Razor “the simplest explanation is usually the correct one”

    Image: William of Ockham, credited with originating Occam’s Razor. By self-created (Moscarlop) - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=5523066
  26. METHOD The bit where we discuss how we’re going to get to where we think we want to get
  27. Tip: Be Repeatable "If an experiment cannot be repeated to produce the same results, this implies that the original results might have been in error."

    Image: Marie Curie, conducted pioneering research on radioactivity. By Unknown - Christie's, [1], Public Domain, https://commons.wikimedia.org/w/index.php?curid=32735756
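
One way to make repeatability checkable is to run the identical experiment several times and look at how much the headline measurement varies between runs. This is a sketch under assumptions: the coefficient of variation is only one possible measure, and the 5% tolerance is an arbitrary illustrative value.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def is_repeatable(run_results, max_cv=0.05):
    """True if repeated runs of one experiment agree to within max_cv.

    run_results: one headline number per run (for example mean response time).
    """
    if len(run_results) < 2:
        raise ValueError("at least two runs are needed to judge repeatability")
    return coefficient_of_variation(run_results) <= max_cv
```

If repeated runs disagree, the likelier culprit is an uncontrolled variable in the environment rather than the system under test, so that is the first thing to investigate.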
  28. Tip: Experiments Typically Consist of... • Subjects • Their Environment • The Independent Variables • (The Dependent Variables)

    Image: Milgram’s Obedience Experiment, from https://thoughtmaybe.com/the-milgram-experiment-obedience/
  29. Subjects Think: • How many of them? • How are they connected? • What do they already know?

    Image: Milgram’s Obedience Experiment, from https://thoughtmaybe.com/the-milgram-experiment-obedience/
  30. Example: Hypothesis - v.2

    GIVEN n instances of the ABC_SERVICE with the DEFAULT_CONFIGURATION and empty REQUEST_CACHES, and which are connected to n instances of the downstream XYZ_SERVICE with the DEFAULT_CONFIGURATION via EUREKA over the PROD_NETWORK
    H1: WHEN xxx requests per second are sent to the ABC_SERVICE THEN the average will respond successfully within 0.5 seconds and less than 1% will result in an error
    H0: WHEN xxx requests per second are sent to the ABC_SERVICE THEN the average will not respond successfully within 0.5 seconds OR more than 1% will result in an error
  31. Tip: Sometimes, Keep Looking “Absence of evidence is not evidence of absence.”

    Image: Chien-Shiung Wu, Chinese American experimental physicist who made significant contributions in the field of nuclear physics. She is best known for conducting the Wu experiment, which contradicted the hypothetical law of conservation of parity. By Smithsonian Institution from United States - Chien-shiung Wu (1912-1997), uploaded by Fæ, No restrictions, https://commons.wikimedia.org/w/index.php?curid=18877882
  32. WARNING! Sometimes things go very well in test and this can give you a false sense of security: sometimes you can be live-like, but not live-like enough (without realising it). What went wrong? How did we manage to predict things?
  33. Meta-Hypothesis #2: H1: the only environment exactly like production is production, therefore only results from testing in production really count.
  34. Tip: Fake it “To be ecologically valid, the methods, materials and setting of a study must approximate the real-life situation that is under investigation.”

    Image: Francis Bacon, philosophical advocate and practitioner of the scientific method. By Paul van Somer (1576/1578–1622) - pl.pinterest.com, Public Domain, https://commons.wikimedia.org/w/index.php?curid=19958108
  35. Tip: Axes of Realism So, if science can do it, what are our axes of realism? • Environment ◦ Configuration ◦ Data ◦ Platform and connectivity ◦ Monitoring • Also, are you in: ◦ a Normal-Running or a DR scenario? • And are there: ◦ events such as a fail and recover? ◦ or a slow down and speed up? Etc.
  36. Tip: Realism Techniques • Take a copy of prod data • Take a copy of prod requests • Close observation of prod followed by replication in the “lab”
  37. Method: Data • Of all the environmental elements, data can have the MOST SIGNIFICANT IMPACT • Easiest to change • This is the type of data, its mix and variety • Look at your data model – Examine: 1:n – Examine: nullable
  38. Tip: Manipulate Independent Variables

    Image: Charles R. Drew, developed improved techniques for blood storage, and applied his expert knowledge to developing large-scale blood banks. By Associated Photographic Services, Inc - Original Repository: Howard University. Moorland-Spingarn Research Center. Charles R. Drew Papers http://profiles.nlm.nih.gov/ps/retrieve/ResourceMetadata/BGBBCT, https://en.wikipedia.org/w/index.php?curid=47837720
  39. Example: Hypothesis - v.3.1

    GIVEN n instances of the ABC_SERVICE with the DEFAULT_CONFIGURATION and empty REQUEST_CACHES, and which are connected to n instances of the downstream XYZ_SERVICE with the DEFAULT_CONFIGURATION via EUREKA over the PROD_NETWORK
    H1: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN the average will respond correctly within 0.5 seconds and less than 1% will result in an error
    H0: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN the average will not respond correctly within 0.5 seconds OR more than 1% will result in an error
  40. Example: Hypothesis - v.3.2

    GIVEN n instances of the ABC_SERVICE with the DEFAULT_CONFIGURATION and empty REQUEST_CACHES, and which are connected to n instances of the downstream XYZ_SERVICE with the DEFAULT_CONFIGURATION via EUREKA over the PROD_NETWORK with BACKGROUND_LOAD
    H1: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN the average will respond correctly within 0.5 seconds and less than 1% will result in an error
    H0: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN the average will not respond correctly within 0.5 seconds OR more than 1% will result in an error
  41. Method: Events Dear boy…

    Image: An 1888 lithograph of the 1883 eruption of Krakatoa. By Lithograph: Parker & Coward, Britain; - Image published as Plate 1 in The eruption of Krakatoa, and subsequent phenomena. Report of the Krakatoa Committee of the Royal Society (London, Trubner & Co., 1888)., Public Domain, https://commons.wikimedia.org/w/index.php?curid=7696837
  42. Method: Execute • Manipulate the independent variables: – e.g. req./sec. – e.g. concurrent users – e.g. req. mix • One at a time
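
"One at a time" can be turned into a concrete run plan: start from a baseline and generate runs that each differ from it in exactly one independent variable. A minimal sketch; the variable names and values are invented for illustration.

```python
def one_at_a_time(baseline, variations):
    """Yield the baseline run, then runs that change exactly one variable.

    baseline: dict of independent variable -> baseline value.
    variations: dict of independent variable -> list of values to try.
    """
    yield dict(baseline)
    for name, values in variations.items():
        for value in values:
            if value == baseline[name]:
                continue  # already covered by the baseline run
            run = dict(baseline)
            run[name] = value
            yield run


# Hypothetical independent variables, loosely following the slide's examples.
baseline = {"req_per_sec": 100, "concurrent_users": 50, "invalid_ratio": 0.0}
variations = {"req_per_sec": [100, 200, 400], "concurrent_users": [50, 100]}
plan = list(one_at_a_time(baseline, variations))
```

Any change in a dependent variable between a run and the baseline can then be attributed to the single variable that differs.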
  43. So, back at Meta-Hypothesis #2: Proven - H0: We can proceed without testing in production
  44. RESULTS The bit where we look at the outputs collected from testing our hypotheses
  45. Tip: Be Empirical "To be termed scientific, a method of inquiry is commonly based on empirical or measurable evidence"

    Image: Rosalind Franklin, had a key role (largely unacknowledged during her lifetime) in discovering the helical structure of DNA. By Jewish Chronicle Archive/Heritage-Images, http://www.britannica.com/EBchecked/topic-art/217394/99712/Rosalind-Franklin, Fair use, https://en.wikipedia.org/w/index.php?curid=24959067
  46. Tip: Measure Dependent Variables

    Image: Sir C. V. Raman, winner of the 1930 Nobel Prize for Physics for his work on light scattering. By Nobel Foundation - From Nobel Lectures, Physics 1922-1941, Elsevier Publishing Company, Amsterdam, 1965, Public Domain, https://commons.wikimedia.org/w/index.php?curid=4213636
  47. DISCUSSION The bit where we try and make coherent sense of what we found
  48. Tip: Investigate • Tally your counts • Attribute all noted errors • and resource utilization

    Image: The Stanford Prison Experiment, from http://www.prisonexp.org/the-story/
  49. Tip: Stats Can Help • Why do you care?: • Means mean nothing • The impacts of (statistical) distribution
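
A small worked example shows why a mean on its own can mislead. The latencies below are invented: 90% of requests are fast and 10% are very slow, so the mean sits comfortably under a 0.5 second target while the 95th percentile is far over it. The sketch uses the standard library's `statistics.quantiles` (Python 3.8+); the `summarise` helper is an illustrative name.

```python
from statistics import mean, quantiles


def summarise(latencies_s):
    """Return the mean alongside the 50th, 95th and 99th percentiles."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile.
    cuts = quantiles(latencies_s, n=100, method="inclusive")
    return {
        "mean": mean(latencies_s),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Invented bimodal distribution: 90 fast requests, 10 very slow ones.
latencies = [0.1] * 90 + [3.0] * 10
summary = summarise(latencies)
```

Here the mean is about 0.39 seconds, yet one request in ten takes 3 seconds, which is what the users on the slow path actually experience.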
  50. Example: Hypothesis - v.4

    GIVEN n instances of the ABC_SERVICE with the DEFAULT_CONFIGURATION and empty REQUEST_CACHES, and which are connected to n instances of the downstream XYZ_SERVICE with the DEFAULT_CONFIGURATION via EUREKA over the PROD_NETWORK with BACKGROUND_LOAD
    H1: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN 95% will respond correctly within 0.5 seconds and less than 1% will result in an error
    H0: WHEN a mix of xxx VALID requests per second and yyy INVALID requests per second are sent to the ABC_SERVICE THEN fewer than 95% will respond correctly within 0.5 seconds OR more than 1% will result in an error
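
Version 4 replaces "the average" with a proportion, so the decision rule counts requests rather than averaging them. A sketch under the same assumption of hypothetical `(elapsed_seconds, succeeded)` samples; the function name is illustrative and the thresholds are the ones quoted in the hypothesis.

```python
def decide_v4(samples, max_latency_s=0.5, min_fraction=0.95, max_error_rate=0.01):
    """Return "H1" if enough requests were both correct and fast, else "H0".

    samples: list of (elapsed_seconds, succeeded) tuples, one per request.
    """
    total = len(samples)
    if total == 0:
        raise ValueError("no samples: the experiment produced no evidence")
    fast_and_ok = sum(1 for elapsed, ok in samples if ok and elapsed <= max_latency_s)
    errors = sum(1 for _, ok in samples if not ok)
    h1_holds = (fast_and_ok / total >= min_fraction
                and errors / total < max_error_rate)
    return "H1" if h1_holds else "H0"
```

A run in which 90% of requests take 0.1 seconds and 10% take 3 seconds has a mean under 0.5 seconds but is rejected by this rule, because only 90% of requests meet the latency bound.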
  51. Decide: H1 or H0? Did the test pass or fail?
  52. CONCLUSION The bit where we sum up. (And also perhaps invite some questions.)
  53. Conclusion: Refine and Retest • We did not verify H1 :( • Do you need to look elsewhere? • Revisit H0 / H1 - Apply Occam’s Razor • Only change 1 thing at a time
  54. Tip: Tune & Fix Too • Change one thing at a time (unless it’s your DVs) • Think!: ◦ Will it have the desired effect? ◦ Will it have side-effects? ◦ Tuning the slowest won’t always win

    Image: cc: Kalle Hyttinen - https://www.flickr.com/photos/129560755@N03
  55. Tip: Re-Run (again) Always re-baseline after a change / fix has gone in - make no other changes
  56. ABSTRACT The bit you always write at the end after you’ve done all the work which summarises your findings.
  57. Meta-Hypothesis #1: H1: We can give ourselves confidence that our hyper-distributed systems will meet our Non-Functional Requirements by applying the same method that scientists use to understand and predict the world around us: Science; specifically the Scientific Method. PROVEN
  58. QUESTIONS? The bit everyone dreads in case they got something wrong along the way.
  59. Oh, and did we mention we (Capgemini) are hiring? (see bit.ly/cg-jvm-jobs-nfr for lots more info)

    Image: cc: ptrlx - https://www.flickr.com/photos/58615912@N05
  60. APPENDICES The bit where you dump things that didn’t fit anywhere else.
  61. Marginalia: Record It • Use templates • Visualise • Summarize • Supporting evidence

    Image: Marginalia, from http://spec.lib.miamioh.edu/home/marginalia-from-the-stacks/