Slide 1

Slide 1 text

Mutation Testing
Chris Sinjakli

Slide 2

Slide 2 text

Testing is a good thing
But how do we know our tests are good?

Slide 3

Slide 3 text

Code coverage is a start
But it can give a “good” score with really dreadful tests

Slide 4

Slide 4 text

Really dreadful tests
class Adder
  def self.add(x, y)
    return x - y
  end
end
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
Coverage: 100%
Usefulness: 0

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

A contrived example
But how could we detect it?

Slide 7

Slide 7 text

Mutation Testing!
“Who watches the watchmen?”

Slide 8

Slide 8 text

If you can change the code, and a test doesn’t fail, either the code is never run or the tests are wrong.

Slide 9

Slide 9 text

How?
1. Run test suite
2. Change code (mutate)
3. Run test suite again
If tests now fail, mutant dies. Otherwise it survives.
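The three steps above can be sketched in a few lines of Ruby. This is a toy illustration, not how any real tool works: `SOURCE`, `load_adder`, `WEAK_SUITE`, `STRONG_SUITE` and `mutant_survives?` are hypothetical names, the mutation is a plain text substitution, and the Adder here is assumed to be correct (`x + y`) so that step 1 passes.

```ruby
# Toy mutation run. All names here are hypothetical; real tools
# mutate the syntax tree or bytecode rather than raw source text.
SOURCE = <<~RUBY
  class Adder
    def self.add(x, y)
      x + y
    end
  end
RUBY

# Evaluate the source inside a throwaway module so the original and
# the mutant never overwrite each other.
def load_adder(source)
  sandbox = Module.new
  sandbox.module_eval(source)
  sandbox.const_get(:Adder, false)
end

# A "suite" is a lambda that returns true when every check passes.
WEAK_SUITE   = ->(adder) { adder.add(1, 1); true }  # runs the code, asserts nothing
STRONG_SUITE = ->(adder) { adder.add(2, 1) == 3 }   # checks the result

def mutant_survives?(source, suite)
  # Step 1: the suite must pass on the unmodified code.
  raise "suite fails before mutation" unless suite.call(load_adder(source))
  # Step 2: mutate (here, swap + for -).
  mutated = source.sub("x + y", "x - y")
  # Step 3: run the suite again; a passing suite means the mutant survived.
  suite.call(load_adder(mutated))
end

puts "weak suite:   mutant #{mutant_survives?(SOURCE, WEAK_SUITE) ? 'survives' : 'dies'}"
puts "strong suite: mutant #{mutant_survives?(SOURCE, STRONG_SUITE) ? 'survives' : 'dies'}"
```

A surviving mutant points at a test that runs the code without checking what it returns.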

Slide 10

Slide 10 text

Going with our previous example
class Adder
  def self.add(x, y)
    return x - y
  end
end
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
Let’s change something

Slide 11

Slide 11 text

Going with our previous example
class Adder
  def self.add(x, y)
    return x + y
  end
end
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
This still passes

Slide 12

Slide 12 text

Success
We know something is wrong

Slide 13

Slide 13 text

So what?
It caught a really rubbish test
How about something slightly less obvious?

Slide 14

Slide 14 text

Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
end
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
Coverage: 100%
Usefulness: >0
But still wrong

Slide 15

Slide 15 text

Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
end
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
Mutate

Slide 16

Slide 16 text

Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a || b
      return 42
    else
      return 0
    end
  end
end
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
Passing tests

Slide 17

Slide 17 text

Mutation testing caught our mistake :D
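One way to act on the surviving mutant is to add the mixed cases the two tests left out. The sketch below uses plain Ruby instead of RSpec so it runs on its own; `MutantConditionChecker`, `ORIGINAL_CASES`, `FULL_CASES` and `passes?` are hypothetical names, and the expected value of 0 for mixed inputs is an assumption read off the `&&` in the original code.

```ruby
class ConditionChecker
  def self.check(a, b)
    if a && b
      42
    else
      0
    end
  end
end

# The mutant from the slides: && replaced by ||.
class MutantConditionChecker
  def self.check(a, b)
    if a || b
      42
    else
      0
    end
  end
end

# Each case is [a, b, expected result].
ORIGINAL_CASES = [
  [true,  true,  42],
  [false, false, 0]
]

# Assumed extra cases: mixed inputs should give 0 under &&.
FULL_CASES = ORIGINAL_CASES + [
  [true,  false, 0],
  [false, true,  0]
]

# True when the checker gives the expected result for every case.
def passes?(checker, cases)
  cases.all? { |a, b, expected| checker.check(a, b) == expected }
end

puts "original cases, mutant passes: #{passes?(MutantConditionChecker, ORIGINAL_CASES)}"
puts "full cases, mutant passes:     #{passes?(MutantConditionChecker, FULL_CASES)}"
puts "full cases, original passes:   #{passes?(ConditionChecker, FULL_CASES)}"
```

With only the two original cases the `||` mutant is indistinguishable from the real code; either mixed case is enough to kill it.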

Slide 18

Slide 18 text

Useful technique
But still has its flaws

Slide 19

Slide 19 text

The downfall of mutation (Equivalent Mutants)
index = 0
while index != 100 do
  doStuff()
  index += 1
end
Mutates to
index = 0
while index < 100 do
  doStuff()
  index += 1
end
But the programs are equivalent, so no test will fail

Slide 20

Slide 20 text

There is no possible test which can “kill” the mutant
The programs are equivalent
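To see why no test can tell the two loops apart, here is a small sketch that counts how often the loop body runs in each version. `run_original` and `run_mutant` are hypothetical names, and a counter stands in for `doStuff()`.

```ruby
# Original loop condition: index != 100
def run_original
  calls = 0
  index = 0
  while index != 100
    calls += 1 # stands in for doStuff()
    index += 1
  end
  calls
end

# Mutated loop condition: index < 100
def run_mutant
  calls = 0
  index = 0
  while index < 100
    calls += 1 # stands in for doStuff()
    index += 1
  end
  calls
end

puts "original body runs: #{run_original}"
puts "mutant body runs:   #{run_mutant}"
```

Because `index` starts at 0 and only ever increases by 1, both conditions become false at exactly the same point, so the observable behaviour is identical.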

Slide 21

Slide 21 text

Also (potentially)
• Infinite loops
• More memory used
• Compile/run time errors
  – tools should minimise these

Slide 22

Slide 22 text

How bad is it?
• Good paper assessing the problem [SZ10]
• Took 7 widely used, “large” projects
• Found:
  – 15 mins to assess one mutation
  – 45% of uncaught mutations are equivalent
  – Better tested project -> worse signal-to-noise ratio

Slide 23

Slide 23 text

Can we detect the equivalents?
• Not in the general case [BA82]
• Some specific cases can be detected
  – Using compiler optimisation techniques [BS79]
  – Using mathematical constraints [DO91]
  – Line coverage changes [SZ10]
• All heuristic algorithms
  – not seen any claiming to kill all equivalent mutants

Slide 24

Slide 24 text

Tools
Some Ruby, then a Java one I liked

Slide 25

Slide 25 text

Ruby
• Looked into Heckle
• Seemed unmaintained (nothing since 2009)
• Then I saw...

Slide 26

Slide 26 text

Ruby

Slide 27

Slide 27 text

Ruby
• Mutant seems to be the new favourite
• Runs in Rubinius (1.8 or 1.9 mode)
• Only supports RSpec
• Easy to set up
    rvm install rbx-head
    rvm use rbx-head
    gem install mutant
• And easy to use
    mutate "ClassName#method_to_test" spec

Slide 28

Slide 28 text

Java
• Loads of tools to choose from
• Bytecode vs source mutation
• Will look at PIT (seems like one of the better ones)

Slide 29

Slide 29 text

PIT - pitest.org
• Works with “everything”
  – Command line
  – Ant
  – Maven
• Bytecode level mutations (faster)
• Very customisable
  – Exclude classes/packages from mutation
  – Choose which mutations you want
  – Timeouts
• Makes pretty HTML reports (line/mutation coverage)

Slide 30

Slide 30 text

Summary
• Can point at weak areas in your tests
• At the same time, can be prohibitively noisy
• Try it and see

Slide 31

Slide 31 text

Questions?

Slide 32

Slide 32 text

References
• [BA82] - T. A. Budd and D. Angluin. Two notions of correctness and their relation to testing. Acta Informatica, 18(1):31-45, November 1982.
• [BS79] - D. Baldwin and F. Sayward. Heuristics for determining equivalence of program mutations. Research report 276, Department of Computer Science, Yale University, 1979.
• [DO91] - R. A. DeMillo and A. J. Offutt. Constraint-based automatic test data generation. IEEE Transactions on Software Engineering, 17(9):900-910, September 1991.
• [SZ10] - D. Schuler and A. Zeller. (Un-)Covering Equivalent Mutants. Third International Conference on Software Testing, Verification and Validation (ICST), pages 45-54, April 2010.

Slide 33

Slide 33 text

Also interesting
• [AHH04] - K. Adamopoulos, M. Harman and R. M. Hierons. How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution. Genetic and Evolutionary Computation -- GECCO 2004, pages 1338-1349, 2004.