Really dreadful tests

    class Adder
      def self.add(x, y)
        return x - y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1)
      end
    end

Coverage: 100%
Usefulness: 0

Going with our previous example

    class Adder
      def self.add(x, y)
        return x - y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1)
      end
    end

Let’s change something

Going with our previous example

    class Adder
      def self.add(x, y)
        return x + y
      end
    end

    describe Adder do
      it "should add the two arguments" do
        Adder.add(1, 1)
      end
    end

This still passes

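The reason the mutant survives is that the spec never checks the return value. A minimal sketch in plain Ruby (no RSpec, so it runs on its own) of how an actual assertion tells the two versions apart. The names `original_add`, `mutant_add`, `weak_test` and `strong_test` are made up for illustration and are not part of the slides.

```ruby
# The original from the slides (subtracts by mistake) and its mutant (adds).
original_add = ->(x, y) { x - y }
mutant_add   = ->(x, y) { x + y }

# The test from the slides: runs the code but asserts nothing.
weak_test = lambda do |add|
  add.call(1, 1)
  true
end

# Hypothetical stronger test: checks the returned value.
strong_test = ->(add) { add.call(1, 1) == 2 }

implementations = [original_add, mutant_add]
weak_results   = implementations.map { |impl| weak_test.call(impl) }
strong_results = implementations.map { |impl| strong_test.call(impl) }

puts "weak:   #{weak_results.inspect}"
puts "strong: #{strong_results.inspect}"
```

The weak test passes for both versions, so it cannot distinguish them. The strong test passes only for the version that adds, which here also exposes the bug in the original. In the RSpec style used on the slides, the equivalent fix would be `Adder.add(1, 1).should == 2`.
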
Slightly less obvious (and I mean slightly)

    class ConditionChecker
      def self.check(a, b)
        if a && b
          return 42
        else
          return 0
        end
      end
    end

    describe ConditionChecker do
      it "should return 42 when both arguments are true" do
        ConditionChecker.check(true, true).should == 42
      end

      it "should return 0 when both arguments are false" do
        ConditionChecker.check(false, false).should == 0
      end
    end

Coverage: 100%
Usefulness: >0
But still wrong

Slightly less obvious (and I mean slightly)

    class ConditionChecker
      def self.check(a, b)
        if a && b
          return 42
        else
          return 0
        end
      end
    end

    describe ConditionChecker do
      it "should return 42 when both arguments are true" do
        ConditionChecker.check(true, true).should == 42
      end

      it "should return 0 when both arguments are false" do
        ConditionChecker.check(false, false).should == 0
      end
    end

Mutate

Slightly less obvious (and I mean slightly)

    class ConditionChecker
      def self.check(a, b)
        if a || b
          return 42
        else
          return 0
        end
      end
    end

    describe ConditionChecker do
      it "should return 42 when both arguments are true" do
        ConditionChecker.check(true, true).should == 42
      end

      it "should return 0 when both arguments are false" do
        ConditionChecker.check(false, false).should == 0
      end
    end

Passing tests

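The two specs only exercise inputs where `&&` and `||` agree, which is why the mutant passes. A rough sketch, again in plain Ruby so it is self-contained, of how adding the mixed cases kills it. The helper `passes?` and the case tables are assumptions made for this example, not something from the slides.

```ruby
# Original condition and the mutated one, as standalone lambdas.
original_check = ->(a, b) { (a && b) ? 42 : 0 }
mutant_check   = ->(a, b) { (a || b) ? 42 : 0 }

# Inputs => expected result. The first table mirrors the two specs on the slide.
slide_cases = { [true, true] => 42, [false, false] => 0 }
# Hypothetical extra cases where && and || disagree.
full_cases = slide_cases.merge({ [true, false] => 0, [false, true] => 0 })

# True when the implementation returns the expected value for every case.
def passes?(impl, cases)
  cases.all? { |args, expected| impl.call(*args) == expected }
end

mutant_survives_slide_cases = passes?(mutant_check, slide_cases)
mutant_survives_full_cases  = passes?(mutant_check, full_cases)
original_passes_full_cases  = passes?(original_check, full_cases)

puts "mutant vs slide cases: #{mutant_survives_slide_cases}"
puts "mutant vs full cases:  #{mutant_survives_full_cases}"
```

With only the slide's cases the mutant survives. With the mixed cases added, the original still passes and the mutant fails on `(true, false)`.
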
The downfall of mutation (Equivalent Mutants)

    index = 0
    while index != 100 do
      doStuff()
      index += 1
    end

Mutates to

    index = 0
    while index < 100 do
      doStuff()
      index += 1
    end

But the programs are equivalent, so no test will fail

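To make the equivalence concrete, here is a small sketch that runs both loops and counts the iterations. The counter stands in for `doStuff()`, and the method names are invented for this example.

```ruby
# Original loop condition: stop when index reaches exactly 100.
def run_with_not_equal
  calls = 0
  index = 0
  while index != 100
    calls += 1 # stands in for doStuff()
    index += 1
  end
  calls
end

# Mutated loop condition: stop once index is no longer below 100.
def run_with_less_than
  calls = 0
  index = 0
  while index < 100
    calls += 1 # stands in for doStuff()
    index += 1
  end
  calls
end

not_equal_calls = run_with_not_equal
less_than_calls = run_with_less_than

puts "!= loop ran #{not_equal_calls} times"
puts "<  loop ran #{less_than_calls} times"
```

Both loops perform the same number of iterations, so no test observing the behaviour can separate them. Note that this only holds because `index` starts at 0 and increases by exactly 1: with a step of 3, for instance, the `!=` version would never hit 100 and the two programs would no longer be equivalent.
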
How bad is it?
• Good paper assessing the problem [SZ10]
• Took 7 widely used, “large” projects
• Found:
  – 15 mins to assess one mutation
  – 45% uncaught mutations are equivalent
  – Better tested project -> worse signal-to-noise ratio

Can we detect the equivalents?
• Not in the general case [BA82]
• Some specific cases can be detected
  – Using compiler optimisation techniques [BS79]
  – Using mathematical constraints [DO91]
  – Line coverage changes [SZ10]
• All heuristic algorithms – not seen any claiming to kill all equivalent mutants

Ruby
• Mutant seems to be the new favourite
• Runs in Rubinius (1.8 or 1.9 mode)
• Only supports RSpec
• Easy to set up

    rvm install rbx-head
    rvm use rbx-head
    gem install mutant

• And easy to use

    mutate "ClassName#method_to_test" spec

PIT - pitest.org
• Works with “everything”
  – Command line
  – Ant
  – Maven
• Bytecode level mutations (faster)
• Very customisable
  – Exclude classes/packages from mutation
  – Choose which mutations you want
  – Timeouts
• Makes pretty HTML reports (line/mutation coverage)

References
• [BA82] - T. A. Budd and D. Angluin. Two notions of correctness and their relation to testing. Acta Informatica, 18(1):31-45, November 1982.
• [BS79] - D. Baldwin and F. Sayward. Heuristics for determining equivalence of program mutations. Research report 276, Department of Computer Science, Yale University, 1979.
• [DO91] - R. A. DeMillo and A. J. Offutt. Constraint-based automatic test data generation. IEEE Transactions on Software Engineering, 17(9):900-910, September 1991.
• [SZ10] - D. Schuler and A. Zeller. (Un-)Covering Equivalent Mutants. Third International Conference on Software Testing, Verification and Validation (ICST), pages 45-54, April 2010.

Also interesting
• [AHH04] - K. Adamopoulos, M. Harman and R. M. Hierons. How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution. Genetic and Evolutionary Computation -- GECCO 2004, pages 1338-1349, 2004.