

Coding-challenge: AI-test vs PI-test (20260522 @ DevDays-Europe)

Let's be honest: every developer loves writing tests, right? Meaning that we're all diligently practicing Test-Driven Development (TDD), not just preaching it. Or... are we? Maybe that's why we keep asking our AI coding assistants to write tests for us. Just prompt ChatGPT to generate a test for your latest class and voilà: "Of course I can help you write your tests. Here's a complete test suite that achieves 100% line and branch coverage."

But does that really mean the tests are any good?

In this session, Frederieke Scheper will push AI-generated tests to their limits using mutation testing and the PI-Test framework. Together with the attendees, she will explore whether these tests truly hold up under scrutiny or just look good on paper.

By the end, attendees will know when to say to their coding assistant, "Thanks, ChatGPT, that was helpful," and when to say, "Thanks, but no thanks. I'll write my own test this time."


Frederieke Scheper

May 22, 2026


Transcript

  1. Coding Challenge: AI-Test vs PI-Test

    About me: Frederieke Scheper
    • Java Architect & Codesmith ‣ I ❤ OSS conferences!
    • Working for Dutch national police
    • Passion for ‣ Software development ‣ TDD, BDD, DDD ‣ And AI agents …
    • Contact me on (@fbascheper)
  2. ACT 1 - Warming up

    Three questions: 🙋
    1. Who loves TDD ?
    2. Who loves writing tests ?
    3. Who actually writes them first ?
  3. ACT 1 - Warming up (same three questions)

    🤩 AI to the rescue ? It can write our tests !
  4. ACT 1 - Warming up (same three questions)

    🤔 Hey, we can test AI… with mutation testing !
  5. PI-test - mutation testing in Java

    “Simple as 1, 2, 3 ?”
    mvn test-compile pitest:mutationCoverage
    “All the mutants should have been killed 🛸, No survivors allowed ☠ …”
  6. PI-test - example mutators: “Conditionals boundary mutator”

    if (a < b) {
        // do something
    }
    ➡ becomes
    if (a <= b) {
        // do something
    }

    Source code:   <    <=   >    >=
    Mutated code:  <=   <    >=   >

    AND THE TEST SHOULD FAIL !
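The boundary mutation on this slide can be illustrated with a small, self-contained sketch. The class `VolumeLimiter` and its methods are hypothetical and are not taken from the talk's demo project; the "mutated" method is written out by hand to show what the mutator's change would look like. The point is that only a check exactly on the boundary value can tell the original and the mutant apart.

```java
// Hypothetical example, not from the talk's demo project.
final class VolumeLimiter {
    static final int MAX_VOLUME = 10;

    // Original code: turning up is allowed only strictly below the limit.
    static boolean mayTurnUp(int current) {
        return current < MAX_VOLUME;
    }

    // Hand-written copy of the mutant: < has become <=.
    static boolean mayTurnUpMutated(int current) {
        return current <= MAX_VOLUME;
    }

    public static void main(String[] args) {
        // Far from the boundary both versions agree, so a test using 3 lets the mutant survive.
        System.out.println(mayTurnUp(3) == mayTurnUpMutated(3));                   // true
        // Exactly on the boundary they differ, so a test using MAX_VOLUME kills the mutant.
        System.out.println(mayTurnUp(MAX_VOLUME) == mayTurnUpMutated(MAX_VOLUME)); // false
    }
}
```

A generated test suite can reach full line and branch coverage with inputs such as 3 and 42 and still miss this mutant, because coverage does not require the boundary value itself to be exercised.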
  7. PI-test - example mutators: “Increments mutator”

    public int method(int i) {
        i++;
        return i;
    }
    ➡ becomes
    public int method(int i) {
        i--;
        return i;
    }

    Source code:   i++   i--   i = i+1   i = i-1
    Mutated code:  i--   i++   i = i-1   i = i+1

    AND THE TEST SHOULD FAIL (AGAIN) !
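As a rough illustration of why the test should fail here, the sketch below compares a weak check with a strong one against the slide's method and a hand-written copy of its mutant. The class name and the two check helpers are assumptions made for this example only.

```java
// Hypothetical demo class; the two method bodies mirror the slide.
final class IncrementMutantDemo {

    // Original code from the slide.
    static int method(int i) {
        i++;
        return i;
    }

    // Hand-written copy of the mutant: i++ has become i--.
    static int methodMutated(int i) {
        i--;
        return i;
    }

    // Weak check: only verifies that the result is positive.
    static boolean weakCheck(int result) {
        return result > 0;
    }

    // Strong check: verifies the exact expected value for input 5.
    static boolean strongCheck(int result) {
        return result == 6;
    }

    public static void main(String[] args) {
        System.out.println(weakCheck(method(5)));          // true
        System.out.println(weakCheck(methodMutated(5)));   // true: the mutant survives
        System.out.println(strongCheck(method(5)));        // true
        System.out.println(strongCheck(methodMutated(5))); // false: the mutant is killed
    }
}
```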
  8. PI-test - example mutators: “Void method calls mutator”

    public void aVoidMethod(int i) {
        // does something
    }
    public int foo() {
        int i = 5;
        aVoidMethod(i);
        return i;
    }
    ➡ becomes
    public void aVoidMethod(int i) {
        // does something
    }
    public int foo() {
        int i = 5;
        /* method removed !! */
        return i;
    }

    AND THE TEST SHOULD FAIL (AGAIN) !
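A removed void call changes no return value, so it can only be detected through its side effect. The sketch below assumes, purely for illustration, that `aVoidMethod` records its argument in a list; the class name, the `seen` list, and the hand-written `fooMutated` are not from the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the talk's demo project.
final class VoidCallMutantDemo {
    private final List<Integer> seen = new ArrayList<>();

    // The void method: its only observable effect is a side effect (assumed here).
    void aVoidMethod(int i) {
        seen.add(i);
    }

    // Original code, following the slide.
    int foo() {
        int i = 5;
        aVoidMethod(i);
        return i;
    }

    // Hand-written copy of the mutant: the call to aVoidMethod is gone.
    int fooMutated() {
        int i = 5;
        return i;
    }

    List<Integer> seen() {
        return seen;
    }

    public static void main(String[] args) {
        VoidCallMutantDemo original = new VoidCallMutantDemo();
        VoidCallMutantDemo mutant = new VoidCallMutantDemo();

        // The return value alone cannot kill the mutant: both return 5.
        System.out.println(original.foo() == mutant.fooMutated()); // true

        // The side effect can: only the original recorded the value.
        System.out.println(original.seen().equals(List.of(5)));    // true
        System.out.println(mutant.seen().equals(List.of(5)));      // false
    }
}
```

In a test that uses mocks, the equivalent of the last two lines would be a verification that the collaborator was actually called.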
  9. PI-test - example mutators: “Null returns mutator”

    public String sayHello() {
        int i = 5;
        String foo = method(i);
        return "hello " + foo;
    }
    ➡ becomes
    public String sayHello() {
        int i = 5;
        String foo = method(i);
        return null;
    }

    AND THE TEST SHOULD FAIL (AGAIN) !
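To make the slide's point concrete, here is a minimal sketch in which a "runs without throwing" style of test passes for both versions, while an assertion on the returned value does not. The helper `method`, the class name, and `runsWithoutException` are assumptions for illustration; the mutant is written out by hand.

```java
import java.util.function.Supplier;

// Hypothetical demo class; sayHello mirrors the slide.
final class NullReturnMutantDemo {

    // Assumed helper, since the slide does not show what method(i) returns.
    static String method(int i) {
        return "number " + i;
    }

    // Original code from the slide.
    static String sayHello() {
        int i = 5;
        String foo = method(i);
        return "hello " + foo;
    }

    // Hand-written copy of the mutant: the return value is replaced by null.
    static String sayHelloMutated() {
        int i = 5;
        String foo = method(i);
        return null;
    }

    // Weak test style: only checks that the call completes without an exception.
    static boolean runsWithoutException(Supplier<String> call) {
        try {
            call.get();
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runsWithoutException(NullReturnMutantDemo::sayHello));        // true
        System.out.println(runsWithoutException(NullReturnMutantDemo::sayHelloMutated)); // true: mutant survives
        System.out.println("hello number 5".equals(sayHello()));                         // true
        System.out.println("hello number 5".equals(sayHelloMutated()));                  // false: mutant killed
    }
}
```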
  10. PI-test - example mutators: “Empty returns mutator”

    public Set<String> strings() {
        int i = 5;
        Set<String> foo = method(i);
        return foo;
    }
    ➡ becomes
    public Set<String> strings() {
        int i = 5;
        Set<String> foo = method(i);
        return Collections.emptySet();
    }

    AND THE TEST SHOULD FAIL (AGAIN) !
    As I said before: “Simple as 1, 2, 3 !”
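The empty-return case can be sketched the same way. A not-null assertion, which is a common shape for a quickly generated test, passes for both versions; an assertion on the set's contents kills the mutant. The helper `method` and the class name are hypothetical, and the mutant is a hand-written copy.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical demo class; strings() mirrors the slide.
final class EmptyReturnMutantDemo {

    // Assumed helper, since the slide does not show what method(i) returns.
    static Set<String> method(int i) {
        return Set.of("track-" + i);
    }

    // Original code from the slide.
    static Set<String> strings() {
        int i = 5;
        Set<String> foo = method(i);
        return foo;
    }

    // Hand-written copy of the mutant: the result is replaced by an empty set.
    static Set<String> stringsMutated() {
        int i = 5;
        Set<String> foo = method(i);
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        // Weak assertion: not null. Both versions pass, so the mutant survives.
        System.out.println(strings() != null);                   // true
        System.out.println(stringsMutated() != null);            // true

        // Strong assertion: expected content. Only the original passes.
        System.out.println(strings().contains("track-5"));        // true
        System.out.println(stringsMutated().contains("track-5")); // false
    }
}
```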
  11. PI-test - example mutators: “Empty returns mutator” (same example as slide 10)

    “The decks are ready. Let’s spin up the demo.” 🎧
  12. ACT 2 - Demo time → Introduction

    • Today’s domain model
      ‣ About Dance-Events
      ‣ Where the DJ guides us through the night 🎧
  13. ACT 2 - Demo time → The venue

    • People come together to dance and have fun.
    • DJs perform MixSessions to guide the crowd.
    • Music evolves: warm-up → peak → cool-down.
  14. ACT 2 - Demo time → The DJ at work

    Select Tracks, Mixes & Transitions, Read the Crowd, Respond to Requests, Build Atmosphere
    Type, Action
  15. ACT 2 - Demo time → Many CrowdEvents !

    RequestFromAudienceReceived, CrowdCheered, DancefloorFilledUp, DancefloorEmptied, CrowdEnergyDropped
  16. ACT 3 - The dance floor starts filling up

    CrowdCheered LOW MEDIUM HIGH
    CrowdCheered: Sheesh!
  17. The floor is packed

    DanceFloorFilledUpEvent
    Sheesh! Turn it up! Pop off! Straight bangers!
  18. ACT 4 - CrowdEnergyDropped → BOOOOOOO

    Just one question:
    > Who’s actually prompting like this??
    I certainly don’t … not anymore. Because AI changed gears. So let’s try something else !
  19. ACT 4 - CrowdEnergyDropped → Switching gears

    Hey Claude, I have a working legacy application with no unit tests. Test support objects are available in src/test/java; use those as a starting point.
    - Use ./claude-config-starter as reference examples to create valid .claude config files for this project
    - NEVER modify anything under src/main
    - Use mocks to avoid turning unit tests into integration tests
    - Create a single beads issue for writing all tests, then one beads issue per improvement
    Plan first, then wait for my approval before writing any code. After writing tests, run PITest mutation testing and show me the results table. Then plan improvements and wait for my approval before implementing: create a beads issue per improvement, implement, then run PITest again to show the delta.

    Claude’s screencast ! 🖥