subject.call }.not_to raise_error
end

The model learned the shape of tests, not the purpose.
SKILLS.md will be ignored when inconvenient.

Paris.rb · 2026-05-05 · Szymon Fiedler · @szymonfiedler

None of them give you proof.
– Explicit instructions → ignored when inconvenient
– Chain-of-thought → still sycophantic reasoning
– Contradiction → "You're right, let me reconsider"
– Lower temperature → less random, still wrong

unanswered question. This is the gap.
1. RuboCop: Does it follow conventions? ✓
2. SimpleCov: Did this line execute? ✓
3. ???: If this line were wrong, would tests catch it? ✗

if tests notice
– Killed: test failed on the mutation ✓
– Alive: test still passed ✗
– If you change a line and tests still pass, what are those tests actually testing?

conflict?(booking_a, booking_b, buffer: 299)

– Tests pass.
– Mutation is alive.
– If you change a line and tests still pass, what are those tests actually testing?

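To make the killed/alive distinction concrete, here is a minimal sketch of the `buffer: 299` mutation checked by hand. The body of `conflict?`, the `booking` helper, and both suites are hypothetical reconstructions, not the code from the talk; the sketch assumes times are plain integers in seconds and that the original buffer was 300.

```ruby
# Hypothetical reconstruction, not the talk's code: two bookings conflict when
# the gap between them (in seconds) is shorter than the buffer.
def conflict?(booking_a, booking_b, buffer:)
  earlier, later = [booking_a, booking_b].sort_by { |b| b[:start_at] }
  later[:start_at] - earlier[:end_at] < buffer
end

# Hypothetical helper that builds a booking hash.
def booking(start_at, end_at)
  { start_at: start_at, end_at: end_at }
end

# A suite receives the buffer under test and returns true when every check passes.
# This one only probes values far away from the boundary.
def weak_suite(buffer)
  first = booking(0, 3600)
  conflict?(first, booking(3600, 7200), buffer: buffer) == true &&
    conflict?(first, booking(7200, 9000), buffer: buffer) == false
end

# This one pins the boundary: a 299 s gap conflicts, a 300 s gap does not.
def boundary_suite(buffer)
  first = booking(0, 3600)
  conflict?(first, booking(3899, 7200), buffer: buffer) == true &&
    conflict?(first, booking(3900, 7200), buffer: buffer) == false
end

# A mutation is alive when the suite still passes against the mutated value.
def mutation_status(suite_name, mutated_buffer)
  send(suite_name, mutated_buffer) ? :alive : :killed
end

weak_result = mutation_status(:weak_suite, 299)
boundary_result = mutation_status(:boundary_suite, 299)
puts "weak suite: #{weak_result}, boundary suite: #{boundary_result}"
# prints "weak suite: alive, boundary suite: killed"
```

Both suites pass against the assumed original buffer of 300, and both would show the same line coverage. Only the suite that pins the boundary notices the change to 299.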
booking_a.fetch(:cancelled) || booking_b[:cancelled]

- if booking_a[:room_id] != booking_b[:room_id]
+ if !booking_a[:room_id].eql?(booking_b[:room_id])

- if booking_a[:room_id] != booking_b[:room_id]
+ if !booking_a[:room_id].equal?(booking_b[:room_id])

- (earlier, later) = [booking_a, booking_b].sort_by { |h| h[:start_at] }
+ (earlier, later) = [booking_a, booking_b]

- h[:start_at]
+ nil

– Operates on the AST, not text
– Every mutation is semantically valid
– Every mutation is a question

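The `!=` to `eql?` and `equal?` mutations ask which kind of equality the code actually needs. The sketch below shows where the three operators disagree in plain Ruby. The method names and sample room ids are hypothetical, chosen only to show which test inputs could tell the variants apart.

```ruby
# Original comparison and the two mutated variants from the diff above.
def different_room?(a, b)
  a[:room_id] != b[:room_id]            # value equality via ==
end

def different_room_eql?(a, b)
  !a[:room_id].eql?(b[:room_id])        # value equality without numeric coercion
end

def different_room_equal?(a, b)
  !a[:room_id].equal?(b[:room_id])      # object identity
end

# Hypothetical inputs: same integer id, integer vs float, two distinct String objects.
int_pair    = [{ room_id: 7 }, { room_id: 7 }]
mixed_pair  = [{ room_id: 7 }, { room_id: 7.0 }]
string_pair = [{ room_id: String.new("A7") }, { room_id: String.new("A7") }]

verdicts = [int_pair, mixed_pair, string_pair].map do |a, b|
  [different_room?(a, b), different_room_eql?(a, b), different_room_equal?(a, b)]
end

verdicts.each { |row| p row }
# [false, false, false]  integers: all three agree, so both mutants stay alive
# [false, true, true]    7 vs 7.0: == coerces, eql? and equal? do not
# [false, false, true]   equal strings, distinct objects: only equal? differs
```

If every test uses small integer ids, no test can separate the three variants. Under these assumptions, a test with string ids would kill the `equal?` mutant, and the `eql?` mutant only matters if mixed numeric types can really occur.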
– Both return the same value when the key exists.
– No test can distinguish them.
– 6 mutations alive.
– This is not a test problem. This is a code problem.

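The two lookups in question are `Hash#[]` and `Hash#fetch`. A short sketch of where they agree and where they part ways follows; the sample hashes and helper names are hypothetical.

```ruby
def read_with_brackets(booking)
  booking[:cancelled]
end

def read_with_fetch(booking)
  booking.fetch(:cancelled)
end

# Returns the block's value, or the exception class if the lookup raises KeyError.
def outcome
  yield
rescue KeyError => e
  e.class
end

complete   = { room_id: 7, cancelled: false }
incomplete = { room_id: 7 }                    # hypothetical booking without the key

with_key = [
  outcome { read_with_brackets(complete) },
  outcome { read_with_fetch(complete) }
]
without_key = [
  outcome { read_with_brackets(incomplete) },
  outcome { read_with_fetch(incomplete) }
]

p with_key     # [false, false]
p without_key  # [nil, KeyError]
```

The variants differ only when the key is missing. One reading of "this is a code problem" is that the code has to decide whether a booking without `:cancelled` is a legal input: if it is not, `fetch` states that intent and the mutation can be accepted; if it is, a test with a missing key pins the `[]` behaviour.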
three valid responses:
– A: Write a test. The behavior matters, pin it.
– B: Fix the bug. The mutation found a real defect.
– C: Remove the code. The code was dead weight.

≥ 90%

Translation: 90% of code must be executed during tests
Not: 90% of code must be verified by tests

priority logic, source assignment, return
end

# Mutant: replaced with
if true              # alive
if user_roof_age     # alive
if self.present?     # alive

– Every mutation on this condition is alive.
– The condition was irrelevant to the only test.

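A condition becomes irrelevant to a test when the test only ever takes one branch. The sketch below reproduces that situation with a hypothetical method; `resolve_roof_age`, its keyword arguments, and the returned hash are illustrative assumptions, not the code on the slide.

```ruby
# Hypothetical original: prefer the user's answer, fall back to the record.
def resolve_roof_age(user_roof_age:, record_roof_age:)
  if user_roof_age
    { age: user_roof_age, source: :user }
  else
    { age: record_roof_age, source: :record }
  end
end

# The same method with the condition replaced by `if true`, as a mutant would be.
def resolve_roof_age_mutant(user_roof_age:, record_roof_age:)
  if true
    { age: user_roof_age, source: :user }
  else
    { age: record_roof_age, source: :record }
  end
end

# The only existing test: the user supplied a value.
def only_test(impl)
  send(impl, user_roof_age: 12, record_roof_age: 30) == { age: 12, source: :user }
end

# The missing test: no user value, so the record must win.
def fallback_test(impl)
  send(impl, user_roof_age: nil, record_roof_age: 30) == { age: 30, source: :record }
end

alive_with_only_test   = only_test(:resolve_roof_age_mutant)
killed_by_fallback     = !fallback_test(:resolve_roof_age_mutant)
original_passes_both   = only_test(:resolve_roof_age) && fallback_test(:resolve_roof_age)

puts "alive with only test: #{alive_with_only_test}"
puts "killed by fallback test: #{killed_by_fallback}"
```

The single test executes the `if` line, so coverage reports it as covered, yet the line can be replaced by `if true` without any failure. One test that takes the other branch is enough to kill the mutant.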
over a decade of development
– Largest set of mutation operators
– Dynamic language = more token-efficient context window
– Running mutant via CLI costs zero tokens.

actions:

A) Keep the mutated code: Your tests specify the correct semantics, and the original code is redundant. Accept the mutation.

B) Add a missing test: The original code is correct, but the tests do not verify the behavior the mutation removed.

Agent writes a real test. Not pattern noise.
But you have to wire it up; the LLM won't do it on its own.

pre-commit hook

require_relative 'dev_workflow/steps/rubocop_step'
require_relative 'dev_workflow/steps/rspec_step'
require_relative 'dev_workflow/steps/mutant_step'

def run_mutant(subjects)
  run_command(
    "bundle exec mutant run --since HEAD~1 #{subject_args}"
  )
end

– Each step returns .skipped, .success, or .failure.
– The agent reads the output and fixes what broke.
– The hook makes running it non-negotiable.

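The slide shows only fragments of the workflow, so the following is a sketch of how steps returning `.skipped`, `.success`, or `.failure` could be chained. `StepResult`, `CommandStep`, and `run_workflow` are hypothetical names, and the commands are stand-in Ruby one-liners rather than real RuboCop, RSpec, or mutant invocations.

```ruby
require "open3"
require "rbconfig"

# Hypothetical result object; the slide only names the three outcomes.
class StepResult
  attr_reader :status, :output

  def self.skipped(output = "")
    new(:skipped, output)
  end

  def self.success(output = "")
    new(:success, output)
  end

  def self.failure(output = "")
    new(:failure, output)
  end

  def initialize(status, output)
    @status = status
    @output = output
  end

  def failure?
    status == :failure
  end
end

# Hypothetical step: runs a command and maps its exit status to a StepResult.
class CommandStep
  def initialize(name, command, skip: false)
    @name = name
    @command = command
    @skip = skip
  end

  def call
    return StepResult.skipped("#{@name}: nothing to check") if @skip

    output, status = Open3.capture2e(*@command)
    status.success? ? StepResult.success(output) : StepResult.failure(output)
  end
end

# Runs steps in order and stops at the first failure, so its output is what gets read.
def run_workflow(steps)
  steps.each do |step|
    result = step.call
    return result if result.failure?
  end
  StepResult.success("all steps passed")
end

ruby = RbConfig.ruby
steps = [
  CommandStep.new("lint", [ruby, "-e", "puts 'lint ok'"]),
  CommandStep.new("specs", [ruby, "-e", "exit 0"], skip: true),
  CommandStep.new("mutation", [ruby, "-e", "warn 'evil mutant alive'; exit 1"])
]

result = run_workflow(steps)
hook_exit_code = result.failure? ? 1 : 0   # a real hook would `exit hook_exit_code`
puts result.output
```

A non-zero exit code from a pre-commit hook aborts the commit, which is what makes the check non-negotiable. The captured output of the failing step is the text an agent would read to decide what to fix.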
feedback, but that's your tests, not mutant
– Setup friction with unusual require chains
– Commercial: $90 → $30/dev/month, $900 → $250 annual, OSS FREE
– Not a silver bullet: another element of the safety net

|:--|:--|
| SimpleCov | Did this code execute? |
| RuboCop | Does it follow conventions? |
| **Mutant** | **Would tests catch a bug here?** |

Remember the ??? from earlier? Mutant is the answer.