Marabesi
June 18, 2026

CISTI 2026 - Test-Driven Development Versus Test Smells: An Empirical Study With Students From UFABC University

Context: The practice of writing tests before production code (Test-Driven Development, TDD for short) has become widespread in the software community, with the aim of improving code quality and reducing defects. However, inconsistent findings on TDD’s effects leave room to explore its impact on test code.

Objectives: This pilot study investigated whether TDD can reduce the prevalence of test smells in student-written code, addressing the question of whether TDD improves test code quality.

Method: We employed a controlled intervention within an undergraduate programming context. First, students developed the String Calculator kata using their usual, unconstrained approach; we then introduced TDD, and afterwards students re-implemented the same kata following TDD principles. Test code quality was assessed with established test smell detection tools, and qualitative feedback was gathered through student interviews.

Results: Despite improvements in some code quality metrics, such as coverage, students continued to produce "Magic Number" test smells in their test code.

Conclusion: This pilot study highlights the need for educators to emphasize comprehensive test design alongside TDD implementation, acknowledging the limitations inherent in a small-scale exploratory research setting.
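
The String Calculator kata used in the study asks for a function that sums numbers supplied as a string. The deck does not say which kata steps students completed, so the Python sketch below assumes only the commonly cited first rules: an empty string returns 0, and numbers separated by commas or newlines are summed. The function name `add` is an assumption for illustration, not the students' code.

```python
import re


def add(numbers: str) -> int:
    """Sum the integers in `numbers`, separated by commas or newlines.

    Assumed rules (early kata steps only): an empty string yields 0.
    Custom delimiters and negative-number checks are not covered.
    """
    if numbers == "":
        return 0
    # Accept either delimiter from the early kata steps.
    parts = re.split(r"[,\n]", numbers)
    return sum(int(part) for part in parts)


if __name__ == "__main__":
    print(add(""))        # prints 0
    print(add("1"))       # prints 1
    print(add("1,2\n3"))  # prints 6
```

Later kata steps (custom delimiters, rejecting negatives) would each start with a new failing test before the function grows.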

Transcript

  1. Test-Driven Development versus Test Smells
    Matheus Marabesi¹, Alicia García-Holgado¹, Francisco José García-Peñalvo¹, Juliana Cristina Braga², Ismar Frango Silveira³
    ¹Universidad de Salamanca, Spain · ²UFABC, Brazil · ³Mackenzie, Brazil
    CISTI 2026
  2. Outline
    1. Introduction
    2. Motivation and research question
    3. Study design
    4. Results
    5. Conclusions, limitations and future work
  3. Test-Driven Development - TDD
    Common terms used in this presentation:
    • Production code: the code that will be executed by the client
    • Test code: the code written to test the production code
    • SonarQube: a software quality assessment tool
    • Test smells: design issues in the test code
    The Chicago school (Classicist) style of TDD was used.
    TDD cycle: 1. Red 2. Green 3. Refactor
    TDD shifts the software development process: tests are written before the production code.
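
One Red-Green-Refactor iteration can be sketched in Python for the FizzBuzz kata, which the results slides also mention. This is an illustration under assumptions, not the students' code: the `fizzbuzz` function and the test names are hypothetical. In the Red step the test class is written first and fails because `fizzbuzz` does not exist yet; in the Green step just enough code is added to make the tests pass; in the Refactor step the code is tidied while the tests stay green. The tests assert on returned values rather than on interactions, in line with the classicist style named on the slide.

```python
import unittest


# Green (then Refactor): the simplest implementation that satisfies the
# tests below, written only after those tests had failed (Red).
def fizzbuzz(n: int) -> str:
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Red: these state-based tests are written first and fail until
# fizzbuzz exists and behaves as specified.
class FizzBuzzTest(unittest.TestCase):
    def test_plain_number_is_echoed(self):
        self.assertEqual(fizzbuzz(1), "1")

    def test_multiple_of_three_is_fizz(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

    def test_multiple_of_five_is_buzz(self):
        self.assertEqual(fizzbuzz(5), "Buzz")

    def test_multiple_of_fifteen_is_fizzbuzz(self):
        self.assertEqual(fizzbuzz(15), "FizzBuzz")
```

Saved under an assumed file name such as `fizzbuzz_test.py`, the tests can be run with Python's built-in runner, for example `python -m unittest fizzbuzz_test.py`.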
  4. 2. Motivation
    • TDD (Test-Driven Development) has been widely promoted for improving code quality and reducing defects
    • Yet empirical results on TDD’s effects remain mixed and inconclusive
    • Most research focuses on production code quality
    • The quality of test code itself is often overlooked
      ◦ Poorly written tests → test smells → maintenance challenges in the test code, an extra maintenance layer on top of the production code
    Research gap: limited research examines whether practicing TDD reduces the prevalence of test smells.
  5. 2. Motivation - research question
    Study type
    • Pilot study
    • Controlled intervention
    • Undergraduate context (UFABC)
    What we measured
    1. Production code quality (SonarQube)
    2. Test smells (detection tools)
    3. Student perceptions (interviews)
    Research question: Does TDD reduce the prevalence of test smells?
  6. 3. Study Design – Overview
    Figure: timeframe of the controlled intervention across the course span.
    Course context
    • Software Engineering course
    • 3-month duration, classes twice a week
    • Scrum framework
  7. 3. Study design - Context and Sample
    Course context
    • 38 students across 10 groups
    Final sample
    • 20 out of 31 submitted exercises
    • 8 invalid or incomplete: the exercise was missing, or the submitted code was not part of the requested task
    • 12 valid student submissions
    Student profiles
    • Programming skill: 7 intermediate, 3 advanced, 2 basic
    • Industry experience: 6 with, 6 without
    • TDD knowledge: the majority had no prior experience
  8. 3. Study design - Test Smell Tools
    • SonarQube used for broader code quality (reliability, maintainability, duplication)
    • Code coverage measured per language
    Language     Tool       Reference
    C#           xNose      Paul et al. (2024)
    Java         tsDetect   Peruma et al. (2020)
    JavaScript   SNUTS.js   Oliveira et al. (2024)
    Python       PyNose     Wang et al. (2021)
    TypeScript   Manual     Garousi & Küçük (2018)
  9. 4. Results – Code Quality Metrics
    Figure panels: left, completeness rate; middle, maintainability issues; right, coverage comparison.
    • Reliability: A score in all cases (before and after)
    • Maintainability: more issues after TDD (likely due to higher completeness)
    • Coverage: improved after TDD across languages
  10. 4. Results – Test Smells
    Figure panels: left, test smells in FizzBuzz + String Calculator; right, number of test cases in FizzBuzz + String Calculator.
  11. 4. Results - Interview Findings – Adoption Decisions
    Takeaway: adoption was driven by assessment of the project context, not by prior knowledge alone.
    Group      TDD adopted                       Reasoning
    Group 1a   Yes                               Automated checks without manual testing
    Group 1b   No (had experience)               Split decision within the team
    Group 2    No                                Wanted to see the app working first
    Group 3    No                                “It was just a prototype”
    Group 4    No (used intermittently before)   One member had experience
  12. 4. Results - Interview Findings – Benefits and Challenges
    Perceived benefits
    • Confidence in the code
    • Helped with refactoring
    • Better awareness of error handling
    • Improved problem-solving thinking
    • Found hidden bugs
    Reported challenges
    • Test doubles / mocking complexity
    • “No way to run a test without knowing the output”
    • Perceived overhead for simple projects
    • Inconsistent application of TDD principles
    Student quote: “It kept the tests running and helped with refactoring.” – Group 1a
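
Students named test doubles and mocking as a hurdle. As a hedged illustration only, the sketch below uses Python's standard `unittest.mock.Mock` to replace a collaborator. The `Calculator` class and its logger's `write` method are hypothetical, loosely modelled on interaction-based extensions of the String Calculator kata; they are not taken from the study. The test swaps the real logger for a double and checks the interaction as well as the return value.

```python
from unittest.mock import Mock


class Calculator:
    """Hypothetical calculator that reports every result to a logger."""

    def __init__(self, logger):
        # The logger is injected so that a test can substitute a double.
        self._logger = logger

    def add(self, numbers: str) -> int:
        total = sum(int(part) for part in numbers.split(",")) if numbers else 0
        self._logger.write(total)  # side effect that the test verifies
        return total


def test_add_reports_sum_to_logger():
    logger = Mock()  # test double standing in for a real logger
    calculator = Calculator(logger)

    result = calculator.add("1,2")

    assert result == 3
    # Interaction check: the mock records every call made to it.
    logger.write.assert_called_once_with(3)


test_add_reports_sum_to_logger()
```

Under pytest, the final direct call can be dropped, since functions prefixed with `test_` are collected automatically.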
  13. 4. Results - Interview Findings – Adoption
    Students did not reject TDD; they assessed its applicability based on context:
    • Group 1b: would use TDD for authentication and complex logic, not for simple projects
    • Group 2: TDD in the backend helped identify errors
    • Group 3: “Not worth it for university projects”, but makes sense professionally
    • Group 4: would use TDD with documentation support
    “Using TDD, it’s easier to find errors that a user might encounter”
    Takeaway: exposure to TDD is necessary but not sufficient for consistent adoption.
  14. 5. Conclusions
    • Code reliability: consistently high (A score) regardless of TDD
    • Code coverage: improved after the TDD intervention
    • Completeness: higher after TDD (but confounded by prior exposure)
    • Maintainability: more issues after TDD (likely due to the higher completeness rate and more code written)
    • Adoption decisions were context-driven, not knowledge-driven
    • Positive student perceptions did not translate into better test code quality
    • Core issue: not just writing tests, but designing them (mocking, test doubles)
    • Explicit training on test quality is needed alongside TDD instruction
    • TDD alone was not sufficient to prevent test smells
    • The Magic Number smell persisted across all languages
    • Implication for educators: emphasize comprehensive test design alongside TDD instruction, not just the Red-Green-Refactor cycle
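
The Magic Number smell, as commonly defined (for example, by tsDetect), flags an assertion that takes a bare numeric literal as an argument. The Python sketch below contrasts a smelly test with one common refactoring: naming the input and the expected value. The `add` function is a hypothetical stand-in for a student's String Calculator, and whether a particular detector stops flagging the refactored test depends on that tool's own rules.

```python
import unittest


def add(numbers: str) -> int:
    """Hypothetical String Calculator entry point: comma-separated sum."""
    return sum(int(part) for part in numbers.split(",")) if numbers else 0


class MagicNumberSmellTest(unittest.TestCase):
    def test_sum(self):
        # Smell: the bare literal 6 gives no hint of why it is expected.
        self.assertEqual(add("1,2,3"), 6)


# Named constants make the expectation self-explanatory; the derivation
# 1 + 2 + 3 is spelled out instead of hidden behind a bare result.
THREE_SMALL_NUMBERS = "1,2,3"
SUM_OF_THREE_SMALL_NUMBERS = 1 + 2 + 3


class NamedExpectationTest(unittest.TestCase):
    def test_sum_of_three_small_numbers(self):
        # Refactored: no numeric literal is passed to the assertion.
        self.assertEqual(add(THREE_SMALL_NUMBERS), SUM_OF_THREE_SMALL_NUMBERS)
```

Both tests check the same behaviour; only the second states its intent, which is the kind of test design concern the conclusions point to.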
  15. 5. Conclusions - Limitations and Future Work
    Limitations
    • Small sample size (12 valid submissions)
    • Single language per student (only 1 student each for C#, Java, and TypeScript)
    • Prior exposure to the kata may confound results
    • The test smell tools differ per language and detect different sets of smells
    Future directions
    • Replicate with professional developers
    • Study test smells in scenarios involving databases and external services
    • Develop multi-language test smell detection tools
    • Investigate teaching strategies that combine TDD with test quality
  16. References
    K. Beck, Test Driven Development: By Example, Addison-Wesley, 2002.
    F. Anwer et al., “Agile software development models TDD, FDD, DSDM, and Crystal methods: A survey,” Int. J. Multidiscip. Sci. Eng., vol. 8, no. 2, pp. 1–10, 2017.
    M. Ghafari et al., “Why research on test-driven development is inconclusive?” in Proc. ESEM, pp. 1–10, 2020.
    V. Garousi and B. Küçük, “Smells in software test code: A survey of knowledge in industry and academia,” J. Syst. Softw., vol. 138, pp. 52–81, 2018.
    M. Aniche and M.A. Gerosa, “Does test-driven development improve class design? A qualitative study on developers’ perceptions,” J. Braz. Comput. Soc., vol. 21, no. 1, p. 15, 2015.
    A. Nanthaamornphong and S. Bressan, “The empirical study: Encouraging students’ interest in software development using TDD,” Tehnički glasnik, vol. 13, no. 4, pp. 267–274, 2019.
    A. Peruma et al., “tsDetect: An open source test smells detection tool,” in Proc. ESEC/FSE, pp. 1650–1654, 2020.
    Wang et al., “PyNose: A test smell detector for Python,” 2021.
    J. Oliveira et al., “SNUTS.js: Sniffing nasty unit test smells in JavaScript,” in Proc. SBES, pp. 720–726, 2024.
    P.P. Paul et al., “xNose: A test smell detector for C#,” in Proc. ICSE-Companion, pp. 370–371, 2024.