Empirical Software Engineering as a Science: Challenges and Ways Forward

Robert Feldt
September 19, 2019

The Empirical Software Engineering (ESE) community has made great progress in the last 20 years and expanded the field considerably in scope, volume, and quality. Nowadays, we have established conferences and journals focused on the area, and a majority of the papers published in top SE conferences such as ICSE are empirical. However, while more established scientific fields such as Physics, Biology, and Psychology have clear identities, specific schools of thought, and explicated research methods, I argue this is less so in ESE.

In this talk, I propose an updated manifesto for empirical software engineering and discuss some challenges and possible fixes to address them. This, I hope, can give a clearer sense of identity as well as act as a vision for next steps. In particular, I discuss the negative effects of our love for novelty (neophilia): how it fuels publication bias and challenges our search for truth. I also summarize the ongoing debate among statisticians about how to move beyond p-values, as well as some ideas for how to improve empirical studies that use qualitative methods. I will discuss some strategies for improving the reliability and validity of our ESE research and conclude with concrete calls for action so that we can be an even stronger science going forward.

Transcript

  1. Empirical Software Engineering as a Science: Challenges and Ways Forward

    ESEM 2019, Porto de Galinhas. Robert Feldt.
  2. About me

  3. Preamble

    Our time here is limited. I only know a little bit about this. You know this already. When negative, I criticise myself as much as us/you.
  4. We won!

  5. Our venues are increasing in size and importance: +50% in submissions in 2018 (vs. 2017)
  6. Empirical SE concepts in ICSE

    Median number of empirical "concepts" mentioned per ICSE paper (for 20 random ICSE papers, per year):

    Concept     1999  2009  2019
    Experiment   1.0   1.0   7.5
    Empiric*     0.5   1.0   3.0
    Validity     0.0   0.5   1.0
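The median-per-paper tally behind the table above could be sketched as follows. Everything here is a hypothetical reconstruction: the concept terms and the three mini "papers" are made-up placeholders, not the talk's actual data or tooling.

```python
import statistics

# Count (case-insensitive) occurrences of each concept term per paper, then
# take the median across the sampled papers. The prefix "empiric" matches
# "empirical", "empirically", etc., mimicking the slide's "Empiric*" wildcard.
CONCEPTS = ("experiment", "empiric", "validity")

def concept_counts(paper_text):
    """Occurrences of each concept term in one paper's text."""
    text = paper_text.lower()
    return {c: text.count(c) for c in CONCEPTS}

def median_counts(papers):
    """Median per-paper count for each concept across the sample."""
    return {c: statistics.median(concept_counts(p)[c] for p in papers)
            for c in CONCEPTS}

# Made-up placeholder "papers":
papers = [
    "We ran an experiment; the experiment had threats to validity.",
    "An empirical study. Construct validity was assessed.",
    "A controlled experiment with empirical evidence.",
]
result = median_counts(papers)
```

In the real study one would load full paper texts and sample 20 papers per year; the median (rather than the mean) keeps a few concept-heavy papers from dominating.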
  7. Increasing use of statistical analysis (https://arxiv.org/abs/1706.00933, accepted for publication in JSS, July 2019)
  8. Increasing use of statistical analysis (https://arxiv.org/abs/1706.00933, accepted for publication in JSS, July 2019)

    [Chart legend: Quantitative, Stat. Test, Parametric, Nonparametric]
  9. Increasing use of statistical analysis (https://arxiv.org/abs/1706.00933, accepted for publication in JSS, July 2019)
  10. But…. Identity? Real progress? Next steps?

  11. Manifesto for Empirical Software Engineering

    Through systematic research we are uncovering a science of software engineering so that we can better help software practitioners. Through this work we have come to value:

    Empirical evidence over theoretical & formal arguments
    Systematic & explicit methods over one-off, unique studies
    Practical context & impact over clean but simplified lab studies

    That is, while there is value in the items on the right, we value the items on the left more.
  12. Manifesto for Empirical Software Engineering 2.0

    Empirical evidence over theoretical & formal arguments
    Systematic & explicit methods over one-off, unique studies
    Practical context & impact over clean but simplified lab studies
    Truth over novelty, relevance and importance
    Plurality & nuance over simple, dichotomous claims
    Human factors over algorithms & technology
    Explanations & theories over descriptions of data at hand
  13. Some threats to finding the Truth (from Munafò et al., "A Manifesto for Reproducible Science", Nature, 2017)
  14. A Truth root challenge: Neophilia

  15. Some effects of Neophilia

    Publication bias / "results paradox": We accept clear and positive results (p < 0.05) while rejecting "negative" or inconclusive ones.
    Isolated paper islands: Authors must create a new model, system, solution, or idea rather than replicating and building on what is already there.
    HARKing: changing the Hypothesis After Results are Known.
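The publication-bias effect described above can be illustrated with a small simulation; the true effect size, sample size, and the crude z-test threshold are illustrative assumptions, not figures from the talk.

```python
import random
import statistics

def simulate_publication_bias(true_effect=0.2, n=20, trials=2000, seed=1):
    """Run many small hypothetical studies of a weak true effect and compare
    the average observed effect across all studies vs. only the "significant"
    ones. All numbers here are illustrative assumptions."""
    random.seed(seed)
    all_effects, published = [], []
    for _ in range(trials):
        sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        all_effects.append(mean)
        if abs(mean / se) > 1.96:  # crude two-sided test at alpha = 0.05
            published.append(mean)
    return statistics.fmean(all_effects), statistics.fmean(published)

unbiased, biased = simulate_publication_bias()
# The "published-only" average overestimates the true effect: selecting on
# p < 0.05 filters in exactly the studies that overshot by chance.
```

With the average over all studies close to the true effect and the "published" average well above it, the simulation shows why a literature filtered by significance paints too rosy a picture.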
  16. Truth Fix: (Pre-)Registered Reports (illustration by David Parkins in Nature, September 2019)
  17. Truth Fix: (Pre-)Registered Reports

    A form of self-blinding, the next step after double blind! 200+ journals today offer pre-registration (including MSR and EMSE). Acceptance rate in stage 2: 90% (Cortex journal). Null results: 66% for RR replications, 50% for novel RRs, 5-20% for non-RRs.
  18. Counterpoint: (Pre-)Registered Reports

    RRs are for confirmatory, hypothesis-driven research. They are not a good fit for more exploratory work. Alternative: Exploratory Reports?
  19. Counterpoint: (Pre-)Registered Reports

  20. Truth & Nuance Fix: Beyond p-values

  21. Truth & Nuance Fix: Beyond p-values

  22. Truth & Nuance Fix: What instead of p-values?

    Ioannidis: alpha = 0.005! Greenland & 800 signatories: Stop dichotomising! Compatibility intervals! Wagenmakers: Bayes factors! Gelman: No tests, just full Bayesian analysis!
  23. Truth & Nuance Fix: What instead of p-values?

    Now: Lower alpha, acknowledge the problem, study compatibility intervals and how to report on them!
    Medium-term: Educate yourself about Bayesian analysis.
    Longer-term: Start using flexible Bayesian models. When causal analysis matures, learn it.
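As a minimal sketch of the "study compatibility intervals" advice, assuming a normal approximation and made-up measurement data:

```python
import statistics

def compatibility_interval(sample, crit=1.96):
    """95% compatibility interval for a sample mean (normal approximation).
    Reports the range of effect sizes compatible with the data instead of a
    dichotomous significant/non-significant verdict. Illustrative sketch only;
    for small samples a t-distribution critical value would be preferable."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - crit * se, mean + crit * se

# Hypothetical per-project differences between two techniques (made-up data).
diffs = [0.8, -0.2, 1.1, 0.4, 0.0, 0.9, 0.3, -0.1, 0.7, 0.5]
lo, hi = compatibility_interval(diffs)
# Report the whole interval ("effects from lo to hi are compatible with the
# data") rather than only whether it excludes zero.
```

Reporting the full interval keeps the nuance the slide asks for: a reader sees both how large the effect might plausibly be and how uncertain the estimate is, instead of a single yes/no verdict.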
  24. Truth & Nuance Fix: What instead of p-values? (https://arxiv.org/abs/1811.05422, accepted for publication in TSE, July 2019)
  25. Nuance Challenge: Pseudo-profound bullshit

  26. Humans & Plurality Fix: Lifting Qualitative Methods

    1. Use a broader set of qualitative methods from social science!
    2. Emphasize reflexivity! The researcher is part of the social world she studies, and the relationship to participants is explicit & transparent.
    3. Adapt & employ existing qualitative checklists!
  27. Humans & Plurality Fix: Standards & Checklists

  28. Humans & Plurality Fix: Lifting Qualitative Methods (https://arxiv.org/abs/1712.08341, rejected and in revision since mid-2018… ;)
  29. Shameless plugs: Replications & Open Science, Room Baobà 4!!!

  30. I’ll throw in some Calls-for-action!

  31. Call-for-action!

    Remember why you went into science in the 1st place: Seek truth & improve society. Don't fall for competition, politics, & the "numbers game".
    Learn to write succinctly: Don't spread pseudo-profound bullshit.
    Use diverse research methods: A broader knowledge base equips us for pluralism & nuance.
    Think deeply about actual threats to validity: Don't use them as a "recipe" or "copy-n-paste".
  32. Call-for-action!

    Avoid "lamppost science": Just because we have repositories, logs, and DBs doesn't mean they have the information we truly need or should analyse.
    Practice Open Science & try pre-registration: Don't wait for venues; arXiv, GitHub, & Zenodo are your friends.
    Don't preach "One paper, one message!" too strongly: Find a balance between simplicity and shallow thinking / over-simplification. Consider and discuss alternative explanations.
    Raise the bar on statistical analysis: NHST is so 20th century. Causal analysis & Bayesian is the future.
  33. Call-for-action!

    Help create shared visions for the community: Multiple schools of thought are OK, if clear & explicit and actively discussed.
    Standardise quality checklists and guidelines: Help authors and peer reviewers. Build on what is there and adapt it to ESE.
    Stop the "numbers game"! "Publish or Perish" can introduce bias that hinders truth. Take responsibility in evaluations/promotions & discussion.
    Continuous learning, also from other fields: They know stuff. You'll learn. Keep on learning & sharing.
  34. Credits

    "Replication is the immune system of science" / Prof. Chris Chambers. Prof. Brian Nosek, Centre for Open Science & OSF. All my co-authors, colleagues and mentors!
  35. The End robert.feldt@chalmers.se TODAY @ 13:30 in Baobà 4

  36. Backup Slides

  37. More (SE) challenges, but no time today…

    Primacy of quantitative data & 'objective' methods. Replication crisis in (soft) SE. Open Science with qualitative data. p-curve analysis. And so much more…
  38. Humans & Plurality Fix: Lifting Qualitative Methods

  39. Partial Statistical Analysis of ESEM Outcome

                             2008  2010  2014  2016  2017  2018
    # papers                   25    30    37    27    32    33
    No statistics / Unknown    10     8    12     6     8     6
    Qualitative                 4     7     5     6     6     9
    Quantitative               12    18    21    18    18    20
    Qual + Quant                1     3     2     3     1     3
    Effect size                 3     6     9     9    10     7
  40. Increasing use of statistical analysis (https://arxiv.org/abs/1706.00933, accepted for publication in JSS, July 2019)
  41. Truth Fix: (Pre-)Registered Reports https://github.com/emsejournal/openscience/blob/master/registered-reports.md

  42. Truth Fix 1: (Pre-)Registered Reports