Empirical Software Engineering as a Science: Challenges and Ways Forward

Robert Feldt
September 19, 2019

The Empirical Software Engineering (ESE) community has made great progress in the last 20 years and expanded the field considerably in scope, volume, and quality. Nowadays, we have established conferences as well as journals focused on the area, and a majority of the papers published in top SE conferences such as ICSE are empirical. However, while more established scientific fields such as Physics, Biology and Psychology have clear identities, specific schools of thought, and explicated research methods, I argue this is less so in ESE.

In this talk, I propose an updated manifesto for empirical software engineering and discuss some of its challenges and possible fixes for them. This, I hope, can give a clearer sense of identity as well as act as a vision for next steps. In particular, I discuss the negative effects of our love for novelty (neophilia), how it drives publication bias, and how it makes it harder to find truth. I also summarize the ongoing debate among statisticians about how to move beyond p-values, as well as some ideas for how to improve empirical studies that use qualitative methods. I discuss strategies for improving the reliability and validity of our ESE research and conclude with concrete calls-for-action so that we can be an even stronger science going forward.

Transcript

  1. Empirical Software Engineering as a Science:
    Challenges and Ways Forward
    ESEM 2019, Porto de Galinhas

    Robert Feldt


  2. About me


  3. Preamble
    Our time here is limited
    I only know a little bit about this
    You know this already
    When negative, I criticise myself as much as us/you


  4. We won!


  5. Our venues increasing in size and importance
    +50% in submissions in 2018 (vs 2017)


  6. Empirical SE concepts in ICSE
    Concept     1999  2009  2019
    Experiment   1.0   1.0   7.5
    Empiric*     0.5   1.0   3.0
    Validity     0.0   0.5   1.0
    Median number of empirical “concepts” mentioned per ICSE paper
    (for 20 random ICSE papers, per year)
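
    A minimal sketch of how such counts could be produced, assuming the
    sampled papers are available as plain text. The concept patterns, the
    helper names, and the toy sample below are hypothetical, not the actual
    procedure behind the table:

    import re
    from statistics import median

    # Hypothetical regex patterns for the three concepts in the table.
    CONCEPTS = {
        "Experiment": r"\bexperiment",
        "Empiric*": r"\bempiric",
        "Validity": r"\bvalidity",
    }

    def concept_counts(paper_text):
        """Count case-insensitive matches of each concept in one paper."""
        return {name: len(re.findall(pat, paper_text, flags=re.IGNORECASE))
                for name, pat in CONCEPTS.items()}

    def median_per_concept(paper_texts):
        """Median number of mentions per paper, for each concept."""
        counts = [concept_counts(t) for t in paper_texts]
        return {name: median(c[name] for c in counts) for name in CONCEPTS}

    # Toy usage; a real analysis would load ~20 randomly sampled papers per year.
    sample = ["We ran a controlled experiment ...", "Threats to validity ..."]
    print(median_per_concept(sample))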


  7. Increasing use of statistical analysis
    https://arxiv.org/abs/1706.00933 accepted for publication in JSS, July 2019


  8. Increasing use of statistical analysis
    https://arxiv.org/abs/1706.00933 accepted for publication in JSS, July 2019
    (Chart legend: Quantitative, Stat. Test, Parametric, Nonparametric)


  9. Increasing use of statistical analysis
    https://arxiv.org/abs/1706.00933 accepted for publication in JSS, July 2019


  10. But…
    Identity?
    Real progress?
    Next steps?


  11. Manifesto for Empirical Software Engineering
    Through systematic research we are
    uncovering a science of software engineering
    so that we can better help software practitioners.
    Through this work we have come to value:
    Empirical evidence over theoretical & formal arguments
    Systematic & explicit methods over one-off, unique studies
    Practical context & impact over clean but simplified lab studies
    That is, while there is value in the items on the right,
    we value the items on the left more.


  12. Manifesto for Empirical Software Engineering 2.0
    Empirical evidence over theoretical & formal arguments
    Systematic & explicit methods over one-off, unique studies
    Practical context & impact over clean but simplified lab studies
    Truth over novelty, relevance and importance
    Plurality & nuance over simple, dichotomous claims
    Human factors over algorithms & technology
    Explanations & theories over descriptions of data at hand


  13. Some threats to finding the Truth
    from Munafò et al., “A manifesto for reproducible science”, Nature Human Behaviour, 2017


  14. A Truth root challenge: Neophilia


  15. Some effects of Neophilia
    Publication bias / “results paradox”: We accept clear and positive
    results (p < 0.05) while rejecting “negative” or inconclusive ones.
    Isolated paper islands: Authors must create a new model, system,
    solution, or idea rather than replicating and building on what is
    already there.
    HARKing: Hypothesizing After the Results are Known.


  16. Truth Fix: (Pre-)Registered Reports
    Illustration by David Parkins in Nature, September 2019


  17. Truth Fix: (Pre-)Registered Reports
    MSR & EMSE
    A form of self-blinding, the next step after double-blind!
    200+ journals today offer pre-registration!
    Acceptance rate in stage 2: 90% (Cortex journal)
    Null results: 66% of RR replications, 50% of novel RRs, 5-20% of non-RRs


  18. Counterpoint: (Pre-)Registered Reports
    RRs are for confirmatory, hypothesis-driven research
    They are not a good fit for more exploratory work
    Alternative: Exploratory Reports?


  19. Counterpoint: (Pre-)Registered Reports


  20. Truth & Nuance Fix: Beyond p-values


  21. Truth & Nuance Fix: Beyond p-values


  22. Truth & Nuance Fix: What instead of p-values?
    Ioannidis: alpha = 0.005!
    Greenland & 800 signatories: Stop dichotomising! Compatibility intervals!
    Wagenmakers: Bayes factors!
    Gelman: No tests, just full Bayesian analysis! (see the sketch below)
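
    To make “full Bayesian analysis” concrete, here is a minimal sketch under
    loud assumptions: the paired differences are made up, the Normal(0, 3)
    prior is arbitrary, and the noise scale is plugged in from the sample. It
    grid-approximates the posterior of a mean difference; a real study would
    use Stan, PyMC, or similar:

    import numpy as np

    rng = np.random.default_rng(0)
    diffs = rng.normal(loc=-1.5, scale=2.0, size=20)  # made-up paired differences

    grid = np.linspace(-5, 5, 1001)           # candidate mean differences
    prior = np.exp(-0.5 * (grid / 3.0) ** 2)  # weak Normal(0, 3) prior, unnormalised
    sigma = diffs.std(ddof=1)                 # plug-in noise scale, for simplicity
    loglik = np.array([-0.5 * np.sum((diffs - mu) ** 2) / sigma**2 for mu in grid])

    post = prior * np.exp(loglik - loglik.max())  # posterior ~ prior * likelihood
    post /= post.sum()

    cdf = np.cumsum(post)
    lo, hi = grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)]
    print(f"Posterior mean {np.sum(grid * post):.2f}, "
          f"95% credible interval [{lo:.2f}, {hi:.2f}]")

    Unlike a single p-value, the result is a whole distribution over effect
    sizes that can be summarised, plotted, and reasoned about.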


  23. Truth & Nuance Fix: What instead of p-values?
    Now: Lower alpha, acknowledge the problem, and study compatibility
    intervals and how to report them! (see the sketch below)
    Medium-term: Educate yourself about Bayesian analysis
    Longer-term: Start using flexible Bayesian models.
    When Causal analysis matures, learn it.
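
    As a concrete version of the “Now” step, a minimal sketch of reporting a
    compatibility (confidence) interval for a mean difference instead of a
    bare p-value; the two samples are invented for illustration:

    import numpy as np
    from scipy import stats

    baseline = np.array([12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 11.9, 12.5])
    treatment = np.array([9.2, 8.5, 10.1, 8.9, 9.7, 10.4, 8.8, 9.5])

    diff = treatment.mean() - baseline.mean()
    va = treatment.var(ddof=1) / len(treatment)
    vb = baseline.var(ddof=1) / len(baseline)
    se = np.sqrt(va + vb)

    # Welch-Satterthwaite degrees of freedom (no equal-variance assumption)
    dof = (va + vb) ** 2 / (va**2 / (len(treatment) - 1) + vb**2 / (len(baseline) - 1))

    t_crit = stats.t.ppf(0.975, dof)
    lo, hi = diff - t_crit * se, diff + t_crit * se
    print(f"Mean difference {diff:.2f}, 95% compatibility interval [{lo:.2f}, {hi:.2f}]")

    The point is to discuss the whole interval, i.e. all effect sizes
    reasonably compatible with the data under the model, not just whether it
    excludes zero.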


  24. Truth & Nuance Fix: What instead of p-values?
    https://arxiv.org/abs/1811.05422 accepted for publication in TSE, July 2019


  25. Nuance Challenge: Pseudo-profound bullshit


  26. Humans & Plurality Fix: Lifting Qualitative Methods
    1. Use a broader set of Qual methods from Social Science!
    2. Emphasize Reflexivity! The researcher is part of the social world
    she studies, and the relationship to participants is explicit &
    transparent.
    3. Adapt & employ existing Qual checklists!


  27. Humans & Plurality Fix: Standards & Checklists


  28. Humans & Plurality Fix: Lifting Qualitative Methods
    https://arxiv.org/abs/1712.08341 rejected and in revision since mid-2018… ;)


  29. Shameless plugs: Replications & Open Science
    Room Baobà 4 !!!


  30. I’ll throw in some Calls-for-action!


  31. Call-for-action!
    Remember why you went into science in the first place
    Seek truth & improve society. Don’t fall for competition, politics, & the “numbers game”.
    Learn to write succinctly
    Don’t spread pseudo-profound bullshit.
    Use diverse research methods
    Broaden your knowledge base and be equipped for pluralism & nuance.
    Think deeply about actual threats to validity
    Don’t use them as a “recipe” to “copy-n-paste”.


  32. Call-for-action!
    Avoid “lamppost science”
    Just because we have repositories, logs, and DBs doesn’t mean they have the information we truly need or should analyse.
    Practice Open Science & try Pre-Registration
    Don’t wait for venues; arXiv, GitHub, & Zenodo are your friends.
    Don’t preach “One paper, one message!” too strongly
    Find a balance between simplicity and shallow thinking / over-simplification. Consider and discuss alternative explanations.
    Raise the bar on statistical analysis
    NHST is so 20th century. Causal analysis & Bayesian methods are the future. (see the sketch below)
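
    A minimal sketch of what “causal analysis” can buy us, on simulated data:
    a naive group comparison is distorted by a confounder, and a backdoor
    (stratified) adjustment recovers something closer to the true effect. The
    scenario (TDD, experience, defect counts) and all numbers are invented:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    experience = rng.integers(0, 2, n)                  # confounder: senior (1) / junior (0)
    uses_tdd = rng.binomial(1, 0.2 + 0.5 * experience)  # seniors adopt TDD more often
    defects = rng.poisson(5 - 1.0 * uses_tdd - 2.0 * experience)  # true TDD effect: -1

    naive = defects[uses_tdd == 1].mean() - defects[uses_tdd == 0].mean()

    # Backdoor adjustment: average within-stratum effects, weighted by stratum size.
    adjusted = 0.0
    for s in (0, 1):
        m = experience == s
        effect_s = (defects[m & (uses_tdd == 1)].mean()
                    - defects[m & (uses_tdd == 0)].mean())
        adjusted += effect_s * m.mean()

    print(f"Naive TDD effect: {naive:.2f}, confounder-adjusted: {adjusted:.2f}")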


  33. Call-for-action!
    Help create shared visions for the community
    Multiple schools of thought are ok, if clear & explicit and actively discussed.
    Standardise quality checklists and guidelines
    Help authors and peer reviewers. Build on what is there and adapt it to ESE.
    Stop the “numbers game”!
    “Publish or Perish” can introduce bias that hinders truth. Take responsibility in evaluations, promotions & discussions.
    Continuous learning, also from other fields
    They know stuff. You’ll learn. Keep on learning & sharing.


  34. Credits
    “Replication is the immune system of science” / Prof. Chris Chambers
    Prof. Brian Nosek, Center for Open Science & OSF
    All my co-authors, colleagues and mentors!


  35. The End
    [email protected]
    TODAY @ 13:30 in Baobà 4


  36. Backup Slides


  37. More (SE) challenges but no time today…
    Primacy of quantitative data & ‘objective’ methods
    Replication crisis in (soft) SE
    Open Science with qualitative data
    p-curve analysis
    and so much more…


  38. Humans & Plurality Fix: Lifting Qualitative Methods


  39. Partial Statistical Analysis of ESEM
    Outcome                  2008  2010  2014  2016  2017  2018
    # papers                   25    30    37    27    32    33
    No statistics / Unknown    10     8    12     6     8     6
    Qualitative                 4     7     5     6     6     9
    Quantitative               12    18    21    18    18    20
    Qual + Quant                1     3     2     3     1     3
    Effect size                 3     6     9     9    10     7


  40. Increasing use of statistical analysis
    https://arxiv.org/abs/1706.00933 accepted for publication in JSS, July 2019


  41. Truth Fix: (Pre-)Registered Reports
    https://github.com/emsejournal/openscience/blob/master/registered-reports.md


  42. Truth Fix 1: (Pre-)Registered Reports
