RR.pdf

Neil Ernst, University of Victoria. November 2022 - Dagstuhl Seminar on Evidence in CS Research
Registered Reports in Computer Science: Why Bother?

EMSE J. → MSR, ICSME, then ESEM, now CHASE, SANER, ICPC
TOSEM (direct submit)
CSE special issue (ACM, Springer, T&F)

RR: Why?
Pre-registration: register your protocol (particularly for clinical trials)
Registered Report: a peer-reviewed pre-registration.
1. Provide feedback at an early phase of research (before spending $$$)
2. Reduce/eliminate under-powered, selectively reported, researcher-biased studies

Questionable Research Practices Hurt Science
HARKing and post-hoc rationalizing: Neat data, what explains it? (story-telling, Gelman & Basbøll)
File-drawer effect: Hmm, bad outcome, bin it. Negative result - reject.
Forking paths in data analysis choices (researcher bias): Let’s use a Kruskal-Wallis test and then Lewandoski-Neymar test of significance (instead of?)
Result when publication venue and publication significance/novelty are emphasized over replication, soundness
BTW, why aren’t we doing more on these problems?!
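The forking-paths concern can be made concrete with a little arithmetic. The sketch below is an illustration rather than anything from the talk. It assumes every analysis choice acts as an independent test at the same alpha, which real forking paths do not (they reuse the same data), so the numbers are only a rough guide. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha: float, num_analyses: int) -> float:
    """Chance of at least one false positive across independent tests.

    Assumption (not from the talk): the analyses are independent and
    each uses the same significance threshold `alpha`.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if num_analyses < 1:
        raise ValueError("num_analyses must be at least 1")
    # P(no false positive in any test) = (1 - alpha) ** num_analyses
    return 1.0 - (1.0 - alpha) ** num_analyses


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        rate = familywise_error_rate(0.05, k)
        print(f"{k:2d} analysis paths -> {rate:.3f} chance of a spurious 'finding'")
```

Under these assumptions, five candidate analyses at alpha = 0.05 already give roughly a 23% chance of at least one spurious result, which is the kind of inflation that pre-specifying the analysis is meant to prevent.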

https://medianwatch.netlify.app/post/z_values/

https://arxiv.org/abs/1706.00933

Robert Feldt’s Manifesto for Empirical Software Research
Truth Over Novelty, relevance, and importance
Plurality and Nuance Over Simple, dichotomous claims
Human Factors Over Algorithms and Technology
Explanations and Theories Over Descriptions of Data at Hand

Mechanics of RR

Phase 1 Review Criteria
1. Importance of the research question(s).
2. Logic, rationale, and plausibility of the proposed hypotheses.
3. Soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis where appropriate).
4. Clarity and degree of methodological detail for replication.
5. Will results obtained test the stated hypotheses?
These were derived via https://osf.io/pukzy/ and live on our EMSE Open Science repo and ACM SIGSOFT RR standard
Is this study novel, significant, able to find effects?
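Criterion 3 mentions statistical power analysis. As a hedged illustration of what such a calculation might look like in a Phase 1 submission, the sketch below estimates the per-group sample size for a two-sided, two-sample comparison of means. It uses the normal approximation, so it slightly underestimates what an exact t-based calculation would give. The function name `sample_size_per_group` and the defaults (alpha = 0.05, power = 0.80) are assumptions for illustration, not part of the review criteria.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size: float,
                          alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample test of means.

    `effect_size` is Cohen's d (standardized mean difference).
    Uses the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {sample_size_per_group(d)} participants per group")
```

Running the numbers before data collection shows how quickly the required sample grows as the expected effect shrinks, which is one way reviewers can judge feasibility at Phase 1.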

Phase 2 Criteria
1. Whether the data are able to test the authors’ proposed hypotheses by satisfying the approved outcome-neutral conditions (such as quality checks, positive controls)
2. Whether the Introduction, rationale and stated hypotheses are the same as the approved Stage 1 submission (required)
3. Whether the authors adhered precisely to the registered experimental procedures
4. Whether any unregistered post hoc analyses added by the authors are justified, methodologically sound, and informative
5. Whether the authors’ conclusions are justified given the data
Did the authors execute on Phase 1 plan?

RR workflow [figure; source: OSF.io; labels: Conference (MSR), Journal (EMSE)]

Assumptions of pre-registration
Only positivist philosophies matter (significance testing, NHST, falsification etc)
Can a priori determine the right protocol to follow
Deductive and confirmatory, not inductive or abductive and exploratory (usually)
Researchers actually understand effect sizes, power calculations, appropriate statistical tests, data bias, epistemology, instrument bias, etc.
→ Spoiler alert - they usually do not - 1 term of stats/epistemology AT MOST (but 2 years of calc and discrete math …)

Current State of RR in SE
MSR 2020 feedback on IPA (in-principle acceptance):
“I think it is a key principle. However, in a way it also raises the bar significantly for the Registered Reports”
“[...] the fact that the results are missing, helps reviewers and authors focus on the methodological issue, which is a great added value in the review process [...]”

MSR Results - IPA
“During my review, though, I had the feeling that more interaction with the authors could add even further value”
“I think the EMSE paper still needs a careful assessment, as it is still possible that the operation or the application of the protocol turns out to be wrong [...]”
“I felt a bit uncomfortable to have this burden on my shoulders as a reviewer so early in the process.”
No (3 responses): “A registered report may be, and should be allowed to be, risky and, therefore, may not work out. The ensuing work should be subject to full and normal review.”

Open Issues and Questions

Three Faces of RR
RR to prevent questionable research practices: Tell the world what you will do, then do it
RR as doctoral symposium: Early feedback before expensive data collection
RR as 1st round review: Pre-empt journal review with in-principle acceptance

Ongoing Questions
To what CS studies could it apply?
Most suited to post-positivist, confirmatory studies with clear hypotheses.
What about: Qualitative/constructivist approaches? Exploratory studies? Data mining and ML studies? (See Narayanan and replicability in ML: https://reproducible.cs.princeton.edu/)

Ongoing Questions
Administrative approach
Are QRPs even an issue?
Effectiveness

Admin challenges
Do not overlook issue of work-life balance in all the things
“Never ascribe to malice that which can be explained by overcommitment”
CS has conferences and journals - no one else does
Journals and conferences rarely share admin interfaces (HotCRP vs Editorial Manager - and they are usually terrible)
Hard to manage reviewer discussions, esp. longitudinally
Currently, stick Phase 1 on arXiv/OSF.io/GitHub
Have to explicitly coach reviewers (not yet mature, but true of other formats)
Manually track in-progress RRs on Google Sheets (low vacation factor)

When Do We Learn Research Methods?
https://xkcd.com/2618/

Admin challenges (cont)
Reviewer/editor burden is increasingly a problem (overall, not just RR)
Accepting 5 IPAs at 3 conferences a year = 15 journal submissions in the next 12-18 months, with publication 24-36 months after that
+ who is asked to be conference track chair? What freedoms do they have?
Minor shenanigans - reviewer COI, authorship incentives
Overreliance on OSF and (maybe) PCI approach
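The workload arithmetic on this slide can be extended into a rough estimate of how many registered reports are in progress at any one time. The sketch below applies Little's law (items in a system = arrival rate × time in system). That is an assumption introduced here, not something the slide states, and it treats the 12-18 month window as the time each report spends between in-principle acceptance and journal submission. The function name `in_progress_reports` is hypothetical.

```python
def in_progress_reports(ipas_per_conference: int,
                        conferences_per_year: int,
                        months_to_submission: float) -> float:
    """Steady-state number of accepted-but-not-yet-submitted reports.

    Little's law: L = arrival_rate * time_in_system, assuming a steady
    yearly intake and a fixed delay before journal submission.
    """
    arrivals_per_year = ipas_per_conference * conferences_per_year
    years_in_system = months_to_submission / 12
    return arrivals_per_year * years_in_system


if __name__ == "__main__":
    for months in (12, 18):
        load = in_progress_reports(5, 3, months)
        print(f"{months} months to submission -> about {load} reports to track")
```

With the slide's figures (5 IPAs at each of 3 conferences), this gives somewhere between 15 and 22.5 reports being tracked at once, before counting any that are already under journal review.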

New Approach: PeerCommunityIn
The Open Science Framework (OSF) serves as the de facto driver for RR approaches: OSF.io Registry
PeerCommunityIn RR promises (!) to manage the process entirely.
Journals simply indicate they accept RR (as-is or with minor review).
Editors become “recommenders”

Publication models run into journal profit models
First phase - Journal - then present at conference?
Admin: J1C2?

QRPs in CS
If QRP is not a problem, do we need RR?
Conversely - if they are a problem elsewhere, why not here?
Evidence researchers don’t understand effect sizes or practical significance
Few studies replicated
Lots of hard-to-access industry data
More research needed!
Shepperd, M., et al. (2018). The role and value of replication in empirical software engineering results. IST, 99, 120–132.
Jørgensen, M., et al. (2016). Incorrect Results in SE Experiments: How to Improve Practices. JSS, 116, 133–145.

Effectiveness
Not yet studied for CS/SE
Chambers - more negative results…
Does it encourage narrow scoping? Actually help with file-drawer or other problems? Place implicit value on empirical, experimental research?
See https://www.nature.com/articles/d41586-019-02674-6 (“What’s next for Registered Reports?”)

Department of Reuse

Department of Reuse
https://reuse-dept.org/
Ultimately RR is about pre-specifying analysis. One way to do that is to reuse analysis protocols from other papers. Done all the time in medicine; rarely in CS except in benchmarks.
Q: to what extent are artifacts such as protocols reused?

Acknowledgments
Teresa Baldassarre, Janet Siegmund, Tim Menzies, Martin Shepperd, Prem Devanbu, Robert Feldt & Tom Zimmermann
MSR 2020 SC
Andrew Gelman/Chris Chambers/Ben Goldacre

Neil Ernst @neilernst
ESEM RR Guide
ACM RR Supplement
https://reuse-dept.org/
