Slide 1

Registered Reports in Computer Science: Why Bother?
Neil Ernst, University of Victoria
November 2022, Dagstuhl Seminar on Evidence in CS Research

Slide 2

RR adoption so far:
- EMSE journal → MSR, ICSME, then ESEM; now CHASE, SANER, ICPC
- TOSEM (direct submit)
- CSE special issue (ACM, Springer, T&F)

Slide 3

RR: Why?
Pre-registration: register your protocol (particularly for clinical trials).
Registered Report: a peer-reviewed pre-registration.
1. Provide feedback at an early phase of research (before spending $$$).
2. Reduce or eliminate under-powered, selectively reported, researcher-biased studies.

Slide 4

Questionable Research Practices Hurt Science
- HARKing and post-hoc rationalizing: neat data, what explains it? (story-telling, Gelman & Basbøll)
- File-drawer effect: hmm, bad outcome, bin it. Negative result? Reject.
- Forking paths in data analysis choices (researcher bias): let's use a Kruskal-Wallis test and then a Lewandoski-Neymar test of significance (instead of what?). A simulation sketch follows below.
These practices result when publication venue and significance/novelty are emphasized over replication and soundness. BTW, why aren't we doing more on these problems?!
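
To see why forking paths matter, here is a minimal simulation sketch (an illustration assuming Python with NumPy and SciPy; the three "forks" are hypothetical analyst choices, not from the slide). Even when both groups are drawn from the same distribution, keeping the best p-value across a few defensible analyses rejects the null far more often than the nominal 5%.

# Forking paths under the null: both groups come from the same normal
# distribution, yet an analyst who tries three "reasonable" analyses and
# keeps the smallest p-value finds "significance" well over 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
TRIALS, N, ALPHA = 2000, 30, 0.05

def forked_p(a, b):
    # Hypothetical forks: t-test, Mann-Whitney U, and a t-test after
    # trimming the extreme value from each group ("outlier removal").
    return min(
        stats.ttest_ind(a, b).pvalue,
        stats.mannwhitneyu(a, b).pvalue,
        stats.ttest_ind(np.sort(a)[1:-1], np.sort(b)[1:-1]).pvalue,
    )

single = np.mean([stats.ttest_ind(rng.normal(size=N), rng.normal(size=N)).pvalue < ALPHA
                  for _ in range(TRIALS)])
forked = np.mean([forked_p(rng.normal(size=N), rng.normal(size=N)) < ALPHA
                  for _ in range(TRIALS)])
print(f"false-positive rate, one pre-specified test: {single:.3f}")  # ~0.05
print(f"false-positive rate, best of three forks:    {forked:.3f}")  # inflated

Pre-specifying the single analysis in a registered protocol is exactly what removes this degree of freedom.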

Slide 5

Slide 6

https://medianwatch.netlify.app/post/z_values/
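
The linked Median Watch post concerns the distribution of z-values reported in published research. For reference, a two-sided p-value p corresponds to z = Φ⁻¹(1 − p/2); a minimal conversion sketch in Python (the sample p-values are made up):

# Converting two-sided p-values to the z-values analyzed in the linked
# post; the example p-values here are hypothetical.
from scipy.stats import norm

def p_to_z(p):
    # z such that a two-sided test with this z-value yields p
    return norm.ppf(1 - p / 2)

for p in (0.05, 0.049, 0.01, 0.001):
    print(f"p = {p:<5} -> z = {p_to_z(p):.2f}")  # p = 0.05 -> z = 1.96

A pile-up of reported z-values just above 1.96 (p just under 0.05) is the classic signature of the file-drawer and forking-path problems from the previous slide.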

Slide 7

https://arxiv.org/abs/1706.00933

Slide 8

Robert Feldt's Manifesto for Empirical Software Research
- Truth over novelty, relevance, and importance
- Plurality and nuance over simple, dichotomous claims
- Human factors over algorithms and technology
- Explanations and theories over descriptions of data at hand

Slide 9

Mechanics of RR

Slide 10

Phase 1 Review Criteria
1. Importance of the research question(s).
2. Logic, rationale, and plausibility of the proposed hypotheses.
3. Soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis where appropriate; a sketch follows below).
4. Clarity and degree of methodological detail for replication.
5. Will the results obtained test the stated hypotheses?
In short: is this study novel, significant, and able to find effects? The criteria were derived via https://osf.io/pukzy/ and live on our EMSE Open Science repo and in the ACM SIGSOFT RR standard.
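
Criterion 3's power analysis is often the least familiar part for authors. A minimal sketch of the a priori calculation a Stage 1 protocol might include (assuming Python with statsmodels; the effect size, alpha, and power values are illustrative choices, not recommendations):

# A priori power analysis for a two-group comparison: how many participants
# per group are needed to detect an assumed effect? Numbers are illustrative.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # assumed standardized effect (Cohen's d, "medium")
    alpha=0.05,       # significance level
    power=0.8,        # desired probability of detecting the effect
)
print(f"participants needed per group: {n_per_group:.0f}")  # about 64

Reviewing this before data collection is the point of Stage 1: an under-powered design can still be fixed at this stage, before the $$$ are spent.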

Slide 11

Phase 2 Criteria
1. Whether the data are able to test the authors' proposed hypotheses by satisfying the approved outcome-neutral conditions (such as quality checks and positive controls).
2. Whether the introduction, rationale, and stated hypotheses are the same as in the approved Stage 1 submission (required).
3. Whether the authors adhered precisely to the registered experimental procedures.
4. Whether any unregistered post hoc analyses added by the authors are justified, methodologically sound, and informative.
5. Whether the authors' conclusions are justified given the data.
In short: did the authors execute on the Phase 1 plan?

Slide 12

RR workflow
[Diagram; source: OSF.io. Conference (MSR); Journal (EMSE).]

Slide 13

Assumptions of pre-registration
- Only positivist philosophies matter (significance testing, NHST, falsification, etc.)
- Researchers can determine a priori the right protocol to follow
- Deductive and confirmatory, not inductive or abductive and exploratory (usually)
- Researchers actually understand effect sizes, power calculations, appropriate statistical tests, data bias, epistemology, instrument bias, etc.
→ Spoiler alert: they usually do not. One term of stats/epistemology AT MOST (but two years of calc and discrete math …)

Slide 14

Current State of RR in SE
MSR 2020 feedback on in-principle acceptance (IPA):
“I think it is a key principle. However, in a way it also raises the bar significantly for the Registered Reports”
“[...] the fact that the results are missing, helps reviewers and authors focus on the methodological issue, which is a great added value in the review process [...]”

Slide 15

MSR Results: IPA
“During my review, though, I had the feeling that more interaction with the authors could add even further value”
“I think the EMSE paper still needs a careful assessment, as it is still possible that the operation or the application of the protocol turns out to be wrong [...]”
“I felt a bit uncomfortable to have this burden on my shoulders as a reviewer so early in the process.”
No (3 responses): “A registered report may be, and should be allowed to be, risky and, therefore, may not work out. The ensuing work should be subject to full and normal review.”

Slide 16

Slide 17

Open Issues and Questions

Slide 18

Three Faces of RR
1. RR to prevent questionable research practices: tell the world what you will do, then do it.
2. RR as doctoral symposium: early feedback before expensive data collection.
3. RR as first-round review: pre-empt journal review with in-principle acceptance.

Slide 19

Ongoing Questions
To what CS studies could RR apply? Most suited to post-positivist, confirmatory studies with clear hypotheses. What about:
- Qualitative/constructivist approaches?
- Exploratory studies?
- Data mining and ML studies? (See Narayanan on replicability in ML: https://reproducible.cs.princeton.edu/)

Slide 20

Ongoing Questions
- Administrative approach
- Are QRPs even an issue?
- Effectiveness

Slide 21

Admin challenges
- Do not overlook work-life balance in all of this: “Never ascribe to malice that which can be explained by overcommitment.”
- CS has both conferences and journals; no other field does.
- Journals and conferences rarely share admin interfaces (HotCRP vs. Editorial Manager, and they are usually terrible).
- Hard to manage reviewer discussions, especially longitudinally.
- Currently we stick Phase 1 on arXiv/OSF.io/GitHub.
- Have to explicitly coach reviewers (the format is not yet mature, though this is true of other formats too).
- Manually track in-progress RRs in Google Sheets (low vacation factor).

Slide 22

When Do We Learn Research Methods?
https://xkcd.com/2618/

Slide 23

Admin challenges (cont.)
- Reviewer/editor burden is increasingly a problem (overall, not just for RR): accepting 5 IPAs at 3 conferences a year = 15 journal submissions in the next 12-18 months, with publication 24-36 months after that.
- Who is asked to be conference track chair? What freedoms do they have?
- Minor shenanigans: reviewer COI, authorship incentives.
- Overreliance on OSF and (maybe) the PCI approach.

Slide 24

New Approach: PeerCommunityIn
The Open Science Framework serves as the de facto driver for RR approaches: the OSF.io registry.
PeerCommunityIn RR promises (!) to manage the process entirely. Journals simply indicate they accept RRs (as-is or with minor review). Editors become “recommenders.”

Slide 25

Publication models run into journal profit models.
First phase in a journal, then present at the conference? Admin: J1C2 (journal-first, conference-second)?

Slide 26

QRPs in CS
If QRPs are not a problem, do we need RR? Conversely, if they are a problem elsewhere, why not here?
- Evidence that researchers don't understand effect sizes or practical significance
- Few studies replicated
- Lots of hard-to-access industry data
More research needed!
Shepperd, M., et al. (2018). The role and value of replication in empirical software engineering results. IST, 99, 120–132.
Jørgensen, M., et al. (2016). Incorrect Results in SE Experiments: How to Improve Practices. JSS, 116, 133–145.

Slide 27

Effectiveness
Not yet studied for CS/SE. Chambers: more negative results…
Does RR:
- encourage narrow scoping?
- actually help with the file-drawer or other problems?
- place implicit value on empirical, experimental research?
See https://www.nature.com/articles/d41586-019-02674-6 (“What’s next for Registered Reports?”)

Slide 28

Department of Reuse

Slide 29

Department of Reuse
https://reuse-dept.org/
Ultimately, RR is about pre-specifying analysis. One way to do that is to reuse analysis protocols from other papers. This is done all the time in medicine; rarely in CS, except in benchmarks.
Q: to what extent are artifacts such as protocols reused?

Slide 30

Slide 31

Acknowledgments
Teresa Baldassarre, Janet Siegmund, Tim Menzies, Martin Shepperd, Prem Devanbu, Robert Feldt & Tom Zimmermann
MSR 2020 SC
Andrew Gelman / Chris Chambers / Ben Goldacre

Slide 32

Neil Ernst
[email protected] | @neilernst
ESEM RR Guide | ACM RR Supplement | https://reuse-dept.org/