
RR.pdf

Neil Ernst
November 04, 2022


Transcript

  1. Neil Ernst, University of Victoria. November 2022, Dagstuhl Seminar
     on Evidence in CS Research. Registered Reports in Computer Science: Why Bother?
  2. EMSE journal → MSR, ICSME, then ESEM; now CHASE, SANER, ICPC.
     TOSEM (direct submit). CSE special issue (ACM, Springer, T&F).
  3. RR: Why? Pre-registration: register your protocol in advance (particularly
     for clinical trials). Registered Report: a peer-reviewed pre-registration.
     1. Provide feedback at an early phase of the research (before spending $$$).
     2. Reduce or eliminate under-powered, selectively reported, researcher-biased studies.
  4. Questionable Research Practices Hurt Science.
     - HARKing and post-hoc rationalizing: neat data, what explains it? (story-telling; Gelman & Basbøll)
     - File-drawer effect: hmm, bad outcome, bin it. Negative result - reject.
     - Forking paths in data-analysis choices (researcher bias): let's use a Kruskal-Wallis test and then a Lewandoski-Neymar significance test (instead of?).
     These practices result when publication venue and publication significance/novelty are emphasized over replication and soundness. BTW, why aren't we doing more on these problems?!
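The "forking paths" point can be made concrete: the same data admit several defensible tests, each yielding its own p-value. A minimal illustrative sketch with synthetic data (all numbers here are hypothetical; a pre-registered protocol would fix the test choice before seeing the data):

```python
# Illustrative sketch of forking paths in analysis choices.
# The data are synthetic; only the phenomenon (multiple defensible
# tests, multiple p-values) is the point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, 30)   # hypothetical "control" measurements
b = rng.normal(0.4, 1.0, 30)   # hypothetical "treatment" measurements

p_t  = stats.ttest_ind(a, b).pvalue     # parametric choice
p_mw = stats.mannwhitneyu(a, b).pvalue  # non-parametric choice
p_kw = stats.kruskal(a, b).pvalue       # another non-parametric choice

print(f"t-test p={p_t:.3f}  Mann-Whitney p={p_mw:.3f}  Kruskal-Wallis p={p_kw:.3f}")
# Without a registered analysis plan, nothing stops an analyst from
# reporting whichever of these crosses the 0.05 threshold.
```
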
  5. (image slide)

  6. Robert Feldt's Manifesto for Empirical Software Research:
     - Truth over novelty, relevance, and importance
     - Plurality and nuance over simple, dichotomous claims
     - Human factors over algorithms and technology
     - Explanations and theories over descriptions of the data at hand
  7. Phase 1 Review Criteria:
     1. Importance of the research question(s).
     2. Logic, rationale, and plausibility of the proposed hypotheses.
     3. Soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis where appropriate).
     4. Clarity and degree of methodological detail for replication.
     5. Will the results obtained test the stated hypotheses?
     These criteria were derived via https://osf.io/pukzy/ and live on our EMSE Open Science repo and the ACM SIGSOFT RR standard. In short: is this study novel, significant, and able to find effects?
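Criterion 3's "statistical power analysis" is the kind of artifact a Stage 1 submission would include. A minimal sketch using statsmodels; the effect size, alpha, and power targets here are hypothetical placeholders, not values from any slide:

```python
# A-priori power calculation sketch for a two-group comparison.
# All inputs are illustrative assumptions a Stage 1 author would justify.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed medium effect (Cohen's d) - an assumption
    alpha=0.05,       # two-sided significance level
    power=0.8,        # desired probability of detecting the effect
)
print(f"Required participants per group: {n_per_group:.1f}")
# About 64 per group for these standard inputs.
```

Reviewers can then check whether the planned sample size actually reaches the claimed power, rather than discovering an under-powered study after the data are in.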
  8. Phase 2 Criteria:
     1. Whether the data are able to test the authors' proposed hypotheses by satisfying the approved outcome-neutral conditions (such as quality checks and positive controls).
     2. Whether the introduction, rationale, and stated hypotheses are the same as in the approved Stage 1 submission (required).
     3. Whether the authors adhered precisely to the registered experimental procedures.
     4. Whether any unregistered post-hoc analyses added by the authors are justified, methodologically sound, and informative.
     5. Whether the authors' conclusions are justified given the data.
     In short: did the authors execute on the Phase 1 plan?
  9. Assumptions of pre-registration:
     - Only positivist philosophies matter (significance testing, NHST, falsification, etc.).
     - We can determine a priori the right protocol to follow.
     - Research is deductive and confirmatory, not inductive/abductive and exploratory (usually).
     - Researchers actually understand effect sizes, power calculations, appropriate statistical tests, data bias, epistemology, instrument bias, etc. Spoiler alert: they usually do not. One term of stats/epistemology AT MOST (but two years of calculus and discrete math ...).
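For readers who want the "effect size" the slide refers to pinned down, here is a minimal sketch of one common measure, Cohen's d (the data are synthetic and purely illustrative):

```python
# Cohen's d: standardized mean difference between two groups,
# using the pooled standard deviation. Data below are made up.
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference (pooled-SD denominator)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

x = np.array([12.1, 10.4, 11.8, 13.0, 12.5])  # hypothetical group A
y = np.array([10.0,  9.5, 10.8, 11.1,  9.9])  # hypothetical group B
print(f"Cohen's d = {cohens_d(x, y):.2f}")  # -> 2.03 for these toy data
```

Reporting d alongside a p-value is exactly the habit the slide argues is missing: the p-value says whether an effect is detectable, d says whether it is big enough to matter.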
  10. Current State of RR in SE: MSR 2020 reviewer feedback on IPA (in-principle acceptance):
     "I think it is a key principle. However, in a way it also raises the bar significantly for the Registered Reports."
     "[...] the fact that the results are missing helps reviewers and authors focus on the methodological issue, which is a great added value in the review process [...]"
  11. MSR Results - IPA:
     "During my review, though, I had the feeling that more interaction with the authors could add even further value."
     "I think the EMSE paper still needs a careful assessment, as it is still possible that the operation or the application of the protocol turns out to be wrong [...]"
     "I felt a bit uncomfortable to have this burden on my shoulders as a reviewer so early in the process."
     No (3 responses): "A registered report may be, and should be allowed to be, risky and, therefore, may not work out. The ensuing work should be subject to full and normal review."
  12. Three Faces of RR:
     1. RR to prevent questionable research practices: tell the world what you will do, then do it.
     2. RR as doctoral symposium: early feedback before expensive data collection.
     3. RR as first-round review: pre-empt journal review with in-principle acceptance.
  13. Ongoing Questions: To what CS studies could RR apply? It is most suited to post-positivist, confirmatory studies with clear hypotheses. What about:
     - Qualitative/constructivist approaches?
     - Exploratory studies?
     - Data mining and ML studies? (See Narayanan on replicability in ML: https://reproducible.cs.princeton.edu/)
  14. Admin challenges: Do not overlook the issue of work-life balance in all of this. "Never ascribe to malice that which can be explained by overcommitment."
     - CS has both conferences and journals; no other field does.
     - Journals and conferences rarely share admin interfaces (HotCRP vs. Editorial Manager, and they are usually terrible).
     - Hard to manage reviewer discussions, especially longitudinally.
     - Currently, we stick Phase 1 reports on arXiv/OSF.io/GitHub.
     - Have to explicitly coach reviewers (the format is not yet mature, though this is true of other formats too).
     - Manually track in-progress RRs on Google Sheets (low vacation factor).
  15. Admin challenges (cont.): Reviewer/editor burden is increasingly a problem (overall, not just for RR). Accepting 5 IPAs at 3 conferences a year = 15 journal submissions in the next 12-18 months, with publication 24-36 months after that. Plus: who is asked to be conference track chair, and what freedoms do they have? Minor shenanigans - reviewer COI, authorship incentives. Overreliance on OSF and (maybe) the PCI approach.
  16. New Approach: PeerCommunityIn. The Open Science Framework serves as the de facto driver for RR approaches (the OSF.io registry). PeerCommunityIn RR promises (!) to manage the process entirely: journals simply indicate they accept RRs (as-is or with minor review), and editors become "recommenders."
  17. Publication models run into journal profit models. First phase in the journal, then present at a conference? Admin: J1C2?
  18. QRPs in CS: If QRPs are not a problem here, do we need RR? Conversely, if they are a problem elsewhere, why not here? There is evidence that researchers don't understand effect sizes or practical significance, few studies are replicated, and much industry data is hard to access. More research needed!
     Shepperd, M., et al. (2018). The role and value of replication in empirical software engineering results. IST, 99, 120-132.
     Jørgensen, M., et al. (2016). Incorrect Results in SE Experiments: How to Improve Practices. JSS, 116, 133-145.
  19. Effectiveness: Not yet studied for CS/SE. Chambers reports more negative results... Does RR:
     - encourage narrow scoping?
     - actually help with the file-drawer or other problems?
     - place implicit value on empirical, experimental research?
     See https://www.nature.com/articles/d41586-019-02674-6 ("What's next for Registered Reports?").
  20. Department of Reuse (https://reuse-dept.org/): Ultimately RR is about pre-specifying analysis. One way to do that is to reuse analysis protocols from other papers. This is done all the time in medicine, but rarely in CS outside of benchmarks. Q: to what extent are artifacts such as protocols reused?
  21. (image slide)

  22. Acknowledgments: Teresa Baldassarre, Janet Siegmund, Tim Menzies, Martin Shepperd, Prem Devanbu, Robert Feldt & Tom Zimmermann; the MSR 2020 SC; Andrew Gelman/Chris Chambers/Ben Goldacre.