Benefits of Open Data and Open Stimuli

by Research Data Services

Slide 1

Slide 1 text

Benefits of Open Data and Open Stimuli Morton Ann Gernsbacher | Dept. of Psychology | UW-Madison

Slide 2

Slide 2 text

Benefits of Open Data and Open Stimuli. Morton Ann Gernsbacher, Vilas Research & Sir Frederic Bartlett Professor, University of Wisconsin-Madison, @GernsbacherLab

Greetings, everyone. I’m Morton Ann Gernsbacher. I’m a professor in the Psychology Department. I’ve been here at the UW for 25 years and before that I was on the faculty at the University of Oregon for 10 years. I did my PhD work at the University of Texas in the early 80s.

Slide 3

Slide 3 text

I’ve become a proponent of Open Data and Open Stimuli, and more generally replicability and transparency in research. Of course, we all know that

Slide 4

Slide 4 text

the beauty of science is that its results are reproducible. And we all know that the core of our

Slide 5

Slide 5 text

published results rests on the assumption that our results are replicable. I teach an undergraduate Research Methods course and rather than talking about inter-item reliability and part-whole reliability and the other flavors of reliability, I cut right to the chase and teach that replication indexes reliability. Perhaps what feels a bit new is the

Slide 6

Slide 6 text

contemporary focus on transparency. And by transparency, we mean

Slide 7

Slide 7 text

taking steps such as

Slide 8

Slide 8 text

Pre-Registering a study, which in many ways looks similar to the way we’ve always done research.

Slide 9

Slide 9 text

We develop an idea, design a study to test that idea, collect and then analyze data, write a research report, and then publish that report. What’s different about pre-registration, and specifically, Registered Report journal articles, is that instead of submitting our report to

Slide 10

Slide 10 text

[Slide diagram: Peer Review at Stage 1 and at Stage 2]

peer review only after we’ve collected and analyzed our data, we also submit our report to peer review CLICK: BEFORE we collect and analyze our data. Steve Lindsay will explain more about Registered Reports in his presentation later in our session. But two points I want to stress now are, first, that researchers can pre-register their study,

Slide 11

Slide 11 text

even if they’re not planning to submit to a journal that offers Registered Reports, by using websites such as the Open Science Framework and AsPredicted.org. And second, most of us are already familiar with the process of pre-registration.

Slide 12

Slide 12 text

[Slide diagram labels: PROPOSAL DEFENSE at Stage 1, FINAL DEFENSE at Stage 2]

It’s the way we wrote our dissertations, master’s theses, and senior theses. We registered, and defended, our hypotheses with our committee during our CLICK: proposal defense. And then later, during our CLICK: final defense, we defended our results and our interpretation of the results.

Slide 13

Slide 13 text

That’s what pre-registering a study is all about. Another step toward greater research transparency is

Slide 14

Slide 14 text

making research materials open. For those of us trained as psycholinguists, open materials are quite familiar.

Slide 15

Slide 15 text

Even in the early 80s, we couldn’t publish our studies without making our experimental materials available to reviewers – and often available to readers, too, in the appendices of our published paper. Just providing an example stimulus or two in the body of the manuscript was NOT enough.

Slide 16

Slide 16 text

Reviewers needed to be able to peruse the entire set of all of our materials, for each and every experiment. After the

Slide 17

Slide 17 text

Internet came online, journal editors preferred that we make ALL of our experimental materials available to reviewers during peer review and to readers forever after, by using our labs’ websites rather than the journals’ limited printed pages,

Slide 18

Slide 18 text

which is why, to this day, I still post all my experimental materials on my lab’s website. I’ve also begun posting additional copies on repositories such as the Open Science Framework, which guarantees 50 years of longevity, more than I can promise for my lab’s website.

Slide 19

Slide 19 text

Another step toward greater research transparency is

Slide 20

Slide 20 text

open data. I know, from talking with my colleagues, that opening up one’s data can cause anxious feelings of vulnerability. One way I’ve lowered my own anxiety about making my data public is

Slide 21

Slide 21 text

that I do a data checking swap with other colleagues. I send my data to other colleagues to check, and they send me their data to check. We try to reproduce each other’s reported results prior to each of us posting our data -- or submitting our manuscripts. A fourth step toward greater research transparency is

Slide 22

Slide 22 text

Open Access publishing, which can mean either publishing in an Open Access journal, or placing the final version of our manuscripts

Slide 23

Slide 23 text

in an Open Access repository, such as PubMed or PsyArXiv.

Slide 24

Slide 24 text

So, these are four steps we can take to improve reproducibility through transparency. We can pre-register our study’s goals and analysis plans; we can make our study’s research materials available to everyone; we can make our study’s data available to everyone; and we can make our final research report available to everyone.

Slide 25

Slide 25 text

But these four steps take extra time and effort, and we all know that researchers, particularly academic researchers,

Slide 26

Slide 26 text

love and crave

Slide 27

Slide 27 text

Incentives. As the U.S.

Slide 28

Slide 28 text

National Institutes of Health have noted, QUOTE “the current incentive system may be a major barrier for achieving transparency in research.” UNQUOTE Similarly, this past September, the Science Ministers of Canada, France, Germany, Italy, Japan, the U.K., and the U.S., aka:

Slide 29

Slide 29 text

the G7 Science Ministers advised that QUOTE “evaluation of research careers should better recognize and reward Open Science activities” UNQUOTE

Slide 30

Slide 30 text

And the U.S. National Academies of Science are currently holding workshops to QUOTE “Recommend specific solutions in policy, … incentives and requirements that would facilitate Open Science.” UNQUOTE But why wait for an august agency to offer recommendations?

Slide 31

Slide 31 text

Today I want to talk about the selfish, or perhaps better put, investigator-enhancement reasons for Open Science, and in particular Open Data and Open Materials. I’ll begin by describing a couple of the QUOTE “selfish reasons to work reproducibly” UNQUOTE articulated by Florian Markowetz in a 2015 article in Genome Biology. Markowetz begins by relating the headline

Slide 32

Slide 32 text

“How bright promise in cancer testing fell apart,” which appeared in The New York Times in the summer of 2011. The New York Times article describes how two scientists had discovered QUOTE “lethal data analysis problems in a series of high-impact papers by breast cancer researchers from Duke University.” UNQUOTE As Markowetz notes, the errors the two scientists identified QUOTE “could have easily been spotted by any co-author before submitting the paper.

Slide 33

Slide 33 text

The data sets are not huge and can easily be spot-checked on a standard laptop. You do not have to be a statistics wizard to realize that patient numbers differ, labels got swapped or samples appear multiple times with conflicting annotations in the same data set.” UNQUOTE But no one had noticed them. I agree with Markowetz that had the Duke cancer researchers

Slide 34

Slide 34 text

posted their data during peer review or -- had they done as I’ve begun to do -- swapped data checking with another lab, these errors would have been caught prior to publication. But they weren’t and, as the New York Times reported, bright promise in cancer testing fell apart. Markowetz also notes that making one’s data, materials, and methods open makes it easier

Slide 35

Slide 35 text

to write one’s journal articles, and I heartily agree. Having all my data, materials, analysis code, and the like packaged in such a way that others can see them allows me to also easily access them. Another selfish reason is that research that is publicly documented is more likely to be

Slide 36

Slide 36 text

replicated. ANECDOTE ABOUT “DUG WITH THE SPADE.” So what are the steps involved in Open Data and Open Materials? Markowetz suggests the following

Slide 37

Slide 37 text

At the lowest level, working reproducibly just means avoiding beginners’ mistakes. Keep your project organized, name your files and directories in some informative way, store your data and code at a single backed-up location. Don’t spread your data over different servers, laptops and hard drives.
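One way to picture this lowest level is a single project folder with informatively named subfolders. The following is only a minimal sketch in Python: the folder names in `SUBDIRS`, the function `create_project`, and the project name are all hypothetical choices for illustration, not a layout that Markowetz or this talk prescribes.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names, chosen only for illustration.
SUBDIRS = ["data/raw", "data/processed", "code", "materials", "output"]


def create_project(root: Path, name: str) -> Path:
    """Create one project folder with informatively named subfolders."""
    project = root / name
    for sub in SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    # A short README records what lives where.
    readme = project / "README.txt"
    if not readme.exists():
        readme.write_text("Project: " + name + "\n" + "\n".join(SUBDIRS) + "\n")
    return project


# Demonstration in a throwaway directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo = create_project(Path(tmp), "example_study")
    for folder in sorted(p for p in demo.rglob("*") if p.is_dir()):
        print(folder.relative_to(demo))
```

In practice the root would be the single backed-up location the quote mentions, rather than a temporary directory.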

Slide 38

Slide 38 text

To achieve the next levels of reproducibility, you need to learn some tools of computational reproducibility. In general, reproducibility is improved when there is less clicking and pasting and more scripting and coding.
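As an illustration of "more scripting," here is a minimal Python sketch of one spot check named earlier in the talk: samples that appear multiple times with conflicting annotations. The column names `sample_id` and `label`, the function `find_conflicting_samples`, and the data are made up for the example and do not come from any real study.

```python
import csv
import io
from collections import defaultdict


def find_conflicting_samples(rows, id_field="sample_id", label_field="label"):
    """Return {sample_id: sorted labels} for IDs annotated with more than one label."""
    labels_by_id = defaultdict(set)
    for row in rows:
        labels_by_id[row[id_field]].add(row[label_field])
    return {
        sample_id: sorted(labels)
        for sample_id, labels in labels_by_id.items()
        if len(labels) > 1
    }


# Made-up illustrative data: S02 appears twice with different labels.
EXAMPLE_CSV = """sample_id,label
S01,responder
S02,non-responder
S03,responder
S02,responder
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
conflicts = find_conflicting_samples(rows)
print(conflicts)  # {'S02': ['non-responder', 'responder']}
```

Because the check is a script rather than a sequence of clicks, a colleague doing a data-checking swap could rerun it unchanged on the same file.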

Slide 39

Slide 39 text

Markowetz recommends doing analyses in R or Python and documenting analysis using knitr or IPython notebooks, because these tools help merge descriptive text with analysis code into dynamic documents that can be automatically updated every time the data or code change.
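The idea behind a dynamic document can be sketched without knitr or a notebook. The Python example below is a deliberately simplified stand-in using only the standard library, not how those tools work internally: the report sentence, the function `render_report`, and the reaction times are invented for illustration. The point is only that the numbers in the text are recomputed from the data each time.

```python
import statistics
from string import Template

# Report text with placeholders that are filled from the data on every run.
REPORT = Template(
    "We tested $n participants. Mean reaction time was $mean ms (SD = $sd)."
)


def render_report(reaction_times):
    """Recompute the statistics and fill them into the report text."""
    return REPORT.substitute(
        n=len(reaction_times),
        mean=f"{statistics.mean(reaction_times):.1f}",
        sd=f"{statistics.stdev(reaction_times):.1f}",
    )


# Made-up numbers for illustration only.
print(render_report([500, 510, 520, 530]))
```

If a data point is corrected, rerunning the script updates the sentence, so the text and the analysis cannot drift apart.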

Slide 40

Slide 40 text

As a next step, Markowetz recommends learning how to use a version-control system like git on a collaborative platform such as GitHub. Finally, if you want to become a pro, learn to use docker, which will make your analysis self-contained and easily transportable to different systems. UNQUOTE I admit that I’m at level 2.

Slide 41

Slide 41 text

I keep my project organized, I name my files and directories in some informative way, and I store my data and code at a single backed-up location. I don’t spread my data over different servers, laptops and hard drives. And I’ve learned some tools of computational reproducibility. I’m not yet at the GitHub or Docker level, but I plan to move there soon.

Slide 42

Slide 42 text

And I’d love to talk with you about other strategies for the security -- and greater visibility -- that come with Open Science. So let me open the floor for our discussion.

Slide 43

Slide 43 text

Morton Ann Gernsbacher @GernsbacherLab www.GernsbacherLab.org Thank you.