
Benefits of Open Data and Open Stimuli

Research Data Services
November 21, 2017

Presentation given as part of the RDS Holz Brown Bag series, November 2017.

Transcript

  1. Benefits of Open Data and Open Stimuli Morton Ann Gernsbacher

    Vilas Research & Sir Frederic Bartlett Professor University of Wisconsin-Madison @GernsbacherLab Greetings, everyone. I’m Morton Ann Gernsbacher. I’m a professor in the Psychology Department. I’ve been here at the UW for 25 years and before that I was on the faculty at the University of Oregon for 10 years. I did my PhD work at the University of Texas in the early 80s.
  2. Benefits of Open Data and Open Stimuli Morton Ann Gernsbacher

    Vilas Research & Sir Frederic Bartlett Professor University of Wisconsin-Madison @GernsbacherLab I’ve become a proponent of Open Data and Open Stimuli, and, more generally, of replicability and transparency in research. Of course, we all know that
  3. published research rests on the assumption that our results are

    replicable. I teach an undergraduate Research Methods course and rather than talking about inter-item reliability and part-whole reliability and the other flavors of reliability, I cut right to the chase and teach that replication indexes reliability. Perhaps what feels a bit new is
  4. pre-registering a study, which in many ways looks similar to

    the way we’ve always done research.
  5. We develop an idea, design a study to test that

    idea, collect and then analyze data, write a research report, and then publish that report. What’s different about pre-registration, and specifically, Registered Report journal articles, is that instead of submitting our report to
  6. [Slide: Stage 1 Peer Review / Stage 2 Peer Review] peer

    review only after we’ve collected and analyzed our data, we also submit our report to peer review CLICK: BEFORE we collect and analyze our data. Steve Lindsay will explain more about Registered Reports in his presentation later in our session. But two points I want to stress now are first, that researchers can pre-register their study,
  7. even if they’re not planning to submit to a journal

    that offers registered reports, by using websites such as Open Science Framework and AsPredicted.org. And second, most of us are already familiar with the process of pre-registration.
  8. [Slide: PROPOSAL DEFENSE / FINAL DEFENSE] It’s the way we

    wrote our dissertations, master’s theses, and senior theses. We registered, and defended, our hypotheses with our committee during our CLICK: proposal defense. And then later, during our CLICK: final defense, we defended our results and our interpretation of those results.
  9. A second step toward greater research transparency is making research

    materials open. For those of us trained as psycholinguists, open materials are quite familiar.
  10. Even in the early 80s, we couldn’t publish our studies

    without making our experimental materials available to reviewers – and often available to readers, too, in the appendices of our published paper. Just providing an example stimulus or two in the body of the manuscript was NOT enough.
  11. Reviewers needed to be able to peruse the entire set

    of all of our materials, for each and every experiment. After the
  12. Internet came online, journal editors preferred that we make ALL

    of our experimental materials available to reviewers during peer review and to readers forever after, by using our labs’ websites rather than the journals’ limited printed pages,
  13. which is why to this day, I still post all

    my experimental materials on my lab’s website, although I’ve also begun posting additional copies on repositories, such as the Open Science Framework, which guarantees 50 years of longevity, which is more than I can promise on my lab’s website.
  14. A third step toward greater research transparency is open data.

    I know, from talking with my colleagues, that opening up one’s data can cause anxious feelings of vulnerability. One way I’ve lowered my own anxiety about making my data public is
  15. that I do a data-checking swap with other colleagues. I

    send my data to other colleagues to check, and they send me their data to check. We try to reproduce each other’s reported results prior to each of us posting our data -- or submitting our manuscripts.
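A data-checking swap needs no elaborate tooling. Here is a minimal Python sketch of what one half of such a swap might look like, assuming a colleague shares a CSV file (hypothetically named colleague_data.csv) with one observation per row, a numeric score column, and a mean reported in their manuscript; all of those names and values are placeholders:

```python
# A minimal data-checking sketch. The file name, column name, and
# reported value below are hypothetical placeholders.
import csv
import statistics

REPORTED_MEAN = 4.21   # the mean reported in the colleague's manuscript
TOLERANCE = 0.005      # allow for rounding in the published report

with open("colleague_data.csv", newline="") as f:
    scores = [float(row["score"]) for row in csv.DictReader(f)]

recomputed = statistics.mean(scores)
print(f"n = {len(scores)}, recomputed mean = {recomputed:.3f}")

if abs(recomputed - REPORTED_MEAN) > TOLERANCE:
    print("MISMATCH: recomputed mean differs from the reported value")
else:
    print("OK: the reported mean reproduces from the shared data")
```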
  16. A fourth step toward greater research transparency is Open Access

    publishing, which can mean either publishing in an Open Access journal or placing the final version of our manuscripts in an openly accessible repository.
  17. So, these are four steps we can take to improve

    reproducibility through transparency. We can pre-register our study’s goals and analysis plans; we can make our study’s research materials available to everyone; we can make our study’s data available to everyone; and we can make our final research report available to everyone.
  18. But these four steps take extra time and effort, and

    we all know that researchers, particularly academic researchers, are rarely rewarded for that extra effort. Indeed, as the
  19. National Institutes of Health have noted, QUOTE “the current incentive

    system may be a major barrier for achieving transparency in research.” UNQUOTE Similarly, this past September, the Science Ministers of Canada, France, Germany, Italy, Japan, the U.K., and the U.S. -- a.k.a.
  20. the G7 Science Ministers -- advised that QUOTE “evaluation of

    research careers should better recognize and reward Open Science activities” UNQUOTE
  21. And the U.S. National Academies of Sciences are currently holding

    workshops to QUOTE “Recommend specific solutions in policy, … incentives and requirements that would facilitate Open Science.” UNQUOTE But why wait for an august agency to offer recommendations?
  22. But today I want to talk about the selfish --

    or, perhaps better put, investigator-enhancement -- reasons for Open Science, and in particular Open Data and Open Materials. I’ll begin by describing a couple of the QUOTE “selfish reasons to work reproducibly” articulated by Florian Markowetz in a 2015 article in Genome Biology. Markowetz begins by relating the headline
  23. “How bright promise in cancer testing fell apart,” which appeared

    in The New York Times in the summer of 2011. The New York Times article describes how two scientists had discovered QUOTE “lethal data analysis problems in a series of high-impact papers by breast cancer researchers from Duke University.” UNQUOTE As Markowetz notes, the errors the two scientists identified QUOTE “could have easily been spotted by any co-author before submitting the paper.
  24. The data sets are not huge and can easily be

    spot-checked on a standard laptop. You do not have to be a statistics wizard to realize that patient numbers differ, labels got swapped or samples appear multiple times with conflicting annotations in the same data set.” UNQUOTE But no one had noticed them.
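The kind of spot-check Markowetz describes is genuinely easy to script. Here is a hedged Python sketch -- the file name annotations.csv and the columns sample_id and label are hypothetical -- that flags samples appearing more than once with conflicting annotations:

```python
# Spot-check sketch: flag sample IDs that appear with more than one
# label. File and column names are hypothetical placeholders.
import csv
from collections import defaultdict

labels_by_sample = defaultdict(set)
with open("annotations.csv", newline="") as f:
    for row in csv.DictReader(f):
        labels_by_sample[row["sample_id"]].add(row["label"])

conflicts = {sid: labels for sid, labels in labels_by_sample.items()
             if len(labels) > 1}
for sample_id, labels in sorted(conflicts.items()):
    print(f"CONFLICT: sample {sample_id} carries labels {sorted(labels)}")
if not conflicts:
    print("No conflicting annotations found")
```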
  25. I agree with Markowetz that had the Duke cancer

    researchers posted their data during peer review or -- had they done as I’ve begun to do -- swapped data checking with another lab, these errors would have been caught prior to publication. But they weren’t, and, as The New York Times reported, bright promise in cancer testing fell apart. Markowetz also notes that making one’s data, materials, and methods open makes it easier
  26. to write one’s journal articles, and I heartily agree. Having

    all my data, materials, analysis code, and the like packaged in such a way that others can see them allows me to also easily access them. Another selfish reason is that research that is publicly documented is more likely to be
  27. replicated. ANECDOTE ABOUT “DUG WITH THE SPADE.” So what are

    the steps involved in Open Data and Open Materials? Markowetz suggests the following
  28. At the lowest level, working reproducibly just means avoiding beginners’

    mistakes. Keep your project organized, name your files and directories in some informative way, store your data and code at a single backed-up location. Don’t spread your data over different servers, laptops and hard drives.
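As a concrete illustration of this lowest level, a few lines of Python can lay out a single, organized home for a project. The directory names below are just one common convention, not a requirement:

```python
# One possible project skeleton, following the "keep your project
# organized" advice. Directory names are a common convention only.
from pathlib import Path

project = Path("my-study")
for subdir in ["data/raw", "data/processed", "code", "materials", "reports"]:
    (project / subdir).mkdir(parents=True, exist_ok=True)

# A top-level README tells visitors what lives where.
(project / "README.md").write_text(
    "# my-study\n\n"
    "- data/raw: original data files, never edited by hand\n"
    "- data/processed: data derived by the scripts in code/\n"
    "- materials: stimuli, instruments, and protocols\n"
    "- reports: manuscripts and figures\n"
)
```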
  29. To achieve the next levels of reproducibility, you need to

    learn some tools of computational reproducibility. In general, reproducibility is improved when there is less clicking and pasting and more scripting and coding.
  30. Markowetz recommends doing analyses in R or Python and

    documenting them using knitr or IPython notebooks, because these tools help merge descriptive text with analysis code into dynamic documents that can be automatically updated every time the data or code change.
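To make the “more scripting, less clicking” idea concrete, here is a minimal sketch of a self-contained analysis script in Python -- the file data/raw/experiment1.csv and its rt_ms column are hypothetical -- in which every step from raw data to reported numbers reruns with one command:

```python
# "More scripting, less clicking": the whole path from raw data to
# reported numbers lives in one rerunnable script. File and column
# names are hypothetical placeholders.
import csv
import statistics

def load_reaction_times(path):
    """Read one reaction time (ms) per row from an 'rt_ms' column."""
    with open(path, newline="") as f:
        return [float(row["rt_ms"]) for row in csv.DictReader(f)]

def main():
    rts = load_reaction_times("data/raw/experiment1.csv")
    print(f"n = {len(rts)}")
    print(f"mean RT = {statistics.mean(rts):.1f} ms")
    print(f"SD = {statistics.stdev(rts):.1f} ms")

if __name__ == "__main__":
    main()
```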
  31. As a next step, learn how to use a version-control

    system like git on a collaborative platform such as GitHub. Finally, if you want to become a pro, learn to use Docker, which will make your analysis self-contained and easily transportable to different systems. UNQUOTE I admit that I’m at level 2.
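For anyone moving up to the version-control level, one small, concrete payoff is provenance: a script can stamp its output with the exact git commit that produced it, so any result is traceable back to the code behind it. A sketch, assuming the script runs inside a git repository:

```python
# Stamp analysis output with the current git commit so results are
# traceable to the exact code that produced them. Assumes the script
# runs inside a git repository.
import subprocess

def current_commit():
    """Return the hash of the currently checked-out git commit."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(f"Analysis run at commit {current_commit()}")
```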
  32. I keep my project organized, I name my files and

    directories in some informative way, I store my data and code at a single backed-up location. I don’t spread my data over different servers, laptops and hard drives. And I’ve learned some tools of computational reproducibility. I’m not yet at the GitHub or Docker level, but I plan to move there soon.
  33. And I’d love to talk with you about other strategies

    for the security -- and greater visibility -- that come with Open Science. So let me open the floor for our discussion.