The value, impact and barriers to Open Access data

Open access and data – Open access data in the context of data-driven delivery / Scientific Data as a Service – TAO case study. Talk presented at Swinburne University of Technology as part of International Open Access Week, 20 – 26 October 2014

Dr. Arna Karick

October 24, 2014

Transcript

  1. The Value and Impact of Open Access Data & Barriers to Open Access Data
     Dr. Arna Karick, e-Research Consultant, Swinburne Research
  2. The Value and Impact of Open Access Data
     Astronomy (research & scientific discovery)
     http://www.spacetelescope.org/projects/hiddentreasures/
     http://www.sdss.org
  3. The Value and Impact of Open Access Data
     Data Journalism (data mining/research for society)
     • 100s of websites ready to be mined!
     • GovHack: http://data.gov.au
     • With a few coding skills and access to online tools, researchers can create new datasets & visualisations for research and tell stories...
     • The Conversation pieces. Could also be integrated into Australia Policy Online.
     • Would be nice to have a home for researcher-created datasets...
     Image: from The Guardian newspaper Data Blog (http://visual.ly/global-emissions-kyoto)
  4. Astronomy
     Sloan Digital Sky Survey:
     • Images and spectral data of ~500 million stars and galaxies taken at the Apache Point Observatory
     • Data archive designed and developed in collaboration with Microsoft Research. Careful thought went into data re-use for research.
     • Archive includes multiple tools and catalogues. Multiple data releases over the past decade have meant that the archive and data can evolve.
     • Data collection began in 2000 (SDSS-I+II, ~8 years), followed by a silly number of papers (5800+ peer reviewed, total citations ~245,000).
     • Profound impact on the way astronomers were doing research: moving from detailed single-object studies to global, highly statistical analyses of millions of objects.
     • Enabled crowd-sourced citizen science projects like the Zooniverse (44 scientific papers, discoveries of “anomalies”, ~150,000 users in Y1)
     "By some measures, the scientific impact of the SDSS over the past decade is comparable to or exceeds that of the Hubble Space Telescope," says Donald Schneider of Pennsylvania State University.
     D.G. York, et al., "The Sloan Digital Sky Survey: Technical summary," Astronom. J., 120(3): 1579-87, 2000. 4398 citations
     http://labs.adsabs.harvard.edu/adsabs
  5. But SDSS is only one of many examples of data-driven discovery...
     • The first Fast Radio Burst (FRB) - a short-lived, energetic burst of radio emission coming from our neighboring galaxy - was discovered by searching through (reprocessing?) archival radio telescope data.
     • The Hubble Space Telescope’s MAST and Legacy Archives contain decades of observations freely accessible through a web portal. (fun fact: US researchers can apply for $$ to mine the data archive)
     • After 21 years of exploration, the 10,000th refereed paper was published in 2011. (fun fact: ARI-LJMU, faintest known SN associated with a long-duration gamma-ray burst)
     • In 2012 ESA launched the Hubble’s Hidden Treasures outreach program.
     Credit: NASA, ESA, and A. Feild (STScI) http://hubblesite.org/newscenter/archive/releases/2011/40/image/a/
  6. Figshare (Also recommended for Nature’s Scientific Data Portal)
     • Example: my Figshare profile
     • Accepts most file formats including .fits
     • Size limits on individual files.
     • Total data limits depending on account type.
     http://figshare.com/authors/Arna_Karick/454430
  7. Scientific Data - Nature Portal
     • Doesn’t require paper publication in Nature (they are quite happy to take your money regardless...)
     • Publishes short Data Description type papers (with a DOI) BUT does not host data - only provides links
  8. RDSI - VicNode 2015: Pay as you go. A ~70TB collection will cost ~$10+K
  9. Issues
     • It’s not enough to describe the data + include caveats & quality ratings
     • Most general use data repositories can’t handle “big data” (GB+ datasets)
     • Data storage will never be free: accessing data seems to drive costs up
     • The best data archives come with tools for analysis (or formatting)
     • Data citation can be tricky but it’s not a major problem (tracking DOIs is)
     • Institutional repositories need to be jointly developed and flexible in terms of management and how they are intended to be used
     • Apparent rules and regulations. Abiding by laws (if they exist) for international and collaborative research is tricky - the “who owns the moon?” problem. ANDS pushes for licensing, copyright and access statements. Ownership/transferral/reuse of Figshare material? - copyright? or contract law? Important to understand terms and conditions.
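
The storage figure on slide 8 (a ~70 TB collection costing roughly $10K or more) implies a per-terabyte rate that can be used for rough budgeting. The sketch below works through that arithmetic in Python. The function names are hypothetical, $10,000 is treated as a lower bound, and linear scaling with collection size is an assumption; the slide does not state the currency, the billing period, or how VicNode actually prices storage.

```python
def cost_per_tb(total_cost: float, size_tb: float) -> float:
    """Implied cost per terabyte for a collection of the given size."""
    if size_tb <= 0:
        raise ValueError("size_tb must be positive")
    return total_cost / size_tb


def estimate_cost(size_tb: float, rate_per_tb: float) -> float:
    """Estimated cost of a collection, ASSUMING cost scales linearly with size."""
    if size_tb < 0:
        raise ValueError("size_tb must not be negative")
    return size_tb * rate_per_tb


# Figures quoted on slide 8: ~70 TB for ~$10K+ (taken here as a lower bound).
rate = cost_per_tb(10_000, 70)
print(f"Implied rate: at least ~${rate:.0f} per TB")

# Hypothetical example: a collection half that size, same assumed rate.
half_size_cost = estimate_cost(35, rate)
print(f"35 TB at that rate: at least ~${half_size_cost:,.0f}")
```

Under these assumptions the implied rate is about $143 per TB, so a 35 TB collection would come to roughly $5,000 at minimum. Real pricing may include tiers or access charges, which the slides hint at when they note that accessing data seems to drive costs up.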