The value, impact and barriers to Open Access data
Open access and data – Open access data in the context of data-driven delivery / Scientific Data as a Service – TAO case study. Talk presented at Swinburne University of Technology as part of International Open Access Week, 20 – 26 October 2014
(data mining/research for society)
• 100s of websites ready to be mined!
• GovHack: http://data.gov.au
• With a few coding skills and access to online tools, researchers can create new datasets & visualisations for research and tell stories...
• The Conversation pieces. Could also be integrated into Australian Policy Online.
• Would be nice to have a home for researcher-created datasets...
From The Guardian newspaper Data Blog (http://visual.ly/global-emissions-kyoto)
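As a rough illustration of the "few coding skills" point, here is a minimal Python sketch of turning an open CSV file into a small derived dataset. The sample rows, the column names (`country`, `year`, `emissions_mt`) and the function name `total_by_country` are all hypothetical stand-ins, not taken from data.gov.au or the Guardian Data Blog; a real file would be downloaded from a portal and read the same way.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a CSV downloaded from an open data portal.
# The values are made up for illustration only.
SAMPLE_CSV = """country,year,emissions_mt
A,2010,100.0
A,2011,110.0
B,2010,50.0
B,2011,45.0
"""


def total_by_country(csv_text):
    """Sum the emissions_mt column for each country in the CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["country"]] += float(row["emissions_mt"])
    return dict(totals)


if __name__ == "__main__":
    # Print one line per country: the derived "new dataset".
    for country, total in sorted(total_by_country(SAMPLE_CSV).items()):
        print(country, total)
```

The resulting per-country totals could then be handed to any plotting tool to build a visualisation.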
of ~500 million stars and galaxies taken with the Apache Point Observatory
• Data archive designed and developed in collaboration with Microsoft Research. Careful thought went into data re-use for research.
• Archive includes multiple tools and catalogues. Multiple data releases over the past decade have meant that the archive and data can evolve.
• Data collection began in 2000 (SDSS-I+II, ~8 years) followed by a silly number of papers (5800+ peer reviewed, total citations ~245,000).
• Profound impact on the way astronomers do research: moving from detailed single-object studies to global, highly statistical analyses of millions of objects.
• Enabled crowd-sourced citizen science projects like the Zooniverse (44 scientific papers, discoveries of “anomalies”, ~150,000 users in Y1)
"By some measures, the scientific impact of the SDSS over the past decade is comparable to or exceeds that of the Hubble Space Telescope," says Donald Schneider of Pennsylvania State University.
D.G. York, et al., "The Sloan Digital Sky Survey: Technical summary," Astronom. J., 120(3): 1579-87, 2000. 4398 citations http://labs.adsabs.harvard.edu/adsabs
discovery...
• The first Fast Radio Burst (FRB) - a short-lived, energetic burst of radio emission coming from our neighboring galaxy - was discovered by searching through (reprocessing?) archival radio telescope data.
• The Hubble Space Telescope’s MAST and Legacy Archives contain decades of observations freely accessible through a web portal. (fun fact: US researchers can apply for $$ to mine the data archive)
• After 21 years of exploration the 10,000th refereed paper was published in 2011. (fun fact: ARI-LJMU, faintest known supernova associated with a long-duration gamma-ray burst)
• In 2012 ESA launched the Hubble’s Hidden Treasures outreach program.
Credit: NASA, ESA, and A. Feild (STScI) http://hubblesite.org/newscenter/archive/releases/2011/40/image/a/
my Figshare profile
• Accepts most file formats including .fits
• Size limits on individual files.
• Total data limits depending on account type.
http://figshare.com/authors/Arna_Karick/454430
in Nature (they are quite happy to take your money regardless...) • Publishes short Data Description type papers (with a DOI) BUT does not host data - only provides links
caveats & quality ratings
• Most general-use data repositories can’t handle “big data” (GB+ datasets)
• Data storage will never be free: accessing data seems to drive costs up
• The best data archives come with tools for analysis (or formatting)
• Data citation can be tricky but it’s not a major problem (tracking DOIs is)
• Institutional repositories need to be jointly developed and flexible in terms of management and how they are intended to be used
• Apparent rules and regulations. Abiding by laws (if they exist) for international and collaborative research is tricky - the “who owns the moon?” problem. ANDS pushes for licensing, copyright and access statements. Ownership/transferral/reuse of Figshare material? - copyright? or contract law? Important to understand terms and conditions.
Issues