Data Management Challenges in Data Synthesis Projects

Challenges of data management in synthesis projects
Summary of/reflection on afternoon session on Wed 7/5/14 at ACEAS Grand Synthesis Workshop
CC-BY @atreloar

Caveats
• I am not an ecologist
• I see most things through a data lens
• And so apologies for what I have noted about presentations this arvo

Data identification and acquisition: Seagrass
• Challenges
  • Lack of metadata – need corporate knowledge
  • Limited data available for open access exchange
  • Lack of info about how data was collected
  • Identifying relevant data sets
  • Hard to identify relevant variables in some data for particular questions
  • Getting data at the right spatial and temporal scale
  • Implications of necessary assumptions
  • Data (including layers) constrains spatial resolution
• Opportunity for map improvement
  • But where does the improved map end up? (cf. data synthesis, publication)

Data identification and acquisition: Aquatic
• Developing wetland plant database with range of traits
• Drawing on a number of different existing data sets
• Using a range of dispersal models
• Need for further data collection and modelling by researchers
• Data acquisition challenges
  • Often sourced through personal contacts
  • Populating the database with the right traits

Data collation and blending: Animal telemetry
• OzTrack platform provides a location to bring together tracking data across disciplines
• Analysis tools are the carrot to attract the data
• Obligation to make data available (because you may have degraded study animals' QoL)
• Sourced datasets through TERN DDP ("It's awesome!")
• Challenges
  • Reuse hard because original studies determine tag set-up
  • Raw data on its own not enough – need rich context from data custodians/collectors
  • Who owns the data?

Data collation and blending: Northern Quoll
• Challenges
  • Data mismatches between availability and study question (burned patches, rockiness)
  • Studies set up for different purposes, and hence produce different data

Data analysis and synthesis
• Challenges – endemic genetics
  • Lack of adequate metadata (stuff just missing – DNA, location)
  • Inadequate response from authors
  • Need for format conversion
• Challenges – phenology monitoring
  • Need better data => protocols and standards for data capture
  • Tools for managing and sharing 1000s of images
  • No global standards for phenocams
• Challenges – drought-induced mortality
  • Data is often biased, incomplete and patchy (but it's all we've got sometimes)

Data publication and visualisation
• Challenges – aerobiology
  • Different data capture technologies influence data collected
  • Could only use 11 of the 17 possible data sets
  • Getting the data online delayed publication of first paper
  • Reluctance to release primary data (priority, errors/quality, journal policies)
  • Ignorance of data value (commercial exploitation, value adding by others)
• Challenges – indigenous knowledge
  • Interaction between cultural landscape scales and cultural infrastructure

Overall issues
• Fitness for purpose vs. "It's all we have"
• When synthesising, may be constrained by the lowest-quality data set
  • E.g. spatial resolution for seagrass, existence of presence/absence data only
• Need to capture context in metadata (seagrass, telemetry, endemics)
• Motivators for data exchange/availability
  • Answer new questions through more data
  • Use tools that are made available as a carrot
• Data gets collected but doesn't always get published
• Some data owners are reluctant to share for understandable human reasons

Overall issues
• Hard to find data (if cited in paywall journals)
  • Role here for DDP, Research Data Australia
• Data quality (or purpose) mismatch
• Non-interoperable data
• Academic ethos
  • Hierarchical structure incompatible with data sharing
  • Academia selects for possessiveness
  • Underfunding => overcontribution => protectiveness

Possible actions
• Anticipate Reuse: get groups who collect potentially combinable data to agree on the minimum elements they will collect, so that their datasets are more reusable/recombinable
• More is More: concentrate on large long-term field projects with standardised instruments and data products
• Research Locally, Coordinate Globally: the Research Data Alliance (rd-alliance.org) provides a venue for working groups to reduce barriers to data exchange
• Bribe, don't Bully: provide tools with attractive functionality that make data sharing easier (than what researchers do now)
• Change the Norms: discussion within disciplines around data-sharing norms

Thank you for the opportunity to come and listen
@atreloar andrew.treloar.net
