atreloar
May 07, 2014

Data Management Challenges in Data Synthesis Projects

Summary and reflection for afternoon session, Wed 7/5/14, ACEAS Grand Synthesis Workshop

Transcript

  1. Challenges of data management in synthesis projects
     Summary of/reflection on afternoon session on Wed 7/5/14 at ACEAS Grand Synthesis Workshop
     CC-BY @atreloar
  2. Caveats
     • I am not an ecologist
     • I see most things through a data lens
     • And so apologies for what I have noted about presentations this arvo
  3. Data identification and acquisition: Seagrass
     • Challenges
        • Lack of metadata – need corporate knowledge
        • Limited data available for open access exchange
        • Lack of info about how data was collected
        • Identifying relevant data sets
        • Hard to identify relevant variables in some data for particular questions
        • Getting data at right spatial and temporal scale
        • Implications of necessary assumptions
        • Data (including layers) constrains spatial resolution
     • Opportunity for map improvement
        • But where does the improved map end up? (c.f. data synthesis, publication)
  4. Data identification and acquisition: Aquatic
     • Developing wetland plant database with range of traits
     • Drawing on a number of different existing data sets
     • Using a range of dispersal models
     • Need for further data collection and modelling by researchers
     • Data acquisition challenges
        • Often sourced through personal contacts
        • Populating the database with the right traits
  5. Data collation and blending: Animal telemetry
     • OzTrack platform provides a location to bring together tracking data across disciplines
     • Analysis tools are the carrot to attract the data
     • Obligation to make data available (because you may have degraded study animals' QoL)
     • Sourced datasets through TERN DDP ("It's awesome!")
     • Challenges
        • Reuse hard because original studies determine tag set up
        • Raw data on its own not enough – need rich context from data custodians/collectors
        • Who owns the data?
  6. Data collation and blending: Northern Quoll
     • Challenges
        • Data mismatches between availability and study question (burned patches, rockiness)
        • Studies set up for different purposes, and hence produce different data
  7. Data analysis and synthesis
     • Challenges – endemic genetics
        • Lack of adequate metadata (stuff just missing – DNA, location)
        • Inadequate response from authors
        • Need for format conversion
     • Challenges – phenology monitoring
        • Need better data => protocols and standards for data capture
        • Tools for managing and sharing 1000s of images
        • No global standards for phenocams
     • Challenges – drought induced mortality
        • Data is often biased, incomplete and patchy (but it's all we've got sometimes)
  8. Data publication and visualisation
     • Challenges – aerobiology
        • Different data capture technologies influence data collected
        • Could only use 11 of the 17 possible data sets
        • Getting the data online delayed publication of first paper
        • Reluctance to release primary data (priority, errors/quality, journal policies)
        • Ignorance of data value (commercial exploitation, value adding by others)
     • Challenges – indigenous knowledge
        • Interaction between cultural landscape scales and cultural infrastructure
  9. Overall issues
     • Fitness for purpose vs. It's all we have
        • When synthesising, may be constrained by lowest quality data set
        • E.g. spatial resolution for seagrass, existence of presence/absence only
     • Need to capture context in metadata (seagrass, telemetry, endemics)
     • Motivators for data exchange/availability
        • Answer new questions through more data
        • Use tools that are made available as carrot
     • Data gets collected but doesn't always get published
     • Some data owners are reluctant to share for understandable human issues
  10. Overall issues
     • Hard to find data (if cited in paywall journals)
        • Role here for DDP, Research Data Australia
     • Data quality (or purpose) mismatch
     • Non-interoperable data
     • Academic ethos
        • Hierarchical structure incompatible with data sharing
        • Academia selects for possessiveness
        • Underfunding => overcontribution => protectiveness
  11. Possible actions
     • Anticipate Reuse: get groups who collect potentially combinable data to agree on minimum elements they will collect that will make datasets more reusable/recombinable
     • More is More: concentrate on large long-term field projects with standardised instruments and data products
     • Research Locally, Coordinate Globally: Research Data Alliance (rd-alliance.org) provides location for working groups to reduce barriers to data exchange
     • Bribe, don't Bully: Provide tools with attractive functionality where data sharing is easier (than what they do now)
     • Change the Norms: Discussion within discipline around data-sharing norms
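
The "Anticipate Reuse" action above could be supported by a simple check that each record carries an agreed minimum set of elements. The sketch below is illustrative only: the element names in `MINIMUM_ELEMENTS` and the record layout are hypothetical assumptions, not something specified in the talk, and a real set would be whatever the collaborating groups agree on.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimum element set (an assumption for illustration);
# a real list would be agreed by the groups collecting combinable data.
MINIMUM_ELEMENTS = ("collector", "collection_date", "latitude", "longitude", "method")


def missing_elements(record: Dict, required: Iterable[str] = MINIMUM_ELEMENTS) -> List[str]:
    """Return the required elements that are absent or empty in a record."""
    return [name for name in required if record.get(name) in (None, "")]


def split_by_reusability(records: Iterable[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, List[str]]]]:
    """Separate records that meet the minimum set from those with gaps."""
    reusable, incomplete = [], []
    for record in records:
        gaps = missing_elements(record)
        if gaps:
            # Keep the list of gaps so the data custodian can be asked for them.
            incomplete.append((record, gaps))
        else:
            reusable.append(record)
    return reusable, incomplete
```

Run against incoming datasets, a check like this would surface missing context (for example how or where the data was collected) at the time of contribution rather than at synthesis time.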