Data Management Challenges in Data Synthesis Projects

Summary and reflection for afternoon session, Wed 7/5/14, ACEAS Grand Synthesis Workshop

atreloar

May 07, 2014

Transcript

  1. Challenges of data management in synthesis projects
     Summary of/reflection on afternoon session on Wed 7/5/14 at ACEAS Grand Synthesis Workshop
     CC-BY @atreloar
  2. Caveats
     • I am not an ecologist
     • I see most things through a data lens
     • And so apologies for what I have noted about presentations this arvo
  3. Data identification and acquisition: Seagrass
     • Challenges
     • Lack of metadata – need corporate knowledge
     • Limited data available for open access exchange
     • Lack of info about how data was collected
     • Identifying relevant data sets
     • Hard to identify relevant variables in some data for particular questions
     • Getting data at right spatial and temporal scale
     • Implications of necessary assumptions
     • Data (including layers) constrains spatial resolution
     • Opportunity for map improvement
     • But where does the improved map end up? (cf. data synthesis, publication)
  4. Data identification and acquisition: Aquatic
     • Developing wetland plant database with range of traits
     • Drawing on a number of different existing data sets
     • Using a range of dispersal models
     • Need for further data collection and modelling by researchers
     • Data acquisition challenges
     • Often sourced through personal contacts
     • Populating the database with the right traits
  5. Data collation and blending: Animal telemetry
     • OzTrack platform provides a location to bring together tracking data across disciplines
     • Analysis tools are the carrot to attract the data
     • Obligation to make data available (because you may have degraded study animals' QoL)
     • Sourced datasets through TERN DDP ("It's awesome!")
     • Challenges
     • Reuse hard because original studies determine tag set up
     • Raw data on its own not enough – need rich context from data custodians/collectors
     • Who owns the data?
  6. Data collation and blending: Northern Quoll
     • Challenges
     • Data mismatches between availability and study question (burned patches, rockiness)
     • Studies set up for different purposes, and hence produce different data
  7. Data analysis and synthesis
     • Challenges – endemic genetics
     • Lack of adequate metadata (stuff just missing – DNA, location)
     • Inadequate response from authors
     • Need for format conversion
     • Challenges – phenology monitoring
     • Need better data => protocols and standards for data capture
     • Tools for managing and sharing 1000s of images
     • No global standards for phenocams
     • Challenges – drought induced mortality
     • Data is often biased, incomplete and patchy (but it's all we've got sometimes)
  8. Data publication and visualisation
     • Challenges – aerobiology
     • Different data capture technologies influence data collected
     • Could only use 11 of the 17 possible data sets
     • Getting the data online delayed publication of first paper
     • Reluctance to release primary data (priority, errors/quality, journal policies)
     • Ignorance of data value (commercial exploitation, value adding by others)
     • Challenges – indigenous knowledge
     • Interaction between cultural landscape scales and cultural infrastructure
  9. Overall issues
     • Fitness for purpose vs. It's all we have
     • When synthesising, may be constrained by lowest quality data set
     • E.g. spatial resolution for seagrass, existence of presence/absence only
     • Need to capture context in metadata (seagrass, telemetry, endemics)
     • Motivators for data exchange/availability
     • Answer new questions through more data
     • Use tools that are made available as carrot
     • Data gets collected but doesn't always get published
     • Some data owners are reluctant to share for understandable human issues
  10. Overall issues
      • Hard to find data (if cited in paywall journals)
      • Role here for DDP, Research Data Australia
      • Data quality (or purpose) mismatch
      • Non-interoperable data
      • Academic ethos
      • Hierarchical structure incompatible with data sharing
      • Academia selects for possessiveness
      • Underfunding => overcontribution => protectiveness
  11. Possible actions
      • Anticipate Reuse: get groups who collect potentially combinable data to agree on minimum elements they will collect that will make datasets more reusable/recombinable
      • More is More: concentrate on large long-term field projects with standardised instruments and data products
      • Research Locally, Coordinate Globally: Research Data Alliance (rd-alliance.org) provides location for working groups to reduce barriers to data exchange
      • Bribe, don't Bully: Provide tools with attractive functionality where data sharing is easier (than what they do now)
      • Change the Norms: Discussion within discipline around data-sharing norms
  12. Thank you for the opportunity to come and listen
      @atreloar
      andrew.treloar@ands.org.au
      andrew.treloar.net