Sustainable Environment – Actionable Data (SEAD)
Ecological Society of America Conference
August 6, 2013 – Minneapolis, MN
Robert H. McDonald
Indiana University | @mcdonald

SEAD Partners
Margaret Hedstrom, PI
Ann Zimmerman
James Myers
Beth Plale
Katy Börner
Robert McDonald
Praveen Kumar
George Alter & Bryan Beecher

Data Challenges in Sustainability Research
• Many dimensions, many coordinate systems, many scales, many formats, many providers and users, …

SEAD Vision and Rationale
• Interdisciplinary research teams are underserved by today’s data preservation and access infrastructure
• These communities will take advantage of evolving data preservation and access infrastructure if:
  • it supports science objectives and enables new kinds of science
  • it is easy to use
  • collaborators and peers are also using it
• Sustainability science is a good test case

Making Data Sustainable: Use Case
[Diagram labels: Active Curation Repository (ACR); Packaged object; SEAD Virtual Archive; IUScholarWorks; UIUC IDEALS; Preserve data; Keep private for 5 years; Index data, metadata and relationships]
• Collected data about Lower Mississippi flood
• Stored in Active Repository
• Organized as a collection
• Marked “Ready for publication”
• Collections visible to team only for 5 years
• Deposited to repository based on dataset creator affiliation
• Find by author, location, keywords or repository
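The deposit rules in the use case above (route by creator affiliation, keep team-only for 5 years) can be sketched in a few lines of Python. This is a minimal illustration, not SEAD's implementation: the `Collection` class, the function names, and the affiliation-to-repository mapping are all hypothetical, chosen only to mirror the bullets on the slide.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from creator affiliation to institutional repository.
REPOSITORY_BY_AFFILIATION = {
    "Indiana University": "IUScholarWorks",
    "University of Illinois at Urbana-Champaign": "UIUC IDEALS",
}

EMBARGO_YEARS = 5  # "Keep private for 5 years"


@dataclass
class Collection:
    title: str
    creator_affiliation: str
    ready_for_publication: bool
    deposit_date: date


def choose_repository(collection: Collection) -> str:
    """Pick the target repository from the dataset creator's affiliation."""
    if not collection.ready_for_publication:
        raise ValueError("collection is not marked 'Ready for publication'")
    # Raises KeyError for an affiliation with no known repository.
    return REPOSITORY_BY_AFFILIATION[collection.creator_affiliation]


def public_release_date(collection: Collection) -> date:
    """Return the first day on which the collection is visible to everyone."""
    d = collection.deposit_date
    try:
        return d.replace(year=d.year + EMBARGO_YEARS)
    except ValueError:
        # Deposit on Feb 29 and the target year is not a leap year.
        return date(d.year + EMBARGO_YEARS, 3, 1)


def is_visible(collection: Collection, viewer_is_team_member: bool, today: date) -> bool:
    """Team members always see the collection; others only after the embargo."""
    return viewer_is_team_member or today >= public_release_date(collection)


flood = Collection(
    title="Lower Mississippi flood",
    creator_affiliation="Indiana University",
    ready_for_publication=True,
    deposit_date=date(2013, 8, 6),
)
print(choose_repository(flood))    # IUScholarWorks
print(public_release_date(flood))  # 2018-08-06
```

Keeping the routing table as plain data means a new partner repository would be a one-line change under this sketch.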

SEAD Vision
[Diagram labels: Research Question; SEARCH for People, Publications, Data; Collaboration Environment; Discovery and Access Environments; Combine, Integrate, Analyze; Preservation Environments; Share; Improve; Curate Data; Upload/Download Data; SEAD ACR; SEAD Virtual Archive; SEAD Social Network]

Science Objective: Actionable Science
Enable Novel Paradigms
• Couple data and scientific discovery life cycles
• Reuse of verifiable value-added data and science
• Accelerate time to new discovery
• Data reveals novel dependencies
• Couple natural and social science
• Data provides a common language
• New paradigms for a knowledge society
• Support agile actions rooted in verifiable data and knowledge
[Diagram labels: Data Collection, Experimentation, Planning, Needs Assessment, Data Model, Ontology Matching, Metadata Harvesting, Catalog, Semantic Integration, Active Curation, Processing, Repurposing, Integration, Visualization, Analysis, Discovery, Preservation, Archival, Access, Data Network, Hypothesis, Problem Analysis, Modeling, Discussion, People Network, Verification, Publication, Community adoption, Policy, Assessment, Decision]

Cyberinfrastructure for a New Paradigm in Digital Preservation and Access
SEAD has created a prototype environment that:
• Couples the data and scientific discovery life cycles
• Moves curation into the scientific discovery life cycle through active curation
• Supports continuous enrichment of data
• Reduces costs and burdens associated with active data management and post-project curation for researchers
• Simplifies release and publication of data
• Accelerates movement of data from research environments to preservation and discovery environments
• Builds capacity in existing repositories (people, technology and services)

SEAD Virtual Archive: Simplifies and Accelerates release, publication and preservation of data
• Low-barrier, click-to-publish capability from project repositories
• Leveraging sustainable organizations for long term preservation
• Works with university data storage initiatives
• Extends Data Conservancy to operate over multiple repositories
• Unique CI contributions in
  • Workflows for metadata transfer, conversion, inference, and packaging
  • Policy based “matchmaker” determination of location of data object during deposit
  • Data models that expose scientific metadata in addition to preservation metadata for richer discovery
  • Standards-based submission to institutional repositories and cloud services
• Generates data citation and collection reference (DOI), which is propagated automatically to community network (VIVO) and back to project repository

SEAD Prototype Capabilities
[Diagram labels: SEAD ACR; SEAD VIVO; SEAD Virtual Archive; IU ScholarWorks; UIUC IDEALS; Manage Heterogeneous Data; Manage Active Data; Active Curation; Connect – People – Publications – Data; Long-Term Preservation; Data Access and Discovery]

SEAD Prototype Community: NCED
• National Center for Earth Surface Dynamics (NCED)
One overarching question: "How will the coupled system of physical, biological, geochemical, and human processes that shape the surface of the Earth respond to changes in climate, land use, environmental management, and other forcings?"

NCED: Data Management Concerns
• Completed Projects – Ongoing Projects – New Projects
• Dynamic Movement of People and their Data through NCED
• NCED Repository captures some data
• No long-term preservation plan

Use Case 0: NCED Collection Access
• NCED collections in ACR
• (20 Top-level Collections, 454K files, 2.25M objects, 1.6 TB data)
• NCED Repository Interface
• Support for hierarchy
• Support for collection annotation
• View/add NCED/domain specific terms
• New Large Server with Virtual Machine ACR instances
• Ingest tools and procedures
• csv2rdf4LOD
• Archiving, Citation, DOI assignment, …
NCED users can (with an account) go from web page to previews and downloads (w/o cart), can add annotations, can browse, search by text (any fields and content), tags, etc.

SEAD End to End Capabilities Demonstration

Challenges and Responses
[Figure label: Flow rates, materials, conditions]
• Large Collections require hierarchical views
• Support for hierarchical collections
• NCED-branded Repository interface
• NCED Branding is important to center
• Further NCED branding on Data Pages
• Significant metadata in separate files and pathnames
• Tag/relate descriptive files to data
• Provide spreadsheet view for metadata
• Demonstrate active curation via tagging/annotation
• Path 2 rdf tool in development via collaboration
• Geospatial Data in Files
• Geospatial indexing
• Filtered Map Overlays
• Expose layers via OGC service
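To illustrate the "significant metadata in pathnames" challenge above, here is a minimal Python sketch that turns a file path into subject-predicate-object triples. It is not the Path 2 rdf tool mentioned on the slide, whose design is not described here. The predicate names (`ex:inCollection`, `ex:hasFormat`), the function name, and the example path are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical predicate names; not taken from SEAD or NCED vocabularies.
IN_COLLECTION = "ex:inCollection"
HAS_FORMAT = "ex:hasFormat"


def path_to_triples(path: str) -> list[tuple[str, str, str]]:
    """Derive simple (subject, predicate, object) triples from a pathname.

    Each enclosing folder becomes a collection the file belongs to, and the
    file extension (if any) becomes a lower-cased format tag.
    """
    p = PurePosixPath(path)
    subject = str(p)
    triples = [
        (subject, IN_COLLECTION, folder)
        for folder in p.parent.parts
        if folder != "/"  # skip the root marker of absolute paths
    ]
    if p.suffix:
        triples.append((subject, HAS_FORMAT, p.suffix.lstrip(".").lower()))
    return triples


for triple in path_to_triples("flume/run12/discharge.CSV"):
    print(triple)
```

A real tool would map folder names onto a controlled vocabulary rather than reuse them verbatim. This sketch only shows that information carried by the hierarchy can be made explicit and searchable.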

Summary of SEAD’s Contributions
• Provides researchers with access to heterogeneous data collections needed for sustainability science
• Supports data management and active curation that improves and adds value to data
• Creates a rich discovery environment of data, publications, and expertise
• Ensures long-term preservation of data with publications through interoperability with trusted repositories

More SEAD Information
• Follow us on Twitter @SEADdatanet
• See all of our demo videos
• http://
• Check out our Web site
• http://sead- (ACR/Social Network/VirtA)
• Contact Us: Robert H. McDonald | [email protected]