underserved by today’s data preservation and access infrastructure
• These communities will take advantage of evolving data preservation and access infrastructure if:
  • it supports science objectives and enables new kinds of science
  • it is easy to use
  • collaborators and peers are also using it
• Sustainability science is a good test case
[Diagram labels: (ACR), SEAD Virtual Archive, IU ScholarWorks, UIUC IDEALS; packaged object; preserve data; keep private for 5 years; index data, metadata and relationships]
• Collected data about the Lower Mississippi flood
• Stored in the Active Repository
• Organized as a collection
• Marked “Ready for publication”
• Collections visible to the team only for 5 years
• Deposited to a repository based on the dataset creator’s affiliation
• Find by author, location, keywords or repository
Couple data and scientific discovery life cycles
• Reuse of verifiable value-added data and science
• Accelerate time to new discovery
• Data reveals novel dependencies
• Couple natural and social science
• Data provides a common language
• New paradigms for a knowledge society
• Support agile actions rooted in verifiable data and knowledge
[Figure: coupled data and scientific discovery life cycles, linked by a Data Network and a People Network; stages include data collection, experimentation, planning, needs assessment, data model, ontology matching, metadata harvesting, catalog, semantic integration, active curation, processing, repurposing, integration, visualization, analysis, discovery, preservation, archival, access, hypothesis, problem analysis, modeling, discussion, verification, publication, community adoption, policy assessment and decision]
Access
• Couples the data and scientific discovery life cycles
• Moves curation into the scientific discovery life cycle through active curation
• Supports continuous enrichment of data
• Reduces the costs and burdens of active data management and post-project curation for researchers
• Simplifies release and publication of data
• Accelerates movement of data from research environments to preservation and discovery environments
• Builds capacity in existing repositories (people, technology and services)
SEAD has created a prototype environment that
preservation of data
• Low-barrier, click-to-publish capability from project repositories
• Leverages sustainable organizations for long-term preservation
• Works with university data storage initiatives
• Extends Data Conservancy to operate over multiple repositories
• Unique CI contributions in:
  • Workflows for metadata transfer, conversion, inference, and packaging
  • Policy-based “matchmaker” determination of a data object’s location during deposit
  • Data models that expose scientific metadata in addition to preservation metadata for richer discovery
  • Standards-based submission to institutional repositories and cloud services
  • Generation of a data citation and collection reference (DOI), which is propagated automatically to the community network (VIVO) and back to the project repository
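The policy-based “matchmaker” deposit step, combined with the affiliation-based deposit and five-year team-only visibility from the Lower Mississippi flood example, can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the `Collection` type, the `REPOSITORY_BY_AFFILIATION` table, and the `match_repository` and `embargo_end` helpers are hypothetical and do not describe SEAD’s actual policy engine or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical policy table: maps a dataset creator's affiliation to a target
# repository. The repository names come from the slides; the rule is assumed.
REPOSITORY_BY_AFFILIATION = {
    "Indiana University": "IU ScholarWorks",
    "University of Illinois at Urbana-Champaign": "UIUC IDEALS",
}


@dataclass
class Collection:
    """Hypothetical record for a collection held in the active repository."""
    title: str
    creator_affiliation: str
    ready_for_publication: bool


def match_repository(collection: Collection) -> Optional[str]:
    """Pick a deposit target, or None if the collection is not ready or no policy matches."""
    if not collection.ready_for_publication:
        return None
    return REPOSITORY_BY_AFFILIATION.get(collection.creator_affiliation)


def embargo_end(deposit_date: date, years: int = 5) -> date:
    """Date until which a deposited collection stays visible to the team only."""
    try:
        return deposit_date.replace(year=deposit_date.year + years)
    except ValueError:
        # A Feb 29 deposit landing in a non-leap year falls back to Feb 28.
        return deposit_date.replace(year=deposit_date.year + years, day=28)


if __name__ == "__main__":
    flood = Collection("Lower Mississippi flood", "Indiana University", True)
    print(match_repository(flood))        # IU ScholarWorks
    print(embargo_end(date(2013, 6, 1)))  # 2018-06-01
```

Returning `None` instead of a default keeps an unmatched collection in the active repository rather than guessing a destination; a real deployment would presumably weigh more policy inputs than affiliation alone.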
[Diagram: VIVO, SEAD Virtual Archive, IU ScholarWorks and UIUC IDEALS, with the functions Manage Heterogeneous Data; Manage Active Data; Active Curation; Connect: People, Publications, Data; Long-Term Preservation; Data Access and Discovery]
Surface Dynamics (NCED)
One overarching question: "How will the coupled system of physical, biological, geochemical, and human processes that shape the surface of the Earth respond to changes in climate, land use, environmental management, and other forcings?"
Ongoing Projects – New Projects
• Dynamic Movement of People and their Data through NCED
• NCED Repository captures some data
• No long-term preservation plan
in ACR
• (20 top-level collections, 454K files, 2.25M objects, 1.6 TB of data)
• NCED Repository Interface
  • Support for hierarchy
  • Support for collection annotation
  • View/add NCED/domain-specific terms
• New large server with virtual machine ACR instances
• Ingest tools and procedures
  • csv2rdf4LOD
• Archiving, Citation, DOI assignment, …
With an account, NCED users can go from a web page to previews and downloads (without a cart), add annotations, browse, and search by text (any field and content), tags, etc.
• Large collections require hierarchical views
• Support for hierarchical collections
• NCED-branded repository interface
• NCED branding is important to the center
• Further NCED branding on data pages
• Significant metadata in separate files and pathnames
• Tag/relate descriptive files to data
• Provide a spreadsheet view for metadata
• Demonstrate active curation via tagging/annotation
• Path 2 rdf tool in development via collaboration
• Geospatial data in files
• Geospatial indexing
• Filtered map overlays
• Expose layers via OGC service
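Because significant NCED metadata lives in pathnames, one way to picture a path-to-RDF step is to turn each directory level into a parent/child relation that also yields a hierarchical collection view. This Python sketch is hypothetical: the `paths_to_triples` helper, the `urn:nced` base identifier, the example paths, and the choice of Dublin Core `hasPart` as the predicate are assumptions, not the design of the tool named on the slide.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set, Tuple

# Assumed predicate: Dublin Core Terms defines hasPart, but whether the
# NCED/SEAD tool uses it is not stated on the slide.
HAS_PART = "http://purl.org/dc/terms/hasPart"

Triple = Tuple[str, str, str]


def paths_to_triples(paths: Iterable[str], base: str) -> Set[Triple]:
    """Turn relative file paths into (parent, hasPart, child) triples.

    Each directory level becomes a collection nested under its parent, so
    shared prefixes produce one hierarchy without duplicate triples.
    """
    triples: Set[Triple] = set()
    root = base.rstrip("/")
    for path in paths:
        parent = root
        for part in PurePosixPath(path.strip("/")).parts:
            child = f"{parent}/{part}"
            triples.add((parent, HAS_PART, child))
            parent = child
    return triples


if __name__ == "__main__":
    example = [
        "experiment_01/run_1/topography.csv",  # hypothetical paths
        "experiment_01/run_1/notes.txt",
    ]
    for triple in sorted(paths_to_triples(example, "urn:nced")):
        print(triple)
```

Collecting into a set means that a directory shared by many files is emitted once, which matters for collections on the scale of hundreds of thousands of files.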
access to heterogeneous data collections needed for sustainability science
• Supports data management and active curation that improve data and add value to it
• Creates a rich discovery environment of data, publications, and expertise
• Ensures long-term preservation of data with publications through interoperability with trusted repositories
• See all of our demo videos
  • http://bit.ly/1cHhkjw
• Check out our website
  • http://sead-data.net (ACR/Social Network/VirtA)
• Contact us: Robert H. McDonald