will we know one is useful a priori?
2. How do you know you need a standard? Determining requirements
3. How do you know a standard is working well? How can you evaluate it?
4. How do you know which standard will work best for any given application?
formats – research data types and standards of varying quality and maturity
• When does it make sense to create a standard? Standards are always good, but when are they worth the effort?
• If something is really needed by the community, why do we need to create additional incentives?
• When do standards help vs. hinder research?
• How can we predict or evaluate the impact of a standard? What are the appropriate metrics? Where are the success stories?
• How can we determine what domains / types of data will benefit most from standards efforts? Where should we put our resources?
need standards?
  – To enable communication, release and preservation of data
• How do you know that you need a standard?
  – I/O bottlenecks: platform-specific API + costly licensed tools for data access
  – Disruptive technology with massive but uncontrolled data growth
• How do you know a standard is working well?
  – Uptake by repositories, software vendors, publishers, open source efforts
  – Reuse / extension to fields other than those initially targeted
  – User support requests
• How can you evaluate them?
  – Ease of use, support, documentation, implementation guides, flexibility, extensibility, maintainability, interoperability with other standards
  -> Promote modularity and factorisation of reusable components
• How to determine what standard will work best for any given application?
  – Create and curate a registry of standards (BioSharing)
  – Create metrics and evaluation criteria (fitness-for-purpose tests)
  – Neutral assessment by review or standardization bodies (NIST?)
Time. W3C HCLS Dataset Description Guideline
  – “a couple of months tops”: over 2 years now.
  – 51 metadata elements identified, 12+ vocabularies surveyed
    • No single vocabulary to cover all needs.
  – Weekly 1hr teleconference calls, chairing, scribing, input from dozens of participants, mailing lists, issue tracker, document revisions
  -> Dedicated project staff to stay on track over 12-24 months.
• Effort. Bio2RDF: linked data for the life sciences
  – “it’s low quality” Why? “they’re out of date”
  – All 30 database conversion scripts are updated in a yearly update.
  – Daily/weekly updates to perpetually changing schemas are near impossible.
  -> Add data interoperability into data sharing plans
@micheldumontier::NIH CBS Workshop:25-02-2015
common terminologies
• Intentions of data collector (e.g. clinical vs research vs advocacy)
• Historical differences in usage
• Repeatable data characterization
• Clinical vs research provenance
• Definition of data quality requirements
• Domain-specific requirements
• Pediatrics: growth-related normalization
• Study-specific granularity: need for hierarchical relationships
disconnected standards
  – Makes it difficult to integrate translational data
  – Requires bridging
• Prospective harmonization efforts
  – Among bio standards: OBO Foundry
  – Between clinical standards: SNOMED CT-LOINC; SNOMED CT-ICD-11
• Post hoc mapping efforts (UMLS, BioPortal, GEM, cross-references)
• Technical: How to best distribute standards?
  – RDF/OWL (Linked Open Data)
  – APIs (integration in software)
• Social: Listening to the community
  – Use cases; feedback from users
  – Licensing restrictions (e.g., UMLS license agreement)
acceptance and uptake for an effort which started very small and faced an uphill battle
• How effective was it in making data more findable?
  – Structuring information is distinct from indexing, as we have a lot of private users, but this is a first step.
  – The interaction with users always results in creating a data management plan and a curation policy, as discussions often identify a big gap.
  – “Findable” means 2 things: syntax + vocabulary.
  – Curation policies are essential.
  – Documentation of coding patterns in the form of implementation guidelines.
  – Convincing end users of the value of those patterns for the long term.
• Does it actually aid reusability?
  – Making data available is the first step, so to that extent ISA-Tab definitely aids reusability.
  – How to assess it? We would need to be able to detect dataset citations -> ongoing work.
  – Can we improve? Certainly, there is always room for improvement in expressivity, tooling, and pattern documentation.
Initial Q&A: ISA point of view
• Discuss any other aspect of your perspective and experience that you like in your slide. The goal is to highlight the problems and their diversity, and discuss social, technical, and financial solutions to solving them.
  – Technology geeks versus wet-lab biologists: "keep it simple" was the winning point for ISA-Tab. Think presentation layer and make it easy for end users.
  – Prospective data management vs retrospective data forensics -> changing the habits / the practice.
  – Big problem: sustainability of standards development! Most standards-related work in academia regularly faces the axe, which is a major threat to any standardization effort, since such efforts require long-term support to establish authoritative status.
  – Furthermore, the goal is to operate in an open, free-to-access framework. Some standards development organizations, such as ISO, make standards specifications available for a fee or require user registration to access material. This model makes it difficult to ensure diffusion of standards (a single ISO standard document can cost several thousand USD).
  – Big question: how to properly support standardization activities? Support the BioSharing effort to establish an umbrella, a one-stop shop for funding agencies and development efforts to come together, avoid duplication of efforts, and broker development pathways.