Information schemas & datasets discovery: crossroads for cross-domain applications

Giannis Tsakonas
Library & Information Center, University of Patras, Greece
SIMDAS Colloquium, June 26, 2018, Nicosia, Cyprus

Context
• Three superficially distinct domains:
  • Life Sciences
  • Climate Change
  • Cultural Heritage
"Yet that big-and-slow data set offers clues to understanding oceanic “dead zones” and factors contributing to climate change. The distribution of microorganisms’ shells can signal changes in ocean currents and species migration, and the presence of particular oxygen isotopes can reveal the rate at which carbon is reaching the ocean floor or how much water is locked up in land-based ice sheets at a given time." [Mattern, 2017]

Why access datasets?
• Access to datasets is required to transition to a state of Open Science.
• We assume that datasets will be reused by researchers, within or outside service environments, for many purposes.
• We hope that datasets will be used by citizens and will educate them in scientific, reasoned thinking.
• We aspire for datasets to stir innovation and entrepreneurship.

The problem(s)
• All three areas collect, organize and provide datasets.
• We could define datasets as sets of recorded information (observations, experiments, measurements, markings, annotations).
• These data have been used in certain stages of scientific processes.
• The scientific process is not just data (theories, hypotheses, findings, interpretations, etc.).
• The first problem is the concreteness of the information we are managing.
• The second problem is the heterogeneity of the data itself (types, versions, sizes, etc.).
• The third problem is the organization of information.

Challenges
• Complex integration challenges, within and beyond a domain.
• The challenge is to have information schemas that balance the effectiveness of documentation against the efficiency of ‘work’.
• Lighter profiles of schemas exist to address the issue of complexity; see “CDWA Lite records are intended for contribution to union catalogues and other repositories using the Open Archives Initiative (OAI) harvesting protocol”.
• Clear containers for the physical and for the digital asset.
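The OAI harvesting mentioned above can be sketched with the Python standard library. This is a minimal, offline sketch assuming an OAI-PMH 2.0 endpoint: it builds `ListRecords` request URLs and reads record identifiers and the `resumptionToken` from one response page. The endpoint URL, the sample response and the helper names (`list_records_url`, `parse_list_records`) are hypothetical; `oai_dc` is used only as an example prefix, since a repository advertises its actual prefixes (for instance one for CDWA Lite) through the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: a follow-up
    request sends only the verb and the token, not metadataPrefix again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty (or absent) resumptionToken means this is the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page, trimmed to the elements used above.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

base = "https://repository.example.org/oai"  # hypothetical endpoint
print(list_records_url(base))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
ids, token = parse_list_records(SAMPLE_PAGE)
print(ids, token)  # ['oai:example.org:1', 'oai:example.org:2'] page-2
print(list_records_url(base, resumption_token=token))
# https://repository.example.org/oai?verb=ListRecords&resumptionToken=page-2
```

A real harvester would loop: fetch a page, parse it, and request the next page with the returned token until the token comes back empty.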

Integration through interoperability
• Semantic interoperability: agreement of terms, i.e. the common understanding of the meanings of the concepts used.
• Syntactic interoperability: agreement of structures, i.e. the understanding of the way a record is built.
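A toy illustration of the two kinds of agreement, under stated assumptions: two hypothetical records describe the same resource, one in XML and one in JSON. Parsing both into the same structure addresses syntactic interoperability; the records still do not match until a term agreement maps `name`/`author` onto `title`/`creator`. The field names and the mapping table are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Two hypothetical records describing the same resource in different syntaxes.
XML_RECORD = ("<record><title>Sea surface temperature, 1990-2000</title>"
              "<creator>Lab A</creator></record>")
JSON_RECORD = '{"name": "Sea surface temperature, 1990-2000", "author": "Lab A"}'


def parse_xml(text):
    """Syntactic step: read a flat XML record into a field -> value dict."""
    return {child.tag: child.text for child in ET.fromstring(text)}


def parse_json(text):
    """Syntactic step: read a flat JSON record into the same dict shape."""
    return json.loads(text)


# Semantic step: a hypothetical agreement mapping local terms to shared ones.
TERM_AGREEMENT = {"title": "title", "name": "title", "creator": "creator", "author": "creator"}


def harmonize(record):
    """Rename fields to the agreed terms; fields without an agreement are dropped."""
    return {TERM_AGREEMENT[k]: v for k, v in record.items() if k in TERM_AGREEMENT}


xml_fields, json_fields = parse_xml(XML_RECORD), parse_json(JSON_RECORD)
print(xml_fields == json_fields)                        # False: same structure, different terms
print(harmonize(xml_fields) == harmonize(json_fields))  # True once the terms are agreed
```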

Schemas of information
• Vocabularies
  • Integration by terminological equivalence: between labels, between meanings, between languages, etc.
• Metadata schemas
  • Integration by mapping metadata schemas, e.g. crosswalks between fields
• Ontologies
  • Integration by ontology mediation
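Terminological equivalence across labels and languages can be sketched as an index from every label to a concept identifier. The structure below is loosely modelled on SKOS preferred and alternative labels (`skos:prefLabel`, `skos:altLabel`); the concepts, identifiers and labels are hypothetical, and normalization is limited to whitespace trimming and Unicode case folding.

```python
# Hypothetical mini-vocabulary, loosely modelled on SKOS: one preferred label
# per language (cf. skos:prefLabel) plus alternative labels (cf. skos:altLabel).
VOCABULARY = {
    "ex:concept/sea-level": {
        "pref": {"en": "sea level", "el": "στάθμη της θάλασσας"},
        "alt": {"en": ["mean sea level", "MSL"]},
    },
    "ex:concept/excavation": {
        "pref": {"en": "excavation", "el": "ανασκαφή"},
        "alt": {"en": ["dig"]},
    },
}


def normalize(label):
    """Crude normalization: trim whitespace and apply Unicode case folding."""
    return label.strip().casefold()


def build_label_index(vocabulary):
    """Map every label, in every language, to the concept it names."""
    index = {}
    for concept, labels in vocabulary.items():
        for label in labels["pref"].values():
            index[normalize(label)] = concept
        for alt_labels in labels.get("alt", {}).values():
            for label in alt_labels:
                index[normalize(label)] = concept
    return index


INDEX = build_label_index(VOCABULARY)


def resolve(label):
    """Return the concept a label refers to, or None if the vocabulary lacks it."""
    return INDEX.get(normalize(label))


print(resolve("MSL"))         # ex:concept/sea-level
print(resolve("Ανασκαφή"))    # ex:concept/excavation
print(resolve("tide gauge"))  # None
```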

Definition of metadata
• Simple: information about resources in a structured and organized format
• Object-centered information schema
• In the semantic web world, metadata classes are linked by their properties.
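To make "linked by their properties" concrete, here is a sketch of metadata as subject-predicate-object triples, queried by walking from one node to the next along properties. `dct:` (DCMI Metadata Terms) and `foaf:` are real vocabularies; the `ex:` identifiers, the values and the `follow` helper are hypothetical, and a real system would more likely use an RDF store and SPARQL than Python sets.

```python
# Metadata as (subject, predicate, object) triples. dct: stands for DCMI
# Metadata Terms and foaf: for FOAF; the ex: identifiers and values are hypothetical.
TRIPLES = {
    ("ex:dataset/42", "dct:title", "Sediment core measurements, core K-11"),
    ("ex:dataset/42", "dct:creator", "ex:person/7"),
    ("ex:dataset/42", "dct:isPartOf", "ex:collection/ocean-cores"),
    ("ex:person/7", "foaf:name", "A. Researcher"),
    ("ex:collection/ocean-cores", "dct:title", "Ocean sediment cores"),
}


def objects(subject, predicate):
    """All objects linked to `subject` through `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def follow(subject, *path):
    """Walk a chain of properties from `subject`, e.g. creator, then name."""
    nodes = [subject]
    for predicate in path:
        nodes = [o for node in nodes for o in objects(node, predicate)]
    return nodes


print(follow("ex:dataset/42", "dct:creator", "foaf:name"))   # ['A. Researcher']
print(follow("ex:dataset/42", "dct:isPartOf", "dct:title"))  # ['Ocean sediment cores']
```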

What do metadata do?
• Describe: descriptive information
• Document: provenance, preservation, status information
• Link: structural information

Definition of ontologies
• Simple: provide the representational machinery with which to instantiate domain models in knowledge bases, make queries to knowledge-based services, and represent the results of calling such services [Gruber 2007]

What do ontologies do?
• Ontologies are formal models that help us:
  • Understand a domain of knowledge (what, where, when, how…)
  • Structure a knowledge base to collate different instances (records of actors, events, places, topics…)
  • Infer a logical development (what has happened, what comes next…)
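The "infer" point can be illustrated with the simplest kind of ontological reasoning: from a class hierarchy, derive all the types an instance has. This sketch mimics two RDFS entailment rules, transitivity of `rdfs:subClassOf` (rule rdfs11) and type propagation (rule rdfs9); the classes and instances are hypothetical and do not come from any published ontology.

```python
# Hypothetical class hierarchy (class -> direct superclasses), in the spirit of rdfs:subClassOf.
SUBCLASS_OF = {
    "Excavation": {"Activity"},
    "SamplingCampaign": {"Activity"},
    "Activity": {"Event"},
    "Event": set(),
}

# Hypothetical instances with their asserted (most specific) type.
INSTANCES = {"ex:dig-2017": "Excavation", "ex:cruise-12": "SamplingCampaign"}


def superclasses(cls):
    """All direct and indirect superclasses (transitive closure, cf. rdfs11)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted):
    """Every type entailed by an asserted type (cf. rdfs9)."""
    return {asserted} | superclasses(asserted)


def instances_of(cls):
    """Instances that belong to `cls`, directly or by inference."""
    return sorted(i for i, t in INSTANCES.items() if cls in inferred_types(t))


print(sorted(inferred_types("Excavation")))  # ['Activity', 'Event', 'Excavation']
print(instances_of("Event"))                 # ['ex:cruise-12', 'ex:dig-2017']
```

Neither instance is asserted to be an `Event`; both are found because the hierarchy says every `Activity` is one.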

Categories of ontologies
• Domain Ontologies: represent knowledge of a domain or a discourse.
• Metadata Ontologies: represent the semantics of vocabularies for the description of domain information.
• Generic/Common Sense Ontologies: represent information based on common-sense concepts, such as time, space, events, etc.
• Representational Ontologies: represent concepts of high abstraction.
• Task Ontologies: represent processes and methods.

Ontologies as Knowledge Structures
• As conceptual constructs, they define the semantics of information in a coherent way and facilitate its processing.
• An approach that:
  • Reflects the structure of a domain.
  • Highlights relationships.
  • Supports information conversion.

Ontologies as Information Access Tools
• They define the semantics of the real world and facilitate its connection to machine-accessible content, based on a commonly agreed terminology.
• An approach that:
  • Links the conceptual to the physical world.
  • Emphasizes the terminological view, e.g. vocabularies and their agreement.
  • Supports information discovery.

Crossroads for cross-domain problems
• Cross-domain infrastructures face the problem of harmonization/integration.
• A decision to be made is whether to construct a new ontology/schema or to use existing ones.
• Aim for a balance between the representational richness and the efficiency of the new schema.
• Answering why and for whom will substantially define the scope and the choice.

Problems
• Conceptual issues between ontologies/schemas
  • Non-existent concepts
  • Overlapping/mismatched concepts
• Terminological issues
  • Different terms for the same concept
  • Different concepts for the same term
  • Inadequacy of terms
• Architecture issues
  • Event-centered approaches
  • Object-centered approaches
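The two terminological issues can be detected mechanically once both glossaries point to a shared reference, which is a strong assumption: in practice, establishing that reference is the hard part. The glossaries and `ref:` concept identifiers below are hypothetical.

```python
# Hypothetical glossaries from two domains: local term -> concept identifier in a
# shared reference vocabulary (an assumption; real schemas rarely come pre-linked).
LIFE_SCIENCES = {
    "specimen": "ref:PhysicalSample",
    "collection date": "ref:SamplingTime",
    "site": "ref:SamplingLocation",
}
CULTURAL_HERITAGE = {
    "object": "ref:PhysicalSample",
    "date": "ref:ProductionTime",
    "site": "ref:Monument",
}


def different_terms_same_concept(a, b):
    """Pairs of differing terms that point to the same concept (synonyms)."""
    return sorted((ta, tb) for ta, ca in a.items() for tb, cb in b.items()
                  if ca == cb and ta != tb)


def same_term_different_concepts(a, b):
    """Terms spelled identically in both glossaries but meaning different things (homonyms)."""
    return sorted(t for t in a.keys() & b.keys() if a[t] != b[t])


print(different_terms_same_concept(LIFE_SCIENCES, CULTURAL_HERITAGE))  # [('specimen', 'object')]
print(same_term_different_concepts(LIFE_SCIENCES, CULTURAL_HERITAGE))  # ['site']
```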

Ontology mediation
• Alignment: the correspondence between the elements of two or more ontologies.
  • Schema-based alignment
  • Instance-based alignment
  • Legacy ontologies remain intact; correspondences work on a middle layer.
• Merging: the unification of two or more ontologies to create a new one.
  • All elements of legacy ontologies should be represented in the new one.
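A minimal sketch of instance-based alignment, assuming both ontologies classify some of the same instance identifiers: classes are compared by the Jaccard overlap of their instance sets, and correspondences above a threshold are returned as a separate list, so the legacy ontologies themselves are left untouched. The ontologies, the 0.5 threshold and the function names are hypothetical; schema-based alignment would instead compare class names, definitions or structure.

```python
# Hypothetical classes from two legacy ontologies, each with the identifiers
# of the instances classified under it.
ONTOLOGY_A = {
    "a:Artefact": {"obj1", "obj2", "obj3", "obj4"},
    "a:Site": {"site1", "site2"},
}
ONTOLOGY_B = {
    "b:MovableObject": {"obj1", "obj2", "obj3"},
    "b:Place": {"site1", "site2", "site3"},
}


def jaccard(x, y):
    """Shared instances divided by all instances of the two classes."""
    return len(x & y) / len(x | y) if x | y else 0.0


def align_by_instances(onto_a, onto_b, threshold=0.5):
    """Propose (class_a, class_b, score) correspondences; inputs are only read.

    The returned list plays the role of the middle layer: it sits beside the
    legacy ontologies instead of changing them.
    """
    correspondences = []
    for ca, ia in onto_a.items():
        for cb, ib in onto_b.items():
            score = jaccard(ia, ib)
            if score >= threshold:
                correspondences.append((ca, cb, round(score, 2)))
    return sorted(correspondences)


print(align_by_instances(ONTOLOGY_A, ONTOLOGY_B))
# [('a:Artefact', 'b:MovableObject', 0.75), ('a:Site', 'b:Place', 0.67)]
```

Merging, by contrast, would build a new class set containing every element of both inputs; the threshold here is an arbitrary illustration and would need tuning against real data.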

Metadata crosswalks
• Correspondences of metadata fields between two or more standards.
• The case of CARARE:
  • POLIS DTD, MIDAS, EDM, LIDO
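A crosswalk can be represented as a field-to-field table and applied record by record. In this sketch the source field names and the record are hypothetical, the targets are Dublin Core elements, and the real CARARE mappings are not reproduced. Unmapped fields are returned separately so that the information lost in the crosswalk stays visible.

```python
# Hypothetical crosswalk from a local heritage schema to Dublin Core elements.
# Source field names are invented; they do not reproduce POLIS DTD, MIDAS or LIDO.
CROSSWALK = {
    "monument_name": "dc:title",
    "alt_name": "dc:title",
    "summary": "dc:description",
    "recorder": "dc:creator",
    "period": "dc:coverage",
}


def apply_crosswalk(record, crosswalk):
    """Map source fields to target fields and return (mapped, unmapped).

    Several source fields may land on one target, so targets hold lists.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # no correspondence: reported, not silently dropped
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


record = {
    "monument_name": "Watchtower at Example Hill",
    "alt_name": "Old Tower",
    "period": "Medieval",
    "grid_ref": "XY 123 456",
}
mapped, unmapped = apply_crosswalk(record, CROSSWALK)
print(mapped)    # {'dc:title': ['Watchtower at Example Hill', 'Old Tower'], 'dc:coverage': ['Medieval']}
print(unmapped)  # {'grid_ref': 'XY 123 456'}
```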

Dataset discovery: what to model?
• Resources
  • Domain-agnostic approaches
  • Serving: discovery
  • Data Catalog Vocabulary (DCAT): an RDF vocabulary to facilitate interoperability between data catalogs published on the Web.
• Processes
  • Domain-governed approaches
  • Serving: documentation
  • Ontology for Biomedical Investigations
  • Scientific Observation Model
  • Based on upper ontologies, such as Basic Formal Ontology and CIDOC CRM.
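For the resource-oriented, discovery-serving side, a dataset description in DCAT might look like the following JSON-LD, built here with plain Python dictionaries. The `dcat:` and `dct:` terms are real W3C/DCMI terms; the IRIs, titles, keywords, dates and helper names are hypothetical, and a production catalogue might additionally be checked against a DCAT application profile.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def dcat_dataset(iri, title, description, keywords, issued, download_url, media_type):
    """A minimal dcat:Dataset with a single dcat:Distribution, as a JSON-LD node."""
    return {
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
        "dct:issued": {"@value": issued, "@type": "xsd:date"},
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # DCAT 2 recommends IANA media types, referenced here by IRI.
            "dcat:mediaType": {"@id": IANA_MEDIA_TYPES + media_type},
        },
    }


def dcat_catalog(iri, title, datasets):
    """Wrap dataset nodes in a dcat:Catalog document with a JSON-LD @context."""
    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": datasets,
    }


catalog = dcat_catalog(
    "https://data.example.org/catalog",
    "Example cross-domain catalog",
    [dcat_dataset(
        "https://data.example.org/dataset/core-k11",
        "Sediment core measurements, core K-11",
        "Hypothetical dataset used only to illustrate DCAT.",
        ["sediment core", "climate"],
        "2018-06-26",
        "https://data.example.org/files/core-k11.csv",
        "text/csv",
    )],
)
print(json.dumps(catalog, indent=2))
```

The description says nothing about how the measurements were produced; that process-oriented documentation is what the domain-governed models on the slide are for.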

Issues to consider
• Ontologies are intentionally designed structures, i.e. biased. One should avoid imposing one's own intentionality on other structures. New cross-domain models should be neutral.
• Specific skills are required to construct a ‘meta-semantic’ model, since ontologies are already complex semantic constructs.

Issues to consider
• Conceptual exercises
  • Identification of correspondences
  • Representation of correspondences
• Technical compatibility
  • Common description languages, e.g. OWL
  • Other agreements, e.g. standardized values for time, space, etc.
• Legal compatibility
  • License agreement
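As an example of "standardized values for time", the sketch below normalizes a few date spellings to ISO 8601 calendar dates. The list of source formats is an assumption, including the day-first reading of slash-separated dates; strings that match none of them return `None` rather than being guessed.

```python
from datetime import datetime

# Hypothetical date formats found in source records. Slash-separated dates are
# assumed to be day-first; a US-style source would need "%m/%d/%Y" instead.
SOURCE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d"]


def to_iso_date(value):
    """Normalize a date string to ISO 8601 (YYYY-MM-DD); None if no format matches."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


for raw in ["26/06/2018", "26.06.2018", "20180626", "2018-06-26", "summer 2018"]:
    print(raw, "->", to_iso_date(raw))
```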

Conclusions
• The issue of discovery is different from the issue of documentation.
• Discovery requires compromise. Information will condense.
• Documentation can continue in legacy schemas.
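The conclusion that discovery condenses information while documentation stays in legacy schemas can be sketched as a projection: a rich legacy record is reduced to a small discovery profile plus a link back to the full record. The record, the profile, the landing-page URL and the `see_full_record` field are hypothetical.

```python
# Hypothetical legacy record: rich documentation that stays in its own schema.
LEGACY_RECORD = {
    "id": "ex:record/981",
    "title": "Sediment core measurements, core K-11",
    "keywords": ["sediment core", "oxygen isotopes", "climate"],
    "instrument_settings": {"mass_spectrometer": "model X", "runs": 3},
    "sampling_protocol": "Subsampled every 2 cm",
    "processing_history": ["cleaned", "calibrated"],
}

# Hypothetical lean discovery profile: only what a search service needs.
DISCOVERY_PROFILE = ("id", "title", "keywords")


def to_discovery_record(legacy, profile, landing_page):
    """Condense a legacy record to the profile fields plus a link to the full record."""
    lean = {field: legacy[field] for field in profile if field in legacy}
    lean["see_full_record"] = landing_page  # documentation remains reachable
    return lean


def dropped_fields(legacy, profile):
    """Fields the discovery layer does not carry: the compromise made explicit."""
    return sorted(set(legacy) - set(profile))


lean = to_discovery_record(LEGACY_RECORD, DISCOVERY_PROFILE,
                           "https://repository.example.org/record/981")
print(lean)
print(dropped_fields(LEGACY_RECORD, DISCOVERY_PROFILE))
# ['instrument_settings', 'processing_history', 'sampling_protocol']
```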

Thank you for your attention.
