The Logical Diversity of Explanations in OWL Ontologies, Ulm, May 2013

Samantha Bail, Information Management Group, The University of Manchester, UK
The Logical Diversity of Explanations in OWL Ontologies, Ulm, 29/05/2013

Background
• An OWL (2) ontology O is a set of axioms (OWL 2 DL is based on the description logic SROIQ)

Background
• An entailment η is an axiom that follows logically from O, written O ⊨ η

Background
• Understanding why an entailment holds can be hard...

Background
• ... but may be necessary for removing ("repairing") it
• We need to remove the reasons why the entailment holds

Background
• A justification J is a minimal subset of O that entails η
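To make the definition concrete, here is a minimal Python sketch of a common deletion-based way to shrink an entailing set of axioms to one justification. It assumes a hypothetical toy entailment oracle, `entails_subsumption`, that only understands atomic subsumptions written as pairs `(a, b)` for a ⊑ b; a real system would ask an OWL reasoner instead.

```python
def entails_subsumption(axioms, sub, sup):
    """Hypothetical toy oracle: do the atomic subsumptions (a, b), read as a ⊑ b, entail sub ⊑ sup?"""
    seen, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for a, b in axioms:
            if a == current and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def one_justification(ontology, sub, sup):
    """Shrink the ontology to one minimal entailing subset by trying to drop each axiom in turn."""
    if not entails_subsumption(ontology, sub, sup):
        return None
    justification = set(ontology)
    for axiom in sorted(ontology):
        # By monotonicity, an axiom that cannot be dropped now can never be dropped later.
        if entails_subsumption(justification - {axiom}, sub, sup):
            justification.discard(axiom)
    return justification


ontology = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("X", "Y")}
print(one_justification(ontology, "A", "D"))
```

With the axioms tried in sorted order, this returns {A ⊑ E, E ⊑ D}; the other justification {A ⊑ B, B ⊑ C, C ⊑ D} is never reported, so a single justification may not tell the whole story.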

Background
• There can be multiple justifications for an entailment
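A naive way to enumerate all justifications is sketched below in Python, again with a hypothetical toy oracle over atomic subsumptions `(a, b)` meaning a ⊑ b. It tests subsets in order of increasing size and keeps only the minimal entailing ones; this is exponential in the number of axioms and is meant only to illustrate the definition, not as a practical algorithm.

```python
from itertools import combinations


def entails_subsumption(axioms, sub, sup):
    """Hypothetical toy oracle over atomic subsumptions (a, b), read as a ⊑ b."""
    seen, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for a, b in axioms:
            if a == current and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def all_justifications(ontology, sub, sup):
    """Enumerate every minimal subset of the ontology that entails sub ⊑ sup."""
    found = []
    axioms = sorted(ontology)
    for size in range(len(axioms) + 1):
        for subset in combinations(axioms, size):
            candidate = set(subset)
            # Subsets are visited by increasing size, so an entailing candidate that
            # contains no earlier justification is itself minimal.
            if any(j <= candidate for j in found):
                continue
            if entails_subsumption(candidate, sub, sup):
                found.append(candidate)
    return found


ontology = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D"), ("X", "Y")}
for justification in all_justifications(ontology, "A", "D"):
    print(sorted(justification))
```

Here A ⊑ D has two justifications, {A ⊑ E, E ⊑ D} and {A ⊑ B, B ⊑ C, C ⊑ D}, and any repair has to break both.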

How can we understand (and repair) all of them?

Some kind of similarity...
Example 1.2 (Fiorentina is abbreviated to Fi, hasTopping to hasTop, InterestingPizza to IP, and ⊤ is the description logic notation for the top concept Thing, which stands for 'any element in the domain'):

      J1                          J2                            J3
(1)   Fi ⊑ NamedPizza             Fi ⊑ NamedPizza
(2)   NamedPizza ⊑ Pizza          NamedPizza ⊑ Pizza            domain(hasTop, Pizza)
(3)   Fi ⊑ ∃hasTop.Tomato         Fi ⊑ ∃hasTop.Tomato           Fi ⊑ ∃hasTop.Tomato
(4)   Fi ⊑ ∃hasTop.Olive          Fi ⊑ ∃hasTop.Mozzarella       Fi ⊑ ∃hasTop.Garlic
(5)   Fi ⊑ ∃hasTop.Spinach        Fi ⊑ ∃hasTop.Olive            Fi ⊑ ∃hasTop.Spinach
(6)   Spinach ⊑ ¬Tomato           Mozzarella ⊑ ¬Tomato          Spinach ⊑ ¬Tomato
(7)   Spinach ⊑ ¬Olive            Mozzarella ⊑ ¬Olive           Spinach ⊑ ¬Garlic
(8)   Olive ⊑ ¬Tomato             Olive ⊑ ¬Tomato               Garlic ⊑ ¬Tomato
(9)   IP ≡ Pizza ⊓ ≥3 hasTop.⊤    IP ≡ Pizza ⊓ ≥3 hasTop.⊤      IP ≡ Pizza ⊓ ≥3 hasTop.⊤

While these justifications may look complicated at first, they can all be summarised as follows:
• Axioms 1-2 (2-3 in the last justification): Fiorentina is a Pizza
• Axioms 3-5: Fiorentina has 3 kinds of toppings
• Axioms 6-8: The toppings are pairwise disjoint, that is, no element in the domain can be a member of more than one of these toppings at the same time
• Axiom 9: Anything that is a Pizza and has at least 3 toppings is an InterestingPizza
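The similarity between the first two justifications can be checked mechanically. The Python sketch below uses a hypothetical tuple encoding of the axioms (not an OWL library API) and shows that renaming Spinach to Mozzarella turns the first justification into exactly the second, whereas no renaming can produce the third, since it has a different number of axioms (the domain axiom replaces the NamedPizza subsumptions).

```python
# Hypothetical encoding: ("sub", C, D) is C ⊑ D, ("equiv", C, D) is C ≡ D,
# ("and", C, D) is C ⊓ D, ("some", r, C) is ∃r.C, ("not", C) is ¬C,
# ("min", n, r, C) is ≥n r.C, ("domain", r, C) is domain(r, C); "Top" stands for ⊤.
IP_DEF = ("equiv", "IP", ("and", "Pizza", ("min", 3, "hasTop", "Top")))


def has_topping(topping):
    return ("sub", "Fi", ("some", "hasTop", topping))


def disjoint(a, b):
    return ("sub", a, ("not", b))


J1 = {("sub", "Fi", "NamedPizza"), ("sub", "NamedPizza", "Pizza"),
      has_topping("Tomato"), has_topping("Olive"), has_topping("Spinach"),
      disjoint("Spinach", "Tomato"), disjoint("Spinach", "Olive"), disjoint("Olive", "Tomato"),
      IP_DEF}
J2 = {("sub", "Fi", "NamedPizza"), ("sub", "NamedPizza", "Pizza"),
      has_topping("Tomato"), has_topping("Mozzarella"), has_topping("Olive"),
      disjoint("Mozzarella", "Tomato"), disjoint("Mozzarella", "Olive"), disjoint("Olive", "Tomato"),
      IP_DEF}
J3 = {("domain", "hasTop", "Pizza"),
      has_topping("Tomato"), has_topping("Garlic"), has_topping("Spinach"),
      disjoint("Spinach", "Tomato"), disjoint("Spinach", "Garlic"), disjoint("Garlic", "Tomato"),
      IP_DEF}


def rename(expr, mapping):
    """Apply a renaming of entity names to an encoded axiom, leaving constructor tags alone."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        tag, *args = expr
        return (tag, *(rename(arg, mapping) for arg in args))
    return expr  # e.g. the number n in a number restriction


print({rename(ax, {"Spinach": "Mozzarella"}) for ax in J1} == J2)  # True
print(len(J1), len(J3))  # 9 8
```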

Capturing similarity
• Understanding similarity can help in understanding multiple justifications
• ... and potentially reduce the effort of repairing them
• Given two justifications J1 and J2, how do we determine whether they are "similar"?
  ‣ We want to group justifications based on similarity
  ‣ Define an equivalence relation
• This lets us determine the actual logical diversity of OWL justifications
  ‣ Large groups = little diversity
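When an equivalence relation can be decided by computing a canonical key for each justification, grouping reduces to bucketing by that key, and the number of buckets measures diversity. The Python sketch below is a generic illustration; `group_by_equivalence` and the toy key (same axioms in any order) are hypothetical. A relation that is not transitive, such as lemma-isomorphism with arbitrary lemmas, cannot be handled this way.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def group_by_equivalence(items: Iterable[T], canonical: Callable[[T], Hashable]) -> list[list[T]]:
    """Partition items into equivalence classes: two items are equivalent
    iff `canonical` maps them to the same hashable key."""
    groups: dict[Hashable, list[T]] = defaultdict(list)
    for item in items:
        groups[canonical(item)].append(item)
    return list(groups.values())


# Toy justifications as tuples of axiom strings; two justifications count as
# "equivalent" here if they contain the same axioms in any order.
justifications = [("A ⊑ B", "B ⊑ C"), ("B ⊑ C", "A ⊑ B"), ("A ⊑ D",)]
groups = group_by_equivalence(justifications, lambda j: frozenset(j))
print(len(groups))  # 2 groups for 3 justifications
```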

Structural equivalence
• Structural equivalence [1] of OWL axioms is well defined
• We have an equivalence relation!
• ... but only a boring one
Example (two structurally equivalent axioms):
1) InterestingPizza ≡ Pizza ⊓ ≥3 hasTopping
2) ≥3 hasTopping ⊓ Pizza ≡ InterestingPizza
[1] http://www.w3.org/TR/owl2-syntax/
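A minimal Python sketch of why structural equivalence is easy to decide: in the OWL 2 structural specification, the operands of an intersection and the classes in an equivalence axiom form sets, so encoding them as `frozenset`s makes structural equivalence plain equality. The helper functions are hypothetical (they only borrow OWL 2 functional-syntax names as tags). The last line shows why the relation is "boring": logically equivalent axioms with different structure are not grouped.

```python
# Hypothetical encoding that mirrors OWL 2's structural specification: the operands
# of an intersection and the classes of an equivalence axiom are unordered sets.
def intersection(*operands):
    return ("ObjectIntersectionOf", frozenset(operands))


def min_cardinality(n, prop):
    return ("ObjectMinCardinality", n, prop)


def equivalent_classes(*classes):
    return ("EquivalentClasses", frozenset(classes))


def sub_class_of(sub, sup):
    return ("SubClassOf", sub, sup)  # ordered: sub-class first


def complement(c):
    return ("ObjectComplementOf", c)


ax1 = equivalent_classes("InterestingPizza", intersection("Pizza", min_cardinality(3, "hasTopping")))
ax2 = equivalent_classes(intersection(min_cardinality(3, "hasTopping"), "Pizza"), "InterestingPizza")
print(ax1 == ax2)  # True: structurally equivalent

# A ⊑ B and ¬B ⊑ ¬A are logically equivalent, but not structurally equivalent.
print(sub_class_of("A", "B") == sub_class_of(complement("B"), complement("A")))  # False
```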

'Strict' isomorphism
• Isomorphism [2] between justifications is well defined
• It describes an equivalence relation
Example:
J1 = {A ⊑ B ⊓ ∃r.C, B ⊓ ∃r.C ⊑ D} ⊨ A ⊑ D
J2 = {E ⊑ B ⊓ ∃s.F, B ⊓ ∃s.F ⊑ D} ⊨ E ⊑ D
φ = {A → E, C → F, r → s}
[2] Matthew Horridge, Samantha Bail, Bijan Parsia, and Ulrike Sattler. The cognitive complexity of OWL justifications. In Proceedings of ISWC-11, 2011.
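As a sketch of how such a renaming φ could be found, the Python code below brute-forces bijections between the class names and between the role names of two justifications and checks that φ maps J1 onto J2 and the first entailment onto the second. The tuple encoding and all function names are hypothetical; the search is exponential in the signature size and is only meant for small examples like the one on this slide.

```python
from itertools import permutations


# Hypothetical encoding: ("sub", C, D) is C ⊑ D, ("and", frozenset) is an
# intersection, ("some", r, C) is ∃r.C; plain strings are class or role names.
def sub(c, d):
    return ("sub", c, d)


def conj(*operands):
    return ("and", frozenset(operands))


def some(role, filler):
    return ("some", role, filler)


def collect(expr, classes, roles):
    """Collect the class names and role names occurring in an encoded expression."""
    if isinstance(expr, str):
        classes.add(expr)
    elif isinstance(expr, frozenset):
        for part in expr:
            collect(part, classes, roles)
    elif expr[0] == "some":
        roles.add(expr[1])
        collect(expr[2], classes, roles)
    else:
        for part in expr[1:]:
            collect(part, classes, roles)


def rename(expr, phi):
    if isinstance(expr, str):
        return phi.get(expr, expr)
    if isinstance(expr, frozenset):
        return frozenset(rename(part, phi) for part in expr)
    return (expr[0], *(rename(part, phi) for part in expr[1:]))


def find_isomorphism(j1, eta1, j2, eta2):
    """Brute-force search for a bijective renaming φ with φ(J1) = J2 and φ(η1) = η2."""
    sig1, sig2 = (set(), set()), (set(), set())
    for ax in (*j1, eta1):
        collect(ax, *sig1)
    for ax in (*j2, eta2):
        collect(ax, *sig2)
    if len(j1) != len(j2) or any(len(a) != len(b) for a, b in zip(sig1, sig2)):
        return None
    c1, r1 = sorted(sig1[0]), sorted(sig1[1])
    c2, r2 = sorted(sig2[0]), sorted(sig2[1])
    for class_image in permutations(c2):
        for role_image in permutations(r2):
            phi = {**dict(zip(c1, class_image)), **dict(zip(r1, role_image))}
            if {rename(ax, phi) for ax in j1} == set(j2) and rename(eta1, phi) == eta2:
                return phi
    return None


J1 = {sub("A", conj("B", some("r", "C"))), sub(conj("B", some("r", "C")), "D")}
J2 = {sub("E", conj("B", some("s", "F"))), sub(conj("B", some("s", "F")), "D")}
phi = find_isomorphism(J1, sub("A", "D"), J2, sub("E", "D"))
print({k: v for k, v in phi.items() if k != v})  # {'A': 'E', 'C': 'F', 'r': 's'}
```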

Subexpression-isomorphism
• Covers justifications that have different subexpressions
• The semantics of subexpressions is not relevant
  ‣ Subexpressions are used like propositional variables
Example:
J1 = {A ⊑ B ⊓ C, B ⊓ C ⊑ D} ⊨ A ⊑ D
J2 = {A ⊑ ∃r.C, ∃r.C ⊑ D} ⊨ A ⊑ D
Replacing B ⊓ C with X1 and ∃r.C with X2 yields {A ⊑ X1, X1 ⊑ D} and {A ⊑ X2, X2 ⊑ D}, which are isomorphic.
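A minimal Python sketch of this idea, assuming a hypothetical tuple encoding and a simplified reading of the relation: each non-atomic side of a subsumption is replaced as a whole by a variable (identical subexpressions get the same variable), and the abstracted justifications are then compared by brute-force renaming. The names `abstract` and `subexpression_isomorphic` are illustrative, and generated variables are assumed not to clash with existing class names. The third justification also entails A ⊑ D, but since B ⊓ C is treated as opaque, the link to B is invisible and it is not grouped with the first.

```python
from itertools import permutations


def sub(c, d):
    return ("sub", c, d)


def conj(*operands):
    return ("and", frozenset(operands))


def some(role, filler):
    return ("some", role, filler)


def abstract(axioms, entailment):
    """Replace each non-atomic side of a subsumption by a variable (same expression, same variable)."""
    variables = {}

    def var(expr):
        if isinstance(expr, str):
            return expr
        if expr not in variables:
            variables[expr] = f"X{len(variables) + 1}"  # assumed not to clash with class names
        return variables[expr]

    abstracted = frozenset(sub(var(c), var(d)) for _, c, d in axioms)
    return abstracted, sub(var(entailment[1]), var(entailment[2]))


def isomorphic(j1, eta1, j2, eta2):
    """Strict isomorphism check for justifications made only of atomic subsumptions."""
    names1 = sorted({n for _, c, d in (*j1, eta1) for n in (c, d)})
    names2 = sorted({n for _, c, d in (*j2, eta2) for n in (c, d)})
    if len(j1) != len(j2) or len(names1) != len(names2):
        return False
    for image in permutations(names2):
        phi = dict(zip(names1, image))
        renamed = {sub(phi[c], phi[d]) for _, c, d in j1}
        if renamed == set(j2) and sub(phi[eta1[1]], phi[eta1[2]]) == eta2:
            return True
    return False


def subexpression_isomorphic(j1, eta1, j2, eta2):
    return isomorphic(*abstract(j1, eta1), *abstract(j2, eta2))


goal = sub("A", "D")
J1 = {sub("A", conj("B", "C")), sub(conj("B", "C"), "D")}
J2 = {sub("A", some("r", "C")), sub(some("r", "C"), "D")}
J3 = {sub("A", conj("B", "C")), sub("B", "D")}
print(subexpression_isomorphic(J1, goal, J2, goal))  # True
print(subexpression_isomorphic(J1, goal, J3, goal))  # False
```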

Lemma-isomorphism
• Covers justifications with different expressions and of different sizes
• Substitute justification subsets with lemmas [3]
Example:
J1 = {A ⊑ B, B ⊑ C} ⊨ A ⊑ C
J2 = {A ⊑ B, B ⊑ C, C ⊑ D} ⊨ A ⊑ D
... replacing the subset {A ⊑ B, B ⊑ C} of J2 with the lemma A ⊑ C gives {A ⊑ C, C ⊑ D} ⊨ A ⊑ D, which is isomorphic to J1
[3] Matthew Horridge, Bijan Parsia, and Ulrike Sattler. Lemmas for justifications in OWL. In Proceedings of DL-09, 2009.
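The Python sketch below illustrates one restricted lemmatisation, collapsing each maximal chain of atomic subsumptions into a single lemma, and then compares the results by brute-force renaming. It is a hypothetical illustration under stated assumptions (tuple encoding, acyclic chains, a name counts as an inner chain link only if it occurs in no other axiom), not the general lemma-isomorphism with arbitrary lemmas. With maximal chains, both J1 and J2 collapse to a single subsumption and are therefore isomorphic.

```python
from collections import Counter, defaultdict
from itertools import permutations


def names(expr):
    """Yield the entity names in an encoded expression; tuples start with a constructor tag."""
    if isinstance(expr, str):
        yield expr
    elif isinstance(expr, tuple):
        for part in expr[1:]:
            yield from names(part)


def is_atomic_sub(ax):
    return ax[0] == "sub" and isinstance(ax[1], str) and isinstance(ax[2], str)


def collapse_chains(axioms):
    """Replace each maximal chain X1 ⊑ X2 ⊑ ... ⊑ Xn of atomic subsumptions by the lemma X1 ⊑ Xn.

    A name is an inner link if it has exactly one incoming and one outgoing atomic
    subsumption and occurs in no other axiom. Assumes the chains are acyclic."""
    atomic = [ax for ax in axioms if is_atomic_sub(ax)]
    others = [ax for ax in axioms if not is_atomic_sub(ax)]
    successors, in_degree = defaultdict(list), Counter()
    for _, c, d in atomic:
        successors[c].append(d)
        in_degree[d] += 1
    used_elsewhere = {n for ax in others for n in names(ax)}

    def inner(n):
        return in_degree[n] == 1 and len(successors[n]) == 1 and n not in used_elsewhere

    result = set(others)
    for _, start, end in atomic:
        if inner(start):
            continue  # this axiom lies in the middle of a longer chain
        while inner(end):
            end = successors[end][0]
        result.add(("sub", start, end))
    return result


def chain_isomorphic(j1, eta1, j2, eta2):
    """Isomorphism after collapsing chains; handles only atomic-subsumption justifications."""
    j1, j2 = collapse_chains(j1), collapse_chains(j2)
    names1 = sorted({n for ax in (*j1, eta1) for n in names(ax)})
    names2 = sorted({n for ax in (*j2, eta2) for n in names(ax)})
    if len(j1) != len(j2) or len(names1) != len(names2):
        return False
    for image in permutations(names2):
        phi = dict(zip(names1, image))
        renamed = {("sub", phi[c], phi[d]) for _, c, d in j1}
        if renamed == j2 and ("sub", phi[eta1[1]], phi[eta1[2]]) == eta2:
            return True
    return False


J1 = {("sub", "A", "B"), ("sub", "B", "C")}
J2 = {("sub", "A", "B"), ("sub", "B", "C"), ("sub", "C", "D")}
print(collapse_chains(J2))  # {('sub', 'A', 'D')}
print(chain_isomorphic(J1, ("sub", "A", "C"), J2, ("sub", "A", "D")))  # True
```

Non-atomic axioms, such as an equivalence at the end of a chain, are kept unchanged, so a chain leading into C4 ≡ C2 ⊓ C5 collapses to a single subsumption followed by that equivalence.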

Restricting lemmatisations
• Subexpression-isomorphism is transitive
• Lemma-isomorphism with arbitrary lemmas
  ‣ is non-transitive and
  ‣ lemmatisations might be non-obvious
• We have to restrict lemmatisations to
  ‣ obvious lemmatisations
  ‣ summarising lemmas + maximal atomic subsumption chains
  ‣ ... and even that may not suffice
  ‣ Work in progress!

Survey: test corpus
• 78 OWL and OBO ontologies from BioPortal
• Entailments: indirect atomic subsumptions
• Computed 141,560 justifications in total

Table 1: Basic ontology metrics in the corpus.
                      Mean    Median    Min       Max
Classes              1,887        34      4    38,640
Object properties       38        24      0       431
Data properties          9         2      0       178
Individuals             56         0      0     1,697
Logical axioms       3,871       931      5    68,777
Justifications       1,828       756      1    25,765

Survey: results across all ontologies
• Large reductions across the corpus of all justifications
  ‣ from 141,560 justifications to 5,487 templates
  ‣ avg. 25.8 structurally similar justifications per template

Table 3: Template frequency and coverage (mean, median, min, max) across the corpus.
                                 Coverage (justifications per template)
Type      Count      % of Ss     Mean     Median    Min     Max
all       141,560    100%        -        -         -       -
strict    12,527     8.8%        11.3     2         1       2,072
subex     10,952     7.8%        12.9     2         1       2,128
lemma     5,487      3.1%        25.8     3         1       7,490
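As a quick arithmetic check on Table 3, the mean coverage per template is the total number of justifications divided by the number of templates. The short Python sketch below (variable and function names are illustrative) reproduces the Mean column.

```python
TOTAL_JUSTIFICATIONS = 141_560
TEMPLATE_COUNTS = {"strict": 12_527, "subex": 10_952, "lemma": 5_487}  # from Table 3


def mean_coverage(total, templates):
    """Average number of justifications covered by one template."""
    return total / templates


for relation, count in TEMPLATE_COUNTS.items():
    print(f"{relation}: {mean_coverage(TOTAL_JUSTIFICATIONS, count):.1f}")
# strict: 11.3
# subex: 12.9
# lemma: 25.8
```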

Survey: template coverage
Proportion (number) of justifications    Number of templates
25% (35,390)                              8
50% (70,780)                              44
75% (106,170)                             277
90% (127,404)                             951
100% (141,560)                            5,487
• 75% of justifications captured by 277 (l-iso) templates!
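Numbers like these come from sorting templates by coverage and taking the shortest prefix whose cumulative coverage reaches the target share. A minimal Python sketch, using made-up coverage counts rather than the corpus data; `templates_needed` is a hypothetical helper, and integer arithmetic is used for the target so that, for example, 90% of 141,560 is exactly 127,404.

```python
def templates_needed(coverage_counts, percent):
    """Smallest number of templates, taken from most to least frequent, that together
    cover at least `percent` % of all justifications."""
    counts = sorted(coverage_counts, reverse=True)
    target = (percent * sum(counts) + 99) // 100  # integer ceiling, avoids float rounding
    if target == 0:
        return 0
    covered = 0
    for k, count in enumerate(counts, start=1):
        covered += count
        if covered >= target:
            return k
    return len(counts)


hypothetical_counts = [50, 30, 10, 5, 3, 1, 1]  # justifications per template (made up)
for percent in (25, 50, 75, 90, 100):
    print(percent, templates_needed(hypothetical_counts, percent))
# 25 1
# 50 1
# 75 2
# 90 3
# 100 7
```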

Survey: template coverage
[Figure 2: Template frequencies across the corpus for strict and lemma-isomorphism. (a) Template frequency for strict isomorphism. (b) Template frequency for l-isomorphism. x-axis: Template ID (ordered by coverage); y-axis: Coverage (number of justifications); bubble size = number of ontologies the template occurs in.]
Note that any subsumption axiom in the templates Θ4 to Θ7 corresponds to an atomic subsumption chain of arbitrary length. These templates each cover between 4,206 and 7,490 justifications.

Survey: most frequent templates
• Atomic subsumption chains:
  ‣ In the Fission Yeast Phenotype ontology, the justifications for several entailments only differ in the length of their atomic subsumption chains, and are thus each reduced to a single template of the type Θ2 = {C1 ⊑ … ⊑ Cn, Cn ≡ C2 ⊓ …} ⊨ C1 ⊑ C2
  ‣ In the Orphanet Ontology of Rare Diseases, the dominating templates are of the type Θ1 = {C1 ⊑ C2, C2 ⊑ ∃p1.C4, domain(p1, C3)} ⊨ C1 ⊑ C3, with atomic subsumption chains of arbitrary size in place of the first subsumption axiom, plus some variations that include subproperty axioms; two templates of this type cover the majority (110 and 105 justifications, respectively) of the 220 justifications each for several entailments
  ‣ A total of 1,492 entailments (7.8% of the total corpus) from 43 ontologies are affected by lemma-isomorphism, with an average reduction of 30.3% compared to strict isomorphism for those entailments
  ‣ Under lemma-isomorphism, a template covers 25.8 justifications on average (σ = 208.5, m = 3): the distribution has shifted towards a few very frequent templates, but there is still a 'long tail' of 1,878 templates that match only a single justification
  ‣ The most frequent templates (Θ4 to Θ7) are all subtle variations of a template containing only two or three axioms
• ... but then also: some superfluous part
  ‣ e.g. a single equivalence axiom, as seen in the Lipid ontology: Θ3 = {C1 ≡ C2 ⊓ x} ⊨ C1 ⊑ C2, where the superfluous part x matches a number of operands such as classes and existential restrictions
• Subexpression-isomorphism: an overall reduction by 92.2%, a 1% difference compared to strict isomorphism, with slightly more justifications per template (σ = 59, m = 2); its most frequent template covers 2,128 justifications (1.5% of the total set) from 26 ontologies, while the template that occurs in the highest number of ontologies (28 of the 78) only covers 573 justifications across the corpus

Concluding remarks
• We introduced equivalence relations for justifications:
  ‣ subexpression-isomorphism
  ‣ lemma-isomorphism (with atomic subsumption chains)
• We found that structural similarity occurs frequently
  ‣ ... but subexpression-isomorphism only has a weak effect
• Future work:
  ‣ Fix properties of transitivity-preserving lemmatisations
  ‣ Extend "obvious" lemmatisations
  ‣ Implementation of grouping/navigation in OWL tools
