Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology

Charting the Digital Library Evaluation Domain with a Semantically Enhanced
Mining Methodology S Eleni A ontzi,1 Giannis Kazadeis,1 Leonidas Papachristopoulos,2 Michalis Sfakakis,2 Giannis Tsakonas,2 Christos Papatheodorou2 13th ACM/IEEE Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA 1. Department of Informatics, Athens University of Economics & Business 2. Database & Information Systems Group, Department of Archives & Library Science, Ionian University

aim & scope of research

aim & scope of research • To propose a methodology
for discovering patterns in the scienti c literature.

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature.

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature. • We question:

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature. • We question: - how we select relevant studies,

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature. • We question: - how we select relevant studies, - how we annotate them,

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature. • We question: - how we select relevant studies, - how we annotate them, - how we discover these patterns,

for discovering patterns in the scienti c literature. • Our case study is performed in the digital library evaluation domain and its conference literature. • We question: - how we select relevant studies, - how we annotate them, - how we discover these patterns, in an eﬀective, machine-operated way, in order to have reusable and interpretable data?

why • Abundance of scienti c information

why • Abundance of scienti c information • Limitations of
existing tools, such as reusability

existing tools, such as reusability • Lack of contextualized analytic tools

existing tools, such as reusability • Lack of contextualized analytic tools • Supervised automated processes

panorama

panorama 1. Document classi cation to identify relevant papers

panorama 1. Document classi cation to identify relevant papers -
We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011.

We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011. 2. Semantic annotation processes to mark up important concepts

We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011. 2. Semantic annotation processes to mark up important concepts - We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle.

We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011. 2. Semantic annotation processes to mark up important concepts - We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle. 3. Clustering to form coherent groups (K=11)

We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011. 2. Semantic annotation processes to mark up important concepts - We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle. 3. Clustering to form coherent groups (K=11) 4. Interpretation with the assistance of the ontology schema

We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011. 2. Semantic annotation processes to mark up important concepts - We use a schema for semantic annotation, the Digital Library Evaluation Ontology, and a semantic annotation tool, GoNTogle. 3. Clustering to form coherent groups (K=11) 4. Interpretation with the assistance of the ontology schema • During this process we perform benchmarking tests to qualify speci c components to eﬀectively automate the exploration of the literature and the discovery of research patterns.

part 1 how we identify relevant studies

training phase

training phase • e aim was to train a classi
er to identify relevant papers.

er to identify relevant papers. • Categorization

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL - inter-rater agreement: moderate levels of Cohen’s Kappa

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL - inter-rater agreement: moderate levels of Cohen’s Kappa - 12% positive # 88% negative

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL - inter-rater agreement: moderate levels of Cohen’s Kappa - 12% positive # 88% negative • Skewness of data addressed via resampling:

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL - inter-rater agreement: moderate levels of Cohen’s Kappa - 12% positive # 88% negative • Skewness of data addressed via resampling: - under-sampling (Tomek Links)

er to identify relevant papers. • Categorization - two researchers categorized, a third one supervised - descriptors: title, abstract & author keywords - rater’s agreement: 82.96% for JCDL, 78% for ECDL - inter-rater agreement: moderate levels of Cohen’s Kappa - 12% positive # 88% negative • Skewness of data addressed via resampling: - under-sampling (Tomek Links) - over-sampling (random over-sampling)

corpus de nition

corpus de nition • Classi cation algorithm: Naïve Bayes

corpus de nition • Classi cation algorithm: Naïve Bayes •
Two sub-sets: a development (75%) and a test (25%)

Two sub-sets: a development (75%) and a test (25%) • Ten-fold validation: the development set was randomly divided to 10 equal; 9/10 as training set and 1/10 as test set.

Two sub-sets: a development (75%) and a test (25%) • Ten-fold validation: the development set was randomly divided to 10 equal; 9/10 as training set and 1/10 as test set. 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0

Two sub-sets: a development (75%) and a test (25%) • Ten-fold validation: the development set was randomly divided to 10 equal; 9/10 as training set and 1/10 as test set. 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Test Development

Two sub-sets: a development (75%) and a test (25%) • Ten-fold validation: the development set was randomly divided to 10 equal; 9/10 as training set and 1/10 as test set. 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Test Development fp rate

Two sub-sets: a development (75%) and a test (25%) • Ten-fold validation: the development set was randomly divided to 10 equal; 9/10 as training set and 1/10 as test set. 0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0 Test Development fp rate tp rate

part 2 how we annotate

the schema - DiLEO

the schema - DiLEO • DiLEO aims to conceptualize the
DL evaluation domain by exploring its key entities, their attributes and their relationships.

DL evaluation domain by exploring its key entities, their attributes and their relationships. • A two layered ontology:

DL evaluation domain by exploring its key entities, their attributes and their relationships. • A two layered ontology: - Strategic level: consists of a set of classes related with the scope and aim of an evaluation.

DL evaluation domain by exploring its key entities, their attributes and their relationships. • A two layered ontology: - Strategic level: consists of a set of classes related with the scope and aim of an evaluation. - Procedural level: consists of classes dealing with practical issues.

the instrument - GoNTogle

the instrument - GoNTogle • We used GoNTogle to generate
a RDFS knowledge base.

a RDFS knowledge base. • GoNTogle uses the weighted k-NN algorithm to support either manual, or automated ontology- based annotation.

a RDFS knowledge base. • GoNTogle uses the weighted k-NN algorithm to support either manual, or automated ontology- based annotation. • http://bit.ly/12nlryh

the process - 1/3

the process - 1/3 • GoNTogle estimates a score for
each class/subclass, calculating its presence in the k nearest neighbors.

each class/subclass, calculating its presence in the k nearest neighbors. • We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18).

each class/subclass, calculating its presence in the k nearest neighbors. • We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18). • e user is presented with a ranked list of the suggested classes/ subclasses and their score ranging from 0 to 1.

each class/subclass, calculating its presence in the k nearest neighbors. • We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18). • e user is presented with a ranked list of the suggested classes/ subclasses and their score ranging from 0 to 1. • 2,672 annotations were manually generated.

the process - 2/3

the process - 2/3 • RDFS statements were processed to
construct a new data set (removal of stopwords, symbols, lowercasing, etc.)

construct a new data set (removal of stopwords, symbols, lowercasing, etc.) • Experiments both with un-stemmed (4,880 features) and stemmed (3,257 features) words.

construct a new data set (removal of stopwords, symbols, lowercasing, etc.) • Experiments both with un-stemmed (4,880 features) and stemmed (3,257 features) words. • Multi-label classi cation via the ML framework Meka.

construct a new data set (removal of stopwords, symbols, lowercasing, etc.) • Experiments both with un-stemmed (4,880 features) and stemmed (3,257 features) words. • Multi-label classi cation via the ML framework Meka. • Four methods - binary representation - Label powersets - RAkEL - ML-kNN • Four algorithms - Naïve Bayes - Multinomial Naïve Bayes - k-Nearest- Neighbors - Support Vector Machines • Four metrics - Hamming Loss - Accuracy - One-error - F1 macro

the process - 3/3

the process - 3/3 • Performance tests were repeated using
GoNTogle.

GoNTogle. • GoNTogle’s algorithm achieves good results in relation to the tested multi-label classi cation algorithms.

GoNTogle. • GoNTogle’s algorithm achieves good results in relation to the tested multi-label classi cation algorithms. 0 0.2 0.4 0.6 0.8 1.0 Hamming Loss Accuracy One - Error F1 macro 0.44 0.27 0.63 0.02 0.39 0.29 0.49 0.02

GoNTogle. • GoNTogle’s algorithm achieves good results in relation to the tested multi-label classi cation algorithms. 0 0.2 0.4 0.6 0.8 1.0 Hamming Loss Accuracy One - Error F1 macro 0.44 0.27 0.63 0.02 0.39 0.29 0.49 0.02 GoNTogle Meka

part 3 how we discover

clustering - 1/3

clustering - 1/3 • e nal data set consists of
224 vectors of 53 features

224 vectors of 53 features - represents the assigned annotations from the DiLEO vocabulary to the document corpus.

224 vectors of 53 features - represents the assigned annotations from the DiLEO vocabulary to the document corpus. • We represent the annotated documents by 2 vector models:

224 vectors of 53 features - represents the assigned annotations from the DiLEO vocabulary to the document corpus. • We represent the annotated documents by 2 vector models: - binary: fi has the value of 1, if the respective to fi subclass is assigned to the document m, otherwise 0.

224 vectors of 53 features - represents the assigned annotations from the DiLEO vocabulary to the document corpus. • We represent the annotated documents by 2 vector models: - binary: fi has the value of 1, if the respective to fi subclass is assigned to the document m, otherwise 0. - tf-idf: feature frequency ﬀi of fi in all vectors is equal to 1 when the respective subclass is annotated to the respective document m; idfi is the inverse document frequency of the feature i in documents M.

clustering - 2/3

clustering - 2/3 • We cluster the vector representations of
the annotations by applying 2 clustering algorithms:

the annotations by applying 2 clustering algorithms: - K-Means: partitions M data points to K clusters. e rate of decrease peaked for K near 11 when plotted the Objective function (cost or error) for various values of K.

the annotations by applying 2 clustering algorithms: - K-Means: partitions M data points to K clusters. e rate of decrease peaked for K near 11 when plotted the Objective function (cost or error) for various values of K. - Agglomerative Hierarchical Clustering: a ‘bottom up’ built hierarchy of clusters.

clustering - 3/3

clustering - 3/3 • We assess each feature of each
cluster using the frequency increase metric.

cluster using the frequency increase metric. - it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set

cluster using the frequency increase metric. - it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set • We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean.

cluster using the frequency increase metric. - it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set • We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean. - Coverage: the proportion of features participating in the clusters to the total number of features

cluster using the frequency increase metric. - it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set • We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean. - Coverage: the proportion of features participating in the clusters to the total number of features - Dissimilarity mean: the average of the distinctiveness of the clusters, de ned in terms of the dissimilarity di,j between all the possible pairs of the clusters.

metrics - F1-measure

metrics - F1-measure 0 0.1 0.2 0.3 0.4 0.6 0.7
0.8 0.9 1.0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

metrics - F1-measure 0 0.1 0.2 0.3 0.4 0.6 0.7
0.8 0.9 1.0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 K-Means tf-idf K-Means binary Hierarchical tf-idf

part 4 how (and what) we interpret

Levels patterns hasDimensionsType isAimingAt Research Questions isSupporting/ isSupportedBy hasPerformed/ isPerformedIn
isUsedIn/ isUsing Findings Criteria Metrics Factors Means Types Criteria Categories hasConstituent/ isConstituting Dimensions technical excellence Instruments software Activity report Goals design Subjects human agents Dimension Type summative Means survey studies isParticipatingIn Means laboratory studies Characteristics count Characteristics discipline Dimensions eﬀectiveness Objects PROCEDURAL LAYER STRATEGIC LAYER K-Means tf-idf

patterns Research Questions hasPerformed/ isPerformedIn Findings Criteria Metrics Factors Criteria
Categories hasConstituent/ isConstituting isParticipatingIn Instruments Dimensions effectiveness Dimensions Types means survey studies means laboratory studies Characteristics Goal describe means type quantitative hasMeansType activity record activity compare Level interface isAimingAt isAffecting/ isAffectedBy Objects Subjects human agents PROCEDURAL LAYER STRATEGIC LAYER Hierarchical

part 5 conclusions

conclusions

conclusions • e patterns re ect and - up to
a point - con rm the anecdotally evident research practices of DL researchers.

a point - con rm the anecdotally evident research practices of DL researchers. • Patterns have similar properties to a map.

a point - con rm the anecdotally evident research practices of DL researchers. • Patterns have similar properties to a map. - ey can provide the main and the alternative routes one can follow to reach to a destination, taking into account several practical parameters that might not know.

a point - con rm the anecdotally evident research practices of DL researchers. • Patterns have similar properties to a map. - ey can provide the main and the alternative routes one can follow to reach to a destination, taking into account several practical parameters that might not know. • By exploring previous pro les, one can weight all the available options.

a point - con rm the anecdotally evident research practices of DL researchers. • Patterns have similar properties to a map. - ey can provide the main and the alternative routes one can follow to reach to a destination, taking into account several practical parameters that might not know. • By exploring previous pro les, one can weight all the available options. • is approach can extend other coding methodologies in terms of transparency, standardization and reusability.

ank you for your attention. questions?

Charting the Digital Library Evaluation Domain ...

Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology

More Decks by Giannis Tsakonas

Other Decks in Research

Featured

Transcript