Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology


Afiontzi, Kazadeis, Papachristopoulos, Papatheodorou, Sfakakis, Tsakonas (2013) In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.

[CC BY-NC-SA]


Giannis Tsakonas

July 23, 2013

Transcript

  1. Charting the Digital Library Evaluation Domain with a Semantically Enhanced Mining Methodology

     Eleni Afiontzi,¹ Giannis Kazadeis,¹ Leonidas Papachristopoulos,² Michalis Sfakakis,² Giannis Tsakonas,² Christos Papatheodorou²
     13th ACM/IEEE-CS Joint Conference on Digital Libraries, July 22-26, Indianapolis, IN, USA
     1. Department of Informatics, Athens University of Economics & Business
     2. Database & Information Systems Group, Department of Archives & Library Science, Ionian University
  10. aim & scope of research
     • To propose a methodology for discovering patterns in the scientific literature.
     • Our case study is performed in the digital library evaluation domain and its conference literature.
     • We question:
       - how we select relevant studies,
       - how we annotate them,
       - how we discover these patterns,
     in an effective, machine-operated way, in order to have reusable and interpretable data?
  16. why
     • Abundance of scientific information
     • Limitations of existing tools, such as reusability
     • Lack of contextualized analytic tools
     • Supervised automated processes
  26. panorama
     1. Document classification to identify relevant papers
        - We use a corpus of 1,824 papers from the JCDL and ECDL (now TPDL) conferences, era 2001-2011.
     2. Semantic annotation processes to mark up important concepts
        - We use a schema for semantic annotation, the Digital Library Evaluation Ontology (DiLEO), and a semantic annotation tool, GoNTogle.
     3. Clustering to form coherent groups (K=11)
     4. Interpretation with the assistance of the ontology schema
     • During this process we perform benchmarking tests to qualify specific components to effectively automate the exploration of the literature and the discovery of research patterns.
  28. part 1 how we identify relevant studies

  40. training phase
     • The aim was to train a classifier to identify relevant papers.
     • Categorization
       - two researchers categorized, a third one supervised
       - descriptors: title, abstract & author keywords
       - raters' agreement: 82.96% for JCDL, 78% for ECDL
       - inter-rater agreement: moderate levels of Cohen's Kappa
       - 12% positive / 88% negative
     • Skewness of the data was addressed via resampling:
       - under-sampling (Tomek links)
       - over-sampling (random over-sampling)
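The two resampling strategies named above can be sketched in plain Python. This is an illustrative re-implementation, not the authors' code: a Tomek link is a pair of mutual nearest neighbors with opposite labels, and under-sampling drops the majority-class member of each link; random over-sampling duplicates minority examples until the classes balance. The 1-D feature representation is a simplification for the sketch.

```python
import random

def nearest(i, X):
    """Index of the nearest other point (Euclidean distance on 1-D features)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: abs(X[i] - X[j]))

def tomek_links(X, y):
    """Pairs (i, j) of mutual nearest neighbors carrying opposite labels."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if y[i] != y[j] and nearest(j, X) == i and i < j:
            links.append((i, j))
    return links

def undersample_tomek(X, y, majority=0):
    """Drop the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

def oversample_random(X, y, minority=1, seed=0):
    """Duplicate random minority examples until the classes are balanced."""
    rng = random.Random(seed)
    minor = [k for k in range(len(X)) if y[k] == minority]
    need = (len(X) - len(minor)) - len(minor)
    X, y = list(X), list(y)
    for _ in range(need):
        k = rng.choice(minor)
        X.append(X[k]); y.append(y[k])
    return X, y
```

Combining both, as the slides do, first removes borderline majority points (Tomek links sit on the class boundary) and then balances the remainder.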
  50. corpus definition
     • Classification algorithm: Naïve Bayes
     • Two sub-sets: a development set (75%) and a test set (25%)
     • Ten-fold validation: the development set was randomly divided into 10 equal parts; 9/10 served as the training set and 1/10 as the test set.
     [ROC plot: tp rate vs. fp rate for the Development and Test sets]
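The ten-fold validation scheme described above can be sketched as follows (an illustrative stand-in, not the authors' actual code): shuffle the development set, cut it into 10 equal folds, and rotate which fold plays the test role.

```python
import random

def ten_fold_splits(items, seed=0):
    """Randomly partition items into 10 equal folds, then yield
    (train, test) pairs: 9 folds for training, 1 for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    size = len(items) // 10
    folds = [items[i * size:(i + 1) * size] for i in range(10)]
    for k in range(10):
        test = folds[k]
        train = [x for i, f in enumerate(folds) if i != k for x in f]
        yield train, test
```

Each item appears in exactly one test fold, so every development paper is used for both training and validation across the ten runs.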
  52. part 2 how we annotate

  57. the schema - DiLEO
     • DiLEO aims to conceptualize the DL evaluation domain by exploring its key entities, their attributes and their relationships.
     • A two-layered ontology:
       - Strategic level: consists of a set of classes related to the scope and aim of an evaluation.
       - Procedural level: consists of classes dealing with practical issues.
  64. the instrument - GoNTogle
     • We used GoNTogle to generate an RDFS knowledge base.
     • GoNTogle uses the weighted k-NN algorithm to support either manual or automated ontology-based annotation.
     • http://bit.ly/12nlryh
  70. the process - 1/3
     • GoNTogle estimates a score for each class/subclass, calculating its presence in the k nearest neighbors.
     • We set a score threshold above which a class is assigned to a new instance (optimal score: 0.18).
     • The user is presented with a ranked list of the suggested classes/subclasses and their scores, ranging from 0 to 1.
     • 2,672 annotations were manually generated.
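The kind of weighted k-NN class scoring described above can be sketched as below. This is a minimal illustration, not GoNTogle's actual implementation: each class is scored by the similarity-weighted share of the k nearest annotated neighbors carrying it, and suggestions above the threshold (0.18 per the slides) are returned ranked.

```python
from collections import defaultdict

def knn_class_scores(neighbors, k=5):
    """Score each class by the similarity-weighted share of the k
    nearest annotated neighbors that carry it.

    neighbors: list of (similarity, classes) pairs, sorted by
    decreasing similarity. Returns {class: score in [0, 1]}.
    """
    top = neighbors[:k]
    total = sum(sim for sim, _ in top) or 1.0
    scores = defaultdict(float)
    for sim, classes in top:
        for c in classes:
            scores[c] += sim / total
    return dict(scores)

def suggest(neighbors, k=5, threshold=0.18):
    """Ranked (class, score) suggestions at or above the score threshold."""
    ranked = sorted(knn_class_scores(neighbors, k).items(),
                    key=lambda kv: -kv[1])
    return [(c, s) for c, s in ranked if s >= threshold]
```

A class carried by every near neighbor scores close to 1; one carried only by a distant neighbor falls below the threshold and is not suggested.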
  76. the process - 2/3
     • RDFS statements were processed to construct a new data set (removal of stopwords and symbols, lowercasing, etc.)
     • Experiments with both un-stemmed (4,880 features) and stemmed (3,257 features) words.
     • Multi-label classification via the ML framework Meka.
     • Four methods: binary representation, label powersets, RAkEL, ML-kNN
     • Four algorithms: Naïve Bayes, Multinomial Naïve Bayes, k-Nearest-Neighbors, Support Vector Machines
     • Four metrics: Hamming Loss, Accuracy, One-error, F1 macro
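The preprocessing steps listed above (lowercasing, symbol and stopword removal, optional stemming) can be sketched as below. The stopword list and the crude suffix-stripping rule are illustrative placeholders, not what the authors used (a real stemmer such as Porter's would replace the regex).

```python
import re

# Tiny illustrative stopword list; the actual list used is not given.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def preprocess(text, stem=False):
    """Lowercase, strip symbols, drop stopwords; optionally apply a
    crude suffix-stripping stem as a stand-in for a real stemmer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]
    return tokens
```

Stemming collapses inflected variants onto one feature, which is why the stemmed feature space above is smaller (3,257 vs. 4,880 features).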
  82. the process - 3/3
     • Performance tests were repeated using GoNTogle.
     • GoNTogle's algorithm achieves good results in relation to the tested multi-label classification algorithms.
     [Bar chart comparing GoNTogle and Meka on Hamming Loss, Accuracy, One-error and F1 macro]
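The three set-based evaluation metrics from the comparison above have standard multi-label definitions, sketched here for clarity (an illustrative implementation, not the Meka code used in the paper):

```python
def hamming_loss(true, pred, labels):
    """Fraction of (instance, label) decisions that are wrong."""
    wrong = sum((l in t) != (l in p)
                for t, p in zip(true, pred) for l in labels)
    return wrong / (len(true) * len(labels))

def accuracy(true, pred):
    """Mean Jaccard overlap between true and predicted label sets."""
    return sum(len(t & p) / len(t | p) if t | p else 1.0
               for t, p in zip(true, pred)) / len(true)

def one_error(true, ranked):
    """Share of instances whose top-ranked label is not truly relevant."""
    return sum(r[0] not in t for t, r in zip(true, ranked)) / len(true)
```

Note the directions: lower is better for Hamming Loss and One-error, higher is better for Accuracy (and F1 macro), which matters when reading the bar chart.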
  84. part 3 how we discover

  91. clustering - 1/3
     • The final data set consists of 224 vectors of 53 features
       - it represents the assigned annotations from the DiLEO vocabulary to the document corpus.
     • We represent the annotated documents by 2 vector models:
       - binary: fi has the value 1 if the subclass corresponding to fi is assigned to the document m, otherwise 0.
       - tf-idf: the feature frequency ffi of fi in all vectors equals 1 when the respective subclass is annotated to the respective document m; idfi is the inverse document frequency of feature i in the documents M.
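The two vector models can be sketched as below (an illustrative construction over toy annotation sets, not the authors' pipeline). Since each feature frequency is 0 or 1, the tf-idf vector reduces to the binary vector scaled by each feature's inverse document frequency.

```python
import math

def vectorize(doc_annotations, features):
    """Build binary and tf-idf vectors from per-document annotation
    sets (a stand-in for the DiLEO subclass assignments)."""
    M = len(doc_annotations)
    # document frequency of each feature across the corpus
    df = {f: sum(f in d for d in doc_annotations) for f in features}
    binary, tfidf = [], []
    for d in doc_annotations:
        b = [1 if f in d else 0 for f in features]
        # ff is 0/1, so tf-idf reduces to b * idf
        t = [bi * math.log(M / df[f]) if df[f] else 0.0
             for bi, f in zip(b, features)]
        binary.append(b); tfidf.append(t)
    return binary, tfidf
```

The effect is that ubiquitous subclasses (assigned to nearly every paper) get near-zero tf-idf weight, so clustering is driven by the rarer, more discriminative annotations.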
  96. clustering - 2/3
     • We cluster the vector representations of the annotations by applying 2 clustering algorithms:
       - K-Means: partitions M data points into K clusters. The rate of decrease of the objective function (cost or error), plotted for various values of K, peaked for K near 11.
       - Agglomerative Hierarchical Clustering: a 'bottom-up' built hierarchy of clusters.
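The elbow criterion used to pick K can be illustrated with a minimal K-Means on 1-D toy data (evenly spaced initialization; an illustration of the principle, not the setup used on the 53-feature vectors). The objective always falls as K grows; the chosen K sits where the rate of decrease levels off.

```python
def kmeans(points, K, iters=20):
    """Plain K-Means on 1-D points; returns (centroids, objective),
    the objective being the total squared distance of points to
    their assigned centroids."""
    cents = points[::max(1, len(points) // K)][:K]
    for _ in range(iters):
        groups = [[] for _ in range(K)]
        for p in points:
            # assign each point to its closest current centroid
            groups[min(range(K), key=lambda k: (p - cents[k]) ** 2)].append(p)
        # move each centroid to the mean of its group
        cents = [sum(g) / len(g) if g else cents[k]
                 for k, g in enumerate(groups)]
    obj = sum(min((p - c) ** 2 for c in cents) for p in points)
    return cents, obj
```

On data with three well-separated groups, the objective drops sharply up to K=3 and is near zero there; in the paper the analogous plot over the annotation vectors flattens near K=11.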
  103. clustering - 3/3
     • We assess each feature of each cluster using the frequency increase metric.
       - it calculates the increase of the frequency of a feature fi in the cluster k (cfi,k) compared to its document frequency dfi in the entire data set.
     • We select the threshold a that maximizes the F1-measure, the harmonic mean of Coverage and Dissimilarity mean.
       - Coverage: the proportion of features participating in the clusters to the total number of features.
       - Dissimilarity mean: the average distinctiveness of the clusters, defined in terms of the dissimilarity di,j between all possible pairs of clusters.
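The threshold-selection quantities above can be sketched as below. The slides do not give the exact formulas, so this sketch assumes a simple difference for frequency increase and Jaccard distance for pairwise cluster dissimilarity; only the harmonic-mean F1 combination is taken directly from the text.

```python
from itertools import combinations

def frequency_increase(cf, df):
    """Increase of a feature's in-cluster frequency over its corpus
    frequency (exact formula not given in the slides; a simple
    difference is assumed here)."""
    return cf - df

def coverage(cluster_features, all_features):
    """Share of all features that appear in at least one cluster."""
    used = set().union(*cluster_features)
    return len(used) / len(all_features)

def dissimilarity_mean(cluster_features):
    """Average Jaccard distance over all cluster pairs (assumed d_ij)."""
    pairs = list(combinations(cluster_features, 2))
    return sum(1 - len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def f1(cov, dis):
    """Harmonic mean of Coverage and Dissimilarity mean."""
    return 2 * cov * dis / (cov + dis) if cov + dis else 0.0
```

Raising the threshold a keeps fewer features per cluster: dissimilarity rises (clusters share less) while coverage falls, and the F1-maximizing a balances the two.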
  107. metrics - F1-measure
     [Line chart: F1-measure vs. threshold a for K-Means tf-idf, K-Means binary and Hierarchical tf-idf]
  109. part 4 how (and what) we interpret

  110. patterns - K-Means tf-idf
     [DiLEO pattern diagram: strategic- and procedural-layer classes (Dimensions, Means, Instruments, Goals, Subjects, Objects, Characteristics, Criteria, Metrics, Factors) linked by relations such as isAimingAt, isSupporting/isSupportedBy, hasPerformed/isPerformedIn and hasConstituent/isConstituting]
  111. patterns - Hierarchical
     [DiLEO pattern diagram: analogous classes and relations, including hasMeansType, isAffecting/isAffectedBy and isParticipatingIn]
  113. part 5 conclusions

  120. conclusions
     • The patterns reflect and - up to a point - confirm the anecdotally evident research practices of DL researchers.
     • Patterns have similar properties to a map.
       - They can provide the main and the alternative routes one can follow to reach a destination, taking into account several practical parameters that one might not know.
     • By exploring previous profiles, one can weigh all the available options.
     • This approach can extend other coding methodologies in terms of transparency, standardization and reusability.
  121. Thank you for your attention. questions?