
Giannis Tsakonas
September 08, 2016

The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation Domain

Digital libraries evaluation is characterised as an interdisciplinary and multidisciplinary domain posing a set of challenges to the research communities that intend to utilise and assess criteria, methods and tools. The amount of scientific production published in the field hinders and disorients the researchers who are interested in the domain. Researchers need guidance in order to exploit the considerable amount of data and the diversity of methods effectively, as well as to identify new research goals and develop their plans for future work. This paper proposes a methodological pathway to investigate the core topics of the digital library evaluation domain, author communities, their relationships, as well as the researchers who significantly contribute to major topics. The proposed methodology exploits topic modelling algorithms and network analysis on a corpus consisting of the digital library evaluation papers presented at the JCDL, ECDL/TPDL and ICADL conferences in the period 2001–2013.

Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
Session: Digital Library Evaluation
Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
Chair: Claus-Peter Klas
Location: Blauer Saal, Hannover Congress Centrum

[CC BY-NC-SA]


Transcript

  1. The “nomenclature of multidimensionality” in the digital libraries evaluation domain

     Leonidas Papachristopoulos 1,2, Giannis Tsakonas 3, Michalis Sfakakis 1, Nikos Kleidis 4, and Christos Papatheodorou 1,2
     1 Dept. of Archives, Library Science and Museology, Ionian University, Corfu, Greece
     2 Digital Curation Unit, Institute for the Management of Information Systems, ‘Athena’ Research Centre, Athens, Greece
     3 Library and Information Center, University of Patras, Patras, Greece
     4 Dept. of Informatics, Athens University of Economics and Business, Greece
  2. Introduction / aim / scope

     1. We aimed to detect important topics and key persons of the Digital Library evaluation domain by applying the Latent Dirichlet Allocation (LDA) modelling technique on a corpus of conference papers:
        • Source: JCDL, ECDL/TPDL & ICADL
        • Period: 2001–2013
        • Topics: 13 topics
     2. We used network analysis centrality metrics to gain awareness of the relationships between these topics.
  3. Research questions

     1. What is the importance of these topics?
        1a. Which are the most prominent topics that emerged in DL evaluation?
        1b. How do they interact with each other?
     2. Which are the most important research groups or individuals in the DL evaluation domain?
     3. How ‘multidimensional’ is the behavior of the researchers in the field?
  4. Selection stage

     • 395 papers (both full and short) from a pool of 2,001 papers were classified as DL evaluation papers by a Naïve Bayes classifier.
     • The classifier was assessed by three domain experts, achieving a high inter-rater agreement score.
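As an illustration of this selection step, here is a minimal scikit-learn sketch of a Naïve Bayes text classifier. The texts, labels and variable names are hypothetical placeholders; the original study used its own hand-labelled sample and had the classifier's output assessed by domain experts.

```python
# Hypothetical sketch of the selection step: a Naive Bayes classifier that
# flags candidate DL-evaluation papers. Texts and labels are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["user study of a digital library interface",
               "network protocol throughput measurements"]   # hand-labelled sample (toy)
train_labels = [1, 0]                                         # 1 = DL evaluation paper

clf = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
clf.fit(train_texts, train_labels)

pool_texts = ["evaluating metadata quality in digital repositories"]  # conference pool (toy)
print(clf.predict(pool_texts))                                # 1 marks a DL evaluation paper
```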
  5. Topic extraction stage

     • The documents were converted to text.
     • The texts were tokenized to construct a ‘bag of words’.
     • The ‘bag of words’ was crosschecked to exclude stop words and to remove all frequent (> 2,100) and rare (< 5) words.
     • A vocabulary of 38,298 unique terms and 742,224 tokens was formed.
     • Each paper contributes on average 1,879 tokens.
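A rough sketch of this filtering step follows, assuming the thresholds on the slide (> 2,100 and < 5) refer to raw token counts; the stop-word list, tokeniser and two-document corpus are simplified placeholders, not the authors' pipeline.

```python
# Toy sketch of the vocabulary-building step; on the real corpus this step
# yields the 38,298-term vocabulary reported on the slide.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is"}   # toy list

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

documents = ["A usability study of a digital library interface",
             "Evaluating metadata quality in digital repositories"]      # toy corpus

tokenized = [tokenize(doc) for doc in documents]
counts = Counter(t for doc in tokenized for t in doc)

# Keep terms that are neither too frequent (> 2,100 occurrences) nor too rare (< 5).
vocabulary = {t for t, c in counts.items() if 5 <= c <= 2100}
bags_of_words = [[t for t in doc if t in vocabulary] for doc in tokenized]
```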
  6. Topic modelling stage 1/2

     • Topic modelling analyses large quantities of unlabelled data.
     • A topic is a probability distribution over a collection of words.
     • Each document is a random composition of a number of topics.
  7. Topic modelling stage 2/2

     • Our texts were imported into Mimno’s jsLDA (JavaScript LDA) tool.
     • 1,000 training iterations were run to achieve a stable structure of topics.
     • Several tests were executed to specify the optimal interpretable number of topics.
     • Three domain experts examined the word structure of each topic.
     • The optimal interpretable number of topics was found to be thirteen (13).
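The slides report running LDA in Mimno's jsLDA. Purely as an illustration, a roughly equivalent run with gensim (not the tool the authors used) is sketched below, with the 13 topics and 1,000 iterations taken from the slides and a toy corpus standing in for the real one.

```python
# Illustrative LDA run with gensim; bags_of_words is a toy stand-in for the
# filtered token lists of the 395 papers.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

bags_of_words = [["usability", "interface", "user", "task", "study"],
                 ["metadata", "quality", "record", "schema", "repository"]]

dictionary = Dictionary(bags_of_words)
corpus = [dictionary.doc2bow(doc) for doc in bags_of_words]

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=13, iterations=1000, random_state=1)

# Inspect the highest-probability words of each topic, as the three domain
# experts did when judging the interpretability of the 13 topics.
for topic_id in range(lda.num_topics):
    print(topic_id, lda.print_topic(topic_id, topn=5))
```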
  8. Topics correlation

     • jsLDA offers a topic correlation functionality based on the Pointwise Mutual Information (PMI) indicator.
     • PMI compares the probability of two topics co-occurring in a document with the probability of each one occurring independently in the same document.
     • The result was a graph with 13 nodes (topics) and 36 edges (correlation probabilities).
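A minimal sketch of this correlation step is given below, assuming a topic counts as "present" in a document when its proportion exceeds a threshold; the 0.25 cut-off and the document-topic proportions are made up, and jsLDA's internal computation may differ in detail.

```python
# Toy PMI computation between topic pairs, based on topic co-occurrence
# within documents.
import numpy as np
from itertools import combinations

doc_topics = np.array([[0.6, 0.3, 0.1],      # documents x topics proportions (toy)
                       [0.1, 0.5, 0.4],
                       [0.4, 0.4, 0.2]])
present = doc_topics > 0.25                   # topic counted as present in a document
p_topic = present.mean(axis=0)                # P(topic occurs in a document)

edges = []
for i, j in combinations(range(present.shape[1]), 2):
    p_joint = (present[:, i] & present[:, j]).mean()   # P(topics i and j co-occur)
    if p_joint > 0:
        pmi = np.log(p_joint / (p_topic[i] * p_topic[j]))
        if pmi > 0:                           # keep only positively correlated pairs as edges
            edges.append((i, j, round(float(pmi), 3)))
print(edges)                                  # edge list of the topic graph
```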
  9. RQ 1a: Topics significance - metrics

     • Degree centrality: the ability of one topic to communicate on a semantic level with others.
     • Closeness centrality: the ability of one topic to directly connect with others.
     • Betweenness centrality: the ability of a topic to stand in a central position and bridge other topics.
     • Clustering coefficient: localization of topic clusters.
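These four measures can be computed with networkx, as in the sketch below; the example edges are illustrative, not the paper's 13-node, 36-edge graph, and networkx's normalised centralities will not reproduce the raw values in the next slide's table.

```python
# Illustrative computation of the four network measures on a small topic graph.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Information Seeking", "Interface Usability"),
                  ("Information Seeking", "Reading Behavior"),
                  ("Information Seeking", "Information Retrieval"),
                  ("Information Retrieval", "Search Engines")])

degree      = dict(G.degree())                 # number of directly correlated topics
closeness   = nx.closeness_centrality(G)       # how easily a topic reaches all others
betweenness = nx.betweenness_centrality(G)     # how often a topic bridges other topics
clustering  = nx.clustering(G)                 # local clustering coefficient
print(degree, closeness, betweenness, clustering, sep="\n")
```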
  10. RQ 1a: Topics significance

      Topic                     Degree centrality  Closeness centrality  Betweenness centrality  Clustering coefficient
      Distributed Services              5                 1.58                  2.75                    0.20
      Educational Content               4                 1.67                  0.33                    0.83
      Information Retrieval             6                 1.50                  2.08                    0.60
      Information Seeking              11                 1.08                 19.92                    0.36
      Interface Usability               5                 1.58                  1.00                    0.70
      Multimedia                        4                 1.67                  1.00                    0.67
      Metadata Quality                  5                 1.58                  3.03                    0.40
      Preservation                      4                 1.67                  0.45                    0.67
      Reading Behavior                  6                 1.50                  2.17                    0.60
      Recommendation Systems            5                 1.58                  0.78                    0.70
      Search Engines                    5                 1.58                  2.95                    0.40
      Similarity Performance            5                 1.58                  1.17                    0.70
      Text Classification               7                 1.42                  4.37                    0.52
  11. RQ 1b: Topics interaction

      Two main subgraphs, based on PMI and clustering coefficient:
      • Subgraph 1: Reading behavior, Information seeking, Interface usability, Metadata quality, Educational content
      • Subgraph 2: Information retrieval, Search engines, Text classification, Similarity performance, Recommendation systems, Information seeking
  12. RQ 2: authors’ contribution

      • Our corpus consists of 395 papers by 905 unique authors.
      • An author may participate in more than one paper; the total number of author participations is 1,335.
      • A paper has on average 3.38 author participations (1,335 / 395).
      • An author participates on average in 1.47 papers (1,335 / 905).
  13. RQ 2: authors’ contribution

      Topic                     Authors per paper
      Educational content             4.4
      Metadata quality                3.82
      Distributed Services            3.58
      Similarity performance          3.45
      Interface usability             3.44
      Multimedia                      3.41
      Information seeking             3.37
      Recommendation systems          3.27
      Search engines                  3.19
      Information retrieval           3.02
      Text classification             3.01
      Preservation                    2.93
      Reading behavior                2.88
  14. RQ 3: authors’ multidimensionality

      • An author contributes to one or more topics.
      • 3 topics: 382 authors
      • 2 topics: 207 authors
      • 1 topic: 37 authors
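These breadth counts amount to counting, per author, the distinct topics of the papers they co-authored. A toy sketch follows; the author-topic records are hypothetical, not data from the study.

```python
# Toy sketch: count how many distinct topics each author has contributed to.
from collections import Counter, defaultdict

paper_records = [("A. Author", "Interface Usability"),   # (author, dominant topic of paper)
                 ("A. Author", "Information Seeking"),
                 ("B. Author", "Metadata Quality")]

author_topics = defaultdict(set)
for author, topic in paper_records:
    author_topics[author].add(topic)

# e.g. Counter({2: 1, 1: 1}) — one author active in 2 topics, one in 1 topic
breadth = Counter(len(topics) for topics in author_topics.values())
print(breadth)
```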
  15. Summary

      1. We applied Latent Dirichlet Allocation (LDA) on a corpus of papers to identify key topics of the DL evaluation domain.
         • We created a topic map of the domain that helps to discover groups of authors with an impact on several topics.
      2. We used network analysis centrality metrics to gain awareness of the structure, relationships and information flows.
         • We revealed bipartite relationships between key topics and key authors/groups of the DL evaluation domain.
  16. Thank you for your attention. Questions?

      Full text at: dx.doi.org/10.1007/978-3-319-43997-6_19
      Session: Digital Library Evaluation
      Time: Thursday, 08/Sep/2016, 9:00am - 10:30am
      Chair: Claus-Peter Klas
      Location: Blauer Saal, Hannover Congress Centrum