Sequential Action Patterns in Collaborative Ontology Engineering Projects: A Case-study in the Biomedical Domain

Simon Walk's talk at CIKM '14 about our paper titled "Sequential Action Patterns in Collaborative Ontology Engineering Projects: A Case-study in the Biomedical Domain"

Philipp Singer

April 20, 2016

Transcript

  1. 1
     Graz University of Technology CIKM2014
    SCIENCE PASSION TECHNOLOGY
    Sequential Action Patterns in
    Collaborative Ontology-Engineering Projects:
    A Case-Study in the Biomedical Domain
    Simon Walk (1), Philipp Singer (2) and Markus Strohmaier (2, 3)
    (1) Graz University of Technology
    (2) GESIS – Leibniz Institute for the Social Sciences
    (3) University of Koblenz

  2. 2
    Introduction & Motivation
    Collaborative ontology-engineering projects have become
    more important in recent years due to an
    increase in
    • the complexity of the modeled domains
    • the requirements for the resulting ontology
    No individual can single-handedly cover the
    increased complexity and requirements.
    Hence, it is crucial to better understand and steer the
    underlying processes of how users collaboratively
    work on an ontology (i.e., via predictive models).

  3. 3
    Approach & Objective
    To that end, we analyzed five collaborative
    ontology-engineering projects from the biomedical domain to:
    1. explore regularities and common patterns in user
    action sequences
    2. fit and select models using Markov chains of
    varying order
    3. predict user actions via the fitted Markov chains
    Our main objective is to predict future user actions
    in collaborative ontology-engineering projects.

  4. 4
    Datasets
    Five collaborative ontology-engineering projects from
    the biomedical domain, with features of varying size.
    Note that all ontologies were created with WebProtégé
    or derivatives of WebProtégé!

  5. 5
    Types of Action Paths

  6. 6
    Types of Action Paths

  7. 7
    Types of Action Paths

  8. 8
    Types of Action Paths

  9. 9
    Extracted Action Paths
    1. Users for Classes
       Sequences of users that changed a class.
    2. Change Types for Users & Classes
       Sequences of change types performed by a user or on a class.
    3. Properties for Users & Classes
       Sequences of properties changed by a user or for a class.
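
The three path types listed above could be derived from a chronological change log roughly as follows. This is only a minimal sketch: the record layout `(timestamp, user, class_id, change_type, property)` and the sample values are hypothetical and do not come from the talk or from WebProtégé's actual log format.

```python
from collections import defaultdict

# Hypothetical change-log records (assumed layout, not the real log format):
# (timestamp, user, class_id, change_type, property)
changes = [
    (1, "alice", "C1", "CREATE", "title"),
    (2, "bob",   "C1", "EDIT",   "definition"),
    (3, "alice", "C2", "CREATE", "title"),
    (4, "alice", "C1", "EDIT",   "title"),
    (5, "bob",   "C2", "MOVE",   "parent"),
]

def extract_paths(changes, key_index, value_index):
    """Group values by key, in chronological order, into one path per key."""
    paths = defaultdict(list)
    for record in sorted(changes, key=lambda r: r[0]):
        paths[record[key_index]].append(record[value_index])
    return dict(paths)

# 1. Users for Classes: who changed each class, in order.
users_for_classes = extract_paths(changes, key_index=2, value_index=1)
# 2. Change Types for Users & Classes.
change_types_for_users = extract_paths(changes, key_index=1, value_index=3)
change_types_for_classes = extract_paths(changes, key_index=2, value_index=3)
# 3. Properties for Users & Classes.
properties_for_users = extract_paths(changes, key_index=1, value_index=4)
properties_for_classes = extract_paths(changes, key_index=2, value_index=4)
```

Each resulting dictionary maps one user or class to a single sequence, which is the unit that the later analyses operate on.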

  10. 10
    Exploring Regularities and
    Sequential Patterns

  11. 11
    Exploring Regularities
    Randomness & Regularities
    Wald-Wolfowitz runs test
     Adapted by O’Brien and Dyck (1985)
     For ~60% of our paths, regularities could be detected (see footnote 1).
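
For intuition, the classic two-category Wald-Wolfowitz runs test can be sketched as below. Note that this is the textbook binary version with a normal approximation, not the O'Brien and Dyck adaptation used in the talk and not the code in the linked RunsTest repository; the function name is made up for this example.

```python
import math

def runs_test_z(sequence):
    """Classic two-category runs test: return (number of runs, z-score)."""
    values = sorted(set(sequence))
    if len(values) != 2:
        raise ValueError("this simplified sketch needs exactly two categories")
    n1 = sequence.count(values[0])
    n2 = sequence.count(values[1])
    n = n1 + n2
    # A new run starts wherever two neighbouring elements differ.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mean) / math.sqrt(variance)

clustered_runs, clustered_z = runs_test_z("AAAAABBBBB")      # too few runs
alternating_runs, alternating_z = runs_test_z("ABABABABAB")  # too many runs
```

A strongly negative z-score points to clustering (fewer runs than chance would produce), a strongly positive one to alternation; both indicate a non-random sequence.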
    Sequential Pattern Mining
    PrefixSpan to investigate commonly used sequential
    patterns.
    Only immediately consecutive states form a pattern.
     E.g., “A B C” contains “A B” and “B C” but not “A C”
    1 https://github.com/psinger/RunsTest
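
The restriction to immediately consecutive states can be illustrated with a small counter of contiguous subsequences. This is a simplified stand-in, not PrefixSpan itself; the function name and the `min_support` threshold are assumptions made for the example.

```python
from collections import Counter

def contiguous_patterns(paths, length, min_support=1):
    """Count in how many paths each contiguous pattern of `length` occurs."""
    support = Counter()
    for path in paths:
        # A set, so that a pattern counts at most once per path.
        seen = {tuple(path[i:i + length]) for i in range(len(path) - length + 1)}
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}

paths = [list("ABC"), list("ABD"), list("BCA")]
frequent = contiguous_patterns(paths, length=2, min_support=2)
```

Here "A B" and "B C" are each found in two paths, while "A C" is never counted because A and C are not adjacent in "A B C".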

  12. 12
    Results for the Sequential Pattern Analysis
    Users for Classes Paths

  13. 13
    Results for the Sequential Pattern Analysis
    Users for Classes Paths

  14. 14
    Model Fitting & Selection

  15. 15
    Model Fitting
     Markov chains are stochastic processes
    representing transition probabilities between
    a countable number of known states.
     A state space: listing all possible states
     A transition matrix: listing all transition probabilities
    between states
     A Markov chain of n-th order means that n previous
    states contain predictive information about the next
    state.
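
As a rough illustration of the definition above, a Markov chain of a given order can be fitted by counting how often each state follows each context of `order` previous states. This is a plain maximum-likelihood sketch with a made-up function name, not the PathTools implementation.

```python
from collections import Counter, defaultdict

def fit_markov_chain(paths, order=1):
    """Return {context: {next_state: probability}} estimated from counts."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            context = tuple(path[i - order:i])  # the `order` previous states
            counts[context][path[i]] += 1
    return {
        context: {state: c / sum(nxt.values()) for state, c in nxt.items()}
        for context, nxt in counts.items()
    }

paths = [list("ABAB"), list("ABB")]
first_order = fit_markov_chain(paths, order=1)
second_order = fit_markov_chain(paths, order=2)
```

With `order=0` the context is empty and the model reduces to the overall state frequencies.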

  16. 16
    Model Fitting & Selection
    We fitted models of orders zero to five (see footnote 2).
     Lower-order models are nested within higher-order
    models.
     Higher orders need exponentially more parameters
    and may result in overfitting.
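
To make the exponential growth concrete: a chain of order k over |S| states has |S|^k contexts, each with |S| - 1 free transition probabilities. The state count of 5 below is an arbitrary illustration, not a figure from the datasets.

```python
def n_parameters(n_states, order):
    """Free parameters of a Markov chain: one row per context, minus the
    one probability per row that is fixed because each row sums to 1."""
    return n_states ** order * (n_states - 1)

# Hypothetical state space of 5 states, orders zero to five.
growth = [n_parameters(5, k) for k in range(6)]
```

Each additional order multiplies the parameter count by the number of states, which is why sparse data quickly leads to overfitting.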
    Bayesian model selection (Singer et al. 2014; see footnote 2)
     Higher-order models receive a penalty due to higher
    complexity.
    2 https://github.com/psinger/PathTools
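
One way to picture Bayesian model selection is to compare the marginal likelihood (evidence) of each order under a Dirichlet prior on every row of the transition matrix. The sketch below makes several assumptions that are not taken from the talk: a symmetric prior with `alpha=1.0`, and skipping the first `max_order` states of each path so that all orders are scored on the same transitions. It is not the PathTools implementation.

```python
import math
from collections import Counter, defaultdict

def log_evidence(paths, order, n_states, max_order, alpha=1.0):
    """Log marginal likelihood of a Markov chain of the given order under a
    symmetric Dirichlet(alpha) prior on each row (assumed setup)."""
    counts = defaultdict(Counter)
    for path in paths:
        # Start at max_order so every order is scored on identical positions.
        for i in range(max_order, len(path)):
            counts[tuple(path[i - order:i])][path[i]] += 1
    total = 0.0
    for nxt in counts.values():
        n = sum(nxt.values())
        total += math.lgamma(n_states * alpha) - math.lgamma(n_states * alpha + n)
        # Unobserved next states contribute lgamma(alpha) - lgamma(alpha) = 0.
        total += sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in nxt.values())
    return total

paths = [list("AB" * 20)]  # strictly alternating toy sequence
evidence = {k: log_evidence(paths, k, n_states=2, max_order=2) for k in range(3)}
```

On this toy sequence the first-order model receives far higher evidence than the zero-order model, because knowing the previous state makes the next one certain. The complexity penalty comes from spreading the prior over more rows as the order grows.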

  17. 17
    Results Bayesian Model Selection

  18. 18
    Predicting User Actions

  19. 19
    K-Fold Cross-Fold Prediction Experiment
    1. Fit Markov chain model.
       Split paths into training and test sets (stratified).
       Rank transitions for each row in the transition matrix.
    2. Determine the position of each test-set transition in the
       fitted Markov chain model.
    3. Calculate the average over all positions.
       An average position of 1 equals the best prediction
       accuracy.
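
The evaluation steps above can be sketched for a single train/test split and a first-order chain. The tie-breaking rule (ties keep the order of `states`) and the treatment of unseen transitions as count zero are assumptions of this example; the stratified k-fold splitting from the talk is not reproduced here.

```python
from collections import Counter, defaultdict

def average_position(train_paths, test_paths, states):
    """Average rank of the true next state under a first-order chain."""
    counts = defaultdict(Counter)
    for path in train_paths:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    positions = []
    for path in test_paths:
        for current, nxt in zip(path, path[1:]):
            # Rank all states of this row by descending training count;
            # sorted() is stable, so ties keep the order given in `states`.
            ranking = sorted(states, key=lambda s: -counts[current][s])
            positions.append(ranking.index(nxt) + 1)
    return sum(positions) / len(positions)

train = [list("ABABAC")]
test = [list("ABAC")]
score = average_position(train, test, states=["A", "B", "C"])
```

In this toy split the transitions A to B and B to A are ranked first and A to C second, giving an average position of 4/3.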

  20. 20
    K-Fold Cross-Fold Prediction Results

  21. 21
    Results for the Prediction Task

  22. 22
    Conclusions
     A number of sequences were produced in a non-random
    way, and frequent patterns can be extracted.
     Memory effects (serial dependence) can increase
    prediction accuracy.
     The resulting prediction models can (potentially) be
    used to
     create various recommendations and
     assess the impact of potential changes on the
    ontology and the community.

  23. 23
    Future Work
     Include additional data sources (e.g., Semantic
    MediaWikis).
     Analyze higher-order patterns and compare patterns
    of different data sources.
     Conduct live-lab experiments with the generated
    prediction models (recommendations).

  24. 24
    Questions?

  25. 25
    Thank you for your attention!

  26. 26
    References
    A. Wald and J. Wolfowitz. On a test whether two samples are from
    the same population. The Annals of Mathematical Statistics,
    11(2):147–162, 1940.
    P. C. O’Brien and P. J. Dyck. A runs test based on run lengths.
    Biometrics, pages 237–244, 1985.
    P. Singer, D. Helic, B. Taraghi, and M. Strohmaier. Detecting
    memory and structure in human navigation patterns using
    Markov chain models of varying order. PLoS ONE,
    9(7):e102070, 2014.
