Modeling and Mining Sequential Data

Philipp Singer

April 20, 2016

Transcript

  1. Modeling and Mining Sequential Data (Machine Learning and Data Mining)
     Philipp Singer
     CC image courtesy of user puliarfanita on Flickr
  2. Let us distinguish two types of sequence data:
     • Continuous time series
     • Categorical (discrete) sequences
  3. Let us distinguish two types of sequence data:
     • Continuous time series
       – Stock share price
       – Daily temperature in Cologne
     • Categorical (discrete) sequences (focus)
       – Sunny/rainy weather sequences
       – Human mobility
       – Web navigation
       – Song listening sequences
  4. This lecture is about...
     • Modeling
     • Predicting
     • Pattern mining
     [Diagram: Markov chain over states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  5. Markov Chain Model
     • Stochastic model
     • Transitions between states
     [Diagram: states S1, S2, S3 with labeled transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  6. Markov Chain Model
     • Markovian property
       – The next state in a sequence depends only on the current one, not on the sequence of preceding ones
     [Diagram: states and transition probabilities as before]
  7. Formal definition
     • State space
     • Amounts to a sequence of random variables
     • Markovian memoryless property
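The memoryless property named above is usually written as follows (standard textbook formulation, consistent with the slide):

```latex
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1)
  = P(X_{n+1} = x \mid X_n = x_n)
```

That is, conditioned on the present state, the future is independent of the past.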
  8. Maximum Likelihood Estimation (MLE)
     • Given some sequence data, how can we determine the parameters?
     • MLE: choose the transition probabilities that maximize the likelihood of the data. See ref [1].
     [1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
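For a first-order Markov chain the MLE has a closed form: p(j | i) = n_ij / Σ_k n_ik, i.e. normalized transition counts. A minimal sketch in Python (the toy weather sequence is illustrative, not from the slides):

```python
from collections import defaultdict

def mle_transition_matrix(sequence):
    """Estimate first-order Markov chain transition probabilities by MLE:
    p(j | i) = n_ij / sum_k n_ik, where n_ij counts observed i -> j transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

# Hypothetical sunny/rainy sequence
probs = mle_transition_matrix(["S", "S", "R", "S", "S", "R", "R", "S"])
```

Each row of the result sums to 1, matching the normalized transition matrices shown on the later slides.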
  9. Pattern mining
     • Simply derived from the (non-normalized) transition matrix
     [Diagram: count matrix with entries 90, 2, 2, 1; the most common transition is the sequential pattern]
  10. Full example
      Transition counts:      Transition matrix (MLE):
            S   R                   S     R
        S   5   2               S  5/7   2/7
        R   2   1               R  2/3   1/3
  11. Full example
      Transition matrix (MLE): S→S 5/7, S→R 2/7, R→S 2/3, R→R 1/3
      Likelihood of a given sequence: we calculate the probability of the sequence under the assumption that we start with sunny.
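Conditioning on the first state, the sequence likelihood is just the product of the traversed transition probabilities. A short sketch using the MLE matrix from the full example (sequence chosen for illustration):

```python
def sequence_likelihood(sequence, transition_probs, start_prob=1.0):
    """Probability of a sequence under a first-order Markov chain.
    start_prob = 1.0 means we condition on the first state, as on the slide."""
    p = start_prob
    for current, nxt in zip(sequence, sequence[1:]):
        p *= transition_probs[current].get(nxt, 0.0)
    return p

# MLE transition matrix from the full example (S = sunny, R = rainy)
T = {"S": {"S": 5/7, "R": 2/7}, "R": {"S": 2/3, "R": 1/3}}
lik = sequence_likelihood(["S", "S", "R", "S"], T)  # (5/7) * (2/7) * (2/3)
```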
  12. Higher-order Markov chain models
      • Drop the memoryless assumption?
      • Models of increasing order
        – 2nd-order MC model
        – 3rd-order MC model
        – ...
  13. Higher-order Markov chain models
      • Drop the memoryless assumption?
      • Models of increasing order
        – 2nd-order MC model
        – 3rd-order MC model
        – ...
      • 2nd-order example: the next state depends on the two preceding states
  14. Higher order to first order transformation
      • Transform the state space
      • 2nd-order example: new compound states (pairs of original states)
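The transformation can be sketched as rewriting the sequence over tuples of k consecutive states, so that a first-order chain over the compound states is equivalent to an order-k chain over the originals (toy sequence for illustration):

```python
def to_compound_states(sequence, order=2):
    """Rewrite a sequence over compound states: an order-k Markov chain over
    the original states becomes a first-order chain over k-tuples."""
    return [tuple(sequence[i:i + order]) for i in range(len(sequence) - order + 1)]

compound = to_compound_states(["S", "S", "R", "S"], order=2)
```

Any first-order machinery (MLE, likelihoods, pattern mining) can then be reused unchanged on the compound-state sequence.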
  15. 2nd-order example
      Transition counts:    Transition matrix (MLE):
        3  1                  3/4  1/4
        1  1                  1/2  1/2
        1  0                  1/1  0
        1  1                  1/2  1/2
      (rows: compound states, columns: next state)
  16. Reset states
      • A reset state R marks the start and end of each sequence
      • Makes the transformation easier (the same number of transitions in every model order)
  17. Comparing models
      • 1st vs. 2nd order
      • Statistical model comparison is necessary
      • Nested models → the higher order always fits at least as well
      • Account for potential overfitting
  18. Model comparison
      • Likelihood ratio test
        – Ratio between the likelihoods for order m and order k
        – −2 ln(L_k / L_m) follows a Chi² distribution, with degrees of freedom equal to the difference in the number of free parameters
        – Only for nested models
      • Akaike Information Criterion (AIC)
        – AIC = 2k − 2 ln L̂ (k free parameters)
        – The lower the better
      • Bayesian Information Criterion (BIC)
        – BIC = k ln(n) − 2 ln L̂ (n observations); the lower the better
      • Bayes factors
        – Ratio of evidences (marginal likelihoods)
      • Cross validation
      See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
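A minimal sketch of the AIC/BIC comparison (the log-likelihoods and observation count below are hypothetical; the parameter counts use |S|(|S|−1) free parameters for a 1st-order chain over |S| = 2 states and |S|²(|S|−1) for 2nd order):

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: BIC = k ln n - 2 ln L; lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fit: the 2nd-order model has a higher likelihood (nested models
# always fit at least as well) but pays a penalty for its extra parameters.
aic1 = aic(log_likelihood=-40.0, n_params=2)  # 1st order, |S|(|S|-1) = 2
aic2 = aic(log_likelihood=-38.5, n_params=4)  # 2nd order, |S|^2(|S|-1) = 4
```

Here the 1st-order model wins despite the worse raw fit, which is exactly the overfitting correction the slide calls for.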
  19. AIC example
      [Diagram: sequences delimited by reset states R; 1st-order parameter matrix (entries 5/8, 2/8, 1/8, 2/3, 1/3, 0/3, ...) next to the much larger 2nd-order parameter matrix over compound states]
  20. AIC example
      [Diagram: the same 1st- and 2nd-order parameter matrices as on the previous slide]
      Example on blackboard
  21. Hidden Markov Models
      • Extends the Markov chain model
      • Hidden state sequence
      • Observed emissions
      • What is the weather like?
  22. Forward-Backward algorithm
      • Given an emission sequence
      • What is the probability of the emission sequence?
      • What is the probable sequence of hidden states?
      [Diagram: hidden sequence above the observed sequence]
      Check out the YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s
      Further material: cs229.stanford.edu/section/cs229-hmm.pdf
  23. Setup
      [Diagram: two-hidden-state HMM with transition probabilities 0.7, 0.3, 0.6, 0.4; emission probabilities 0.9, 0.2, 0.1, 0.8; and a reset state R entered and left with probability 0.5/0.5]
      Note: the literature usually uses a start probability and a uniform end probability for the forward-backward algorithm.
  24. Forward
      [Diagram: same HMM; forward values 0.4 and 0.1 at t1]
  25. Forward
      [Diagram: forward values 0.4, 0.1 at t1 and 0.034, 0.144 at t2]
      What is the probability of going to each possible state at t2 given t1?
  26. Forward
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061]
  27. Forward
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006; final reset transition with probability 0.5/0.5]
  28. Backwards
      [Diagram: same HMM; backward values initialized at the final step]
  29. Backwards
      [Diagram: backward values 0.31 and 0.28 at t3]
      What is the probability of arriving at t4 given each possible state at t3?
  30. Backwards
      [Diagram: backward values 0.010, 0.12 at t2; 0.31, 0.28 at t3]
  31. Backwards
      [Diagram: backward values 0.039, 0.049 at t1; 0.097, 0.12 at t2; 0.31, 0.28 at t3; reset emission at the end]
  32. Forward-Backward
      Most likely state at t2
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006 combined with backward values 0.039, 0.049; 0.097, 0.12; 0.31, 0.28; 0.5, 0.5]
  33. Forward-Backward
      • Posterior decoding
      • Most likely state at each t
      • For the most likely sequence: Viterbi algorithm
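Posterior decoding combines the two passes: gamma[t][s] = alpha[t][s] · beta[t][s] / P(o_1..o_T). A self-contained sketch, using the same assumed toy parameters as the forward example (walk/shop emissions are illustrative, not the slide's):

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Posterior decoding: P(X_t = s | o_1..o_T) for every time step t."""
    # Forward pass: alpha[t][s] = P(o_1..o_t, X_t = s)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                      for s in states})
    # Backward pass: beta[t][s] = P(o_{t+1}..o_T | X_t = s)
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans_p[s][r] * emit_p[r][o] * nxt[r] for r in states)
                        for s in states})
    evidence = sum(alpha[-1].values())  # P(o_1..o_T)
    return [{s: alpha[t][s] * beta[t][s] / evidence for s in states}
            for t in range(len(obs))]

states = ("Sunny", "Rainy")
start_p = {"Sunny": 0.5, "Rainy": 0.5}
trans_p = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
           "Rainy": {"Sunny": 0.6, "Rainy": 0.4}}
emit_p = {"Sunny": {"walk": 0.8, "shop": 0.2},   # assumed emission model
          "Rainy": {"walk": 0.2, "shop": 0.8}}

gamma = forward_backward(["walk", "shop"], states, start_p, trans_p, emit_p)
most_likely = [max(g, key=g.get) for g in gamma]  # most likely state at each t
```

Note that `most_likely` picks each state independently per time step; the single most likely *sequence* requires the Viterbi algorithm, as the slide says.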
  34. Learning parameters
      • Train the parameters of an HMM
      • No tractable MLE solution is known
      • Baum-Welch algorithm
        – A special case of the EM algorithm
        – Uses Forward-Backward
  35. Sequential Pattern Mining
      • PrefixSpan
      • Apriori algorithm
      • GSP algorithm
      • SPADE
      Reference: rakesh.agrawal-family.com/papers/icde95seq.pdf
  36. Graphical models
      • Bayesian networks
        – Random variables
        – Conditional dependence
        – Directed acyclic graph
      • Markov random fields
        – Random variables
        – Markov property
        – Undirected graph