
Modeling and Mining Sequential Data


Philipp Singer

April 20, 2016


Transcript

  1. Modeling and Mining Sequential Data. Machine Learning and Data Mining. Philipp Singer. CC image courtesy of user puliarfanita on Flickr
  2. What is sequential data?
  3. Stock share price (Bitcoin). Screenshot from bitcoinwisdom.com
  4. Daily degrees in Cologne. Screenshot from google.com (data from weather.com)
  5. Human mobility. Screenshot from maps.google.com
  6. Web navigation (Austria, Germany, C.F. Gauss)
  7. Song listening sequences. Screenshots from youtube.com

  8. Let us distinguish two types of sequence data: • Continuous time series • Categorical (discrete) sequences
  9. Let us distinguish two types of sequence data: • Continuous time series: stock share price; daily degrees in Cologne • Categorical (discrete) sequences (focus): Sunny/Rainy weather sequence; human mobility; web navigation; song listening sequences
  10. This lecture is about... • Modeling • Predicting • Pattern Mining
  11. This lecture is about... • Modeling • Predicting • Pattern Mining. Markov Chains [diagram: states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  12. Markov Chain Model
  13. Markov Chain Model • Stochastic model • Transitions between states [diagram: states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  14. Markov Chain Model • Markovian property: the next state in a sequence depends only on the current one, not on the sequence of preceding ones [diagram: states and transition probabilities]
  15. Classic weather example [diagram: Sunny and Rainy states with transition probabilities 0.9, 0.1, 0.5, 0.5]
  16. Formal definition • State space S = {s_1, ..., s_n} • Amounts to a sequence of random variables X_1, X_2, ... • Markovian memoryless property: P(X_{t+1} = s_j | X_t = s_i, ..., X_1) = P(X_{t+1} = s_j | X_t = s_i)
  17. Transition matrix • Transition matrix P, rows sum to 1 • Single transition probability p_ij = P(X_{t+1} = s_j | X_t = s_i)
  18. Example [transition matrix for the weather example: rows 0.9, 0.1 and 0.5, 0.5]
  19. Likelihood • Transition probabilities are the parameters • L = ∏_{i,j} p_ij^{n_ij}, where n_ij is the transition count from s_i to s_j
  20. Maximum Likelihood Estimation (MLE) • Given some sequence data, how can we determine the parameters? • MLE: maximize the likelihood, which yields p_ij = n_ij / Σ_k n_ik • See ref [1] [1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
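The MLE step above can be sketched in code: count transitions and normalize each row. The toy weather sequence and state names below are illustrative, not taken from the deck.

```python
from collections import defaultdict

def mle_transition_matrix(sequence):
    """Estimate first-order Markov chain parameters via MLE:
    p_ij = n_ij / sum_k n_ik, where n_ij counts transitions i -> j."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())  # row sum = all transitions leaving `state`
        matrix[state] = {nxt: n / total for nxt, n in row.items()}
    return matrix

# Hypothetical training sequence for illustration.
weather = ["Sunny", "Sunny", "Rainy", "Sunny", "Rainy", "Rainy", "Sunny"]
P = mle_transition_matrix(weather)
```

Each row of the resulting matrix sums to 1, as required on slide 17.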
  21. Prediction • Simply derived from the transition probabilities • One option: take the state with maximum probability
  22. Prediction • What about t+3? The distribution several steps ahead follows from powers of the transition matrix
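A minimal sketch of one-step and multi-step prediction, using the classic weather matrix from slide 15 (index 0 = Sunny, 1 = Rainy is an assumed labeling):

```python
import numpy as np

# Transition matrix of the Sunny/Rainy example
# (rows: current state, columns: next state; rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

current = 0  # assume we are currently in state 0 (Sunny)

# One-step prediction: pick the most probable next state.
next_state = int(P[current].argmax())

# Three steps ahead: the distribution at t+3 is a row of P^3.
P3 = np.linalg.matrix_power(P, 3)
dist_t3 = P3[current]  # distribution over states at t+3, starting from Sunny
```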

  23. Pattern mining • Simply derived from the (non-normalized) transition count matrix [counts: 90, 2, 2, 1] • The most common transition is a sequential pattern
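A small sketch of that idea: with the slide's non-normalized counts, the most common transition is simply the largest entry of the count matrix (the 2x2 layout of the four counts is an assumed reading).

```python
import numpy as np

# Non-normalized transition counts (rows: from-state, columns: to-state),
# using the slide's values 90, 2, 2, 1.
counts = np.array([[90, 2],
                   [2, 1]])

# The most common transition, i.e. the dominant sequential pattern,
# is the position of the largest count.
i, j = np.unravel_index(counts.argmax(), counts.shape)
most_common = (int(i), int(j))  # the pattern "state 0 -> state 0"
```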
  24. Full example. Training sequence
  25. Full example • Transition counts: rows 5, 2 and 2, 1 • Transition matrix (MLE): rows 5/7, 2/7 and 2/3, 1/3
  26. Full example • Transition matrix (MLE): rows 5/7, 2/7 and 2/3, 1/3 • Likelihood of the given sequence: we calculate the probability of the sequence under the assumption that it starts with Sunny
  27. Full example • Transition matrix (MLE): rows 5/7, 2/7 and 2/3, 1/3 • Prediction?
  28. Full example • Transition matrix (MLE): rows 5/7, 2/7 and 2/3, 1/3 • Prediction?
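The full example can be reproduced numerically from the slide's counts. The short sequence used for the likelihood below is a hypothetical stand-in, since the deck's actual training sequence is only shown as an image.

```python
import numpy as np

# Transition counts from the full example (rows: Sunny, Rainy).
counts = np.array([[5, 2],
                   [2, 1]], dtype=float)

# MLE: normalize each row -> rows [5/7, 2/7] and [2/3, 1/3].
P = counts / counts.sum(axis=1, keepdims=True)

idx = {"Sunny": 0, "Rainy": 1}

# Likelihood of a short illustrative sequence, assuming it starts with Sunny:
# multiply the transition probabilities along the sequence.
seq = ["Sunny", "Sunny", "Rainy", "Sunny"]
likelihood = 1.0
for a, b in zip(seq, seq[1:]):
    likelihood *= P[idx[a], idx[b]]
# here: 5/7 * 2/7 * 2/3

# Prediction after observing Sunny: the argmax of the Sunny row.
prediction = int(P[idx["Sunny"]].argmax())  # 0 -> Sunny
```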
  29. Higher-order Markov chain models • Drop the memoryless assumption? • Models of increasing order: 2nd-order MC model, 3rd-order MC model, ...
  30. Higher-order Markov chain models • Drop the memoryless assumption? • Models of increasing order: 2nd-order MC model, 3rd-order MC model, ... • 2nd-order example: the next state depends on the two preceding states
  31. Higher order to first order transformation • Transform the state space • 2nd-order example: new compound states (pairs of original states)
  32. 2nd-order example [diagram: transition counts 3, 1, 1, 1, 1, 0, 1, 1 and probabilities 3/4, 1/4, 1/2, 1/2, 1/1, 0, 1/2, 1/2]
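The transformation on slides 31 and 32 can be sketched as follows: rewrite the sequence over compound states (tuples of consecutive original states), then count first-order transitions between them. The short toy sequence is illustrative.

```python
from collections import defaultdict

def to_first_order(sequence, order=2):
    """Turn an order-k chain into a first-order chain over compound
    states: tuples of k consecutive original states."""
    compound = [tuple(sequence[i:i + order])
                for i in range(len(sequence) - order + 1)]
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(compound, compound[1:]):
        counts[cur][nxt] += 1
    return counts

# Hypothetical Sunny/Rainy-style sequence, abbreviated to S/R.
seq = ["S", "S", "R", "S", "S"]
counts = to_first_order(seq, order=2)
# e.g. compound state ("S", "S") is followed once by ("S", "R")
```

Normalizing these counts row-wise, as on slide 25, then yields the first-order transition matrix over compound states.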
  33. Reset states [R ... R framing each sequence] • Marking the start and end of sequences • Makes the transformation easier (same number of transitions)
  34. Comparing models • 1st vs. 2nd order • Statistical model comparison necessary • Nested models: a higher order always fits at least as well • Account for potential overfitting
  35. Model comparison • Likelihood ratio test: ratio between the likelihoods for orders m and k; follows a Chi² distribution with degrees of freedom equal to the difference in the number of parameters; only for nested models • Akaike Information Criterion (AIC): AIC = 2k - 2 ln L; the lower the better • Bayesian Information Criterion (BIC): BIC = k ln n - 2 ln L • Bayes factors: ratio of evidences (marginal likelihoods) • Cross validation • See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
  36. AIC example [tables: 1st-order parameters with reset state R (values include 5/8, 2/8, 1/8, 2/3, 0/3, 1/3, ...) and 2nd-order parameters over compound states]
  37. AIC example [same parameter tables as slide 36] Example on blackboard
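The AIC comparison itself is a one-line formula. The log-likelihoods and parameter counts below are made-up placeholders, since the slide's actual numbers are worked out on the blackboard.

```python
def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln L; lower values indicate a better trade-off
    between goodness of fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical values: the 2nd-order model fits better (higher
# log-likelihood) but pays for its additional parameters.
aic_first = aic(log_likelihood=-10.5, n_params=2)
aic_second = aic(log_likelihood=-9.8, n_params=6)
# here the 1st-order model has the lower AIC and would be preferred
```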
  38. Markov chain applications • Google's PageRank • DNA sequence modeling • Web navigation • Mobility
  39. Hidden Markov Model
  40. Hidden Markov Models • Extends the Markov chain model • Hidden state sequence • Observed emissions • What is the weather like?
  41. Forward-Backward algorithm • Given an emission sequence • Probability of the emission sequence? • Probable sequence of hidden states? [diagram: hidden seq. and obs. seq.] • YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s • Further material: cs229.stanford.edu/section/cs229-hmm.pdf
  42. Setup [diagram: transition probabilities 0.7, 0.3, 0.6, 0.4; emission probabilities 0.9, 0.1, 0.2, 0.8; reset state R with 0.5, 0.5] • Note: the literature usually uses a start probability and a uniform end probability for the forward-backward algorithm
  43. Forward [diagram: forward values 0.4, 0.1 at t1]
  44. Forward [diagram: forward values 0.4, 0.1 at t1 and 0.034, 0.144 at t2] • What is the probability of going to each possible state at t2 given t1?
  45. Forward [diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061]
  46. Forward [diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006; reset transition at the end]
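The forward pass can be sketched as below. The matrices are one plausible reading of the slide diagrams (transition rows 0.7/0.3 and 0.6/0.4, emission rows 0.9/0.1 and 0.2/0.8, uniform start); the slides additionally use reset states, which are omitted here, so the intermediate values differ from the deck's.

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.6, 0.4]])   # A[i, j] = P(state j at t+1 | state i at t)
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # B[i, o] = P(observation o | state i)
pi = np.array([0.5, 0.5])    # start distribution instead of a reset state

def forward(obs):
    """Forward pass: alpha[t, i] = P(o_1..o_t, state_t = i)."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

obs = [0, 1, 0]              # an illustrative observation sequence
alpha = forward(obs)
seq_prob = alpha[-1].sum()   # P(o_1..o_T): probability of the emission sequence
```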
  47. Backwards [diagram: same HMM setup, backward pass initialized at the end of the sequence]
  48. Backwards [diagram: backward values 0.31, 0.28 at t3] • What is the probability of arriving at t4 given each possible state at t3?
  49. Backwards [diagram: backward values 0.010, 0.12 at t2; 0.31, 0.28 at t3]
  50. Backwards [diagram: backward values 0.039, 0.049; 0.097, 0.12; 0.31, 0.28; reset emission at the start]
  51. Forward-Backward • Most likely state at t2 [diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006 combined with backward values 0.039, 0.049; 0.097, 0.12; 0.31, 0.28]
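Combining both passes gives posterior decoding: at each t, multiply the forward and backward values, normalize, and take the argmax. The two-state setup below is the same assumed reading of the slides as in the forward sketch (no reset states, uniform start), so the numbers differ from the deck's.

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.6, 0.4]])   # transitions
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emissions
pi = np.array([0.5, 0.5])                # start distribution

def forward(obs):
    alpha = np.zeros((len(obs), 2))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs):
    """beta[t, i] = P(o_{t+1}..o_T | state_t = i)."""
    beta = np.ones((len(obs), 2))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

obs = [0, 1, 0]
alpha, beta = forward(obs), backward(obs)

# Posterior decoding: P(state_t = i | o_1..o_T) ~ alpha[t, i] * beta[t, i].
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
best_states = posterior.argmax(axis=1)   # most likely state at each t
```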
  52. Forward-Backward • Posterior decoding: most likely state at each t • For the most likely sequence: Viterbi algorithm
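Posterior decoding picks the best state per time step; the Viterbi algorithm instead finds the single most likely state sequence. A sketch, reusing the same assumed two-state setup as in the forward-backward examples:

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.6, 0.4]])   # transitions
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emissions
pi = np.array([0.5, 0.5])                # start distribution

def viterbi(obs):
    """Most likely hidden state sequence via dynamic programming."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))              # best path probability ending in state i
    back = np.zeros((T, N), dtype=int)    # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Trace the best path backwards from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

states = viterbi([0, 1, 0])
```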
  53. Learning parameters • Train the parameters of an HMM • No tractable closed-form MLE solution known • Baum-Welch algorithm: a special case of the EM algorithm; uses Forward-Backward
  54. HMM applications • Speech recognition • POS tagging • Translation • Gene prediction
  55. Other related methods
  56. Sequential Pattern Mining • PrefixSpan • Apriori algorithm • GSP algorithm • SPADE • Reference: rakesh.agrawal-family.com/papers/icde95seq.pdf
  57. Graphical models • Bayesian networks: random variables; conditional dependence; directed acyclic graph • Markov random fields: random variables; Markov property; undirected graph
  58. Questions? Philipp Singer philipp.singer@gesis.org