Modeling and Mining Sequential Data



Philipp Singer

April 20, 2016

Transcript

  1. Modeling and Mining Sequential Data Machine Learning and Data Mining

    Philipp Singer CC image courtesy of user puliarfanita on Flickr
  2. 2 What is sequential data?

  3. 3 Stock share price (Bitcoin) Screenshot from bitcoinwisdom.com

  4. 4 Daily degrees in Cologne Screenshot from google.com (data from

    weather.com)
  5. 5 Human mobility Screenshot from maps.google.com

  6. 6 Web navigation Austria Germany C.F. Gauss

  7. 7 Song listening sequences Screenshots from youtube.com

  8. 8 Let us distinguish two types of sequence data •

    Continuous time series • Categorical (discrete) sequences
  9. 9 Let us distinguish two types of sequence data •

    Continuous time series – Stock share price – Daily degrees in Cologne • Categorical (discrete) sequences (focus) – Sunny/Rainy weather sequence – Human mobility – Web navigation – Song listening sequences
  10. 10 This lecture is about... • Modeling • Predicting •

    Pattern Mining
  11. 11 This lecture is about... • Modeling • Predicting •

    Pattern Mining Markov Chains S1 S1 S2 S2 S3 S3 1/2 1/2 1/3 2/3 1
  12. 12 Markov Chain Model

  13. 13 Markov Chain Model • Stochastic Model • Transitions between

    states S1 S1 S2 S2 S3 S3 1/2 1/2 1/3 2/3 1 States Transition probabilities
  14. 14 Markov Chain Model • Markovian property – The next

    state in a sequence depends only on the current one, not on the sequence of preceding ones S1 S1 S2 S2 S3 S3 1/2 1/2 1/3 2/3 1 States Transition probabilities
  15. 15 Classic weather example 0.1 Sunny Rainy 0.9 0.5 0.5

  16. 16 Formal definition • State space • Amounts to sequence

    of random variables • Markovian memoryless property
  17. 17 Transition matrix Rows sum to 1 Transition matrix P

    Single transition probability
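The formulas on these two slides are only shown as images; a standard reconstruction of the state space, the Markov property, and the transition matrix (consistent with the notation used in the referenced PLOS ONE paper, assumed here) is:

```latex
% State space and the corresponding sequence of random variables
\[ S = \{s_1, \dots, s_n\}, \qquad X_1, X_2, \dots \in S \]
% Markovian (memoryless) property: the next state depends only on the current one
\[ P(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \dots, X_1) = P(X_{t+1} = s_j \mid X_t = s_i) =: p_{ij} \]
% Transition matrix P collects the single transition probabilities; each row sums to 1
\[ P = (p_{ij}) \in [0,1]^{n \times n}, \qquad \sum_{j=1}^{n} p_{ij} = 1 \quad \text{for every } i \]
```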
  18. 18 Example 0.1 Sunny Rainy 0.9 0.5 0.5 Transition matrix

  19. 19 Likelihood • Transition probabilities are parameters Transition probability Transition

    count
  20. 20 Maximum Likelihood (MLE) • Given some sequence data, how

    can we determine the parameters? • Maximum likelihood estimation: maximize the likelihood. See ref [1] [1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
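The likelihood and MLE formulas are again slide images; their standard forms (as used in the referenced paper) are:

```latex
% Likelihood of the observed sequence data given the parameters p_{ij},
% where n_{ij} is the number of observed transitions from s_i to s_j
\[ \mathcal{L} = \prod_{i=1}^{n} \prod_{j=1}^{n} p_{ij}^{\,n_{ij}} \]
% Maximizing the likelihood (subject to each row summing to 1) gives the MLE
\[ \hat{p}_{ij} = \frac{n_{ij}}{\sum_{k=1}^{n} n_{ik}} \]
```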
  21. 21 Prediction • Simply derived from transition probabilities ? One

    option: Take max prob.
  22. 22 Prediction • What about t+3? ?

  23. 23 Pattern mining • Simply derived from (non-normalized) transition matrix

    90 2 2 1 Most common transition Sequential pattern
  24. 24 Full example Training sequence

  25. 25 Full example 5 2 2 1 Transition counts 5/7

    2/7 2/3 1/3 Transition matrix (MLE)
  26. 26 Full example 5/7 2/7 2/3 1/3 Transition matrix (MLE)

    Likelihood of the given sequence: we calculate the probability of the sequence under the assumption that the sequence starts with sunny.
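A Python sketch of the full example. The training sequence below is a hypothetical one chosen so that it reproduces the transition counts on the slide (S→S: 5, S→R: 2, R→S: 2, R→R: 1); the original sequence itself is only shown as an image.

```python
from collections import Counter

# Hypothetical training sequence consistent with the slide's transition counts.
seq = ['S', 'S', 'S', 'R', 'R', 'S', 'S', 'R', 'S', 'S', 'S']   # 'S' = Sunny, 'R' = Rainy

# Count transitions
counts = Counter(zip(seq, seq[1:]))

# MLE transition probabilities: normalize each row by its total outgoing count
states = ['S', 'R']
row_totals = {s: sum(counts[(s, t)] for t in states) for s in states}
P = {(s, t): counts[(s, t)] / row_totals[s] for s in states for t in states}
# -> P[('S','S')] = 5/7, P[('S','R')] = 2/7, P[('R','S')] = 2/3, P[('R','R')] = 1/3

# Likelihood of the sequence (starting state taken as given, as on the slide)
likelihood = 1.0
for a, b in zip(seq, seq[1:]):
    likelihood *= P[(a, b)]
print(likelihood)
```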
  27. 27 Full example 5/7 2/7 2/3 1/3 Transition matrix (MLE)

    ? Prediction?
  28. 28 Full example 5/7 2/7 2/3 1/3 Transition matrix (MLE)

    ? Prediction?
  29. 29 Higher order Markov Chain models • Drop the memoryless

    assumption? • Models of increasing order – 2nd order MC model – 3rd order MC model – ...
  30. 30 Higher order Markov Chain models • Drop the memoryless

    assumption? • Models of increasing order – 2nd order MC model – 3rd order MC model – ... • 2nd order example: the next state depends on the two preceding states
  31. 31 Higher order to first order transformation • Transform state

    space • 2nd order example – new compound states
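A minimal sketch of the transformation: each compound state of the 2nd-order chain is a pair of consecutive original states, and transitions between compound states again form an ordinary 1st-order Markov chain.

```python
from collections import Counter

seq = ['S', 'S', 'S', 'R', 'R', 'S', 'S', 'R', 'S', 'S', 'S']   # same toy sequence as above

# Compound states for a 2nd-order model: pairs of consecutive observations
compound = list(zip(seq, seq[1:]))           # ('S','S'), ('S','S'), ('S','R'), ...

# A transition (a, b) -> (b, c) encodes "c follows the context (a, b)",
# so the expanded chain can be estimated exactly like a 1st-order chain.
counts2 = Counter(zip(compound, compound[1:]))
for (prev, nxt), n in counts2.items():
    print(prev, '->', nxt, ':', n)
```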
  32. 32 2nd order example 3 1 1 1 1 0

    1 1 3/4 1/4 1/2 1/2 1/1 0 1/2 1/2
  33. 33 Reset states R R ... R R R R

    • Marking start and end of sequences • Transformation easier (same number of transitions)
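One way to implement reset states, as a sketch: wrap each individual training sequence with an artificial reset symbol before counting transitions, so that sequence starts and ends become ordinary transitions and all model orders see the same number of transitions. The symbol name and example sequences below are placeholders.

```python
# Insert an artificial reset state 'R*' (named to avoid clashing with Rainy 'R')
# around each training sequence before concatenating them.
sequences = [['S', 'S', 'R'], ['S', 'R', 'R'], ['S', 'S']]   # hypothetical sequences
RESET = 'R*'

joined = []
for s in sequences:
    joined += [RESET] + s
joined.append(RESET)                     # close the last sequence

transitions = list(zip(joined, joined[1:]))   # starts/ends now appear as transitions
```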
  34. 34 Comparing models • 1st vs. 2nd order • Statistical

    model comparison necessary • Nested models → higher order always fits better • Account for potential overfitting
  35. 35 Model comparison • Likelihood ratio test – Ratio between

    likelihoods for order m and k – Follows a Chi-squared (χ²) distribution with degrees of freedom equal to the difference in the number of free parameters – Only for nested models • Akaike Information Criterion (AIC) – The lower the better • Bayesian Information Criterion (BIC) – The lower the better • Bayes Factors – Ratio of evidences (marginal likelihoods) • Cross validation See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
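The formulas omitted from the slide are the standard ones. Writing p for the number of free parameters, n for the number of observations, and L̂ for the maximized likelihood:

```latex
% Likelihood ratio test between nested models (lower order nested in higher order):
\[ \lambda = -2 \ln \frac{\mathcal{L}_{\text{lower}}}{\mathcal{L}_{\text{higher}}} \;\sim\; \chi^2_{d},
   \qquad d = p_{\text{higher}} - p_{\text{lower}} \]
% Akaike Information Criterion (the lower the better)
\[ \mathrm{AIC} = 2p - 2\ln\hat{\mathcal{L}} \]
% Bayesian Information Criterion (the lower the better)
\[ \mathrm{BIC} = p\ln(n) - 2\ln\hat{\mathcal{L}} \]
```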
  36. 36 AIC example R R ... R R R R

    5/8 2/8 2/3 1/3 R R 1/8 0/3 1/1 0/1 0/1 3/5 1/5 1/2 1/2 0 1/2 1/2 R R R R R R R 1/5 0 1/1 0 0 1/1 0 0 1/1 0 0 0 0 0 0 0 0 0 0 0 0 0 1st order parameters 2nd order parameters
  37. 37 AIC example 5/8 2/8 2/3 1/3 R R 1/8

    0/3 1/1 0/1 0/1 3/5 1/5 1/2 1/2 0 1/2 1/2 R R R R R R R 1/5 0 1/1 0 0 1/1 0 0 1/1 0 0 0 0 0 0 0 0 0 0 0 0 0 1st order parameters 2nd order parameters Example on blackboard
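The numbers are worked through on the blackboard; a small helper for the comparison itself (a sketch, with the log-likelihoods and parameter counts below being placeholders to be replaced by the values from the example):

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: the lower, the better."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: the lower, the better."""
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical values -- plug in the log-likelihoods and free-parameter counts
# of the 1st- and 2nd-order models from the blackboard example.
print(aic(log_lik=-6.0, n_params=2), aic(log_lik=-4.5, n_params=6))
```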
  38. 38 Markov Chain applications • Google's PageRank • DNA sequence

    modeling • Web navigation • Mobility
  39. 39 Hidden Markov Chain Model

  40. 40 Hidden Markov Models • Extends Markov chain model •

    Hidden state sequence • Observed emissions What is the weather like?
  41. 41 Forward-Backward algorithm • Given emission sequence • Probability of

    emission sequence? • Probable sequence of hidden states? Hidden seq. Obs. seq. Check out YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s Further material: cs229.stanford.edu/section/cs229-hmm.pdf
  42. 42 Setup 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.5 0.5 Note: Literature usually uses a start probability and uniform end probability for the forward-backward algorithm.
  43. 43 Forward 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.4 0.1 R 0.5 0.5
  44. 44 Forward 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.4 0.1 0.034 0.144 R 0.5 0.5 What is the probability of going to each possible state at t2 given t1?
  45. 45 Forward 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.4 0.1 0.034 0.144 0.011 0.061 R 0.5 0.5
  46. 46 Forward 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.4 0.1 0.034 0.144 0.011 0.061 0.035 0.006 R 0.5 0.5 forward R 0.5 0.5 reset transition
  47. 47 Backwards 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R R 0.5 0.5 0.5 0.5
  48. 48 Backwards 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 0.31 0.28 R 0.5 0.5 What is the probability of arriving at t4 given each possible state at t3? R 0.5 0.5
  49. 49 Backwards 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.5 0.5 0.010 0.12 R 0.31 0.28 0.5 0.5
  50. 50 Backwards 0.7 0.3 0.6 0.4 R 0.5 0.5 0.9

    0.2 0.1 0.8 R 0.5 0.5 0.039 0.049 R 0.097 0.12 0.31 0.28 0.5 0.5 R backward reset emission
  51. 51 Forward-Backward Most likely state at t2 0.4 0.1 0.034

    0.144 0.011 0.061 0.035 0.006 0.039 0.049 0.097 0.12 0.31 0.28 0.5 0.5
  52. 52 Forward-Backward • Posterior decoding • Most likely state at

    each t • For most likely sequence: Viterbi algorithm
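A compact sketch of the forward-backward recursions and posterior decoding for a discrete HMM. The parameter values are placeholders in the spirit of the sunny/rainy setup, not the exact numbers from the lecture example, and no explicit reset/end state is used (only a start distribution).

```python
import numpy as np

# Toy HMM: 2 hidden states (Sunny, Rainy), 2 observation symbols. Placeholder values.
start = np.array([0.5, 0.5])            # initial state distribution
A = np.array([[0.7, 0.3],               # hidden-state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],               # emission probabilities per hidden state
              [0.2, 0.8]])
obs = [0, 0, 1, 0]                      # observed emission sequence (symbol indices)

T, N = len(obs), len(start)
alpha = np.zeros((T, N))                # forward:  P(o_1..o_t, state_t = i)
beta = np.zeros((T, N))                 # backward: P(o_{t+1}..o_T | state_t = i)

alpha[0] = start * B[:, obs[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

beta[T - 1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

evidence = alpha[-1].sum()              # probability of the emission sequence
posterior = alpha * beta / evidence     # P(state_t = i | all observations)
print(posterior.argmax(axis=1))         # most likely state at each t (posterior decoding)
```

For the single most likely hidden-state *sequence*, the max operation replaces the sum in the forward recursion, which is the Viterbi algorithm mentioned on the slide.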
  53. 53 Learning parameters • Train parameters of HMM • No

    tractable solution for MLE known • Baum-Welch algorithm – Special case of EM algorithm – Uses Forward-Backward
  54. 54 HMM applications • Speech recognition • POS tagging •

    Translation • Gene prediction
  55. 55 Other related methods

  56. 56 Sequential Pattern Mining • PrefixSpan • Apriori Algorithm •

    GSP Algorithm • SPADE Reference: rakesh.agrawal-family.com/papers/icde95seq.pdf
  57. 57 Graphical models • Bayesian networks – Random variables –

    Conditional dependence – Directed acyclic graph • Markov random fields – Random variables – Markov property – Undirected graph
  58. 58 Questions? Philipp Singer philipp.singer@gesis.org