# Modeling and Mining Sequential Data

April 20, 2016

## Transcript

1. ### Modeling and Mining Sequential Data (Machine Learning and Data Mining)

Philipp Singer

CC image courtesy of user puliarfanita on Flickr


8. ### 8 Let us distinguish two types of sequence data

• Continuous time series
• Categorical (discrete) sequences
9. ### 9 Let us distinguish two types of sequence data

• Continuous time series
  – Stock share price
  – Daily temperature in Cologne
• Categorical (discrete) sequences (focus)
  – Sunny/Rainy weather sequence
  – Human mobility
  – Web navigation
  – Song listening sequences
10. ### 10 This lecture is about...

• Modeling
• Predicting
• Pattern Mining
11. ### 11 This lecture is about...

• Modeling
• Predicting
• Pattern Mining

[Diagram: Markov chain with states S1, S2, S3 and transition probabilities 1/2, 1/2, 1/3, 2/3, 1]

13. ### 13 Markov Chain Model

• Stochastic model
• Transitions between states

[Diagram: states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1; labels mark the states and the transition probabilities]
14. ### 14 Markov Chain Model

• Markovian property
  – The next state in a sequence depends only on the current one, not on the sequence of preceding ones

[Diagram: states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1]

16. ### 16 Formal definition

• State space S = {s1, ..., sm}
• Amounts to a sequence of random variables X1, X2, ..., Xn
• Markovian memoryless property: P(Xn+1 = x | X1 = x1, ..., Xn = xn) = P(Xn+1 = x | Xn = xn)
17. ### 17 Transition matrix

• Transition matrix P; rows sum to 1
• Each single transition probability is estimated from transition counts
20. ### 20 Maximum Likelihood Estimation (MLE)

• Given some sequence data, how can we determine the parameters?
• MLE: maximize the likelihood of the data!

See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
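For a first-order Markov chain, the MLE is simply the normalized transition counts. A minimal Python sketch (function name and example sequence are illustrative; the sequence is chosen so the counts match the deck's Sunny/Rainy full example):

```python
from collections import defaultdict

def mle_transition_matrix(sequence):
    """MLE for a first-order Markov chain:
    p(j | i) = count(i -> j) / sum_k count(i -> k)."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        matrix[state] = {nxt: c / total for nxt, c in outgoing.items()}
    return matrix

# Illustrative Sunny(S)/Rainy(R) sequence with counts 5, 2, 2, 1
seq = list("SSSRSSRRSSS")
P = mle_transition_matrix(seq)
# P["S"] == {"S": 5/7, "R": 2/7}, P["R"] == {"S": 2/3, "R": 1/3}
```

Each row of the result sums to 1, matching the transition-matrix definition above.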
21. ### 21 Prediction

• Simply derived from the transition probabilities
• One option: take the state with maximum probability

23. ### 23 Pattern mining

• Simply derived from the (non-normalized) transition count matrix

[Transition counts 90, 2, 2, 1; the most common transition is the sequential pattern]

25. ### 25 Full example

Transition counts (rows: current state, columns: next state):

|       | Sunny | Rainy |
|-------|-------|-------|
| Sunny | 5     | 2     |
| Rainy | 2     | 1     |

Transition matrix (MLE):

|       | Sunny | Rainy |
|-------|-------|-------|
| Sunny | 5/7   | 2/7   |
| Rainy | 2/3   | 1/3   |
26. ### 26 Full example

Transition matrix (MLE): 5/7, 2/7, 2/3, 1/3

Likelihood of a given sequence: we calculate the probability of the sequence under the assumption that it starts with sunny.
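The sequence likelihood is just the product of the transition probabilities along the sequence. A minimal sketch (function name and example sequence are illustrative; the matrix is the deck's Sunny/Rainy MLE):

```python
def sequence_likelihood(matrix, sequence):
    """Likelihood of a sequence under a first-order Markov chain.
    The first state is taken as given (probability 1)."""
    p = 1.0
    for current, nxt in zip(sequence, sequence[1:]):
        p *= matrix[current].get(nxt, 0.0)
    return p

P = {"S": {"S": 5/7, "R": 2/7}, "R": {"S": 2/3, "R": 1/3}}
p = sequence_likelihood(P, ["S", "S", "R", "S"])
# = 1 * 5/7 * 2/7 * 2/3 ≈ 0.136
```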
27. ### 27 Full example

Transition matrix (MLE): 5/7, 2/7, 2/3, 1/3

Prediction?
29. ### 29 Higher order Markov Chain models

• Drop the memoryless assumption?
• Models of increasing order
  – 2nd order MC model
  – 3rd order MC model
  – ...
30. ### 30 Higher order Markov Chain models

• Drop the memoryless assumption?
• Models of increasing order
  – 2nd order MC model
  – 3rd order MC model
  – ...
• 2nd order example: the next state depends on the two preceding states
31. ### 31 Higher order to first order transformation

• Transform the state space
• 2nd order example: new compound states (pairs of original states)
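The transformation can be sketched as follows: slide a window of length k over the sequence to build compound states, then count transitions between them as in a first-order chain (function name and example sequence are illustrative):

```python
from collections import defaultdict

def to_first_order(sequence, order=2):
    """Transform a sequence into compound states so that an order-k chain
    becomes a first-order chain over tuples of k consecutive states."""
    compound = [tuple(sequence[i:i + order])
                for i in range(len(sequence) - order + 1)]
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(compound, compound[1:]):
        counts[cur][nxt] += 1
    return counts

seq = ["S", "S", "R", "S", "S"]
counts = to_first_order(seq, order=2)
# Compound states: (S,S), (S,R), (R,S), (S,S)
# Transitions: (S,S)->(S,R), (S,R)->(R,S), (R,S)->(S,S)
```

Normalizing these counts row-wise then yields the MLE exactly as in the first-order case.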
32. ### 32 2nd order example

[Transition counts over compound states: 3 1 / 1 1 / 1 0 / 1 1; resulting MLE probabilities: 3/4 1/4 / 1/2 1/2 / 1 0 / 1/2 1/2]
33. ### 33 Reset states

• Mark the start and end of sequences with a reset state R
• Makes the transformation easier (same number of transitions)
34. ### 34 Comparing models

• 1st vs. 2nd order
• Statistical model comparison necessary
• Nested models → a higher order always fits better
• Account for potential overfitting
35. ### 35 Model comparison

• Likelihood ratio test
  – Ratio between the likelihoods for order m and order k
  – Follows a Chi2 distribution; degrees of freedom = difference in the number of free parameters
  – Only for nested models
• Akaike Information Criterion (AIC)
  – AIC = 2k − 2 ln L (k free parameters, likelihood L)
  – The lower the better
• Bayesian Information Criterion (BIC)
  – BIC = k ln(n) − 2 ln L (n observations)
• Bayes Factors
  – Ratio of evidences (marginal likelihoods)
• Cross validation

See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
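An AIC comparison between orders can be sketched as below. This is a simplified version: names are illustrative, and it skips the reset-state bookkeeping from the slides, so models of different order see slightly different numbers of transitions.

```python
import math

def markov_log_likelihood(sequence, order):
    """Log-likelihood of an MLE-fitted Markov chain of the given order."""
    counts = {}
    for i in range(order, len(sequence)):
        ctx, nxt = tuple(sequence[i - order:i]), sequence[i]
        counts.setdefault(ctx, {}).setdefault(nxt, 0)
        counts[ctx][nxt] += 1
    ll = 0.0
    for ctx, outgoing in counts.items():
        total = sum(outgoing.values())
        for nxt, c in outgoing.items():
            ll += c * math.log(c / total)
    return ll

def aic(sequence, order, n_states):
    """AIC = 2k - 2 ln L with k = n_states^order * (n_states - 1)
    free parameters (each row of the matrix has n_states - 1 free entries)."""
    k = n_states ** order * (n_states - 1)
    return 2 * k - 2 * markov_log_likelihood(sequence, order)

seq = list("SSSRSSRRSSS")
better = min((1, 2), key=lambda m: aic(seq, m, n_states=2))
# The lower AIC wins; here the 2nd-order model's extra parameters
# are not justified by the short sequence.
```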
36. ### 36 AIC example

[Tables: 1st-order transition matrix including reset state R (values 5/8, 2/8, 1/8, 2/3, 1/3, 0/3, ...) vs. 2nd-order matrix over compound states; compare the number of 1st-order vs. 2nd-order parameters]
37. ### 37 AIC example

[Same 1st-order and 2nd-order parameter tables as on the previous slide]

Example on blackboard.
38. ### 38 Markov Chain applications

• Google's PageRank
• DNA sequence modeling
• Web navigation
• Mobility

40. ### 40 Hidden Markov Models

• Extends the Markov chain model
• Hidden state sequence
• Observed emissions

What is the weather like?
41. ### 41 Forward-Backward algorithm

• Given an emission sequence
• Probability of the emission sequence?
• Probable sequence of hidden states?

[Diagram: hidden sequence above the observed sequence]

Check out the YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s
Further material: cs229.stanford.edu/section/cs229-hmm.pdf
42. ### 42 Setup

[Diagram: HMM with transition probabilities 0.7, 0.3, 0.6, 0.4, emission probabilities 0.9, 0.2, 0.1, 0.8, and reset state R with probabilities 0.5, 0.5]

Note: the literature usually uses a start probability and a uniform end probability for the forward-backward algorithm.
43. ### 43 Forward

[Forward values at t1: 0.4, 0.1]
44. ### 44 Forward

[Forward values: t1: 0.4, 0.1; t2: 0.034, 0.144]

What is the probability of going to each possible state at t2 given t1?
45. ### 45 Forward

[Forward values: t1: 0.4, 0.1; t2: 0.034, 0.144; t3: 0.011, 0.061]
46. ### 46 Forward

[Forward values: t1: 0.4, 0.1; t2: 0.034, 0.144; t3: 0.011, 0.061; t4: 0.035, 0.006; computed via the forward recursion including the reset transition]
47. ### 47 Backwards

[Backward values at t4: 0.5, 0.5]
48. ### 48 Backwards

[Backward values: t3: 0.31, 0.28; t4: 0.5, 0.5]

What is the probability of arriving at t4 given each possible state at t3?
49. ### 49 Backwards

[Backward values: t2: 0.097, 0.12; t3: 0.31, 0.28; t4: 0.5, 0.5]
50. ### 50 Backwards

[Backward values: t1: 0.039, 0.049; t2: 0.097, 0.12; t3: 0.31, 0.28; t4: 0.5, 0.5; computed via the backward recursion including reset and emission probabilities]
51. ### 51 Forward-Backward

Most likely state at t2

[Forward values: 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006. Backward values: 0.039, 0.049; 0.097, 0.12; 0.31, 0.28; 0.5, 0.5]
52. ### 52 Forward-Backward

• Posterior decoding
• Most likely state at each t
• For the most likely sequence: Viterbi algorithm
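Posterior decoding via forward-backward can be sketched in a few lines. This is a minimal version with a start distribution and no reset/end state; the Sunny/Rainy model with walk/shop emissions uses illustrative numbers, not the slides' exact setup:

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Posterior probability of each hidden state at each time step.

    obs: observed emission sequence; start_p[s]: start probability;
    trans_p[s][r]: transition probability; emit_p[s][o]: emission probability.
    """
    T = len(obs)
    # Forward pass: alpha[t][s] = P(obs[0..t], state at t is s)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for t in range(1, T):
        alpha.append({s: emit_p[s][obs[t]] *
                         sum(alpha[t - 1][r] * trans_p[r][s] for r in states)
                      for s in states})
    # Backward pass: beta[t][s] = P(obs[t+1..T-1] | state at t is s)
    beta = [dict.fromkeys(states, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in states:
            beta[t][s] = sum(trans_p[s][r] * emit_p[r][obs[t + 1]] * beta[t + 1][r]
                             for r in states)
    # Evidence P(obs) and posterior gamma[t][s] = alpha * beta / P(obs)
    evidence = sum(alpha[T - 1][s] for s in states)
    return [{s: alpha[t][s] * beta[t][s] / evidence for s in states}
            for t in range(T)]

# Hypothetical Sunny/Rainy HMM with walk/shop emissions
states = ["Sunny", "Rainy"]
start = {"Sunny": 0.5, "Rainy": 0.5}
trans = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
         "Rainy": {"Sunny": 0.6, "Rainy": 0.4}}
emit = {"Sunny": {"walk": 0.8, "shop": 0.2},
        "Rainy": {"walk": 0.2, "shop": 0.8}}
posterior = forward_backward(["walk", "shop", "walk"], states, start, trans, emit)
```

Posterior decoding then takes the argmax state at each step; for the single most likely state *sequence*, the Viterbi algorithm replaces the sums in the forward pass with maxima.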
53. ### 53 Learning parameters

• Train the parameters of an HMM
• No tractable closed-form MLE solution known
• Baum-Welch algorithm
  – Special case of the EM algorithm
  – Uses Forward-Backward
54. ### 54 HMM applications

• Speech recognition
• Part-of-speech (POS) tagging
• Translation
• Gene prediction

56. ### 56 Sequential Pattern Mining

• PrefixSpan
• Apriori algorithm
• GSP algorithm
• SPADE

Reference: rakesh.agrawal-family.com/papers/icde95seq.pdf
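As a flavor of what these algorithms compute (not a full implementation of any of them), here is a first Apriori-style counting pass: find ordered pairs (a, b) where a occurs before b, supported by enough sequences in the database. Names and the toy database are illustrative:

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sequences, min_support):
    """Count ordered pairs (a, b) with a occurring before b in a sequence,
    at most once per sequence, and keep those meeting min_support."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for i, j in combinations(range(len(seq)), 2):
            seen.add((seq[i], seq[j]))  # one vote per sequence
        support.update(seen)
    return {pair: c for pair, c in support.items() if c >= min_support}

db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
result = frequent_pairs(db, min_support=2)
# ("a", "c") is supported by all three sequences, ("b", "c") by two
```

The real algorithms (GSP, PrefixSpan, SPADE) extend such frequent patterns to longer subsequences while pruning candidates that cannot be frequent.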
57. ### 57 Graphical models

• Bayesian networks
  – Random variables
  – Conditional dependence
  – Directed acyclic graph
• Markov random fields
  – Random variables
  – Markov property
  – Undirected graph