Modeling and Mining Sequential Data

Philipp Singer

April 20, 2016

Transcript

  1. Modeling and Mining Sequential Data (Machine Learning and Data Mining)
     Philipp Singer
     CC image courtesy of user puliarfanita on Flickr
  2. Let us distinguish two types of sequence data:
     • Continuous time series
     • Categorical (discrete) sequences
  3. Let us distinguish two types of sequence data:
     • Continuous time series
       – Stock share price
       – Daily temperature in Cologne
     • Categorical (discrete) sequences (focus)
       – Sunny/rainy weather sequences
       – Human mobility
       – Web navigation
       – Song listening sequences
  4. This lecture is about...
     • Modeling
     • Predicting
     • Pattern mining
     [Diagram: Markov chain over states S1, S2, S3 with transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  5. Markov Chain Model
     • Stochastic model
     • Transitions between states
     [Diagram: states S1, S2, S3 with labeled transition probabilities 1/2, 1/2, 1/3, 2/3, 1]
  6. Markov Chain Model
     • Markovian property
       – The next state in a sequence depends only on the current one, not on the sequence of preceding ones
     [Diagram: states and transition probabilities as before]
  7. Formal definition
     • State space
     • Amounts to a sequence of random variables
     • Markovian memoryless property
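The memoryless property named above is usually written as follows (standard textbook formulation, consistent with the slide):

```latex
P(X_{n+1} = x \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1)
  = P(X_{n+1} = x \mid X_n = x_n)
```

That is, conditioned on the present state, the future is independent of the past.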
  8. Maximum Likelihood Estimation (MLE)
     • Given some sequence data, how can we determine the parameters?
     • MLE: choose the transition probabilities that maximize the likelihood of the data. See ref [1].
     [1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
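For a first-order Markov chain the MLE has a closed form: p(j | i) = n_ij / Σ_k n_ik, i.e. normalized transition counts. A minimal sketch in Python (the toy weather sequence is illustrative, not from the slides):

```python
from collections import defaultdict

def mle_transition_matrix(sequence):
    """Estimate first-order Markov chain transition probabilities by MLE:
    p(j | i) = n_ij / sum_k n_ik, where n_ij counts observed i -> j transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

# Hypothetical sunny/rainy sequence
probs = mle_transition_matrix(["S", "S", "R", "S", "S", "R", "R", "S"])
```

Each row of the result sums to 1, matching the normalized transition matrices shown on the later slides.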
  9. Pattern mining
     • Simply derived from the (non-normalized) transition matrix
     [Diagram: count matrix with entries 90, 2, 2, 1; the most common transition is the sequential pattern]
  10. Full example
      Transition counts:      Transition matrix (MLE):
            S   R                   S     R
        S   5   2               S  5/7   2/7
        R   2   1               R  2/3   1/3
  11. Full example
      Transition matrix (MLE): S→S 5/7, S→R 2/7, R→S 2/3, R→R 1/3
      Likelihood of a given sequence: we calculate the probability of the sequence under the assumption that we start with sunny.
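Conditioning on the first state, the sequence likelihood is just the product of the traversed transition probabilities. A short sketch using the MLE matrix from the full example (sequence chosen for illustration):

```python
def sequence_likelihood(sequence, transition_probs, start_prob=1.0):
    """Probability of a sequence under a first-order Markov chain.
    start_prob = 1.0 means we condition on the first state, as on the slide."""
    p = start_prob
    for current, nxt in zip(sequence, sequence[1:]):
        p *= transition_probs[current].get(nxt, 0.0)
    return p

# MLE transition matrix from the full example (S = sunny, R = rainy)
T = {"S": {"S": 5/7, "R": 2/7}, "R": {"S": 2/3, "R": 1/3}}
lik = sequence_likelihood(["S", "S", "R", "S"], T)  # (5/7) * (2/7) * (2/3)
```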
  12. Higher-order Markov chain models
      • Drop the memoryless assumption?
      • Models of increasing order
        – 2nd-order MC model
        – 3rd-order MC model
        – ...
  13. Higher-order Markov chain models
      • Drop the memoryless assumption?
      • Models of increasing order
        – 2nd-order MC model
        – 3rd-order MC model
        – ...
      • 2nd-order example: the next state depends on the two preceding states
  14. Higher order to first order transformation
      • Transform the state space
      • 2nd-order example: new compound states (pairs of original states)
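The transformation can be sketched as rewriting the sequence over tuples of k consecutive states, so that a first-order chain over the compound states is equivalent to an order-k chain over the originals (toy sequence for illustration):

```python
def to_compound_states(sequence, order=2):
    """Rewrite a sequence over compound states: an order-k Markov chain over
    the original states becomes a first-order chain over k-tuples."""
    return [tuple(sequence[i:i + order]) for i in range(len(sequence) - order + 1)]

compound = to_compound_states(["S", "S", "R", "S"], order=2)
```

Any first-order machinery (MLE, likelihoods, pattern mining) can then be reused unchanged on the compound-state sequence.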
  15. 2nd-order example
      Transition counts:    Transition matrix (MLE):
        3  1                  3/4  1/4
        1  1                  1/2  1/2
        1  0                  1/1  0
        1  1                  1/2  1/2
      (rows: compound states, columns: next state)
  16. Reset states
      • A reset state R marks the start and end of each sequence
      • Makes the transformation easier (the same number of transitions in every model order)
  17. Comparing models
      • 1st vs. 2nd order
      • Statistical model comparison is necessary
      • Nested models → the higher order always fits at least as well
      • Account for potential overfitting
  18. Model comparison
      • Likelihood ratio test
        – Ratio between the likelihoods for order m and order k
        – −2 ln(L_k / L_m) follows a Chi² distribution, with degrees of freedom equal to the difference in the number of free parameters
        – Only for nested models
      • Akaike Information Criterion (AIC)
        – AIC = 2k − 2 ln L̂ (k free parameters)
        – The lower the better
      • Bayesian Information Criterion (BIC)
        – BIC = k ln(n) − 2 ln L̂ (n observations); the lower the better
      • Bayes factors
        – Ratio of evidences (marginal likelihoods)
      • Cross validation
      See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
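A minimal sketch of the AIC/BIC comparison (the log-likelihoods and observation count below are hypothetical; the parameter counts use |S|(|S|−1) free parameters for a 1st-order chain over |S| = 2 states and |S|²(|S|−1) for 2nd order):

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: BIC = k ln n - 2 ln L; lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fit: the 2nd-order model has a higher likelihood (nested models
# always fit at least as well) but pays a penalty for its extra parameters.
aic1 = aic(log_likelihood=-40.0, n_params=2)  # 1st order, |S|(|S|-1) = 2
aic2 = aic(log_likelihood=-38.5, n_params=4)  # 2nd order, |S|^2(|S|-1) = 4
```

Here the 1st-order model wins despite the worse raw fit, which is exactly the overfitting correction the slide calls for.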
  19. AIC example
      [Diagram: sequences delimited by reset states R; 1st-order parameter matrix (entries 5/8, 2/8, 1/8, 2/3, 1/3, 0/3, ...) next to the much larger 2nd-order parameter matrix over compound states]
  20. AIC example
      [Diagram: the same 1st- and 2nd-order parameter matrices as on the previous slide]
      Example on blackboard
  21. Hidden Markov Models
      • Extends the Markov chain model
      • Hidden state sequence
      • Observed emissions
      • What is the weather like?
  22. Forward-Backward algorithm
      • Given an emission sequence
      • What is the probability of the emission sequence?
      • What is the probable sequence of hidden states?
      [Diagram: hidden sequence above the observed sequence]
      Check out the YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s
      Further material: cs229.stanford.edu/section/cs229-hmm.pdf
  23. Setup
      [Diagram: two-hidden-state HMM with transition probabilities 0.7, 0.3, 0.6, 0.4; emission probabilities 0.9, 0.2, 0.1, 0.8; and a reset state R entered and left with probability 0.5/0.5]
      Note: the literature usually uses a start probability and a uniform end probability for the forward-backward algorithm.
  24. Forward
      [Diagram: same HMM; forward values 0.4 and 0.1 at t1]
  25. Forward
      [Diagram: forward values 0.4, 0.1 at t1 and 0.034, 0.144 at t2]
      What is the probability of going to each possible state at t2 given t1?
  26. Forward
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061]
  27. Forward
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006; final reset transition with probability 0.5/0.5]
  28. Backwards
      [Diagram: same HMM; backward values initialized at the final step]
  29. Backwards
      [Diagram: backward values 0.31 and 0.28 at t3]
      What is the probability of arriving at t4 given each possible state at t3?
  30. Backwards
      [Diagram: backward values 0.010, 0.12 at t2; 0.31, 0.28 at t3]
  31. Backwards
      [Diagram: backward values 0.039, 0.049 at t1; 0.097, 0.12 at t2; 0.31, 0.28 at t3; reset emission at the end]
  32. Forward-Backward
      Most likely state at t2
      [Diagram: forward values 0.4, 0.1; 0.034, 0.144; 0.011, 0.061; 0.035, 0.006 combined with backward values 0.039, 0.049; 0.097, 0.12; 0.31, 0.28; 0.5, 0.5]
  33. Forward-Backward
      • Posterior decoding
      • Most likely state at each t
      • For the most likely sequence: Viterbi algorithm
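Posterior decoding combines the two passes: gamma[t][s] = alpha[t][s] · beta[t][s] / P(o_1..o_T). A self-contained sketch, using the same assumed toy parameters as the forward example (walk/shop emissions are illustrative, not the slide's):

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Posterior decoding: P(X_t = s | o_1..o_T) for every time step t."""
    # Forward pass: alpha[t][s] = P(o_1..o_t, X_t = s)
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                      for s in states})
    # Backward pass: beta[t][s] = P(o_{t+1}..o_T | X_t = s)
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans_p[s][r] * emit_p[r][o] * nxt[r] for r in states)
                        for s in states})
    evidence = sum(alpha[-1].values())  # P(o_1..o_T)
    return [{s: alpha[t][s] * beta[t][s] / evidence for s in states}
            for t in range(len(obs))]

states = ("Sunny", "Rainy")
start_p = {"Sunny": 0.5, "Rainy": 0.5}
trans_p = {"Sunny": {"Sunny": 0.7, "Rainy": 0.3},
           "Rainy": {"Sunny": 0.6, "Rainy": 0.4}}
emit_p = {"Sunny": {"walk": 0.8, "shop": 0.2},   # assumed emission model
          "Rainy": {"walk": 0.2, "shop": 0.8}}

gamma = forward_backward(["walk", "shop"], states, start_p, trans_p, emit_p)
most_likely = [max(g, key=g.get) for g in gamma]  # most likely state at each t
```

Note that `most_likely` picks each state independently per time step; the single most likely *sequence* requires the Viterbi algorithm, as the slide says.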
  34. Learning parameters
      • Train the parameters of an HMM
      • No tractable MLE solution is known
      • Baum-Welch algorithm
        – A special case of the EM algorithm
        – Uses Forward-Backward
  35. Sequential Pattern Mining
      • PrefixSpan
      • Apriori algorithm
      • GSP algorithm
      • SPADE
      Reference: rakesh.agrawal-family.com/papers/icde95seq.pdf
  36. Graphical models
      • Bayesian networks
        – Random variables
        – Conditional dependence
        – Directed acyclic graph
      • Markov random fields
        – Random variables
        – Markov property
        – Undirected graph