
Markov Chains and Markov State Models in Molecular Dynamics

An introduction to Markov chains, and their use in molecular dynamics simulations.

Mike O'Connor

April 11, 2018

Transcript

1. Outline

We will cover:
• Markov Chains by example of the drunken walk.
• Some of the key mathematical results (for MSMs).
• Markov State Models for molecular dynamics.
• Some of the stuff I found confusing.

This will not be:
• How to make a good Markov Model.
• Proofs of all the things.

2. Reading

These slides are an agglomeration and synthesis of the following:
• Chapters 2, 3, and 4.1 of Takis Konstantopoulos's lecture notes.
• Wikipedia on Markov chains is pretty good!

Markov State Models for molecular dynamics:
• Bowman, G. R. (2014). Advances in Experimental Medicine and Biology, 797, 7–22.
• Pande, V. S., Beauchamp, K., & Bowman, G. R. (2010). Methods, 52(1), 99–105.
• Noé, F., & Fischer, S. (2008). Current Opinion in Structural Biology, 18(2), 154–162.

3. Markov Chains

• Model a stochastic process by assuming there is no (or little) memory.
• Used in lots of applications:
  • Speech recognition (Siri etc.).
  • Text prediction.
  • Molecular dynamics.
  • Google search.

4. A Drunken Walk: Discrete Space and Time

[Figure: a row of six states, labelled 0 to 5.]

Markov Property: Where the drunk goes next depends only on their current position.

5. A Drunken Walk: Transition Probabilities

[Figure: the six states, with transition probabilities 0.8, 0.2, 0.9, 0.1, 0.1, 0.9, 0.7, 0.3, 0.5, 0.5, 0.5, 0.5 labelling the edges.]

Transition Probability: Given the drunk is in state $i$ at step $n$, the probability that they will transition to another state at step $n + 1$.

Intuition check: Starting in state 0, what's the probability of state 1 one step later, then state 2 one step after that?

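By the Markov property the two steps factorize, so the answer to the intuition check is just the product of the two one-step transition probabilities:

$$P(X_1 = 1, X_2 = 2 \mid X_0 = 0) = p_{01}\, p_{12}$$

(using the $p_{ij}$ notation introduced on the next slide).
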
6. Mathematical Notation

• $X_n$: Random variable for the state at step $n$.
• $P(X_{n+1} = j)$: The probability of being in state $j$ at step $n + 1$.

Markov Property again:

$$P(X_{n+1} = x_{n+1} \mid X_n = x_n, X_{n-1} = x_{n-1}, \dots, X_0 = x_0) = P(X_{n+1} = x_{n+1} \mid X_n = x_n)$$

Shorthand (time homogeneous):

$$p_{ij} \equiv P(X_{n+1} = j \mid X_n = i)$$

7. Joint Probability and Initial Conditions

The evolution of a chain follows our intuitive picture (proof?):

$$P(X_0 = x_0, X_1 = x_1, \dots, X_{n-1} = x_{n-1}, X_n = x_n) = P(X_0 = x_0)\, P(X_1 = x_1 \mid X_0 = x_0) \cdots P(X_n = x_n \mid X_{n-1} = x_{n-1})$$

We write this as:

$$= \mu_{x_0}\, p_{x_0, x_1} \cdots p_{x_{n-1}, x_n}$$

where $\mu$ is the initial distribution.

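This chain-rule factorization is easy to evaluate numerically. A minimal sketch in NumPy, using a hypothetical two-state chain (the matrix and the path here are illustrative only, not the drunken-walk chain):

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.1, 0.9]])   # P[i, j] = p_ij
mu0 = np.array([1.0, 0.0])   # initial distribution: definitely in state 0

def path_probability(path, P, mu0):
    """P(X_0 = x_0, ..., X_n = x_n) = mu0[x_0] * product of p_{x_k, x_{k+1}}."""
    prob = mu0[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a, b]
    return prob

print(path_probability([0, 1, 1], P, mu0))  # 1.0 * 0.5 * 0.9 = 0.45
```
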
8. A Drunken Walk: End to End

[Figure: the six states with the same transition probabilities as on slide 5.]

$$P(X_0 = 0, X_1 = 1, \dots, X_5 = 5)\,?$$

Assuming $\mu_0 = 1$.

9. Transition Matrix & Distributions

We'd like to ask more general questions:
• What's the probability of being in state 5 after 10 steps, if I start in state 0?
• What's the distribution of states after 100 steps, if I start in a random state?
• What happens if we leave the drunk there for an infinite amount of time?

To do that, we start making use of matrices and linear algebra.

10. Distributions

• Even more notation!
• Let $\mu$ be the distribution of states: $\mu_i(n) := P(X_n = i)$, for $i \in S$.
• Let $\mu^{(0)}$ be the initial distribution.
  • $\mu^{(0)} = [1, 0]$ – The drunk is definitely in state 0.
  • $\mu^{(0)} = [0.5, 0.5]$ – The drunk is equally likely to be in either state.

11. Distributions

• The matrix makes evaluating the joint probabilities more straightforward.
• Distribution 1 step later: $\mu^{(1)} = \mu^{(0)} P$
• Distribution $n$ steps later: $\mu^{(n)} = \mu^{(0)} P^n$

12. Distributions

Calculate $\mu^{(1)}$, $\mu^{(2)}$? What's the probability of being in state 1 after 3 steps?

[Figure: two states, 0 and 1, with self-loop probabilities 0.5 and 0.9 and cross probabilities 0.5 and 0.1.]

$$P = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix}, \qquad \mu^{(0)} = [1, 0]$$

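A worked version of this exercise, propagating $\mu^{(n)} = \mu^{(0)} P^n$ with NumPy (the numbers in the comments follow directly from the matrix on the slide):

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.1, 0.9]])   # transition matrix from the slide
mu = np.array([1.0, 0.0])    # mu^(0): definitely in state 0

for n in range(1, 4):
    mu = mu @ P              # mu^(n) = mu^(n-1) P
    print(f"mu^({n}) = {mu}")

# mu^(1) = [0.5, 0.5]
# mu^(2) = [0.3, 0.7]
# mu^(3) = [0.22, 0.78]  => probability of state 1 after 3 steps is 0.78
```
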
13. Stationarity

• Consider a distribution with the following property: $\pi = \pi P$
• The distribution does not change when put through the transition matrix.
• This special distribution is known as the Stationary Distribution.
• Notice this is an eigenvalue problem!

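Spelling out the eigenvalue connection:

$$\pi = \pi P \iff P^{\mathsf{T}} \pi^{\mathsf{T}} = \pi^{\mathsf{T}}$$

so the stationary distribution is a left eigenvector of $P$ with eigenvalue 1.
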
14. Skip Over Lots of Maths

• If a Markov chain is irreducible and aperiodic then: $\lim_{n \to \infty} P^n = \Pi$
• Where each row of $\Pi$ is $\pi$.
• $\pi$ represents the long-term, or equilibrium, distribution.
• Furthermore, $\pi$ is unique.
• This sounds like a molecular dynamics system!
• Expected return time for state $i$ is $1/\pi_i$.

15. Why Care? Google Search – PageRank

• Surf the web as a random process.
• Count how many links in and out each website has.
• Use that to form a transition matrix.

[Figure: three websites – BLG, BBC, NYT – as states in a chain.]

16. Why Care? Google Search – PageRank

• The stationary distribution is the probability of being on a given website.
• More important websites => more inward links.
• The real algorithm is a lot more complicated now!

[Figure: the three websites with transition probabilities 0.5, 0.5, 0.25, 0.75 on the links.]

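A minimal sketch of the PageRank idea via power iteration. The link structure among the three sites is invented for illustration (the slide's actual figure is not recoverable):

```python
import numpy as np

# Hypothetical link structure: who links to whom.
links = {"BBC": ["BLG", "NYT"], "BLG": ["BBC", "NYT"], "NYT": ["BBC"]}
sites = sorted(links)
index = {s: i for i, s in enumerate(sites)}

# Row-stochastic transition matrix: a random surfer follows one of the
# outgoing links uniformly at random.
P = np.zeros((len(sites), len(sites)))
for s, targets in links.items():
    for t in targets:
        P[index[s], index[t]] = 1.0 / len(targets)

# Power iteration towards the stationary distribution.
mu = np.full(len(sites), 1.0 / len(sites))
for _ in range(200):
    mu = mu @ P
print(dict(zip(sites, mu.round(3))))  # BBC ranks highest in this toy web
```
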
17. Calculate the Stationary Distribution

[Figure: the two-state chain from slide 12.]

$$P = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix}, \qquad \pi P = \pi \; ? \qquad \lim_{n \to \infty} P^n \; ?$$

18. Calculate the Stationary Distribution

[Figure: the two-state chain from slide 12.]

$$P = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix}, \qquad \pi P = \pi \; ? \qquad \lim_{n \to \infty} P^n = \begin{pmatrix} 0.167 & 0.833 \\ 0.167 & 0.833 \end{pmatrix}$$

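The same answer can be checked numerically: $\pi$ is the left eigenvector of $P$ with eigenvalue 1, so we can ask NumPy for the eigenvectors of $P^{\mathsf{T}}$:

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.1, 0.9]])

evals, evecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(evals - 1.0))   # index of the eigenvalue ~1
pi = np.real(evecs[:, k])
pi /= pi.sum()                       # normalize to a distribution
print(pi)                            # [0.1667, 0.8333]

# Equivalently, every row of P^n approaches pi for large n:
print(np.linalg.matrix_power(P, 50))
```
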
19. Markov Chains: A Summary

• A Markovian process that is discrete in time, states, or some combination.
• Markov Property (multiple ways of saying the same thing):
  • The next state depends only on the current state.
  • The process is "memoryless": it does not remember where it has been, only where it is now.
• Stationary distribution: a time-independent distribution.
  • For an irreducible, aperiodic Markov chain it is unique and equal to the limiting distribution.

20. Markov State Models for Molecular Dynamics

The high-level way of thinking about a molecular system:
• A model of N states, with rates of going between the different states.
• Could be protein folding, or a reaction.
• MSM: Use Markov chains to analyse MD in a way that gives us this high-level information.

[Figure: a two-state chain between reactant R and product P.]

21. Markov State Model Recipe

1. Run loads of MD.
2. Project onto a lower-dimensional space (contacts, dihedrals, tICA).
3. Discretize the data by clustering into N microstates (k-means, etc.).
4. Count the number of transitions between states after a lag time $\tau$.
5. Convert from counts to a transition matrix (see the sketch after this list).
6. We have a Markov model! $\mu\big((k + 1)\tau\big) = \mu(k\tau)\, T(\tau)$
7. Test that the Markov model is self-consistent.
8. Coarse-grain the model to make it more human-readable.

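A minimal sketch of steps 4 and 5, assuming `dtraj` is the discrete trajectory from step 3 (one microstate index per frame); the naive row-normalized estimator shown here is the simplest choice, and real MSM packages typically use maximum-likelihood estimators that also enforce detailed balance:

```python
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Count transitions at lag time `lag` and row-normalize the counts."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a, b] += 1
    # T[i, j] = P(state j at t + lag | state i at t)
    return counts / counts.sum(axis=1, keepdims=True)

dtraj = np.array([0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1])  # toy data
T = transition_matrix(dtraj, n_states=3, lag=1)
print(T)
```
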
22. Some Nice Properties

• Molecular systems should be ergodic, so the stationary distribution is the equilibrium distribution.
• Molecular systems satisfy detailed balance: $\pi_i\, p_{ij} = \pi_j\, p_{ji}$
• The flux of stuff from $i$ to $j$ matches the flux from $j$ to $i$.
• These conditions mean that the eigenvalues can be interpreted as modes.

F. Noé, S. Fischer, Current Opinion in Structural Biology, 2008.
W. Swope et al., JPC B, 2004.

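A quick numerical check of the detailed balance condition, assuming `T` and `pi` were computed as in the sketches above (the names are illustrative):

```python
import numpy as np

def satisfies_detailed_balance(T, pi, tol=1e-8):
    # flux[i, j] = pi_i * T_ij; detailed balance says this is symmetric.
    flux = pi[:, None] * T
    return np.allclose(flux, flux.T, atol=tol)
```
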
23. Eigenvectors and Modes

• Eigenvector $\psi_i$ with eigenvalue $\lambda_i$.
• $(1 - \lambda_i)$ is the probability of the transition described by $\psi_i$ occurring.
• $\lambda_1 = 1$: The equilibrium distribution.
• $\lambda_i \approx 1$: Slow modes – these are what we care about!
• $\lambda_i \approx 0$: Fast modes.
• Good review on this: F. Noé, S. Fischer, Current Opinion in Structural Biology, 2008.

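A sketch of how one might pull out the slow modes, assuming a row-stochastic `T`; this uses left eigenvectors (one common convention, probability-density modes), sorted by eigenvalue magnitude:

```python
import numpy as np

def slow_modes(T, k=3):
    # Left eigenvectors of T: solve psi T = lambda psi via eig(T.T).
    evals, evecs = np.linalg.eig(T.T)
    order = np.argsort(-np.abs(evals))      # descending |lambda|
    # order[0] is the stationary eigenvalue (lambda = 1); the next k
    # eigenpairs describe the slowest dynamical modes.
    return evals[order[1:k + 1]], evecs[:, order[1:k + 1]]
```
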
24. Lag Times

• Clustering MD straight into microstates => not Markovian.
• Have to discretize time further (observe the chain only at multiples of a lag time $\tau$) to ignore such effects.

[Figure: the six-state chain from the drunken-walk example.]

25. Implied Timescales

• Relaxation time of a mode: $t_i = -\dfrac{\tau}{\ln(\lambda_i)}$
• Chapman-Kolmogorov: $t_i$ should be constant if we substitute $k\tau$ for $\tau$.
• Implied timescale plots do this.
• They also help identify separation of timescales.

http://www.emma-project.org/v2.4/generated/pentapeptide_msm.html

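A sketch of the implied-timescale formula, assuming `T` was estimated at lag time `lag` as in the earlier sketch; in practice this is evaluated at several lag times and plotted:

```python
import numpy as np

def implied_timescales(T, lag):
    # Sort eigenvalue magnitudes in descending order; the first is the
    # stationary eigenvalue (lambda = 1), which has no finite timescale.
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])

# If the model is Markovian at this lag, implied_timescales(T, lag)
# stays roughly constant as lag increases (the plateau in the plots).
```
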
26. Conclusions

• Markov chains are a nice way of modelling stochastic systems.
• There are loads of different types and varieties. The irreducible variety described here has particularly useful properties.
• Molecular systems map onto Markov chains well.
• The process of building a Markov state model sounds straightforward, but the devil is in the detail. There are a lot of variables!
  • Discretization choice.
  • Coarse-graining.
  • Sampling problems.
• Rob's talk next week will be on this problem.