
# Fun with Markov Chains (DevSpace 2016)

Presented at DevSpace 2016 -- Have you seen those funny accounts online that are strange, yet almost coherent? Twitter accounts like captain_markov and horse_ebooks, or the venerable King James Programming on Tumblr?

They're generated by Markov chains! This talk will take you on a journey into one of the original Markov chain bots, dig into some Python source code, and explain how Markov chains work without using too many mathematical equations.

October 15, 2016

## Transcript

2. ### What is a Markov chain?

A stochastic process… a sequence of random variables… serial dependence only between adjacent periods… a system that follows a chain of linked events, where what happens next depends only on the current state of the system.

6. ### Markov models & text

"A Mathematical Theory of Communication", the 1948 paper by Claude Shannon.

8. ### Example from Wikipedia

A creature who eats only grapes, cheese, or lettuce, based on these rules:

- It eats once a day.
- If it ate cheese today, tomorrow it will eat lettuce or grapes with equal probability.
- If it ate grapes today, tomorrow it will eat grapes with probability 1/10, cheese with probability 4/10, or lettuce with probability 5/10.
- If it ate lettuce today, tomorrow it will eat grapes with probability 4/10 or cheese with probability 6/10. It will not eat lettuce again tomorrow.
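The rules above are just a transition table, and the whole chain fits in a few lines of Python. Here's a minimal sketch (not from the talk; all names are illustrative) that encodes the creature's diet and samples tomorrow's meal from today's:

```python
import random

# Transition table for the creature's diet: today's meal maps to
# {tomorrow's meal: probability}. Each row sums to 1.
TRANSITIONS = {
    'cheese':  {'grapes': 0.5, 'lettuce': 0.5},
    'grapes':  {'grapes': 0.1, 'cheese': 0.4, 'lettuce': 0.5},
    'lettuce': {'grapes': 0.4, 'cheese': 0.6},  # never lettuce after lettuce
}

def next_meal(today):
    """Sample tomorrow's meal using only today's meal: the Markov property."""
    meals = list(TRANSITIONS[today])
    weights = list(TRANSITIONS[today].values())
    return random.choices(meals, weights=weights)[0]

def simulate(start, days):
    """Walk the chain for `days` steps starting from `start`."""
    history = [start]
    for _ in range(days):
        history.append(next_meal(history[-1]))
    return history
```

Because lettuce never follows lettuce, a long simulated history will contain no two consecutive lettuce days, which is an easy spot-check that the rules are wired up correctly.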
13. ### Physics, thermodynamics, chemistry, speech recognition, information theory (pattern recognition & data compression), bioinformatics / DNA sequencing, economics / finance, games, music, etc.

15. ### "17:20 O LORD, there is none end of the regexp."

– King James Programming, http://kingjamesprogramming.tumblr.com/

20. ### How does it work?

1. Train it on a bunch of text.
2. Let it generate some text.
3. Commence hijinks.
21. ### "The woods are lovely dark and deep But I have not taken My sorrow I could bear no more."

– Edgar Allen Frost
22. ### A Patch of Old Snow

There's a patch of old snow in a corner
That I should have guessed
Was a blow-away paper the rain
Had brought to rest.

It is speckled with grime as if
Small print overspread it,
The news of a day I've forgotten--
If I ever read it.
27. ### { ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'], }

A word pair (the key), and the words that follow that pair in the training text (the values).
30. ### A Patch of Old Snow

There's a patch of old snow in a corner
That I should have guessed
Was a blow-away paper the rain
Had brought to rest.

It is speckled with grime as if
Small print overspread it,
The news of a day I've forgotten--
If I ever read it.
31. ### [ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ]

A list of all line endings.
32. ### Generate Text!

{ ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'], }

[ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ]

34. ### 1. Pick a random ending pair. 2. Use that as a key to pick a random word.

{ ('still', 'abide.'): ['But'] }

Output: But
key: ('abide.', 'But')

3. Keep the word; use it as part of your key.
36. ### 4. Keep doing this…

Output: But; lookup: {('abide.', 'But'): ['tis']}
Output: But tis; lookup: {('But', 'tis'): ['not']}
Output: But tis not; lookup: {('tis', 'not'): ['true']}
Output: But tis not true; lookup: {('not', 'true'): ['that']}
39. ### Output: But tis not true that thus; lookup: {('that', 'thus'): ['I']}

Output: But tis not true that thus I; lookup: {('thus', 'I'): ['dwelt']}
Output: But tis not true that thus I dwelt; lookup: {('I', 'dwelt'): ['aloof']}
Output: But tis not true that thus I dwelt aloof; lookup: {('dwelt', 'aloof'): ['For']}
40. ### 5. …until your key is a sentence ending. Key: ('read', 'it.')

But tis not true that thus I dwelt aloof For the rare and radiant maiden whom the angels in Heaven above Nor the demons down under the sea In her sepulchre there by the sea A wind blew out of a day I've forgotten If I ever read it.
41. ### train()

```python
def train(filename, verbose=False):
    # `read` and `_is_ending` are helper functions defined elsewhere.
    endings = set()  # Pairs of words that end a sentence.

    # Our data dict:
    # - Key: a pair of words (a tuple).
    # - Values: list of words that follow that pair.
    data = {}

    # The pair of words most recently encountered in the text.
    prev1 = ''
    prev2 = ''

    # Load up a dictionary of data from the input files.
    for word in read(filename, verbose):
        if prev1 != '' and prev2 != '':
            key = (prev2, prev1)
            if key in data:
                data[key].append(word)
            else:
                data[key] = [word]
            if _is_ending(key[-1]):
                endings.add(key)
        prev2 = prev1
        prev1 = word

    return {'content': data, 'endings': list(endings)}
```
51. ### generate_string()

```python
import random

def generate_string(data, verbose=True):
    """Generate a single string from the markov data."""
    output = ""
    key = ()
    while True:
        if key in data['content']:
            word = random.choice(data['content'][key])
            output = "{}{} ".format(output, word)
            key = (key[1], word)  # slide the pair forward one word
            if key in data['endings']:
                return output
        else:
            key = random.choice(data['endings'])
```
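Putting the pieces together, the sketch below reimplements the two slide functions over an in-memory word list (so no file helpers are needed) and runs them on a toy corpus. The names and the tiny corpus are illustrative, not from the talk:

```python
import random

def _is_ending(word):
    # Assumed heuristic: terminal punctuation ends a sentence.
    return word.endswith(('.', '!', '?'))

def train_words(words):
    """Second-order training, as on the slides, over an in-memory word list."""
    endings = set()
    data = {}
    prev1 = prev2 = ''
    for word in words:
        if prev1 and prev2:
            key = (prev2, prev1)
            data.setdefault(key, []).append(word)
            if _is_ending(key[-1]):
                endings.add(key)
        prev2, prev1 = prev1, word
    return {'content': data, 'endings': list(endings)}

def generate_string(data):
    """Walk from a random sentence ending until we land on another ending."""
    output = ""
    key = ()
    while True:
        if key in data['content']:
            word = random.choice(data['content'][key])
            output += word + " "
            key = (key[1], word)  # slide the pair forward one word
            if key in data['endings']:
                return output.strip()
        else:
            key = random.choice(data['endings'])

corpus = ("the cat sat on the mat. the cat ate the rat. "
          "the dog sat on the rug.").split()
model = train_words(corpus)
print(generate_string(model))
```

With such a small corpus most output quotes the source nearly verbatim; variety only emerges once several texts share word pairs.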

61. ### 2nd-order Markov chain

- Generate a word based on the TWO preceding words.
- Technique used by Mark V. Shaney.
- Generates pretty good (almost believable) output.
62. ### 3rd-order?

- Generate a word based on the THREE preceding words.
- Should be more realistic!
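Generalizing, the order is just the length of the key. A small sketch (not from the talk; names are illustrative) that builds an n-order lookup table from a word list:

```python
def train_n(words, order=3):
    """Map every run of `order` consecutive words to the words seen after it."""
    data = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        data.setdefault(key, []).append(words[i + order])
    return data
```

Higher orders copy the source more faithfully, so with a small corpus a 3rd-order chain mostly regurgitates its training text: the order trades novelty for coherence.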
63. ### http://markovian-wizzzdom.herokuapp.com/

A Markov chain trained on developer job ads, Fox News, bodybuilding & startup blogs.

65. ### Resources

- Khan Academy course on Information Theory
- Wikipedia / Markov Chain
- Markov Chains explained visually
- Coding Horror: Markov and You
- Wikipedia / Andrey Markov & Mark V. Shaney
- Scientific American: First Links in the Markov Chain