
Fun with Markov Chains (DevSpace 2016)

Presented at DevSpace 2016. Have you seen those funny accounts online that are strange, yet almost coherent? Twitter accounts like captain_markov and horse_ebooks, or the venerable King James Programming on Tumblr?

They're generated by Markov chains! This talk takes you on a journey into one of the original Markov chain bots, digs into some Python source code, and explains how Markov chains work without using too many mathematical equations.

Brad Montgomery

October 15, 2016

Transcript

  1. What is a Markov chain? A stochastic process… a sequence of random variables… serial dependence only between adjacent periods… systems that follow a chain of linked events, where what happens next depends only on the current state of the system.

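     Written out (my summary, not from the slides), that "depends only on the current state" idea is the Markov property:

         P(X_{n+1} = x | X_n, X_{n-1}, …, X_1) = P(X_{n+1} = x | X_n)
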
  2. Example from Wikipedia: A creature who eats only grapes, cheese, or lettuce, based on these rules:
     • It eats once a day.
     • If it ate cheese today, tomorrow it will eat lettuce or grapes with equal probability.
     • If it ate grapes today, tomorrow it will eat grapes with probability 1/10, cheese with probability 4/10, or lettuce with probability 5/10.
     • If it ate lettuce today, tomorrow it will eat grapes with probability 4/10 or cheese with probability 6/10; it will not eat lettuce again tomorrow.

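     The creature's diet is a three-state Markov chain, and it's easy to simulate. A minimal sketch (my own illustration, not from the deck), with the transition probabilities hard-coded from the rules above:

         import random

         # Transition table: today's meal -> (possible next meals, their probabilities).
         TRANSITIONS = {
             'cheese':  (['lettuce', 'grapes'], [0.5, 0.5]),
             'grapes':  (['grapes', 'cheese', 'lettuce'], [0.1, 0.4, 0.5]),
             'lettuce': (['grapes', 'cheese'], [0.4, 0.6]),
         }

         def simulate(days=7, start='cheese'):
             """Walk the chain: each day's meal depends only on yesterday's."""
             meal, history = start, [start]
             for _ in range(days - 1):
                 meals, weights = TRANSITIONS[meal]
                 meal = random.choices(meals, weights=weights)[0]
                 history.append(meal)
             return history

         print(simulate())  # e.g. ['cheese', 'grapes', 'lettuce', 'cheese', ...]
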
  3. Where Markov chains show up: physics, thermodynamics, chemistry, speech recognition, information theory (pattern recognition & data compression), bioinformatics / DNA sequencing, economics / finance, games, music, etc.

  4. How does it work?
     1. Train it on a bunch of text.
     2. Let it generate some text.
     3. Commence hijinks.

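     In code, those steps reduce to a couple of calls. A hypothetical driver (the filename is a placeholder) using the train() and generate_string() functions shown later in the deck:

         data = train('poems.txt')      # step 1: build the lookup tables
         print(generate_string(data))   # step 2: generate text; step 3 is up to you
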
  5. "The woods are lovely dark and deep But I have not taken My sorrow I could bear no more." – Edgar Allan Frost

  6. A Patch of Old Snow (Robert Frost)
     There's a patch of old snow in a corner
     That I should have guessed
     Was a blow-away paper the rain
     Had brought to rest.
     It is speckled with grime as if
     Small print overspread it,
     The news of a day I've forgotten--
     If I ever read it.

  7. { ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'] }
     The key is a word pair; the value is the list of words that have followed that pair in the training text.

  8. [ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ]
     A list of all line endings.

  9. Generate Text!
     { ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'] }
     [ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ]

  10. 1. Pick a random ending pair.
      2. Use that as a key to pick a random word: { ('still', 'abide.'): ['But'] }
         Output: But    new key: ('abide.', 'But')
      3. Keep the word; use it as part of your next key.

  11. 4. Keep doing this…
      Output: But               lookup: {('abide.', 'But'): ['tis']}
      Output: But tis           lookup: {('But', 'tis'): ['not']}
      Output: But tis not       lookup: {('tis', 'not'): ['true']}
      Output: But tis not true  lookup: {('not', 'true'): ['that']}

  12. Output: But tis not true that thus               lookup: {('that', 'thus'): ['I']}
      Output: But tis not true that thus I             lookup: {('thus', 'I'): ['dwelt']}
      Output: But tis not true that thus I dwelt       lookup: {('I', 'dwelt'): ['aloof']}
      Output: But tis not true that thus I dwelt aloof lookup: {('dwelt', 'aloof'): ['For']}

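      The walk is mechanical enough to fit in a few lines. A toy sketch (my own illustration; the deck's real code follows) using a hand-built lookup table:

          import random

          # Toy lookup table: (word1, word2) -> words that followed that pair.
          content = {
              ('abide.', 'But'): ['tis'],
              ('But', 'tis'): ['not'],
              ('tis', 'not'): ['true'],
          }

          key = ('abide.', 'But')   # start from a sentence-ending pair
          output = []
          while key in content:
              word = random.choice(content[key])  # pick a word that followed this pair
              output.append(word)
              key = (key[1], word)                # slide the window forward one word
          print(' '.join(output))                 # -> tis not true
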
  13. 5. …until your key is a sentence-ending pair, e.g. ('read', 'it.'):
      But tis not true that thus I dwelt aloof For the rare and radiant maiden whom the angels in Heaven above Nor the demons down under the sea In her sepulchre there by the sea A wind blew out of a day I've forgotten If I ever read it.

  14. def train(filename, verbose=False):
          endings = set()  # Pairs of words that end a sentence

          # Our Data Dict:
          # - Key: Pairs of words (a tuple)
          # - Values: List of words that follow those pairs.
          data = {}

          # and here's a pair of words that we encounter in a text.
          prev1 = ''
          prev2 = ''

          # Load up a dictionary of data from the input files.
          for word in read(filename, verbose):
              if prev1 != '' and prev2 != '':
                  key = (prev2, prev1)
                  if key in data:
                      data[key].append(word)
                  else:
                      data[key] = [word]
                  if _is_ending(key[-1]):
                      endings.add(key)
              prev2 = prev1
              prev1 = word

          return {'content': data, 'endings': list(endings)}

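      The deck doesn't show read() or _is_ending(). Hypothetical minimal versions (my assumption; the real helpers live in the shaney repo linked below) might look like:

          def read(filename, verbose=False):
              # Yield whitespace-separated words from a text file, one at a time.
              with open(filename) as f:
                  for line in f:
                      for word in line.split():
                          yield word

          def _is_ending(word):
              # A word "ends a sentence" if it carries terminal punctuation.
              return word.endswith(('.', '!', '?'))
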
  15. def generate_string(data, verbose=True):
          """Generate a single string from the markov data."""
          output = ""
          key = ()
          while True:
              if key in data['content']:
                  word = random.choice(data['content'][key])
                  output = "{}{} ".format(output, word)
                  key = (key[1], word)
                  if key in data['endings']:
                      return output
              else:
                  key = random.choice(data['endings'])
          return ""

  16. https://github.com/bradmontgomery/shaney
      Search for "Mark V. Shaney" for the original. My PEP 8-compliant (and more readable!) version: about 100 lines of Python with comments.

  17. Second-order Markov chain:
      • Generate a word based on the TWO preceding words (sketched below).
      • The technique used by Mark V. Shaney.
      • Generates pretty good (almost believable) output.

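      The same idea generalizes to any order by widening the key. A hypothetical sketch (not from the deck) of building order-n keys:

          def ngram_keys(words, n=2):
              # Yield (key, next_word) pairs, where key is a tuple of n consecutive words.
              for i in range(len(words) - n):
                  yield tuple(words[i:i + n]), words[i + n]

          words = "the woods are lovely dark and deep".split()
          for key, nxt in ngram_keys(words, n=2):
              print(key, '->', nxt)
          # ('the', 'woods') -> are
          # ('woods', 'are') -> lovely
          # ...
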
  18. Resources:
      • Khan Academy course on Information Theory
      • Wikipedia: Markov chain
      • Markov Chains Explained Visually
      • Coding Horror: Markov and You
      • Wikipedia: Andrey Markov & Mark V. Shaney
      • Scientific American: First Links in the Markov Chain