Fun with Markov Chains (DevSpace 2016)

Presented at DevSpace 2016 -- Have you seen those funny online accounts that are strange, yet almost coherent? Twitter accounts like captain_markov and Horse_ebooks, or the venerable King James Programming on Tumblr?

They're generated by Markov chains! This talk will take you on a journey into one of the original Markov chain bots, dig into some Python source code, and explain how Markov chains work without using too many mathematical equations.


Brad Montgomery

October 15, 2016

Transcript

  1. fun with Markov Chains! Brad Montgomery @bkmontgomery

  2. what is a markov chain? A stochastic process… a sequence of random variables… serial dependence only between adjacent periods… systems that follow a chain of linked events, where what happens next depends only on the current state of the system.

  3. what is a markov chain?

  4. Pavel Nekrasov 1853 - 1924

  5. Andrey (Andrei) Andreyevich Markov 1856 - 1922

  6. markov models & text: A Mathematical Theory of Communication, Claude Shannon's 1948 paper

  7. The Ultimate Machine

  8. Example from Wikipedia: A creature who eats only grapes, cheese, or lettuce, based on these rules:
    • It eats once a day.
    • If it ate cheese today, tomorrow it will eat lettuce or grapes with equal probability.
    • If it ate grapes today, tomorrow it will eat grapes with probability 1/10, cheese with probability 4/10, or lettuce with probability 5/10.
    • If it ate lettuce today, tomorrow it will eat grapes with probability 4/10 or cheese with probability 6/10. It will not eat lettuce again tomorrow.

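To make those rules concrete, here is a small Python sketch (mine, not from the talk; TRANSITIONS and next_meal are made-up names) that encodes them as a transition table and simulates a week of meals:

    import random

    # Transition table for the creature: today's meal maps to a list of
    # (tomorrow's meal, probability) pairs, straight from the rules above.
    TRANSITIONS = {
        'cheese':  [('lettuce', 0.5), ('grapes', 0.5)],
        'grapes':  [('grapes', 0.1), ('cheese', 0.4), ('lettuce', 0.5)],
        'lettuce': [('grapes', 0.4), ('cheese', 0.6)],  # never lettuce twice
    }

    def next_meal(today):
        """Pick tomorrow's meal; it depends only on today's meal."""
        meals, probs = zip(*TRANSITIONS[today])
        return random.choices(meals, weights=probs)[0]

    meal = 'cheese'
    for day in range(7):
        print('day', day, ':', meal)
        meal = next_meal(meal)
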
  13. physics, thermodynamics, chemistry, speech recognition, information theory (pattern recognition & data compression), bioinformatics / DNA sequencing, economics / finance, games, music, etc.

  14. funny things on the internet

  15. "17:20 O LORD, there is none end of the regexp." – King James Programming, http://kingjamesprogramming.tumblr.com/

  16. https://twitter.com/erowidrecruiter/status/562654960995672064

  17. https://twitter.com/Horse_ebooks/status/372490043320852481

  18. https://twitter.com/captain_markov/status/723035715185991680

  19. Mark V. Shaney: a Usenet bot from the 1980s, trolling net.singles

  20. How does it work? 1. Train it on a bunch of text. 2. Let it generate some text. 3. Commence hijinks.

  21. "The woods are lovely dark and deep But I have not taken My sorrow I could bear no more." – Edgar Allan Frost

  22. A Patch of Old Snow

    There's a patch of old snow in a corner
    That I should have guessed
    Was a blow-away paper the rain
    Had brought to rest.

    It is speckled with grime as if
    Small print overspread it,
    The news of a day I've forgotten--
    If I ever read it.

  27. { ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'] }
    The key is a word pair; the value is the list of words that follow that pair.

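The talk's full train() function appears later in the transcript, but the core of building that mapping fits in a few lines. A rough sketch, assuming the text is already split into words (build_pairs is my name, and the Raven line is just an illustration):

    def build_pairs(words):
        """Map each adjacent word pair to the list of words seen after it."""
        data = {}
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            data.setdefault((w1, w2), []).append(w3)
        return data

    words = "And the silken sad uncertain rustling of each purple curtain".split()
    pairs = build_pairs(words)
    # pairs[('And', 'the')] == ['silken']
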
  31. [ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ] A list of all line endings.

  32. Generate Text!
    { ('And', 'the'): ['sweet', 'pear', 'cloud', 'stars', 'silken', 'only', 'Raven', 'lamplight'] }
    [ ('my', 'loss.'), ('all', 'windstirred.'), ('to', 'rest.'), ('read', 'it.'), ('time', 'talk.'), ('friendly', 'visit.'), ('in', 'ice.'), ('favour', 'fire.'), ('would', 'suffice.'), … ]

  33. 1. Pick a random sentence-ending pair. ('still', 'abide.')

  34. 1. Pick a random ending pair.
    2. Use that as a key to pick a random word: { ('still', 'abide.'): ['But'] } Output: But
    3. Keep the word; use it as part of your new key: ('abide.', 'But')

  36. 4. Keep doing this…
    Output: But                  lookup: {('abide.', 'But'): ['tis']}
    Output: But tis              lookup: {('But', 'tis'): ['not']}
    Output: But tis not          lookup: {('tis', 'not'): ['true']}
    Output: But tis not true     lookup: {('not', 'true'): ['that']}

  39. Output: But tis not true that thus                lookup: {('that', 'thus'): ['I']}
    Output: But tis not true that thus I                lookup: {('thus', 'I'): ['dwelt']}
    Output: But tis not true that thus I dwelt          lookup: {('I', 'dwelt'): ['aloof']}
    Output: But tis not true that thus I dwelt aloof    lookup: {('dwelt', 'aloof'): ['For']}

  40. 5. …until your key is a sentence ending. Key: ('read', 'it.')
    But tis not true that thus I dwelt aloof For the rare and radiant maiden whom the angels in Heaven above Nor the demons down under the sea In her sepulchre there by the sea A wind blew out of a day I've forgotten If I ever read it.

  41. def train(filename, verbose=False):
        endings = set()  # Pairs of words that end a sentence

        # Our Data Dict:
        # - Key: Pairs of words (a tuple)
        # - Values: List of words that follow those pairs.
        data = {}

        # ...and here's a pair of words that we encounter in a text.
        prev1 = ''
        prev2 = ''

        # Load up a dictionary of data from the input files.
        for word in read(filename, verbose):
            if prev1 != '' and prev2 != '':
                key = (prev2, prev1)
                if key in data:
                    data[key].append(word)
                else:
                    data[key] = [word]
                if _is_ending(key[-1]):
                    endings.add(key)
            prev2 = prev1
            prev1 = word

        return {'content': data, 'endings': list(endings)}

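train() leans on two helpers that never appear on the slides: read() and _is_ending(). The real versions live in the repo linked at the end of the talk; here is a minimal sketch of plausible stand-ins (my assumption, not the repo's actual code):

    def read(filename, verbose=False):
        """Yield whitespace-separated words from the file, one at a time."""
        with open(filename) as f:
            for line in f:
                yield from line.split()

    def _is_ending(word):
        """Guess whether a word ends a sentence (or a line of verse)."""
        return word.endswith(('.', '!', '?'))
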
  51. def generate_string(data, verbose=True):
        """Generate a single string from the markov data."""
        output = ""
        key = ()
        while True:
            if key in data['content']:
                word = random.choice(data['content'][key])
                output = "{}{} ".format(output, word)
                key = (key[1], word)
                if key in data['endings']:
                    return output
            else:
                key = random.choice(data['endings'])
        return ""

  60. https://github.com/bradmontgomery/shaney
    Search "Mark V. Shaney" for the original. My pep8-compliant (and more readable!) version: ~100 lines of Python with comments.

  61. 2nd-order markov chain
    • Generate a word based on the TWO preceding words.
    • Technique used by Mark V. Shaney.
    • Generates pretty good (almost believable) output.

  62. 3rd-order?
    • Generate a word based on the THREE preceding words.
    • Should be more realistic!

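The jump from 2nd- to 3rd-order is just a longer key. A sketch of order-n training (my generalization, not the talk's code; train_n is a made-up name):

    def train_n(words, n=3):
        """Order-n chain: map each n-word tuple to the words that follow it."""
        data = {}
        for i in range(len(words) - n):
            key = tuple(words[i:i + n])
            data.setdefault(key, []).append(words[i + n])
        return data

The tradeoff: a higher order reads more naturally, but on a small corpus most n-word keys have only one successor, so the output tends to reproduce the source verbatim.
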
  63. http://markovian-wizzzdom.herokuapp.com/ A Markov chain trained on developer job ads, Fox News, bodybuilding & startup blogs

  64. See it in action?

  65. Resources
    • Khan Academy course on Information Theory
    • Wikipedia / Markov Chain
    • Markov Chains explained visually
    • Coding Horror: Markov and You
    • Wikipedia / Andrey Markov & Mark V. Shaney
    • Scientific American: First Links in the Markov Chain

  66. • https://en.wikipedia.org/wiki/Random_walk
    • http://aix1.uottawa.ca/~jkhoury/markov.htm
    • http://math.stackexchange.com/questions/27514/nice-references-on-markov-chains-processes/27516#27516
    • https://www.quora.com/What-are-the-most-interesting-applications-of-Markov-chains
    • https://www.quora.com/What-should-every-programmer-know-about-Markov-Chains

  67. None