
Can my computer make jokes


Anyone interested in understanding how Word2Vec and Recurrent Neural Networks work? What about doing it while teaching your computer to make jokes?

Diogo Pinto, a Data Scientist at Feedzai, shows you how to do this. Diogo calls himself a Machine Learning, Security and Scala enthusiast, a multipotentialite, a parkour practitioner and a full-time wisdom seeker.

The talk is aimed at anyone with an introductory level of knowledge of Deep Learning who wants to know more about how Recurrent Neural Nets work.

P.S.: Be warned, the computer's jokes will be bad, but think of it as your weird friend who fills in the silence with the driest jokes ever.

Python Porto

November 28, 2018

Transcript

  1. About me • Trained as Software Engineer • Grown into

    a Data Scientist generalist • Other interests • Blockchain • Distributed systems • (Cyber- and non-cyber-) Security • Psychology and Philosophy • Kung Fu and Parkour practitioner #nofilters @diogojapinto /in/diogojapinto/ [email protected]
  2. @diogojapinto /in/diogojapinto/ [email protected] About me • Trained as Software Engineer

    • Grown into a Data Scientist generalist • Other interests • Blockchain • Distributed systems • (Cyber- and non-cyber-) Security • Psychology and Philosophy • Kung Fu and Parkour practitioner #nofilters
  4. The plan for today What is the role of Humor?

    How can a computer use words? Does my computer have a better sense of humor than I do? What are the takeaways?
  5. “Know your audience Luke” • Who here is aware of

    recent Machine Learning achievements?
  6. “Know your audience Luke” • Who here is aware of

    recent Machine Learning achievements? • Who here has some intuition behind the inner workings of Neural Networks (aka Differentiable Programming)?
  7. “Know your audience Luke” • Who here is aware of

    recent Machine Learning achievements? • Who here has some intuition behind the inner workings of Neural Networks (aka Differentiable Programming)? • Who here has a dark sense of humor?
  8. What is the role of Humor? “A day without laughter

    is a day wasted” Charlie Chaplin
  9. History of Humor [1] • Humor is complex and a

    high cognitive function [1] The First Joke: Exploring the Evolutionary Origins of Humor
  10. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics [1] The First Joke: Exploring the Evolutionary Origins of Humor
  11. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… [1] The First Joke: Exploring the Evolutionary Origins of Humor
  12. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” [1] The First Joke: Exploring the Evolutionary Origins of Humor
  13. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal [1] The First Joke: Exploring the Evolutionary Origins of Humor
  14. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal • It is probably coded somehow in our genetic code • People laugh without appreciation for the causal factors [1] The First Joke: Exploring the Evolutionary Origins of Humor
  15. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal • It is probably coded somehow in our genetic code • People laugh without appreciation for the causal factors • Humor dates back thousands of years [1] The First Joke: Exploring the Evolutionary Origins of Humor
  16. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal • It is probably coded somehow in our genetic code • People laugh without appreciation for the causal factors • Humor dates back thousands of years • Greek “laughing philosopher” Democritus [1] The First Joke: Exploring the Evolutionary Origins of Humor
  17. History of Humor [1] • Humor is complex and a

    high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal • It is probably coded somehow in our genetic code • People laugh without appreciation for the causal factors • Humor dates back thousands of years • Greek “laughing philosopher” Democritus • Humor conversations observed in Australian aboriginals • Lived genetically isolated for at least 35000 years [1] The First Joke: Exploring the Evolutionary Origins of Humor
  18. Why Humor is relevant for Machine Learning[1] • Humor is

    complex and a high cognitive function • Nuanced verbal phrasing + Prevailing social dynamics • Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” • Ubiquitous and Universal • It is probably coded somehow in our genetic code • People laugh without appreciation for the causal factors • Humor dates back thousands of years • Greek “laughing philosopher” Democritus • Humor conversations observed in Australian aboriginals • Lived genetically isolated for at least 35000 years [1] The First Joke: Exploring the Evolutionary Origins of Humor
  19. Let’s look at the data • Dataset of short jokes

    from Kaggle user avmoudgil95 • E.g.: • It's crazy how my ex was so upset about losing me that he had to build a life with a new woman.
  21. Let’s look at the data • Dataset of short jokes

    from Kaggle user avmoudgil95 • E.g.: • It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. • Where does Noah keep his bees? In the Ark Hives
  23. Let’s look at the data • Dataset of short jokes

    from Kaggle user avmoudgil95 • E.g.: • It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. • Where does Noah keep his bees? In the Ark Hives • What sex position produces the ugliest children? Ask your mother.
  25. Let’s look at the data • Dataset of short jokes

    from Kaggle user avmoudgil95 • E.g.: • It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. • Where does Noah keep his bees? In the Ark Hives • What sex position produces the ugliest children? Ask your mother. • Chuck Norris doesn't have blood. He is filled with magma.
  27. Let’s look at the data • Dataset of short jokes

    from Kaggle user avmoudgil95 • E.g.: • It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. • Where does Noah keep his bees? In the Ark Hives • What sex position produces the ugliest children? Ask your mother. • Chuck Norris doesn't have blood. He is filled with magma. • Length:
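
The slide above ends with a length statistic, presumably a chart of how long the jokes are; the chart itself did not survive the transcript. A minimal sketch of that kind of summary with pandas, using a tiny stand-in list because the deck never shows how `jokes` is loaded from the Kaggle CSV:

    import pandas as pd

    # Stand-in for the `jokes` list used later in the deck (normally loaded from the
    # Kaggle short-jokes dataset).
    jokes = [
        "Where does Noah keep his bees? In the Ark Hives",
        "Chuck Norris doesn't have blood. He is filled with magma.",
    ]

    lengths = pd.Series(jokes).str.len()   # characters per joke
    print(lengths.describe())              # count, mean, min, max, quartiles
    # lengths.plot.hist(bins=50)           # roughly the distribution the slide shows
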
  28. How can a computer use words? “A picture may be

    worth a thousand words, but well-chosen words will take you where pictures never can” Unknown
  29. Why does representation matter?

    Features (Sender country risky, “fee” count, Words count) → Spam Detector (Machine Learning Model)
                Sender country risky   “fee” count   Words count   Spam?
    Email 1     True                   0             203           True
    Email 2     False                  2             345           False
    Email 3     True                   10            180           True
  30. Why does representation matter?

    Features (Sender country risky, “fee” count, Words count) → Spam Detector (Machine Learning Model)
                Sender country risky   “fee” count   Words count   “prince” count   Spam?
    Email 1     True                   0             203           5                True
    Email 2     False                  2             345           0                False
    Email 3     True                   10            180           3                True
  31. Why does representation matter?

    Features (Sender country risky, “fee” count, Words count) → Spam Detector (Machine Learning Model)
                Sender country risky   “fee” count   Words count   Spam?
    Email 1     True                   0             203           True
    Email 2     False                  2             345           False
    Email 3     True                   10            180           True
  32. Why does representation matter?

    Features (Sender country risky, “fee” count, Words count) → Spam Detector (Machine Learning Model)
                Sender country risky   “fee” count   Spam?
    Email 1     True                   0             True
    Email 2     False                  2             False
    Email 3     True                   10            True
  33. Why does representation matter?

    Features (Sender country risky, “fee” count, Words count) → Spam Detector (Machine Learning Model)
                Sender country risky   “fee” count   Words count   Spam?
    Email 1     True                   0             203           True
    Email 2     False                  2             345           False
    Email 3     True                   10            180           True
    Data entries should be comparable and consistent
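
A minimal sketch of the fixed-width feature table on these slides as code; pandas is my choice here, not something the deck shows, and the column names and values are the ones on the slide:

    import pandas as pd

    emails = pd.DataFrame(
        {
            "sender_country_risky": [True, False, True],   # Sender country risky
            "fee_count": [0, 2, 10],                        # “fee” count
            "words_count": [203, 345, 180],                 # Words count
            "spam": [True, False, True],                    # Spam?
        },
        index=["Email 1", "Email 2", "Email 3"],
    )

    X = emails.drop(columns="spam")   # the features the Spam Detector model sees
    y = emails["spam"]                # the label it learns to predict
    print(X)

Every row has the same columns, which is exactly the "comparable and consistent" requirement the slide states.
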
  34. The problem of Words Representation • An array of characters,

    each one a byte • Example: “I like beer a lot” “I own a lot of wine” • Representation: 49206c696b6520626565722061206c6f74 49206f776e2061206c6f74206f662077696e65
  35. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Example: “I like beer a lot” “I own a lot of wine” • Representation: 49206c696b6520626565722061206c6f74 49206f776e2061206c6f74206f662077696e65
  36. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 1 1 1 1 1 0 0 0 1 0 0 1 1 1 1 1
  37. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 1 1 1 1 1 0 0 0 1 0 0 1 1 1 1 1
  38. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
  39. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Words frequency in document • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
  40. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Words frequency in document • Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
  41. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Words frequency in document • Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
  42. The problem of Words Representation • An array of characters,

    each one a byte • Variable length • Difficult comparison between entries • Bag-of-Words • By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Words frequency in document • Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
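
A minimal sketch of both representations above on the slide's example sentences, using scikit-learn's CountVectorizer and TfidfVectorizer; the deck only shows the resulting tables, and its TF-IDF numbers are illustrative, so the computed weights will differ:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["I like beer a lot", "I own a lot of wine"]
    keep_short_tokens = r"(?u)\b\w+\b"   # keep one-letter tokens such as "I" and "a"

    # Bag-of-Words: one column per vocabulary word, order discarded.
    bow = CountVectorizer(token_pattern=keep_short_tokens)
    print(bow.fit_transform(docs).toarray())
    print(bow.vocabulary_)

    # TF-IDF: term frequency in the document, down-weighted by how common
    # the term is across documents.
    tfidf = TfidfVectorizer(token_pattern=keep_short_tokens)
    print(tfidf.fit_transform(docs).toarray().round(2))
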
  43. What about semantics? • Meaning of words is lost

    Doc     I   like  beer  a   lot  own  of  wine
    beer    0   0     1     0   0    0    0   0
    wine    0   0     0     0   0    0    0   1
    own     0   0     0     0   0    1    0   0
  44. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own)
    (diagram of wine, own and beer as points)
    Doc     I   like  beer  a   lot  own  of  wine
    beer    0   0     1     0   0    0    0   0
    wine    0   0     0     0   0    0    0   1
    own     0   0     0     0   0    1    0   0
  45. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help
  46. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help
    Doc     D1    D2
    beer    0.2   0.7
    wine    0.1   0.8
    own     0.8   0.1
  47. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help
    Doc     D1    D2
    beer    0.2   0.7
    wine    0.1   0.8
    own     0.8   0.1
    (scatter plot of beer, wine and own in the 2-D space)
  48. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity”
    Doc     D1    D2
    beer    0.2   0.7
    wine    0.1   0.8
    own     0.8   0.1
    (scatter plot of beer, wine and own in the 2-D space)
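
A minimal sketch of "semantics as proximity", computing distances over the 2-D vectors from the table above with NumPy; with one-hot vectors every pair of distinct words is equally far apart, which is the slide's point:

    import numpy as np

    vectors = {
        "beer": np.array([0.2, 0.7]),
        "wine": np.array([0.1, 0.8]),
        "own":  np.array([0.8, 0.1]),
    }

    def distance(a, b):
        # Euclidean distance between two word vectors
        return np.linalg.norm(vectors[a] - vectors[b])

    print(distance("wine", "beer"))   # ~0.14: wine and beer end up close together
    print(distance("wine", "own"))    # ~0.99: wine and own end up far apart
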
  49. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d
  50. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d I [0.8, 0.12, …] like … beer … a … lot …
  51. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word
  52. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot
  53. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: • (I, like)
  54. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: • (I, like) • (like, I) • (like, beer)
  55. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: • (I, like) • (like, I) • (like, beer) • (beer, like) • (beer, a)
  56. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: • (I, like) • (like, I) • (like, beer) • (beer, like) • (beer, a) • (a, beer) • (a, lot)
  57. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: • (I, like) • (like, I) • (like, beer) • (beer, like) • (beer, a) • (a, beer) • (a, lot) • (lot, a)
  58. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot …
  59. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot …
  60. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff
  61. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff Score(I) Score(like) Score(beer) … …
  62. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word • Change the word representation in the direction that helps predicting the context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff Score(I) Score(like) Score(beer) … …
  63. What about semantics? • Meaning of words is lost •

    Distance(wine, beer) = Distance(wine, own) • Distributed representations can help • Reduce the dimensionality footprint • Semantics encoded as “proximity” • Word2Vec • Start with “random” word representations with dimension d • From the representation of a given word predict a randomly sampled context word • Change the word representation in the direction that helps predicting the context word I [0.7, 0.15, …] like … beer … a … lot … Shady Mathy Stuff Error(I) Error(like) Error(beer) … …
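
A minimal sketch of how the (center word, context word) training pairs listed on the previous slides can be generated; this plain-Python helper is illustrative rather than the deck's code, but with a window of 1 it reproduces the slide's list exactly:

    def skipgram_pairs(tokens, window=1):
        pairs = []
        for i, center in enumerate(tokens):
            # every word within `window` positions of the center word is a context word
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((center, tokens[j]))
        return pairs

    print(skipgram_pairs("I like beer a lot".split()))
    # [('I', 'like'), ('like', 'I'), ('like', 'beer'), ('beer', 'like'),
    #  ('beer', 'a'), ('a', 'beer'), ('a', 'lot'), ('lot', 'a')]
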
  64. But I’m here to see Python code

    >>> jokes[20]
    "Why do you never see elephants hiding in trees? 'Cause they are freaking good at it"
  65. But I’m here to see Python code • Pre-Processing •

    Transform into word lists
    >>> import nltk
    >>> nltk.download('punkt')
    >>> tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    >>> jokes_sentences = tokenizer.tokenize_sents(jokes)
    >>> jokes_sentences[20]
    ['Why do you never see elephants hiding in trees?', "'Cause they are freaking good at it"]
  66. But I’m here to see Python code

    >>> import re
    >>> def sentence_to_wordlist(raw):
    ...     clean = re.sub(r'[^a-zA-Z]', ' ', raw)
    ...     words = clean.split()
    ...     words_lower = [w.lower() for w in words]
    ...     return words_lower
    >>> jokes_word_lists = [[sentence_to_wordlist(str(s)) for s in joke if len(s) > 0] for joke in jokes_sentences]
    >>> jokes_word_lists[20]
    [['why', 'do', 'you', 'never', 'see', 'elephants', 'hiding', 'in', 'trees'], ['cause', 'they', 'are', 'freaking', 'good', 'at', 'it']]
    • Pre-Processing • Transform into word lists
  67. But I’m here to see Python code

    >>> from nltk.corpus import stopwords
    >>> nltk.download('stopwords')
    >>> stop_ws = set(stopwords.words('english'))
    >>> jokes_non_stop_word_lists = [[[word for word in s if word not in stop_ws] for s in joke] for joke in jokes_word_lists]
    >>> jokes_non_stop_word_lists[20]
    [['never', 'see', 'elephants', 'hiding', 'trees'], ['cause', 'freaking', 'good']]
    • Pre-Processing • Transform into word lists • Remove Stop Words
  68. But I’m here to see Python code

    >>> from nltk.stem import PorterStemmer
    >>> porter = PorterStemmer()
    >>> jokes_stemmed = [[[porter.stem(w) for w in s] for s in joke] for joke in jokes_non_stop_word_lists]
    >>> jokes_stemmed[20]
    [['never', 'see', 'eleph', 'hide', 'tree'], ['caus', 'freak', 'good']]
    • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming
  69. But I’m here to see Python code

    >>> import multiprocessing
    >>> from gensim.models import word2vec as w2v
    >>> jokes_flatten = [s for j in jokes_stemmed for s in j]
    >>> num_features = 300              # dimensionality of the resulting word vectors
    >>> min_word_count = 3              # minimum word count threshold
    >>> num_workers = multiprocessing.cpu_count()  # number of threads to run in parallel
    >>> context_size = 7                # context window length
    >>> downsampling = 1e-3             # downsample setting for frequent words
    >>> seed = 42                       # seed for the rng, to make the results reproducible
    >>> sg = 1                          # skip-gram (instead of CBOW)
    >>> jokes2vec = w2v.Word2Vec(
    ...     sg=sg,
    ...     seed=seed,
    ...     workers=num_workers,
    ...     size=num_features,
    ...     min_count=min_word_count,
    ...     window=context_size,
    ...     sample=downsampling
    ... )
    >>> jokes2vec.build_vocab(jokes_flatten)
    >>> jokes2vec.train(jokes_flatten, total_examples=jokes2vec.corpus_count, epochs=jokes2vec.iter)
    • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings
  70. But I’m here to see Python code

    >>> import pandas as pd
    >>> import seaborn as sns
    >>> sns.set(style="darkgrid")
    >>> from sklearn.manifold import TSNE
    >>> tsne = TSNE(n_components=2, random_state=seed)
    >>> all_word_vectors_matrix = jokes2vec.wv.syn0
    >>> all_word_vectors_matrix_2d = tsne.fit_transform(all_word_vectors_matrix)
    >>> points = pd.DataFrame(
    ...     [
    ...         (word, coords[0], coords[1])
    ...         for word, coords in [
    ...             (word, all_word_vectors_matrix_2d[jokes2vec.wv.vocab[word].index])
    ...             for word in jokes2vec.wv.vocab
    ...         ]
    ...     ],
    ...     columns=['word', 'x', 'y']
    ... )
    >>> points.plot.scatter('x', 'y', s=10, figsize=(20, 12), alpha=0.6)
    • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize
  71. But I’m here to see Python code • Pre-Processing •

    Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize
  78. But I’m here to see Python code • Pre-Processing •

    Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize • Embeddings “algebra”
  79. But I’m here to see Python code >>> jokes2vec.most_similar('facebook') [('fb',

    0.7791515588760376), ('unfriend', 0.7512669563293457), ('status', 0.7433165907859802), ('myspac', 0.7160271406173706), ('notif', 0.6782281398773193), ('retweet', 0.6745551824569702), ('timelin', 0.672653079032898), ('twitter', 0.6709973812103271), ('privaci', 0.6695473194122314), ('linkedin', 0.6655823588371277)] • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” • Semantic similarity
  80. But I’m here to see Python code >>> jokes2vec.most_similar('lol') [('lmao',

    0.7619101405143738), ('tho', 0.7015952467918396), ('haha', 0.6999001502990723), ('hahaha', 0.6714984178543091), ('omg', 0.6711198091506958), ('pl', 0.6587743163108826), ('bc', 0.6558701992034912), ('gona', 0.6529208421707153), ('ppl', 0.6476595401763916), ('yea', 0.6466178894042969)] • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” • Semantic similarity
  81. But I’m here to see Python code >>> jokes2vec.most_similar('sex') [('anal',

    0.5785183906555176), ('unprotect', 0.5359092950820923), ('foreplay', 0.5343884825706482), ('brussel', 0.5324864387512207), ('foursom', 0.5289731025695801), ('twosom', 0.5187283158302307), ('threesom', 0.5119856595993042), ('geneticist', 0.5064876079559326), ('intercours', 0.5030955076217651), ('oral', 0.5015446543693542)] • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” • Semantic similarity
  82. But I’m here to see Python code

    >>> def nearest_similarity_cosmul(start1, end1, end2):
    ...     similarities = jokes2vec.most_similar_cosmul(
    ...         positive=[end2, start1],
    ...         negative=[end1]
    ...     )
    ...     start2 = similarities[0][0]
    ...     print('{start1} is related to {end1}, as {start2} is related to {end2}'.format(**locals()))
    >>> nearest_similarity_cosmul('dude', 'man', 'woman')
    "dude is related to man, as chick is related to woman"
    • Pre-Processing • Transform into word lists • Remove Stop Words • Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” • Semantic similarity • Linear relationships
  83. Does my computer have a better sense of humor than I

    do? “Comparing yourself to others is an act of violence against the self” Iyanla Vanzant
  84. Traditional Neural Networks ML Model Input Output • Input and

    output lengths are mostly pre-determined and static
  85. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences ML Model Input Output
  86. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks
  87. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions
  88. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input Output
  89. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input Output
  90. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions
  91. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1
  92. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1
  93. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2
  94. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2 ML Model Input 3 Output 3
  95. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Ø I ML Model I like ML Model like beer
  96. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model ML Model ML Model • RNNs are very versatile
  97. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2 ML Model Input 3 Output 3 • RNNs are very versatile • Many to many
  98. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 ML Model Input 2 ML Model Input 3 Output 1 • RNNs are very versatile • Many to many • Many to one
  99. Traditional Neural Networks • Input and output lengths are mostly

    pre-determined and static • Weak for modelling sequences • Recurrent Neural Networks • Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Output 2 ML Model Output 3 • RNNs are very versatile • Many to many • Many to one • One to many
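
A minimal sketch of the "memory between executions" idea: a plain recurrent cell in NumPy that carries a hidden state from one input to the next. This is a conceptual illustration, not the Keras LSTM the deck trains later:

    import numpy as np

    rng = np.random.default_rng(42)
    input_dim, hidden_dim = 8, 16

    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))    # input -> hidden
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))   # hidden -> hidden (the memory)
    b = np.zeros(hidden_dim)

    def rnn_step(x, h_prev):
        # one execution of the model: the new state depends on the input AND the previous state
        return np.tanh(W_xh @ x + W_hh @ h_prev + b)

    h = np.zeros(hidden_dim)                    # empty memory before the sequence starts
    for x in rng.normal(size=(5, input_dim)):   # a toy sequence of 5 inputs
        h = rnn_step(x, h)                      # h is carried over between steps
    print(h.shape)                              # (16,)
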
  100. AI-Powered Jokes ML Model b ML Model e • How?

    Generate character by character
  101. AI-Powered Jokes ML Model b ML Model e e •

    How? Generate character by character
  102. AI-Powered Jokes ML Model b ML Model e ML Model

    e • How? Generate character by character
  103. AI-Powered Jokes • How? Generate character by character • A

    seed provides context for the network ML Model b ML Model e ML Model e r
  104. Stop poking my brain and show me code!!!

    • Pre-Processing • Character ↔ Integer Index
    >>> jokes_concat = ''.join(jokes)
    >>> chars = sorted(list(set(jokes_concat)))
    >>> idx2char = chars
    >>> char2idx = {c: i for i, c in enumerate(idx2char)}
  105. Stop poking my brain and show me code!!!

    • Pre-Processing • Character ↔ Integer Index • Training set preparation
    >>> import math
    >>> import random
    >>> jokes_sample = random.sample(jokes, math.floor(len(jokes) * 0.5))
    >>> maxlen = 40
    >>> step = 3
    >>> sentences = []
    >>> next_chars = []
    >>> for joke in jokes_sample:
    ...     if len(joke) < (maxlen + 1):
    ...         continue
    ...     for i in range(0, len(joke) - maxlen, step):
    ...         sentences.append(joke[i:i+maxlen])
    ...         next_chars.append(joke[i+maxlen])
  106. Stop poking my brain and show me code!!!

    • Pre-Processing • Character ↔ Integer Index • Training set preparation • Convert into integers
    >>> import numpy as np
    >>> x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
    >>> y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
    >>> for i, sentence in enumerate(sentences):
    ...     for j, char in enumerate(sentence):
    ...         x[i, j, char2idx[char]] = 1.
    ...     y[i, char2idx[next_chars[i]]] = 1.
    >>> idx = np.random.permutation(len(x))
    >>> x = x[idx]
    >>> y = y[idx]
  107. Stop poking my brain and show me code!!!

    • Pre-Processing • Character ↔ Integer Index • Training set preparation • Convert into integers • Define the model
    >>> from keras.models import Sequential
    >>> from keras.layers import Dense, Dropout, LSTM
    >>> model = Sequential()
    >>> model.add(LSTM(256,
    ...     input_shape=(maxlen, len(chars)),
    ...     return_sequences=True))
    >>> model.add(Dropout(0.2))
    >>> model.add(LSTM(256))
    >>> model.add(Dropout(0.2))
    >>> model.add(Dense(len(chars), activation='softmax'))
    >>> model.compile(loss='categorical_crossentropy',
    ...     optimizer='adam',
    ...     metrics=['categorical_crossentropy'])
  108. Stop poking my brain and show me code!!!

    • Pre-Processing • Character ↔ Integer Index • Training set preparation • Convert into integers • Define the model • Train the model
    >>> model.fit(x, y, batch_size=128, epochs=8)
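
The following slides show generated jokes, but the sampling loop itself is not in the transcript. A hedged sketch of one way to generate text from the trained model, reusing the `model`, `chars`, `char2idx`, `idx2char` and `maxlen` defined on the slides above; the temperature sampling is my assumption, not necessarily what the author used:

    import numpy as np

    def generate(seed, length=200, temperature=0.8):
        text = seed
        for _ in range(length):
            window = text[-maxlen:].rjust(maxlen)                # last maxlen characters as context
            x = np.zeros((1, maxlen, len(chars)))
            for j, char in enumerate(window):
                if char in char2idx:
                    x[0, j, char2idx[char]] = 1.0                # same one-hot encoding as training
            probs = model.predict(x, verbose=0)[0]
            probs = np.exp(np.log(probs + 1e-8) / temperature)   # sharpen/flatten the distribution
            probs = probs / probs.sum()
            next_idx = np.random.choice(len(chars), p=probs)     # sample the next character
            text += idx2char[next_idx]
        return text

    print(generate("What do you call a "))
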
  110. Are the jokes any good? • Using the start of

    an existing joke as seed Example: How many dead hookers does it take to sc
  111. Are the jokes any good? • Using the start of

    an existing joke as seed Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.
  112. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.
  113. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.
  114. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example
  115. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example Example: Am I the only one who closes the silverw
  116. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example Example: Am I the only one who closes the silverwine costume have sex with Dram the ground? Prostitute too scoring out wrenking out running into a teeth in other people? Are you surprised whereed? Chocolate Good!!!!!!!!!!!!? Don't you buy me? nit-te
  117. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example • Custom seed
  118. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example • Custom start • A letter Example: D
  119. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example • Custom start • A letter Example: Do you have a beer from starbucks? An asshole in the back of the back, but I leave the same time. I think I was all day. The barman says "I don't know what the fucking card move in the back of his life
  120. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example • Custom start • A letter • “What do you call” jokes
  121. Are the jokes any good? • Using the start of

    an existing joke as seed • Sentence structure is surprisingly good • Coherence with the answer is a bit lacking • A weirder example • Custom start • A letter • “What do you call” jokes Examples: • What do you call a nut on house? A coint in his circus. • What do you call a bill on the oven? A condom. • What do you call a blowjob? A social storm. • What do you call a shit of country in a car? A lifetime experience. • What do you call a dog device? A garbanzo bean with a curry. • What do you call a disappointment on the bathroom? A pilot, you can't take a shit. • What do you call a dialogast? A farmer. • What do you call a dog first? A sandwich. • What do you call a dick on his shoes? A woman in the stairs.
  122. Today I learned

    Topics covered: • Distributed representations are great for categorical fields • Word2Vec and LSTM rule • Python rules
    Current trends: • Divide and Conquer • Attention • Generative Adversarial Networks • Sequence-to-Sequence • RNN-based Variational Auto-Encoder
    Potential paths of exploration: • Try using attention and longer spans of memory • Active learning classifier for joke quality • Do the same for motivational quotes
  123. And what now?

    • I’m a Software Engineer: Fast.ai Machine Learning for Coders; seek problems in your surroundings and do a POC; Python rules
    • I am a Data Scientist: DeepLearning.ai Deep Learning Specialization; develop Probability and Statistics (Udacity Intro to Descriptive Statistics, Udacity Intro to Inferential Statistics)
    • I went through the wrong door and noticed free beer: Enjoy it :D
  124. Bibliography

    • https://en.wikipedia.org/wiki/Theories_of_humor
    • https://en.wikipedia.org/wiki/Humor_research
    • https://en.wikipedia.org/wiki/Computational_humor
    • https://www.iflscience.com/technology/ais-attempts-at-oneliner-jokes-are-unintentionally-hilarious/
    • https://motherboard.vice.com/en_us/article/z43nke/joke-telling-robots-are-the-final-frontier-of-artificial-intelligence
    • https://medium.com/@davidolarinoye/will-ai-ever-be-able-to-make-a-joke-808a656b53a6
    • https://journals.sagepub.com/doi/pdf/10.1177/147470490600400129
    • https://towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b
    • https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
    • https://www.kaggle.com/abhinavmoudgil95/short-jokes
    • https://arxiv.org/pdf/1708.02709.pdf
    • http://blog.aylien.com/overview-word-embeddings-history-word2vec-cbow-glove/