Slide 1

Slide 1 text

Can my computer make jokes? by Diogo Pinto 2018-11-28

Slide 2

Slide 2 text

Can my computer make jokes? by Diogo Pinto 2018-11-28

Slide 3

Slide 3 text

About me • Trained as Software Engineer • Grown into a Data Scientist generalist • Other interests • Blockchain • Distributed systems • (Cyber- and non-cyber-) Security • Psychology and Philosophy • Kung Fu and Parkour practitioner #nofilters @diogojapinto /in/diogojapinto/ [email protected]

Slide 4

Slide 4 text

@diogojapinto /in/diogojapinto/ [email protected] About me • Trained as Software Engineer • Grown into a Data Scientist generalist • Other interests • Blockchain • Distributed systems • (Cyber- and non-cyber-) Security • Psychology and Philosophy • Kung Fu and Parkour practitioner #nofilters

Slide 5

Slide 5 text

@diogojapinto /in/diogojapinto/ [email protected] About me • Trained as Software Engineer • Grown into a Data Scientist generalist • Other interests • Blockchain • Distributed systems • (Cyber- and non-cyber-) Security • Psychology and Philosophy • Kung Fu and Parkour practitioner #nofilters

Slide 6

Slide 6 text

The plan for today What is the role of Humor? How can a computer use words? Does my computer have a better sense of humor than I do? What are the takeaways?

Slide 7

Slide 7 text

“Know your audience, Luke”

Slide 8

Slide 8 text

“Know your audience, Luke” • Who here is aware of recent Machine Learning achievements?

Slide 9

Slide 9 text

“Know your audience, Luke” • Who here is aware of recent Machine Learning achievements? • Who here has some intuition about the inner workings of Neural Networks (aka Differentiable Programming)?

Slide 10

Slide 10 text

“Know your audience, Luke” • Who here is aware of recent Machine Learning achievements? • Who here has some intuition about the inner workings of Neural Networks (aka Differentiable Programming)? • Who here has a dark sense of humor?

Slide 11

Slide 11 text

What is the role of Humor? “A day without laughter is a day wasted” Charlie Chaplin

Slide 12

Slide 12 text

History of Humor [1] [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 13

Slide 13 text

History of Humor [1] • Humor is complex and a high cognitive function [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 14

Slide 14 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 15

Slide 15 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 16

Slide 16 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 17

Slide 17 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 18

Slide 18 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal ● It is probably coded somehow in our genetic code ● People laugh without appreciation for the causal factors [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 19

Slide 19 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal ● It is probably coded somehow in our genetic code ● People laugh without appreciation for the causal factors • Humor dates back thousands of years [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 20

Slide 20 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal ● It is probably coded somehow in our genetic code ● People laugh without appreciation for the causal factors • Humor dates back thousands of years ● Greek “laughing philosopher” Democritus [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 21

Slide 21 text

History of Humor [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal ● It is probably coded somehow in our genetic code ● People laugh without appreciation for the causal factors • Humor dates back thousands of years ● Greek “laughing philosopher” Democritus ● Humorous conversations observed among Australian Aboriginals ● Lived genetically isolated for at least 35,000 years [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 22

Slide 22 text

Why Humor is relevant for Machine Learning [1] • Humor is complex and a high cognitive function ● Nuanced verbal phrasing + Prevailing social dynamics ● Can combine language skills, theory-of-mind, symbolism, abstract thinking, social perception… • The basic ability seems “instinctive” ● Ubiquitous and Universal ● It is probably coded somehow in our genetic code ● People laugh without appreciation for the causal factors • Humor dates back thousands of years ● Greek “laughing philosopher” Democritus ● Humorous conversations observed among Australian Aboriginals ● Lived genetically isolated for at least 35,000 years [1] The First Joke: Exploring the Evolutionary Origins of Humor

Slide 23

Slide 23 text

Let’s look at the data

Slide 24

Slide 24 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95

Slide 25

Slide 25 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman.

Slide 26

Slide 26 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman.

Slide 27

Slide 27 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives

Slide 28

Slide 28 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives

Slide 29

Slide 29 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives ● What sex position produces the ugliest children? Ask your mother.

Slide 30

Slide 30 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives ● What sex position produces the ugliest children? Ask your mother.

Slide 31

Slide 31 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives ● What sex position produces the ugliest children? Ask your mother. ● Chuck Norris doesn't have blood. He is filled with magma.

Slide 32

Slide 32 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives ● What sex position produces the ugliest children? Ask your mother. ● Chuck Norris doesn't have blood. He is filled with magma.

Slide 33

Slide 33 text

Let’s look at the data • Dataset of short jokes from Kaggle user avmoudgil95 • E.g.: ● It's crazy how my ex was so upset about losing me that he had to build a life with a new woman. ● Where does Noah keep his bees? In the Ark Hives ● What sex position produces the ugliest children? Ask your mother. ● Chuck Norris doesn't have blood. He is filled with magma. • Length: (distribution of joke lengths shown as a chart on the slide)
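As a rough sketch of how the dataset might be loaded (the file name shortjokes.csv and the Joke column are assumptions based on the Kaggle download, not shown in the slides):

>>> import pandas as pd
>>> df = pd.read_csv('shortjokes.csv')   # assumed file name from the Kaggle dataset
>>> jokes = df['Joke'].tolist()          # assumed column holding the joke text
>>> jokes[20]                            # the list reused in the code slides later on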

Slide 34

Slide 34 text

How can a computer use words? “A picture may be worth a thousand words, but well-chosen words will take you where pictures never can” Unknown

Slide 35

Slide 35 text

Why does representation matter?

Slide 36

Slide 36 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count

Slide 37

Slide 37 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count Sender country risky “fee” count Words count Spam? Email 1 True 0 203 True Email 2 False 2 345 False Email 3 True 10 180 True

Slide 38

Slide 38 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count Sender country risky “fee” count Words count “prince” count Spam? Email 1 True 0 203 5 True Email 2 False 2 345 0 False Email 3 True 10 180 3 True

Slide 39

Slide 39 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count Sender country risky “fee” count Words count Spam? Email 1 True 0 203 True Email 2 False 2 345 False Email 3 True 10 180 True

Slide 40

Slide 40 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count Sender country risky “fee” count Spam? Email 1 True 0 True Email 2 False 2 False Email 3 True 10 True

Slide 41

Slide 41 text

Why does representation matter? Sender country risky “fee” count Spam Detector Machine Learning Model Words count Sender country risky “fee” count Words count Spam? Email 1 True 0 203 True Email 2 False 2 345 False Email 3 True 10 180 True Data entries should be comparable and consistent
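For illustration only, the feature table above could look like this in code (the column names are mine, and the classifier choice is arbitrary):

>>> import pandas as pd
>>> from sklearn.linear_model import LogisticRegression
>>> emails = pd.DataFrame({
...     'sender_country_risky': [True, False, True],
...     'fee_count':            [0, 2, 10],
...     'words_count':          [203, 345, 180],
...     'spam':                 [True, False, True]})
>>> spam_detector = LogisticRegression()
>>> spam_detector.fit(emails.drop(columns='spam'), emails['spam'])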

Slide 42

Slide 42 text

The problem of Words Representation

Slide 43

Slide 43 text

The problem of Words Representation • An array of characters, each one a byte • Example: “I like beer a lot” “I own a lot of wine” • Representation: 49206c696b6520626565722061206c6f74 49206f776e2061206c6f74206f662077696e65

Slide 44

Slide 44 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Example: “I like beer a lot” “I own a lot of wine” • Representation: 49206c696b6520626565722061206c6f74 49206f776e2061206c6f74206f662077696e65
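A quick way to reproduce the byte-level representation shown above (assuming plain ASCII text):

>>> "I like beer a lot".encode('ascii').hex()
'49206c696b6520626565722061206c6f74'
>>> "I own a lot of wine".encode('ascii').hex()
'49206f776e2061206c6f74206f662077696e65'
>>> len("I like beer a lot"), len("I own a lot of wine")   # variable length
(17, 19)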

Slide 45

Slide 45 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 1 1 1 1 1 0 0 0 1 0 0 1 1 1 1 1

Slide 46

Slide 46 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 1 1 1 1 1 0 0 0 1 0 0 1 1 1 1 1
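A minimal sketch of the Bag-of-Words representation with scikit-learn (a library not used in the talk's own code; the token pattern is overridden so one-letter words like "I" and "a" are kept):

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ["I like beer a lot", "I own a lot of wine"]
>>> vec = CountVectorizer(token_pattern=r'\b\w+\b')
>>> bow = vec.fit_transform(docs)
>>> vec.get_feature_names()            # get_feature_names_out() in newer scikit-learn
['a', 'beer', 'i', 'like', 'lot', 'of', 'own', 'wine']
>>> bow.toarray()
array([[1, 1, 1, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 1, 1, 1, 1]])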

Slide 47

Slide 47 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7

Slide 48

Slide 48 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency ● Words frequency in document • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7

Slide 49

Slide 49 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency ● Words frequency in document ● Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7

Slide 50

Slide 50 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency ● Words frequency in document ● Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7

Slide 51

Slide 51 text

The problem of Words Representation • An array of characters, each one a byte ● Variable length ● Difficult comparison between entries • Bag-of-Words ● By discarding order we are able to generalize • Term Frequency – Inverse Document Frequency ● Words frequency in document ● Rarity of words across documents • Example: “I like beer a lot” “I own a lot of wine” • Representation: I like beer a lot own of wine 0.1 0.3 0.6 0.1 0.4 0 0 0 0.1 0 0 0.1 0.4 0.5 0.1 0.7
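The same two sentences through TF-IDF, again as a sketch (the numbers on the slide are illustrative; scikit-learn's actual weights differ because it smooths the IDF and L2-normalizes each document):

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = ["I like beer a lot", "I own a lot of wine"]
>>> tfidf = TfidfVectorizer(token_pattern=r'\b\w+\b')
>>> weights = tfidf.fit_transform(docs)
>>> # words shared by both documents ("i", "a", "lot") get a lower idf than
>>> # words unique to one document ("beer", "like", "own", "of", "wine")
>>> sorted(zip(tfidf.get_feature_names(), tfidf.idf_.round(2)))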

Slide 52

Slide 52 text

What about semantics?

Slide 53

Slide 53 text

What about semantics? Doc I like beer a lot own of wine beer wine own

Slide 54

Slide 54 text

What about semantics? • Meaning of words is lost Doc I like beer a lot own of wine beer 0 0 1 0 0 0 0 0 wine 0 0 0 0 0 0 0 1 own 0 0 0 0 0 1 0 0

Slide 55

Slide 55 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) wine own beer Doc I like beer a lot own of wine beer 0 0 1 0 0 0 0 0 wine 0 0 0 0 0 0 0 1 own 0 0 0 0 0 1 0 0

Slide 56

Slide 56 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help

Slide 57

Slide 57 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help Doc D1 D2 beer 0.2 0.7 wine 0.1 0.8 own 0.8 0.1

Slide 58

Slide 58 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help Doc D1 D2 beer 0.2 0.7 wine 0.1 0.8 own 0.8 0.1 0 0,3 0,5 0,8 1 0 0,2 0,4 0,6 0,8 wine own beer

Slide 59

Slide 59 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” Doc D1 D2 beer 0.2 0.7 wine 0.1 0.8 own 0.8 0.1 0 0,3 0,5 0,8 1 0 0,2 0,4 0,6 0,8 wine own beer
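A toy check of the "semantics as proximity" idea, using the 2-D vectors from the table above (the values are the slide's illustrative numbers, not learned embeddings):

>>> import numpy as np
>>> emb = {'beer': np.array([0.2, 0.7]),
...        'wine': np.array([0.1, 0.8]),
...        'own':  np.array([0.8, 0.1])}
>>> def dist(a, b):
...     return np.linalg.norm(emb[a] - emb[b])   # Euclidean distance in embedding space
>>> round(dist('wine', 'beer'), 3)               # close together: related meanings
0.141
>>> round(dist('wine', 'own'), 3)                # far apart: unrelated words
0.99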

Slide 60

Slide 60 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d

Slide 61

Slide 61 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d I [0.8, 0.12, …] like … beer … a … lot …

Slide 62

Slide 62 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word

Slide 63

Slide 63 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot

Slide 64

Slide 64 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: ● (I, like)

Slide 65

Slide 65 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: ● (I, like) ● (like, I) ● (like, beer)

Slide 66

Slide 66 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: ● (I, like) ● (like, I) ● (like, beer) ● (beer, like) ● (beer, a)

Slide 67

Slide 67 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: ● (I, like) ● (like, I) ● (like, beer) ● (beer, like) ● (beer, a) ● (a, beer) ● (a, lot)

Slide 68

Slide 68 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I like beer a lot • Training examples: ● (I, like) ● (like, I) ● (like, beer) ● (beer, like) ● (beer, a) ● (a, beer) ● (a, lot) ● (lot, a)
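The training pairs listed above can be generated with a few lines of Python (a sketch with a fixed context window of 1; real Word2Vec samples the window size per word):

>>> def skipgram_pairs(tokens, window=1):
...     pairs = []
...     for i, center in enumerate(tokens):
...         for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
...             if j != i:
...                 pairs.append((center, tokens[j]))
...     return pairs
>>> skipgram_pairs("I like beer a lot".split(), window=1)
[('I', 'like'), ('like', 'I'), ('like', 'beer'), ('beer', 'like'), ('beer', 'a'), ('a', 'beer'), ('a', 'lot'), ('lot', 'a')]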

Slide 69

Slide 69 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot …

Slide 70

Slide 70 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot …

Slide 71

Slide 71 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff

Slide 72

Slide 72 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff Score(I) Score(like) Score(beer) … …

Slide 73

Slide 73 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word ● Change the word representation in the direction that helps predicting the context word I [0.8, 0.12, …] like … beer … a … lot … Shady Mathy Stuff Score(I) Score(like) Score(beer) … …

Slide 74

Slide 74 text

What about semantics? • Meaning of words is lost ● Distance(wine, beer) = Distance(wine, own) • Distributed representations can help ● Reduce the dimensionality footprint ● Semantics encoded as “proximity” • Word2Vec ● Start with “random” word representations with dimension d ● From the representation of a given word predict a randomly sampled context word ● Change the word representation in the direction that helps predicting the context word I [0.7, 0.15, …] like … beer … a … lot … Shady Mathy Stuff Error(I) Error(like) Error(beer) … …
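A minimal numpy sketch of what the "Shady Mathy Stuff" box does for one training pair, assuming a full softmax over the vocabulary (real Word2Vec uses negative sampling or a hierarchical softmax for efficiency; the variable names are mine):

>>> import numpy as np
>>> np.random.seed(42)
>>> vocab = ['I', 'like', 'beer', 'a', 'lot']
>>> d = 2                                       # embedding dimension
>>> W_in = np.random.normal(size=(len(vocab), d))    # "random" word representations
>>> W_out = np.random.normal(size=(len(vocab), d))   # output-side (context) representations
>>> def step(center, context, lr=0.1):
...     c, o = vocab.index(center), vocab.index(context)
...     scores = W_out @ W_in[c]                         # Score(word) for every vocabulary word
...     probs = np.exp(scores) / np.exp(scores).sum()    # softmax over the vocabulary
...     error = probs.copy(); error[o] -= 1.0            # Error(word): prediction minus target
...     grad_in = W_out.T @ error                        # direction for the center word's vector
...     grad_out = np.outer(error, W_in[c])              # direction for the output vectors
...     W_in[c] -= lr * grad_in                          # nudge the representation so that
...     W_out[:] -= lr * grad_out                        # the context word becomes more likely
>>> step('beer', 'like')                                 # one update for the pair (beer, like)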

Slide 75

Slide 75 text

But I’m here to see Python code

Slide 76

Slide 76 text

But I’m here to see Python code

>>> jokes[20]
"Why do you never see elephants hiding in trees? 'Cause they are freaking good at it"

Slide 77

Slide 77 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists

>>> import nltk
>>> nltk.download('punkt')
>>> tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
>>> jokes_sentences = tokenizer.tokenize_sents(jokes)
>>> jokes_sentences[20]
['Why do you never see elephants hiding in trees?', "'Cause they are freaking good at it"]

Slide 78

Slide 78 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists

>>> import re
>>> def sentence_to_wordlist(raw):
...     clean = re.sub(r'[^a-zA-Z]', ' ', raw)
...     words = clean.split()
...     words_lower = [w.lower() for w in words]
...     return words_lower
>>> jokes_word_lists = [
...     [sentence_to_wordlist(str(s)) for s in joke if len(s) > 0]
...     for joke in jokes_sentences]
>>> jokes_word_lists[20]
[['why', 'do', 'you', 'never', 'see', 'elephants', 'hiding', 'in', 'trees'], ['cause', 'they', 'are', 'freaking', 'good', 'at', 'it']]

Slide 79

Slide 79 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words

>>> from nltk.corpus import stopwords
>>> nltk.download('stopwords')
>>> stop_ws = set(stopwords.words('english'))
>>> jokes_non_stop_word_lists = [
...     [[word for word in s if word not in stop_ws] for s in joke]
...     for joke in jokes_word_lists]
>>> jokes_non_stop_word_lists[20]
[['never', 'see', 'elephants', 'hiding', 'trees'], ['cause', 'freaking', 'good']]

Slide 80

Slide 80 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming

>>> from nltk.stem import PorterStemmer
>>> porter = PorterStemmer()
>>> jokes_stemmed = [
...     [[porter.stem(w) for w in s] for s in joke]
...     for joke in jokes_non_stop_word_lists]
>>> jokes_stemmed[20]
[['never', 'see', 'eleph', 'hide', 'tree'], ['caus', 'freak', 'good']]

Slide 81

Slide 81 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings

>>> import multiprocessing
>>> import gensim.models.word2vec as w2v
>>> jokes_flatten = [s for j in jokes_stemmed for s in j]
>>> num_features = 300           # dimensionality of the resulting word vectors
>>> min_word_count = 3           # minimum word count threshold
>>> num_workers = multiprocessing.cpu_count()  # number of threads to run in parallel
>>> context_size = 7             # context window length
>>> downsampling = 1e-3          # downsample setting for frequent words
>>> seed = 42                    # seed for the rng, to make the results reproducible
>>> sg = 1                       # skip-gram (instead of CBOW)
>>> jokes2vec = w2v.Word2Vec(
...     sg=sg,
...     seed=seed,
...     workers=num_workers,
...     size=num_features,
...     min_count=min_word_count,
...     window=context_size,
...     sample=downsampling
... )
>>> jokes2vec.build_vocab(jokes_flatten)
>>> jokes2vec.train(jokes_flatten,
...                 total_examples=jokes2vec.corpus_count,
...                 epochs=jokes2vec.iter)

Slide 82

Slide 82 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

>>> import pandas as pd
>>> import seaborn as sns
>>> sns.set(style="darkgrid")
>>> from sklearn.manifold import TSNE
>>> tsne = TSNE(n_components=2, random_state=seed)
>>> all_word_vectors_matrix = jokes2vec.wv.syn0
>>> all_word_vectors_matrix_2d = tsne.fit_transform(all_word_vectors_matrix)
>>> points = pd.DataFrame(
...     [
...         (word, coords[0], coords[1])
...         for word, coords in [
...             (word, all_word_vectors_matrix_2d[jokes2vec.wv.vocab[word].index])
...             for word in jokes2vec.wv.vocab
...         ]
...     ],
...     columns=['word', 'x', 'y']
... )
>>> points.plot.scatter('x', 'y', s=10, figsize=(20, 12), alpha=0.6)

Slide 83

Slide 83 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 84

Slide 84 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 85

Slide 85 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 86

Slide 86 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 87

Slide 87 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 88

Slide 88 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 89

Slide 89 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize

Slide 90

Slide 90 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize • Embeddings “algebra”

Slide 91

Slide 91 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” ● Semantic similarity

>>> jokes2vec.most_similar('facebook')
[('fb', 0.7791515588760376),
 ('unfriend', 0.7512669563293457),
 ('status', 0.7433165907859802),
 ('myspac', 0.7160271406173706),
 ('notif', 0.6782281398773193),
 ('retweet', 0.6745551824569702),
 ('timelin', 0.672653079032898),
 ('twitter', 0.6709973812103271),
 ('privaci', 0.6695473194122314),
 ('linkedin', 0.6655823588371277)]

Slide 92

Slide 92 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” ● Semantic similarity

>>> jokes2vec.most_similar('lol')
[('lmao', 0.7619101405143738),
 ('tho', 0.7015952467918396),
 ('haha', 0.6999001502990723),
 ('hahaha', 0.6714984178543091),
 ('omg', 0.6711198091506958),
 ('pl', 0.6587743163108826),
 ('bc', 0.6558701992034912),
 ('gona', 0.6529208421707153),
 ('ppl', 0.6476595401763916),
 ('yea', 0.6466178894042969)]

Slide 93

Slide 93 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” ● Semantic similarity

>>> jokes2vec.most_similar('sex')
[('anal', 0.5785183906555176),
 ('unprotect', 0.5359092950820923),
 ('foreplay', 0.5343884825706482),
 ('brussel', 0.5324864387512207),
 ('foursom', 0.5289731025695801),
 ('twosom', 0.5187283158302307),
 ('threesom', 0.5119856595993042),
 ('geneticist', 0.5064876079559326),
 ('intercours', 0.5030955076217651),
 ('oral', 0.5015446543693542)]

Slide 94

Slide 94 text

But I’m here to see Python code • Pre-Processing ● Transform into word lists ● Remove Stop Words ● Stemming • Obtain word embeddings • Visualize • Embeddings “algebra” ● Semantic similarity ● Linear relationships

>>> def nearest_similarity_cosmul(start1, end1, end2):
...     similarities = jokes2vec.most_similar_cosmul(
...         positive=[end2, start1],
...         negative=[end1]
...     )
...     start2 = similarities[0][0]
...     print('{start1} is related to {end1}, as {start2} is related to {end2}'.format(**locals()))
>>> nearest_similarity_cosmul('dude', 'man', 'woman')
dude is related to man, as chick is related to woman

Slide 95

Slide 95 text

Does my computer have a better sense of humor than I do? “Comparing yourself to others is an act of violence against the self” Iyanla Vanzant

Slide 96

Slide 96 text

Traditional Neural Networks

Slide 97

Slide 97 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static

Slide 98

Slide 98 text

Traditional Neural Networks ML Model Input Output • Input and output lengths are mostly pre-determined and static

Slide 99

Slide 99 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences ML Model Input Output

Slide 100

Slide 100 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks

Slide 101

Slide 101 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions

Slide 102

Slide 102 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input Output

Slide 103

Slide 103 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input Output

Slide 104

Slide 104 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions

Slide 105

Slide 105 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1

Slide 106

Slide 106 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1

Slide 107

Slide 107 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2

Slide 108

Slide 108 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2 ML Model Input 3 Output 3

Slide 109

Slide 109 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Ø I ML Model I like ML Model like beer

Slide 110

Slide 110 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model ML Model ML Model • RNNs are very versatile

Slide 111

Slide 111 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Input 2 Output 2 ML Model Input 3 Output 3 • RNNs are very versatile ● Many to many

Slide 112

Slide 112 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 ML Model Input 2 ML Model Input 3 Output 1 • RNNs are very versatile ● Many to many ● Many to one

Slide 113

Slide 113 text

Traditional Neural Networks • Input and output lengths are mostly pre-determined and static ● Weak for modelling sequences • Recurrent Neural Networks ● Enable the model to keep a memory between executions ML Model Input 1 Output 1 ML Model Output 2 ML Model Output 3 • RNNs are very versatile ● Many to many ● Many to one ● One to many
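As a sketch of how the two wiring patterns look in Keras (the library used later in the talk; the 300-dimensional input size is an assumption matching the word-vector dimensionality used earlier):

>>> from keras.models import Sequential
>>> from keras.layers import LSTM, Dense, TimeDistributed
>>> # many to one: read a whole sequence, emit a single output
>>> many_to_one = Sequential([
...     LSTM(64, input_shape=(None, 300)),
...     Dense(1, activation='sigmoid')])
>>> # many to many: emit one output per time step
>>> many_to_many = Sequential([
...     LSTM(64, input_shape=(None, 300), return_sequences=True),
...     TimeDistributed(Dense(1, activation='sigmoid'))])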

Slide 114

Slide 114 text

AI-Powered Jokes

Slide 115

Slide 115 text

AI-Powered Jokes • How? Generate character by character

Slide 116

Slide 116 text

AI-Powered Jokes ML Model b ML Model e • How? Generate character by character

Slide 117

Slide 117 text

AI-Powered Jokes ML Model b ML Model e e • How? Generate character by character

Slide 118

Slide 118 text

AI-Powered Jokes ML Model b ML Model e ML Model e • How? Generate character by character

Slide 119

Slide 119 text

AI-Powered Jokes • How? Generate character by character ML Model b ML Model e ML Model e r

Slide 120

Slide 120 text

AI-Powered Jokes • How? Generate character by character ● A seed provides context for the network ML Model b ML Model e ML Model e r

Slide 121

Slide 121 text

Stop poking my brain and show me code!!! • Pre-Processing

Slide 122

Slide 122 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index

>>> jokes_concat = ''.join(jokes)
>>> chars = sorted(list(set(jokes_concat)))
>>> idx2char = chars
>>> char2idx = {c: i for i, c in enumerate(idx2char)}

Slide 123

Slide 123 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index ● Training set preparation

>>> import math
>>> import random
>>> jokes_sample = random.sample(jokes, math.floor(len(jokes) * 0.5))
>>> maxlen = 40
>>> step = 3
>>> sentences = []
>>> next_chars = []
>>> for joke in jokes_sample:
...     if len(joke) < (maxlen + 1):
...         continue
...     for i in range(0, len(joke) - maxlen, step):
...         sentences.append(joke[i:i+maxlen])
...         next_chars.append(joke[i+maxlen])

Slide 124

Slide 124 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index ● Training set preparation ● Convert into integers

>>> import numpy as np
>>> x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
>>> y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
>>> for i, sentence in enumerate(sentences):
...     for j, char in enumerate(sentence):
...         x[i, j, char2idx[char]] = 1.
...     y[i, char2idx[next_chars[i]]] = 1.
>>> idx = np.random.permutation(len(x))
>>> x = x[idx]
>>> y = y[idx]

Slide 125

Slide 125 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index ● Training set preparation ● Convert into integers • Define the model

>>> from keras.models import Sequential
>>> from keras.layers import Dense, Dropout, LSTM
>>> model = Sequential()
>>> model.add(LSTM(256,
...                input_shape=(maxlen, len(chars)),
...                return_sequences=True))
>>> model.add(Dropout(0.2))
>>> model.add(LSTM(256))
>>> model.add(Dropout(0.2))
>>> model.add(Dense(len(chars), activation='softmax'))
>>> model.compile(loss='categorical_crossentropy',
...               optimizer='adam',
...               metrics=['categorical_crossentropy'])

Slide 126

Slide 126 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index ● Training set preparation ● Convert into integers • Define the model • Train the model

>>> model.fit(x, y, batch_size=128, epochs=8)

Slide 127

Slide 127 text

Stop poking my brain and show me code!!! • Pre-Processing ● Character ↔ Integer Index ● Training set preparation ● Convert into integers • Define the model • Train the model

>>> model.fit(x, y, batch_size=128, epochs=8)
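The slides do not show the sampling step that turns the trained model into joke text; below is a sketch of it, reusing model, chars, char2idx, idx2char, maxlen and numpy from the previous slides (the temperature knob is an addition of mine):

>>> def generate(seed, length=200, temperature=0.8):
...     text = seed
...     for _ in range(length):
...         window = text[-maxlen:]                          # last maxlen characters as context
...         x_pred = np.zeros((1, maxlen, len(chars)))
...         for t, char in enumerate(window):
...             x_pred[0, maxlen - len(window) + t, char2idx[char]] = 1.
...         preds = model.predict(x_pred, verbose=0)[0]
...         preds = np.log(preds + 1e-8) / temperature       # temperature scaling
...         preds = np.exp(preds) / np.sum(np.exp(preds))
...         next_idx = np.random.choice(len(chars), p=preds) # sample the next character
...         text += idx2char[next_idx]
...     return text
>>> generate("How many dead hookers does it take to sc")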

Slide 128

Slide 128 text

Are the jokes any good? • Using the start of an existing joke as seed

Slide 129

Slide 129 text

Are the jokes any good? • Using the start of an existing joke as seed Example: How many dead hookers does it take to sc

Slide 130

Slide 130 text

Are the jokes any good? • Using the start of an existing joke as seed Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.

Slide 131

Slide 131 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.

Slide 132

Slide 132 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking Example: How many dead hookers does it take to screw in a light bulb? Two, but it was the same time.

Slide 133

Slide 133 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example

Slide 134

Slide 134 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example Example: Am I the only one who closes the silverw

Slide 135

Slide 135 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example Example: Am I the only one who closes the silverwine costume have sex with Dram the ground? Prostitute too scoring out wrenking out running into a teeth in other people? Are you surprised whereed? Chocolate Good!!!!!!!!!!!!? Don't you buy me? nit-te

Slide 136

Slide 136 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example • Custom seed

Slide 137

Slide 137 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example • Custom start ● A letter Example: D

Slide 138

Slide 138 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example • Custom start ● A letter Example: Do you have a beer from starbucks? An asshole in the back of the back, but I leave the same time. I think I was all day. The barman says "I don't know what the fucking card move in the back of his life

Slide 139

Slide 139 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example • Custom start ● A letter ● “What do you call” jokes

Slide 140

Slide 140 text

Are the jokes any good? • Using the start of an existing joke as seed ● Sentence structure is surprisingly good ● Coherence with the answer is a bit lacking • A weirder example • Custom start ● A letter ● “What do you call” jokes Examples: • What do you call a nut on house? A coint in his circus. • What do you call a bill on the oven? A condom. • What do you call a blowjob? A social storm. • What do you call a shit of country in a car? A lifetime experience. • What do you call a dog device? A garbanzo bean with a curry. • What do you call a disappointment on the bathroom? A pilot, you can't take a shit. • What do you call a dialogast? A farmer. • What do you call a dog first? A sandwich. • What do you call a dick on his shoes? A woman in the stairs.

Slide 141

Slide 141 text

What are the takeaways? “Art is never finished, only abandoned” Leonardo da Vinci

Slide 142

Slide 142 text

Today I learned

Topics covered
• Distributed representations are great for categorical fields
• Word2Vec and LSTM rules
• Python rules

Current trends
• Divide and Conquer
• Attention
• Generative Adversarial Networks
• Sequence-to-Sequence
• RNN-based Variational Auto Encoder

Potential paths of exploration
• Try using attention and longer spans of memory
• Active learning classifier for jokes quality
• Do the same for motivational quotes

Slide 143

Slide 143 text

And what now?

I’m a Software Engineer
• Fast.ai Machine Learning for Coders
• Seek problems in your surroundings and do a POC
• Python rules

I am a Data Scientist
• DeepLearning.ai Deep Learning Specialization
• Develop Probability and Statistics
  • Udacity Intro to Descriptive Statistics
  • Udacity Intro to Inferential Statistics

I went through the wrong door and noticed free beer
• Enjoy it :D

Slide 144

Slide 144 text

Bibliography
• https://en.wikipedia.org/wiki/Theories_of_humor
• https://en.wikipedia.org/wiki/Humor_research
• https://en.wikipedia.org/wiki/Computational_humor
• https://www.iflscience.com/technology/ais-attempts-at-oneliner-jokes-are-unintentionally-hilarious/
• https://motherboard.vice.com/en_us/article/z43nke/joke-telling-robots-are-the-final-frontier-of-artificial-intelligence
• https://medium.com/@davidolarinoye/will-ai-ever-be-able-to-make-a-joke-808a656b53a6
• https://journals.sagepub.com/doi/pdf/10.1177/147470490600400129
• https://towardsdatascience.com/word2vec-skip-gram-model-part-1-intuition-78614e4d6e0b
• https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
• https://www.kaggle.com/abhinavmoudgil95/short-jokes
• https://arxiv.org/pdf/1708.02709.pdf
• http://blog.aylien.com/overview-word-embeddings-history-word2vec-cbow-glove/

Slide 145

Slide 145 text

Thank you @diogojapinto /in/diogojapinto/ [email protected]