Slide 1

Slide 1 text

Natural Language Processing: Trends, Challenges and Opportunities
@MarcoBonzanini
PyData Global 2021

Slide 2

Slide 2 text

© Bonzanini Consulting Ltd — BonzaniniConsulting.com
Nice to meet you
• Consulting, training and coaching on Python + Data Science
• Chair @ PyData London

Slide 3

Slide 3 text

Natural Language Processing

Slide 4

Slide 4 text

Natural Language Processing
• Natural Language Understanding
• Natural Language Generation

Slide 5

Slide 5 text

That that is is that that is not is not is that it it is
(That’s proper English)

Slide 6

Slide 6 text

That that is, is. That that is not, is not. Is that it? It is.
More fun at: https://en.wikipedia.org/wiki/List_of_linguistic_example_sentences
Pics: https://en.wikipedia.org/wiki/Socrates and https://en.wikipedia.org/wiki/Parmenides

Slide 7

Slide 7 text

“They ate pizza with anchovies”

Slide 8

Slide 8 text

Language is challenging

Slide 9

Slide 9 text

Language is challenging
• Language is evolving

Slide 10

Slide 10 text

Language is challenging
• Language is evolving
• Language is ambiguous

Slide 11

Slide 11 text

Language is challenging
• Language is evolving
• Language is ambiguous
• (Understanding) Language requires context

Slide 12

Slide 12 text

We need annotated data

Slide 13

Slide 13 text

We need annotated data
• Variability: domains and languages

Slide 14

Slide 14 text

We need annotated data
• Variability: domains and languages
• Available data: sparse

Slide 15

Slide 15 text

We need annotated data
• Variability: domains and languages
• Available data: sparse
• Available data: bias

Slide 16

Slide 16 text

We need annotated data
• Variability: domains and languages
• Available data: sparse
• Available data: bias
• Annotating data is a bottleneck

Slide 17

Slide 17 text

(Incomplete) History of NLP

Slide 18

Slide 18 text

(Incomplete) History of NLP
• 1950s: Symbolic / rule-based

Slide 19

Slide 19 text

(Incomplete) History of NLP
• 1950s: Symbolic / rule-based
• 1990s: Stats / annotated data / Machine Learning

Slide 20

Slide 20 text

(Incomplete) History of NLP
• 1950s: Symbolic / rule-based
• 1990s: Stats / annotated data / Machine Learning
• 2010s: Neural Nets / Deep Learning

Slide 21

Slide 21 text

Evolution of Models

Slide 22

Slide 22 text

Evolution of Models
• Bag-of-words

Slide 23

Slide 23 text

Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)

Slide 24

Slide 24 text

Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)
• “Traditional” ML models

Slide 25

Slide 25 text

Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)
• “Traditional” ML models
• RNN/LSTM (circa 2015)

Slide 26

Slide 26 text

Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)
• “Traditional” ML models
• RNN/LSTM (circa 2015)
• Transformers (circa 2017)
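
To make the first entry above concrete, here is a minimal bag-of-words sketch using only the Python standard library. It is an illustration, not code from the talk: the function names `tokenize` and `bag_of_words` and the naive regex tokenizer are assumptions made for this example. Each document becomes a vector of word counts over a shared vocabulary, so word order is discarded.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption for this sketch): lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, vectors), with one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Shared vocabulary: every distinct token across all documents, sorted for stable columns.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for missing words, so absent tokens become zero counts.
    vectors = [[c[word] for word in vocab] for c in counts]
    return vocab, vectors


vocab, vectors = bag_of_words([
    "They ate pizza with anchovies",
    "They ate pizza with friends",
])
print(vocab)
print(vectors)

# Word order is lost: these two sentences get identical vectors.
_, order_vectors = bag_of_words(["dog bites man", "man bites dog"])
print(order_vectors[0] == order_vectors[1])
```

The last check shows the limitation that motivates the later entries in the list: two sentences with opposite meanings are indistinguishable once only counts are kept.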

Slide 27

Slide 27 text

Transformers

Slide 28

Slide 28 text

Transformers
• Parallelisation → training on bigger dataset

Slide 29

Slide 29 text

Transformers
• Parallelisation → training on bigger dataset
• Fine-tuning on specific task

Slide 30

Slide 30 text

Transformers
• Parallelisation → training on bigger dataset
• Fine-tuning on specific task
• Bigger and bigger models

Slide 31

Slide 31 text

Transformers
• Parallelisation → training on bigger dataset
• Fine-tuning on specific task
• Bigger and bigger models
• Pre-trained models

Slide 32

Slide 32 text


Slide 33

Slide 33 text


Slide 34

Slide 34 text

Source: https://github.com/thunlp/PLMpapers

Slide 35

Slide 35 text

Bender et al., 2021, ACM FAccT

Slide 36

Slide 36 text

Python NLP Ecosystem

Slide 37

Slide 37 text

Python NLP Ecosystem
• NLTK

Slide 38

Slide 38 text

THANK YOU
@MarcoBonzanini
marcobonzanini.com/newsletter