Slides for the Expert Briefing session on Natural Language Processing at PyData Global 2021 https://pydata.org/global2021/expert-briefings/
Speaker: Marco Bonzanini https://twitter.com/marcobonzanini
Natural Language Processing: Trends, Challenges and Opportunities
@MarcoBonzanini
PyData Global 2021
© Bonzanini Consulting Ltd — BonzaniniConsulting.com
Nice to meet you
• Consulting, training and coaching on Python + Data Science
• Chair @ PyData London
Natural Language Processing
• Natural Language Understanding
• Natural Language Generation
That that is is that that is not is not is that it it is
(That’s proper English)
That that is, is. That that is not, is not. Is that it? It is.
More fun at: https://en.wikipedia.org/wiki/List_of_linguistic_example_sentences
Pics: https://en.wikipedia.org/wiki/Socrates and https://en.wikipedia.org/wiki/Parmenides
“They ate pizza with anchovies”
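The pizza sentence is a classic prepositional-phrase attachment ambiguity: “with anchovies” can attach to the noun (a pizza topped with anchovies) or to the verb (they ate pizza, with anchovies alongside it). A minimal sketch makes this concrete by counting parse trees with the CKY algorithm. The grammar and lexicon below are a hypothetical toy in Chomsky Normal Form, invented for illustration and not taken from the talk; two trees means two readings.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form (illustration only)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: ate ... with anchovies
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: pizza with anchovies
    ("PP", "P", "NP"),
]
LEXICON = {
    "they": {"NP"},
    "pizza": {"NP"},
    "anchovies": {"NP"},
    "ate": {"V"},
    "with": {"P"},
}


def count_parses(words, start="S"):
    """Count distinct parse trees for `words` using the CKY algorithm."""
    n = len(words)
    # chart[i][j][X] = number of ways non-terminal X can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i][i + 1][tag] += 1
    # Build longer spans from pairs of shorter, adjacent spans
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][k][left] * chart[k][j][right]
                    if ways:
                        chart[i][j][parent] += ways
    return chart[0][n][start]


print(count_parses("they ate pizza with anchovies".split()))  # 2
print(count_parses("they ate pizza".split()))                 # 1
```

Counting trees only shows that the ambiguity exists; choosing between the readings is where context and statistics come in.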
Language is challenging
• Language is evolving
• Language is ambiguous
• (Understanding) Language requires context
We need annotated data
• Variability: domains and languages
• Available data: sparse
• Available data: biased
• Annotating data is a bottleneck
(Incomplete) History of NLP
• 1950s: Symbolic / rule-based
• 1990s: Stats / annotated data / Machine Learning
• 2010s: Neural Nets / Deep Learning
Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)
• “Traditional” ML models
• RNN/LSTM (circa 2015)
• Transformers (circa 2017)
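Bag-of-words, the earliest representation in this evolution, reduces a text to word counts and throws away word order. A minimal sketch using only the standard library shows what gets lost; the regular-expression tokenizer is a deliberately naive assumption, not a recommendation. The punctuated and unpunctuated versions of the “That that is…” example collapse to the same counts, and so do two sentences with opposite meanings.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Map a text to lowercased word counts, ignoring word order."""
    # Naive tokenizer (an assumption): runs of ASCII letters and apostrophes;
    # all other characters, including punctuation, are dropped
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


punctuated = "That that is, is. That that is not, is not. Is that it? It is."
unpunctuated = "That that is is that that is not is not is that it it is"

print(bag_of_words(punctuated) == bag_of_words(unpunctuated))           # True
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))   # True
print(bag_of_words(punctuated).most_common(2))  # [('is', 6), ('that', 5)]
```

Word embeddings replace these sparse counts with dense vectors learned from data, and sequence models such as RNN/LSTM and Transformers take word order back into account.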
Transformers
• Parallelisation → training on bigger datasets
• Fine-tuning on specific tasks
• Bigger and bigger models
• Pre-trained models
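The parallelisation point comes from self-attention. An RNN/LSTM processes tokens one step at a time, each step waiting for the previous hidden state, whereas a Transformer layer relates every token to every other token with a few matrix products. A minimal sketch of scaled dot-product attention, assuming NumPy, a single head and no learned query/key/value projections (real Transformer layers add those, plus multiple heads and positional information); the input vectors are random, made-up data:

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights.

    Q, K: arrays of shape (n_tokens, d_k); V: array of shape (n_tokens, d_v).
    """
    d_k = Q.shape[-1]
    # One matrix product scores every token against every other token at once
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V, weights


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))  # 5 made-up "token vectors" of dimension 8
output, weights = scaled_dot_product_attention(X, X, X)
print(output.shape)                           # (5, 8)
print(np.allclose(weights.sum(axis=1), 1.0))  # True
```

No position's result depends on another position's output, so the whole sequence can be computed in parallel on a GPU; an RNN's loop over time steps cannot.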
Source: https://github.com/thunlp/PLMpapers
Bender et al., 2021, ACM FAccT
Python NLP Ecosystem
NLTK
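A minimal NLTK sketch, assuming NLTK is installed (pip install nltk). It uses the rule-based Treebank tokenizer, which needs no extra data download, unlike nltk.word_tokenize, which relies on the downloadable Punkt sentence tokenizer models. The example sentence is the pizza sentence from earlier in the talk.

```python
from nltk.tokenize import TreebankWordTokenizer
from nltk.util import bigrams

# Rule-based, Penn Treebank-style tokenizer; works without downloading NLTK data
tokenizer = TreebankWordTokenizer()
tokens = tokenizer.tokenize("They ate pizza with anchovies.")
print(tokens)  # ['They', 'ate', 'pizza', 'with', 'anchovies', '.']

# Adjacent token pairs keep a little of the word order that bag-of-words discards
pairs = list(bigrams(tokens))
print(pairs)
```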
THANK YOU
@MarcoBonzanini
marcobonzanini.com/newsletter