Slides for the presentation at the Expert Briefings @ PyData Global 2022
Speaker: Marco Bonzanini https://www.twitter.com/marcobonzanini https://marcobonzanini.com/
Natural Language Processing
Trends, Challenges and Opportunities
@MarcoBonzanini marcobonzanini.com
PyData Global 2022
© Bonzanini Consulting Ltd — BonzaniniConsulting.com

Agenda for Today
• Quick overview on NLP and current trends
• Round table discussion
• Your challenges?
• Your success stories?

Nice to meet you
• Consulting, training and coaching on Python + Data Science
• Chair @ PyData London

Language is Challenging
• Language is evolving
• Language is ambiguous
• (Understanding) Language requires context

We need annotated data
• Variability: domains and languages
• Available data: sparse + biased?
• Annotated data is the bottleneck
• Vincent Warmerdam on Tools to Improve Training Data: https://www.youtube.com/watch?v=KRQJDLyc1uM

Evolution of Models
• Bag-of-words
• Word Embeddings (circa 2013)
• “Traditional” ML models
• RNN/LSTM (circa 2015)
• Transformers (circa 2017)
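
As a rough illustration of the starting point of this timeline, a bag-of-words representation can be sketched in a few lines of plain Python. The tokenisation (lower-casing and splitting on whitespace), the helper name `bag_of_words` and the example sentences are assumptions made for illustration; they are not from the talk.

```python
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring word order (hypothetical helper)."""
    # Naive tokenisation: lower-case and split on whitespace.
    return Counter(text.lower().split())


doc_a = bag_of_words("the dog bites the man")
doc_b = bag_of_words("the man bites the dog")

# Word order is discarded, so both sentences get the same representation.
print(doc_a == doc_b)
print(doc_a["the"])
```

The two sentences mean different things but are indistinguishable here, which is one reason later models in the timeline try to capture context.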

Transformers
Attention is all you need (Vaswani et al., 2017): 57K citations in November 2022
• Parallelisation → training on bigger datasets
• Fine-tuning on a specific task
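
To make the parallelisation point a little more concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in Vaswani et al. (2017), written in plain Python. The function names and the toy input vectors are assumptions for illustration; real implementations use batched matrix operations, multiple heads and learned projections, none of which are shown here.

```python
import math


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    # Each query is handled independently of the others, so in practice
    # all positions can be computed at the same time.
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy input: three token vectors of size 2 (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)
print(result)
```

Unlike an RNN/LSTM, nothing in the loop depends on the output for a previous position, which is what allows training to be spread across hardware and scaled to larger datasets.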

Bigger and Bigger Models
• BERT (2018): 345M parameters
• GPT-2 (2019): 1.5B parameters
• GPT-3 (2020): 175B parameters
• Galactica (2022): 120B parameters
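
For a sense of scale, the parameter counts above can be turned into a back-of-the-envelope estimate of how much space the weights alone would take. This is only a sketch: the 2 bytes per parameter (16-bit floats) is an assumption, the helper name `weights_size_gb` is hypothetical, and the estimate ignores everything else needed for training or serving.

```python
# Parameter counts as listed on the slide.
MODELS = {
    "BERT (2018)": 345e6,
    "GPT-2 (2019)": 1.5e9,
    "GPT-3 (2020)": 175e9,
    "Galactica (2022)": 120e9,
}

BYTES_PER_PARAMETER = 2  # assumption: 16-bit floats, weights only


def weights_size_gb(n_parameters, bytes_per_parameter=BYTES_PER_PARAMETER):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


for name, n_parameters in MODELS.items():
    print(f"{name}: ~{weights_size_gb(n_parameters):.1f} GB")
```

Under that assumption, the largest models listed need hundreds of gigabytes for the weights alone, well beyond a single consumer GPU.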

Big Hype, Yet…
Bolukbasi et al., 2016 NIPS
• King - man + woman = Queen
• Doctor - man + woman = Nurse?
• Word embeddings are not “neutral”
Bias in the data
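
The analogy arithmetic above can be reproduced with a toy example. The vectors below are hand-made, three-dimensional and purely illustrative (they are not real embeddings and not from the talk): a gender component has been deliberately placed on "doctor" and "nurse" to mimic what a model can absorb from biased text. The helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-made toy vectors: [royalty, gender, medicine]. Not real embeddings.
VECTORS = {
    "king":   [1.0,  1.0, 0.0],
    "queen":  [1.0, -1.0, 0.0],
    "man":    [0.0,  1.0, 0.0],
    "woman":  [0.0, -1.0, 0.0],
    # Assumption: biased data has attached a gender component to professions.
    "doctor": [0.0,  1.0, 1.0],
    "nurse":  [0.0, -1.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))


print(analogy("king", "man", "woman"))    # queen
print(analogy("doctor", "man", "woman"))  # nurse
```

The same arithmetic that produces the celebrated "queen" result also produces "nurse", because the vectors encode whatever associations were present in the data they came from.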
https://twitter.com/Michael_J_Black/status/1593133722316189696
https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-colossal-failure-on-pace-to-lose-10-billion-this-year/
Bender et al., 2021, ACM FAccT

Python NLP Ecosystem
NLTK

Discussion
• “Let’s just use Deep Learning (TM)”
• What if we don’t have millions of $$$?
• Data annotation / quality: still the main issue?
• Your Success Stories?
• Your Horror Stories?