Natural Language Processing Expert Briefing @ PyData Global 2021

Slides for the Expert Briefing session on Natural Language Processing at PyData Global 2021 https://pydata.org/global2021/expert-briefings/

Speaker: Marco Bonzanini https://twitter.com/marcobonzanini

Marco Bonzanini

October 20, 2021

Transcript

  1. Natural Language Processing: Trends, Challenges and Opportunities (@MarcoBonzanini, PyData Global 2021)
  2. © Bonzanini Consulting Ltd — BonzaniniConsulting.com Nice to meet you

    • Consulting, training and coaching on Python + Data Science
    • Chair @ PyData London
  3. Natural Language Processing

  4. Natural Language Processing

    • Natural Language Understanding
    • Natural Language Generation
  5. That that is is that that is not is not is that it it is (That’s proper English)

  6. That that is, is. That that is not, is not. Is that it? It is.

    More fun at: https://en.wikipedia.org/wiki/List_of_linguistic_example_sentences
    Pics: https://en.wikipedia.org/wiki/Socrates and https://en.wikipedia.org/wiki/Parmenides
  7. “They ate pizza with anchovies”

  8. Language is challenging

  9. Language is challenging

    • Language is evolving
  10. Language is challenging

    • Language is evolving
    • Language is ambiguous
  11. Language is challenging

    • Language is evolving
    • Language is ambiguous
    • (Understanding) Language requires context
  12. We need annotated data

  13. We need annotated data

    • Variability: domains and languages
  14. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
  15. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
    • Available data: bias
  16. We need annotated data

    • Variability: domains and languages
    • Available data: sparse
    • Available data: bias
    • Annotating data is a bottleneck
  17. (Incomplete) History of NLP

  18. (Incomplete) History of NLP

    • 1950s: Symbolic / rule-based
  19. (Incomplete) History of NLP

    • 1950s: Symbolic / rule-based
    • 1990s: Stats / annotated data / Machine Learning
  20. (Incomplete) History of NLP

    • 1950s: Symbolic / rule-based
    • 1990s: Stats / annotated data / Machine Learning
    • 2010s: Neural Nets / Deep Learning
  21. Evolution of Models

  22. Evolution of Models

    • Bag-of-words
  23. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
  24. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
  25. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
    • RNN/LSTM (circa 2015)
  26. Evolution of Models

    • Bag-of-words
    • Word Embeddings (circa 2013)
    • “Traditional” ML models
    • RNN/LSTM (circa 2015)
    • Transformers (circa 2017)
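
To make the first step in this progression concrete, here is a minimal Python sketch of a bag-of-words representation. It is not taken from the talk: the function name `bag_of_words` and the simple regex tokenizer are assumptions chosen for illustration, and real pipelines usually rely on a library tokenizer and vectorizer instead.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, extract word tokens, and count each token.

    Hypothetical helper for illustration: the regex keeps only runs of
    ASCII letters and apostrophes, so punctuation and word order are lost.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two sentences with the same words in a different order.
a = bag_of_words("They ate pizza with anchovies")
b = bag_of_words("With anchovies they ate pizza")

print(dict(a))
print(a == b)  # the two sentences get the same representation
```

Because the representation only counts tokens, it cannot tell the two sentences apart, and it carries no information about what "with anchovies" attaches to. That limitation is one motivation for the later models in the list, which take word context and order into account.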
  27. Transformers

  28. Transformers

    • Parallelisation → training on bigger datasets
  29. Transformers

    • Parallelisation → training on bigger datasets
    • Fine-tuning on a specific task
  30. Transformers

    • Parallelisation → training on bigger datasets
    • Fine-tuning on a specific task
    • Bigger and bigger models
  31. Transformers

    • Parallelisation → training on bigger datasets
    • Fine-tuning on a specific task
    • Bigger and bigger models
    • Pre-trained models
  34. Source: https://github.com/thunlp/PLMpapers

  35. Bender et al., 2021, ACM FAccT

  36. Python NLP Ecosystem

  37. Python NLP Ecosystem

    • NLTK
  38. THANK YOU @MarcoBonzanini marcobonzanini.com/newsletter