Natural Language Processing Expert Briefing @ PyData Global 2021

Slides for the Expert Briefing session on Natural Language Processing at PyData Global 2021 https://pydata.org/global2021/expert-briefings/

Speaker: Marco Bonzanini https://twitter.com/marcobonzanini

October 20, 2021

Transcript

  1. Natural Language Processing
    Trends, Challenges and Opportunities
    @MarcoBonzanini
    PyData Global 2021

  2. © Bonzanini Consulting Ltd — BonzaniniConsulting.com
    Nice to meet you
    • Consulting, training and coaching on Python + Data Science
    • Chair @ PyData London

  3. Natural Language Processing

  4. Natural Language Processing
    Natural Language Understanding
    Natural Language Generation

  5. That that is is that that is not is not is that it it is
    (That’s proper English)

  6. That that is, is.
    That that is not, is not.
    Is that it? It is.
    More fun at: https://en.wikipedia.org/wiki/List_of_linguistic_example_sentences
    Pics: https://en.wikipedia.org/wiki/Socrates and https://en.wikipedia.org/wiki/Parmenides

  7. “They ate pizza with anchovies”
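The sentence is a classic prepositional-phrase attachment ambiguity: “with anchovies” can attach to “pizza” (a pizza topped with anchovies) or to “ate” (describing how, or with whom, they ate). A minimal sketch, not taken from the talk: the toy grammar below is hypothetical and written in Chomsky Normal Form so that the standard CKY algorithm can count how many parse trees it allows for the sentence.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form.
# Binary rules: (parent, left child, right child)
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: ate ... (with anchovies)
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: pizza (with anchovies)
    ("PP", "P", "NP"),
]
# Lexical rules: word -> possible categories
LEXICON = {
    "they": {"NP"},
    "ate": {"V"},
    "pizza": {"NP"},
    "with": {"P"},
    "anchovies": {"NP"},
}


def count_parses(tokens, start_symbol="S"):
    """Count distinct parse trees for `tokens` with the CKY algorithm."""
    n = len(tokens)
    # chart[(i, j)][label] = number of ways to build `label` over tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for label in LEXICON.get(word, ()):
            chart[(i, i + 1)][label] += 1
    # Build longer spans from pairs of shorter, already complete spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, l_label, r_label in BINARY_RULES:
                    if l_label in left and r_label in right:
                        chart[(i, j)][parent] += left[l_label] * right[r_label]
    return chart[(0, n)][start_symbol]


print(count_parses("they ate pizza with anchovies".split()))  # 2
```

The number of possible readings grows quickly with every extra prepositional phrase, which is one reason a parser needs statistics or context to pick the intended one.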

  8. Language is challenging

  9. Language is challenging
    • Language is evolving

  10. Language is challenging
    • Language is evolving
    • Language is ambiguous

  11. Language is challenging
    • Language is evolving
    • Language is ambiguous
    • (Understanding) Language requires context

  12. We need annotated data

  13. We need annotated data
    • Variability: domains and languages

  14. We need annotated data
    • Variability: domains and languages
    • Available data: sparse

  15. We need annotated data
    • Variability: domains and languages
    • Available data: sparse
    • Available data: biased

  16. We need annotated data
    • Variability: domains and languages
    • Available data: sparse
    • Available data: biased
    • Annotating data is a bottleneck

  17. (Incomplete) History of NLP

  18. (Incomplete) History of NLP
    • 1950s Symbolic / rule-based

  19. (Incomplete) History of NLP
    • 1950s Symbolic / rule-based
    • 1990s Stats / annotated data / Machine Learning

  20. (Incomplete) History of NLP
    • 1950s Symbolic / rule-based
    • 1990s Stats / annotated data / Machine Learning
    • 2010s Neural Nets / Deep Learning

  21. Evolution of Models

  22. Evolution of Models
    Bag-of-words
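A bag-of-words model represents a text only by how often each word occurs, discarding word order. A minimal sketch using the standard library; the lowercase-and-regex tokenizer is a simplifying assumption, not a recommendation.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for `text`; word order is discarded."""
    # Toy tokenizer (assumption): lowercase, then keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


sentence = "That that is is that that is not is not is that it it is"
print(bag_of_words(sentence).most_common(2))  # [('is', 6), ('that', 5)]

# Different meanings, identical bags: the representation cannot tell them apart.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
```

The last line shows the main limitation that later models address: word order, and with it much of the meaning, is lost.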

  23. Evolution of Models
    Bag-of-words
    Word Embeddings (circa 2013)
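Word embeddings replace word counts with dense vectors, so that words used in similar contexts end up close together; closeness is usually measured with cosine similarity. In this sketch the three-dimensional vectors are made up for illustration; real embeddings (word2vec and similar, circa 2013) are learned from large corpora and typically have hundreds of dimensions.

```python
import math

# Made-up toy vectors (assumption, for illustration only).
EMBEDDINGS = {
    "pizza": [0.9, 0.1, 0.3],
    "pasta": [0.8, 0.2, 0.35],
    "keyboard": [0.05, 0.1, 0.95],
}


def cosine_similarity(u, v):
    """dot(u, v) / (|u| * |v|): 1.0 for the same direction, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word):
    """Return the other vocabulary word whose vector is closest to `word`'s."""
    target = EMBEDDINGS[word]
    candidates = [w for w in EMBEDDINGS if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))


print(most_similar("pizza"))  # pasta
```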

  24. Evolution of Models
    Bag-of-words
    Word Embeddings (circa 2013)
    “Traditional” ML models
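Bag-of-words counts (or features derived from embeddings) can be fed to “traditional” machine-learning classifiers. The slide does not name specific models; as one illustrative choice, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The four training “reviews” and their labels are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Toy tokenizer (assumption): lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(label) + sum of log P(word | label) over known tokens
            score = math.log(n_label / n_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
texts = [
    "great food and friendly staff",
    "loved the pizza",
    "terrible service",
    "the pizza was cold and bland",
]
labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("friendly staff and great pizza"))  # pos
print(model.predict("cold and bland service"))  # neg
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps a word unseen in one class from zeroing out that class entirely.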

  25. Evolution of Models
    Bag-of-words
    Word Embeddings (circa 2013)
    “Traditional” ML models
    RNN/LSTM (circa 2015)
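Recurrent networks read a sentence one token at a time and carry a hidden state forward, so unlike bag-of-words they are sensitive to word order. To keep the sketch short it shows a vanilla (Elman) RNN rather than an LSTM; an LSTM adds gates to the same recurrence to help it retain information over longer spans. The weights and the two input vectors are arbitrary numbers standing in for learned parameters and word embeddings.

```python
import math


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), starting from h_0 = 0.

    Returns the list of hidden states. Each step needs the previous state,
    so the loop over time steps is inherently sequential.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xj for w, xj in zip(W_xh[i], x))
                + sum(w * hj for w, hj in zip(W_hh[i], h))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


# Arbitrary parameters (assumption): 2-d inputs, 2-d hidden state.
W_xh = [[0.5, -0.3], [0.2, 0.4]]
W_hh = [[0.1, 0.6], [-0.5, 0.3]]
b_h = [0.0, 0.1]
x1, x2 = [1.0, 0.0], [0.0, 1.0]  # stand-ins for two word embeddings

in_order = rnn_forward([x1, x2], W_xh, W_hh, b_h)[-1]
reversed_order = rnn_forward([x2, x1], W_xh, W_hh, b_h)[-1]
print(in_order != reversed_order)  # True: same words, different order, different state
```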

  26. Evolution of Models
    Bag-of-words
    Word Embeddings (circa 2013)
    “Traditional” ML models
    RNN/LSTM (circa 2015)
    Transformers (circa 2017)
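The building block of the Transformer is attention: each position scores every other position, turns the scores into weights with a softmax, and takes a weighted average of the value vectors. A minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python; multi-head attention, learned projections and masking are left out, and the numbers in the example are made up.

```python
import math


def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Each query row is processed independently of the others, so in practice
    all positions are computed at once as matrix products.
    """
    d_k = len(keys[0])
    outputs, weight_rows = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        weight_rows.append(weights)
    return outputs, weight_rows


# Both keys are identical, so the query attends equally to both positions
# and the output is the mean of the two value vectors.
outputs, weights = attention(
    queries=[[1.0, 1.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [0.0, 4.0]],
)
print(weights)  # [[0.5, 0.5]]
print(outputs)  # [[1.0, 2.0]]
```

Because no position waits on the output of another, the work parallelises well on GPUs, in contrast to the step-by-step loop of a recurrent network.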

  27. Transformers

  28. Transformers
    • Parallelisation → training on bigger datasets

  29. Transformers
    • Parallelisation → training on bigger datasets
    • Fine-tuning on specific tasks

  30. Transformers
    • Parallelisation → training on bigger datasets
    • Fine-tuning on specific tasks
    • Bigger and bigger models

  31. Transformers
    • Parallelisation → training on bigger datasets
    • Fine-tuning on specific tasks
    • Bigger and bigger models
    • Pre-trained models

  32. (no text)

  33. (no text)

  34. Source: https://github.com/thunlp/PLMpapers

  35. Bender et al., 2021, ACM FAccT

  36. Python NLP Ecosystem

  37. Python NLP Ecosystem
    NLTK
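A small taste of NLTK, assuming it has been installed (`pip install nltk`). `wordpunct_tokenize` is a regular-expression tokenizer that needs no extra data downloads (unlike `word_tokenize`, which relies on the Punkt models), and `FreqDist` counts tokens much like a `collections.Counter`. The example text reuses sentences from earlier slides.

```python
import nltk
from nltk.tokenize import wordpunct_tokenize

text = "They ate pizza with anchovies. That that is, is."

# Split on runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text)
print(tokens[:6])  # ['They', 'ate', 'pizza', 'with', 'anchovies', '.']

# Count lowercased alphabetic tokens only (punctuation is dropped).
freq = nltk.FreqDist(token.lower() for token in tokens if token.isalpha())
print(freq["that"], freq["is"])  # 2 2
```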

  38. THANK YOU
    @MarcoBonzanini
    marcobonzanini.com/newsletter