
Understanding Natural Language with Word Vectors @ PyCon UK 2017

Slides for my talk on word embeddings presented at PyCon UK 2017:
http://2017.pyconuk.org/sessions/talks/understanding-natural-language-with-word-vectors/

Abstract:
This talk is an introduction to word vectors, a.k.a. word embeddings, a family of Natural Language Processing (NLP) algorithms where words are mapped to vectors.

An important property of these vectors is their ability to capture semantic relationships, for example: UK - London + Paris = ???

These techniques have been driving important improvements in many NLP applications over the past few years, so the interest around word embeddings is spreading. In this talk, we'll discuss the basic linguistic intuitions behind word embeddings, we'll compare some of the most popular word embedding approaches, from word2vec to fastText, and we'll showcase their use with Python libraries.

The aim of the talk is to be approachable for beginners, so the theory is kept to a minimum.

By attending this talk, you'll be able to learn:
- the core features of word embeddings
- how to choose between different word embedding algorithms
- how to implement word embedding techniques in Python


Marco Bonzanini

October 27, 2017

Transcript

  1. Understanding Natural Language with Word Vectors (and Python) @MarcoBonzanini PyCon UK 2017
  2. WORD EMBEDDINGS?

  3. Word Embeddings = Word Vectors = Distributed Representations

  4. Why should you care?

  5. Why should you care? Data representation is crucial

  6. Applications

  7. Applications Classification

  8. Applications Classification Recommender Systems

  9. Applications Classification Recommender Systems Search Engines

  10. Applications Classification Recommender Systems Search Engines Machine Translation

  11. Word Embeddings

  12. Word Embeddings Rome Paris Italy France

  13. Word Embeddings is-capital-of

  14. Word Embeddings Paris

  15. Word Embeddings Paris + Italy

  16. Word Embeddings Paris + Italy - France

  17. Word Embeddings Paris + Italy - France ≈ Rome
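To make the arithmetic concrete: the analogy is computed literally as element-wise vector addition and subtraction, and the answer is the vocabulary word whose vector lies closest to the result, usually by cosine similarity. A toy sketch with made-up 2-d vectors (real embeddings have hundreds of dimensions):

    import numpy as np

    # Tiny made-up "embeddings", for illustration only.
    vectors = {
        'Paris':  np.array([0.9,  0.1]),
        'France': np.array([0.8, -0.3]),
        'Italy':  np.array([0.7, -0.2]),
        'Rome':   np.array([0.8,  0.2]),
        'Madrid': np.array([0.6, -0.5]),
    }

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Paris + Italy - France ("the Paris of Italy")
    query = vectors['Paris'] + vectors['Italy'] - vectors['France']

    # The closest remaining word wins the analogy.
    best = max((w for w in vectors if w not in {'Paris', 'Italy', 'France'}),
               key=lambda w: cosine(query, vectors[w]))
    print(best)  # 'Rome' with these toy numbers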

  18. FROM LANGUAGE TO VECTORS?

  19. Distributional Hypothesis

  20. –J.R. Firth, 1957 “You shall know a word by the company it keeps.”

  21. –Z. Harris, 1954 “Words that occur in similar contexts tend to have similar meanings.”
  22. Context ≈ Meaning

  23. I enjoyed eating some pizza at the restaurant

  24. I enjoyed eating some pizza at the restaurant Word

  25. I enjoyed eating some pizza at the restaurant The company it keeps Word
  26. I enjoyed eating some pizza at the restaurant I enjoyed eating some Welsh cake at the restaurant

  27. I enjoyed eating some pizza at the restaurant I enjoyed eating some Welsh cake at the restaurant
  28. Same Context = ?

  29. WORD2VEC

  30. word2vec (2013)

  31. Vector Calculation

  32. Vector Calculation Goal: learn vec(word)

  33. Vector Calculation Goal: learn vec(word) 1. Choose objective function

  34. Vector Calculation Goal: learn vec(word) 1. Choose objective function 2. Init: random vectors

  35. Vector Calculation Goal: learn vec(word) 1. Choose objective function 2. Init: random vectors 3. Run gradient descent

  36. Vector Calculation Goal: learn vec(word) 1. Choose objective function 2. Init: random vectors 3. Run gradient descent

  37. Vector Calculation Goal: learn vec(word) 1. Choose objective function 2. Init: random vectors 3. Run gradient descent
  38. Objective Function

  39. I enjoyed eating some pizza at the restaurant Objective Function

  40. I enjoyed eating some pizza at the restaurant Objective Function

  41. I enjoyed eating some pizza at the restaurant Objective Function: maximise the likelihood of the context given the focus word

  42. I enjoyed eating some pizza at the restaurant Objective Function: maximise the likelihood of the context given the focus word P(eating | pizza)
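For reference, this objective corresponds to the skip-gram formulation of word2vec (the CBOW variant flips the conditioning and predicts the focus word from its context). Over a corpus of T tokens with a context window of size c, training maximises the average log-likelihood of the context words given each focus word (Mikolov et al., 2013):

    \max_{\theta} \; \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log P(w_{t+j} \mid w_t; \theta)

P(eating | pizza) in the slide is one such term, with "pizza" as the focus word w_t and "eating" as one of its context words w_{t+j}.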
  43. WORD2VEC IN PYTHON

  44. None
  45. pip install gensim

  46. Example

  47. from gensim.models import Word2Vec fname = 'my_dataset.json' corpus = MyCorpusReader(fname) model = Word2Vec(corpus) Example

  48. from gensim.models import Word2Vec fname = 'my_dataset.json' corpus = MyCorpusReader(fname) model = Word2Vec(corpus) Example
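MyCorpusReader in the slide stands for whatever iterable of tokenised sentences you have; gensim only needs a sequence of token lists. A minimal self-contained sketch, assuming a tiny in-memory corpus instead of a JSON dataset:

    from gensim.models import Word2Vec

    # An iterable of tokenised sentences; in practice this would come
    # from your own corpus reader.
    corpus = [
        ['i', 'enjoyed', 'eating', 'some', 'pizza', 'at', 'the', 'restaurant'],
        ['i', 'enjoyed', 'eating', 'some', 'welsh', 'cake', 'at', 'the', 'restaurant'],
    ]

    # min_count=1 keeps every word, since the toy corpus is tiny.
    model = Word2Vec(corpus, min_count=1)

    # Words that share contexts should end up with similar vectors
    # (with a corpus this small, the similarities are mostly noise).
    print(model.wv.most_similar('pizza'))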
  49. model.most_similar('chef') [('cook', 0.94), ('bartender', 0.91), ('waitress', 0.89), ('restaurant', 0.76), ...] Example

  50. model.most_similar('chef', negative=['food']) [('puppet', 0.93), ('devops', 0.92), ('ansible', 0.79), ('salt', 0.77), ...] Example
  51. Pre-trained Vectors

  52. Pre-trained Vectors from gensim.models.keyedvectors import KeyedVectors fname = 'GoogleNews-vectors.bin' model = KeyedVectors.load_word2vec_format(fname, binary=True)
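The slide assumes a local copy of the Google News vectors. As an aside, more recent gensim releases also include a downloader module that can fetch them for you (it is a large download); a sketch, assuming the gensim-data name 'word2vec-google-news-300':

    import gensim.downloader as api

    # Downloads and caches the Google News vectors, returning a KeyedVectors.
    word_vectors = api.load('word2vec-google-news-300')
    print(word_vectors.most_similar(positive=['king', 'woman'], negative=['man']))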
  53. model.most_similar(positive=['king', 'woman'], negative=['man']) Pre-trained Vectors

  54. model.most_similar(positive=['king', 'woman'], negative=['man']) [('queen', 0.7118), ('monarch', 0.6189), ('princess', 0.5902), ('crown_prince', 0.5499), ('prince', 0.5377), …] Pre-trained Vectors

  55. model.most_similar(positive=['Paris', 'Italy'], negative=['France']) Pre-trained Vectors

  56. model.most_similar(positive=['Paris', 'Italy'], negative=['France']) [('Milan', 0.7222), ('Rome', 0.7028), ('Palermo_Sicily', 0.5967), ('Italian', 0.5911), ('Tuscany', 0.5632), …] Pre-trained Vectors

  57. model.most_similar(positive=['professor', 'woman'], negative=['man']) Pre-trained Vectors

  58. model.most_similar(positive=['professor', 'woman'], negative=['man']) [('associate_professor', 0.7771), ('assistant_professor', 0.7558), ('professor_emeritus', 0.7066), ('lecturer', 0.6982), ('sociology_professor', 0.6539), …] Pre-trained Vectors

  59. model.most_similar(positive=['professor', 'man'], negative=['woman']) Pre-trained Vectors

  60. model.most_similar(positive=['professor', 'man'], negative=['woman']) [('professor_emeritus', 0.7433), ('emeritus_professor', 0.7109), ('associate_professor', 0.6817), ('Professor', 0.6495), ('assistant_professor', 0.6484), …] Pre-trained Vectors

  61. model.most_similar(positive=['computer_programmer', 'woman'], negative=['man']) Pre-trained Vectors

  62. model.most_similar(positive=['computer_programmer', 'woman'], negative=['man']) Pre-trained Vectors [('homemaker', 0.5627), ('housewife', 0.5105), ('graphic_designer', 0.5051), ('schoolteacher', 0.4979), ('businesswoman', 0.4934), …]
  63. Culture is biased Pre-trained Vectors

  64. Culture is biased Language is biased Pre-trained Vectors

  65. Culture is biased Language is biased Algorithms are not? Pre-trained Vectors
  66. NOT ONLY WORD2VEC

  67. GloVe (2014)

  68. GloVe (2014) • Global co-occurrence matrix

  69. GloVe (2014) • Global co-occurrence matrix • Much bigger memory footprint

  70. GloVe (2014) • Global co-occurrence matrix • Much bigger memory footprint • Downstream tasks: similar performances
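GloVe vectors are not trained with gensim, but pre-trained GloVe files can be queried through the same KeyedVectors API once converted to word2vec format. A sketch, assuming a downloaded glove.6B.100d.txt file and gensim's glove2word2vec helper (newer gensim versions can also load the file directly):

    from gensim.scripts.glove2word2vec import glove2word2vec
    from gensim.models import KeyedVectors

    # GloVe's text format has no header line; convert it to word2vec format.
    glove2word2vec('glove.6B.100d.txt', 'glove.6B.100d.w2v.txt')

    model = KeyedVectors.load_word2vec_format('glove.6B.100d.w2v.txt')

    # The 6B vectors are lowercased, so query with lowercase words.
    print(model.most_similar(positive=['paris', 'italy'], negative=['france']))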
  71. doc2vec (2014)

  72. doc2vec (2014) • From words to documents

  73. doc2vec (2014) • From words to documents • (or sentences, paragraphs, classes, …)

  74. doc2vec (2014) • From words to documents • (or sentences, paragraphs, classes, …) • P(context | word, label)
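In gensim, doc2vec follows the same pattern as Word2Vec, except each document is wrapped in a TaggedDocument carrying its label(s). A minimal sketch on a made-up toy corpus:

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each document carries one or more tags (the "label" in the slide).
    documents = [
        TaggedDocument(words=['i', 'enjoyed', 'eating', 'some', 'pizza'],
                       tags=['review_1']),
        TaggedDocument(words=['i', 'enjoyed', 'eating', 'some', 'welsh', 'cake'],
                       tags=['review_2']),
    ]

    model = Doc2Vec(documents, min_count=1)

    # Infer a vector for an unseen document and compare it to the training docs
    # (document vectors live in model.dv; model.docvecs in older gensim).
    vector = model.infer_vector(['some', 'pizza', 'at', 'the', 'restaurant'])
    print(model.dv.most_similar([vector]))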
  75. fastText (2016-17)

  76. • word2vec + morphology (sub-words) fastText (2016-17)

  77. • word2vec + morphology (sub-words) • Pre-trained vectors on ~300 languages fastText (2016-17)

  78. • word2vec + morphology (sub-words) • Pre-trained vectors on ~300 languages • morphologically rich languages fastText (2016-17)
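gensim also ships a FastText implementation with the same training interface as Word2Vec; because vectors are built from character n-grams, it can produce a vector even for a word it never saw during training. A minimal sketch on a toy corpus:

    from gensim.models import FastText

    corpus = [
        ['i', 'enjoyed', 'eating', 'some', 'pizza', 'at', 'the', 'restaurant'],
        ['i', 'enjoyed', 'eating', 'some', 'welsh', 'cake', 'at', 'the', 'restaurant'],
    ]

    model = FastText(corpus, min_count=1)

    # 'restaurants' never appears in the corpus, but FastText assembles a
    # vector for it from the sub-word n-grams it shares with 'restaurant'.
    print(model.wv['restaurants'])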
  79. FINAL REMARKS

  80. But we’ve been doing this for X years

  81. But we’ve been doing this for X years • Approaches based on co-occurrences are not new

  82. But we’ve been doing this for X years • Approaches based on co-occurrences are not new • … but usually outperformed by word embeddings

  83. But we’ve been doing this for X years • Approaches based on co-occurrences are not new • … but usually outperformed by word embeddings • … and don’t scale as well as word embeddings
  84. Garbage in, garbage out

  85. Garbage in, garbage out • Pre-trained vectors are useful … until they’re not

  86. Garbage in, garbage out • Pre-trained vectors are useful … until they’re not • The business domain is important

  87. Garbage in, garbage out • Pre-trained vectors are useful … until they’re not • The business domain is important • > 100K words? Maybe train your own model

  88. Garbage in, garbage out • Pre-trained vectors are useful … until they’re not • The business domain is important • > 100K words? Maybe train your own model • > 1M words? Yep, train your own model
  89. Summary

  90. Summary • Word Embeddings are magic! • Big victory of unsupervised learning • Gensim makes your life easy
  91. THANK YOU @MarcoBonzanini speakerdeck.com/marcobonzanini GitHub.com/bonzanini marcobonzanini.com

  92. Credits & Readings

  93. Credits & Readings Credits • Lev Konstantinovskiy (@teagermylk) Readings •

    Deep Learning for NLP (R. Socher) http://cs224d.stanford.edu/ • “GloVe: global vectors for word representation” by Pennington et al. • “Distributed Representation of Sentences and Documents” (doc2vec)
 by Le and Mikolov • “Enriching Word Vectors with Subword Information” (fastText)
 by Bojanokwsi et al.
  94. Credits & Readings Even More Readings • “Man is to

    Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings” by Bolukbasi et al. • “Quantifying and Reducing Stereotypes in Word Embeddings” by Bolukbasi et al. • “Equality of Opportunity in Machine Learning” - Google Research Blog
 https://research.googleblog.com/2016/10/equality-of-opportunity-in-machine.html Pics Credits • Classification: https://commons.wikimedia.org/wiki/File:Cluster-2.svg • Translation: https://commons.wikimedia.org/wiki/File:Translation_-_A_till_%C3%85-colours.svg • Welsh cake: https://commons.wikimedia.org/wiki/File:Closeup_of_Welsh_cakes,_February_2009.jpg • Pizza: https://commons.wikimedia.org/wiki/File:Eq_it-na_pizza-margherita_sep2005_sml.jpg