Word embeddings under the hood - Strata Data Conference

Slides from the talk "Word embeddings under the hood: How neural networks learn from language" as presented on March 8, 2018 at the Strata Data Conference in San Jose.

https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63773

Patrick Harrison

March 08, 2018

Transcript

  1. Word Embeddings Under the Hood
    How Neural Networks Learn from Language

  3. word2vec

  4. “Wait, what?”
    Source: https://tinyurl.com/y9lb6e7j

  5. Good news! Text data is everywhere.

  6. Bad news… there is way too much.
    We need computers to help!

  7. We started with the scallop dish as an appetizer, followed by
    the spaghetti with tomato sauce and duck and foie gras ravioli.
    How do we represent data like this?

  8. One-Hot Encoding
               1  2  3  …  V
    we         1  0  0  …  0
    started    0  1  0  …  0
    with       0  0  1  …  0
    …          …  …  …  …  …
    ravioli    0  0  0  …  1
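One-hot vectors like the ones in the table above can be sketched in a few lines of plain Python. This is an illustration, not code from the talk: the naive tokenizer (split on spaces, strip commas and periods, lowercase) and the first-seen vocabulary order are assumptions, chosen so that “we”, “started”, and “with” take the first three columns and “ravioli” the last one, as on the slide.

```python
sentence = ("We started with the scallop dish as an appetizer, followed by "
            "the spaghetti with tomato sauce and duck and foie gras ravioli.")

def tokenize(text):
    """Naive tokenizer: split on whitespace, drop commas/periods, lowercase."""
    return [word.strip(".,").lower() for word in text.split()]

tokens = tokenize(sentence)

# Give each distinct word a column index in the order it first appears.
vocab = {}
for word in tokens:
    vocab.setdefault(word, len(vocab))

def one_hot(word):
    """Length-V vector: 1 in the word's column, 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(len(vocab))          # 19
print(one_hot("we")[:5])   # [1, 0, 0, 0, 0]
```

Each vector is as long as the vocabulary (19 distinct words here, V in the table), and any two distinct words have a dot product of 0, so one-hot vectors carry no notion of which words are similar.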

  9. …but one-hot encoding
    leaves a lot to be desired.
    Are better word representations possible?

  10. [Plot: the words beer, wine, cocktail, spoon, fork, knife, spaghetti, pasta, and lasagna placed on x and y axes running from -2 to 2]

  11. [Plot: the same words (beer, wine, cocktail, spoon, fork, knife, spaghetti, pasta, lasagna) on x and y axes running from -2 to 2]
                 x      y
    spaghetti    1.0    1.5
    pasta        1.2    1.3
    …            …      …
    fork         0.0   -0.7
    spoon       -0.5   -1.5

  12. “You shall know a word by the company it keeps.”
    — J.R. Firth, 1957
    Postulate #1

  13. “Neural networks learn useful, new data representations.”
    — Rumelhart, Hinton & Williams, 1986
    (paraphrased)
    Postulate #2

  14. context clues + neural networks = ?

  15. Context clues as training data?

  18. spaghetti followed 1
    spaghetti by 1
    … … …
    spaghetti sauce 1

  19. spaghetti followed 1
    spaghetti by 1
    … … …
    spaghetti sauce 1
    spaghetti we 0
    spaghetti parking 0
    … … …
    spaghetti sushi 0
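The context-clue tables above can be generated mechanically. A minimal sketch, assuming a window of 3 words on each side (the window size that reproduces the spaghetti pairs on slides 18 and 19: followed, by, the, with, tomato, sauce) and one random "noise" word per real pair. The noise list is a hypothetical stand-in built from words on the slides; word2vec itself samples negatives from a smoothed unigram distribution over the whole corpus.

```python
import random

def context_clues(tokens, window=3, noise_words=None, seed=0):
    """Return (focus word, other word, label) triples.

    Label 1: `other` occurs within `window` positions of `focus`.
    Label 0: `other` is a random "noise" word paired with the same focus word.
    One noise pair is drawn for every real pair.
    """
    rng = random.Random(seed)
    noise_words = sorted(set(tokens)) if noise_words is None else list(noise_words)
    clues = []
    for i, focus in enumerate(tokens):
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue
            clues.append((focus, tokens[j], 1))                # real context clue
            clues.append((focus, rng.choice(noise_words), 0))  # random negative
    return clues

tokens = ("we started with the scallop dish as an appetizer followed by "
          "the spaghetti with tomato sauce and duck and foie gras ravioli").split()

# Hypothetical noise words, borrowed from the negative examples on the slides.
clues = context_clues(tokens, noise_words=["parking", "sushi", "loud", "up"])
for clue in clues:
    if clue[0] == "spaghetti":
        print(clue)
```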

  20. with by 1
    with the 1
    … … …
    with and 1
    with appetizer 0
    with loud 0
    … … …
    with up 0

  21. Minimum Viable Introduction to
    Neural Networks

  27. “Sigmoid” Activation Function
    $\sigma(z) = \frac{1}{1 + e^{-z}}$
    [Plot: activation value vs. weighted input]

  28. “Sigmoid” Activation Function
    $\sigma(z) = \frac{1}{1 + e^{-z}}$
    [Plot: activation value vs. weighted input, showing an activation value of 0.88]
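The sigmoid formula above translates directly into code. A minimal sketch; the branch on the sign of z is an implementation choice (not from the slides) that keeps `math.exp` from overflowing on large negative inputs.

```python
import math

def sigmoid(z):
    """Map a weighted input z to an activation value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z),
    # which avoids overflow in math.exp(-z) for very negative inputs.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-2, 0, 2):
    print(z, round(sigmoid(z), 2))
```

A weighted input of 2 gives about 0.88; the slide does not say which input produced its 0.88, so that match is only suggestive.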

  39. A neural network for
    learning context clues?

  40. (spaghetti, tomato, 1)

  46. (weight matrix for the hidden layer)

  47. (weight matrix for the hidden layer)

  48. (weight matrix for the output layer)

  49. (weight matrix for the output layer)
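One way to see what the hidden-layer weight matrix does: multiplying a one-hot input by it simply picks out one row. A sketch in plain Python, with a hypothetical vocabulary size V = 6, hidden width D = 3, and random starting weights (none of these values come from the talk).

```python
import random

V, D = 6, 3  # hypothetical vocabulary size and hidden-layer width
rng = random.Random(42)

# Hidden-layer weight matrix: one row of D weights per vocabulary word.
W_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def one_hot(index, size=V):
    """A length-`size` vector with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def vec_times_matrix(vec, matrix):
    """Row vector (length V) times matrix (V x D) gives a row vector (length D)."""
    return [sum(vec[i] * matrix[i][k] for i in range(len(vec)))
            for k in range(len(matrix[0]))]

focus = 2  # hypothetical index of the focus word
hidden = vec_times_matrix(one_hot(focus), W_hidden)
print(hidden == W_hidden[focus])  # True: the product just selects row `focus`
```

Because of this, the hidden layer for a word is just that word's row, and in word2vec those rows are what is kept as the word embeddings after training.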

  52. Training our network
    on the first context clue

  53. Training our network
    on the first context clue
    (there will be lots of these)

  54. 1. Make a prediction

  55. 1. Make a prediction
    2. Measure how wrong we are

  56. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong

  57. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong
    4. Repeat with the next context clue

  58. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong
    4. Repeat with the next context clue

  68. [Plot: activation value vs. weighted input]

  69. [Plot: activation value vs. weighted input, showing an activation value of 0.51]

  71. “forward pass”
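A forward pass for a single context clue, assuming the skip-gram negative sampling setup named on slide 136: the hidden layer is the focus word's row of the hidden weight matrix, the weighted input to the output unit is its dot product with the context word's row of the output weight matrix, and the sigmoid turns that into a prediction. Sizes, indices, and the random initialization are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(focus, context, W_hidden, W_output):
    """Predict the probability that `context` really appeared near `focus`."""
    h = W_hidden[focus]                                # hidden layer = a row lookup
    z = sum(a * b for a, b in zip(h, W_output[context]))  # weighted input
    return sigmoid(z)                                  # activation = prediction

V, D = 6, 3  # hypothetical vocabulary size and hidden-layer width
rng = random.Random(0)
W_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_output = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

y_hat = forward(focus=2, context=4, W_hidden=W_hidden, W_output=W_output)
print(round(y_hat, 2))
```

With small random starting weights the weighted input stays close to 0, so an untrained network predicts something near 0.5, in line with the 0.51 shown on slide 69.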

  72. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong
    4. Repeat with the next context clue

  74. “Loss” Function
    $L(\hat{y}) = -\ln(\hat{y})$
    right answer: 1
    [Plot: penalty vs. model prediction]

  75. “Loss” Function
    $L(\hat{y}) = -\ln(1 - \hat{y})$
    right answer: 0
    [Plot: penalty vs. model prediction]

  76. “Loss” Function
    $L(\hat{y}) = -\ln(\hat{y})$
    right answer: 1
    [Plot: penalty vs. model prediction]

  77. “Loss” Function
    $L(\hat{y}) = -\ln(\hat{y})$
    right answer: 1
    [Plot: penalty vs. model prediction; a prediction of 0.51 gives a penalty of 0.67]
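The penalty curves above are the log loss. A short sketch that reproduces the slide's numbers: with a right answer of 1, a prediction of 0.51 costs -ln(0.51), about 0.67.

```python
import math

def log_loss(y_hat, y):
    """Penalty for predicting y_hat when the right answer is y (1 or 0)."""
    return -math.log(y_hat) if y == 1 else -math.log(1.0 - y_hat)

print(round(log_loss(0.51, 1), 2))  # 0.67
print(round(log_loss(0.51, 0), 2))  # 0.71
```

The penalty grows without bound as the prediction moves toward the wrong answer, which is what makes confident mistakes expensive.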

  78. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong
    4. Repeat with the next context clue

  79. “Loss” Function
    $L(\hat{y}) = -\ln(\hat{y})$
    right answer: 1
    [Plot: penalty vs. model prediction; a prediction of 0.51 gives a penalty of 0.67]

  80. “Loss” Function
    $L(\hat{y}) = -\ln(\hat{y})$
    $\frac{\partial L(\hat{y})}{\partial \hat{y}} = -\frac{1}{\hat{y}}$
    right answer: 1
    [Plot: penalty vs. model prediction; a prediction of 0.51 gives a penalty of 0.67]
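The derivative of the loss can be sanity-checked numerically. The sketch compares the analytic -1/ŷ with a central finite difference; the step size `eps` is an arbitrary choice.

```python
import math

def loss(y_hat):
    """L(y_hat) = -ln(y_hat), the penalty when the right answer is 1."""
    return -math.log(y_hat)

def dloss(y_hat):
    """Analytic derivative: dL/dy_hat = -1 / y_hat."""
    return -1.0 / y_hat

def numeric_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

y_hat = 0.51
print(round(dloss(y_hat), 3))  # -1.961
print(round(numeric_derivative(loss, y_hat), 3))
```

The negative sign says that raising the prediction lowers the penalty, so a weight change that nudges ŷ upward is the one that helps.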

  81. Move in the opposite direction
    from the gradient
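"Move in the opposite direction from the gradient" is a one-line update rule. A minimal sketch on a hypothetical one-weight function f(w) = (w - 3)^2, whose gradient is 2(w - 3) and whose minimum is at w = 3; the learning rate of 0.1 is an arbitrary choice.

```python
def gradient_descent_step(weight, gradient, learning_rate=0.1):
    """Nudge a weight a small step in the direction opposite its gradient."""
    return weight - learning_rate * gradient

# Hypothetical one-weight "loss": f(w) = (w - 3) ** 2, with gradient 2 * (w - 3).
w = 0.0
for _ in range(50):
    gradient = 2 * (w - 3)
    w = gradient_descent_step(w, gradient)
print(round(w, 3))  # 3.0
```

A positive gradient means the loss rises as the weight rises, so subtracting it lowers the weight, and vice versa; the learning rate controls how large each nudge is.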

  90. It works! Now…

  91. It works! Now…
    1. How do we know which direction
    to nudge a weight?

  92. It works! Now…
    1. How do we know which direction
    to nudge a weight?
    2. How can we calculate this
    automatically for all the weights?

  109. “back propagation”
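Backpropagation applies the chain rule from the loss back to each weight. For one context clue, assuming the sigmoid-of-dot-product output used in skip-gram negative sampling, the gradient at the weighted input z simplifies to ŷ - y for either label, and the gradients for the two word vectors follow because z is a dot product. The vectors below are hypothetical; the last lines check one gradient against a finite difference.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(h, u, y):
    """Log loss of the prediction sigmoid(h . u) for a right answer y (1 or 0)."""
    y_hat = sigmoid(dot(h, u))
    return -math.log(y_hat) if y == 1 else -math.log(1.0 - y_hat)

def backprop(h, u, y):
    """Return (dL/dh, dL/du) for a single context clue.

    Chain rule: dL/dz = (dL/dy_hat) * (dy_hat/dz) = y_hat - y.
    Because z = sum_k h[k] * u[k], dL/dh[k] = (y_hat - y) * u[k]
    and dL/du[k] = (y_hat - y) * h[k].
    """
    dz = sigmoid(dot(h, u)) - y
    return [dz * uk for uk in u], [dz * hk for hk in h]

h = [0.2, -0.1, 0.4]  # hypothetical hidden-layer values for the focus word
u = [0.3, 0.5, -0.2]  # hypothetical output weights for the context word
grad_h, grad_u = backprop(h, u, y=1)

# Finite-difference check of dL/dh[0].
eps = 1e-6
h_up = [h[0] + eps] + h[1:]
h_down = [h[0] - eps] + h[1:]
numeric = (loss(h_up, u, 1) - loss(h_down, u, 1)) / (2 * eps)
print(abs(numeric - grad_h[0]) < 1e-7)  # True
```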

  115. Did it work?

  118. “stochastic gradient descent”

  119. Context clues trained: 1

  120. 1. Make a prediction
    2. Measure how wrong we are
    3. Tune the model to become
    slightly less wrong
    4. Repeat with the next context clue
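Repeating the four steps above, one small update per context clue, is stochastic gradient descent. The sketch below is a toy trainer on the single example sentence, not the talk's code. Window size, vector dimension, learning rate, epoch count, uniform negative sampling, and the initialization (small random hidden weights, zero output weights) are all assumptions; real word2vec training also uses far more text and extra tricks such as frequent-word subsampling and a decaying learning rate.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def make_clues(tokens, window=3, seed=0):
    """One real neighbor (label 1) plus one uniformly random word (label 0) per pair."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    clues = []
    for i, focus in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                clues.append((focus, tokens[j], 1))
                clues.append((focus, rng.choice(vocab), 0))
    return clues, vocab

def train(clues, vocab, dim=8, learning_rate=0.1, epochs=50, seed=0):
    """Toy skip-gram-with-negative-sampling trainer; returns rows, index, loss history."""
    rng = random.Random(seed)
    clues = list(clues)  # shuffle a copy, not the caller's list
    index = {w: i for i, w in enumerate(vocab)}
    W_hidden = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    W_output = [[0.0] * dim for _ in vocab]
    history = []
    for _ in range(epochs):
        rng.shuffle(clues)
        total = 0.0
        for focus, other, label in clues:
            h, u = W_hidden[index[focus]], W_output[index[other]]
            y_hat = sigmoid(sum(a * b for a, b in zip(h, u)))          # 1. predict
            total += -math.log(y_hat if label == 1 else 1.0 - y_hat)  # 2. measure loss
            dz = y_hat - label                                         # 3. tune both vectors
            for k in range(dim):
                h_k, u_k = h[k], u[k]
                h[k] -= learning_rate * dz * u_k
                u[k] -= learning_rate * dz * h_k
        history.append(total / len(clues))                            # 4. repeat
    return W_hidden, index, history

tokens = ("we started with the scallop dish as an appetizer followed by "
          "the spaghetti with tomato sauce and duck and foie gras ravioli").split()
clues, vocab = make_clues(tokens)
embeddings, index, history = train(clues, vocab)
print(round(history[0], 3), round(history[-1], 3))
```

Because the output weights start at zero, every early prediction is 0.5, so the first epoch's average loss sits near ln 2 (about 0.693) and should fall from there as the vectors start to fit the clues.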

  125. Learning to predict
    Learning to represent

  127. We can measure distances!
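Once words are vectors, "distance" can mean straight-line (Euclidean) distance or cosine similarity, which compares only direction and is a common choice for nearest-neighbor lists like the ones on the following slides. A sketch using the four 2-D coordinates from the table on slide 11; real embeddings would have many more dimensions.

```python
import math

# 2-D coordinates copied from the (x, y) table on slide 11.
vectors = {
    "spaghetti": (1.0, 1.5),
    "pasta": (1.2, 1.3),
    "fork": (0.0, -0.7),
    "spoon": (-0.5, -1.5),
}

def euclidean(a, b):
    return math.dist(a, b)

def cosine(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(word, k=3):
    """Rank the other words by cosine similarity to `word`, most similar first."""
    target = vectors[word]
    scored = sorted(((cosine(target, v), w) for w, v in vectors.items() if w != word),
                    reverse=True)
    return [w for _, w in scored[:k]]

print(round(euclidean(vectors["spaghetti"], vectors["pasta"]), 2))  # 0.28
print(nearest("spaghetti"))  # ['pasta', 'fork', 'spoon']
print(nearest("fork", k=1))  # ['spoon']
```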

  128. Context clues trained: 1
    “amazing” as focus word: 0
      topics, discerning, masked, sweets, carmelized, shelly, cue, prepare
    “server” as focus word: 0
      cheerful, succulent, adjusting, pop, antenna, suggesting, vinegary, brothers
    “spaghetti” as focus word: 1
      ignorant, sop, refrigerators, bags, recliner, introduce, covered, petco

  129. Fast forward…

  130. Context clues trained: 2,000,000
    “amazing” as focus word: 1,854
      awesome, delicious, super, here, ), $, customer, excellent
    “server” as focus word: 780
      thru, along, crab, tacos, /, windows, chef, 1
    “spaghetti” as focus word: 84
      dollar, rings, loves, =, opened, wrapped, form, provided

  131. Fast forward…

  132. Context clues trained: 100,000,000
    “amazing” as focus word: 87,864
      incredible, awesome, outstanding, excellent, phenomenal, fabulous, superb, fantastic
    “server” as focus word: 48,492
      waiter, waitress, bartender, hostess, guide, technician, cashier, barista
    “spaghetti” as focus word: 3,600
      risotto, veal, katsu, goat, turkey, enchilada, raspberry, meatloaf

  135. bun − american + mexican ≈ tortilla
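The word arithmetic above can be sketched with cosine similarity: compute bun − american + mexican and return the nearest remaining word. The vectors here are hand-made for illustration, not trained embeddings, and excluding the three input words from the candidates is a common convention in analogy tests rather than something the slide specifies.

```python
import math

# Hypothetical 3-D vectors, hand-made for illustration (not trained).
vectors = {
    "bun": (0.9, 0.8, 0.1),
    "american": (0.1, 0.9, 0.0),
    "mexican": (0.0, 0.1, 0.9),
    "tortilla": (0.8, 0.1, 0.9),
    "taco": (0.3, 0.0, 1.0),
    "burger": (0.5, 0.9, 0.0),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, minus, plus):
    """Word closest to vec(a) - vec(minus) + vec(plus), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[minus], vectors[plus])]
    candidates = (w for w in vectors if w not in {a, minus, plus})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("bun", "american", "mexican"))  # tortilla
```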

  136. “word2vec skip-gram negative sampling”

  137. No magical black box AI…
    Just context clues and some arithmetic!
    Bonus: now you know the fundamentals
    of all neural network learning

  138. careers.spglobal.com
