
Deep Learning

Lucy
September 13, 2016


Transcript

  1. Agenda What is deep learning? Why is this important? How

    to apply deep learning to natural language processing?
  2. About Me: Sihem Romdhani. Software engineer at Veeva. MEng in
     Computer Engineering; MSc in ECE at the University of Waterloo. Interests: deep learning, speech/image recognition.
  3. What is machine learning? Training: text (documents, images,
     sounds, ...) plus labels → feature vectors → machine learning algorithm.
  4. What is machine learning? Training: text (documents, images,
     sounds, ...) plus labels → feature vectors → machine learning algorithm → predictive model. Prediction: new text (documents, images, sounds, ...) → feature vector → predictive model → expected label.
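The train/predict loop on these two slides can be sketched end to end in a few lines. This is a toy illustration, not the talk's implementation: the documents, labels, and the nearest-centroid "learning algorithm" are all my own placeholders standing in for the generic pipeline (documents + labels → feature vectors → algorithm → predictive model → expected label).

```python
# Toy sketch of the slides' pipeline: documents + labels -> feature
# vectors -> learning algorithm -> predictive model -> expected label.
# All data and the nearest-centroid classifier are illustrative.
from collections import Counter

def featurize(text, vocab):
    """Map a document to a fixed-size count vector over `vocab`."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

# Training data: "documents, images, sounds, ..." reduced to toy text.
train_docs = ["cheap pills buy now", "meeting agenda for monday",
              "buy cheap now", "monday project meeting"]
train_labels = ["spam", "ham", "spam", "ham"]

vocab = sorted({w for d in train_docs for w in d.lower().split()})
X = [featurize(d, vocab) for d in train_docs]

# "Learning algorithm": average the vectors of each class (nearest centroid).
centroids = {}
for label in set(train_labels):
    rows = [x for x, y in zip(X, train_labels) if y == label]
    centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

def predict(text):
    """Predictive model: label of the closest class centroid."""
    v = featurize(text, vocab)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

print(predict("buy pills now"))   # -> spam
```

The point of the sketch is the shape of the pipeline, not the classifier: swap in any featurizer and any learning algorithm and the train/predict split stays the same.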
  5. What is deep learning? A deep neural network (DNN) learns
     layers of abstraction from the input layer to the output layer: image (pixels) → edges → object parts → object models.
  6. Speech technology. DeepMind beats the best at Go. Nvidia
     drives a car with the Drive PX 2 supercomputer. Apple’s Siri. Amazon’s Alexa.
  7. Which is human? Which is machine? SONG A: “I’m in a love
     affair / I can’t share / it ain’t fair / Haha I’m just playin’ ladies you know I love you / I know my love is true and I know you love me too / Girl I’m down for whatever cause my love is true.” SONG B: “For a chance at romance I would love to enhance / But everything I love has turned to a tedious task / One day we gonna have to leave our love in the past / I love my fans but no one ever puts a grasp.”
  8. Human language is difficult! Variability: kitty vs. cat. Ambiguity: blue
     jays vs. Blue Jays. Vague and latent meaning: “Eating babies can be messy.”
  9. Word embeddings: ‘CAT’ → [0, 0.5, 0, 0.2, 0.1, 0];
     ‘KITTY’ → [0.1, 0, 0.3, 0, 0.4, 0.1]. Similarity?
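The natural way to score the "Similarity?" question on this slide is cosine similarity between the two embeddings. A minimal sketch, assuming the twelve numbers on the slide split into two 6-dimensional vectors (that split is my reading of the transcript, not stated explicitly):

```python
# Cosine similarity between the slide's two toy embeddings.
# The 6-dim split of the twelve numbers is an assumption.
import math

cat   = [0.0, 0.5, 0.0, 0.2, 0.1, 0.0]
kitty = [0.1, 0.0, 0.3, 0.0, 0.4, 0.1]

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

print(round(cosine(cat, kitty), 3))   # -> 0.141
```

A score near 1 would mean near-identical directions; these random-looking toy vectors score low, which is exactly the problem the next slide's context-based training is meant to fix.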
  10. “The cat/kitty purrs.” “The cat/kitty hunts mice.” Context defines
      similarity: infer the meaning of words from the company they keep.
  11. Word2Vec (Google) for learning embeddings. 1. Collect a large corpus of
      text. 2. Randomly initialize the embeddings: each word is mapped to a fixed-size embedding with random values. 3. Set up the prediction task: for each word, pick a random word from its context and treat it as the target (label). INPUT WORD → TARGET WORD, e.g. ‘Cat’ → ‘Hunt’. 4. Learn the embeddings: train a model (logistic classifier) to predict the target (the neighbours of each word) and update the embeddings.
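Step 3 above, generating (input word, target word) training pairs from context windows, is easy to make concrete. A minimal sketch; the corpus and window size are illustrative, not from the talk:

```python
# Skip-gram-style pair generation (step 3 of the Word2Vec recipe):
# for each word, pick one random word from its context window as the target.
import random

corpus = "the cat hunts mice the kitty hunts mice".split()
window = 2          # illustrative context radius
random.seed(0)      # for reproducibility of this sketch

pairs = []
for i, word in enumerate(corpus):
    # context positions within `window` of i, excluding i itself
    lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
    context = [corpus[j] for j in range(lo, hi) if j != i]
    target = random.choice(context)   # one random neighbour as the label
    pairs.append((word, target))

print(pairs[:3])
```

Each pair then feeds step 4: the classifier is trained to predict the target from the input word's embedding, and the gradient updates push words with similar contexts toward similar embeddings.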
  12. Word2Vec process: the one-hot vector for ‘cat’ over the
      vocabulary, [0 0 0 0 0 1 0 0 0 0], selects its embedding; a linear model wVcat + b produces a predicted target, which is compared with the real target ‘hunt’.
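A useful detail behind this slide: multiplying a one-hot vector by the embedding matrix just selects one row, which is why embeddings are implemented as table lookups rather than matrix products. A sketch with illustrative numbers (the vocabulary and matrix are mine, not the talk's):

```python
# One-hot times embedding matrix == row lookup. Numbers are illustrative.
vocab = ["the", "cat", "hunts", "mice"]

# Embedding matrix V: one row per vocabulary word (4 words x 3 dims here).
V = [
    [0.1, 0.2, 0.3],
    [0.5, 0.1, 0.4],
    [0.2, 0.2, 0.1],
    [0.3, 0.4, 0.2],
]

one_hot = [1 if w == "cat" else 0 for w in vocab]   # [0, 1, 0, 0]

# Full matrix product one_hot @ V ...
matmul = [sum(h * V[i][d] for i, h in enumerate(one_hot)) for d in range(3)]
# ... equals a direct row lookup:
lookup = V[vocab.index("cat")]

print(matmul == lookup)   # -> True
```

The linear layer wVcat + b on the slide then operates on that looked-up row, not on the sparse one-hot vector.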
  13. Extracting addresses from webpages. Abbreviations: Avenue - Ave - Ave.
      Ambiguity in names: main vs. Main Street. Missing elements, misspellings, misordered elements.
  14. Solution architecture: WEB PAGE (HTML) → TOKENIZATION → WORD REPRESENTATION /
      WORD EMBEDDING → DEEP NEURAL NETWORK CLASSIFIER → POST-PROCESSING → OUTPUT: EXTRACTED ADDRESS.
  19. Training a DNN classifier. Example: “123 Main Street, Florida 32209”. The
      context words “123 Street Florida 32209” surround the central word “Main”; the network (Hidden Layers 1-3) labels the central word A (Address) or O (Other, non-address).
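The DNN input described on this slide, the concatenated embeddings of a window around the central word, can be sketched directly. The embeddings, window size, and padding scheme below are illustrative assumptions, not details from the talk:

```python
# Build the classifier input as on the slide: concatenate the embeddings
# of the words around a central word. Embeddings, window size, and the
# zero-padding at sentence edges are illustrative choices.
tokens = ["123", "Main", "Street", ",", "Florida", "32209"]

# Toy 3-dim embedding table keyed by token (values are arbitrary).
emb = {t: [round(0.1 * i + 0.01 * d, 2) for d in range(3)]
       for i, t in enumerate(tokens)}

window = 2                  # two words on either side of the centre
PAD = [0.0, 0.0, 0.0]       # padding for positions past the edges

def dnn_input(tokens, centre):
    """Concatenated embeddings of the 2*window+1 words centred on `centre`."""
    vec = []
    for j in range(centre - window, centre + window + 1):
        vec += emb[tokens[j]] if 0 <= j < len(tokens) else PAD
    return vec

x = dnn_input(tokens, tokens.index("Main"))
print(len(x))   # (2*window + 1) * embedding_dim = 15
```

The fixed-length vector `x` is what feeds the hidden layers; the network's output for it is the A/O label of the central word.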
  20. Address extraction algorithm. Input: trained DNN model and unlabeled web
      page. Output: extracted address.
      1. Load the trained DNN model
      2. for each new web page do
      3.   Pre-process the web page to remove HTML tags, then tokenize
      4.   Replace each token (word) by its embedding vector
      5.   Create the DNN input by concatenating the embeddings of the context
      6.   The DNN labels the central word of the context as A (ADDRESS) or O (OTHER)
      7.   Extract the sequences of tokens with the A label
      8.   if the total number of tokens is within range then
      9.     Output the token block as the extracted address
      10.  end if
      11. end for
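The extraction loop above can be sketched in Python. Everything here is a placeholder for illustration: `stub_model` stands in for the trained DNN, the regex tokenizer is naive, and the length range is invented; only the control flow (strip HTML, tokenize, label, collect A-runs, filter by length) follows the slide:

```python
# Sketch of the address extraction algorithm. The DNN is replaced by a
# keyword/digit stub; tokenizer and length range are illustrative.
import re

def strip_html(page):
    """Crude HTML tag removal (placeholder for real pre-processing)."""
    return re.sub(r"<[^>]+>", " ", page)

def stub_model(tokens, i):
    """Placeholder for the DNN: label token i as 'A' or 'O'."""
    t = tokens[i].strip(",.")
    addressy = {"Main", "Street", "Florida"}          # toy heuristic
    return "A" if t in addressy or t.isdigit() else "O"

def extract_address(page, min_len=3, max_len=10):
    tokens = strip_html(page).split()
    labels = [stub_model(tokens, i) for i in range(len(tokens))]
    # Collect maximal runs of A-labelled tokens.
    runs, run = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "A":
            run.append(tok)
        elif run:
            runs.append(run)
            run = []
    if run:
        runs.append(run)
    # Keep only runs whose token count is within range.
    return [" ".join(r) for r in runs if min_len <= len(r) <= max_len]

page = "<p>Visit us at 123 Main Street, Florida 32209 today</p>"
print(extract_address(page))   # -> ['123 Main Street, Florida 32209']
```

Swapping `stub_model` for the real embedding-plus-DNN labeler from slides 19-20 leaves the rest of the loop unchanged, which is the point of the pipeline design.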
  21. Applications of NLP: information extraction, machine translation, social media analytics,
      question answering, natural language generation, speech recognition.