
Implicit Emotion Classification with Deep Contextualized Word Representations

Jorge Balazs
October 31, 2018


Presentation given at WASSA 2018 (@ EMNLP2018) as part of the Implicit Emotion Shared Task (IEST).

You can read the paper here: https://arxiv.org/abs/1808.08672

And find the source code for the slides and paper (along with the solution to the task itself) here: https://github.com/jabalazs/implicit_emotion



Transcript

  1. IIIDYT AT IEST 2018: IMPLICIT EMOTION CLASSIFICATION WITH DEEP CONTEXTUALIZED WORD REPRESENTATIONS

    Jorge A. Balazs, Edison Marrese-Taylor, Yutaka Matsuo. https://arxiv.org/abs/1808.08672
  2-5. PREPROCESSING

    We wanted a single format for special tokens. The replacements were chosen arbitrarily; shorter replacements did not impact performance significantly. Completely removing [#TRIGGERWORD#] had a negative impact on our best model. We tokenized the data using an emoji-aware modification of the twokenize.py script.
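The token normalization described above can be sketched as follows. The placeholder strings and the exact set of special tokens here are assumptions for illustration; the actual pipeline used an emoji-aware modification of twokenize.py.

```python
import re

# Hypothetical placeholders; the paper notes the real replacements
# were chosen arbitrarily as well.
REPLACEMENTS = [
    (re.compile(r"\[#TRIGGERWORD#\]"), "__TRIGGERWORD__"),
    (re.compile(r"@USERNAME"), "__USERNAME__"),
    (re.compile(r"\[NEWLINE\]"), "__NEWLINE__"),
]

def normalize(tweet):
    """Map every special token to a single placeholder format."""
    for pattern, replacement in REPLACEMENTS:
        tweet = pattern.sub(replacement, tweet)
    return tweet

print(normalize("I feel so [#TRIGGERWORD#] today @USERNAME"))
# I feel so __TRIGGERWORD__ today __USERNAME__
```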
  6. HYPERPARAMETERS

    ELMo layer: official implementation with default parameters.
    Dimensionalities: ELMo output; BiLSTM output (for each direction); sentence vector representation; fully-connected (FC) layer input, hidden, and output.
    Loss function: cross-entropy.
    Optimizer: default Adam.
    Learning rate: slanted triangular schedule (Howard and Ruder 2018).
    Regularization: dropout (after the ELMo layer and the FC hidden layer; after the max-pooling layer).
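The slanted triangular schedule of Howard and Ruder (2018) is a short linear warm-up followed by a long linear decay. A minimal sketch, using the default hyperparameter values from their paper (not necessarily the values used in this work):

```python
def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular LR (Howard and Ruder, 2018).

    t: current iteration, T: total iterations.
    Linearly increases the LR for the first cut_frac fraction of
    training, then linearly decays it; ratio controls how much
    smaller the lowest LR is than lr_max.
    """
    cut = int(T * cut_frac)
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

T = 1000
lrs = [slanted_triangular_lr(t, T) for t in range(T)]
print(max(lrs))  # peak lr_max = 0.01, reached at the end of the warm-up
```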
  7. ENSEMBLES

    We tried combinations of 9 trained models initialized with different random seeds. Similar to Bonab and Can (2016), we found that ensembling 6 models yielded the best results.
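One common way to ensemble independently seeded classifiers is to average their class probabilities and take the per-example argmax; a minimal numpy sketch with random stand-ins for the real model outputs (the averaging strategy itself is an assumption, the slide does not specify the combination rule):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for softmax outputs of 9 seeded models on 5 examples.
n_models, n_examples, n_classes = 9, 5, 8
probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_examples))

def ensemble_predict(probs, k):
    """Average the class probabilities of the first k models,
    then take the argmax class per example."""
    return probs[:k].mean(axis=0).argmax(axis=1)

# The paper found that ensembling 6 of the 9 models worked best.
print(ensemble_predict(probs, 6))
```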
  8-12. ABLATION STUDY

    ELMo provided the biggest boost in performance. Emoji also helped (see analysis). Concat pooling (Howard and Ruder 2018) did not help. Different BiLSTM sizes did not improve results. POS tag embeddings of dimension 50 slightly helped. An SGD optimizer with a simpler LR schedule (Conneau et al. 2017) did not help.
  13. ABLATION STUDY: DROPOUT

    The best dropout configurations concentrated around high values for word-level representations and low values for sentence-level representations.
  14-15. ERROR ANALYSIS

    Confusion matrix and classification report: anger was the hardest class to predict; joy was the easiest one (probably due to an annotation artifact).
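The per-class figures in a classification report can be read straight off a confusion matrix. The counts below are made up for illustration (the real matrix is in the paper); only the six IEST emotion classes are taken from the source.

```python
import numpy as np

classes = ["anger", "disgust", "fear", "joy", "sad", "surprise"]

# Made-up confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [50, 10,  8,  5, 12, 15],
    [ 9, 60,  7,  4, 10, 10],
    [ 8,  7, 65,  3,  9,  8],
    [ 4,  3,  2, 85,  3,  3],
    [10,  9,  8,  4, 60,  9],
    [12,  8,  7,  4,  8, 61],
])

recall = cm.diagonal() / cm.sum(axis=1)     # fraction of each true class recovered
precision = cm.diagonal() / cm.sum(axis=0)  # fraction of each prediction correct
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(classes, precision, recall, f1):
    print(f"{name:8s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```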
  16. ERROR ANALYSIS

    PCA projection of the test sentence representations: the separate joy cluster corresponds to the sentences containing the “un[#TRIGGERWORD#]” pattern.
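A 2-D PCA projection of sentence representations can be computed with plain numpy via SVD on the mean-centered matrix; random vectors stand in for the actual representations here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for test-set sentence representations (n examples x d dims).
reps = rng.normal(size=(200, 128))

def pca_project(X, k=2):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                        # mean-center columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # (n, k) coordinates

coords = pca_project(reps)
print(coords.shape)  # (200, 2)
```

Clusters such as the separate joy group become visible when these 2-D coordinates are scatter-plotted and colored by gold label.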
  17. AMOUNT OF TRAINING DATA

    The upward trend suggests that the model is expressive enough to learn from new data and is not overfitting the training set.
  18-19. EMOJI & HASHTAGS

    Number of examples with and without emoji and hashtags; numbers in parentheses correspond to the percentage of examples classified correctly. Emoji, and hashtags to a lesser extent, seem to be good discriminating features.
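That table reduces to grouping predictions by whether each tweet contains emoji or hashtags. A minimal sketch; the emoji test is a crude codepoint-range check, an assumption rather than the authors' method.

```python
def has_hashtag(tweet):
    return any(tok.startswith("#") for tok in tweet.split())

def has_emoji(tweet):
    # Crude check covering a few common emoji codepoint blocks.
    return any(0x1F300 <= ord(ch) <= 0x1FAFF or 0x2600 <= ord(ch) <= 0x27BF
               for ch in tweet)

def accuracy_by_group(tweets, correct, predicate):
    """Count examples, and the percentage classified correctly,
    split by whether `predicate` holds for the tweet."""
    groups = {True: [0, 0], False: [0, 0]}  # has_feature -> [n, n_correct]
    for tweet, ok in zip(tweets, correct):
        g = groups[predicate(tweet)]
        g[0] += 1
        g[1] += ok
    return {k: (n, 100.0 * c / n if n else 0.0) for k, (n, c) in groups.items()}

tweets = ["so happy 😀 today", "bad day #fml", "just tired", "love this 😀 #yes"]
correct = [1, 0, 1, 1]  # 1 = model classified this example correctly
print(accuracy_by_group(tweets, correct, has_emoji))
# {True: (2, 100.0), False: (2, 50.0)}
```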
  20-22. EMOJI & HASHTAGS

    ❤, rage, mask, and cry were the most informative emoji. Counterintuitively, sob was less informative than cry, despite representing a stronger emotion. Removing sweat_smile and confused improved results.
  23-24. CONCLUSIONS

    We obtained competitive results with simple preprocessing, almost no external data dependencies (save for the pretrained ELMo language model), and a simple architecture.
  25-27. CONCLUSIONS

    We showed that the “un[#TRIGGERWORD#]” artifact had a significant impact on the final example representations (as shown by the PCA projection), which in turn made the model better at classifying joy examples, and that emoji and hashtags were good features for implicit emotion classification.
  28-29. FUTURE WORK

    Ensemble models with added POS tag features. Perform fine-grained hashtag analysis. Implement architectural improvements.
  30. REFERENCES

    Bonab, Hamed R., and Fazli Can. 2016. “A Theoretical Framework on the Ideal Number of Classifiers for Online Ensembles in Data Streams.” In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM ’16), 2053–56. New York, NY, USA: ACM. https://doi.org/10.1145/2983323.2983907
    Conneau, Alexis, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. “Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 670–80. Copenhagen, Denmark: Association for Computational Linguistics. https://www.aclweb.org/anthology/D17-1070
    Howard, Jeremy, and Sebastian Ruder. 2018. “Universal Language Model Fine-tuning for Text Classification.” arXiv preprint. http://arxiv.org/abs/1801.06146