Lifelong Learning CRF for Supervised Aspect Extraction

vhqviet
September 28, 2018

Transcript

  1. Literature review: Lifelong Learning CRF for Supervised Aspect Extraction

    Lei Shu | Hu Xu | Bing Liu. ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Volume 2: Short Papers), pages 148-154, Jan 2017.
    VO HUYNH QUOC VIET, Natural Language Processing Laboratory, Nagaoka University of Technology, 2018/09/28
  2. Introduction

    • Aspect Extraction
      • Identify aspects in customer reviews.
      • For example: “The battery of this camera is great”
    • Supervised Sequence Labeling
      • Using a Conditional Random Field (CRF)
    • Lifelong Machine Learning (LML)
      • Uses LML to help a supervised extraction method markedly improve its results.
  3. Method: key idea

    • Assume the model has been applied to many domains d1, d2, …, dn and has obtained their extraction result sets r1, r2, …, rn.
    • Then apply the model to the new domain dn+1:
      • Use Reliable Results (R) from r1, r2, …, rn to generate features for dn+1:
        • Highly frequent in one domain
        • Common in multiple past domains
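
A minimal sketch of how the reliable results R could be selected from past extraction results, assuming the two criteria above are combined (frequent within a domain, and shared by several domains). The function name `reliable_aspects` and the thresholds `min_freq` and `min_domains` are illustrative assumptions, not values from the paper.

```python
from collections import Counter


def reliable_aspects(past_results, min_freq=2, min_domains=2):
    """Pick aspects that are frequent in a domain and common across domains.

    past_results: one list of extracted aspect terms per past domain
    (r1, ..., rn), with repeats. Both thresholds are assumed values.
    """
    domain_support = Counter()
    for extracted in past_results:
        counts = Counter(extracted)
        # Criterion 1: highly frequent within this one domain.
        frequent = {aspect for aspect, c in counts.items() if c >= min_freq}
        domain_support.update(frequent)
    # Criterion 2: common in multiple past domains.
    return {a for a, n in domain_support.items() if n >= min_domains}


# Toy extraction results for three hypothetical past domains.
past = [
    ["battery", "battery", "screen", "lens", "lens"],
    ["battery", "battery", "price", "screen"],
    ["price", "price", "battery", "battery", "battery"],
]
print(sorted(reliable_aspects(past)))
```

In this toy data only "battery" passes both thresholds; "lens" and "price" are each frequent in a single domain only.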
  4. Method: dependency feature

    • For example, apply the model to a new domain dn+1, “Phone”.
    • Leveraging the past reliable result “battery”:
      • “phone” in “The battery of this phone” is likely to be predicted as an aspect,
      • since “battery” is known to be a reliable aspect.
    • Without any past reliable result:
      • “signal” and “phone” in “The signal of this phone” are unlikely to be predicted as aspects,
      • since neither “signal” nor “phone” is a reliable aspect.
  5. Method: dependency feature

    • Dependency relation: (type, gov word, gov POS tag, dep word, dep POS tag)
    • Generalize the dependency relation and link it with R (the reliable results):
      • If a word is in R, replace it with the label “A”; otherwise, use the label “O”.
    • Example sentence: “The battery of this phone is great” (“battery” is in R)
      Step 1: Extract the dependency relation: (nmod, battery, NN, phone, NN)
      Step 2: Replace the current word’s info with the wildcard “*”: (nmod, battery, NN, *)
      Step 3: Replace the related word with its label: (nmod, A, NN, *)
      → Apply the learning algorithm.
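
The three steps can be sketched as a small function. This is an illustration only: the name `generalize` is hypothetical, the relation tuple is assumed to come from an external dependency parser, and the handling of the case where the current word is the governor is an assumption made by symmetry (the slide only shows the dependent case).

```python
def generalize(relation, current_word, reliable):
    """Generalize one dependency relation for the word being labeled.

    relation: (type, gov word, gov POS tag, dep word, dep POS tag)
    reliable: the set R of reliable aspects from past domains
    """
    rel_type, gov, gov_pos, dep, dep_pos = relation
    if current_word == dep:
        # Steps 2 and 3 as on the slide: wildcard the current word,
        # replace the related (governor) word with "A" or "O".
        return (rel_type, "A" if gov in reliable else "O", gov_pos, "*")
    if current_word == gov:
        # Assumed mirror case, not shown on the slide.
        return (rel_type, "*", "A" if dep in reliable else "O", dep_pos)
    raise ValueError("current_word is not part of this relation")


relation = ("nmod", "battery", "NN", "phone", "NN")
print(generalize(relation, "phone", reliable={"battery"}))
print(generalize(relation, "phone", reliable=set()))
```

With “battery” in R the feature for “phone” is `('nmod', 'A', 'NN', '*')`; with an empty R the same relation yields `('nmod', 'O', 'NN', '*')`, which is how past knowledge changes the features seen by the model.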
  6. Method: Lifelong-CRF (L-CRF) algorithm, which works in two phases

    • Training phase:
      • Train a CRF model on the training data as usual.
    • Lifelong extraction phase:
      • The model has extracted aspects from data in n previous domains D1, …, Dn, and the extracted sets of aspects are A1, …, An.
      • The system is then faced with data from a new domain, Dn+1.
      • The model can leverage reliable prior knowledge in A1, …, An to make a better extraction from Dn+1.
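
A rough sketch of the data flow in the lifelong extraction phase, under strong simplifying assumptions. `toy_extract` is a hypothetical stand-in for the trained CRF (it is a hand-written pattern rule, not a CRF), and `reliable_from` with its `min_domains` threshold is an assumed way of picking reliable knowledge from A1, …, An.

```python
from collections import Counter


def reliable_from(past_aspect_sets, min_domains=2):
    """Assumed rule: an aspect is reliable if it occurs in several past sets."""
    support = Counter(a for aspects in past_aspect_sets for a in set(aspects))
    return {a for a, n in support.items() if n >= min_domains}


def toy_extract(sentences, reliable):
    """Stand-in for the trained model (NOT a CRF).

    In "the X of this Y", extract X and Y when X is a reliable aspect.
    """
    found = set()
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        for i in range(len(words) - 3):
            x, w_of, w_this, y = words[i:i + 4]
            if (w_of, w_this) == ("of", "this") and x in reliable:
                found.update({x, y})
    return found


def lifelong_extract(extract, past_aspect_sets, new_domain_data):
    """Extract from Dn+1 using knowledge in A1..An, then store An+1."""
    reliable = reliable_from(past_aspect_sets)
    new_aspects = extract(new_domain_data, reliable)
    return new_aspects, past_aspect_sets + [new_aspects]


past_sets = [{"battery", "lens"}, {"battery", "price"}]
phone_reviews = [
    "The battery of this phone is great",
    "The signal of this phone is weak",
]
aspects, updated_sets = lifelong_extract(toy_extract, past_sets, phone_reviews)
print(sorted(aspects))
```

The point of the sketch is only that the model itself is not retrained: the same extractor is reused, and what changes from domain to domain is the pool of past results it can draw on.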
  7. Experiments: dataset

    • Labeled datasets for training/testing: reviews of 7 products, labeled with their aspects.
    • Unlabeled review datasets for LML: 50 diverse domains (each domain has 1000 reviews).
  8. Experiments

    • Cross-domain: combine 6 labeled domain datasets for training and test on the 7th domain.
    • In-domain: train and test on the same domain.
    • Evaluation method: Precision, Recall, and F1-score as evaluation measures.
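
As an illustration of the cross-domain setting and the evaluation measures, here is a small sketch. The domain names are placeholders (the slide does not list the 7 products), and scoring over sets of distinct aspect terms is an assumption; the paper may count matches differently.

```python
def leave_one_domain_out(domains):
    """Yield (training domains, test domain) pairs for the cross-domain setting."""
    for held_out in domains:
        yield [d for d in domains if d != held_out], held_out


def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 over sets of aspect terms (assumed matching rule)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder names for the 7 labeled domains.
domains = [f"domain_{i}" for i in range(1, 8)]
splits = list(leave_one_domain_out(domains))
print(len(splits), len(splits[0][0]))

p, r, f1 = precision_recall_f1(
    predicted={"battery", "phone", "price"},
    gold={"battery", "phone", "screen", "signal"},
)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Each of the 7 splits trains on 6 domains and tests on the remaining one. In the toy example, 2 of 3 predictions are correct and 2 of 4 gold aspects are found, giving an F1 of 4/7.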
  9. Experiments: compared methods

    • CRF:
      • Linear-chain CRF
      • Uses all features, including the dependency feature, but does not employ LML.
    • CRF + R:
      • Treats the reliable aspect set as a dictionary.
      • Adds to the final result those reliable aspects that appear in the test data but were not extracted by CRF.
    • L-CRF (the model proposed in this paper):
      • Employs LML.
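
The CRF + R baseline amounts to a dictionary lookup on top of the CRF output. A minimal sketch, assuming single-word aspects and a pre-tokenized test set; the function name `crf_plus_r` is hypothetical.

```python
def crf_plus_r(crf_aspects, reliable, test_tokens):
    """Add reliable aspects that occur in the test data to the CRF output."""
    # Only reliable aspects that actually appear in the test data are added.
    in_test = set(reliable) & set(test_tokens)
    return set(crf_aspects) | in_test


test_tokens = ["the", "battery", "of", "this", "phone", "has", "a", "nice", "screen"]
result = crf_plus_r(
    crf_aspects={"screen"},
    reliable={"battery", "lens"},
    test_tokens=test_tokens,
)
print(sorted(result))
```

Here “battery” is added because it is reliable and occurs in the test data, while “lens” is ignored because it never occurs there. Unlike L-CRF, this baseline does not change the features the CRF sees.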
  10. Conclusions

    • This paper introduced a novel idea to add a new capability to lifelong learning.
    • Continuously improve the model’s performance after training.
    • Improve the CRF model’s performance by leveraging the knowledge gained from extraction results of previous domains.
    • Future work: modify CRF so that it can consider previous extraction results as well as the knowledge in previous CRF models.