Paper Reading Group: How Large Are Lions? Inducing Distributions over Quantitative Attributes

Reo
December 04, 2019

Transcript

  1. How Large Are Lions? Inducing Distributions over Quantitative Attributes
     Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth (ACL 2019)
     Presenter: Reo Hirao (平尾 礼央; TMU, B4, Komachi Lab), 3 December 2019, Paper Reading Group
  2. Abstract
     • Distribution over Quantities (DoQ)
       ◦ Unsupervised method for collecting quantitative information
       ◦ Covers objects, adjectives, and verbs
     • Contrasts with recent work in this area
       ◦ Recent work offers only relative comparisons
       ◦ “Is a lion bigger than a wolf?”
     • Contributions
       ◦ A new method for collecting expressive quantitative information
       ◦ A large resource of distributions over quantitative attributes
       ◦ Shown to be superior to existing datasets
  3. Introduction
     • Questions
       ◦ How much does a lion weigh?
       ◦ How tall can they be?
       ◦ When do people typically eat breakfast?
     • Datasets
       ◦ Acquiring distributions over ten dimensions
         ▪ time, currency, length, …
       ◦ Nouns (e.g. elephant, airplane, NBA game)
       ◦ Adjectives (e.g. cold, hot, lukewarm)
       ◦ Verbs (e.g. eating, walking, running)
       ◦ Can easily be extended to other languages
  4. Distribution over Quantities: Method (1)
     • Measurement Identification and Normalization
       ◦ Sentences that the parser does not recognize are not extracted
       ◦ The data contains some typos (such as “17 C” where Centigrade is meant)
       ◦ Units are normalized: “inch” = 0.0254 meters, “acre foot” = 1233.48 cubic meters, ...
     • Object Collection
       ◦ Single-token words and more complex phrases such as noun phrases (“race car”, “electric car”)
       ◦ The syntactic head is also retrieved (to compare a “fast car” to a “car”)
       ◦ Objects that co-occur with a measurement within a certain context window are collected
       ◦ Billions of English webpages were processed
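
The normalization step on slide 4 can be illustrated with a minimal sketch that maps a raw measurement to a standard unit. This is not the authors' implementation: the table layout, the regular expression, and the names `UNIT_TABLE`, `MEASUREMENT_RE`, and `normalize` are assumptions made for illustration. The inch factor uses the standard definition of 0.0254 m, and the acre-foot factor is the one on the slide, taken to be in cubic meters.

```python
import re

# Conversion factors to a standard unit per dimension.
# Only two units are listed; a real table would be far larger (assumption).
UNIT_TABLE = {
    "inch": ("length", 0.0254),        # meters per inch (standard definition)
    "acre foot": ("volume", 1233.48),  # cubic meters per acre-foot (from the slide)
}

# A number, whitespace, then one of the known units (optionally "inches").
MEASUREMENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s+(acre foot|inch)(?:es)?\b")


def normalize(text):
    """Return (dimension, value_in_standard_unit) pairs found in text."""
    results = []
    for number, unit in MEASUREMENT_RE.findall(text):
        dimension, factor = UNIT_TABLE[unit]
        results.append((dimension, float(number) * factor))
    return results


if __name__ == "__main__":
    print(normalize("a 10 inch tablet"))
    print(normalize("about 2 acre foot of water"))
```

Measurements with unknown units are simply not matched, which mirrors the slide's point that unrecognized material is left out rather than guessed.
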
  5. Distribution over Quantities: Method (2)
     • De-noising
       ◦ Distance-based co-occurrences (within the same sentence or within a token distance k)
       ◦ All negated statements are simply discarded (“The dimension of the car is not 50cm.”)
     • Distribution over Quantities Statistics
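
The two de-noising heuristics on slide 5 (a token-distance limit k and dropping negated sentences) can be sketched as follows. This is an illustrative reading, not the paper's pipeline: the negation cue list, the toy measurement pattern, and the names `NEGATIONS`, `is_measurement`, and `cooccurrences` are all assumptions.

```python
import re

# Assumed negation cues; the slide only says that negation is discarded.
NEGATIONS = {"not", "no", "never", "n't"}

# Toy measurement detector: a number fused with a unit, e.g. "50cm" (assumption).
_MEASUREMENT = re.compile(r"\d+(?:\.\d+)?(?:cm|m|kg)")


def is_measurement(token):
    return _MEASUREMENT.fullmatch(token) is not None


def cooccurrences(tokens, objects, k):
    """Pair each known object with measurements at most k tokens away.

    Sentences containing a negation cue yield no pairs at all.
    """
    if any(tok.lower() in NEGATIONS for tok in tokens):
        return []
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in objects:
            continue
        for j, other in enumerate(tokens):
            if abs(i - j) <= k and is_measurement(other):
                pairs.append((tok.lower(), other))
    return pairs


if __name__ == "__main__":
    print(cooccurrences("The car is 50cm long".split(), {"car"}, k=3))
    print(cooccurrences("The car is not 50cm long".split(), {"car"}, k=3))
```

A smaller k trades recall for precision: with k=1 the first sentence above no longer yields a pair, because the object and the measurement are two tokens apart.
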
  6. Evaluation Data: Commonsense Property Comparison (1)
     • ORIG F&C dataset
       ◦ Labels the typical relation between two objects along five dimensions
         ▪ SIZE, WEIGHT, STRENGTH, RIGIDITY, and SPEED
         ▪ Whether the first object is typically greater than, lesser than, or equal to the second
       ◦ 47% of the annotations were not comparable
         ▪ Broad objects: e.g. (father, clothes, big)
         ▪ Abstract objects: e.g. (seal, place, big)
         ▪ Ill-defined dimension: e.g. (friend, bed, strong)
       ◦ Leakage
         ▪ 8% transitivity leakage ((o1, o2, d) and (o2, o3, d) in train, (o1, o3, d) in dev/test)
         ▪ 95% object leakage (same object in train and dev/test)
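
The two kinds of leakage on slide 6 can be checked with a short sketch over (o1, o2, d) triples. The exact counting used in the paper is not given on the slide, so the definitions below are one plausible reading: they ignore the relation labels, and the function names `object_leakage` and `transitivity_leakage` are hypothetical.

```python
def object_leakage(train, test):
    """Fraction of test triples that share at least one object with train."""
    train_objects = {o for (o1, o2, _d) in train for o in (o1, o2)}
    leaked = [t for t in test if t[0] in train_objects or t[1] in train_objects]
    return len(leaked) / len(test)


def transitivity_leakage(train, test):
    """Fraction of test triples (o1, o3, d) for which some o2 exists with
    both (o1, o2, d) and (o2, o3, d) in train."""
    train_set = set(train)
    candidates = {o for (o1, o2, _d) in train for o in (o1, o2)}
    leaked = [
        (o1, o3, d)
        for (o1, o3, d) in test
        if any(
            (o1, o2, d) in train_set and (o2, o3, d) in train_set
            for o2 in candidates
        )
    ]
    return len(leaked) / len(test)


if __name__ == "__main__":
    train = [("lion", "wolf", "size"), ("wolf", "cat", "size")]
    test = [("lion", "cat", "size"), ("ant", "bee", "size")]
    print(object_leakage(train, test))
    print(transitivity_leakage(train, test))
```

Both functions assume a non-empty test split. A leak-free split in this sense would report 0.0 for both measures.
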
  7. Evaluation Data: Commonsense Property Comparison (2)
     • NO-LEAK F&C (8,209 pairs)
       ◦ Formed new train/dev/test splits (Table 2)
     • CLEAN F&C (2,964)
       ◦ Re-annotated the dataset because of the ill-defined comparisons
       ◦ Used three crowdsourced workers
     • New Data (+4,773)
       ◦ Created a new dataset because the F&C dataset became small after filtering
       ◦ Used only as a test set
     • The Relative Size Dataset
       ◦ 486 object pairs between 41 physical objects
  8. Evaluation Data: Scalar Adjectives & Intrinsic Evaluation
     • De Melo and Bansal (2013)
       ◦ Used adjective clusters based on the “dumbbell” structure of adjectives in WordNet
       ◦ e.g. “cold < frigid < frozen”
     • Wilkinson and Oates (2016)
       ◦ Created another test set by defining a total order between adjectives in the same cluster, spanning the entire scale range
       ◦ e.g. “minuscule < tiny < small < big < large < huge < enormous < gigantic”
     • Removed all of the non-measurable clusters
       ◦ e.g. “known < famous < legendary”
     • Intrinsic Evaluation
       ◦ Expanded the median of the distribution for a given object and dimension into a range, then asked human raters whether this range overlaps with the range of the target object-dimension pair
       ◦ e.g. the median speed for the object “car” is 99.7 km/h -> 10-100 km/h?
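
The range expansion used in the intrinsic evaluation on slide 8 can be illustrated with a small sketch. The slide does not state the exact rule, so this assumes the range is the power-of-ten interval containing the median, which is consistent with the example of 99.7 km/h becoming 10-100 km/h. The function name `median_range` and the sample values are hypothetical.

```python
import math
from statistics import median


def median_range(values):
    """Return the interval [10**e, 10**(e + 1)) that contains the median.

    Assumed rule: expand the median to its order-of-magnitude bucket.
    """
    m = median(values)
    if m <= 0:
        raise ValueError("median must be positive to take a logarithm")
    exponent = math.floor(math.log10(m))
    return 10.0 ** exponent, 10.0 ** (exponent + 1)


if __name__ == "__main__":
    # Made-up speed observations (km/h) whose median is 99.7.
    speeds = [80.0, 99.7, 120.0]
    print(median_range(speeds))
```

A human rater would then be shown the resulting interval and asked whether it overlaps with what is typical for the object and dimension.
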
  9. Experimental Results (1)
     • Noun Comparison
       ◦ F&C Clean
         ▪ Lower than Yang et al. (2018), which fine-tunes on a train set and draws on information from pre-trained word embeddings
       ◦ New Data
         ▪ Better on the new data but lower than F&C overall
       ◦ RELATIVE
         ▪ State-of-the-art result with k=10
  10. Experimental Results (2)
      • Adjective Comparison
        ◦ Achieves good results on the full-range scale of Wilk-all
        ◦ There was no extreme difference in the errors
        ◦ Good at differentiating between the adjectives at the two ends of the scale
      • Intrinsic Evaluation
        ◦ The total agreement is 69%
        ◦ Re-annotated the currency dimension (difference between India and the US)
  11. Discussion: Reporting Bias and Exaggeration
      • Reported temperatures are exaggerated: higher than those actually measured
      • Because the data was collected from English webpages, seasonal temperatures reflect the Northern Hemisphere
      • The weights of alfalfa and watermelon are very different; this bias arises because alfalfa is shipped in tons
      • There is also a bias due to polysemy
  12. Conclusion
      • Distribution over Quantities (DoQ)
        ◦ Unsupervised method for collecting quantitative information
        ◦ Covers objects, adjectives, and verbs
      • Contributions
        ◦ A new method for collecting expressive quantitative information
        ◦ A large resource of distributions over quantitative attributes
        ◦ Shown to be superior to existing datasets