TokyoCL-2016-10-05

Normative Analysis of Reading Comprehension Tasks + etc. Saku Sugawara
(NII Aizawa-lab, Univ. Tokyo) TokyoCL #6 (NII) October 5, 2016

ࠓ೔ͷ࿩ ϙΤϜ + ࣗ෼ͷݚڀ + म࿦ͷશମ૾ͷߏ૝ ࡶͳٞ࿦ ໨࣍ 1. ʮݴޠཧղʯʹ͍ͭͯ
2. ʮಡղλεΫʯʹ͍ͭͯ 3. ʮݴޠཧղʹཁٻ͞ΕΔجૅతͳೳྗʯʹ͍ͭͯ 4. ʮఆٛͨ͠ೳྗʹΑΔಡղλεΫͷΞϊςʔγϣϯʯʹͭ ͍ͯ 5. ࠓޙ 2 / 80

1. Natural Language Understanding??? 3 / 80

͜ͷ section ͰͲͷΑ͏ͳ࿩Λ͢Δ͔ Reading comprehension ͱ͍͏λεΫܗࣜΛʢNLP ͷશମతͳ ໨తͷதʹʣͲͷΑ͏ʹҐஔ͚ͮΔ͔ 1. େ໨ඪ͸Կ͔
2. ୹ظత໨ඪͱͯ͠ͲͷΑ͏ͳλεΫܗࣜΛఆΊΔ͔ 3. ͦͷλεΫͷ੍໿͸ԿͰ͋ΓɺͲͷΑ͏ͳൃలɾ֦ு͕ݟࠐ ΊΔ͔ 4 / 80

େ໨ඪ: ʮݴޠཧղʯʹؔ͢Δલஔ͖ Կ͕Ͱ͖Ε͹ݴޠΛཧղ͍ͯ͠Δͱݴ͑Δ͔ʁ தࠃޠͷ෦԰ʁ νϡʔϦϯάςετʁ ఆٛ͸໨తʹΑͬͯҟͳΔɺͱݴ͑Δ ໨త࿦ऀͳͷͰڐ͍ͯͩ͘͠͞ ʢ͜͜ʹதࠃޠͷ෦԰ͷը૾ʣ 5 /
80

େ໨ඪ: ҰํͰզʑ (ࢲ) ͸ԿΛ࣮ݱ͍͔ͨ͠ʁ ʮ஻ΕΔϩϘοτʯ ʁ ʮࣗ཯ͨ͠ΤʔδΣϯτʯ ʁ Ͳ͜ͰͲͷΑ͏ʹͲͷΑ͏ͳ໨తͰʁ Subsumption
architecture? യવͱͨ͠໨తͰ͸ઃܭ͢Δਓ͕ࠔΔ ࠷ۙͷྲྀߦΓͰΑࣖ͘ʹ͢Δ໰୊఺ ͱʹ͔͘ʮίϛϡχέʔγϣϯ͕ͱΕΔ͜ͱʯ ʁ ʢ͜͜ʹϩϘοτͷը૾ʣ 6 / 80

ͲͷΑ͏ʹ໨ඪΛఆٛ͢Δʁ ݱ࣌఺Ͱ͸γεςϜ͕ৼΔ෣͏ۭؒΛݶఆ͍ͨ͠ ͨͩ͠ཁٻ͞ΕΔೳྗ͸ͳΔ΂͘൚༻తͰ͋ͬͯ΄͍͠ ͋͘·ͰυϝΠϯ΍ϞμϦςΟͷݶఆͰ͋ͬͯ΄͍͠ → ςΩετͷΈ → จষΛಡΊΔΑ͏ʹͳΕ͹Α͍ʁ ʢඈ༂ʣ 7
/ 80

ݴޠߦҝΛݶఆ͢Δ ͍Ζ͍Ζͳݴޠߦҝ͕ఆٛͰ͖Δ ࣭໰ɺԠ౴ɺఏҊɺ໋ྩɺಠݴɺ http://plato.stanford.edu/entries/speech-acts/ ͜͜Ͱ͸Ԡ౴ʹݶఆ͢Δ ʢ༨ஊʣ Stanford Encyclopedia of Philosophy
(SEP) ͸ݴޠֶɾݴޠ఩ ֶܥͷ༻ޠͷղઆ͕ཉ͍͠ͱ͖ʹ݁ߏศརͰ͢ SEP ͷ Computational Linguistics [url] ͷ߲͸ Lenhart Schubert [url] ͱ͍͏͓͡͞Μ͕ॻ͍͍ͯΔ Semantic Representation (AAAI2015) [pdf] What kinds of knowledge are required for genuine understanding? (IJCAI2015 WS) [pdf] ͍Ζ͍Ζ֓؍ΛಘΔͷʹྑ͍͔΋͠Εͳ͍Ͱ͢ 8 / 80

ͱ΋͔͘ಡղλεΫͱͯ͠ཁ݅Λݶఆ͍ͯ͘͠ ʮจষΛಡΉ͜ͱʯΛϕʔεʹݴޠཧղΛఆٛ͢Δ ೖग़ྗ͸ςΩετͷΈ Ϣʔβʔͱڞ༗͞Εͨจষʢจ຺ʣʹ͍ͭͯɺ࣭໰ʹ౴͑ͯ ΋Β͏ ͜ͷʮจ຺ʯΛ࣌ؒతɾۭؒతʹ֦ு͢Ε͹͞Βʹ࿩͸޿͕Δ จ຺֎͸ࢀর͠ͳ͍ɺͨͩ͠ҰఆͷৗࣝΛલఏͱ͢Δ 9 / 80

ཧ૝తʹ͸ ֎քΛ஌֮ͯ͠΄͍͠ ͔͠͠஌֮ث׭͕ͳ͍ ຊདྷͳΒɺਓؒͷݴޠ֫ಘͷ໛฿͔Β΍Δ΂͖Ͱ͸ʁ ͋Δ͍͸ɺਓؒͷࣦޠ঱ςετʹྨ͢Δ΋ͷΛΛ΍Δ΂͖ Ͱ͸ʁ ຊ࣭తͳ஌ೳ͸ IQ ςετ΍ WAIS-III
ͷΑ͏ͳࢼݧΛ՝͢΂ ͖Ͱ͸ʁ ࠓճͷ࿩Λࢹ֮৘ใ·Ͱ֦ு͢Δɺͱ͍͏ͷ͸͓ͦΒ͘Մೳ 10 / 80

ਓ͸ʮ͋ΒΏΔ՝୊͸˓˓ͱ͍͏ܗࣜʹؐݩͰ ͖Δʯͱݴ͏͜ͱ͕͋Δ͕ɺͦΕͰ͍͍ͷͩΖ ͏͔ͱ͍͏࿩ ʮRTE/QA/຋༁ ʹؐݩͰ͖Δʯ → ʮೖྗɾ಺෦ঢ়ଶɾग़ྗʯͰ͍͍ͱࢥ͏ → Input /
State / Action ୯ͳΔؔ਺ͱ͸ݴ͑ͳ͍ɺঢ়ଶͷอଘ͕͋Δ ೖྗ͸ͳ͍͜ͱ͕͋Δɺ಺෦ঢ়ଶ͸࣌ؒతʹભҠ͢Δɺग़ྗ ͸ͳ͍͜ͱ͕͋Δ Context ͸ʮ஌֮Մೳͳڞ༗͞Εͨ৘ใʯ΍ʮ࣌ؒతɾۭؒత ʹͦͷ৔͔Β཭Εͨ఺Ͱ֫ಘ͞Εͨ৘ใʯΛؚΉɺ΋ͬͱݴ ͑͹ʮજࡏతʹ֫ಘͰ͖Δ৘ใʯ΋ؚΉ ͜͜Ͱ͸ਓؒͷݴޠ׆ಈʹ͍ͭͯड़΂͍ͯ·͢ 11 / 80

࣮༻తͳྫ Input: ࣭໰ʮ౦ژӺ͔ΒਆอொӺ·Ͱͷܦ࿏Λڭ͑ͯʯ Context: ܦ࿏ਤ΍࣌ࠁදͷσʔλɺܭࢉ Action: ฦ౴ʮ౦ژ͔Βؙϊ಺ઢͰେखொ΁ɺେखொ͔Β൒ଂ ໳ઢͰਆอொ΁ʯ Input: ࣭໰ʮ͜ͷը૾ʹࣸͬͯΔೣͷ඼छ͸Կʁʯ
Context: ը૾ͷೝࣝɾॲཧ Action: ฦ౴ʮࡾໟೣʯ ඇ࣮༻తʹ͸ɺϢʔβʔ͕ҙਤ͠ͳ͍ೖྗΛ͢Δͱಥવ૽͗ ग़͢ϑΝʔϏʔਓܗͳͲ Pepper ͘ΜͬͯಠΓݴΛݴ͏ΜͰ͔͢Ͷ 12 / 80

Կ͕ݴ͍͍͔ͨ ʮಡղͱ͍͏λεΫ͕ɺཧ૝తͳ໨ඪʹରͯ͠ɺͲͷΑ͏ͳ ੍໿Λ͔࣋ͭʯΛҙࣝ͢ΔͱɺΑΓൃలతͳܗ͕ࣜߟ͑΍͢ ͘ͳΔʁ ۠ผ͢ΔͨΊͷཁૉ͸;ͨͭ: ʮೖग़ྗɾ಺෦ঢ়ଶʢจ຺ʣ ʯ ʮ৘ใͷഔମɾͦͷ࣌ؒ/ۭؒత෯ɾ಺༰ʯ ೖग़ྗ: ςΩετͷΈɺ࣌ؒతɾྔతʹมԽ
จ຺తঢ়ଶ: ςΩετͷΈɺ࣌ؒతɾྔతʹݻఆ ೖग़ྗ͸࣭໰Ԡ౴ʢͱ͍͏ݴޠߦҝʣͷΈ ࢀ༩ऀ͕ར༻͢Δจ຺͸ڞ༗͞Ε໌ࣔత ؾ࣋ͪ Media /modality Time/space Contents/type Input/Output text variable question answering State/Context text invariable shared/explicit 13 / 80

ͦ͏͢Δͱྫ͑͹ ೖग़ྗͱจ຺͕ࢹ֮৘ใʢจࣈ৘ใ͔Ͳ͏͔ɺγϯϘϧΛݟ ͤΔ͔ɺͳͲʣ → ϩϘοτͬΆ͍࿩ ೖग़ྗ͸ݴޠͰɺจ຺͕ࢹ֮৘ใ → ࣌ؒత޿͕Γ͕͋Ε͹ಈըɺͳ͚Ε͹ը૾ʹ͍ͭͯͷ QA ೖग़ྗͷݴޠߦҝΛ࣭໰Ԡ౴͔Β֦ு
→ ࡶஊ΍ɺػցૢ࡞ͷΠϯλʔϑΣʔεͳͲ จ຺͸ݴޠ৘ใ͕ͩɺshared(?) / implicit ʹ͢Δ → ωοτ্ͷ஌ࣝ΍ࠓ೔ͷχϡʔεʹ͍ͭͯ࿩Λ͢ΔͳͲ 14 / 80

2. Reading Comprehension Tasks 15 / 80

͜ͷ section ͰͲͷΑ͏ͳ ֓؍ ໰୊఺ λεΫઃܭͷํ਑ΛཱͯΔͨΊʹԿ͕ඞཁ͔ 16 / 80

Background: Reading Comprehension ՝୊จͱͦΕʹؔΘΔ໰͍ΛಡΈɺԿΒ͔ͷ৘ใΛฦ͢ ݀ຒΊɺ ՝୊จ → બ୒ࢶ or ൈ͖ग़͠
ͳͲͷܗࣜ 17 / 80

Background: Reading Comprehension Figure: طଘλεΫྫ (Rajpurkar+ 2016) 18 / 80

Background: Big Data vs. Realistic ࣗಈతʹ࡞Δͱͨ͘͞Μσʔλ͕Ͱ͖Δ΋ͷͷɺ࣭͕ո͍͠ CNN/Daily Mail (Hermann+ 2015)
SQuAD (Rajpurkar+ 2016) [url] ਓखͰσʔλΛ࡞Ζ͏ͱ͢ΔͱͲ͏ͯ͠΋খ͘͞ͳΔ MCTest (Richardson+ 2013) [url]: 660*4 questions ProcessBank (Berant+ 2014) [url]: 585 questions ਓखͰൺֱత࣭ͷྑ͍σʔλΛͨ͘͞Μ࡞ͬͨྫʁ LAMBADA (Paperno+ 2016) [url] dev+test Ͱ 10K passage (=question) ͘Β͍ 19 / 80

CNN/Daily Mail Dataset (DeepMind QA Dataset) CNN ΍ Daily Mail
ͷهࣄλΠτϧ΍ݟग़͕֘͠౰Օॴͷཁ໿ ʹͳ͍ͬͯΔ λΠτϧ΍ݟग़͠ͷ entity ෦෼Λ݀ʹͯͦ͠Ε͕Կ͔Λ౴͑ ͤ͞ΔλεΫ هࣄ͸ͨ͘͞Μ͋ΔͷͰͨ͘͞Μ࡞ΕΔ (context, query, answer) Ͱ 1 ୯Ґ هࣄ಺༰ʹ౴͕͑ग़ͯ͜ͳ͍Α͏ͳ΋ͷ͸࡞Βͳ͍ 20 / 80

CNN/Daily Mail Dataset: Example 21 / 80

Analysis - classiﬁer model - ablation ؆୯ͳ໰୊͕ଟ͍ʁ 22 / 80

SQuAD (2016) Reading comprehension dataset 100,000+, from Wikipedia articles, by
crowdworkers Human performance: 86.8% Baseline (Logistic Regression): 51.0% using features of dependency trees 23 / 80

Example - SQuAD 24 / 80

Example - SQuAD - dev set (v1.1) - cloze Super
Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The Amer- ican Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24 10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. 1. Which NFL team represented the AFC at Super Bowl 50? Denver Broncos (177) 2. Where did Super Bowl 50 take place? Santa Clara, California (403), Levi’s Stadium (355), Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. (355) 3. Which NFL team won Super Bowl 50? Denver Broncos (177) 25 / 80

MCTest (2013) Reading comprehension dataset Manually written texts by crowdworkers
Contextual passages and multiple choice questions 660 tasks, and each task has 4 questions Human performance: better than 95% Baseline: 50-60% Bast score: around 70% 26 / 80

Example - MCTest - multiple choice James the Turtle was
always getting in trouble. One day, James went to the grocery store and pulled all the pudding oﬀ the shelves and ate two jars. Then he walked to the fast food restaurant and ordered 15 bags of fries. Q1: What is the name of the trouble making turtle? A1: James Q2: Where did James go after he went to the grocery store? A2: a fast food restaurant 27 / 80

Example - MCTest What we have to understand to answer
them...? “James the Turtle ...” paraphrase of the subject (James = the Turtle) “the Turtle was always getting in trouble” = “the trouble making turtle” “James” = “the name”: knowledge “The grocery store” and “the fast food restaurant” are places Temporal relation: James went to Place1. After that, he went to Place2. Pick the key information from the context sentences etc... 28 / 80

LAMBADA (2016, ACL) n-gram Ͱղ͚Δ໰୊Λഉআ context passage & cloze test
10k+ questions from 1300+ novels 29 / 80

Example - LAMBADA - cloze He shook his head, took
a step back and held his hands up as he tried to smile without losing a cigarette. “Yes you can,” Julia said in a reassuring voice. “I’ve already focused on my friend. You just have to click the shutter, on top, here.” Query: He nodded sheepishly, through his cigarette away and took the ʻʼ. camera 30 / 80

Big Data vs. Realistic: ࠶ܝ ࣗಈతʹ࡞Δͱͨ͘͞Μσʔλ͕Ͱ͖Δ΋ͷͷɺ࣭͕ո͍͠ CNN/Daily Mail (Hermann+ 2015)
SQuAD (Rajpurkar+ 2016) [url] ਓखͰσʔλΛ࡞Ζ͏ͱ͢ΔͱͲ͏ͯ͠΋খ͘͞ͳΔ MCTest (Richardson+ 2013) [url]: 660*4 questions ProcessBank (Berant+ 2014) [url]: 585 questions ਓखͰൺֱత࣭ͷྑ͍σʔλΛͨ͘͞Μ࡞ͬͨྫʁ LAMBADA (Paperno+ 2016) [url] dev+test Ͱ 10K passage (=question) ͘Β͍ 31 / 80

CNN/DailyMail in ACL2016 Chen+, A Thorough Examination of the CNN/Daily
Mail Reading Comprehension Task CNN/Daily Mail ͸΄ͱΜͲ಄ଧͪ: σʔλʹϊΠζ͕ଟ͍ɺ ࣗಈͰ࡞Εͨͷ͸ྑ͍͚Ͳ࣭΋େࣄʢຊ౰ʹಡղతͳਪ࿦Λ ଌΔͷʹ໾ཱͭͷ͔ʁʣ ͜͏͍͏σʔληοτΛ൱ఆ͢Δඞཁ͸ͳ͘ɺΑΓ realistic ͳσʔληοτͷͨΊͷֶशσʔλͱͯ͠׆͔ͤΔ͸ͣ 32 / 80

CNN/DailyMail - analysis 33 / 80

Reading comprehension tasks Ͱ ߟ͑Δඞཁͷ͋Δ͜ͱ ࣭Λҡ࣋͢Δ͜ͱʢղ౴ՄೳͰ͋Δ͜ͱʣ ͋Δఔ౓ͷ೉͠͞Λඋ͑ɺจ຺ͷཧղΛඞཁͱ͢Δ͜ͱ ෳ਺ͷจΛࢀর͢Δ͜ͱ simple rules
Ͱղ౴Ͱ͖ͳ͍͜ͱ ੑೳͷධՁ͕༰қʹͰ͖Δ͜ͱ ೉қ౓ͷ෼ྨɾඞཁͳೳྗͷ෼ྨ 34 / 80

ʢ୤ઢʣ Simple questions ͕Α͍ͷ͔ʁ 1. Tom is a student. Q:
Is Tom a student? A: Yes 2. Descartes was born in 1596. Q: When was Descartes born? A: 1596 (External knowledge is not necessary here) (These questions seem to be solved by simple rules) (They may not be useful to test more intelligent systems) → must be not too simple, but a little easier (and not diﬃcult). 35 / 80

Moderately easy??: bAbI tasks [Weston+ 2015] Task: Three Supporting Facts
John picked up the apple. John went to the oﬃce. John went to the kitchen. John dropped the apple. Where was the apple before the kitchen? A: oﬃce ✓ Easy (for human) ✓ Multiple sentences ✓ Automatically generated × Small vocabulary × Simple and unnatural sentences 36 / 80

bAbI tasks [Weston+ 2015] - Task categories Single, two, or
three supporting facts Two or three argument relations Yes/no questions Counting and lists/sets Simple negation and indeﬁnite knowledge Basic Coreference ,conjunctions and compound coreference Time reasoning Basic decuction and induction Positional and size reasoning Path ﬁnding Agent’s motivations 37 / 80

bAbI tasks [Weston+ 2015] - more examples Task: Basic Coreference
Daniel was in the kitchen. Then he went to the studio. Sandra was in the oﬃce. Where is Daniel? A:studio Task: Conpound Coreference Daniel and Sandra journeyed to the oﬃce. Then they went to the garden. Sandra and John travelled to the kitchen. After that they moved to the hallway. Where is Daniel? A: garden 38 / 80

bAbI tasks [Weston+ 2015] - more examples Task: Time Reasoning
In the afternoon Julie went to the park. Yesterday Julie was at school. Julie went to the cinema this evening. Where did Julie go after the park? A:cinema Where was Julie before the park? A:school Task: Basic Induction Lily is a swan. Lily is white. Bernhard is green. Greg is a swan. What color is Greg? A:white 39 / 80

What are problems in bAbI tasks? In order to construct
more sophisticated tasks, we want to follow the formulation of bAbI. However the problems are: 1. Low expressibility for linguistic expressions the tasks are not constructed in terms of “grammar” its vocabulary can not be extended easily → Re-organize tasks related on linguistic/grammatical elements 2. Skills required in the tasks being not enough/suﬃcient e.g. analogy, discourse relations, entailments... → Add skills in the reference of other existing tasks 40 / 80

Problem?? We/researchers try to develop a model to solve tasks/datasets
listed. However: error analysis is diﬃcult and complicated... What is a main cause of error? What does (not) a system understand? Can we distinguish if a system fortunately outputs a correct answer? ͋Δ໰͍ʹۮવਖ਼ղ͍ͯ͠ͳ͍͔ΛݟۃΊ͍ͨ গ͠ม͑ͨઃఆͰ͸ؒҧ͑Δ͜ͱ͕͋Δ͔΋͠Εͳ͍ 41 / 80

͜͜·Ͱ λεΫͷ঺հ λεΫΛͲ͏΍ͬͯઃܭ͢Δ͔͕໰୊ Τϥʔ෼ੳ͕Ͱ͖ͳ͚Ε͹࢓ํͳ͍ ໰͍ʹ೉қ౓΍ඞཁͳೳྗͷ۠ผ͕͋Δͱྑ͍ 42 / 80

3. Prerequisite Skills for Natural Language Understanding 43 / 80

΍Δ͜ͱͷ֓ཁ ໨త: ݴޠཧղʹඞཁͳཁૉ΍ೳྗΛ෼ྨ͢Δ 0. ݴޠཧղͬΆ͍΋ͷΛ෼ղͯ͠ߟ͑Δ ͻͱͭͷࣄ৅ͷදݱ (A) + ࣄ৅૬ޓͷؔ܎ੑ (B)
1. (A) จ๏ཁૉͱޠኮ಺༰ʹ෼͚ͯɺจ๏ཁૉΛ੔ཧ͢Δ 2. (B) ඞཁͬΆ͍ೳྗΛطଘλεΫ͔Βྻڍ 44 / 80

Overview of natural language understanding We naively assume that readers
follow the ﬁgure below: NL Understanding Facts Chaining Simple Sentence Understanding Logical Reasoning Entailment ... Grammatical Meanings Lexical Meanings → Each task should be intended to test each grammatical meaning and chaining skill?? 45 / 80

1. List of grammatical elements Table: Grammatical elements; ∗ =
appearing in the bAbI tasks; †N = appearing in our extra-task N (later). Word class etc. Elements Common nouns number†1, deﬁniteness Personal pronouns case†2, person∗, gender∗, number∗ Determinatives demonstrative, distributive†3, degree†4, etc. Adjectives attributive or predicative, comparison∗ Verbs voice∗, modality†5, tense-aspect†6, type Prepositions temporal∗, spatial∗, etc. Adverbs common adverbs, sentence adverbs Conjunctions coordinates∗, subordinates†7,†8 Sentence mood imperative, subjunctive Interrogative wh-∗, how∗, y/n∗ 46 / 80

2. List of prerequisite skills 47 / 80

ϙΠϯτ અؒͷؔ܎Λೝࣝ͢Δೳྗͱͯ͠ఆٛ͞ΕΔ ՝୊จʢจ຺ʣͱ࣭໰จʹ͍ͭͯɺղ౴͢ΔͨΊʹࢀর͢Δ จ຺͕୯จͰࡁΉ৔߹ʹ͸ɺ ʮଞͷઅͱؔ܎Λ࣋ͨͳ͍ʯͱ͍ ͏ҙຯͰೳྗΛཁٻ͠ͳ͍ͱఆٛ͢Δ ྫ: ݴ͍׵͕͑ඞཁʹͳΔ৔߹͸μϝʢ͋ͱͰΞϊςྫΛࣔ ͠·͢ʣ 48
/ 80

ʮલఏͱͯ͠ཁٻ͞ΕΔʯೳྗʁ গͳ͘ͱ΋͜Ε͸ಠཱͨ͠ೳྗͱͯ͠ඞཁͩΖ͏ɺͱ͍͏ߟ ͑ํ ͨͩ͋͘͠·Ͱ΋͜Ε͸ʮNLP ͱͯ͠ʯ఻౷తͳೳྗͷఆٛ ͱྨࣅతͰ͋Γɺ ʮਓ͕ؒຊ౰ʹ͜ͷΑ͏ͳೳྗΛൃش͍ͯ͠ Δ͔ʯͱ͍͏఺ʹ͸ٙ໰͕࢒Δ ຊདྷతʹ͸ΑΓࡉ෼Խ͞Εͨݸʑͷૢ࡞ɺ͋Δ͍͸ΑΓҰൠ Խ͞Εͨೝ஌աఔ͕ɺ
ʮςΩετ্ͷৼΔ෣͍ͱͯ͠ʯ͜͜Ͱ ఆٛͨ͠ೳྗΛൃش͍ͯ͠Δɺͱߟ͑Δ 49 / 80

Կ͕ྑͦ͞͏͔ ඞཁͱ͞ΕΔೳྗͷ਺ʹରԠͯ͠໰୊ͷ೉қ౓ΛఆٛͰ͖Δ σʔληοτʹ͓͍ͯඞཁͱ͞ΕΔೳྗͷ෼෍Ͱͦͷσʔλ ηοτͷ܏޲Λઆ໌Ͱ͖Δ ͜Εʹج͍ͯγεςϜͷ։ൃΛਐΊΔ͜ͱ͕Ͱ͖Δ 50 / 80

4. Annotations of Reading Comprehension Tasks 51 / 80

Annotation MCTest Λର৅ʹ͢Δ ࢠڙ޲͚ͷจষɺcrowdworker ͕จষͱ໰͍Λ࡞੒ multiple choice 120+200 questions Λ໨ࢹͰ
200 ͷ΄͏ͷҰக཰͸ 85% multi-labeling ඞཁͱ͠ͳ͍໰͍͸ɺ౴͕͑ͦͷ··ॻ͍ͯ͋Δจ͕ଘࡏ ͢Δ 52 / 80

Additional annotations ʮৗࣝʯͷ 3 ෼ྨͱͦͷΞϊςʔγϣϯ SQuAD (2016) ʹ΋Ξϊςʔγϣϯͯ͠ɺͦͷ݁ՌΛ MCTest ͱൺֱ
53 / 80

Toy example 54 / 80

Example 1 ID: MC160.dev.1 (3) one: C1: Sally had a
very exciting summer vacation. C2: She went to summer camp for the ﬁrst time. C3: Sally’s favorite activity was walking in the woods because she enjoyed nature. Q: Why does Sally like walking in the woods? A: She likes nature. Coreference resolution: · she in C3 = Sally in C3 Causal relation: · she enjoyed nature in C3 → Sally’s favorite activity was walking in the woods in C3 Commonsense reasoning: · Sally’s favorite activity was walking ... in C3 ⇒ Sally likes walking in the woods ... in Q · enjoyed nature in C3 ⇒ likes nature in A 55 / 80

Example 2 ID: MC160.dev.29 (1) multiple: C1: The princess climbed
out the window of the high tower and climbed down the south wall when her mother was sleeping. C2: She wandered out a good ways. C3: Finally she went into the forest where there are no electric poles but where there are some caves. Q: Where did the princess wander to after escaping? A: Forest Coreference resolution: · She in C2 = the princess in C1 · She in C3 = the princess in C1 Temporal relation: · the actions in C1 → wandered out ... in C2 → went into ... in C3 56 / 80

Example 2 ID: MC160.dev.29 (1) multiple: C1: The princess climbed
out the window of the high tower and climbed down the south wall when her mother was sleeping. C2: She wandered out a good ways. C3: Finally she went into the forest where there are no electric poles but where there are some caves. Q: Where did the princess wander to after escaping? A: Forest Complex sentence: · C1 = the princess climbed out ... and [the princess] climbed down ... Commonsense reasoning: · escaping in Q ⇒ the actions in C1 · wandered out in C2 and went into the forest in C3 ⇒ wander to the forest in Q and A 57 / 80

Annotation result 1 58 / 80

ৗࣝਪ࿦ͷ෼ྨ Lexical knowledge ࣙॻతͳ஌ࣝ΍ΧςΰϦʔ Qualitative knowledge ఆੑతͳؔ܎ɺ҉໧ͷҼՌؔ܎ Known facts Named
entity ͳͲʹؔΘΔ໌ࣔతͳ஌ࣝ 60 / 80

ଞͷλεΫͱͷൺֱ SQuAD ʹରͯ͠Ξϊςʔγϣϯ ࠓͷͱ͜Ζࢼݧతʹ 80 questions / 7 passages 62
/ 80

Example 3 ID: Civil disobedience, paragraph 1, question 1 C1:
One of its earliest massive implementations was brought about by Egyptians against the British occupation in the 1919 Revolution. C2: Civil disobedience is one of the many ways people have rebelled against what they deem to be unfair laws. Q: What is it called when people in society rebel against laws they think are unfair? A: Civil disobedience Coreference resolution: · they in C2 = people in C2 (diﬀerent clauses) · they in Q = people in Q (diﬀerent clauses) Temporal relation: · people have rebelled... in C2 → when people in society rebel... in Q 63 / 80

One of its earliest massive implementations was brought about by Egyptians against the British occupation in the 1919 Revolution. C2: Civil disobedience is one of the many ways people have rebelled against what they deem to be unfair laws. Q: What is it called when people in society rebel against laws they think are unfair? A: Civil disobedience Complex sentences: · C2 = one of the many ways people have (relative clause) · C2 = Civil disobedience is... against [the object] and [it is] what they deem to... (relative clause) · Q = What is it called... laws and they think [the laws] unfair? 64 / 80

One of its earliest massive implementations was brought about by Egyptians against the British occupation in the 1919 Revolution. C2: Civil disobedience is one of the many ways people have rebelled against what they deem to be unfair laws. Q: What is it called when people in society rebel against laws they think are unfair? A: Civil disobedience Commonsense reasoning: · What is it called in Q ⇒ Civil disobedience is · laws they think... in C2 = what they deem to in Q 65 / 80

ϥϕϧ਺ͷग़ݱස౓ͷൺֱ Table: Numbers of skills required in each question of
MCTest / SQuAD. # skill(s) 0 1 2 3 4 5 MCTest 10.3% 28.4% 28.4% 23.8% 8.1% 0.9% SQuAD 5.0% 48.8% 37.5% 6.2% 2.5% 0.0% 3 ͭҎ্ཁٻ͢Δͷ͸ MCTest ͷํ͕ଟ͍ ೉қ౓ͱͯ͠ఆٛͰ͖Δʁ 67 / 80

SQuAD ͷಛ௃ Wikipedia ͷهࣄʹରͯ͠ crowdworker ͕͕Μ͹ͬͯΫΠζΛ ߟ͑Δ จࣗମ͸େਓ޲͚ͰɺMCTest ΑΓ΋೉͍͠ ͳΔ΂͘໰୊จ͸จ຺จͱҰக͠ͳ͍Α͏ʹݴ͍׵͑ͯ͠
Ͷɺͱ͍͏ࢦ͕ࣔͳ͞Ε͍ͯΔ ԿΒ͔ͷ஌ࣝΛཁٻ͢Δ΋ͷ͕ଟ͍ จத͔Β࿈ଓͨ͠ޠ۟Λൈ͖ग़ͯ͠౴͑Δ 68 / 80

Discussion ͍Ζ͍Ζ 1. Commonsense reasoning 2. Task uniﬁcation / other
fomulations 3. Tasks for machine learning 4. Task diﬃculty 69 / 80

Discussion: Commonsense reasoning ΰϛശঢ়ଶ: ෼ྨෆ໌ͳ΋ͷΛʮ͜ΕৗࣝͬΆ͍ʯͱͯ͠͠·͏ ࢀߟจݙ Levesque 2011, Winograd Schema
Challenge [url] Davis and Marcus 2015, Commonsense reasoning and commonsense knowledge in artiﬁcial intelligence [url] Clark and Etzioni 2016, My computer is an honor student - but how intelligent is it? Standardized Tests as a Measure of AI [pdf] 70 / 80

Davis+ 2015, Commonsense reasoning and ... Commonsense reasoining ͱ͸ʁ ͱ͍͏ղઆΛ͍ͯ͠Δ
1. Taxonomic reasoning ੈք஌ࣝʹཪ෇͚ΒΕͨਪ࿦ 2. Temporal resoning ࣌ؒతॱংؔ܎ 3. Action and change ҼՌਪ࿦తͳ΋ͷɺ͜ΕΛͨ͠ΒͲ͏ͳΔ͔ 4. Qualitative reasoning ఆੑతͳؔ܎ɺ͋Ε͕มԽͨ͠Β͜Ε΋มԽ͢Δͱ͍͏ରԠ 71 / 80

Davis+ 2015, Commonsense reasoning and ... 1. Taxonomic reasoning Taxonomies
are something like “semantic networks” Node: entities, abstract concepts, category, ... Edge: subcategory, instance, property, ... 72 / 80

Davis+ 2015, Commonsense reasoning and ... 2. Temporal reasoning Πϕϯτͷॱংؔ܎Λೝࣝ͢Δ
࣌ܥྻతͳ஌͔ࣝΒ৽͍͠ਪ࿦Λ͢Δ (Temporal logic [url] తͳ) “ ..., if one knows that Mozart was born earlier and died younger than Beethoven, one can infer that Mozart died earlier than Beethoven.” 73 / 80

Davis+ 2015, Commonsense reasoning and ... 3. Action and change:
Constraints on events: Events are atomic. Every change in the world is the result of an event. Events are deterministic. ... Domains including: Continuous domains, where change is continuous Simultaneous events Multiple agents ... 74 / 80

Davis+ 2015, Commonsense reasoning and ... 4. Qualitative reasoning If
the price of an object goes up then (usually, other things being equal) the number sold will go down. If the temperature of gas in a closed container goes up, then the pressure will go up. If an ecosystem contains foxes and rabbits and the number of foxes decreases, then the death rate of the rabbits will decrease (in the short term). 75 / 80

Discussion: Task formulations Output ޠ۟Λ౴͑Δ͚ͩͷλεΫͰྑ͍ͷ͔ʁ ͋Δ͍͸ɺಡղλεΫ͸બ୒ࣜͷ΋ͷͰΑ͍ͷ͔ʁ ޠ۟ͷ QA ͩͱɺ౴͑Λ༻ҙ͢Δͷ͕େม SQuAD(2016)
Ͱ͸ෳ਺ਓʹղ౴Λ࡞Β͓ͤͯΓɺ౴͕͑ෳ਺௨ Γ͋Δʢผͷදݱɺspan ͷ௕͞ʣ ཧ༝આ໌ͳͲΛͤ͞Δͱ΋ͬͱେม ൑ఆثΛ࡞Δʁ ਓؒͷ࠾఺͢ΒखͰ΍ͬͯΔͷʹʜʜ ͜ͷ͋ͨΓͷٞ࿦͸ Clark and Etzioni 2016, My computer is an honor student - but how intelligent is it? Standardized Tests as a Measure of AI [pdf] ͋ͨΓΛࢀর খֶੜϨϕϧͷࢉ਺ɾཧՊͷ໰୊Λ࡞͍ͬͯΔ͕ɺଟذબ୒͔ࣜ Β୤٫͢Δͷʹ࢛ۤീ͍ۤͯ͠Δ 76 / 80

Discussion: Task formulations Input จ͚ͩ͡Όͳ͘ɺਤ΍දΛ࢖͍͍ͨ ͜ͷ͋ͨΓͷٞ࿦΋ Clark and Etzioni 2016
Ͱͳ͞Ε͍ͯ·͢ ࢉ਺ͰزԿͷ໰୊͕ग़ͤͳ͍ͷ͸೉͍͠ ཧՊͷ஌ࣝ໰୊Ͱը૾Λ࢖͑ΔͱҰؾʹ෯͕޿͕Δ ֎෦஌ࣝͷѻ͍ΛͲ͏͢Δ͔ʁ ৗࣝɿ஌͍ͬͯͯཉ͍͠ͷͰςετͱͯ͠͸ఏࣔ͠ͳ͍ ಛघͳ஌ࣝɿղ͚ΔΑ͏ʹิॿతͳઆ໌จΛ૿΍͢ʢਓ͕ղ͘ࠃ ޠͷจষ୊Ͱ஫͕෇͘Α͏ʹʣ 77 / 80

Discussion: Tasks for machine learning ͦ΋ͦ΋ reading comprehension ʹ͓͚Δʮֶशʯ͸ԿΛֶशͯ͠ ͍Δͷ͔
ʢݴޠ֫ಘΛ஍ಓʹ΍Δ΂͖Ͱ͸ʁʣ ʢຊ౰͔ʁʣ ೝ஌ݴޠֶͰ͸ɺ༻๏ج൫ཧ࿦ͱ͍͏ͷ͕͋Γʜʜ Langacker 1987 [amzn] ΛಡΈ͍ͨ ೔ຊޠͩͱ τϚηϩ 2006, ʰ͜ͱ͹Λͭ͘Δʱ[amzn] ͕͓͢ ͢Ί ʮςΩετϕʔεͷݴޠ֫ಘͱ͸ʁʯΛߟ͑Δඞཁ͸͋Δ ֓೦ɾจ๏͸ຊདྷతʹ͸஌֮৘ใͱ݁ͼͭ͘΋ͷͳʢ͸ͣʣͷͰɺ ஌֮৘ใΛ࣋ͨͳֶ͍शث͸֓೦ؒͷؔ܎ੑ͔͠शಘͰ͖ͳ͍ Taxonomy Λ֫ಘ͢Δɺͱ͍͏ײ͡ ʹɺೝ஌ʜʜ 78 / 80

Discussion: Task diﬃculty λεΫͷ೉қ౓ΛଌΔࢦඪΛ͞Βʹ༻ҙͰ͖ͳ͍͔ʁ ζϧʢʁʣΛͯ͠ղ͚Δɺͱ͍͏ͷ͕͋Γ͏Δ ౴͑ͱͳΔ඼ࢺ͕Ұޠ͔͠ग़ݱ͍ͯ͠ͳ͍ ࠷සͷޠΛ౴͑Ε͹Α͍ બ୒ࢶͷଞͷީิ͕໌Β͔ʹؒҧ͍ͬͯΔ n-gram Ͱղ͚ͪΌ͏
͜ͷ͋ͨΓͷ਺஋Λ໌ࣔతʹ͢ΔͱΑ͍ʁ ࣗ෼ͱͯ͠͸͕͜͜໌֬ʹͳΒͳ͍ͱ೉қ౓͕ଌΕͳ͍ͱײ ͡Δ 79 / 80

5. ·ͱΊࠓޙ ಡղλεΫΛ෼ੳ͢Δखஈͱͯ͠ɺprerequisite skills Λఆٛ͠ ·ͨ͠ ͜Εʹج͍ͯΞϊςʔγϣϯ͢Δ͜ͱͰ͕࣍໌Β͔ʹͰ͖ͦ ͏Ͱ͢: λεΫͷ೉қ౓ λεΫ͕ඞཁͱ͢Δೳྗͷ܏޲
ࠓޙ͸͜ͷఆٛͷଥ౰ੑ΍ࡉ෼Խʢͱ͘ʹ஌ࣝ·ΘΓʣΛߟ ͑Δඞཁ͕͋Γ·͢ ϞσϧΛ࡞ΒͶ͹ 80 / 80

TokyoCL-2016-10-05

TokyoCL-2016-10-05

More Decks by penzant

Featured

Transcript