Slide 1

Slide 1 text

Continuous Construction and Combined Use of Japanese Recipe Datasets
Jun Harashima, Jun Hiramatsu, Yusuke Fukasawa, Yasuhiro Yamaguchi (Cookpad Inc.)
NLP2022 Workshop on Japanese Evaluation Dataset (JED2022)

Slide 2

Slide 2 text

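Background
The spread of the internet and smartphones has increased the number of recipes online
・In Japanese: 700,000 recipes (2010) → 5 million recipes (2020) *1
Research and datasets on recipes have also increased
・Research: language understanding [Kiddon+ 15], text generation [Kiddon+ 16], information retrieval [Salvador+ 17], question answering [Yagcioglu+ 18], …
・Datasets: Recipe1M+ [Marin+ 19], RISeC [Jiang+ 20], ARA [Donatelli+ 21], …
For both research and datasets, the focus is still English (especially at top conferences) → Japanese cannot afford to fall behind!
*1 Total number of recipes posted to Cookpad and Rakuten Recipe (counted by the presenters)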

Slide 3

Slide 3 text

Outline
・Continuous construction of Japanese recipe datasets
・Combined use of Japanese recipe datasets
・Summary and future outlook

Slide 4

Slide 4 text

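Cookpad
Japan's largest *1 recipe service for posting and searching recipes online
・Recipes posted: 3.65 million
・Monthly users in Japan: 56 million
・Premium members: 1.83 million
・Countries and regions served: 74
・Languages supported: 32
*1 All figures as of December 31, 2021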

Slide 5

Slide 5 text

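Recipe
A document describing the ingredients of a dish and how to prepare it
In most cases, it consists of the following elements:
・Title
・Author's comment
・Author's name
・Ingredients
・Steps
・Photos of the finished dish (in some cases, video)
・Photos taken during cooking (in some cases, video)
・…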

Slide 6

Slide 6 text

Cookpad Dataset
Datasets that Cookpad Inc. continuously builds and releases
・Cookpad Recipe Dataset (released in 2015)
・Cookpad Image Dataset (released in 2017)
・Cookpad Comparable Corpus (released in 2017)
・Cookpad Parsed Corpus (released in 2020)

Slide 7

Slide 7 text

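Cookpad Recipe Dataset
Contains the text (title, author's comment, ingredients, steps, …) of about 1.72 million recipes posted through the end of September 2014 [Harashima+ 16]
Some recipes also carry category and meal (menu) information; conversely, not every recipe has it
Released in 2015; the world's largest text dataset related to recipes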

Slide 8

Slide 8 text

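Cookpad Recipe Dataset
Unlike the other datasets described later, it is released through NII *1
As of March 2022, it is used by 212 laboratories *2 at 110 universities across Japan
*1 https://www.nii.ac.jp/dsc/idr/cookpad/
*2 Including many laboratories outside NLP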

Slide 9

Slide 9 text

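Cookpad Image Dataset
Contains the images (photos of the finished dish, photos taken during cooking) for the same 1.72 million recipes as the Recipe Dataset [Harashima+ 17]
Released in 2017; the world's largest image dataset related to recipes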

Slide 10

Slide 10 text

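Cookpad Image Dataset
・World's largest by number of photos of finished dishes
・Also the world's largest by number of photos taken during cooking
・Can be linked to the Recipe Dataset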

Slide 11

Slide 11 text

Cookpad Comparable Corpus
Contains translation data (Japanese → English) for 16,000 recipes
・Data that was used in a service developed in the past (now closed)

Translation process
・1. One native Japanese speaker *1 *2 translated
・2. Two native English speakers *2 revised

Provided as a subtask of WAT 2017 and 2018 *3
*1 A person fluent in English was hired
*2 People knowledgeable about cooking were hired
*3 http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT{2017,2018}/index.html

Sample (the actual data has been modified and abridged for simplicity):
  ja: {
    title: 卵豆腐のすまし汁,
    ingredients: [ 卵豆腐, … ],
    steps: [ たけのこは上のやわらかい部分だけを薄く切る。, … ],
  },
  en: {
    title: Clear Broth with Egg Tofu,
    ingredients: [ Egg tofu, … ],
    steps: [ Take the soft part of the top of the bamboo shoot and thinly slice., … ],
  }
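As a rough illustration of how a record shaped like the sample on this slide might be turned into sentence pairs for machine translation, here is a minimal Python sketch. The field names (`title`, `ingredients`, `steps`) follow the simplified sample only; the actual corpus files may be organized differently, and the assumption that list items align by position is ours, not something the slide states.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical record mirroring the simplified sample on the slide.
record: Dict[str, Dict[str, object]] = {
    "ja": {
        "title": "卵豆腐のすまし汁",
        "ingredients": ["卵豆腐"],
        "steps": ["たけのこは上のやわらかい部分だけを薄く切る。"],
    },
    "en": {
        "title": "Clear Broth with Egg Tofu",
        "ingredients": ["Egg tofu"],
        "steps": [
            "Take the soft part of the top of the bamboo shoot and thinly slice."
        ],
    },
}


def parallel_pairs(rec: Dict[str, Dict[str, object]]) -> Iterator[Tuple[str, str]]:
    """Yield (Japanese, English) pairs from one recipe record.

    Assumes list fields are aligned by position, which may not hold
    for every recipe in a comparable (not strictly parallel) corpus.
    """
    ja, en = rec["ja"], rec["en"]
    yield str(ja["title"]), str(en["title"])
    for field in ("ingredients", "steps"):
        ja_items: List[str] = list(ja[field])  # type: ignore[arg-type]
        en_items: List[str] = list(en[field])  # type: ignore[arg-type]
        # zip stops at the shorter list, so unmatched trailing items are dropped.
        for src, tgt in zip(ja_items, en_items):
            yield src, tgt


pairs = list(parallel_pairs(record))
```

Dropping unmatched items is the conservative choice here; a real pipeline would more likely log or inspect recipes whose Japanese and English lists differ in length.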

Slide 12

Slide 12 text

Cookpad Comparable Corpus
Benchmark results and experiment scripts can be viewed and downloaded

Slide 13

Slide 13 text

Cookpad Parsed Corpus
Contains gold-standard data for morphological analysis, syntactic parsing, and named entity recognition on 500 recipes (title and steps) [Harashima&Hiramatsu 20]
・Morphological analysis: MeCab (ipadic) output, manually corrected
・Syntactic parsing: CaboCha output, manually corrected
・Named entity recognition: 17 original tags, manually assigned
Possibly the first release of a parsed Japanese corpus by a company?

Sample annotation (CaboCha-style; the Japanese surface forms and feature strings are not recoverable in this transcript, so only the structure and named entity tags are shown):
  # Step-ID:1
  # Sentence-ID:1-1
  * 0 4D 1/2    (tags: B-Fi, I-Fi, O)
  * 1 2D 1/2    (tags: B-Sf, I-Sf, O)
  * 2 4P 0/0    (tags: B-Ap)
  * 3 4D 0/1    (tags: B-Fi, O)
  * 4 -1O 0/0   (tags: B-Ap, O)
  EOS
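To make the annotation layout on this slide more concrete, the following Python sketch collects named entities from CaboCha-style lines. It assumes, based only on the sample shown here, that each token line is `surface<TAB>features` and that the BIO named entity tag is the last comma-separated feature; the real corpus layout may differ. The input lines and their shortened feature strings are hypothetical, and only the tag names `Fi` and `Ap` are taken from the slide.

```python
from typing import List, Optional, Tuple


def extract_entities(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (entity_text, tag) pairs from BIO-tagged CaboCha-style lines."""
    entities: List[Tuple[str, str]] = []
    current_tag: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current_tag, current_tokens
        if current_tag is not None:
            entities.append(("".join(current_tokens), current_tag))
        current_tag, current_tokens = None, []

    for raw in lines:
        line = raw.rstrip("\n")
        if line == "EOS":
            flush()  # a sentence boundary always ends an entity
            continue
        # Skip blanks, comment lines ("# ...") and chunk headers ("* ...").
        if not line or line.startswith("# ") or line.startswith("* "):
            continue
        surface, features = line.split("\t", 1)
        label = features.rsplit(",", 1)[-1]  # assumed: tag is the last feature
        if label.startswith("B-"):
            flush()
            current_tag, current_tokens = label[2:], [surface]
        elif label.startswith("I-") and current_tag == label[2:]:
            current_tokens.append(surface)
        else:
            flush()  # "O" or an inconsistent I- tag ends the entity
    flush()
    return entities


# Hypothetical input; features are shortened for illustration.
sample = [
    "# Step-ID:1",
    "# Sentence-ID:1-1",
    "* 0 1D 0/1",
    "卵\t名詞,一般,B-Fi",
    "豆腐\t名詞,一般,I-Fi",
    "を\t助詞,格助詞,O",
    "* 1 -1D 0/0",
    "切る\t動詞,自立,B-Ap",
    "EOS",
]
entities = extract_entities(sample)
```

Joining surfaces without spaces suits Japanese text; the chunk (dependency) headers are skipped here because entity extraction does not need them.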

Slide 14

Slide 14 text

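Cookpad Parsed Corpus
Compared with parsing newspaper articles…
・Morphological analysis is harder (there are many unknown words)
・Syntactic parsing is easier (sentences are short)
・Named entity recognition is unclear (the tag sets differ, so the results are not directly comparable)

[Table: Performance of the morphological analyzer (MeCab) *1. Rows: word segmentation only, and word segmentation + POS tagging, each without and with retraining. Columns: precision, recall, F-measure. Values are not preserved in this transcript.]
[Table: Performance of named entity recognizers *1. Rows: [Sasada+ 15], [Lample+ 16]. Columns: accuracy, precision, recall, F-measure. Values are not preserved in this transcript.]
[Table: Performance of the dependency parser (CaboCha) *1. Rows: without and with retraining. Columns: accuracy per bunsetsu (chunk) and per sentence. Values are not preserved in this transcript.]
*1 The experiment scripts are available at https://github.com/cookpad/cpc1.0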
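The tables on this slide report precision, recall, and F-measure. As a reminder of how these are commonly computed for word segmentation, here is a small Python sketch that compares character spans of predicted and gold words. It uses the standard definitions; whether the experiments behind the slide scored segmentation in exactly this way is not stated, and the example sentence is made up.

```python
from typing import List, Tuple


def to_spans(words: List[str]) -> List[Tuple[int, int]]:
    """Convert a word sequence into (start, end) character offsets."""
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f(
    gold: List[str], predicted: List[str]
) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F-measure for one segmentation."""
    gold_spans = set(to_spans(gold))
    pred_spans = set(to_spans(predicted))
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Made-up example: the analyzer splits an out-of-vocabulary compound.
gold_words = ["卵豆腐", "を", "切る"]
predicted_words = ["卵", "豆腐", "を", "切る"]
p, r, f = precision_recall_f(gold_words, predicted_words)
```

In this example two of the four predicted words match gold spans, so precision is 0.5, recall is 2/3, and the F-measure is 4/7. This is the kind of error that unknown words tend to cause.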

Slide 15

Slide 15 text

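Tabemiru (an aside)
An analytics tool, offered to businesses, built on accumulated Cookpad search data
Made available in 2016
・Not released as a dataset; instead, accounts are provided free of charge (researchers only)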

Slide 16

Slide 16 text

Outline
・Continuous construction of Japanese recipe datasets
・Combined use of Japanese recipe datasets
・Summary and future outlook

Slide 17

Slide 17 text

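Combined use?
Each dataset can be used on its own (naturally)
On the other hand, some tasks and methods only become feasible when the datasets are used in combination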

Slide 18

Slide 18 text

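Individual use
Recipe Dataset
・Document recommendation (main dish recommendation, side dish recommendation)
・Text generation (title generation, steps generation)
・Keyword recommendation (ingredient recommendation)
・…
Image Dataset
・Super-resolution
・…
Comparable Corpus
・Machine translation (Japanese-English)
Parsed Corpus
・Morphological analysis
・Syntactic parsing
・Named entity recognition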

Slide 19

Slide 19 text

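Combined use
Datasets: Recipe Dataset, Image Dataset, Comparable Corpus, Parsed Corpus
Tasks enabled by combining datasets
・Visual question answering
・Caption generation
・Multimodal retrieval
・Multimodal translation
・Image recognition (dish recognition, ingredient recognition)
Tasks on individual datasets
・Document recommendation (main dish recommendation, side dish recommendation)
・Text generation (title generation, steps generation)
・Keyword recommendation (ingredient recommendation)
・Machine translation (Japanese-English)
・Morphological analysis
・Syntactic parsing
・Named entity recognition
・Super-resolution
・…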

Slide 20

Slide 20 text

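Combined use (from the viewpoint of methods)
Recipe Dataset: pretraining
・Masked Language Model
・Next Sentence Prediction
・…
→ Fine-tuning on the Comparable Corpus
・Machine translation (Japanese-English)
→ Fine-tuning on the Parsed Corpus
・Morphological analysis
・Syntactic parsing
・Named entity recognition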
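The Masked Language Model objective named on this slide can be illustrated with a deliberately simplified Python sketch: each token is independently replaced by a mask symbol with some probability, and the model would then be trained to recover the originals. This omits details of BERT's actual procedure (for example, sometimes keeping or randomly replacing the selected token), and the function name, mask symbol, and default probability are illustrative assumptions rather than anything the slide specifies.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # illustrative mask symbol


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (masked_tokens, labels) for a simplified MLM objective.

    labels[i] holds the original token where position i was masked,
    and None elsewhere (positions the loss would ignore).
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Hypothetical pre-tokenized recipe step.
step = ["たけのこ", "は", "薄く", "切る", "。"]
all_masked, all_labels = mask_tokens(step, mask_prob=1.0)
none_masked, none_labels = mask_tokens(step, mask_prob=0.0)
```

A model pretrained this way on the large Recipe Dataset text could then be fine-tuned on the much smaller annotated corpora, which is the division of labor the slide describes.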

Slide 21

Slide 21 text

Building pretrained models
Some people have already started working on this
HCG Symposium 2021

Slide 22

Slide 22 text

Further combinations?
・Rakuten Dataset
・Flow graph corpus [Mori+ 14]
・Cooking ontology [Nanba+ 14]
・Basic cooking knowledge base [Kiyomaru+ 18]
・r-FG-BB dataset [Nishimura+ 20]
・…
All of these are Japanese datasets on recipes and cooking

Slide 23

Slide 23 text

Outline
・Continuous construction of Japanese recipe datasets
・Combined use of Japanese recipe datasets
・Summary and future outlook

Slide 24

Slide 24 text

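Summary
Continuous construction of Japanese recipe datasets
・Cookpad Recipe Dataset (released in 2015)
・Cookpad Image Dataset (released in 2017)
・Cookpad Comparable Corpus (released in 2017)
・Cookpad Parsed Corpus (released in 2020)
Combined use of Japanese recipe datasets
・Tasks: visual question answering, multimodal retrieval, caption generation, …
・Methods: pretraining + fine-tuning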

Slide 25

Slide 25 text

Future outlook
Cookpad Video Dataset with OMRON SINIC X
Under active development!
Links parsed recipes (Parsed Corpus, annotated as in the Step-ID / Sentence-ID sample) with cooking videos (Video Dataset)

Slide 26

Slide 26 text

References
• [Donatelli+ 21] Aligning Actions Across Recipe Graphs
• [Harashima+ 16] A Large-Scale Recipe and Meal Data Collection as Infrastructure for Food Research
• [Harashima+ 17] Cookpad Image Dataset: An Image Collection as Infrastructure for Food Research
• [Harashima&Hiramatsu 20] Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes
• [Jiang+ 20] Recipe Instruction Semantics Corpus (RISeC): Resolving Semantic Structure and Zero Anaphora in Recipes
• [Kiddon+ 15] Mise en Place: Unsupervised Interpretation of Instructional Recipes
• [Kiddon+ 16] Globally Coherent Text Generation with Neural Checklist Models
• [Lample+ 16] Neural Architectures for Named Entity Recognition
• [Marin+ 19] Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
• [Mori+ 14] Flow Graph Corpus from Recipe Texts
• [Nanba+ 14] Construction of a Cooking Ontology from Cooking Recipes and Patents
• [Nishimura+ 20] Visual Grounding Annotation of Recipe Flow Graph
• [Salvador+ 17] Learning Cross-modal Embeddings for Cooking Recipes and Food Images
• [Sasada+ 15] Named Entity Recognizer Trainable from Partially Annotated Data
• [Yagcioglu+ 18] RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes
• [Kagawa+ 21] How to Use BERT and GPT-2 Trained on the Cookpad Dataset (in Japanese)
• [Kiyomaru+ 18] Building a Basic Cooking Knowledge Base from Cooking Recipes and Crowdsourcing (in Japanese)