Slide 1

Slide 1 text

Research and Development at Cookpad
Jun Harashima, Cookpad Inc.

Slide 2

Slide 2 text

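Jun Harashima, timeline (year/month labels: […])
- Assigned to the Kurohashi Lab, Kyoto University
- Student (natural language processing, information retrieval)
- Obtained a Ph.D. (Informatics)
- Joined Cookpad
- Engineer (Ruby on Rails)
- Manager (HR, PR, …)
- Established the R&D division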

Slide 3

Slide 3 text

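Cookpad
- The world's largest cooking recipe service, where users can post and search for recipes on the Internet
- Monthly users: over […] hundred million ([…] countries, […] languages)
- Premium members: about […] × 10,000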

Slide 4

Slide 4 text

Cookpad Mart

Slide 5

Slide 5 text

Tanoshii Kitchen Fudosan (Fun Kitchen Real Estate)

Slide 6

Slide 6 text

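History (year/month labels: […])
- Coin Ltd. (now Cookpad Inc.) founded
- Recipe posting and search service kitchen@coin launched
- Service renamed to Cookpad
- Premium service launched
- Listed on the Mothers market
- Moved to the First Section of the Tokyo Stock Exchange
- Premium service members exceed […] × 10,000
- Overseas expansion begins in earnest
- R&D division launched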

Slide 7

Slide 7 text

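Related services
- Omakase Seiri (automatic organization): quickly find seasonal vegetable recipes and recipes that work as a main dish
- Alexa skill: suggests popular recipes you can make right away from just the ingredients you want to use
- Ryori Kiroku (cooking log): just take a photo to keep a record! Your food photos are organized automatically
- OiCy: beyond the recipe, a delicious smart kitchen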

Slide 8

Slide 8 text

- Recipes (text + images)
- Feedback is also text + images
- Media processing (especially NLP and CV) is important

Slide 9

Slide 9 text

NLP

Slide 10

Slide 10 text

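Recipe classification
Classify recipes by type, cooking method, genre, …
- Type (e.g., meat dishes, fish dishes), cooking method (e.g., simmered dishes, grilled dishes), genre (e.g., Japanese, Western)
- The classification results are used for narrowing down search results, among other things
Models: SVM and RF
- Performance depends on the tag, but is sufficient
Training data
- Recipes tagged by annotators and recipes tagged by users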

Slide 11

Slide 11 text

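Ingredient name normalization
Cookpad has millions of distinct ingredient names
- This is a problem when computing, for example, the calories of a recipe
Normalization with an encoder-decoder
- Character-based stacked unidirectional LSTM ([…] layers)
- Bidirectional LSTM and attention brought no improvement
Example: the input characters ★ 醤 油 followed by EOW are decoded to し ょ う ゆ followed by EOW (both mean soy sauce)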
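The character-level input and output of such an encoder-decoder can be sketched as follows. This only illustrates the data preparation, not the LSTM itself. The `<EOW>` spelling, the helper names, and the choice to append the marker to both sides are assumptions made for the example.

```python
EOW = "<EOW>"  # end-of-word marker; the exact token spelling is an assumption


def to_char_sequence(name):
    """Split an ingredient name into characters and append the EOW marker."""
    return list(name) + [EOW]


def make_training_pair(raw_name, normalized_name):
    """Build (source, target) character sequences for one normalization example."""
    return to_char_sequence(raw_name), to_char_sequence(normalized_name)


def build_vocab(pairs):
    """Assign an integer id to every symbol seen in the training pairs."""
    vocab = {EOW: 0}
    for source, target in pairs:
        for symbol in source + target:
            vocab.setdefault(symbol, len(vocab))
    return vocab


# The slide's example: a decorated spelling normalized to the hiragana form.
pairs = [make_training_pair("★醤油", "しょうゆ")]
vocab = build_vocab(pairs)
```

Each character becomes one time step, so decorations such as ★ are ordinary input symbols that the decoder can learn to drop.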

Slide 12

Slide 12 text

Calorie Estimation (Harashima et al.)
Example: Asari Clam Rice ([…] go)
- アサリ → asari clam ✓
- 米 → rice ✓
- 塩 → salt ✓
- 酒 → sake ✓
- しょうゆ → soy sauce ✓
- みりん → sweet sake ✓
(✓ Ingredient)
- We have estimated the number of calories in over […] recipes and actually use them in our recipe service
- Use the single-source model for serving estimation
$c(r) = \sum_{i \in I_r} c(i) \cdot q(i) / 100$
s(r) = 306.6
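The calorie sum above can be sketched in a few lines. This is a minimal illustration, not the production estimator. It assumes that c(i) is the number of calories per 100 g of ingredient i and that q(i) is the quantity in grams. The table values and quantities are made-up placeholders.

```python
# Hypothetical calories per 100 g; placeholder values, not taken from the slides.
CALORIES_PER_100G = {"rice": 350.0, "asari clam": 30.0, "soy sauce": 70.0}


def recipe_calories(quantities_g, calories_per_100g=CALORIES_PER_100G):
    """Compute c(r) = sum over ingredients i of c(i) * q(i) / 100."""
    return sum(
        calories_per_100g[name] * grams / 100.0
        for name, grams in quantities_g.items()
    )


# Quantities in grams for a toy recipe (assumed values).
total = recipe_calories({"rice": 300.0, "asari clam": 200.0, "soy sauce": 30.0})
```

In practice the hard part is upstream of this sum: mapping free-text ingredient names and quantities to table entries and grams, which is where the normalization step comes in.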

Slide 13

Slide 13 text

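Ingredient name recommendation
Embed the title entered by the user
- Embed each word and take the average
Retrieve recipes
- From Cookpad's existing recipes, retrieve those whose titles are similar in the embedding space
Recommend ingredient names
- Recommend the ingredient names shared among the retrieved recipes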
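The three steps above can be sketched as follows. This is a toy illustration under stated assumptions: the 2-d word vectors and recipes are invented, cosine similarity stands in for whatever similarity is actually used, and "shared" is read as "appearing in at least `min_count` of the top `k` recipes".

```python
import math
from collections import Counter

# Toy 2-d word vectors; hypothetical stand-ins for trained embeddings.
WORD_VECTORS = {
    "tomato": (1.0, 0.0), "soup": (0.8, 0.6), "salad": (0.6, 0.8),
    "chocolate": (-1.0, 0.2), "cake": (-0.8, 0.6),
}

# Toy recipe collection: (title words, ingredient names).
RECIPES = [
    (["tomato", "soup"], ["tomato", "onion", "salt"]),
    (["tomato", "salad"], ["tomato", "lettuce", "salt"]),
    (["chocolate", "cake"], ["chocolate", "flour", "egg"]),
]


def embed_title(words):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_ingredients(title_words, k=2, min_count=2):
    """Retrieve the k most similar recipes and return their shared ingredients."""
    query = embed_title(title_words)
    if query is None:
        return []
    scored = []
    for words, ingredients in RECIPES:
        vector = embed_title(words)
        if vector is not None:
            scored.append((cosine(query, vector), ingredients))
    scored.sort(key=lambda item: item[0], reverse=True)
    counts = Counter(name for _, ings in scored[:k] for name in ings)
    return sorted(name for name, n in counts.items() if n >= min_count)
```

For example, `recommend_ingredients(["tomato"])` retrieves the two tomato recipes and returns the ingredients they have in common.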

Slide 14

Slide 14 text

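Step classification
Classify each cooking step in a recipe as a True Step or a Fake Step
- Tagged about […] steps and trained an LSTM (accuracy: […])
- Used in services such as reading recipes aloud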

Slide 15

Slide 15 text

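Recipe translation
The company (by chance) had a parallel corpus
- A relic of a past service
- About […] dishes with Japanese-English translations
Tested NMT [Bahdanau et al.]
- Result: translation good enough to turn into a service is difficult
- Occasionally a reasonable translation is generated
  - Japanese: 蓋またはアルミホイルで落とし蓋をして中火で蒸し焼きにします。
  - English: cover with a lid or aluminum foil and steam the chicken on medium heat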

Slide 16

Slide 16 text

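Feedback classification
Example: "It turned out delicious!" → Positive
Automatically classify feedback from users
- Supports […] tags (trained […] SVMs)
- Suggests high-probability tags, halving the staff's classification work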
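The "one classifier per tag, suggest the likely ones" idea can be sketched as below. The keyword-based scorers are hypothetical stand-ins for the trained SVMs, and the threshold and `top_n` values are assumptions chosen for the example.

```python
def suggest_tags(text, classifiers, threshold=0.5, top_n=3):
    """Score the text with one classifier per tag and suggest the likely tags."""
    scored = [(tag, classify(text)) for tag, classify in classifiers.items()]
    likely = [(tag, p) for tag, p in scored if p >= threshold]
    likely.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in likely[:top_n]]


# Hypothetical per-tag scorers returning a probability-like value in [0, 1].
classifiers = {
    "positive": lambda text: 0.9 if "delicious" in text else 0.1,
    "bug report": lambda text: 0.8 if "crash" in text else 0.05,
}

suggested = suggest_tags("It turned out delicious!", classifiers)
```

Because the tags are only suggested, a staff member still confirms each one. The classifier only has to rank the right tag highly to save work.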

Slide 17

Slide 17 text

CV

Slide 18

Slide 18 text

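Food photo detection
Detect food photos on the user's smartphone with deep learning
- Ryori Kiroku was used by […] × 10,000 people within […] year(s) of its release
- Precision: […], recall: […]
[Figure: example photos labeled food, non-food, non-food, non-food]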

Slide 19

Slide 19 text

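Food photo detection
ver. […]
- CaffeNet
- Binary classification: food or non-food
ver. […]
- Inception v[…]
- Multi-class classification: food, plants, … or non-food
ver. […]
- Inception v[…], patched classification
- Binary classification: food or non-food
Training data
- Positive examples: images from Cookpad recipes
- Negative examples: assorted license-free images
[Figure: recognition results of ver. […]]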
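The slides do not define "patched classification", so the following is only one plausible reading: split the image into patches, classify each patch, and call the photo food if enough patches are food. The patch size, the aggregation rule, and the toy patch classifier are all assumptions.

```python
def split_into_patches(image, patch_size):
    """Cut a 2-D list of pixel values into non-overlapping square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(
                [row[left:left + patch_size] for row in image[top:top + patch_size]]
            )
    return patches


def is_food_photo(image, patch_classifier, patch_size=2, min_food_fraction=0.25):
    """Label the photo as food if enough patches are classified as food."""
    patches = split_into_patches(image, patch_size)
    if not patches:
        return False
    food_patches = sum(1 for patch in patches if patch_classifier(patch))
    return food_patches / len(patches) >= min_food_fraction


def toy_patch_classifier(patch):
    """Hypothetical stand-in for a CNN: 'food' if the mean pixel value exceeds 0.5."""
    values = [value for row in patch for value in row]
    return sum(values) / len(values) > 0.5


# A 4x4 toy image whose top-left quadrant is "food".
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
result = is_food_photo(image, toy_patch_classifier)
```

A patch-level decision of this kind can flag a photo in which the dish occupies only part of the frame, which a single whole-image score may miss.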

Slide 20

Slide 20 text

Food photo detection
https://twitter.com/ohmycorgi/status/867745923719364609
https://twitter.com/teenybiscuit/status/707727863571582978
- ✓ marks the photos that the model judged to be food

Slide 21

Slide 21 text

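Non-food photo detection
Detect non-food photos among the photos posted by users
- Uses the same model as food photo detection
- Notifies the user of the guidelines
Note: non-food photos often include family members or pets in the frame
[Figure: example photos labeled non-food, food, food, food]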

Slide 22

Slide 22 text

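Ingredient photo classification
Classify ingredient photos into about […] classes, to be used in service development (planned)
Example: a photo classified as "tomato", shown with recipes such as:
- Beyond imagination? Three-color soup with tomato, lettuce, and egg
- Easy tomato soup (minestrone)
- New onion and tomato salad
- Aim for deli style! Lettuce and tomato salad
- Super simple! Japanese-style cold pasta with tomato and daikon
- Refreshing marinated tomato and new onion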

Slide 23

Slide 23 text

Other topics

Slide 24

Slide 24 text

Release of recipe data

Slide 25

Slide 25 text

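Usage
- Before release (until […]): […] universities, […] laboratories
- After release (as of […]): […] universities, […] laboratories
[Figure: related topics]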

Slide 26

Slide 26 text

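The […] AI Challenge Contest
Food detection track
- Detect the food regions in a photo
Food classification track
- Classify food photos into […] classes (e.g., somen, udon)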

Slide 27

Slide 27 text

JSAI Cup […]
Ingredient photo classification
- […] categories (e.g., onion)
- […] images in total

Slide 28

Slide 28 text

WAT

Slide 29

Slide 29 text

Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes
Jun Harashima and Makoto Hiramatsu (Cookpad Inc.)
The 14th Linguistic Annotation Workshop

Background
- The number of cooking recipes on the Internet has grown
- Recipe-related studies and datasets are also increasing
- However, there are still few datasets that provide linguistic annotations for recipe-related studies, even though such annotations should form the basis of these studies

Table 1. Existing recipe-related datasets and our corpus
Name | Year | Main content
CURD | 2008 | Machine-readable language representations
Flow Graph Corpus | 2014 | Graph representations and named entities
SIMMR Recipe Dataset | 2015 | Graph representations
Cookpad Recipe Dataset | 2016 | Reviews and meals
Cookpad Image Dataset | 2017 | Food images and cooking images
Recipe1M | 2017 | Food images
RecipeQA | 2018 | Question-answer pairs
Storyboarding Data | 2019 | Cooking images
r-FG BB dataset | 2019 | Bounding boxes for cooking images
English Recipe Flow Graph Corpus | 2020 | Graph representations and named entities
Multimodal Aligned Recipe Corpus | 2020 | URLs to YouTube videos
Multi-modal Recipe Structure dataset | 2020 | Graph representations and cooking images
Cookpad Parsed Corpus | 2020 | Linguistic annotations

Contributions of this study
- Construction of a novel corpus, which contains linguistic annotations of 500 Japanese recipes
- Benchmark results on the corpus for Japanese morphological analysis (MA), named entity recognition (NER), and dependency parsing (DP)

Cookpad Parsed Corpus
- We randomly selected 500 recipes from the Cookpad Recipe Dataset
- 4,738 sentences in the 500 recipes were annotated with morphemes, named entities, and dependency relations

Table 2. Existing Japanese parsed corpora and our corpus
Name | Year | Target documents
KU Text Corpus | 2002 | Newspaper articles
GDA Corpus | 2005 | Newspaper articles and dictionary entries
NAIST Text Corpus | 2007 | Newspaper articles
KU and NTT Blog Corpus | 2011 | Blogs
KU Web Document Leads Corpus | 2012 | Web documents
BCCWJ | 2014 | Newspaper articles, books, magazines, etc.
Cookpad Parsed Corpus | 2020 | Cooking recipes

Morphemes
- We decided the boundaries and part of speech of each morpheme based on the IPA dictionary, which is commonly used for MA
Named entities
- Morphemes were annotated with 17 tags, such as Fi (food ingredient) and Sf (state of food), in IOB2 format
Dependency relations
- Bunsetsus were annotated with relations such as D (normal dependency) and P (coordination dependency)
- A bunsetsu is a unit of Japanese that consists of one or more content words and zero or more function words
- Bunsetsus were also annotated with 7 types, such as […] (Topic)
Other
- Content in the Cookpad Recipe and Image Datasets, which include the same 500 recipes, can also be used

[Figure 1. Linguistic annotations for an example sentence, […] (Cut the raw salmon into bite-size chunks and sprinkle them with salt.), in our corpus. The annotation covers Step-ID 1, Sentence-ID 1-1, with five bunsetsus glossed as: raw salmon (topic marker) / a bite size (dative) / cut / salt (accusative) / sprinkle. The named entity tags are, in order, B-Fi, I-Fi, O, B-Sf, I-Sf, O, B-Ap, B-Fi, O, B-Ap, O.]

- We divided our corpus into training (400 recipes), validation (100 recipes), and test sets (100 recipes) and tested popular tools or methods for Japanese MA, NER, and DP
- We also tested the tools after performing domain adaptation

MA
- MeCab, the most popular morphological analyzer for Japanese, was tested
- All metrics were in the 89–91% range, although the tool achieved over 98% on newspaper articles in Kudo et al. (2004)

Table 3. Benchmark results for MA
System | Precision | Recall | F1
MeCab | 88.91 | 88.95 | 88.93
MeCab w/ domain adaptation | 91.12 | 91.04 | 91.08

NER
- We trained and tested two recognizers using our training and test sets
- Many errors were caused by domain-specific unknown words

Table 4. Benchmark results for NER
Method | Accuracy | Precision | Recall | F1
Sasada et al. (2015) | 88.30 | 74.65 | 82.77 | 78.50
Lample et al. (2016) | 91.41 | 88.17 | 87.18 | 87.67

DP
- We tested CaboCha, the most popular dependency parser for Japanese
- Accuracy was 92–95% (over 20% of the sentences in our test set had at least one parsing error)

Table 5. Benchmark results for DP
System | Accuracy
CaboCha | 92.21
CaboCha w/ domain adaptation | 94.68

- There is still room for improvement in Japanese MA, NER, and DP of cooking recipes
- By improving the analyses using our corpus, a variety of recipe-related studies based on them can also be improved
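The IOB2 tagging used for the named entities can be read back into entity spans as sketched below. The token list uses the English glosses of the Figure 1 sentence rather than the original Japanese morphemes, and the function name is made up for this example.

```python
def iob2_to_spans(tokens, tags):
    """Group IOB2-tagged tokens into (entity type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open entity) closes the entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# English glosses standing in for the Japanese morphemes of Figure 1.
tokens = ["raw", "salmon", "TOPIC", "bite", "size", "DAT",
          "cut", "salt", "ACC", "sprinkle", "."]
tags = ["B-Fi", "I-Fi", "O", "B-Sf", "I-Sf", "O",
        "B-Ap", "B-Fi", "O", "B-Ap", "O"]
spans = iob2_to_spans(tokens, tags)
```

Span-level grouping like this is what NER precision and recall are usually computed over, which is why a single wrong tag inside a multi-token entity costs a whole span.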

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

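Summary
Research and development at Cookpad
- NLP: ingredient normalization, ingredient recommendation, step classification, …
- CV: food photo detection, non-food photo detection, ingredient photo classification, …
For details, see also:
- research.cookpad.com
- techlife.cookpad.com
- www.ai-gakkai.or.jp/resource/ai_comics (No. […], "Recipe Services and AI")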

Slide 32

Slide 32 text

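Announcement
- Next spring, we plan to publish a book on cooking and NLP/CV
- If you are struggling to choose a research theme for yourself or your students, please give it a read
Kitchen Informatics: Natural Language Processing and Image Processing That Support Cooking
Jun Harashima and Atsushi Hashimoto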

Slide 33

Slide 33 text

Thank you for your attention.