Slide 1

Slide 1 text

Building a Formula Grounding Dataset with MioGatto
Takuto Asakura, Yusuke Miyao (The University of Tokyo), Akiko Aizawa (NII)
2022-03-15 @ NLP 2022

Slide 2

Slide 2 text

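Formula grounding [Asakura+ 2020]
1. Find the groups of tokens that refer to a mathematical concept
   Examples of groups: x, α, cos, ∑, =, ×
2. Link each group to the mathematical concept it refers to

Contribution of this work: a dataset built as a step toward automation
▶ Manual annotation of 12,352 identifier occurrences across 15 papers
▶ Patterns such as how identifier scopes behave within a document have started to emerge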

Slide 3

Slide 3 text

Formula grounding [Asakura+ 2020] ≈ alignment + ?????

Description alignment
▶ The task of assigning a description to each token
▶ Several prior studies exist [Aizawa+ 2013, Alexeeva+ 2020, etc.]
  → Most assume that a token's meaning is constant within a document

Slide 4

Slide 4 text

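Formula grounding [Asakura+ 2020] ≈ alignment + coreference resolution

Coreference in natural language
  Momotaro was born from a peach. He set out to defeat the ogres.
  (coreference: "Momotaro" and "He")

Coreference in formulae
  What a machine learning algorithm yields is a function y(x). When a new digit image x is input to this function, an output vector y, encoded in the same way as the target vectors, is output. The precise form of the function y(x) is determined on the basis of the training data. (PRML, p. 2)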

Slide 5

Slide 5 text

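Why formula grounding is needed, and why it is hard
▶ Formulae have ambiguities similar to those of natural language [Kohlhase+, 2014]
  ▶ Collisions of symbols (tokens)
  ▶ Syntactic ambiguity of formulae, e.g. f(a + b)
▶ Formulae cannot be interpreted without the surrounding text
▶ Common sense and domain knowledge are required, e.g. π: the ratio of a circle's circumference to its diameter

Polysemy of the token y in Chapter 1 of PRML

Text fragment from the body                      Meaning of y
...what is obtained is a function y(x)...        a function that takes an image as input
...an output vector y is output...               the output vector of the function y(x)
...two vectors of random variables x and y...    a vector of random variables
...consider the joint distribution p(x, y).      the value corresponding to x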

Slide 6

Slide 6 text

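Sources of grounding
The evidence for formula grounding, found inside or outside the document
Inside the document: surrounding text and formulae, e.g. apposition nouns, ≝
Outside the document: common sense and domain knowledge, e.g. Wikidata

Annotated information: what is needed for automation
▶ Mathematical concepts: the result of grounding, i.e. the gold labels
▶ Sources of grounding: extracting these automatically is the first step toward automation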

Slide 7

Slide 7 text

The annotation tool MioGatto [Asakura+ 2021]
Math Identifier-Oriented Grounding Annotation Tool
▶ A purpose-built tool for constructing formula grounding data
  ▷ Web-based GUI (implemented in Python + TypeScript)
▶ Under development as open source (MIT License)
https://github.com/wtsnjp/MioGatto

Slide 8

Slide 8 text

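Annotation method

Student annotators
Ten annotators in total were recruited via Twitter and other channels, and were paid for their work
▶ Various fields: NLP × 4, mathematical logic × 2, mathematics × 1, physics × 1, astronomy × 1
▶ Various levels: high school student × 1, undergraduate × 1, master's student × 5, doctoral student × 3
https://wtsnjp.com/annotator.html

Procedure
▶ The annotation targets are math identifiers, e.g. x, θ, sin
▶ Papers were chosen at the annotators' discretion (some were assigned)
▶ Annotation guidelines were provided
https://github.com/wtsnjp/MioGatto/wiki/Annotator’s-Guide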

Slide 9

Slide 9 text

Annotation results: the formula grounding dataset

Paper  Field      Words  Types  Occurrences  Dictionary entries  Avg. candidates  Sources
1      ML         10976     40          937                 104              6.4      232
2      NLP         4267     42          266                  73              2.6       30
3      NLP         3563     38          433                  79              2.5       34
4      Logic       3567     46         1648                  64              1.9       30
5      Algebra    13154    141         4629                 424              5.2      180
6      NLP         2881     25          162                  30              2.7       12
7      NLP         5543     31          203                  47              2.6       36
8      NLP         4613     23          217                  27              1.1       28
9      NLP         6255     34          510                  74              2.7       27
10     NLP         5415     73         1175                 167              3.3       60
11     NLP         4451     33          237                  61              2.9       34
12     NLP         4261     31          186                  39              1.7       25
13     NLP         2257     23          124                  27              1.2       18
14     Astronomy  10032     59         1064                 129              4.2       97
15     Astronomy   4863     41          561                  73              2.3       95
Total  -          86098    680        12352                1418                -      938

https://sigmathling.kwarc.info/resources/grounding-dataset/

Slide 10

Slide 10 text

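Data analysis (1): inter-annotator agreement
Number of sources and inter-annotator agreement (relative to annotator A)

Annotator            A      B      C      D      E
Agreement (%)        -      96.5   87.4   92.1   84.2
κ (*)                -      0.94   0.80   0.87   0.75
Number of sources    232    -      -      249    257
  Overlap (%)        -      -      -      80.3   93.4
(*) Weighted average of the κ values computed per identifier (for reference only)

▶ Five annotators annotated paper 1 independently
  ▶ Mathematical concepts: all five
  ▶ Sources of grounding: annotators A, D, and E only
▶ Agreement and κ for mathematical concepts are sufficiently high
▶ The span positions recognized as sources also agree well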
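The agreement figures above can be illustrated with a minimal Python sketch. It assumes that each occurrence is recorded as a triple (identifier, concept chosen by annotator A, concept chosen by the other annotator) and that the per-identifier κ values are weighted by the number of occurrences; the slide does not state the weighting, so both the data layout and the weighting are assumptions, and the toy data is invented for illustration.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of concept labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention assumed for
        # this sketch only.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def weighted_kappa(occurrences):
    """Average per-identifier kappa, weighted by occurrence count (assumed)."""
    grouped = defaultdict(lambda: ([], []))
    for identifier, concept_a, concept_b in occurrences:
        grouped[identifier][0].append(concept_a)
        grouped[identifier][1].append(concept_b)
    total = sum(len(a) for a, _ in grouped.values())
    return sum(len(a) * cohen_kappa(a, b) for a, b in grouped.values()) / total


# Toy data, not taken from the dataset:
# (identifier, concept by annotator A, concept by the other annotator).
toy = [
    ("y", "function", "function"),
    ("y", "vector", "vector"),
    ("y", "function", "function"),
    ("y", "vector", "function"),
    ("x", "image", "image"),
    ("x", "image", "image"),
]
agreement = sum(a == b for _, a, b in toy) / len(toy)
kappa = weighted_kappa(toy)
print(f"agreement = {agreement:.3f}, weighted kappa = {kappa:.3f}")
```

Computing κ per identifier keeps a frequent, unambiguous identifier from masking disagreement on a polysemous one, which may be why the slide reports the weighted average only as a reference value.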

Slide 11

Slide 11 text

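Data analysis (2): scope switches

[Figure: scope of each identifier across sections. Paper 1: identifiers 𝐷, E, 𝐿, 𝑁, 𝑇, maximize, 𝑝, 𝑞, 𝑡, t, 𝑤, 𝑥, x, 𝑧, z, 𝜃, 𝜙, D over §1 to §7. Paper 15: identifiers 𝐸, HS, IS, LS, 𝑁, 𝑅, 𝑆, 𝑇, 𝑉, 𝑊, 𝑗, 𝑘, 𝑙, H, 𝒓 over §1 to §5.]

Scope switch: the meaning of an identifier changes within a document
▶ 89.5% of scope switches occur within a single section
▶ After switching once, an identifier may also return to an earlier scope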
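One way to count scope switches from annotated occurrences is sketched below in Python. It assumes that occurrences are listed in document order as (section, identifier, concept) triples, and that a switch counts as "within a section" when the previous and the current occurrence of the identifier share a section. Both are assumptions of this sketch, not the definition used for the 89.5% figure, and the toy data is invented.

```python
def scope_switches(occurrences):
    """Return (identifier, previous_section, section) for every scope switch.

    occurrences: (section, identifier, concept) triples in document order
    (a hypothetical layout chosen for this sketch).
    """
    last_seen = {}
    switches = []
    for section, identifier, concept in occurrences:
        if identifier in last_seen:
            prev_section, prev_concept = last_seen[identifier]
            if prev_concept != concept:
                switches.append((identifier, prev_section, section))
        last_seen[identifier] = (section, concept)
    return switches


def same_section_ratio(switches):
    """Fraction of switches whose two occurrences lie in the same section."""
    if not switches:
        return 0.0
    return sum(prev == cur for _, prev, cur in switches) / len(switches)


# Toy data modelled loosely on the token y: it switches away from
# "function" and later returns to it.
toy = [
    ("1", "y", "function"),
    ("1", "y", "output vector"),
    ("1", "y", "function"),
    ("2", "y", "random variable"),
    ("2", "x", "random variable"),
]
switches = scope_switches(toy)
ratio = same_section_ratio(switches)
print(len(switches), round(ratio, 3))
```

Tracking only the most recent concept per identifier is enough to detect a return to an earlier scope, because the return itself shows up as another switch.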

Slide 12

Slide 12 text

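Data analysis (3): sources of grounding

Example of a source
  When a new digit image x is input to this function, an output vector y, encoded in the same way as the target vectors, is output. (PRML, p. 2)

Analysis of the 938 collected sources
▶ 76.5% precede the identifier
▶ The distance between an identifier and its source averages 14.7 words
  ▷ The median is 0 to 4 words: typically an apposition noun immediately before the identifier

[Figure: position of the source relative to the identifier: 718 before, 220 after]
[Figure: histogram of the distance between identifier and source, in words; bins 0, 1, 2, <10, <100, >=100]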
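The statistics above can be reproduced in outline with the Python sketch below. The (precedes, distance) encoding of each source is an assumption made for illustration and the toy values are invented; only the counts 718 and 220 come from the slide.

```python
from statistics import mean, median


def source_position_stats(sources):
    """Summarise where sources sit relative to their identifiers.

    sources: (precedes, distance) pairs, where precedes is True if the
    source comes before the identifier and distance is counted in words
    (a hypothetical encoding chosen for this sketch).
    """
    distances = [distance for _, distance in sources]
    return {
        "preceding_ratio": sum(p for p, _ in sources) / len(sources),
        "mean_distance": mean(distances),
        "median_distance": median(distances),
    }


# Toy data: one distant source pulls the mean far above the median.
toy = [(True, 0), (True, 1), (True, 0), (False, 2), (True, 120)]
stats = source_position_stats(toy)

# Counts reported on the slide: 718 sources before, 220 after the identifier.
reported_ratio = 718 / (718 + 220)
print(stats, round(reported_ratio * 100, 1))
```

The toy data shows how a few distant sources can produce a mean well above the median, the same pattern as the reported mean of 14.7 words against a median of only a few words.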

Slide 13

Slide 13 text

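Future work

Reducing annotation cost
▶ Having several people annotate the same paper is laborious
  → Agreement has not been computed for all of the target papers
▶ There are too few papers to compare fields
▶ Mathematics and physics papers contain too many formulae
  → Annotating everything by hand is unrealistic; completing the dictionary takes priority
▶ Notation in logic papers is especially idiosyncratic
  → Numerals and operators are also ambiguous; identifiers alone are not enough

Open research questions
▶ Comparing author annotation with reader annotation
▶ Can people outside the field ground formulae correctly?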

Slide 14

Slide 14 text

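Plan for automating grounding

Automation in three steps
1. Identify and extract the sources of grounding within the document
   ▷ Pattern matching + part-of-speech analysis (apposition nouns)
2. Generate a "dictionary" by clustering the in-document sources
   ▷ Apply a short text clustering method [Jiaming+, 2017]
3. Associate each math token in the document with a "dictionary" entry
   ▷ Pattern matching + part-of-speech analysis + classification model

[Diagram labels: source extraction, "dictionary" generation, association, iteration and improvement, proposed dataset, expansion, evaluation]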
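The pattern-matching half of step 1 might look like the Python sketch below. It is not the planned implementation: it uses only a regular expression, omits part-of-speech analysis entirely, and assumes English text in which inline math is delimited by `$...$` and the candidate noun phrase follows a determiner. The pattern, the function name, and the example sentence are all hypothetical.

```python
import re

# Hypothetical pattern: a determiner, then one to three lowercase words,
# directly followed by inline math, e.g. "the output vector $y$".
APPOSITION = re.compile(r"\b(?:the|a|an) ((?:[a-z]+ ){1,3})\$([^$]+)\$")


def candidate_sources(text):
    """Return (math_token, candidate_phrase) pairs found in text."""
    return [
        (match.group(2), match.group(1).strip())
        for match in APPOSITION.finditer(text)
    ]


# Toy sentence written for this sketch.
toy_text = (
    "We obtain the function $y(x)$ and feed the image $x$ "
    "to get the output vector $y$."
)
pairs = candidate_sources(toy_text)
for token, phrase in pairs:
    print(f"{token!r} <- {phrase!r}")
```

A pattern this naive would also pick up verbs and adjectives that happen to sit before a formula, which is where the part-of-speech analysis named in the plan would come in.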

Slide 15

Slide 15 text

References
▶ Akiko Aizawa, et al. “NTCIR-10 Math Pilot Task Overview.” In Proceedings of NTCIR-10 (2013).
▶ Maria Alexeeva, et al. “MathAlign: Linking Formula Identifiers to their Contextual Natural Language Descriptions.” In Proceedings of LREC 2020.
▶ Takuto Asakura, et al. “Towards Grounding of Formulae.” In Proceedings of SDP 2020.
▶ Takuto Asakura, et al. “MioGatto: A Math Identifier-oriented Grounding Annotation Tool.” In 13th MathUI Workshop at 14th Conference on Intelligent Computer Mathematics (MathUI 2021).
▶ Christopher M. Bishop. Pattern Recognition and Machine Learning (2006).
▶ Jiaming Xu, et al. “Self-taught convolutional neural networks for short text clustering.” Neural Networks 88 (2017).
▶ Michael Kohlhase and Mihnea Iancu. “Co-representing structure and meaning of mathematical documents” (2014).