Slide 1

Slide 1 text

Heterogeneous Graph Neural Networks for Extractive Document Summarization

Slide 2

Slide 2 text

uchi_k (@__uchi_k__)
About me
• Representative, yuni, inc.
• Organizer, nlpaper.challenge
• Freelance Machine Learning Engineer / Researcher
• Formerly: Kyoto University Graduate School of Informatics; IPA Mitou 2016; Machine Learning Engineer at FreakOut

Slide 3

Slide 3 text

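nlpaper.challenge
A community of working professionals, students, and researchers who talk about all kinds of natural language processing topics (run mainly by volunteers).
Aiming to cover every area of ACL, we followed the official ACL area list, assigned the areas to teams, and each team surveyed its own areas.
We read papers and held discussions, lightning-talk (LT) sessions, and similar events.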

Slide 4

Slide 4 text

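ACL2020
• Papers on generation and on graphs seem to have increased considerably.
• Almost every paper mentions pretrained language models such as BERT and RoBERTa.
• Evaluation metrics were re-examined from the standpoints of reproducibility and real-world application.
• Even the best paper proposed defining something like test cases for NLP tasks and measuring their pass rate.
• More work returns to knowledge graphs, performing operations on graphs or learning graph structure.
These are my personal impressions.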

Slide 5

Slide 5 text

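Heterogeneous Graph Neural Networks for Extractive Document Summarization
Danqing Wang (Shanghai Key Laboratory of Intelligent Information Processing, Fudan University) et al., ACL
#abstract
Introduces a heterogeneous graph to represent inter-sentence relations in extractive document summarization, achieves SoTA, and examines properties such as extensibility.
• In document summarization, modeling the relationships between sentences is crucial. Previous work modeled them as sequences with RNN-based methods.
• Recent research has shown that graph structures suit the semantic structure of documents better than sequences, but no good graph structure had been proposed yet.
• Proposes a heterogeneous graph with word nodes and sentence nodes and achieves SoTA on both single-document and multi-document summarization. Extensibility is also discussed.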

Slide 6

Slide 6 text

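#abstract #extractive document summarization
Extractive document summarization
A task that extracts relevant sentences from the source document and reassembles them into a summary.
Approaches are extractive, abstractive (abstracting the representation and writing summary sentences from scratch), or a mix of the two.
Early work: SummaRuNNer
• Each sentence of the document is vectorized with a bidirectional LSTM, producing vectors that capture the meaning of each sentence (word layer).
• The relationships among these sentence vectors are then learned with another bidirectional LSTM (sentence layer).
• The model outputs the probability of extracting each sentence.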
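The last step, turning per-sentence extraction probabilities into a summary, can be sketched as follows. This is a minimal illustration assuming the model has already produced one probability per sentence; the helper `select_summary` and the plain top-k rule are assumptions for illustration, not SummaRuNNer's exact decoding procedure.

```python
from typing import List


def select_summary(sentences: List[str], probs: List[float], k: int) -> List[str]:
    """Hypothetical helper: keep the k sentences with the highest extraction
    probability and return them in their original document order."""
    if len(sentences) != len(probs):
        raise ValueError("need exactly one probability per sentence")
    # Rank sentence indices by probability, highest first.
    ranked = sorted(range(len(sentences)), key=lambda i: probs[i], reverse=True)
    # Take the top k, then restore document order so the summary reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


doc = ["The storm hit the coast.", "Officials urged calm.", "Power was cut to many homes."]
print(select_summary(doc, [0.9, 0.2, 0.7], k=2))
# ['The storm hit the coast.', 'Power was cut to many homes.']
```

Restoring document order after ranking is a design choice: it keeps the extracted summary readable without changing which sentences are chosen.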

Slide 7

Slide 7 text

#abstract #heterogeneous graph
Heterogeneous Graph
Many real-world graphs are heterogeneous: they consist of many types of nodes and edges that live in different feature spaces.

Slide 8

Slide 8 text

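#model overview
Proposed Graph
Defines a heterograph that represents sentence-to-sentence relationships through words.
• Instead of building a graph whose nodes are only sentences, it adds nodes that act as intermediaries linking the sentences.
• Advantages: word nodes can be updated with sentence information, and the design is extensible, for example other node types can be added.
• In this paper the smallest semantic unit is the word. It might also be interesting to abstract further and use word senses or concepts as node types.
Procedure: initialize the graph → update it with a GAT → solve a classification problem on the sentence features (add the sentence to the summary or not).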
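The word-sentence heterograph described above can be sketched as a bipartite structure. This is a minimal sketch assuming pre-tokenized sentences and no stop-word or tf-idf filtering; the function name and the dictionary-of-sets representation are hypothetical, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_word_sentence_graph(
    sentences: List[List[str]],
) -> Tuple[Dict[str, Set[int]], Dict[int, Set[str]]]:
    """Bipartite heterograph: word nodes on one side, sentence nodes on the other.
    A word and a sentence are linked when the word occurs in the sentence.
    There are no direct sentence-sentence edges; sentences connect through words."""
    word_to_sents: Dict[str, Set[int]] = defaultdict(set)
    sent_to_words: Dict[int, Set[str]] = {}
    for s_id, tokens in enumerate(sentences):
        sent_to_words[s_id] = set(tokens)
        for w in tokens:
            word_to_sents[w].add(s_id)
    return dict(word_to_sents), sent_to_words


sents = [["storm", "coast"], ["officials", "calm"], ["storm", "power"]]
w2s, s2w = build_word_sentence_graph(sents)
print(w2s["storm"])  # {0, 2}: the word node "storm" links sentences 0 and 2
```

Keeping two adjacency maps, one per direction, mirrors the two message-passing directions in the model (word to sentence and sentence to word).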

Slide 9

Slide 9 text

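#model overview #learning step
Proposed Graph: training procedure and model overview
• The graph initializer applies CNNs with different kernel sizes to each sentence to extract n-gram features (local features), then a Bi-LSTM extracts sentence-level features (global features).
• tf-idf values are used as edge features, encoding the relationship between word nodes and sentence nodes.
• Graph features are updated with a Graph Attention Network.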
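The tf-idf edge features can be sketched as one scalar per (word, sentence) edge. This assumes tf is the word's count in the sentence and idf is log(N / df), with N the number of sentences in the document and df the number of sentences containing the word; the paper's exact normalization may differ, and the follow-on step that turns these scalars into edge-feature vectors is omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_edge_weights(sentences: List[List[str]]) -> Dict[Tuple[str, int], float]:
    """Return one weight per (word, sentence) edge.
    Assumed definitions: tf = count of the word in the sentence,
    idf = log(N / df), N = number of sentences, df = sentences containing the word."""
    n = len(sentences)
    df: Counter = Counter()
    for tokens in sentences:
        df.update(set(tokens))  # count each word at most once per sentence
    weights: Dict[Tuple[str, int], float] = {}
    for s_id, tokens in enumerate(sentences):
        for w, tf in Counter(tokens).items():
            # A word that appears in every sentence gets weight 0.
            weights[(w, s_id)] = tf * math.log(n / df[w])
    return weights


w = tfidf_edge_weights([["storm", "storm", "coast"], ["calm"], ["storm"]])
print(round(w[("coast", 0)], 4))  # 1.0986, i.e. log(3)
```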

Slide 10

Slide 10 text

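#model overview #graph attention network
Graph Attention Network: defines attention on a graph
• Attention is computed from separately weighted vectors of the node itself and of its neighbors, and is used to aggregate information from neighboring nodes.
• Roughly: the distance function for graph aggregation is defined as an attention that does not depend on the graph structure, and it is learned from data.
[Figure labels: node features, neighbor nodes, attention, function that computes attention, attention-weighted aggregation]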
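A single-head GAT update can be sketched in plain Python following the standard GAT formulation: transform each node with W, score each neighbor from the concatenated pair, softmax over neighbors, and take the weighted sum. All names here are hypothetical; HSG's actual layer also uses the tf-idf edge features and multiple heads, both omitted in this sketch.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(
    h: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    W: Matrix,
    a: Vector,
) -> Tuple[Dict[int, Vector], Dict[int, Dict[int, float]]]:
    """One single-head GAT update (output nonlinearity omitted):
    e_ij = LeakyReLU(a . [W h_i || W h_j]), alpha_ij = softmax_j(e_ij),
    h'_i = sum_j alpha_ij * W h_j. Each listed node needs at least one neighbor."""
    z = {i: matvec(W, v) for i, v in h.items()}
    out, attn = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized scores from the concatenated pair [z_i || z_j].
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alpha))
        # Attention-weighted aggregation of the transformed neighbor vectors.
        out[i] = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(z[i]))]
    return out, attn
```

In the heterograph, the neighbor list of a sentence node would be its word nodes and vice versa, so the same function can serve both update directions.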

Slide 11

Slide 11 text

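#dataset #train test split
Dataset
Experiments use two datasets for single-document summarization and one for multi-document summarization.
CNN/DailyMail
• The most widely used benchmark dataset for single-document summarization
• Derived from the CNN/DailyMail QA data
• Split into train / valid / test sets
NYT50
• Single-document summarization dataset collected from the New York Times Annotated Corpus (Sandhaus)
• Split into train / valid / test sets
Multi-News
• Multi-document summarization dataset
• Each set of source documents comes with a human-written summary
• Split into train / valid / test sets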

Slide 12

Slide 12 text

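#experiment #setting #hyper-parameter #preprocessing
Settings & Hyperparameters
Preprocessing
• Remove stop words and punctuation
• Cap the maximum input document length at … sentences
• Remove the bottom … of words by tf-idf
• Limit the vocabulary to … words
Graph
• Embed words with …-dimensional GloVe
• Initialize sentence vectors with size …
• Initialize edge features with dimension …
• … attention heads
Experiments
• Batch size …, learning rate …, Adam optimizer
• Early stopping if the loss does not decrease for … epochs
• Select the top … sentences for single-document summarization and the top … sentences for multi-document summarization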

Slide 13

Slide 13 text

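#methods #extractor
Methods
• Ext-BiLSTM
◦ CNN layers + BiLSTM
◦ Treats the document as a sequence of sentences and learns inter-sentence relations
• Ext-Transformer
◦ Transformer layers
◦ Learns pairwise interactions among all sentences
◦ Can be viewed as a fully connected sentence-level graph
• HSG (HeterSumGraph)
◦ The proposed method: models sentence-word-sentence relations with a graph
◦ HSG selects summary sentences by node classification. A variant that also applies trigram blocking, discarding sentences whose trigrams overlap those already selected to suppress redundancy, is evaluated as well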
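Trigram blocking, as described in the HSG bullet, can be sketched as a greedy pass over sentences in score order. This is a minimal sketch assuming tokenized sentences and precomputed scores; the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Trigram = Tuple[str, ...]


def trigrams(tokens: List[str]) -> Set[Trigram]:
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_with_trigram_blocking(sentences: List[List[str]], scores: List[float], k: int) -> List[int]:
    """Greedy selection by score, skipping any sentence that shares a trigram
    with an already selected sentence. Returns indices in document order."""
    selected: List[int] = []
    seen: Set[Trigram] = set()
    for i in sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True):
        tri = trigrams(sentences[i])
        if tri & seen:
            continue  # redundant: overlaps a sentence already in the summary
        selected.append(i)
        seen |= tri
        if len(selected) == k:
            break
    return sorted(selected)


sents = [
    ["the", "storm", "hit", "the", "coast"],
    ["the", "storm", "hit", "the", "town"],
    ["power", "was", "cut"],
]
print(select_with_trigram_blocking(sents, [0.9, 0.8, 0.5], k=2))  # [0, 2]
```

Without blocking, the top two by score would be sentences 0 and 1, which repeat "the storm hit the".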

Slide 14

Slide 14 text

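#result #CNN/DailyMail
Result (single-document summarization: CNN/DailyMail)
Single-document summarization results on CNN/DailyMail: the proposed method scores higher than all existing methods.
• LEAD is the baseline and ORACLE is the upper bound.
• For HER, which formulates the task as a contextual bandit problem, both the with-policy and without-policy variants were compared, and the proposed method beats both.
• Scores are higher than all existing methods that do not use BERT.
• Evaluated with ROUGE-1, ROUGE-2, and ROUGE-L: similarity scores based on unigram overlap, bigram overlap, and the longest common subsequence, respectively.
[Table labels: label, previous study, proposed method]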
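The three ROUGE variants can be sketched as F1 scores over token lists. This is a simplified, hypothetical implementation (pre-tokenized input, no stemming, a single reference, sentence-level LCS); published scores come from the standard ROUGE toolkit with its own preprocessing and options, so these numbers will not match them exactly.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, cand_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    p, r = overlap / cand_total, overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate: List[str], reference: List[str], n: int) -> float:
    c, r = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum((c & r).values())  # clipped n-gram matches (multiset intersection)
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_length(a: List[str], b: List[str]) -> int:
    # Classic dynamic program for the longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(candidate: List[str], reference: List[str]) -> float:
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(rouge_n(cand, ref, 1), 3), round(rouge_n(cand, ref, 2), 3), round(rouge_l(cand, ref), 3))
# 0.833 0.6 0.833
```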

Slide 15

Slide 15 text

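#result #CNN/DailyMail
Result (single-document summarization: CNN/DailyMail)
Comparison with methods built on sentence sequences and on fully connected graphs shows the usefulness of the heterograph structure.
• The proposed method scores higher than Ext-BiLSTM (sentence sequence) and Ext-Transformer (fully connected graph).
• Using the heterograph effectively removes unnecessary connections between sentences.
[Table labels: Ext methods, proposed method]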
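One way to picture "removing unnecessary connections" is to compare which sentence pairs can interact directly. This sketch only counts connectivity under the assumption that two sentences are linked in two hops exactly when they share a word node; it ignores attention weights, filtering, and multi-layer propagation, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def complete_graph_pairs(n: int) -> Set[Tuple[int, int]]:
    """Fully connected view (as in Ext-Transformer): every sentence pair interacts."""
    return set(combinations(range(n), 2))


def word_mediated_pairs(sentences: List[List[str]]) -> Set[Tuple[int, int]]:
    """Heterograph view: sentences i and j are two hops apart (sentence-word-sentence)
    only if they share at least one word node."""
    bags = [set(tokens) for tokens in sentences]
    return {(i, j) for i, j in combinations(range(len(bags)), 2) if bags[i] & bags[j]}


sents = [["storm", "coast"], ["officials", "calm"], ["storm", "power"]]
print(sorted(complete_graph_pairs(len(sents))))  # [(0, 1), (0, 2), (1, 2)]
print(sorted(word_mediated_pairs(sents)))        # [(0, 2)]
```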

Slide 16

Slide 16 text

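#result #NYT50
Result (single-document summarization: NYT50)
Single-document summarization results on NYT50: the overall trend is the same as on CNN/DailyMail.
• As on CNN/DailyMail, the proposed method outperforms existing methods.
• Why isn't the version with trigram blocking the top scorer...?
→ CNN/DailyMail summaries are concatenations of bullet points with little overlap, whereas NYT50 summaries contain repetition, such as key phrases appearing several times. So trigram blocking may struggle to score well on NYT50.
[Table label: proposed method]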

Slide 17

Slide 17 text

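#ablation #CNN/DailyMail
Ablation
An ablation on CNN/DailyMail measures the contribution of each module.
• Removing word filtering lowers R-1 and R-L but raises R-2.
→ Perhaps the benefit of word filtering, focusing on particularly important word nodes, outweighs the cost of losing bigram information.
• Removing the residual connections between GAT layers lowers the scores substantially.
→ In a heterograph, the residual connection of a GAT layer is theoretically important when aggregating from nodes of a different type, so it cannot be replaced by simply combining the representations.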
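One way to see why the residual matters in this graph, sketched in generic GAT notation (the symbols σ, α_ij, and W_v are assumptions, not the paper's exact equations): in the word-sentence graph every neighbor of a sentence node is a word node, so without the residual term the updated sentence vector is built entirely from word-node messages and the sentence's own features are dropped.

```latex
% i: a sentence node; \mathcal{N}(i): the word nodes linked to it
u_i = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W_v h_j\Big)
\qquad
h_i' = \underbrace{u_i}_{\text{from word nodes}} + \underbrace{h_i}_{\text{residual}}
% Removing the residual leaves h_i' = u_i, which contains no term in h_i.
```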

Slide 18

Slide 18 text

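#result #multidocument
Result (multi-document summarization)
The proposed method, extended with document nodes, is also evaluated on multi-document summarization.
• Both HSG and HDSG outperform existing methods, and the gain is especially large for HDSG.
→ This suggests that adding document nodes is effective for multi-document summarization.
• Trigram blocking probably does not help here for the same reason as on NYT50.
• The proposed method carries over to another task just by adding a node type, which makes it highly extensible.
[Table label: proposed method]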
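The claim that moving to multi-document summarization only requires a new node type can be sketched by extending a typed-node graph builder with a flag. This assumes document nodes are linked to the words their sentences contain; the exact HDSG edge set may differ, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, object]  # ("word", w), ("sent", (doc_id, s_id)) or ("doc", doc_id)


def build_hetero_graph(docs: List[List[List[str]]], with_doc_nodes: bool) -> Dict[Node, Set[Node]]:
    """Undirected adjacency over typed nodes. Word-sentence edges as in the
    single-document graph; with_doc_nodes=True additionally links each document
    node to the words its sentences contain (an assumed HDSG-style edge set)."""
    adj: Dict[Node, Set[Node]] = {}

    def link(u: Node, v: Node) -> None:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for d_id, doc in enumerate(docs):
        for s_id, tokens in enumerate(doc):
            for w in set(tokens):
                link(("sent", (d_id, s_id)), ("word", w))
                if with_doc_nodes:
                    link(("doc", d_id), ("word", w))
    return adj


def node_types(adj: Dict[Node, Set[Node]]) -> List[str]:
    return sorted({kind for kind, _ in adj})


docs = [[["storm", "coast"]], [["storm", "power"], ["calm"]]]
print(node_types(build_hetero_graph(docs, with_doc_nodes=False)))  # ['sent', 'word']
print(node_types(build_hetero_graph(docs, with_doc_nodes=True)))   # ['doc', 'sent', 'word']
```

The word-sentence edges are identical in both cases; the flag only adds the new node type and its edges.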

Slide 19

Slide 19 text

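#qualitative analysis #degree
Qualitative Analysis
Investigates the effect of word-node degree.
• A high word-node degree means the word occurs in many sentences, so it (roughly) reflects how redundant the document is.
• ROUGE increases with word degree → the more redundant a document, the easier it is to summarize.
• A high-degree word node can aggregate information from several sentences, so such documents likely benefit more from the model.
→ This suggests that the word nodes aggregate sentence information and propagate global representations.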
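The degree statistic above can be sketched directly from tokenized sentences. This assumes a word node's degree is the number of sentences containing the word, and uses the mean over word nodes as one possible per-document redundancy proxy; the function names are hypothetical.

```python
from typing import Dict, List


def word_degrees(sentences: List[List[str]]) -> Dict[str, int]:
    """Degree of each word node = number of sentences containing that word."""
    degree: Dict[str, int] = {}
    for tokens in sentences:
        for w in set(tokens):  # a sentence contributes at most one edge per word
            degree[w] = degree.get(w, 0) + 1
    return degree


def mean_word_degree(sentences: List[List[str]]) -> float:
    """Average word-node degree, used here as a rough proxy for redundancy."""
    degree = word_degrees(sentences)
    return sum(degree.values()) / len(degree) if degree else 0.0


redundant = [["storm", "coast"], ["storm", "power"], ["storm", "coast"]]
diverse = [["a", "b"], ["c", "d"]]
print(mean_word_degree(redundant), mean_word_degree(diverse))  # 2.0 1.0
```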

Slide 20

Slide 20 text

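#qualitative analysis #source
Qualitative Analysis
Investigates the effect of the number of source documents in multi-document summarization.
• As the number of documents increases, the performance gap between HETERSUMGRAPH and HETERDOCSUMGRAPH widens: the more complex the relations between documents, the greater the benefit of document nodes.
• As the number of documents increases, the baseline improves while the proposed methods decline, until the two end up roughly level.
→ FIRST forcibly extracts a sentence from every document, which guarantees coverage. As the number of documents grows, it becomes harder for the proposed methods to extract a limited number of sentences that cover the gist of all of them.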
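The coverage argument can be illustrated by comparing a FIRST-style selection with plain score-based selection. This assumes FIRST takes the first sentence of every document and measures coverage as the fraction of source documents represented; the scores and helper names are hypothetical.

```python
from typing import Dict, List, Tuple

SentId = Tuple[int, int]  # (document index, sentence index)


def first_baseline(docs: List[List[str]]) -> List[SentId]:
    """FIRST-style selection (assumed): the first sentence of every non-empty document."""
    return [(d, 0) for d, doc in enumerate(docs) if doc]


def top_k_by_score(scores: Dict[SentId, float], k: int) -> List[SentId]:
    """Score-based selection: the k highest-scoring sentences, wherever they come from."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def coverage(selected: List[SentId], n_docs: int) -> float:
    """Fraction of source documents represented in the summary."""
    return len({d for d, _ in selected}) / n_docs


docs = [["a1", "a2"], ["b1"], ["c1", "c2"]]
scores = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.1, (2, 0): 0.3, (2, 1): 0.2}
print(coverage(first_baseline(docs), len(docs)))     # 1.0
print(coverage(top_k_by_score(scores, 2), len(docs)))  # 0.3333333333333333
```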

Slide 21

Slide 21 text

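#key points
Summary
• A heterograph makes it possible to bring fine-grained semantic units into document summarization, and it proved effective for modeling relations among sentences and documents.
• The method is highly extensible: it goes from single-document to multi-document summarization just by adding a node type.
• It might be interesting to try heterograph-specific techniques (such as defining subgraphs with metapaths, or attention designed for heterographs).
• The authors say they want to explore pretrained models such as BERT in future work.
• The authors touched on this briefly too, but I think the method's advantages would show even more if the word nodes were abstracted into semantic nodes. Even without that, many other node types seem worth trying.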