How we built a context-aware similar-keyword search system with ELMo / ELMo for Searching Similar Keywords

tagucci
March 24, 2020

Presented at ML@Loft #11: Similar Image/Text Search


Transcript

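 1. R&D at The Asahi Shimbun @ Media Lab
    We do basic research in natural language processing, such as automatic summarization and automatic proofreading. Our output is mainly conference presentations (research-oriented).
    We recently received a Language Resource Award at the NLP conference.
    Our research results are published as a summarization API, so please try it!
    Example input (the ML@Loft event description): "ML@Loft is a 'consultation session' for developers and data scientists who mainly run machine learning workloads on AWS. It is held monthly at AWS Loft Tokyo and always features lively discussion. The 11th session takes similar image search as its theme and covers networks that map inputs to features, speed and availability of feature search, and overall system design for similar image search. Speakers first raise open problems in short self-introduction lightning talks, followed by a panel discussion based on questions from the audience."
    Example output (headlines generated for a mobile news app, length-limited in characters), as several candidates:
      "A 'consultation session' on AWS"
      "Loft holds a 'consultation session'"
      "Consultation at AWS Loft"
      "Consultation session at LSLoft"
      "ML@Loft holds a consultation session"
    Even for text that is not a news article, the output is reasonably plausible.
    If you want to try the API, see https://cl.asahi.com/
    ©The Asahi Shimbun Company 2020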
 2. Flow for creating four-choice quizzes in Qrich
    (1) Fetch the article data.
    (2) Select the keyword to blank out of the article body (key term extraction).
    (3) Search for words similar to the selected keyword.
    (4) Combine (2) and (3) into a four-choice question.
    Today's talk is about step (3).
 3. Flow for creating four-choice quizzes in Qrich (continued)
    Note: a "keyword" here means a noun as segmented by the NEologd dictionary.
 4. Similar-keyword search with word2vec
    We built word2vec (skip-gram) word vectors from Asahi Shimbun article data.
    The word vectors can be loaded with gensim, so the neighbors of a keyword are easy to look up.
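The lookup described on this slide can be sketched as a plain cosine nearest-neighbor search. This is only an illustration: the three-dimensional vectors and the `most_similar` helper below are hypothetical stand-ins written with the standard library, not the vectors or code from the talk (in gensim, the comparable call is `KeyedVectors.most_similar`).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(vectors, keyword, k=3):
    """Return the k words whose static vectors are closest to `keyword`."""
    query = vectors[keyword]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != keyword]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors; real word2vec vectors have hundreds of dimensions.
TOY_VECTORS = {
    "SoftBank": [0.9, 0.8, 0.1],
    "Rakuten": [0.85, 0.75, 0.2],
    "DeNA": [0.8, 0.9, 0.1],
    "Orix": [0.9, 0.6, 0.3],
    "rainforest": [0.0, 0.1, 0.9],
}

neighbors = [word for word, _ in most_similar(TOY_VECTORS, "SoftBank", k=3)]
print(neighbors)
```

Because each word has exactly one vector, the result depends only on the keyword and not on the article it appears in.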
 5. Problems with word2vec
    It does not learn good representations in the first place.
    The k nearest words of a keyword's word vector are always the same.
    Article text 1: "On the night of the 19th, a de facto stay-at-home order was issued for the whole of the U.S. state of California. The state is known as a vibrant, wealthy region that is home to big cities such as Los Angeles and San Francisco as well as Silicon Valley, but it has put countermeasures against the novel coronavirus ahead of economic activity. Behind the move is the aim of containing the spread of infection by acting early."
    Article text 2: "Genome editing technology makes it possible to freely modify the genes at a targeted location. In particular, CRISPR/Cas9 is spoken of as a Nobel Prize candidate. While applications are expected in many fields such as medicine and food, disputes over its patents are unfolding around the world. Where are these patents, which could lead to enormous profits, headed?"
 6. Problems with word2vec (continued, same two articles)
    Looking up the neighbors of "Silicon Valley" with word2vec returns: IT company, San Jose, bio venture.
    Looking up the neighbors of "genome editing" with word2vec returns: ES cell, human, pluripotent cell.
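 7. Problems with word2vec
    It does not learn good representations in the first place.
    The k nearest words of a keyword's word vector are always the same.
    "SoftBank" in a sports article and in a business article: the neighbors are "Rakuten", "DeNA" and "Orix" in both cases.
    Amazon the company and Amazon the tropical rainforest: the neighbors are "online shopping", "Google" and "Kindle" in both cases.
    Article (SoftBank, business): "SoftBank announced on the 5th that it will launch its next-generation high-speed mobile service, 5G, on the 27th of this month. It will be available by adding 1,000 yen a month (excluding tax) to current 4G plans. It is the first of the three major mobile carriers to announce a start date and pricing for 5G."
    Article (SoftBank, sports): "It was decided by the 6th that SoftBank pitcher Nao Higashihama will be this season's opening-day starter. It is his first time in the role, in his eighth year as a professional. The opener on the 20th will be against Lotte at the team's home, Fukuoka PayPay Dome."
    Article (Amazon, company): "The market capitalization of Alphabet, the parent company of U.S.-based Google, topped 1 trillion dollars (110 trillion yen) on the 16th. It is the fourth U.S. company to do so, after Apple, Amazon and Microsoft. The gap with Japanese companies, including Toyota Motor (25 trillion yen), the largest in Japan by market capitalization, keeps widening."
    Article (Amazon, rainforest): "From […] to […], forest fires described as 'the worst ever' broke out one after another around the world. Global warming has been cited as one cause, but reporting on the ground revealed other aspects as well. Our correspondents report on what is happening in Australia, the Brazilian Amazon and California."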
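 8. Context-dependent word vectors: ELMo vectors
    Example input tokens: ML / @ / Loft / は / 主 / に / AWS / …
    A charCNN produces the vector representation of each word.
    The word representations feed a two-layer biLSTM (first layer, second layer).
    In this work, the hidden states of the first and second biLSTM layers are concatenated and used as the ELMo vector.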
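A minimal sketch of the concatenation step on this slide, assuming the per-token hidden states of both biLSTM layers have already been computed by some ELMo implementation. The function name `elmo_vector`, the two-dimensional states and their values are all hypothetical and chosen only to keep the example readable.

```python
def elmo_vector(layer1_states, layer2_states, position):
    """Concatenate the layer-1 and layer-2 hidden states of one token."""
    if len(layer1_states) != len(layer2_states):
        raise ValueError("both layers must cover the same tokens")
    return list(layer1_states[position]) + list(layer2_states[position])

# Hypothetical hidden states for the tokens ["ML", "@", "Loft"].
tokens = ["ML", "@", "Loft"]
layer1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
layer2 = [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

# Vector for the keyword "Loft" in this particular sentence.
vector = elmo_vector(layer1, layer2, tokens.index("Loft"))
print(vector)
```

The resulting vector has twice the dimensionality of a single layer's hidden state, and it changes whenever the surrounding sentence changes.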
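 9. Why ELMo?
    Word-vector models other than word2vec (skip-gram/cbow), such as GloVe and fastText, also have only one vector per word and cannot handle multiple meanings.
    BERT [2] takes a long time to train and assumes BPE subword segmentation, so it does not suit this task. A BERT model segmented with the NEologd dictionary could be built, but at high cost.
    ELMo builds word representations with a charCNN, so it can cope even when the vocabulary becomes large.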
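 10. Choosing the articles to feed into ELMo
    For each keyword, we choose two articles to feed into ELMo.
    To handle multiple meanings, we assume here that every keyword has two meanings, and build two vectors for each word.
    We want the two articles to differ in meaning as much as possible: feeding two sports articles about "SoftBank" into ELMo would just produce two similar vectors.
    We vectorize each document by concatenating TF-IDF and SWEM [3], then choose two articles:
      candidate article 1 is the one closest to the center;
      candidate article 2 is the one farthest from the center.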
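The selection rule above can be sketched as follows, assuming each article containing the keyword already has a document vector (for example a TF-IDF vector concatenated with a SWEM vector). The slide does not say which distance is used, so Euclidean distance is an assumption here, and the function names, article ids and two-dimensional vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pick_two_articles(doc_vectors):
    """Return (closest-to-center id, farthest-from-center id)."""
    ids = list(doc_vectors)
    center = centroid([doc_vectors[i] for i in ids])
    by_distance = sorted(ids, key=lambda i: euclidean(doc_vectors[i], center))
    return by_distance[0], by_distance[-1]

# Hypothetical document vectors for articles mentioning one keyword.
DOCS = {
    "sports_1": [1.0, 0.0],
    "sports_2": [0.8, 0.2],
    "sports_3": [1.0, 0.1],
    "business_1": [0.0, 1.0],
}

candidate_1, candidate_2 = pick_two_articles(DOCS)
print(candidate_1, candidate_2)
```

With most articles in one sense, the center sits near the majority sense and the farthest article tends to come from the minority sense, which is the behavior the next slide reports for "SoftBank".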
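 11. Which articles were actually chosen?
    The article at the center is about "SoftBank" the baseball club; the article farthest from the center is about "SoftBank" the company.
    Candidate article 1 (baseball club): "The club announced that three players, infielder Kenta Kurihara (formerly of Hiroshima), catcher Ryohei Kawamoto (formerly of Lotte) and pitcher Kim Mu-young (formerly of SoftBank), will join its autumn camp in Kurashiki, Okayama Prefecture, and take a tryout."
    Candidate article 2 (company): "SoftBank chairman and president Masayoshi Son told reporters in New York that the restructuring talks over Sprint are 'making steady progress'."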
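 12. Building the index for nearest-neighbor search
    For each keyword, we obtain ELMo vectors from its two articles and build an index with Annoy.
    When Qrich creates a quiz, the article body and the keyword to be blanked out are fed into ELMo to obtain an ELMo vector, and the words near that vector are retrieved.
    When a new word appears, we manually choose the keyword and articles, obtain the ELMo vectors, and add them to (or update) the Annoy index.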
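The indexing and query flow above can be sketched like this. The talk uses Annoy for the index; the sketch deliberately swaps in a brute-force cosine search written with the standard library so that it runs without extra dependencies. The class `BruteForceIndex`, the keywords and the two-dimensional "ELMo vectors" are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

class BruteForceIndex:
    """Stand-in for an approximate nearest-neighbor index such as Annoy."""

    def __init__(self):
        self.items = []  # (keyword, vector); a keyword may appear twice

    def add(self, keyword, vector):
        self.items.append((keyword, list(vector)))

    def query(self, vector, k, exclude=None):
        """Return up to k distinct keywords nearest to `vector`."""
        ranked = sorted(self.items, key=lambda it: cosine(vector, it[1]), reverse=True)
        result = []
        for keyword, _ in ranked:
            if keyword != exclude and keyword not in result:
                result.append(keyword)
            if len(result) == k:
                break
        return result

index = BruteForceIndex()
# Two vectors per keyword, one per selected article (toy values).
for keyword, vecs in {
    "SoftBank": [[0.9, 0.1], [0.1, 0.9]],
    "Orix": [[1.0, 0.2], [0.9, 0.3]],
    "Giants": [[0.95, 0.05], [1.0, 0.0]],
    "Sony": [[0.1, 1.0], [0.2, 0.9]],
    "Canon": [[0.0, 1.0], [0.15, 0.95]],
}.items():
    for vec in vecs:
        index.add(keyword, vec)

# Context vector of "SoftBank" as it appears in a sports article.
sports_neighbors = index.query([0.9, 0.1], k=2, exclude="SoftBank")
# Context vector of "SoftBank" as it appears in a business article.
business_neighbors = index.query([0.1, 0.9], k=2, exclude="SoftBank")
print(sports_neighbors, business_neighbors)
```

Storing two vectors per keyword means the same word can be retrieved from either of its senses, while the query vector, computed from the quiz article itself, decides which region of the space is searched.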
 13. How did the neighbors change with ELMo?
    "SoftBank" (same two articles as before: the 5G launch and the opening-day starter)
      word2vec: the neighbors are "Rakuten", "DeNA" and "Orix" for both articles.
      ELMo, baseball article: Orix, Giants, Red Sox.
      ELMo, business article: Livedoor, Sony, Canon.
    "Amazon" (same two articles as before: Alphabet's market capitalization and the forest fires)
      word2vec: the neighbors are "online shopping", "Google" and "Kindle" for both articles.
      ELMo, company article: Lenovo, Daimler, Netflix.
      ELMo, rainforest article: Memphis, Bosnia, Gabon.
 14. References
    [1] Peters+18, Deep contextualized word representations, NAACL 2018
    [2] Devlin+19, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
    [3] Shen+18, Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL 2018