
Automatic Generation of Effective Ad Text Using Natural Language Processing [CADC2022]


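Internet advertising keeps growing year after year, and because ads turn over so quickly, producing them by hand is reaching its limit. At the same time, the recent success of artificial intelligence has raised high expectations for ad creatives, and in particular for the automatic generation of ad text using natural language processing. This talk introduces a method, accepted at highly competitive international conferences such as NAACL and EMNLP and developed mainly by AI Lab and the Kiwami (極) product team, that uses natural language generation to automatically produce ad text while taking ad effectiveness into account, along with related efforts.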

CyberAgent

March 24, 2022

Transcript

  1. None
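  2. 張培楠, Research Scientist, CyberAgent AI Lab
     - Joined AI Lab as a mid-career hire
     - Works with the Kiwami (極) products on research and development that applies natural language processing to advertising, such as automatic ad text generation and ad effectiveness prediction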

  3. We want to keep producing high-quality ads automatically

  4. Background
     - Growing demand for creatives
     - Exhaustion of production resources
     - The rise of AI technology

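  5. Background: growing demand for creatives
     Market expansion: the internet advertising market has grown to a multiple of its former size over recent years.
     Source: an article on internet advertising media spending reaching the trillion-yen range, with attention on the growth of mobile and video ads.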

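  6. Background: growing demand for creatives
     Search ads and display ads together hold a large share of total advertising spend.
     Source: a commentary on internet advertising media spending, which at over a trillion yen has nearly drawn level with mass media, and on how that total breaks down.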

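  7. What are display ads?
     - Ads shown in ad slots on web pages and similar surfaces
     - May be targeted to match a user's interests and preferences, based on data such as behavior history
     - Composed of various media, such as images, text, and video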

  8. What are search ads?
     - Ads used in search engines
     - Shown when the keyword a user enters matches a keyword configured by the advertiser
     - Generally composed of text only

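  9. Background: exhaustion of production resources
     The number of search queries keeps rising.
     - Expected to keep growing every year by a steady percentage of the previous year's volume
     - Said to reach search keywords numbering in the trillions
     Source: Google Search Statistics and Facts (You Must Know)
     Producing high-quality ads by hand for all of these keywords is difficult.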
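  10. Background: the rise of AI technology
      In recent years, AI has produced breakthroughs in many fields.
      - Hinton's research team won an OCR competition with a deep learning method, beating existing methods by a wide margin [Krizhevsky et al., NeurIPS]
      - DeepMind's reinforcement learning model "AlphaGo" defeated Ke Jie (柯潔), a top-ranked player in the world Go ratings at the time [Silver et al., Nature]
      - OpenAI's language model "GPT-3" made it possible to generate highly accurate text that reads as if a human wrote it [Brown et al., NeurIPS]
      Sources: https://www.wired.com/story/ai-text-generator-gpt-3-learning-language-fitfully/ and https://deepmind.com/research/case-studies/alphago-the-story-so-far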
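  11. Background
      - Growing demand for creatives
      - Exhaustion of production resources
      - The rise of AI technology

  12. We want to keep producing high-quality ads automatically

  13. We want to keep producing high-quality ads automatically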

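  14. Which is higher quality? (two ads shown side by side, "VS")

  15. Which is higher quality? (the same two ads, each labeled with its number of clicks)

  16. Which is higher quality? (the same two ads, each labeled with its number of clicks)

  17. Which gets clicked more? (two pairs of ads, each pair shown "VS")

  18. Which gets clicked more?
      - Natural, fluent Japanese vs. unnatural Japanese
      - Matches the search keyword vs. does not match the search keyword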

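  19. What is high-quality ad text?
      The following can be treated as criteria for checking quality:
      - Ad delivery performance
      - Naturalness and fluency
      - Relevance to the search keyword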

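  20. We want to keep producing high-quality ads automatically

  21. Generating ad text automatically
      The basic ad operations workflow

  22. Generating ad text automatically
      The basic ad operations workflow: this is the step we want to automate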

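  23. Generating ad text automatically
      Input: keywords, landing page (LP) information, and so on. Output: ad text.
      - Keywords: ウマ娘 (Uma Musume), DMM, PC, screen
      - Title: Uma Musume Pretty Derby, DMM GAMES version official site | Cygames
      - Description: The DMM GAMES version of the game "Uma Musume Pretty Derby" is now running! Enjoy thrilling live and race scenes on a big PC screen! Play with your data linked to the smartphone version!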
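  24. Various approaches
      - Template-based: insert keywords appropriately into ad text templates [Bartz et al., EC] [Fujita et al., ICEC]
      - Using syntactic information such as dependency relations: build a parse tree of a long sentence describing a product or similar, then prune it appropriately to produce a short ad text [Fujita et al., ICEC]
      - Language models: extract word sequences from the LP, account for positivity with a polarity classifier, and generate ad text with a language model [Thomaidou et al., CIKM]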
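The template-based approach listed on this slide can be illustrated with a minimal sketch. The templates, the `fill_templates` name, and the 40-character limit are all hypothetical choices made for illustration; they are not taken from the cited papers.

```python
from string import Template

# Hypothetical ad templates; a real system would curate these per product category.
TEMPLATES = [
    Template("Looking for $keyword? Compare plans today"),
    Template("$keyword: official site"),
]


def fill_templates(keyword: str, max_len: int = 40) -> list:
    """Insert the keyword into every template and keep results that fit max_len characters."""
    candidates = [t.substitute(keyword=keyword) for t in TEMPLATES]
    return [c for c in candidates if len(c) <= max_len]


ads = fill_templates("car insurance")
print(ads)
```

The length filter stands in for the character limits that ad platforms typically impose; a keyword that makes a template too long simply yields fewer candidates.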
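  25. Toward neural networks
      - The arrival of sequence-to-sequence (seq2seq) models, which generate text from text [Sutskever et al., NeurIPS]
      - Seq2seq models have achieved numerous successes in natural language generation tasks such as machine translation, automatic summarization, and dialogue processing
      [Figure: an encoder-decoder diagram that reads input tokens and emits the ad text token by token]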
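  26. Generating ad text automatically
      However, conventional seq2seq methods cannot take ad effectiveness into account directly.
      - Ad effectiveness values are not differentiable, so they cannot be built into the computation graph
      - Beyond direct ad effectiveness, we also want to account for metrics such as fluency and keyword relevance
      [Figure: the seq2seq diagram, with the generated text to be judged on ad delivery performance, naturalness and fluency, and relevance to the search keyword]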
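  27. Incorporating reinforcement learning
      Self-critical Sequence Training (SCST) [Rennie et al., CVPR]
      In addition to the loss computed by conventional seq2seq, a reward is computed for the tokens obtained by sampling and added to the loss function, and the combined loss is optimized.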
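As a rough illustration of the SCST idea on this slide, the sketch below computes the self-critical loss from per-token probabilities, using the reward of the greedy output as the baseline as in Rennie et al. It uses plain Python floats rather than a deep learning framework, and the `rl_weight` mixing coefficient is a hypothetical parameter; the slide does not say how the two losses are weighted.

```python
import math


def mle_loss(reference_token_probs):
    """Negative log-likelihood of the reference tokens under the model."""
    return -sum(math.log(p) for p in reference_token_probs)


def scst_loss(sampled_token_probs, sampled_reward, greedy_reward):
    """Self-critical loss: -(r(sampled) - r(greedy)) * sum_t log p(sampled token t).

    The rewards are treated as constants, so a non-differentiable reward
    (such as ad effectiveness) only scales the log-probability term.
    """
    log_prob = sum(math.log(p) for p in sampled_token_probs)
    advantage = sampled_reward - greedy_reward
    return -advantage * log_prob


def combined_loss(l_mle, l_rl, rl_weight=1.0):
    """Add the reinforcement learning loss to the seq2seq loss (rl_weight is an assumption)."""
    return l_mle + rl_weight * l_rl


# A sampled output that beats the greedy baseline gets a positive loss,
# so minimizing it raises the probability of that sample.
print(scst_loss([0.5, 0.25], sampled_reward=1.0, greedy_reward=0.4))
```

When the sample scores below the greedy output the sign flips, which pushes the model away from that sample; this is what removes the need for a separately learned baseline.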

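  28. Incorporating reinforcement learning
      Overview of the automatic ad text generation model with reinforcement learning (SCST) [Kamigaito et al., NAACL]
      [Figure: model overview. The input x (document, tag, and keywords taken from the contents of a web page) passes through a BiLSTM layer and an attention layer to form a context vector c. The decoder produces one output by sampling and one by MLE (y*), alongside the reference y. Rewards are calculated as r = rF + rR + rQ, where the fluency score is calculated with an LSTM language model, the relevance score with keyword matching, and the advertisement quality score with GBRT. Lmle is the loss of maximum likelihood estimation and Lrl is the loss of reinforcement learning; both update the model parameters. Example texts: "Which insurance is the best ...", "Checkout the most popular insurance ...", "If you are finding the most popular insurance ..."]
      Source: AI Lab, co-authored paper accepted at "NAACL-HLT", a top conference in natural language processing: proposing an ad text generation method that takes ad effectiveness into account. © CyberAgent, Inc.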
  29. Incorporating reinforcement learning
      Reinforcement learning (SCST) makes training possible by treating ad effectiveness and other metrics as rewards.
      [Figure: the model overview from the previous slide, with the three terms of r = rF + rR + rQ annotated]
      - Naturalness and fluency (Flu): the plausibility of the generated text as computed by a language model
      - Relevance to the search keyword (Rel): how much of the keywords the generated text covers, and where the keywords appear
      - Ad delivery performance (QS): an estimated quality value for the generated text, computed by a regression model trained on data from past ad delivery
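The reward r = rF + rR + rQ from the figure can be sketched as below. The slide says only that relevance uses keyword coverage and keyword position, so the specific formula here (a linear position decay averaged over keywords) is an assumption, and `fluency_fn` and `quality_fn` are hypothetical stand-ins for the LSTM language model and the GBRT regressor named in the figure.

```python
def relevance_reward(ad_tokens, keywords):
    """Score keyword coverage, giving more credit to keywords that appear early.

    Assumed formula: each covered keyword contributes 1.0 at the first position,
    decaying linearly toward the end of the ad; the sum is averaged over keywords.
    """
    if not ad_tokens or not keywords:
        return 0.0
    score = 0.0
    for keyword in keywords:
        if keyword in ad_tokens:
            position = ad_tokens.index(keyword)
            score += 1.0 - position / len(ad_tokens)
    return score / len(keywords)


def total_reward(ad_tokens, keywords, fluency_fn, quality_fn):
    """Sum the three reward terms: fluency, relevance, and ad quality."""
    r_f = fluency_fn(ad_tokens)
    r_r = relevance_reward(ad_tokens, keywords)
    r_q = quality_fn(ad_tokens)
    return r_f + r_r + r_q


ad = ["car", "insurance", "quotes", "online"]
# Constant scorers used purely for illustration.
print(total_reward(ad, ["car", "insurance"], lambda t: 0.5, lambda t: 0.25))
```

Passing the fluency and quality scorers in as functions keeps the sketch independent of any particular model implementation.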
  30. We want to keep producing high-quality ads automatically

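  31. Evaluating generated ads
      Ways to evaluate:
      - Evaluate automatically against predefined metrics (automatic evaluation): offline evaluation
      - Have humans look at the ads directly and rate them (human evaluation): offline evaluation
      - Actually deliver the ads and evaluate them (delivery evaluation): online evaluation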

  32. Offline evaluation
      In the human evaluation, the items below were rated both by ad creators (copywriters) and by crowdsourced workers standing in for end users.
      - Fluency
      - Attractiveness (Attractive.)
      - Relevance to the keyword (Relevance)

      Model            | Copywriter                        | Crowdsourcing
                       | Fluency  Attractive.  Relevance   | Fluency  Attractive.  Relevance
      Reference        | 87.5     25.5         24.4        | 75.6     26.8         29.1
      Seq2Seq          | 83.3     25.1         23.7        | 64.5     23.8         26.1
      + Flu, QS        | 81.7     25.3         22.8        | 64.3     24.4         26.6
      + Flu, Rel       | 77.5     24.2         23.7        | 60.9     24.8         26.2
      + Flu, Rel, QS   | 81.2     23.9         24.3        | 62.7     25.4         26.9
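  33. Online evaluation
      The automatically generated ad texts were actually delivered, and the items below were evaluated as ratios relative to human-written ads.
      - Number of impressions (Impression)
      - Click-through rate (CTR)
      - Budget spent (Cost)

      Model            | Impression  CTR    Cost
      Seq2Seq          | 3.54x       0.66x  3.31x
      + Flu, Rel       | 3.80x       0.52x  3.62x
      + Flu, Rel, QS   | 1.32x       0.71x  2.58x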
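The ratios in this table compare generated ads against human-written ads on the same metric. A minimal sketch of that calculation follows; the `DeliveryStats` type and the sample numbers are hypothetical and are not the data behind the table.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStats:
    """Aggregate delivery results for one group of ads (hypothetical structure)."""
    impressions: int
    clicks: int
    cost: float

    @property
    def ctr(self) -> float:
        # Click-through rate: clicks divided by impressions.
        return self.clicks / self.impressions if self.impressions else 0.0


def ratios_vs_human(generated: DeliveryStats, human: DeliveryStats) -> dict:
    """Express each metric of the generated ads as a multiple of the human-written ads."""
    return {
        "Impression": generated.impressions / human.impressions,
        "CTR": generated.ctr / human.ctr,
        "Cost": generated.cost / human.cost,
    }


# Made-up numbers for illustration only.
human = DeliveryStats(impressions=1000, clicks=50, cost=100.0)
generated = DeliveryStats(impressions=2000, clicks=50, cost=150.0)
print(ratios_vs_human(generated, human))
```

A ratio above 1.0x means the generated ads exceeded the human-written ads on that metric, and a ratio below 1.0x means they fell short.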
  34. To keep producing
      Generation and evaluation repeat in a cycle. This loop needs to run faster and more efficiently.

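  35. FAST [Kawamoto et al., EMNLP]
      Development of a tool that makes human evaluation more efficient.
      CyberAgent/fast-annotation-tool
      Source: paper accepted at the System Demonstration Track of "EMNLP", a top conference in natural language processing: proposing an efficient annotation tool for mobile devices. © CyberAgent, Inc.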

  36. FAST [Kawamoto et al., EMNLP]
      Evaluation of the user experience compared with another tool.
      - Work efficiency (Efficiency)
      - Work accuracy (Quality)
      - Usability

      Tool                           | Efficiency (↓)  Quality (↑)  Usability (↓)
      Baseline (Mobile)              | 6.7             0.98         5.82
      FAST (Mobile, Card UI)         | 4.6             0.97         2.03
      FAST (Mobile, Multi-label UI)  | 4.9             0.97         3.15
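  37. Applying research results to products
      Including what was introduced today, many other methods and tools have been researched and developed, deployed in actual products, and validated there.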

  38. We want to keep producing high-quality ads automatically, from here on as well

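  39. What comes next
      Further improvement of generation quality
      - Fluency
      - Diversity
      - Faithfulness
      Using a wider variety of data
      - Multimodal (images, LP layout) [Murakami et al., 言語処理学会]
      An environment for trial and error
      - Building out the various evaluation platforms
      - Building out the platform for data collection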
  40. References
      - [Krizhevsky et al., NeurIPS] ImageNet Classification with Deep Convolutional Neural Networks
      - [Silver et al., Nature] Mastering the game of Go with deep neural networks and tree search
      - [Brown et al., NeurIPS] Language Models are Few-Shot Learners
      - [Bartz et al., EC] Natural language generation for sponsored-search advertisements
      - [Fujita et al., ICEC] Automatic generation of listing ads by reusing promotional texts
      - [Thomaidou et al., CIKM] Automated snippet generation for online advertising
      - [Sutskever et al., NeurIPS] Sequence to Sequence Learning with Neural Networks
      - [Rennie et al., CVPR] Self-critical Sequence Training for Image Captioning
      - [Kamigaito et al., NAACL] An Empirical Study of Generating Texts for Search Engine Advertising
      - [Kawamoto et al., EMNLP] FAST: Fast Annotation tool for SmarT devices
      - [Murakami et al., 言語処理学会] LP-to-Text: Multimodal Ad Text Generation