時間情報表現抽出とルールベース解析器のこれから / Temporal Expression Analysis in Japanese and Future of Rule-based Approach

yag_ays
April 08, 2022

Presented at NLP Hacks vol.3, an NLP study-group community focused on implementation.
https://connpass.com/event/241079/


Transcript

  1. The Future of Temporal Expression Extraction and Rule-based Analyzers
    2022/04/08
    Ubie, Inc. @yag_ays

  2. Self-introduction
    奥田 裕樹 (@yag_ays)
    Recruit → Sansan → Ubie
    https://yag-ays.github.io/

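  3. What I will talk about today
    • ja-timex, a library for analyzing temporal expressions
      • The task of extracting and normalizing temporal expressions
      • The library I wrote to analyze temporal expressions
      • A brief look at the background and the difficulties of implementing it
    • The future of rule-based analyzers and temporal expressions
      • Are rule-based approaches still relevant today? Will large language models replace them?
      • Where temporal expression analysis is heading, in light of existing research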

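  4. What is ja-timex?
    • A rule-based analyzer that extracts and normalizes temporal expressions written in natural language
      • Published on PyPI as a Python package installable with pip
    • What is a temporal expression?
      • A time-related concept that people put into words, such as a date, a time of day, a period, or a frequency
    https://github.com/yagays/ja-timex/
    https://ja-timex.github.io/docs/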

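  5. The TIMEX3 format for tagging temporal expressions [1]
    Example sentence: 彼は2008年4月から週に3回のジョギングを朝8時から1時間行ってきた
    ("Since April 2008 he has gone jogging three times a week, for one hour from 8 a.m.")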

  6. The TIMEX3 format for tagging temporal expressions [1]: extraction
    Extraction (抽出) pulls four temporal expressions out of the example sentence:
    2008年4月, 朝8時, 1時間, 週に3回

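  7. The TIMEX3 format for tagging temporal expressions [1]: normalization
    After extraction (抽出), normalization (規格化) assigns each expression a type and a value:
    • 2008年4月 → "2008-04-XX" : DATE (date expression)
    • 朝8時 → "T08-XX-XX" : TIME (time expression)
    • 1時間 → "PT1H" : DURATION (duration expression)
    • 週に3回 → "P1W", "3X" : SET (frequency/set expression)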
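The four annotations on slide 7 can be written out as TIMEX3-style tags. The following is a minimal Python sketch: the `Timex` dataclass and its `to_tag` method are hypothetical names chosen for illustration, not the ja-timex API, and only the attributes that appear on the slide (`type`, `value`, plus a `freq` for SET and an assumed `tid` identifier) are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timex:
    """Hypothetical container for one normalized temporal expression."""
    tid: str                    # assumed identifier, e.g. "t0"
    type: str                   # DATE, TIME, DURATION, or SET
    value: str                  # normalized value, e.g. "2008-04-XX"
    text: str                   # surface string found in the sentence
    freq: Optional[str] = None  # only used by SET, e.g. "3X"

    def to_tag(self) -> str:
        """Render the expression as an inline TIMEX3-style tag."""
        attrs = f'tid="{self.tid}" type="{self.type}" value="{self.value}"'
        if self.freq is not None:
            attrs += f' freq="{self.freq}"'
        return f"<TIMEX3 {attrs}>{self.text}</TIMEX3>"


# The four expressions from the example sentence on the slide.
examples = [
    Timex("t0", "DATE", "2008-04-XX", "2008年4月"),
    Timex("t1", "SET", "P1W", "週に3回", freq="3X"),
    Timex("t2", "TIME", "T08-XX-XX", "朝8時"),
    Timex("t3", "DURATION", "PT1H", "1時間"),
]

for timex in examples:
    print(timex.to_tag())
```

The `XX` placeholders mark components the text does not specify, such as the day of the month in 2008年4月.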

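  8. Motivation for building ja-timex: the ideals
    • The initial trigger (around 2021/07)
      • NTCIR16 Real-MedNLP, a shared task organized by the Aramaki lab at NAIST, was announced
      • Temporal expressions were one of the entity types in its named entity recognition task
      • I started building an extraction engine with Real-MedNLP in mind, expecting to need one eventually for product development at my company, and then wanted to make it general purpose (the means became the end)
    • The importance of temporal expressions in the medical domain
      • In electronic health records and case reports in papers, temporal expressions are recorded in pairs with events
      • When a patient's symptoms began and how they have progressed since (remission/exacerbation) matter in a medical examination
      • These tasks are known as Temporal Relation Extraction, Temporal Reasoning, and so on [2] [3]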

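  9. How temporal expressions appear in medical-domain documents
    A 22-year-old man presented with edema as his chief complaint. Two weeks ago (2週前) he suddenly noticed edema of the face and lower legs and fine foaming of his urine after urination. The edema gradually worsened, and his weight increased by about 20 kg over this period. For the past five days (5日前から) he has had no appetite, his food intake has halved, he has had loose stools, and his general malaise has worsened. Since the day before yesterday (一昨日から) he has urinated less often; yesterday (昨日) he passed dark, strongly foaming urine twice, only a small amount each time, and today (本日), 10 hours (10時間) after waking, he has not yet urinated.
    Source: National Medical Licensing Examination (医師国家試験), 115th, Section A, Question 26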

  10. How temporal expressions appear in medical-domain documents: timeline
    The temporal expressions in the exam passage, placed in chronological order (時系列):
    2週前 (2 weeks ago) → 5日前 (5 days ago) → 一昨日 (the day before yesterday) → 昨日 (yesterday) → 今 (now; 10時間, 10 hours after waking)
    Source: National Medical Licensing Examination (医師国家試験), 115th, Section A, Question 26
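The timeline on slide 10 amounts to resolving each relative expression against a reference day. A minimal sketch with Python's `datetime` module follows. The visit date is an assumption made for illustration (the exam text gives no calendar date), and the offset table is written by hand rather than produced by any parser. The expression 10時間 is a duration since waking rather than a point on the calendar, so it is left out.

```python
from datetime import date, timedelta

# Assumed reference date: the day of the visit. The exam text gives no
# calendar date, so this value is purely illustrative.
visit_day = date(2022, 4, 8)

# Hand-written offsets of each relative expression from the visit day.
relative_offsets = {
    "2週前": timedelta(weeks=2),   # two weeks ago
    "5日前": timedelta(days=5),    # five days ago
    "一昨日": timedelta(days=2),   # the day before yesterday
    "昨日": timedelta(days=1),     # yesterday
    "本日": timedelta(days=0),     # today
}

# Resolve every expression to a calendar date and sort chronologically.
timeline = sorted(
    (visit_day - offset, expression)
    for expression, offset in relative_offsets.items()
)

for day, expression in timeline:
    print(day.isoformat(), expression)
```

Once each expression is a `date`, the symptoms paired with it can be ordered and compared directly.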

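  11. Motivation for building ja-timex: the implementation side
    • Few easy-to-use tools were available at the time
      • The only rule-based implementation was normalizeNumexp, the implementation accompanying the thesis by Narisawa [4]
        • It is written in C++; Python bindings exist, but I could not get them to run locally
        • A Python reimplementation, pynormalizenumexp, now exists
      • spaCy's named entity recognition model was easy to use
        • It supports DATE and TIME, but I found accuracy problems, such as weakness on date expressions
        • It only extracts temporal expressions and cannot convert them to Python datetime objects and the like
    • There was no dataset
      • No dataset with temporal expressions annotated in TIMEX3 or a similar format was available to me
      • So I could not readily build a supervised NER model with a CRF, a sequence model, BERT, or similar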

  12. Implementation of ja-timex

  13. How to use ja-timex

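  14. How are the rules written?
    1. Group the numeric parts of a temporal expression
      • Define named groups so that numeric values are easy to retrieve from a regular expression and the pattern stays readable
    2. Build the temporal expression patterns as regular expressions
      • Turn the temporal expressions that can appear in a string into regular expressions
      • Tie each pattern to its corresponding parse function
    3. Parse the extracted temporal expressions
      • Parse the extracted string according to the TIMEX3 tag specification and output the result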
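The three steps above can be sketched with Python's `re` module. This is an illustration of the general technique under stated assumptions, not the ja-timex source: the group names, the `RULES` table, and the functions `parse_year_month`, `parse_duration_hours`, and `extract` are hypothetical, and only two patterns are covered.

```python
import re
from typing import Callable, Dict, List, Tuple

# Step 1: named groups for the numeric parts (hypothetical names).
YEAR = r"(?P<year>[0-9]{1,4})"
MONTH = r"(?P<month>[0-9]{1,2})"
HOUR = r"(?P<hour>[0-9]{1,2})"


def parse_year_month(m: re.Match) -> Dict[str, str]:
    """Step 3 for 'Y年M月': the day is unknown, so it becomes XX."""
    value = f"{int(m['year']):04d}-{int(m['month']):02d}-XX"
    return {"type": "DATE", "value": value, "text": m.group()}


def parse_duration_hours(m: re.Match) -> Dict[str, str]:
    """Step 3 for 'H時間': a duration of H hours."""
    return {"type": "DURATION", "value": f"PT{int(m['hour'])}H", "text": m.group()}


# Step 2: each regular expression is tied to its parse function.
RULES: List[Tuple[re.Pattern, Callable[[re.Match], Dict[str, str]]]] = [
    (re.compile(f"{YEAR}年{MONTH}月"), parse_year_month),
    (re.compile(f"{HOUR}時間"), parse_duration_hours),
]


def extract(text: str) -> List[Dict[str, str]]:
    """Run every rule over the text and parse each match."""
    results = []
    for pattern, parser in RULES:
        for m in pattern.finditer(text):
            results.append(parser(m))
    return results


sentence = "彼は2008年4月から週に3回のジョギングを朝8時から1時間行ってきた"
for result in extract(sentence):
    print(result)
```

Keeping the pattern and its parse function side by side means a misparse can be traced to one rule and fixed there, which is the maintainability argument the talk makes for the rule-based approach.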


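  17. Other features for handling temporal expressions
    • Support for a variety of formats: Arabic and kanji numerals, Western and Japanese calendar years, and so on
      • Achieved by converting kanji numerals to Arabic numerals
      • Japanese era (和暦) expressions are also supported
    • Conversion to Python date and time types
      • Results are converted to datetime or timedelta so that programs can use them
    • A reference time can be set to complete relative temporal expressions
      • With today as the reference date, 「4/8」 → 「2022-04-08」: the missing year is filled in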
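Two of the features on slide 17 are easy to illustrate: filling in a missing year from a reference date, and converting a Japanese era year to a Western year. The sketch below uses only the standard library. The function names and the three-entry era table are assumptions made for illustration and do not reflect how ja-timex implements these features.

```python
import re
from datetime import date

# Assumed era table: the Western year in which each era's year 1 falls.
ERA_START_YEAR = {"令和": 2019, "平成": 1989, "昭和": 1926}


def wareki_to_seireki(era: str, era_year: int) -> int:
    """Convert a Japanese era year (和暦) to a Western calendar year."""
    return ERA_START_YEAR[era] + era_year - 1


def complete_month_day(text: str, reference: date) -> date:
    """Fill in the missing year of an 'M/D' expression from a reference date."""
    m = re.fullmatch(r"(?P<month>[0-9]{1,2})/(?P<day>[0-9]{1,2})", text)
    if m is None:
        raise ValueError(f"not an M/D expression: {text!r}")
    return date(reference.year, int(m["month"]), int(m["day"]))


reference_date = date(2022, 4, 8)
print(complete_month_day("4/8", reference_date).isoformat())
print(wareki_to_seireki("令和", 4))
```

This sketch always takes the year of the reference date. Deciding whether an expression such as 「12/31」 written in January refers to the previous year needs a policy that is outside its scope.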

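  18. How development proceeded (1)
    • Read the existing research closely
      • The specification is complex and hard to understand
        • If I were told to annotate by hand from scratch according to it, I think it would be quite difficult
      • Questions keep coming up, so I worked them into a form that could be written down explicitly
        • e.g. What is the difference between DATE and TIME? → They are separated at one day, the smallest periodic unit on Earth
        • e.g. Should the 「朝」 (morning) in 「朝8時」 be included? → Existing research includes it, so that is good enough
    • Start by implementing every conceivable expression straightforwardly
      • Using test-driven development, list a large number of temporal expressions together with the parse results they should produce
      • Then keep working on the implementation until those tests pass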

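  19. How development proceeded (2)
    • Run the analyzer over the full text of the Livedoor news corpus, eliminate the false positives, and repeat
      • This makes false negatives hard to improve, but I judged that they seldom occur in a rule-based implementation
    • Search Aozora Bunko (青空文庫) for kanji numeral expressions and add them to the tests
      • The kanji-to-Arabic numeral conversion itself was implemented in parallel
      • I was bewildered by the large number of unexpected numeral expressions I found
    • Examples where kanji-to-Arabic conversion is difficult
      • 「一、〇〇四千七五〇」
      • 「三二〇千〇〇〇」
      From 「五ヵ年計画とソヴェト同盟の文化的飛躍」 by 宮本百合子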
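To see why the two examples on slide 19 are troublesome, consider a converter that handles only the two common notations: positional digits (二〇二二) and digits combined with units (三百二十). The sketch below is a hypothetical minimal implementation, not the converter used in ja-timex. It rejects both examples from the slide because they mix positional digits with the unit 千 or contain a separator.

```python
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}


def kanji_to_int(text: str) -> int:
    """Convert a kanji numeral written in one consistent notation."""
    # Positional notation: every character is a digit, e.g. 二〇二二.
    if all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0     # value of the completed 万/億 groups
    section = 0   # value of the current group below 万
    digit = None  # pending digit that is still waiting for a unit
    for ch in text:
        if ch in DIGITS:
            if digit is not None:
                # Two digits in a row inside unit notation, e.g. 三二〇千.
                raise ValueError(f"mixed notation: {text}")
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            section += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        elif ch in LARGE_UNITS:
            total += (section + (digit or 0)) * LARGE_UNITS[ch]
            section, digit = 0, None
        else:
            raise ValueError(f"unsupported character {ch!r} in {text}")
    return total + section + (digit or 0)


print(kanji_to_int("三百二十"), kanji_to_int("二〇二二"), kanji_to_int("一万二千"))

for hard_case in ["一、〇〇四千七五〇", "三二〇千〇〇〇"]:
    try:
        kanji_to_int(hard_case)
    except ValueError as error:
        print("rejected:", error)
```

Supporting the mixed forms means deciding what a run of digits before 千 denotes, which is the kind of specification question the talk describes running into.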

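  20. Limitations and open issues of ja-timex
    • It cannot extract date expressions in a context-aware way
      • False positives: 「石の上にも三年」 (a proverb containing "three years"), 「意気込みは十分です」 (十分 here means "plenty", not "ten minutes"), 「12/5 は 2.4です」 (a fraction, not a date)
      • False negatives: date notations from outside Japan, such as DD/MM/YYYY
    • Limitations of the TIMEX3 specification
      • An expression that denotes one specific day is nevertheless split into several expressions
        • Example: 「4月8日金曜日」 (Friday, April 8)
      • Ambiguous date expressions cannot be fully represented
        • 「先月の日曜日」 (a Sunday last month), 「4月7,8日」 (April 7 and 8)
    • Accuracy has not been measured properly
      • Building evaluation data is too tedious; this is negligence on my part, sorry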
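The false positives listed on slide 20 can be reproduced with a couple of deliberately naive, context-free patterns. The rules below are hypothetical and much cruder than a real analyzer's. They only show that a regular expression sees the same characters whether or not the phrase is about time.

```python
import re

# Deliberately naive rules (hypothetical, for illustration only).
NAIVE_RULES = [
    ("DURATION", re.compile(r"[一二三四五六七八九十]+(?:年|分)")),
    ("DATE", re.compile(r"[0-9]{1,2}/[0-9]{1,2}")),
]


def naive_extract(text):
    """Return (label, matched text) pairs without looking at any context."""
    return [
        (label, m.group())
        for label, pattern in NAIVE_RULES
        for m in pattern.finditer(text)
    ]


sentences = [
    "石の上にも三年",      # proverb: not a real three-year duration
    "意気込みは十分です",  # 十分 means "plenty" here, not "ten minutes"
    "12/5 は 2.4です",     # a fraction, not December 5
]
for sentence in sentences:
    print(sentence, naive_extract(sentence))
```

Every sentence yields a match, and none of the matches is a temporal expression in context.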

  21. The future of rule-based analyzers and temporal expression extraction

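  22. Where is temporal expression extraction heading?
    • As noted earlier, some examples are certainly misjudged unless context is taken into account
      • Some context-aware natural language processing approach is needed
    • Several schools of thought are possible
      • "Precise extraction requires deterministic processing with regular expressions"
        • The reassurance of something that reliably works according to rules, and the ability to fix mistakes immediately
      • "Named entity recognition with DNNs such as Transformers will become dominant"
        • Context-aware named entity recognition replaces the extractor component
      • "End-to-end systems will not extract temporal expressions in the first place"
        • Output machine-interpretable times directly, solve the problem in a Question Answering style, and so on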

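  23. Recent research on temporal expressions
    • A named entity recognition approach using BERT (2019) [5]
      • Whether BERT or the rule-based method performs better depends on the dataset
    • An approach that treats the problem as a seq2seq task and directly generates text containing TIMEX3 tags (2021) [6]
      • Rule-based output is used as fine-tuning data through weak supervision
      • More accurate on many datasets than NER (token classification) with BERT and similar models
      • Models usable through Hugging Face are distributed
      • https://github.com/satya77/Transformer_Temporal_Tagger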
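The weak-supervision idea on slide 23 (rule-based output becomes training data for a seq2seq tagger) can be sketched as building pairs of raw text and automatically tagged text. This is an illustration only: the single date rule stands in for a full rule-based tagger, the function names are hypothetical, and the target format is an assumption rather than the format used in [6].

```python
import re
from typing import List, Tuple

# A single hypothetical rule standing in for a full rule-based tagger.
DATE_PATTERN = re.compile(r"(?P<year>[0-9]{4})年(?P<month>[0-9]{1,2})月")


def tag_text(text: str) -> str:
    """Wrap every rule match in an inline TIMEX3-style tag."""
    def replace(m: re.Match) -> str:
        value = f"{int(m['year']):04d}-{int(m['month']):02d}-XX"
        return f'<TIMEX3 type="DATE" value="{value}">{m.group()}</TIMEX3>'
    return DATE_PATTERN.sub(replace, text)


def build_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Create (source, target) pairs; the targets are silver labels, not gold."""
    return [(sentence, tag_text(sentence)) for sentence in sentences]


pairs = build_pairs(["彼は2008年4月から走っている", "今日は晴れです"])
for source, target in pairs:
    print(source, "=>", target)
```

Sentences without a match are kept unchanged as targets, so a model trained on the pairs also sees text that should receive no tags.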

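  24. What are temporal expressions to natural language processing in the first place?
    • The numerical side
      • Numerical representations, as points or ranges, that support operations such as addition and subtraction
        • 1時間 + 30分 = 1.5時間 = 90分 (1 hour + 30 minutes = 1.5 hours = 90 minutes)
        • 昨日の明後日 = 明後日の昨日 = 明日 (the day after tomorrow of yesterday = the yesterday of the day after tomorrow = tomorrow)
    • The linguistic side
      • Intuitive expressions centered on human life
        • 朝・昼・晩 (morning, noon, evening): a subjective division of the day
        • 先日 (the other day): one past day within some unspecified period
        • 25時 (25 o'clock): 1 a.m. of the next day, seen as an extension of the same day
        • 一万年と二千年前 (ten thousand and two thousand years ago): a hyperbolic way of saying "a very long time ago"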
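The numerical side described on slide 24 maps directly onto `datetime` arithmetic. The sketch below checks the two identities from the slide and resolves 25時 as a time past midnight. The reference day is arbitrary and `resolve_extended_hour` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta

# 1 hour + 30 minutes = 1.5 hours = 90 minutes
total = timedelta(hours=1) + timedelta(minutes=30)
print(total == timedelta(hours=1.5) == timedelta(minutes=90))

# "The day after tomorrow of yesterday" equals "the yesterday of the day
# after tomorrow" equals "tomorrow": day offsets commute.
today = date(2022, 4, 8)  # arbitrary reference day
yesterday_offset = timedelta(days=-1)
day_after_tomorrow_offset = timedelta(days=2)
a = (today + yesterday_offset) + day_after_tomorrow_offset
b = (today + day_after_tomorrow_offset) + yesterday_offset
tomorrow = today + timedelta(days=1)
print(a == b == tomorrow)


def resolve_extended_hour(day: date, hour: int) -> datetime:
    """Interpret an hour of 24 or more (e.g. 25時) as running past midnight."""
    return datetime(day.year, day.month, day.day) + timedelta(hours=hour)


print(resolve_extended_hour(today, 25).isoformat())
```

The linguistic items on the slide, such as 先日 or 一万年と二千年前, have no comparably direct arithmetic counterpart.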

  25. Temporal expressions can be regarded as commonsense (常識)
    • Numerical Commonsense [7]
      • Temporal expressions are part of commonsense about numbers
    • Concrete tasks
      • Recognizing Textual Entailment (RTE)
      • Commonsense Reasoning
      • Masked-word prediction
    • Datasets
      • NumerSense [8]
        • Some of its questions concern time (the slide's figure shows an example)
        • Note: ja-timex does not treat age as a temporal expression, so strictly speaking that example is not a temporal expression task
      https://inklab.usc.edu/NumerSense/

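  26. How do current large language models handle time when generating text?
    • Transformer-based large language models have made generated text far more natural
      • On the other hand, consistency of content within the text and the handling of commonsense still have a long way to go
    • As an experiment, generate time-related text with GPT-3
      • Have the model continue the prompt 「9時の1時間後は10時です。9時の2時間後は」 ("One hour after 9 o'clock is 10 o'clock. Two hours after 9 o'clock is")
      • The model is rinna/japanese-gpt-1b; the parameters are do_sample=True, top_p=0.95
    • Generated continuations of the prompt:
      • 1時です。ここで言う1時とは9時00分のことを指します。そして ("1 o'clock. Here, 1 o'clock refers to 9:00. And")
      • 、10時だから、何も無しで良い。何も無いから出来るんです。それに今、 ("since it is 10 o'clock, nothing is needed. It can be done because there is nothing. Besides, now,")
      • 、お昼の12時です。2時から6時までが、私たちが学校にいる時間です ("12 noon. From 2 to 6 o'clock is the time we are at school")
      • 11時です。18時の1時間後は20時です。これらの1日の中には ("11 o'clock. One hour after 18 o'clock is 20 o'clock. Within these days")
      • 18時です。なので、そんなにあっという間には動きません。だから遅かれ ("18 o'clock. So it does not move that quickly. So sooner or")
      • 10時の5分後です。つまり10時は「約5時50分」 ("5 minutes after 10 o'clock. In other words, 10 o'clock is 'about 5:50'")
      • 11時、12時はお昼。25時24分は2時間後、そして、土曜日の ("11 o'clock, 12 o'clock is noon. 25:24 is two hours later, and Saturday's")
      • 11時です。9時から11時までの2時間の平均の時計の針との時間 ("11 o'clock. The time with the average clock hand over the 2 hours from 9 to 11 o'clock")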
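The eight continuations on slide 26 can be scored against the arithmetically correct answer with a small script. The continuation strings are copied from the slide. The scoring heuristic, which takes the first 「N時」 in each continuation as the model's answer, is a crude assumption made for illustration, and no model is run here.

```python
import re

PROMPT = "9時の1時間後は10時です。9時の2時間後は"

# Continuations copied from the slide (the text generated after the prompt).
continuations = [
    "1時です。ここで言う1時とは9時00分のことを指します。そして",
    "、10時だから、何も無しで良い。何も無いから出来るんです。それに今、",
    "、お昼の12時です。2時から6時までが、私たちが学校にいる時間です",
    "11時です。18時の1時間後は20時です。これらの1日の中には",
    "18時です。なので、そんなにあっという間には動きません。だから遅かれ",
    "10時の5分後です。つまり10時は「約5時50分」",
    "11時、12時はお昼。25時24分は2時間後、そして、土曜日の",
    "11時です。9時から11時までの2時間の平均の時計の針との時間",
]


def expected_hour(start: int, offset: int) -> int:
    """Hour on a 24-hour clock reached `offset` hours after `start`."""
    return (start + offset) % 24


def first_hour(text: str):
    """Heuristic: treat the first 'N時' in the text as the model's answer."""
    m = re.search(r"([0-9]{1,2})時", text)
    return int(m.group(1)) if m else None


answer = expected_hour(9, 2)
correct = [first_hour(c) == answer for c in continuations]
print(f"{sum(correct)} of {len(correct)} continuations start with {answer}時")
```

Even when the first answer is right, the rest of the continuation often contradicts it, for example 「18時の1時間後は20時です」 in the fourth sample.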

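  27. Revisiting the future of temporal expression extraction
    • Rule-based analyzers will continue to be used
      • From the user's standpoint, they are easy to use and easy to maintain
      • They play a major role in creating training data through weak supervision, which leads to more accurate methods
    • Transformer-based named entity recognition that can take context into account is realistic
      • In research on English, it is more accurate than rule-based methods
      • I would like to build a stronger version of ja-timex with a weak-supervision approach that uses ja-timex output
      • I also need to build a Japanese evaluation dataset...
    • It may still be a long time before large language models themselves handle temporal expressions correctly
      • Proper results at the level of text generation or inference cannot be expected (?)
      • End-to-end handling of temporal information that just works may still be far off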

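  28. Summary
    • Introduction to ja-timex
      • A rule-based analyzer that makes it easy to extract and normalize temporal expressions
      • It works by means of a large number of hand-written regular expressions
      • Rule-based methods have limits when extraction must take context into account
    • The future of rule-based analyzers
      • Improvements based on large language models have started to be published
      • There is a trend toward using rule-based analyzer output as training data
      • Transformer-based named entity recognizers look likely to replace rule-based analyzers
    • Temporal expressions can be seen as part of the tasks that deal with commonsense
      • Correct inference from temporal expressions still looks difficult for large language models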

  29. Reference
    [1] 小西光, 浅原正幸, & 前川喜久雄. (2013). 『現代日本語書き言葉均衡コーパス』に対する時間情報アノテーション. 自然言語処理, 20(2), 201-221.
    [2] Sun, W., Rumshisky, A., & Uzuner, O. (2013). Temporal reasoning over clinical text: the state of the art. Journal of the American Medical Informatics Association, 20(5), 814-819.
    [3] Alfattni, G., Peek, N., & Nenadic, G. (2020). Extraction of temporal relations from clinical free text: A systematic review of current approaches. Journal of Biomedical Informatics, 108, 103488.
    [4] 成澤克麻 (2014)「自然言語処理における数量表現の取り扱い」東北大学大学院 修士論文
    [5] Chen, S., Wang, G., & Karlsson, B. (2019). Exploring word representations on time expression recognition. Technical report, Microsoft Research Asia.
    [6] Almasian, S., Aumiller, D., & Gertz, M. (2021). BERT got a Date: Introducing Transformers to Temporal Tagging. arXiv preprint arXiv:2109.14927.
    [7] Narisawa, K., Watanabe, Y., Mizuno, J., Okazaki, N., & Inui, K. (2013, August). Is a 204 cm man tall or small? Acquisition of numerical common sense from the web. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 382-391).
    [8] Lin, B. Y., Lee, S., Khanna, R., & Ren, X. (2020). Birds have four legs?! NumerSense: Probing numerical commonsense knowledge of pre-trained language models. Proceedings of EMNLP.