erikuroda 4th ph.d defense

Eri KURODA
February 08, 2025

Feb. 10, 2025
Eri Kuroda @ Ochanomizu University: 4th Ph.D. defense, public hearing


Transcript

  1. Real-world Understanding based on Predictions with Physical Properties

    Ochanomizu University, Ichiro Kobayashi Laboratory, third-year doctoral student Eri Kuroda. 4th doctoral dissertation review and public hearing, 2025.2.10
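  2. Thesis structure
    • Research background
      • ch1: Introduction
    • Related work for this research
      • ch2: Neural networks
      • ch3: Real-world understanding and language models
    • Proposed research
      • ch4: Building a deep generative model that imitates predictive coding in the human brain
      • ch5: Extracting change points in a physical environment
      • ch6: A predictive inference model of the physical environment in the real world
      • ch7: Verbal representation of predictions at object collisions based on physical commonsense
    • Conclusion
      • ch8: Conclusion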
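  3. Table of contents
    • Research background
      • ch1: Introduction
    • Related work for this research
      • ch2: Neural networks
      • ch3: Real-world understanding and language models
    • Proposed research
      • ch4: Building a deep generative model that imitates predictive coding in the human brain
      • ch5: Extracting change points in a physical environment
      • ch6: A predictive inference model of the physical environment in the real world
      • ch7: Verbal representation of predictions at object collisions based on physical commonsense
    • Conclusion
      • ch8: Conclusion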
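  4. Real-world understanding and prediction by humans
    How humans perceive the real world
    • Understanding what an object is
    • Understanding how an object moves
    • Predicting its motion
    Examples
    • Crossing a road
    • Driving a car
    • People live while constantly predicting (and estimating)
    Prediction by humans
    • Instantaneous prediction (short term)
    • Causal prediction (long term)
    Humans constantly make predictions that combine the two.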
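  5. Real-world understanding and prediction by machines
    Research focused on humans
    • Understanding from the AI and machine-learning side: neural networks, BMI (brain-machine interfaces)
    • Medical understanding: fMRI
    • Understanding of the inner mind: psychology
    A world where machines and humans coexist
    • Understanding in both directions
    • Things that humans use as the main actor / things that support human life
    Expressing human-likeness in machines
    • LLM development has been remarkable in recent years
    • The focus is shifting from systems that can generate language to human-likeness and human thinking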
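  6. Background: toward human-like inference and prediction by machines
    Importance of research on real-world understanding and prediction with machine learning
    • Functions that support human life, such as autonomous driving and nursing-care robots
    • World models [Ha+, 18]
      • Built by modeling the environment inside the human brain from the stimuli of the observed environment and simulating it
    • One of the three challenges AI research should address from now on [LeCun, 23]
      • Raises the question "How can machines learn to represent the world from observation, make predictions, and act?"
      • Machines cannot yet judge the meaning of a pedestrian's slight movements
    Prediction research
    • Time-series prediction
    • Causal-inference prediction
    • Multimodal prediction
    • A mechanism that can understand, predict, and explain the observed environment the way humans do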
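  7. Objective: toward human-like inference and prediction by machines
    Real-world understanding and prediction by humans
    • Predict an object's next movement and decide on an action
    • Predict slightly ahead and make a plan
    • Learn background knowledge from interaction and observation
    • The important points of an event are what matter
    • The real world and language are connected
    Prediction by computers
    • Sequential prediction (at fixed, short intervals)
    • Predicts images (corresponding to vision)
    • Prediction of image features = prediction of the real world
    • Prediction based on the physical laws of objects is difficult
    • "Understanding and predicting the physical properties" of objects is not connected to language
    Objectives
    1. Build a deep generative model that imitates the hierarchical structure of the human brain, which can predict over various time spans
    2. A mechanism that extracts change points, where motion changes greatly, from changes in the physical properties of objects in images
    3. Propose a change-point prediction model that predicts change points (collisions)
    4. Express the predicted content in language in order to connect the real world and language; when physical commonsense is given, express it in more detailed sentences that take the properties of the environment into account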
  8. ch4: A Deep Generative Model Imitating Predictive Coding in Human Brain
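  9. Background and objective
    Goal: represent time perception in the human brain with a machine-learning model.
    Representative prior work on prediction
    • PredNet [Lotter+, 2016]: imitates the hierarchical structure of the human brain, but cannot predict the future over various time spans as humans do
    • TD-VAE [Gregor+, 2018]: can predict the future over an arbitrary time span, but does not reflect the information-processing mechanism of the human brain
    Proposal
    1. Build a model closer to the information-processing mechanism of the human brain
    2. Investigate the correlation with actual human brain activity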
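  10. Overview
    [Figure: natural video is fed to the prediction model, which outputs predicted images; fMRI data (brain activity information) is compared with the model. The model combines the hierarchical structure of the brain with prediction over flexible time spans, making it possible to predict future events over flexible time spans.]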
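  11. PredNet [Lotter+, 16]
    • Research that uses deep learning to predict the next image from video
    • Imitates the predictive-coding process in the cerebral cortex
    • Represents the information-processing mechanism of the brain
    What is predictive coding?
    • Compute the error between the predicted value and the observed value
    • Propagate the error bottom-up
    • Output the predicted value that minimizes the error
    • Propagate the predicted value top-down
    [Figure: real images and predicted images along time t]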
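  12. TD-VAE [Gregor+, 18]
    • Temporal Difference Variational Auto-Encoder
    • Research that predicts the image an arbitrary number of steps ahead from video
    • Introduces a belief state
      • Corresponds to the belief state of a POMDP (partially observable Markov decision process)
    • Prediction over flexible time spans
    [Figure: predicted image 1 and predicted image 2 along time t]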
  13. TD-VAE [Gregor+, 18]
    [Figure: TD-VAE architecture. Labels: Input, Prediction, Decoder, Inference]
  14. TD-VAE [Gregor+, 18]
    [Figure: TD-VAE architecture with the belief state highlighted. Labels: Input, Prediction, Decoder, Inference, belief state]
    Problems
    1. It does not reflect the information-processing mechanism of the human brain
    2. It cannot be correlated with fMRI data
  15. Proposed model [Kuroda+, 21]
    [Figure: proposed model. Labels: Observation, input, Error representation, Representation layer, Belief state (0th module, 1st module). Arrow legend: observation; transition of belief; inference from the future; propagation of differences; update of the prediction model; propagation of predictions; prediction of the future; prediction generation]
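  16. Experiment 1: Prediction over a long time span
    Objective
    • Verify whether natural video can be predicted when the prediction interval is set to 1 second
    Settings
    • Dataset: The KITTI dataset
    • Predict from time t=6 onward
    Training settings
    • train data (= 9:1): 28,297
    • test data: 426
    • # Layers: 4
    • Size of convolutional filter: 3×3 (for all convolutions)
    • # Channels (from lower module): 3, 48, 96, 192
    • Optimization algorithm: Adam [Kingma+, 15]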
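  17. Experiment 1: Dataset
    • Vision meets Robotics: The KITTI Dataset
    • Created jointly by Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
    • One of the world's largest in-vehicle benchmark suites
    Settings
    • Number of categories: 6
    • fps: 10 (frames per second)
    • Annotations: 15 types
    • Image size: 1242×375
    • Train data: 28,297
    • Test data: 426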
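  18. Experiment 1: Results (accuracy)
    • Agreement of the predicted images measured with the image quality metric SSIM
    • Structural similarity
    • $\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
    • x: original image, y: predicted image, μ: mean, σ: variance
    SSIM (higher is better)
    • Proposed model: 0.85
    • PredNet: 0.65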
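A minimal sketch of the SSIM score named on this slide, computed over a single global window on flattened grayscale images. The function name `ssim_global` and the stabilizing constants (`k1 = 0.01`, `k2 = 0.03`, dynamic range 255) are assumptions taken from common SSIM practice; the slides do not say which window or constants were used.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM for two equally sized, flattened grayscale images.

    x is the original image and y the predicted image, as plain lists of
    pixel intensities. The constants are assumed, not taken from the slides.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variance and covariance over the whole image.
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images score 1, and the score drops as the predicted image departs from the original. Practical evaluations usually average SSIM over local windows rather than using one global window.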
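  19. Experiment 2: Investigating the correlation with the human brain
    Objective
    • Check whether inference by the proposed model correlates with the actual human brain
    Settings
    • Dataset: natural video dataset [Nishimoto+, 11]
    • Compare the feature representations of the model with the feature representations estimated from the human brain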
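  20. Experiment 2: Overview
    [Figure: natural video is fed to the prediction model to obtain the model's feature representations. Paired with brain activity information, a regression model (ridge regression) produces estimated feature representations, and the correlation coefficient between the two is computed. [Kay+, 08][Nishimoto+, 11][Schoenmakers+, 13]]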
  21. Experiment 2: Schematic
    [Figure: ridge regression for PredNet and ridge regression for the proposed model. Subjects watch a movie (10 min), cut at 10 frames/sec into stimulus images. Brain activity is regressed with ridge regression onto the layer representations R0 to R3 of the 0th to 3rd modules. Labels: Input layer, Representation layer, Error representation, Belief state]
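  22. Experiment 2: Results
    Correlation coefficient (higher is better). Values are listed for α = 0.5 / 1,000 / 25,000.
    • R0 (lowest layer): PredNet 0.2623 / 0.2971 / 0.3207; TD-VAE 0.2636 / 0.2983 / 0.3285; proposed model 0.2637 / 0.2983 / 0.3291
    • R2: PredNet 0.0925 / 0.1459 / 0.1955; TD-VAE - / - / -; proposed model 0.0003 / 0.0012 / 0.0016
    • R3: PredNet 0.0254 / 0.1217 / 0.1871; TD-VAE - / - / -; proposed model 0.0004 / 0.0009 / 0.0012
    α: model parameter. R1: could not be estimated because of resource constraints.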
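  23. Summary and discussion
    • Goal: represent time perception in the human brain with a machine-learning model
    • Built a prediction model that imitates the hierarchical structure of the human brain and can predict over flexible time spans
    Experiment 1
    • Generated images at a longer prediction interval than PredNet
    Experiment 2
    • In the lowest layer (R0), a correlation with the human brain was observed
      • The model is close to the internal representations of the human brain, so it could serve as a working model of the human brain
    • No correlation was observed in the higher layers
      • Because inference is performed only at R0
    • The brain activity information cannot be said to be data that reflects an understanding of the environment in the images
      • The finding is limited to a correlation between the signals of what the human saw and the internal representations of the model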
  24. ch5: Extraction of Motion Change Points based on the Physical Characteristics of Objects
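  25. Background and objective
    Prediction so far
    • Input: images
    • Limited to predicting image features
    Human-like prediction
    • Humans do not predict every part of the input in fine detail
    • They pick out the important parts and predict those
    Issues
    • The situation in the image is not understood
    • The model cannot pick out the important parts and predict from that information
    • Comparison of performance with LLMs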
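  26. What are physical properties?
    Understanding of the environment by humans
    • Recognize what the objects in the environment (e.g. images, videos) are
    • Understand the motion of the recognized objects (e.g. velocity, relations to other objects)
    Understanding of the environment in Study 1
    • It only predicted an image (a depiction of the situation) that is likely to occur in the future
    • It does not understand what is in the environment (understanding of objects) or how the objects are moving (understanding of physical properties)
    • Goal: develop a mechanism that understands the objects and motion in the environment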
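  27. Background and objective
    • Variational Temporal Abstraction (VTA) [Kim+, 2019]
      • Obtains the latent structure of the environment from visual information
      • Extracts the timing at which the environment changes
      • Focuses on pixel changes
      • Cannot take the physical properties of objects (such as velocity) into account
    Proposal
    1. Represent the relations between objects with a graph
    2. Extract the change points of the environment from changes in the graph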
  28. Overview
    [Figure: the conventional method (VTA) uses a 3D maze and image features only, and does not understand the real world. The proposed method uses CLEVRER [Yi+, 19] with object detection, a graph structure, velocity and acceleration, and image features, and extracts change-point flags with VTA. VTA diagram labels: observation (input), observation abstraction, boundary indicator, temporal abstraction]
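  29. Temporal Abstraction
    • Temporal Abstraction
      • Integrates the actions of each hierarchical level along the time axis
    Example: cooking
    [Figure: a hierarchy with high, middle, and low levels. High level: choose a recipe, shop, cook. Middle level: prepare a memo pad, make a shopping list, go to the supermarket, prepare a pot, stir the pot, cut the ingredients. Low level: hand (hold a pen), legs, arms and hands. Repeated steps are marked as the same state.]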
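  30. Temporal Abstraction
    • Temporal Abstraction
      • Integrates the actions of each hierarchical level along the time axis
    Example: cooking
    [Figure: the same hierarchy. High level: choose a recipe, shop, cook. Middle level: prepare a memo pad, make a shopping list, go to the supermarket, prepare a pot, stir the pot, cut the ingredients. Low level: hand (hold a pen), legs, arms and hands. Repeated steps are marked as the same state.]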
  31. Variational Temporal Abstraction [Kim+, 19]
    [Figure: VTA model. Labels: observation (input), observation abstraction, boundary indicator, temporal abstraction]
  32. Variational Temporal Abstraction [Kim+, 19]
    Introduction of the flag m
    [Figure: VTA model with the binary boundary flags (0 1 0 0) shown. Labels: observation (input), observation abstraction, boundary indicator, temporal abstraction]
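  33. Dataset
    • CLEVRER [Yi+, 2020]
    • CoLlision Events for Video REpresentation and Reasoning
    Overview
    • Videos: 20,000 (train:val:test = 2:1:1)
    • Video length: 5 seconds
    • Number of frames: 128
    • Shapes: cube, sphere, cylinder
    • Materials: metal, rubber
    • Colors: gray, red, blue, green, brown, cyan, purple, yellow
    • Events: appearance, disappearance, collision
    • Annotations: object id, position, velocity, acceleration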
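  34. Physical Training Dataset
    • A dataset built from the physical properties of the environment
    [Figure: pipeline. Object recognition gives the position information of the objects; from these, the graph structure (converted to an embedding vector), velocity, acceleration, and flags for the relative position and direction between objects are derived and concatenated.]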
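  35. Physical Training Dataset: object recognition and position information
    Object recognition
    • YOLO v3 [Redmon+, 18]
      • Object {shape}
    • YOLACT [Bolya+, 2019]
      • Object {shape, color} / {shape, color, material}
    Position information
    • The center coordinates of an object are computed from the coordinates of its detected bounding box
    • For bounding-box corners $(x_1, y_1)$ and $(x_2, y_2)$: $c = (x, y) = \left(\dfrac{x_1 + x_2}{2}, \dfrac{y_1 + y_2}{2}\right)$
    [Figure: real image, YOLO v3, YOLACT {object, color}, YOLACT {object, color, material}]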
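The center computation on this slide can be sketched as below. The function name `bbox_center` and the argument order are illustrative choices; the slide only gives the midpoint formula for the two bounding-box corners.

```python
def bbox_center(x1, y1, x2, y2):
    """Return the center (x, y) of a bounding box given two opposite corners."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# Example: a box with corners (10, 20) and (30, 60).
center = bbox_center(10, 20, 30, 60)
```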
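  36. Physical Training Dataset: velocity, acceleration, and position between objects
    Elapsed time between frames: $et_{frame} = 5/128$
    Velocity
    • $v_{x_t} = (x_t - x_{t-1}) / et_{frame}$
    • $v_{y_t} = (y_t - y_{t-1}) / et_{frame}$
    Acceleration
    • $a_{x_t} = (v_{x_t} - v_{x_0}) / (et_{frame} \times t)$
    • $a_{y_t} = (v_{y_t} - v_{y_0}) / (et_{frame} \times t)$
    Positional relation between objects
    • main object $= (x_{main}, y_{main})$
    • others $= (x_{other}, y_{other})$
    • $x_{diff} = x_{other} - x_{main}$
    • $y_{diff} = y_{other} - y_{main}$
    [Figure: x-y axes centered on the main object; the signs of $x_{diff}$ and $y_{diff}$ place each other object in the 1st to 4th quadrant]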
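A short sketch of the per-frame velocity and the relative position between objects described on this slide. The frame interval of 5/128 seconds comes from the slide (5-second clips, 128 frames); the function names and the tuple layout are assumptions made for illustration.

```python
ET_FRAME = 5 / 128  # seconds between consecutive frames (from the slide)


def velocity(prev_center, curr_center, et_frame=ET_FRAME):
    """Velocity (vx, vy) of one object between two consecutive frames."""
    (x_prev, y_prev), (x_curr, y_curr) = prev_center, curr_center
    return ((x_curr - x_prev) / et_frame, (y_curr - y_prev) / et_frame)


def relative_position(main_center, other_center):
    """Offset (x_diff, y_diff) of another object as seen from the main object."""
    return (other_center[0] - main_center[0], other_center[1] - main_center[1])
```

The signs of the returned offset are what the slide's quadrant diagram encodes as position flags.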
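  37. Physical Training Dataset: graph structure
    Graph structure
    • Node information
      • Shape, color, and material of the object
    • At this point it is not known which objects influence each other's motion, so a complete graph is built
    Embedding vectors
    • node2vec [Grover+, 16]
    • graph2vec [Narayanan+, 17]
    [Figure: the graph is converted into embedding vectors, e.g. [[0.54, 0.29, 0.61…], [0.82, 0.91, 0.15…], …, [0.14, 0.35, 0.69…]]]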
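The complete-graph construction described on this slide can be sketched as follows. The dictionary keys `shape`, `color`, and `material` mirror the node information listed on the slide; the function name and the edge representation are hypothetical, and the embedding step (node2vec or graph2vec) is not shown.

```python
from itertools import combinations


def build_complete_graph(objects):
    """Build a complete graph whose nodes carry object attributes.

    `objects` is a list of dicts such as
    {"shape": "sphere", "color": "red", "material": "rubber"}.
    Returns (nodes, edges), where edges connect every pair of nodes because
    it is not yet known which objects influence each other.
    """
    nodes = {index: dict(attrs) for index, attrs in enumerate(objects)}
    edges = list(combinations(nodes, 2))
    return nodes, edges


detected = [
    {"shape": "sphere", "color": "red", "material": "rubber"},
    {"shape": "cube", "color": "blue", "material": "metal"},
    {"shape": "cylinder", "color": "green", "material": "rubber"},
    {"shape": "sphere", "color": "gray", "material": "metal"},
]
nodes, edges = build_complete_graph(detected)
```

With n detected objects the graph has n(n-1)/2 edges, so four objects give six edges.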
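  38. Experiment 1: Collision detection with the change-point extraction model
    Objective
    • Verify whether the timing of collisions and large scene changes can be obtained from changes in object motion
    Settings
    • Ground-truth collision: frame 30; collision as perceived by a human: frame 32
    • Correct range: frames 30 to 32
    • Evaluated with the F1 score
    Training settings
    • Number of data (8:1:1): 600,000
    • Batch size: 100
    • Number of outputs (number of flags): 80
    • Optimizer: Adam [Kingma+, 17]
    • Error function: KL divergence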
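  39. Experiment 2: Collision detection with GPT
    • Do LLMs (the GPT family) have the ability to perceive objects in images visually and physically?
      • Strengths and weaknesses in inference
      • Differences between models
    GPT variants
    • GPT-4o
    • GPT-4o-FT
      • A model fine-tuned so that it can recognize CLEVRER
    • GPT-4o mini
    • GPT-4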
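  40. Experiment 2: Collision detection with GPT
    Evaluation settings
    • 100 collisions
    • 20 images in total are given as input: the 10 frames before the collision and the 9 frames after it
    • Ground-truth data: frame 19; visually: frame 21
    • Output: frame number
    Prompts
    • simple
      • "Explain at which frame the objects collide."
    • detail
      • #Object color, #object shape, #object material
      • #Situation: "Several objects are moving, and two of them collide."
      • #Question: "Explain which objects collide and at which frame."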
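  41. Experiment 2: Collision detection with GPT: results
    Model, prompt: macro-F1 / Accuracy (whether the objects are correct: color, shape, material)
    • GPT-4o, simple: 75.4 / 2.53
    • GPT-4o, detail: 75.3 / 2.65
    • GPT-4o-FT, simple: 75.8 / 2.65
    • GPT-4o-FT, detail: 76.1 / 2.63
    • GPT-4o mini, simple: 28.5 / 1.35
    • GPT-4o mini, detail: 33.3 / 1.58
    • GPT-4, simple: 0 / 0
    • GPT-4, detail: 0 / 0
    • Change-point extraction model: 63.5 / 2.75
    macro-F1
    • The average of the F1 scores over the collisions
    Accuracy
    • Whether the types of the objects involved in the collision are correct (max: 3)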
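One way to read the macro-F1 defined on this slide (the average of the F1 scores over the collisions) is sketched below. How the per-collision F1 was actually computed is not stated in the slides; this sketch assumes that each collision has a set of correct frames (for example the range 30 to 32 used in Experiment 1) and that the model emits a set of frames it flags as collisions. Both function names are hypothetical.

```python
def f1_for_collision(predicted_frames, correct_frames):
    """F1 between the frames flagged by a model and the accepted frame range."""
    predicted, correct = set(predicted_frames), set(correct_frames)
    true_positives = len(predicted & correct)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(correct)
    return 2 * precision * recall / (precision + recall)


def macro_f1(collisions):
    """Average the per-collision F1 over (predicted, correct) frame-set pairs."""
    scores = [f1_for_collision(pred, gold) for pred, gold in collisions]
    return sum(scores) / len(scores)
```

For instance, flagging frames 30 and 35 against a correct range of 30 to 32 gives precision 1/2 and recall 1/3 for that collision.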
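  42. Summary and discussion
    • Built a mechanism that captures the timing of collisions and disappearances from the motion of objects in images
    Experiment 1: change-point extraction model
    • The annotation-based results, which use the ground-truth data, are the most accurate
    • Results detected with YOLACT are more accurate than those with YOLO v3
    • node2vec is more accurate than graph2vec
    • When image features and physical information are handled together, the image features act as noise
    Experiment 2: GPT
    • GPT-4o-FT gives the highest results
      • Because it was fine-tuned to recognize CLEVRER
    • The higher the original inference ability, the higher the macro-F1 score
    • However, the GPT family also understands the environment through image features, so it is hard to say that it perceives the environment as humans do
    • The change-point extraction model understood the object types well, but that does not mean it can capture almost all collisions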
  43. ch6: Predictive Inference Model of the Physical Environment
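  44. Background and objective
    Research so far
    • Focused on the motion of objects in images
    • Became able to capture the timing of events such as collisions and disappearances
    Human real-world perception and prediction
    • Humans predict what they see both visually and physically
    Issues
    • Vision and motion cannot yet both be predicted
    • Do LLMs themselves have the ability to understand physical properties?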
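  45. Overview
    [Figure: building the prediction model. Inputs: CLEVRER [Yi+, 19] images and the physical training data (graph-structure embedding vectors, velocity and acceleration of each object, positional relations between objects). Outputs: 1. a flag for the collision timing, 2. the predicted image.]
    Change-point prediction model
    • VTA [Kim+, 19] (Variational Temporal Abstraction)
    Image prediction models
    • PredNet [Lotter+, 16]
    • PredRNN [Wang+, 17]
    • PredRNN v2 [Wang+, 21]
    • PreCNet [Straka+, 23]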
  46. Base prediction models
    PredNet [Lotter+, 16]
    • Imitates the predictive-coding process in the cerebral cortex
    • Infers errors hierarchically
    • The same prediction model as used in Study 1
    PreCNet [Straka+, 23]
    • Improves on PredNet
    • Infers over the whole input information every time
    [Figure: PredNet and PreCNet modules. Labels: Input, Target, Prediction, Representation (conv LSTM), Error (subtract, +/- ReLU), conv, pool, upsample]
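The "subtract, +/- ReLU" error block labeled in this figure can be illustrated with a small sketch. In PredNet the error representation concatenates the rectified positive and negative differences between target and prediction; the list-based layout and the function names below are simplifications for illustration, not the implementation used in the thesis.

```python
def relu(value):
    """Rectified linear unit for a single number."""
    return value if value > 0 else 0.0


def error_unit(target, prediction):
    """Concatenate the rectified positive and negative prediction errors.

    `target` and `prediction` are equally sized lists of activations.
    """
    positive = [relu(a - p) for a, p in zip(target, prediction)]
    negative = [relu(p - a) for a, p in zip(target, prediction)]
    return positive + negative
```

Splitting the error into two non-negative halves keeps the sign information while letting later layers work with rectified activations.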
  47. Base prediction models
    PredRNN [Wang+, 2017]
    • A prediction model built by stacking ConvLSTMs in layers
    • H (the hidden state) is passed along both space and time
    PredRNN v2 [Wang+, 2022]
    • A new prediction model that improves on PredRNN
    • Adds more gates that take H as input
    [Figure: internal structure of PredRNN and PredRNN v2: the original ConvLSTM (input gate, forget gate, output gate, input modulation gate, standard temporal memory) and the ConvLSTM with an added spatiotemporal memory mechanism]
  48. PredNet-based change-point prediction model
    [Figure: two PredNet stacks, one for image data and one for physical training data, each with Input, Error, Representation, and Prediction units at time t. Each stack outputs a difference, $diff_{img}$ and $diff_{phy}$; the image stack also outputs the predicted image.]
    • $diff = diff_{img} + diff_{phy}$
    • Flag output: $m_t = 0$ if $diff < \alpha$, $m_t = 1$ if $diff > \alpha$
    • $\alpha$: threshold
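The flag rule in this figure can be sketched as a small function. The names `diff_img` and `diff_phy` follow the image and physical-data branches of the figure, and the default threshold of 5 is the value listed in the Experiment 1 settings. The slide does not say what happens when the difference equals the threshold; this sketch assumes the flag stays 0 in that case.

```python
def change_point_flag(diff_img, diff_phy, alpha=5.0):
    """Return 1 when the combined prediction difference exceeds the threshold.

    diff_img: difference from the image-prediction branch.
    diff_phy: difference from the physical-data branch.
    alpha: change-point threshold (equality is treated as "no change" here,
    which is an assumption).
    """
    diff = diff_img + diff_phy
    return 1 if diff > alpha else 0


flags = [change_point_flag(i, p) for i, p in [(1.0, 0.5), (4.0, 3.0), (2.5, 2.5)]]
```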
  49. PredRNN- and PredRNN v2-based change-point prediction model
    [Figure: two stacks of four ST-LSTM layers, one for image data and one for physical training data, at time t. The image stack also outputs the predicted image.]
    • $diff_t = diff_{t\_img} + diff_{t\_phy}$
    • $m_t = 0$ if $diff_t < \alpha$, $m_t = 1$ if $diff_t > \alpha$
  50. PreCNet-based change-point prediction model
    [Figure: two PreCNet stacks, one for image data and one for physical training data, each with Input, Error, Representation, and Prediction units, pool and upsample connections, at time t. The image stack also outputs the predicted image.]
    • $diff_t = diff_{t\_img} + diff_{t\_phy}$
    • $m_t = 0$ if $diff_t < \alpha$, $m_t = 1$ if $diff_t > \alpha$
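  51. Experiment 1: Collision prediction with the change-point prediction model
    • Can collisions between objects in the environment be predicted?
    Settings
    • Datasets: CLEVRER, Physical Training Dataset
    • Ground-truth collision: frame 30; collision as perceived by a human: frame 32
    • Correct range: frames 30 to 32
    • Target ranges: 6 patterns (i to vi) × 10 frames
    • F1 score, macro-F1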
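  52. Experiment 1: Settings
    Setting: PredRNN / PredRNN v2 based; PreCNet based
    • Training data: 600,000; 600,000
    • Test data: 80,000; 80,000
    • Epochs: 500,000; 500,000
    • Layers: 4; 4
    • Channels: 128; 3, 48, 96, 192
    • Kernel size: 5×5; -
    • Optimizer: Adam [Kingma+, 17]; Adam [Kingma+, 17]
    • Learning-rate decay: 0.001; 0.0001
    • α (threshold for change-point decision): 5; 5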
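  53. Experiment 1: Results: accuracy of the collision flag
    Whether the flag predicting a collision is correct. Values are for ranges i / ii / iii / iv / v / vi, followed by macro-F1.
    • PredNet based: 40.0 / 50.0 / 50.0 / 40.0 / 57.1 / 50.0; macro-F1 42.5
    • PredRNN based: 50.9 / 54.8 / 53.1 / 48.9 / 60.6 / 61.7; macro-F1 57.5
    • PredRNN v2 based: 51.4 / 57.5 / 54.6 / 50.6 / 62.7 / 64.2; macro-F1 59.2
    • PreCNet based: 62.1 / 64.2 / 59.2 / 60.8 / 68.9 / 69.8; macro-F1 65.8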
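  54. Experiment 2: Object collision prediction with GPT
    • Given the motion of the objects before a collision, can the subsequent collision be predicted?
    Evaluation settings
    • 100 collisions
    • The 10 frames from 15 frames before to 6 frames before the collision are given
    • Predict the frame number of the collision that occurs at frame 14 or later (correct answer: 20 to 22)
    Prompts
    • simple
      • "Predict at which frame the objects collide."
    • detail
      • #Object color, #object shape, #object material
      • #Situation: "Several objects are moving, and two of them collide."
      • #Question: "Predict how many frames later the objects collide."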
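  55. Experiment 2: Object collision prediction with GPT: results
    Model, prompt: macro-F1 / Accuracy (whether the objects are correct)
    • GPT-4o, simple: 55.2 / 1.95
    • GPT-4o, detail: 54.7 / 1.95
    • GPT-4o-FT, simple: 61.5 / 2.11
    • GPT-4o-FT, detail: 62.0 / 2.10
    • GPT-4o mini, simple: 55.8 / 1.68
    • GPT-4o mini, detail: 54.5 / 1.65
    • GPT-4, simple: 0 / 0
    • GPT-4, detail: 0 / 0
    • Change-point prediction model: 67.5 / 2.47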
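  56. Summary and discussion
    • Built a model that predicts the timing of collisions between objects in images from visual and physical changes
    Experiment 1: change-point prediction model
    • The accuracy of the base prediction model affects the accuracy of the change-point prediction model
    • PreCNet is the most accurate
      • Both in predicted-image accuracy and in macro-F1 score
    • F1 score: about 65 at best
      • The timing of collisions is not captured perfectly
      • There is room for improvement
    Experiment 2: GPT
    • The change-point prediction model was the most accurate
    • Within the GPT family, the fine-tuned model scored highest
    • Scores drop sharply when the task becomes prediction
    • Predicting pixels is not connected to predicting that a collision will occur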
  57. ch7: Verbal Representation of Object Collision Prediction Based on Physical CommonSense Knowledge
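  58. Background and objective
    Research so far
    • Captured the objects in images and the physical properties of each object
    • Built a model that predicts the timing of collisions between objects
    Issues
    • The model cannot put the collision into words and convey it to others
    • It cannot understand a scene in light of the features of an environment as complex as the real world
    • Verification of performance with LLMs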
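  59. Overview
    [Figure: pipeline. Input: CLEVRER [Yi+, 19] and the physical training data (graph embedding vectors, velocity, acceleration, positional relations between objects). The change-point prediction model predicts the graph structure representing physical properties and the predicted image. The language generation model then produces a sentence describing the predicted situation, and a text generation model that includes physical commonsense regenerates a more detailed sentence.]
    Base models
    • PredNet [Lotter+, 16]
    • PredRNN [Wang+, 17]
    • PredRNN v2 [Wang+, 21]
    • PreCNet [Straka+, 23]
    Generated sentence (describes the predicted situation)
    • "The red cylinder hits the green sphere." Object color ✔, shape ✔
    Physical commonsense conditions (example: condition 29), added to the generated sentence with object A = the red cylinder and object B = the green sphere
    • Object A is light and object B is heavy
    • Object A is fast and object B is slow
    • The floor is rough
    Regenerated sentence (a detailed sentence that includes commonsense)
    • "The red cylinder collides forcefully with the green sphere, and the green sphere is knocked far away."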
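  60. Experiment 1-1: Generating language for the predicted content
    Objective
    • To understand the real world in connection with language, generate sentences for the content predicted by the change-point prediction model
    Settings
    • Uses the decoder part of the Transformer
    Training settings
    • Paired data: 219,303 (9 sentences × 24,367 collisions)
    • Test data: 10,965
    • Batch size: 8
    • Hidden size: 512
    • Optimizer: Adam [Kingma+, 17]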
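  61. Experiment 1-1: Creating the templates
    • 9 templates
    • The two objects that collide are each described as "{gray, red, blue, green, brown, cyan, purple, yellow} {sphere, cylinder, cube}"
    Before the collision (5 frames before)
    • "A and B approach each other"
    • "A approaches B"
    • "B approaches A"
    Collision
    • "A and B collide"
    • "A is repulsed by B"
    • "B is repulsed by A"
    After the collision (5 frames after)
    • "A and B move apart"
    • "B moves away from A"
    • "A moves away from B"
    Template example for the colliding objects (blue sphere, gray sphere)
    • Before: "The blue sphere and the gray sphere approach each other" / "The blue sphere approaches the gray sphere" / "The gray sphere approaches the blue sphere"
    • Collision: "The blue sphere and the gray sphere collide" / "The blue sphere is repulsed by the gray sphere" / "The gray sphere is repulsed by the blue sphere"
    • After: "The blue sphere and the gray sphere move apart" / "The gray sphere moves away from the blue sphere" / "The blue sphere moves away from the gray sphere"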
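The nine sentence templates on this slide can be sketched as a small generator. The English wording of each template is a translation made for illustration (the original templates are Japanese), and the names `TEMPLATES` and `fill_templates` are hypothetical.

```python
from itertools import product

COLORS = ["gray", "red", "blue", "green", "brown", "cyan", "purple", "yellow"]
SHAPES = ["sphere", "cylinder", "cube"]

# Three templates per phase: before, at, and after the collision.
TEMPLATES = {
    "before": ["{a} and {b} approach each other", "{a} approaches {b}", "{b} approaches {a}"],
    "collision": ["{a} and {b} collide", "{a} is repulsed by {b}", "{b} is repulsed by {a}"],
    "after": ["{a} and {b} move apart", "{b} moves away from {a}", "{a} moves away from {b}"],
}


def object_names():
    """All color-shape descriptions, e.g. 'blue sphere'."""
    return [f"{color} {shape}" for color, shape in product(COLORS, SHAPES)]


def fill_templates(object_a, object_b):
    """Return the nine template sentences for one colliding pair."""
    return [
        template.format(a=object_a, b=object_b)
        for phase in ("before", "collision", "after")
        for template in TEMPLATES[phase]
    ]


sentences = fill_templates("blue sphere", "gray sphere")
```

Generating nine sentences for each of the 24,367 collisions reproduces the 219,303 training pairs reported in the training settings.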
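  62. Experiment 1-1: Language generation model
    [Figure: training the Transformer decoder. Paired data of graph embeddings and text (train: 219,303 pairs; test: 10,965) are used. The graph embedding passes through a Linear layer into the Decoder; the input tokens <bos> w1 w2 … wt produce, through Softmax, the output tokens w1 w2 … wt <eos>. At inference, the trained decoder takes the predicted graph embedding and outputs a generated sentence describing the predicted content.]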
  63. Experiment 1-1: Results: generation example 1 (range i)
    Reference sentences
    • "Green sphere and red cylinder collide."
    • "Green sphere is repulsed by red cylinder."
    • "Red cylinder is repulsed by green sphere."
    Generated sentences (color / shape correct)
    • PredNet based: "Green cylinder is repulsed by red cylinder." (color ✔, shape ✘)
    • PredRNN based: "Green cylinder and red cylinder collide." (color ✔, shape ✘)
    • PredRNN v2 based: "Red cylinder is repulsed by green sphere." (color ✔, shape ✔)
    • PreCNet based: "Red cylinder is repulsed by green sphere." (color ✔, shape ✔)
  64. Experiment 1-1: Results: generation example 2 (range vi)
    Reference sentences
    • "Cyan cube and cyan cylinder collide."
    • "Cyan cube is repulsed by cyan cylinder."
    • "Cyan cylinder is repulsed by cyan cube."
    Generated sentences (color / shape correct)
    • PredNet based: "Cyan cube is repulsed by blue sphere." (color ✘, shape ✘)
    • PredRNN based: "Cyan cube is repulsed by blue sphere." (color ✘, shape ✘)
    • PredRNN v2 based: "Cyan cube is repulsed by cyan sphere." (color ✔, shape ✘)
    • PreCNet based: "Cyan cube is repulsed by cyan cylinder." (color ✔, shape ✔)
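  65. Experiment 1-1: Results: accuracy comparison with evaluation metrics
    Values are BLEU@2 / BLEU@3 / BLEU@4 / METEOR / CIDEr (higher is better).
    • PredNet based, English: 80.3 / 63.0 / 56.3 / 68.8 / 72.9
    • PredNet based, Japanese: 79.7 / 74.5 / 68.8 / 70.2 / 72.4
    • PredRNN based, English: 84.3 / 66.8 / 59.1 / 72.6 / 74.6
    • PredRNN based, Japanese: 82.5 / 76.1 / 73.4 / 73.5 / 75.1
    • PredRNN v2 based, English: 86.2 / 72.4 / 62.7 / 75.9 / 78.3
    • PredRNN v2 based, Japanese: 85.9 / 78.9 / 75.7 / 77.6 / 78.2
    • PreCNet based, English: 90.6 / 77.1 / 67.9 / 78.1 / 80.3
    • PreCNet based, Japanese: 88.3 / 80.6 / 79.2 / 80.4 / 81.2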
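  66. Experiment 1-2: Language generation for object collision prediction with GPT
    • Use GPT to describe in language a scene in which two objects are predicted to collide
    Evaluation settings
    • 100 collisions
    • Given the images, the model goes directly from collision recognition to language generation
    Differences
    • Proposed model: collision prediction and language generation are separate models
    • LLM: everything happens in the same inference process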
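  67. Experiment 1-2: Language generation for object collision prediction with GPT
    Model, prompt: generated example; Accuracy (whether the objects are correct)
    • GPT-4o, simple: "The cyan sphere and the red cylinder collide."; 1.94
    • GPT-4o, detail: "The cyan sphere hits the red cylinder"; 1.95
    • GPT-4o-FT, simple: "The cyan sphere collides with the brown sphere from the left."; 2.10
    • GPT-4o-FT, detail: "The cyan sphere and the brown sphere collide."; 2.06
    • GPT-4o mini, simple: "The green sphere rolls toward the red cylinder and collides with it."; 1.63
    • GPT-4o mini, detail: "The green sphere rolls toward the red cylinder and collides with it."; 1.69
    • GPT-4, simple: -; 0
    • GPT-4, detail: -; 0
    • Change-point prediction model: "The brown sphere collides with the red cylinder."; 2.44
    Reference sentences
    • "The red cylinder and the brown sphere collide"
    • "The red cylinder is repulsed by the brown sphere"
    • "The brown sphere is repulsed by the red cylinder"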
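  68. Experiment 1-2: Language generation for object collision prediction with GPT
    Values are BLEU@2 / BLEU@3 / BLEU@4 / METEOR / CIDEr.
    • GPT-4o, simple: 39.3 / 32.6 / 26.4 / 30.2 / 36.7
    • GPT-4o, detail: 44.5 / 36.9 / 28.5 / 30.1 / 38.6
    • GPT-4o-FT, simple: 27.2 / 20.4 / 14.3 / 29.5 / 30.8
    • GPT-4o-FT, detail: 38.7 / 29.2 / 27.1 / 31.4 / 31.3
    • GPT-4o mini, simple: 36.1 / 28.7 / 21.3 / 30.1 / 33.3
    • GPT-4o mini, detail: 36.2 / 28.7 / 22.7 / 31.1 / 32.6
    • GPT-4, simple: - / - / - / - / -
    • GPT-4, detail: - / - / - / - / -
    • Change-point prediction model: 88.3 / 80.6 / 79.2 / 80.4 / 81.2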
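  69. Experiment 1: Summary and discussion
    • Generated descriptions of the collision situation from the predicted embedding of physical properties through a language generation model
    Experiment 1-1: change-point prediction model
    • Effect of overlapping objects on accuracy
      • Failures in object recognition carry over into language generation
    • Differences between the base prediction models
      • The accuracy of the prediction model also affects the accuracy of language generation
    • Because the model is trained on templates, the range of expression is narrow
    Experiment 1-2: GPT
    • In terms of generating text, every model produced understandable sentences
    • However, the object types were wrong, which lowered the evaluation scores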
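  70. Experiment 2-1: Language generation with added physical commonsense about the environment
    List of physical commonsense conditions (states of the environment and the objects)
    1. The floor is smooth; object A is heavy and object B is light
    2. The floor is rough; object A is heavy and object B is light
    …
    45. Object A is light and object B is heavy
    One condition is selected from commonsense 1 to 45. Example, commonsense 19:
    • The floor is smooth
    • Object A is heavy and object B is light
    • Object A is slow and object B is fast
    [Figure: the selected condition and the generated sentence are fed to T5, which is trained on sentences collected by crowdsourcing.]
    Output example: "The green sphere collides forcefully with the red cylinder, and the red cylinder is knocked far away."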
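  71. Experiment 2-1: Physical commonsense about the environment
    • For a scene in which a collision between two objects is predicted, generate a more detailed verbal description of the collision when physical commonsense about the environment and the objects is given
    Object state (mass): 4 patterns
    1. Objects A and B have equal mass
    2. Object A has large mass and object B has small mass
    3. Object A has small mass and object B has large mass
    4. None
    Object state (speed): 4 patterns
    1. Objects A and B have equal speed
    2. Object A is fast and object B is slow
    3. Object A is slow and object B is fast
    4. None
    Excluding the case where both are "none" leaves 15 patterns.
    Environment
    1. The floor is smooth
    2. The floor is rough
    3. None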
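The pattern counts on this slide can be checked with a short enumeration. The labels are English paraphrases, and the assumption that the 45 commonsense conditions mentioned elsewhere in the deck are the 15 object-state patterns crossed with the 3 environment settings is an inference from the numbers, not something the slides state outright.

```python
from itertools import product

MASS = ["A = B", "A > B", "A < B", None]    # None means "not specified"
SPEED = ["A = B", "A > B", "A < B", None]
FLOOR = ["smooth", "rough", None]

# 4 x 4 mass/speed combinations, minus the one where neither is specified.
object_states = [
    (mass, speed)
    for mass, speed in product(MASS, SPEED)
    if not (mass is None and speed is None)
]

# Crossing with the floor setting (assumed to yield the 45 listed conditions).
conditions = [
    {"floor": floor, "mass": mass, "speed": speed}
    for floor, (mass, speed) in product(FLOOR, object_states)
]
```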
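  72. Experiment 2-1: T5 model
    • T5 (Text-To-Text Transfer Transformer)
      • A Transformer-based model
      • Used for many tasks such as translation, question answering, classification, and summarization
      • For every task, it takes text as input and produces text as output
    • Pretrained T5 models used
      • sonoisa/t5-base-japanese
      • megagonlabs/t5-base-japanese-web
      • nlp-waseda/comet-t5-base-Japanese
    • Training
      • Input: the sentence about the collision, and the condition
      • Output: one sentence chosen at random from five sentences collected by crowdsourcing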
  73. Experiment 2-1: T5 training settings
    • Data (6:1:1): 2,000
    • Learning rate: $5 \times 10^{-4}$
    • Batch size: 32
    • Epochs: 100
    • Optimization: AdamW [Loshchilov+, 17]
    • Loss function: cross entropy
  74. Experiment 2-1: Results with T5
    Model: epoch; BLEU / ROUGE-2 / ROUGE-L (higher is better)
    • sonoisa/t5-base-japanese: 81; 95.2 / 64.2 / 74.6
    • megagonlabs/t5-base-japanese-web: 93; 81.6 / 56.6 / 67.7
    • nlp-waseda/comet-t5-base-japanese: 98; 80.9 / 56.2 / 67.4
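  75. Experiment 2-1: Results: generation example 1
    Sentence generated by the change-point prediction model: "The red cylinder is repulsed by the green sphere". Object A: the red cylinder. Object B: the green sphere.
    Each entry gives floor; mass; speed, then the sentence generated with physical commonsense and a human-written reference.
    • Smooth; A = B; A = B
      • Generated: "The red cylinder and the green sphere collide, and both are knocked away in opposite directions."
      • Human reference: "The red cylinder hits the green sphere, which has the same speed, and the green sphere bounces far away."
    • Smooth; -; A < B
      • Generated: "The green sphere collides forcefully with the red cylinder, and the red cylinder is knocked far away."
      • Human reference: "The red cylinder and the green sphere collide; the red cylinder is knocked back a little and the green sphere is knocked back a little."
    • Rough; A < B; A > B
      • Generated: "The red cylinder collides forcefully with the green sphere, and the green sphere is knocked far away."
      • Human reference: "The red cylinder hits the green sphere forcefully, and the red cylinder bounces back only slightly."
    • -; A > B; A < B
      • Generated: "The green sphere collides forcefully with the red cylinder, and the green sphere is knocked away."
      • Human reference: "The green sphere hits the red cylinder, and the green sphere is knocked back."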
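  76. Experiment 2-1: Results: generation example 2
    Sentence generated by the change-point prediction model: "The cyan cube is repulsed by the cyan cylinder". Object A: the cyan cube. Object B: the cyan cylinder.
    Each entry gives floor; mass; speed, then the sentence generated with physical commonsense and a human-written reference.
    • -; A = B; -
      • Generated: "The cyan cube and the cyan cylinder collide, and both are knocked away in opposite directions."
      • Human reference: "The cyan cylinder collides with the cyan cube, and the cyan cube is knocked away."
    • Smooth; A > B; A < B
      • Generated: "The cyan cube collides forcefully with the cyan cylinder, and the cyan cube is knocked far away."
      • Human reference: "The cyan cube is hit hard by the cyan cylinder, and the cyan cube is knocked back only a little."
    • Rough; -; A = B
      • Generated: "The cyan cube and the cyan cylinder collide, and both are knocked away in opposite directions."
      • Human reference: "The cyan cylinder and the cyan cube collide, and both bounce back by about the same amount."
    • -; A = B; A > B
      • Generated: "The cyan cube collides forcefully with the cyan cylinder, and the cyan cylinder is knocked far away."
      • Human reference: "The cyan cube is sent off slowly and the cyan cylinder quickly, in opposite directions."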
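  77. Experiment 2-1: Results: accuracy comparison with evaluation metrics
    Values are BLEU@4 / BERTScore / BLEURT / ROUGE / entailment (full text) / entailment (split) / G-EVAL-4o.
    • PredNet based: 37.3 / 68.2 / 30.3 / 34.4 / 43.2 / 64.5 / 80.3
    • PredRNN based: 43.6 / 74.7 / 36.1 / 41.8 / 48.7 / 69.7 / 83.1
    • PredRNN v2 based: 46.5 / 79.5 / 45.6 / 49.6 / 56.1 / 75.3 / 88.5
    • PreCNet based: 55.8 / 82.2 / 49.7 / 56.4 / 67.9 / 80.2 / 92.4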
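  78. Experiment 2-2: Explaining collision situations with physical commonsense using GPT
    Physical commonsense given
    • The floor is rough
    • The brown sphere is lighter than the red cylinder
    • The brown sphere is faster than the red cylinder
    Reference examples
    • "The brown sphere and the red cylinder are sent far in opposite directions."
    • "They hit each other; the red cylinder shifts slightly and the brown ball rolls away a little."
    Generated examples
    • GPT-4o: "The cyan sphere quickly collides with the red cylinder; the cyan sphere is repulsed strongly on the rough floor, and the red cylinder moves slightly."
    • GPT-4o-FT: "The cyan sphere hits the brown sphere, and the cyan sphere is repulsed by the brown sphere."
    • GPT-4o mini: "The green sphere rolls toward the red cylinder while sliding on the rough floor; at the moment of collision the speed of the sphere drops sharply, and the cylinder slows slightly because of its weight, but since the speed of the sphere dominates, the sphere keeps rolling as if pushing the cylinder, and the cylinder either rolls forward a little or moves on while rotating slightly."
    • GPT-4: -
    • Change-point prediction model: "The brown sphere collides forcefully with the red cylinder, and the brown sphere is knocked far away."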
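  79. Experiment 2-2: Explaining collision situations with physical commonsense using GPT
    Values are BLEU@4 / BERTScore / BLEURT / ROUGE / entailment (full text) / entailment (split) / G-EVAL (higher is better).
    • GPT-4o: 31.3 / 54.5 / 32.3 / 32.3 / 48.5 / 55.6 / 65.4
    • GPT-4o-FT: 25.6 / 59.3 / 27.8 / 28.7 / 52.3 / 61.4 / 66.7
    • GPT-4o mini: 16.7 / 49.2 / 19.6 / 13.9 / 48.9 / 63.8 / 63.3
    • GPT-4: - / - / - / - / - / - / -
    • Change-point prediction model: 55.8 / 82.2 / 49.7 / 56.4 / 67.9 / 80.2 / 92.4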
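  80. Experiment 2: Summary and discussion
    • For a scene in which a collision between two objects is predicted, generate a more detailed verbal description of the collision when physical commonsense about the environment and the objects is given
    Experiment 2-1: language generation examples
    • Because the model learns fixed patterns (such as "very", "forcefully", "far away", "slightly"), the range of expression is narrow
    • Human-written sentences use a wider variety of degree expressions
    Accuracy comparison with evaluation metrics
    • Evaluation by word overlap: low
    • Evaluation by entailment of content: high
    Experiment 2-2: experiment with GPT
    • Good at generating text
    • Evaluation score: about 65.5
    • The failure to recognize and predict objects well appears as the score gap (about 20)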
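  81. Summary of the proposed research
    1. Prediction over an arbitrary time span
      • Combined PredNet and TD-VAE into a model that has the hierarchical structure of the human brain and can predict over an arbitrary time span
    2. Understanding the motion of objects in images
      • Proposed a mechanism that captures the motion of objects in the environment of an image and, from changes in their physical properties, captures the timing at which objects collide
    3. Predicting the timing of collisions between objects in images
      • Built a model that predicts the timing at which objects in the environment of an image collide from changes in object motion and visual information
    4. Language generation about collision situations
      • Generated in language the predicted collision situation of objects in the environment, connecting the real world and language
      • When environmental conditions are added, a language model that can explain collisions in light of the properties of the environment makes it possible to explain the collision situation in more detail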
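  82. Contributions
    • A new working model of the human brain
      • Combined a prediction model that represents the hierarchical structure of the human brain with a deep generative model that can predict over an arbitrary time span
      • Became able to predict slightly ahead (about 1 second)
    • A model that predicts the motion of objects from visually perceived information, as humans do
      • Rather than relying on predetermined numerical values such as image features or a physics simulator, it captures the motion of objects in the given images (corresponding to human visual information) and predicts future collisions
    • A mechanism that explains the collision situation in more detail by taking environmental information into account
      • The predicted collision can now be shared with others through language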
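  83. Future work
    • Possibility of long-term prediction
      • Study 1: about 1 second
      • Human prediction: the next day, plans, and so on
    • Use of datasets closer to the real world
      • CLEVRER: simple data
      • Data from in-vehicle cameras, data of people moving around a room
    • Understanding of physical properties by large language models themselves
      • Representing real-world understanding with an LLM built into the system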