Slide 1

Slide 1 text

Real-world Understanding based on Predictions with Physical Properties
Eri Kuroda, 3rd-year doctoral student, Ichiro Kobayashi Laboratory, Ochanomizu University
Doctoral dissertation, 4th review meeting and public hearing, 2025.2.10

Slide 2

Slide 2 text

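Thesis structure
• Research background
  • ch1: Introduction
• Related work for this research
  • ch2: Neural networks
  • ch3: Real-world understanding and language models
• Proposed research
  • ch4: Building a deep generative learning model that imitates predictive coding in the human brain
  • ch5: Extracting change points in physical environments
  • ch6: A predictive inference model of the physical environment in the real world
  • ch7: Verbal representation of predictions at object collision, based on physical commonsense
• Conclusion
  • ch8: Conclusion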

Slide 4

Slide 4 text

ch1: Introduction

Slide 5

Slide 5 text

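Real-world understanding and prediction by humans
How humans perceive the real world
• Understanding what an object is
• Understanding how an object moves
• Predicting its motion
Examples
• Crossing a road
• Driving a car
• Daily life involves constant prediction (and estimation)
Prediction by humans
• Instantaneous prediction (short term)
• Causal prediction (long term)
Humans constantly make predictions that combine the two.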

Slide 6

Slide 6 text

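Real-world understanding and prediction by machines
Research focused on humans
• Understanding from the AI and machine learning side: neural networks, BMI (brain-machine interface)
• Medical understanding: fMRI
• Understanding of the inner mind: psychology
A world where machines and humans coexist
• Mutual understanding
• Things humans use as the main actor / things that support human life
Expressing human-likeness in machines
• LLM development has been remarkable in recent years
• The focus is shifting from systems that can generate language to human-likeness and human thinking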

Slide 7

Slide 7 text

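Background: toward human-like inference and prediction by machines
Importance of research on real-world understanding and prediction with machine learning
• Functions that support human life, such as autonomous driving and nursing-care robots
• World models [Ha+, 18]
  • Based on the idea that the human brain models an environment from observed stimuli and simulates it
• One of the three challenges AI research should tackle next [LeCun, 23]
  • Raises the question "How can machines learn to represent the world, make predictions, and act, from observation?"
• Machines still cannot judge the meaning of a pedestrian's subtle movements
Prediction research
• Time-series prediction
• Causal inference prediction
• Multimodal prediction
• A mechanism that can understand, predict, and explain an observed environment the way a human does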

Slide 8

Slide 8 text

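Purpose: toward human-like inference and prediction by machines
Real-world understanding and prediction by humans
• Predict the next motion of an object and decide on an action
• Predict slightly ahead and make a plan
• Learn background knowledge from interaction and observation
• The important points of an event are what matter
• The real world is tied to language
Prediction by computers
• Sequential prediction (at fixed, short intervals)
• Predicts images (corresponding to vision)
• Prediction of image features = prediction of the real world
• Prediction based on the physical laws of objects is difficult
• "Understanding and predicting the physical properties" of objects is not tied to language
Purpose
• Build a deep generative learning model that imitates the hierarchical structure of the human brain and can predict over various time widths
• A mechanism that extracts change points, where motion changes greatly, from changes in the physical properties of objects in an image
• Propose a change-point prediction model that predicts change points (collisions)
• Express the predicted content in language in order to tie the real world to language
• When physical commonsense is given, express the prediction in more detailed sentences that take the properties of the environment into account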

Slide 9

Slide 9 text

Proposed research: ch4-ch7

Slide 10

Slide 10 text

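Proposed research
• chapter 4: Building a deep generative learning model that imitates predictive coding in the human brain
• chapter 5: Extracting motion change points with a focus on objects in images
• chapter 6: A predictive inference model of the physical environment
• chapter 7: Verbal description focused on the physical properties of real-world environments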

Slide 11

Slide 11 text

ch4: A Deep Generative Model Imitating Predictive Coding in Human Brain

Slide 12

Slide 12 text

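Background and purpose
Goal: express time perception in the human brain with a machine learning model
Representative prior work on prediction
• PredNet [Lotter+, 2016]
  • Imitates the hierarchical structure of the human brain
  • Cannot predict the future over various time widths the way humans do
• TD-VAE [Gregor+, 2018]
  • Can predict the future at an arbitrary time width
  • Does not reflect the information-processing mechanism of the human brain
Proposal
1. Build a model closer to the information-processing mechanism of the human brain
2. Investigate its correlation with actual human brain activity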

Slide 13

Slide 13 text

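Overview
[Figure: natural video is fed to a prediction model that combines the hierarchical structure of the brain with prediction over flexible time widths; the model outputs predicted images and is compared with brain activity information (fMRI data). Aim: make it possible to predict future events over flexible time widths.]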

Slide 14

Slide 14 text

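PredNet [Lotter+, 16]
• Uses deep learning to predict the next image from a video
• Imitates predictive-coding processing in the cerebral cortex
• Represents the information-processing mechanism of the brain
What is predictive coding?
• Compute the error between the predicted value and the observed value
• Propagate the error bottom-up
• Output the predicted value that minimizes the error
• Propagate the predicted value top-down
[Figure: real images and predicted images along time t]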
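The error computation in the predictive-coding steps can be sketched as follows. This is a minimal illustration, assuming the split rectified error used by PredNet (the positive and negative parts of the difference between observation and prediction); the function names and toy values are illustrative and do not come from the slides.

```python
def relu(v):
    """Rectified linear unit for a single value."""
    return v if v > 0 else 0.0


def error_units(observed, predicted):
    """Return the positive and negative prediction errors, concatenated.

    observed, predicted: equal-length sequences of activations
    (hypothetical stand-ins for one layer's target and prediction).
    """
    positive = [relu(o - p) for o, p in zip(observed, predicted)]
    negative = [relu(p - o) for o, p in zip(observed, predicted)]
    return positive + negative


errors = error_units([1.0, 0.5], [0.5, 1.0])
# This error vector is what gets passed bottom-up; training pushes its mean toward 0.
mean_error = sum(errors) / len(errors)
```

Keeping the two signs separate means the upper layer can tell over-prediction from under-prediction, which a plain absolute difference would hide.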

Slide 15

Slide 15 text

PredNet [Lotter+, 16]
[Figure: structure of a module]
Problems
1. The prediction width is fixed
2. The prediction width is small (it cannot be synchronized with fMRI data)

Slide 16

Slide 16 text

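TD-VAE [Gregor+, 18]
• Temporal Difference Variational Auto-Encoder
• Predicts the image an arbitrary number of steps ahead from a video
• Introduces a belief state
  • Corresponds to the belief state of a POMDP (partially observable Markov decision process)
• Prediction over flexible time widths
[Figure: predicted image 1 and predicted image 2 along time t]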

Slide 17

Slide 17 text

TD-VAE [Gregor+, 18]
[Figure: TD-VAE architecture; labels: Input, Inference, Prediction, Decoder]

Slide 18

Slide 18 text

TD-VAE [Gregor+, 18]
[Figure: TD-VAE architecture with the belief state highlighted; labels: Input, Inference, Prediction, Decoder, belief state]
Problems
1. The information-processing mechanism of the human brain is not reflected
2. It cannot be correlated with fMRI data

Slide 19

Slide 19 text

Proposed model [Kuroda+, 21]
[Figure: architecture of the proposed model; labels: Observation (input), Error representation, Representation layer, Belief state, 0th module, 1st module]
Legend: observation; transition of belief; inference from the future; propagation of differences; update of the prediction model; propagation of predictions; prediction of the future; generation of predictions

Slide 20

Slide 20 text

Experiment 1: Prediction over a long time width
Purpose
• Verify whether natural video can be predicted when the prediction interval is set to 1 second
Settings
• Dataset: The KITTI dataset
• Predict from time t=6 onward

Training settings
Train data (= 9:1)                28,297
Test data                         426
# Layers                          4
Size of convolutional filter      3×3 (for all convolutions)
# Channels (from lower module)    3, 48, 96, 192
Optimization algorithm            Adam [Kingma+, 15]

Slide 21

Slide 21 text

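Experiment 1: Dataset
• Vision meets Robotics: The KITTI Dataset
• Created jointly by Karlsruhe Institute of Technology and Toyota Technological Institute at Chicago
• The world's largest benchmark suite for in-vehicle vision

Settings
Number of categories    6
fps                     10 (frames / second)
Annotations             15 types
Image size              1242×375
Train data              28,297
Test data               426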

Slide 22

Slide 22 text

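Experiment 1: Results - examples of predicted images
[Figure: rows of real images, PredNet predictions, and proposed-model predictions along time t; the proposed model generates predictions 1 second ahead]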

Slide 23

Slide 23 text

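Experiment 1: Results - accuracy
• Agreement of predicted images measured with the image quality metric SSIM (structural similarity)
• $\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
• x: original image, y: predicted image, μ: mean, σ²: variance (σ_xy: covariance), C_1 and C_2: small stabilizing constants

Model             SSIM↑
Proposed model    0.85
PredNet           0.65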
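A small sketch of the SSIM formula, assuming statistics computed once over the whole image. Standard SSIM implementations average the score over local windows, so this simplified global version will not reproduce the 0.85 and 0.65 reported on the slide. The constants follow the common convention C1 = (0.01·L)² and C2 = (0.03·L)² with L the pixel range, which is an assumption here.

```python
from statistics import fmean


def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics; x and y are flat pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [0.0, 64.0, 128.0, 255.0]
same_score = ssim_global(original, original)           # identical images
flipped_score = ssim_global(original, original[::-1])  # reversed pixel order
```

Identical images score 1, and the score drops as the mean, contrast, or structure of the prediction drifts from the original.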

Slide 24

Slide 24 text

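Experiment 2: Investigating the correlation with the human brain
Purpose
• Check whether inference by the proposed model correlates with the actual human brain
Settings
• Dataset: natural video dataset [Nishimoto+, 11]
• Compare the feature representations of the model with feature representations estimated from the human brain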

Slide 25

Slide 25 text

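Experiment 2: Overview
[Figure: natural video is shown to a person and fed to the prediction model. Brain activity information and the model's feature representations form paired data; a regression model (ridge regression) estimates feature representations from brain activity, and the correlation coefficient between the estimated and the model's feature representations is computed. [Kay+, 08][Nishimoto+, 11][Schoenmakers+, 13]]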
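The regression-and-correlation step of Experiment 2 can be sketched as below. This is a toy example on synthetic data, assuming NumPy is available and a closed-form ridge solution; array sizes, variable names, and the regularization value are illustrative and are not the settings used in the thesis.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def mean_feature_correlation(Y_true, Y_pred):
    """Average Pearson correlation over the feature dimensions."""
    rs = [np.corrcoef(Y_true[:, j], Y_pred[:, j])[0, 1]
          for j in range(Y_true.shape[1])]
    return float(np.mean(rs))


rng = np.random.default_rng(0)
brain = rng.normal(size=(200, 20))        # hypothetical: 200 time points x 20 voxels
true_map = rng.normal(size=(20, 5))
# Synthetic stand-in for the model's feature representation (5 dimensions).
features = brain @ true_map + 0.1 * rng.normal(size=(200, 5))

W = ridge_fit(brain[:150], features[:150], alpha=0.5)      # fit on the first 150 samples
r = mean_feature_correlation(features[150:], brain[150:] @ W)  # evaluate on the rest
```

Evaluating the correlation on held-out samples keeps the ridge fit from inflating the score.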

Slide 26

Slide 26 text

Experiment 2: Schematic
[Figure: ridge regression for PredNet and for the proposed model. A 10-minute movie is cut at 10 frames/sec into stimulus images that are fed to the model (0th-3rd modules; input layer, representation layer, error representation, belief state). Ridge regression relates the brain activity recorded while watching the movie to the representations R0-R3.]

Slide 27

Slide 27 text

Experiment 2: Results (correlation coefficient ↑)

       PredNet                     TD-VAE                      Proposed model
α      0.5     1,000   25,000      0.5     1,000   25,000      0.5     1,000   25,000
R0     0.2623  0.2971  0.3207      0.2636  0.2983  0.3285      0.2637  0.2983  0.3291
R2     0.0925  0.1459  0.1955      -       -       -           0.0003  0.0012  0.0016
R3     0.0254  0.1217  0.1871      -       -       -           0.0004  0.0009  0.0012

R0: lowest layer. α: model parameter. R1: could not be estimated because of resource constraints.

Slide 28

Slide 28 text

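Summary and discussion
• Goal: express time perception in the human brain with a machine learning model
• Built a prediction model that imitates the hierarchical structure of the human brain and can predict over flexible time widths
Experiment 1
• Generated images with a longer prediction interval than PredNet
Experiment 2
• In the lowest layer (R0), a correlation with the human brain was observed
  • The model is close to the internal representation of the human brain → it could serve as a working model of the human brain
• No correlation was observed in the upper layers
  • Because inference is performed with R0 only
• The brain activity data cannot be said to reflect an understanding of the environment in the image
  • The finding is limited to a correlation between the signal of what the person saw and the internal representation of the model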

Slide 29

Slide 29 text

ch5: Extraction of Motion Change Points based on the Physical Characteristics of Objects

Slide 30

Slide 30 text

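Background and purpose
Prediction so far
• Input: images
• Limited to predicting image features
• Human-like prediction
  • Humans do not predict every detail of the input
  • They pick out the important parts and predict those
Issues
• The situation in the image is not understood
• The model cannot pick out the important parts and predict from that information
• Comparison of performance with LLMs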

Slide 31

Slide 31 text

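What are physical properties?
Understanding of an environment by humans
• Recognize what the objects in the environment (e.g. images, video) are
• Understand the motion of the recognized objects (e.g. velocity, relations to other objects)
Understanding of the environment in Study 1
• It only predicted an image (a depiction of the situation) likely to occur in the future
• It did not understand what is in the environment (understanding of objects) or how the objects move (understanding of physical properties)
• Goal: develop a mechanism that understands the objects and motion in an environment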

Slide 32

Slide 32 text

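Background and purpose
• Variational Temporal Abstraction (VTA) [Kim+, 2019]
  • Obtains the latent structure of the environment from visual information
  • Extracts the timing at which the environment changes
  • Focuses on pixel changes
  • Does not consider the physical properties of objects (velocity, etc.)
Proposal
1. Represent the relations between objects as a graph
2. Extract the change points of the environment from changes in the graph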

Slide 33

Slide 33 text

Overview
[Figure: the conventional method applies VTA to a 3D maze using image features only and does not understand the real world. The proposed method applies object detection to CLEVRER [Yi+, 19] and feeds the graph structure, velocity and acceleration, image features, etc. to VTA, which extracts change-point flags. VTA diagram labels: observation (input), observation abstraction, boundary indicator, temporal abstraction.]

Slide 34

Slide 34 text

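Temporal Abstraction
• Integrates the actions of each level of a hierarchy along the time direction
[Figure: example hierarchy for cooking, with high, middle, and low levels. Labels: choosing a recipe, shopping, cooking, making a shopping list, going to the supermarket, preparing a pot, cutting the ingredients, stirring the pot, preparing a notepad, holding a pen, body parts used (arms/hands, legs); consecutive steps marked "same state" are grouped.]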

Slide 36

Slide 36 text

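Variational Temporal Abstraction [Kim+, 19]
[Figure: two sequences, walking along the blue path and walking along the red path; for each, all events are shown together with the important points (change points)]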

Slide 37

Slide 37 text

Variational Temporal Abstraction [Kim+, 19]
[Figure: VTA graphical model; labels: observation (input), observation abstraction, boundary indicator, temporal abstraction]

Slide 38

Slide 38 text

Variational Temporal Abstraction [Kim+, 19]
Introduction of the flag m
[Figure: VTA graphical model with the boundary flag sequence 0 1 0 0; labels: observation (input), observation abstraction, boundary indicator, temporal abstraction]

Slide 39

Slide 39 text

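Dataset
• CLEVRER [Yi+, 2020]
• CoLlision Events for Video REpresentation and Reasoning

Overview
Videos          20,000 (train:val:test = 2:1:1)
Video length    5 seconds
Frames          128
Shapes          cube, sphere, cylinder
Materials       metal, rubber
Colors          gray, red, blue, green, brown, cyan, purple, yellow
Events          appearance, disappearance, collision
Annotations     object id, position, velocity, acceleration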

Slide 40

Slide 40 text

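Physical Training Dataset
• A dataset created from the physical properties of the environment
[Figure: object recognition and object position information yield a graph structure, velocity, acceleration, and flags for the positional direction between objects; these are concatenated into an embedding vector]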

Slide 41

Slide 41 text

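Physical Training Dataset - object recognition and position information
Object recognition
• YOLO v3 [Redmon+, 18]
  • {shape} of the object
• YOLACT [Bolya+, 2019]
  • {shape, color} / {shape, color, material} of the object
Position information
• Compute the center coordinates of the object from the coordinates of the detected bounding box, with corners $(x_1, y_1)$ and $(x_2, y_2)$:
  $c = (x, y) = \left(\dfrac{x_1 + x_2}{2}, \dfrac{y_1 + y_2}{2}\right)$
[Figure: real image and detections by YOLO v3, YOLACT {object, color}, and YOLACT {object, color, material}]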
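The center computation is a one-liner. The sketch below assumes the bounding box is given by two opposite corners, as on the slide; the function name is illustrative.

```python
def bbox_center(x1, y1, x2, y2):
    """Center (x, y) of a bounding box given two opposite corners."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)


center = bbox_center(10, 20, 30, 60)
```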

Slide 42

Slide 42 text

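Physical Training Dataset - velocity, acceleration, and position between objects
Velocity
• $v_{x_t} = (x_t - x_{t-1}) / et_{frame}$
• $v_{y_t} = (y_t - y_{t-1}) / et_{frame}$
Acceleration
• $a_{x_t} = (v_{x_t} - v_{x_0}) / (et_{frame} \times t)$
• $a_{y_t} = (v_{y_t} - v_{y_0}) / (et_{frame} \times t)$
Elapsed time between frames: $et_{frame} = 5/128$
Positional relation between objects
• main object = $(x_{main}, y_{main})$
• others = $(x_{other}, y_{other})$
• $x_{diff} = x_{other} - x_{main}$
• $y_{diff} = y_{other} - y_{main}$
[Figure: x-y axes centered on the main object; the signs (+/-) of $x_{diff}$ and $y_{diff}$ give the quadrant (1st-4th) in which the other object lies]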
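A sketch of these per-frame quantities, under stated assumptions. The frame interval 5/128 s follows from CLEVRER's 5-second, 128-frame videos. The acceleration is read here as the change from an initial velocity over t frames, which is an interpretation because the subscripts on the slide are illegible. The quadrant helper assumes a mathematical orientation with y increasing upward, which the slide does not specify. All function names are illustrative.

```python
ET_FRAME = 5 / 128  # seconds between consecutive frames


def velocity(p_prev, p_curr, dt=ET_FRAME):
    """Velocity (vx, vy) from two consecutive center positions."""
    return ((p_curr[0] - p_prev[0]) / dt, (p_curr[1] - p_prev[1]) / dt)


def acceleration(v0, vt, t, dt=ET_FRAME):
    """Mean acceleration over t frames relative to the initial velocity v0."""
    return ((vt[0] - v0[0]) / (dt * t), (vt[1] - v0[1]) / (dt * t))


def relative_position(main, other):
    """(x_diff, y_diff) of another object as seen from the main object."""
    return (other[0] - main[0], other[1] - main[1])


def quadrant(dx, dy):
    """Quadrant 1-4 of the other object; 0 if it lies on an axis."""
    if dx > 0 and dy > 0:
        return 1
    if dx < 0 and dy > 0:
        return 2
    if dx < 0 and dy < 0:
        return 3
    if dx > 0 and dy < 0:
        return 4
    return 0


v = velocity((0, 0), (5, 10), dt=0.5)
a = acceleration((0, 0), (10, 20), t=4, dt=0.5)
diff = relative_position((1, 2), (4, 0))
flag = quadrant(*diff)
```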

Slide 43

Slide 43 text

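Physical Training Dataset - graph structure
Graph structure
• Node information
  • Shape, color, and material of the object
• At this point it is not known which objects influence each other's motion, so a complete graph is built
Embedding vector
• node2vec [Grover+, 16]
• graph2vec [Narayanan+, 17]
Example embedding vectors: [[0.54, 0.29, 0.61, …], [0.82, 0.91, 0.15, …], …, [0.14, 0.35, 0.69, …]]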
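Building the complete graph over the detected objects can be sketched with the standard library alone. The node attributes mirror the slide (shape, color, material); the data layout and function name are assumptions for illustration, and the embedding step (node2vec or graph2vec) is left out.

```python
from itertools import combinations


def complete_graph(objects):
    """Return (nodes, edges) with an edge between every pair of objects.

    objects: list of attribute dicts, one per detected object.
    """
    nodes = {i: attrs for i, attrs in enumerate(objects)}
    edges = list(combinations(nodes, 2))  # every unordered pair of node ids
    return nodes, edges


detected = [
    {"shape": "sphere", "color": "green", "material": "rubber"},
    {"shape": "cylinder", "color": "red", "material": "metal"},
    {"shape": "cube", "color": "cyan", "material": "metal"},
]
nodes, edges = complete_graph(detected)
```

With n objects the graph has n(n-1)/2 edges, so every possible interaction is represented before any is ruled out.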

Slide 44

Slide 44 text

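Experiments: extracting the timing of collisions
Experiment 1
• Can the proposed change-point extraction model extract the timing of collisions?
Experiment 2
• Accuracy when using LLMs
  • GPT-4o
  • GPT-4o-FT
  • GPT-4o mini
  • GPT-4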

Slide 45

Slide 45 text

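Experiment 1: Collision detection with the change-point extraction model
Purpose
• Verify whether the timing of collisions and large scene changes can be obtained from changes in object motion
Settings
• Ground-truth collision: frame 30; collision as perceived by a person: frame 32
• Correct range: frames 30-32
• Scored with the F1 score

Training settings
Number of data (8:1:1)       600,000
Batch size                   100
Number of outputs (flags)    80
Optimization function        Adam [Kingma+, 17]
Error function               KL divergence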

Slide 46

Slide 46 text

Experiment 1: Collision detection with the change-point extraction model - results

Slide 47

Slide 47 text

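Experiment 2: Collision detection with GPT
• Do LLMs (the GPT family) have the ability to perceive objects in images visually and physically?
  • Strengths and weaknesses in inference
  • Differences between models
GPT variants
• GPT-4o
• GPT-4o-FT
  • A model fine-tuned so that it can recognize CLEVRER
• GPT-4o mini
• GPT-4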

Slide 48

Slide 48 text

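Experiment 2: Collision detection with GPT
Evaluation settings
• 100 collisions
• Input: 10 frames before the collision and 9 frames after it, 20 images in total
• Ground truth: frame 19; visual perception: frame 21
• Output: frame number
Prompts
• simple
  • "Explain at which frame the objects collide."
• detail
  • #Object colors, #Object shapes, #Object materials
  • #Situation: Several objects are moving, and two of them collide.
  • #Question: Explain which objects collide and at which frame.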

Slide 49

Slide 49 text

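Experiment 2: Collision detection with GPT - results

Model                            Prompt    macro-F1    Accuracy (max: 3)
GPT-4o                           simple    75.4        2.53
GPT-4o                           detail    75.3        2.65
GPT-4o-FT                        simple    75.8        2.65
GPT-4o-FT                        detail    76.1        2.63
GPT-4o mini                      simple    28.5        1.35
GPT-4o mini                      detail    33.3        1.58
GPT-4                            simple    0           0
GPT-4                            detail    0           0
Change-point extraction model    -         63.5        2.75

macro-F1: mean of the F1 scores over the individual collisions
Accuracy: whether the attributes (color, shape, material) of the objects involved in the collision are correct; max: 3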
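One way to compute a per-collision F1 with a tolerance window, and then the macro-F1 as the mean over collisions, is sketched below. The matching rule is an assumption: any prediction inside the correct frame range counts as one true positive, and predictions outside it count as false positives. The slides do not spell out the exact rule, and the default window (30, 32) is the correct range stated for the change-point extraction experiment.

```python
def f1_for_collision(predicted_frames, window=(30, 32)):
    """F1 for one collision, given the frames flagged as collisions."""
    lo, hi = window
    hits = [f for f in predicted_frames if lo <= f <= hi]
    tp = 1 if hits else 0                        # the collision was found or not
    fp = len(predicted_frames) - len(hits)       # flags outside the window
    fn = 1 - tp
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_collision_predictions, window=(30, 32)):
    """Mean of the per-collision F1 scores."""
    scores = [f1_for_collision(p, window) for p in per_collision_predictions]
    return sum(scores) / len(scores)


exact = f1_for_collision([31])          # one flag inside the window
noisy = f1_for_collision([10, 31])      # one spurious flag plus one hit
overall = macro_f1([[31], [5]])         # one collision found, one missed
```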

Slide 50

Slide 50 text

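Summary and discussion
• Built a mechanism that captures the timing of collisions and disappearances from the motion of objects in images
Experiment 1: Change-point extraction model
• Results based on the ground-truth annotations are the most accurate
• Detections by YOLACT give higher accuracy than those by YOLO v3
• node2vec gives higher accuracy than graph2vec
• When image features and physical information are handled together, the image features act as noise
Experiment 2: GPT
• GPT-4o-FT scores highest
  • Because it was fine-tuned to recognize CLEVRER
• The higher the model's original inference ability, the higher its macro-F1 score
• However, GPT models also understand the environment through image features, so they can hardly be said to perceive the environment as humans do
• The change-point extraction model understood object types well, but it did not reach the point of capturing almost every collision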

Slide 51

Slide 51 text

ch6: Predictive Inference Model of the Physical Environment

Slide 52

Slide 52 text

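Background and purpose
Research so far
• Focused on the motion of objects in images
• Became able to capture the timing of events such as collisions and disappearances
• Human real-world recognition and prediction
  • Humans predict what they see both visually and physically
Issues
• Joint prediction of both vision and motion has not been achieved
• Do LLMs themselves have the ability to understand physical properties?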

Slide 53

Slide 53 text

Overview
Input
• CLEVRER [Yi+, 19]
• physical training data
  • Embedding vector of the graph structure
  • Velocity and acceleration of each object
  • Positional relations between objects
Change-point prediction model, built from
• A prediction model: PredNet [Lotter+, 16], PredRNN [Wang+, 17], PredRNN v2 [Wang+, 21], or PreCNet [Straka+, 23]
• VTA [Kim+, 19] (Variational Temporal Abstraction)
Output
1. Flag for the timing of the collision
2. Predicted image

Slide 54

Slide 54 text

Base prediction models
PredNet [Lotter+, 16]
• Imitates predictive-coding processing in the cerebral cortex
• Infers errors hierarchically
• Same prediction model as used in Study 1
PreCNet [Straka+, 23]
• An improvement on PredNet
• Infers over the whole input information every time
[Figure: module diagrams of PredNet and PreCNet; labels: Input, Target, Prediction, Error (ReLU of the subtraction in both directions), Representation (conv LSTM), conv, pool, upsample]

Slide 55

Slide 55 text

Base prediction models
PredRNN [Wang+, 2017]
• A prediction model built by stacking ConvLSTM layers
• The hidden state H is passed along both the spatial and the temporal direction
PredRNN v2 [Wang+, 2022]
• A new prediction model that improves PredRNN
• Increases the number of gates that receive H
[Figure: internal structure of PredRNN and PredRNN v2, showing the original ConvLSTM and the ConvLSTM with an added spatiotemporal memory mechanism; labels: Input Gate, Output Gate, Input Modulation Gate, Forget Gate, Standard Temporal Memory, Spatiotemporal Memory]

Slide 56

Slide 56 text

PredNet-based change-point prediction model
[Figure: two PredNet streams along time t, one for image data and one for physical training data; labels: Input, Error, Representation, Prediction, Difference, img output, flag output]
• $diff_t = diff_{t\_img} + diff_{t\_phy}$
• $m_t = 0$ if $diff_t < \alpha$; $m_t = 1$ if $diff_t > \alpha$ (α: threshold)

Slide 57

Slide 57 text

PredRNN- and PredRNN v2-based change-point prediction model
[Figure: two stacks of four ST-LSTM layers along time t, one for image data and one for physical training data; outputs: img output and the flag]
• $diff_t = diff_{t\_img} + diff_{t\_phy}$
• $m_t = 0$ if $diff_t < \alpha$; $m_t = 1$ if $diff_t > \alpha$ (α: threshold)

Slide 58

Slide 58 text

PreCNet-based change-point prediction model
[Figure: two PreCNet streams along time t, one for image data and one for physical training data; labels: Input, Error, Representation, Prediction, pool, upsample, img output]
• $diff_t = diff_{t\_img} + diff_{t\_phy}$
• $m_t = 0$ if $diff_t < \alpha$; $m_t = 1$ if $diff_t > \alpha$ (α: threshold)
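The flag rule shared by the three change-point prediction models can be sketched as follows: the image-stream and physical-stream differences are summed per time step and compared with the threshold α. The value α = 5 is taken from the settings table of Experiment 1; treating the boundary case diff = α as "no change point" is an assumption, since the slides only define the strict inequalities. The function name and toy values are illustrative.

```python
ALPHA = 5.0  # change-point threshold from the Experiment 1 settings


def change_point_flags(diff_img, diff_phy, alpha=ALPHA):
    """Return 1 for time steps whose summed difference exceeds alpha, else 0."""
    return [1 if (d_img + d_phy) > alpha else 0
            for d_img, d_phy in zip(diff_img, diff_phy)]


# Toy differences for three time steps; only the middle one exceeds alpha.
flags = change_point_flags([1.0, 4.0, 0.5], [1.0, 2.5, 0.2])
```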

Slide 59

Slide 59 text

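Experiments: predicting object collisions
Experiment 1
• Can the four proposed change-point prediction models predict the timing of collisions?
Experiment 2
• Accuracy when using LLMs
  • GPT-4o
  • GPT-4o-FT
  • GPT-4o mini
  • GPT-4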

Slide 60

Slide 60 text

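Experiment 1: Collision prediction with the change-point prediction models
• Can collisions between objects in the environment be predicted?
Settings
• Datasets: CLEVRER, Physical Training Dataset
• Ground-truth collision: frame 30; collision as perceived by a person: frame 32
• Correct range: frames 30-32
• Target ranges: 6 patterns (i-vi) × 10 frames
• Metrics: F1 score, macro-F1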

Slide 61

Slide 61 text

Experiment 1: Settings

                                   PredRNN / PredRNN v2 based    PreCNet based
Training data                      600,000                       600,000
Test data                          80,000                        80,000
Epochs                             500,000                       500,000
Layers                             4                             4
Channels                           128                           3, 48, 96, 192
Kernel size                        5×5                           -
Optimizer                          Adam [Kingma+, 17]            Adam [Kingma+, 17]
Learning rate decay                0.001                         0.0001
α (change-point threshold)         5                             5

Slide 62

Slide 62 text

Experiment 1: Results - accuracy of the collision flags
(whether a flag that predicts a collision is correct)

Range               i       ii      iii     iv      v       vi      macro-F1
PredNet based       40.0    50.0    50.0    40.0    57.1    50.0    42.5
PredRNN based       50.9    54.8    53.1    48.9    60.6    61.7    57.5
PredRNN v2 based    51.4    57.5    54.6    50.6    62.7    64.2    59.2
PreCNet based       62.1    64.2    59.2    60.8    68.9    69.8    65.8

Slide 63

Slide 63 text

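Experiment 2: Predicting object collisions with GPT
• Given the motion of objects before a collision, can the subsequent collision be predicted?
Evaluation settings
• 100 collisions
• Input: 10 frames, from 15 frames before the collision to 6 frames before it
• Predict the frame number of the collision, which occurs at frame 14 or later (correct: 20-22)
Prompts
• simple
  • "Predict at which frame the objects collide."
• detail
  • #Object colors, #Object shapes, #Object materials
  • #Situation: Several objects are moving, and two of them collide.
  • #Question: Predict how many frames later the objects collide.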

Slide 64

Slide 64 text

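Experiment 2: Predicting object collisions with GPT - results

Model                            Prompt    macro-F1    Accuracy (are the objects correct?)
GPT-4o                           simple    55.2        1.95
GPT-4o                           detail    54.7        1.95
GPT-4o-FT                        simple    61.5        2.11
GPT-4o-FT                        detail    62.0        2.10
GPT-4o mini                      simple    55.8        1.68
GPT-4o mini                      detail    54.5        1.65
GPT-4                            simple    0           0
GPT-4                            detail    0           0
Change-point prediction model    -         67.5        2.47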

Slide 65

Slide 65 text

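Summary and discussion
• Built a model that predicts the timing of collisions between objects in images from visual and physical changes
Experiment 1: Change-point prediction model
• The accuracy of the base prediction model affects the accuracy of the change-point prediction model
  • PreCNet is the most accurate
  • Seen in both the accuracy of the predicted images and the macro-F1 score
• F1 score: about 65 at best
  • The timing of collisions is not captured completely
  • There is room for improvement
Experiment 2: GPT
• The change-point prediction model was the most accurate
• Among the GPT models, the fine-tuned one scored highest
• Scores drop sharply once the task becomes prediction
  • Predicting pixels is not connected to predicting that a collision will happen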

Slide 66

Slide 66 text

ch7: Verbal Representation of Object Collision Prediction Based on Physical Commonsense Knowledge

Slide 67

Slide 67 text

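Background and purpose
Research so far
• Captured the objects in images and the physical properties of each object
• Built a model that predicts the timing of collisions between objects
Issues
• The model cannot put a collision into words and communicate it to others
• It cannot understand scenes in a way that reflects the characteristics of complex environments such as the real world
• Verification of performance with LLMs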

Slide 68

Slide 68 text

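Overview
[Figure: pipeline]
• Input: CLEVRER [Yi+, 19] and the physical training data (graph embedding vector, velocity, acceleration, positional relations between objects)
• Change-point prediction model (base models: PredNet [Lotter+, 16], PredRNN [Wang+, 17], PredRNN v2 [Wang+, 21], PreCNet [Straka+, 23]): predicts the image and the graph structure that represents the physical properties
• Language generation model: generates a sentence describing the predicted situation, e.g. "The red cylinder hits the green sphere" (object color ✔, shape ✔)
• Physical commonsense conditions are added to the generated sentence, with object A = red cylinder and object B = green sphere. Example, condition 29:
  • Object A is light and object B is heavy
  • Object A is fast and object B is slow
  • The floor is rough
• Text generation model with physical commonsense: regenerates a detailed sentence that includes the commonsense, e.g. "The red cylinder collides forcefully with the green sphere, and the green sphere is knocked far away."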

Slide 69

Slide 69 text

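Experiment 1: Language generation for collision prediction
Experiment 1-1
• Generate language from the collision content predicted by the four change-point prediction models
Experiment 1-2
• Generate language from collision prediction with LLMs
  • GPT-4o
  • GPT-4o-FT
  • GPT-4o mini
  • GPT-4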

Slide 70

Slide 70 text

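Experiment 1-1: Generating language from the predicted content
Purpose
• To tie the real world to language, generate sentences from the content predicted by the change-point prediction model
Settings
• Uses the decoder part of the Transformer

Training settings
Pair data       219,303 (9 sentences × 24,367 collisions)
Test data       10,965
Batch size      8
Hidden layer    512
Optimizer       Adam [Kingma+, 17]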

Slide 71

Slide 71 text

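Experiment 1-1: Creating the templates
• 9 templates
• The two colliding objects A and B are each written as "{gray, red, blue, green, brown, cyan, purple, yellow} {sphere, cylinder, cube}"
Before the collision (5 frames before)
• "A and B approach each other"
• "A approaches B"
• "B approaches A"
Collision
• "A and B collide"
• "A is repulsed by B"
• "B is repulsed by A"
After the collision (5 frames after)
• "A and B move apart"
• "B moves away from A"
• "A moves away from B"
Example for the colliding objects blue sphere and gray sphere
• Before: "The blue sphere and the gray sphere approach each other" / "The blue sphere approaches the gray sphere" / "The gray sphere approaches the blue sphere"
• Collision: "The blue sphere and the gray sphere collide" / "The blue sphere is repulsed by the gray sphere" / "The gray sphere is repulsed by the blue sphere"
• After: "The blue sphere and the gray sphere move apart" / "The gray sphere moves away from the blue sphere" / "The blue sphere moves away from the gray sphere"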
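The nine templates can be filled programmatically. The sketch below uses English renderings of the templates, which are translations made for illustration and not the original Japanese training sentences; the constant and function names are likewise illustrative.

```python
from itertools import product

COLORS = ["gray", "red", "blue", "green", "brown", "cyan", "purple", "yellow"]
SHAPES = ["sphere", "cylinder", "cube"]

# Three phrasings for each of the three phases: 9 templates in total.
TEMPLATES = {
    "before": ["{a} and {b} approach each other",
               "{a} approaches {b}",
               "{b} approaches {a}"],
    "collision": ["{a} and {b} collide",
                  "{a} is repulsed by {b}",
                  "{b} is repulsed by {a}"],
    "after": ["{a} and {b} move apart",
              "{b} moves away from {a}",
              "{a} moves away from {b}"],
}


def object_names():
    """All color-shape combinations, e.g. 'blue sphere'."""
    return [f"{color} {shape}" for color, shape in product(COLORS, SHAPES)]


def sentences_for(a, b):
    """The nine template sentences for colliding objects a and b."""
    return [template.format(a=a, b=b)
            for phase in ("before", "collision", "after")
            for template in TEMPLATES[phase]]


examples = sentences_for("blue sphere", "gray sphere")
```

Nine sentences per collision is consistent with the pair-data count reported for training (9 sentences × 24,367 collisions).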

Slide 72

Slide 72 text

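Experiment 1-1: Language generation model
[Figure: training a Transformer decoder on paired data (graph embedding, text); the graph embedding passes through a Linear layer into the Decoder, followed by Softmax, to produce words w1, w2, …, wt. At inference, the predicted graph embedding is input to the trained decoder, which generates a sentence describing the predicted content. train: 219,303 pairs; test: 10,965.]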

Slide 73

Slide 73 text

Experiment 1-1: Results - generation example 1 (range i)

Reference sentences
• 「緑色の球と赤色の円柱がぶつかる」 "Green sphere and red cylinder collide."
• 「緑色の球が赤色の円柱にはじかれる」 "Green sphere is repulsed by red cylinder."
• 「赤色の円柱が緑色の球にはじかれる」 "Red cylinder is repulsed by green sphere."

Base model     Generated sentence                                                                       Color   Shape
PredNet        「緑色の円柱が赤色の円柱にはじかれる」 "Green cylinder is repulsed by red cylinder."      ✔       ✘
PredRNN        「緑色の円柱と赤色の円柱がぶつかる」 "Green cylinder and red cylinder collide."           ✔       ✘
PredRNN v2     「緑色の球が赤色の円柱にはじかれる」 "Red cylinder is repulsed by green sphere."          ✔       ✔
PreCNet        「緑色の球が赤色の円柱にはじかれる」 "Red cylinder is repulsed by green sphere."          ✔       ✔

Slide 74

Slide 74 text

Experiment 1-1: Results - generation example 2 (range vi)

Reference sentences
• 「水色の立方体と水色の円柱がぶつかる」 "Cyan cube and cyan cylinder collide."
• 「水色の立方体が水色の円柱にはじかれる」 "Cyan cube is repulsed by cyan cylinder."
• 「水色の円柱が水色の立方体にはじかれる」 "Cyan cylinder is repulsed by cyan cube."

Base model     Generated sentence                                                                  Color   Shape
PredNet        水色の立方体が青色の球にぶつかる "Cyan cube is repulsed by blue sphere."             ✘       ✘
PredRNN        水色の立方体が青色の球にぶつかる "Cyan cube is repulsed by blue sphere."             ✘       ✘
PredRNN v2     水色の立方体が水色の球にぶつかる "Cyan cube is repulsed by cyan sphere."             ✔       ✘
PreCNet        水色の立方体が水色の円柱にぶつかる "Cyan cube is repulsed by cyan cylinder."         ✔       ✔

Slide 75

Slide 75 text

Experiment 1-1: Results - accuracy comparison by evaluation metric

Base model          Language    BLEU@2↑    BLEU@3↑    BLEU@4↑    METEOR↑    CIDEr↑
PredNet based       English     80.3       63.0       56.3       68.8       72.9
PredNet based       Japanese    79.7       74.5       68.8       70.2       72.4
PredRNN based       English     84.3       66.8       59.1       72.6       74.6
PredRNN based       Japanese    82.5       76.1       73.4       73.5       75.1
PredRNN v2 based    English     86.2       72.4       62.7       75.9       78.3
PredRNN v2 based    Japanese    85.9       78.9       75.7       77.6       78.2
PreCNet based       English     90.6       77.1       67.9       78.1       80.3
PreCNet based       Japanese    88.3       80.6       79.2       80.4       81.2

Slide 76

Slide 76 text

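Experiment 1-2: Language generation for object collision prediction with GPT
• Use GPT to describe in language a scene in which two objects are predicted to collide
Evaluation settings
• 100 collisions
• Given the images, the model goes directly from collision recognition to language generation
Differences
• Proposed model: collision prediction and language generation use separate models
• LLM: everything happens in the same inference process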

Slide 77

Slide 77 text

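Experiment 1-2: Language generation for object collision prediction with GPT

Reference sentences
• The red cylinder and the brown sphere collide
• The red cylinder is repulsed by the brown sphere
• The brown sphere is repulsed by the red cylinder

Model                            Prompt    Generated example                                                   Accuracy (are the objects correct?)
GPT-4o                           simple    The cyan sphere and the red cylinder collide.                       1.94
GPT-4o                           detail    The cyan sphere hits the red cylinder                               1.95
GPT-4o-FT                        simple    The cyan sphere collides with the brown sphere from the left.       2.10
GPT-4o-FT                        detail    The cyan sphere and the brown sphere collide.                       2.06
GPT-4o mini                      simple    The green sphere rolls toward the red cylinder and collides.        1.63
GPT-4o mini                      detail    The green sphere rolls toward the red cylinder and collides.        1.69
GPT-4                            simple    -                                                                   0
GPT-4                            detail    -                                                                   0
Change-point prediction model    -         The brown sphere collides with the red cylinder.                    2.44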

Slide 78

Slide 78 text

Experiment 1-2: Language generation for object collision prediction with GPT

Model                            Prompt    BLEU@2↑    BLEU@3↑    BLEU@4↑    METEOR    CIDEr
GPT-4o                           simple    39.3       32.6       26.4       30.2      36.7
GPT-4o                           detail    44.5       36.9       28.5       30.1      38.6
GPT-4o-FT                        simple    27.2       20.4       14.3       29.5      30.8
GPT-4o-FT                        detail    38.7       29.2       27.1       31.4      31.3
GPT-4o mini                      simple    36.1       28.7       21.3       30.1      33.3
GPT-4o mini                      detail    36.2       28.7       22.7       31.1      32.6
GPT-4                            simple    -          -          -          -         -
GPT-4                            detail    -          -          -          -         -
Change-point prediction model    -         88.3       80.6       79.2       80.4      81.2

Slide 79

Slide 79 text

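Experiment 1: Summary and discussion
• Generated descriptions of collision situations from predicted embeddings of physical properties, via a language generation model
Experiment 1-1: Change-point prediction model
• Effect of overlapping objects on accuracy
  • Failures in object recognition carry over to language generation
• Differences between the base prediction models
  • The accuracy of the prediction model also affects the accuracy of language generation
• Because the model learns templates, its range of expression is narrow
Experiment 1-2: GPT
• In terms of generating text, every model produced understandable sentences
• However, the object types were wrong, which lowered the evaluation metric scores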

Slide 80

Slide 80 text

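Experiment 2: Language generation with physical commonsense about the environment added
Experiment 2-1
• Generate a more detailed explanation when physical commonsense about the environment is given together with the sentence describing the collision predicted by the four change-point prediction models
Experiment 2-2
• Generate a detailed explanation from collision prediction with LLMs
  • GPT-4o
  • GPT-4o-FT
  • GPT-4o mini
  • GPT-4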

Slide 81

Slide 81 text

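Experiment 2-1: Language generation with physical commonsense about the environment added
List of physical commonsense conditions (states of the environment and objects)
1. The floor is slippery; object A is heavy and object B is light
2. The floor is rough; object A is heavy and object B is light
…
45. Object A is light and object B is heavy
One condition is selected from commonsense 1-45. Example, commonsense 19:
• The floor is slippery
• Object A has a large weight and object B has a small weight
• Object A is slow and object B is fast
[Figure: the selected condition is given to T5, which is trained with crowdsourced sentences; example output: "The green sphere collides forcefully with the red cylinder, and the red cylinder is knocked far away."]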

Slide 82

Slide 82 text

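Experiment 2-1: Physical commonsense about the environment
• For a scene in which two objects are predicted to collide, generate a more detailed description of the collision when physical commonsense about the environment and objects is given
Object state (mass): 4 patterns
1. Object A and object B have equal mass
2. Object A has a large mass; object B has a small mass
3. Object A has a small mass; object B has a large mass
4. None
Object state (speed): 4 patterns
1. Object A and object B have equal speed
2. Object A is fast; object B is slow
3. Object A is slow; object B is fast
4. None
Excluding the case where both are "none" → 15 patterns
Environment
1. The floor is slippery
2. The floor is rough
3. None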
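The condition list can be enumerated to confirm the count. The sketch below assumes every floor option is combined with every allowed mass and speed combination, which gives 3 × 15 = 45 and matches the "commonsense 1-45" numbering; the English phrasings and names are illustrative, and the enumeration order need not match the numbering used in the thesis.

```python
from itertools import product

MASS = ["A and B have equal mass", "A is heavier than B",
        "A is lighter than B", None]
SPEED = ["A and B have equal speed", "A is faster than B",
         "A is slower than B", None]
FLOOR = ["the floor is slippery", "the floor is rough", None]


def physical_conditions():
    """All (floor, mass, speed) conditions, excluding mass and speed both absent."""
    return [(floor, mass, speed)
            for floor, mass, speed in product(FLOOR, MASS, SPEED)
            if not (mass is None and speed is None)]


conditions = physical_conditions()
object_patterns = {(mass, speed) for _, mass, speed in conditions}
```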

Slide 83

Slide 83 text

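Experiment 2-1: T5 model
• T5 (Text-To-Text Transfer Transformer)
  • A Transformer-based model
  • Used for many tasks such as translation, question answering, classification, and summarization
  • Outputs text for every task, given input text
• Pretrained T5 models used
  • sonoisa/t5-base-japanese
  • megagonlabs/t5-base-japanese-web
  • nlp-waseda/comet-t5-base-japanese
• Training
  • Input: a sentence about the collision, plus the condition
  • Output: one sentence chosen at random from 5 sentences collected by crowdsourcing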

Slide 84

Slide 84 text

Experiment 2-1: T5 training settings

Data (6:1:1)     2,000
Learning rate    5 × 10^-4
Batch size       32
Epochs           100
Optimization     AdamW [Loshchilov+, 17]
Loss function    cross entropy

Slide 85

Slide 85 text

Experiment 2-1: Results with T5

Model                                  Epoch    BLEU↑    ROUGE-2↑    ROUGE-L↑
sonoisa/t5-base-japanese               81       95.2     64.2        74.6
megagonlabs/t5-base-japanese-web       93       81.6     56.6        67.7
nlp-waseda/comet-t5-base-japanese      98       80.9     56.2        67.4

Slide 86

Slide 86 text

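Experiment 2-1: Results - generation example 1
Sentence generated by the change-point prediction model: "The red cylinder is repulsed by the green sphere"
Object A: red cylinder; object B: green sphere

Floor: slippery; mass: A = B; speed: A = B
• Generated with physical commonsense: The red cylinder and the green sphere collide, and both are knocked away in opposite directions.
• Human-written reference: The red cylinder hits the green sphere, which has the same speed, and the green sphere is bounced far away.
Floor: slippery; mass: -; speed: A < B
• Generated with physical commonsense: The green sphere collides forcefully with the red cylinder, and the red cylinder is knocked far away.
• Human-written reference: The red cylinder and the green sphere collide; the red cylinder is pushed back a little and the green sphere is pushed back a little.
Floor: rough; mass: A < B; speed: A > B
• Generated with physical commonsense: The red cylinder collides forcefully with the green sphere, and the green sphere is knocked far away.
• Human-written reference: The red cylinder hits the green sphere forcefully, and the red cylinder bounces back just slightly.
Floor: -; mass: A > B; speed: A < B
• Generated with physical commonsense: The green sphere collides forcefully with the red cylinder, and the green sphere is knocked away.
• Human-written reference: The green sphere hits the red cylinder, and the green sphere is pushed back.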

Slide 87

Slide 87 text

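Experiment 2-1: Results - generation example 2
Sentence generated by the change-point prediction model: "The cyan cube is repulsed by the cyan cylinder"
Object A: cyan cube; object B: cyan cylinder

Floor: -; mass: A = B; speed: -
• Generated with physical commonsense: The cyan cube and the cyan cylinder collide, and both are knocked away in opposite directions.
• Human-written reference: The cyan cylinder collides with the cyan cube, and the cyan cube is knocked away.
Floor: slippery; mass: A > B; speed: A < B
• Generated with physical commonsense: The cyan cube collides forcefully with the cyan cylinder, and the cyan cube is knocked far away.
• Human-written reference: The cyan cube is hit hard by the cyan cylinder, and the cyan cube is pushed back only a little.
Floor: rough; mass: -; speed: A = B
• Generated with physical commonsense: The cyan cube and the cyan cylinder collide, and both are knocked away in opposite directions.
• Human-written reference: The cyan cylinder and the cyan cube collide, and both bounce back by about the same amount.
Floor: -; mass: A = B; speed: A > B
• Generated with physical commonsense: The cyan cube collides forcefully with the cyan cylinder, and the cyan cylinder is knocked far away.
• Human-written reference: The cyan cube is sent off slowly and the cyan cylinder quickly, in opposite directions.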

Slide 88

Slide 88 text

Experiment 2-1: Results - accuracy comparison by evaluation metric

Base model          BLEU@4    BERTScore    BLEURT    ROUGE    Entailment (full sentence)    Entailment (split)    G-EVAL-4o
PredNet based       37.3      68.2         30.3      34.4     43.2                          64.5                  80.3
PredRNN based       43.6      74.7         36.1      41.8     48.7                          69.7                  83.1
PredRNN v2 based    46.5      79.5         45.6      49.6     56.1                          75.3                  88.5
PreCNet based       55.8      82.2         49.7      56.4     67.9                          80.2                  92.4

Slide 89

Slide 89 text

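Experiment 2-2: Explaining collision situations with physical commonsense using GPT
Physical commonsense given
• The floor is rough
• The brown sphere is lighter than the red cylinder
• The brown sphere is faster than the red cylinder
Reference examples
• The brown sphere and the red cylinder are sent flying far in opposite directions.
• They hit each other; the red cylinder shifts a little and the brown ball rolls away a little.
Generated examples
• GPT-4o: The cyan sphere quickly collides with the red cylinder; the cyan sphere is repelled strongly on the rough floor, and the red cylinder moves slightly.
• GPT-4o-FT: The cyan sphere hits the brown sphere, and the cyan sphere is repulsed by the brown sphere.
• GPT-4o mini: The green sphere rolls toward the red cylinder while sliding over the rough floor; at the moment of collision the sphere's speed drops sharply, and the cylinder's movement slows slightly because of its weight, but since the sphere's speed dominates, the sphere keeps rolling as if pushing the cylinder out, and the cylinder rolls forward a little or advances while rotating slightly.
• GPT-4: -
• Change-point prediction model: The brown sphere collides forcefully with the red cylinder, and the brown sphere is knocked far away.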

Slide 90

Slide 90 text

Experiment 2-2: Explaining collision situations with physical commonsense using GPT

Model                            BLEU@4↑    BERTScore↑    BLEURT↑    ROUGE↑    Entailment (full sentence)↑    Entailment (split)↑    G-EVAL↑
GPT-4o                           31.3       54.5          32.3       32.3      48.5                           55.6                   65.4
GPT-4o-FT                        25.6       59.3          27.8       28.7      52.3                           61.4                   66.7
GPT-4o mini                      16.7       49.2          19.6       13.9      48.9                           63.8                   63.3
GPT-4                            -          -             -          -         -                              -                      -
Change-point prediction model    55.8       82.2          49.7       56.4      67.9                           80.2                   92.4

Slide 91

Slide 91 text

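Experiment 2: Summary and discussion
• For a scene in which two objects are predicted to collide, generated a more detailed description of the collision when physical commonsense about the environment and objects was given
Experiment 2-1: Language generation examples
• The model learns fixed patterns (such as "very", "forcefully", "far away", "slightly"), so its range of expression is narrow
• Human-written sentences use a wider variety of degree expressions
Accuracy comparison by evaluation metric
• Evaluation by word overlap: low
• Evaluation by entailment of content: high
Experiment 2-2: Experiments with GPT
• GPT is good at generating sentences
• Evaluation metric score: about 65.5
• Poor object recognition and prediction showed up as a score gap of about 20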

Slide 92

Slide 92 text

ch8: Conclusions

Slide 93

Slide 93 text

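Summary of the proposed research
1. Prediction over arbitrary time widths
  • Combined PredNet and TD-VAE into a model that has the hierarchical structure of the human brain and can predict over arbitrary time widths
2. Understanding the motion of objects in images
  • Proposed a mechanism that captures the motion of objects in the environment shown in an image and, from changes in their physical properties, captures the timing at which objects collide
3. Predicting the timing of collisions between objects in images
  • Built a model that predicts the timing at which objects in the environment collide from changes in object motion and visual information
4. Language generation for collision situations
  • Generated language for the predicted collision situations of objects in the environment, tying the real world to language
  • When environmental conditions are added, a language model that can explain collisions in light of the properties of the environment makes it possible to describe collision situations in more detail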

Slide 94

Slide 94 text

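Contributions
• A new working model of the human brain
  • Combined a prediction model that represents the hierarchical structure of the human brain with a deep generative learning model that can predict over arbitrary time widths
  • Became able to predict slightly ahead (about 1 second)
• A model that predicts object motion from visually perceived information, as humans do
  • Rather than relying on predetermined numerical values such as image features or a physics simulator, it captures the motion of objects in the given images (corresponding to human visual information) and can predict future collisions
• A mechanism that explains collision situations in more detail by taking environmental information into account
  • The predicted collision can now be shared with others through language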

Slide 95

Slide 95 text

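Future work
• Possibility of long-term prediction
  • Study 1: about 1 second
  • Human prediction: the next day, plans, and so on
• Use of datasets closer to the real world
  • CLEVRER: simple data
  • Data from in-vehicle cameras, data of people moving around a room
• Understanding of physical properties by large language models themselves
  • Representing real-world understanding with LLMs built into the system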