Tutorial: World Models

Hironobu Fujiyoshi (Machine Perception and Robotics Group, Chubu University), http://mprg.jp

[Figure: "A system architecture for autonomous intelligence" (LeCun, "A Path Towards Autonomous Machine Intelligence"). Modules: configurator, perception, world model, cost (intrinsic cost and critic), short-term memory, and actor. The perception module estimates the current state of the world; the world model predicts possible future world states from action sequences proposed by the actor; the cost module outputs a single scalar "energy" that measures the agent's discomfort.]

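What are world models?
• An internal model inside an AI that learns how the environment behaves and can predict future states
• What does a world model make possible?
  - Future prediction: predict the outcome of an action before taking it
  - Learning from virtual experience: learn behaviors inside the model without using the real environment
  - Generalization from less data: because the model captures the structure of the environment, it adapts more easily to new situations
Example: a person catching a ball
  - The brain "simulates" the ball's trajectory and decides where to put the hand
  → Humans carry "knowledge of physics" in their heads
An AI with a world model works the same way
  - It learns the environment from past experience, predicts "what happens next," and chooses the best action
  → The neural network acquires the "laws of the world" internally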

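Why world models, and why now?
• World models are attracting attention as a key to overcoming the limits of existing AI and reaching more human-like intelligence
Problems: the walls existing AI runs into
  ① Reinforcement learning: requires enormous amounts of trial and error
    - Trials in the real environment are dangerous and costly
    - Sample efficiency is extremely low
  ② Imitation learning: weak in unseen situations
    - Fails outside the distribution of the training data
    - Does not understand cause and effect in the environment
  ③ LLMs: struggle with embodied tasks
    - Confined to representations expressed in language
    - Tasks involving a body and physical action are difficult
Solutions: what world models open up
  ✓ Learning inside imagination: the model generates virtual experience, drastically reducing real-environment trials
  ✓ Understanding the structure of the environment: causal relations are acquired as internal representations, improving generalization to unseen scenarios
  ✓ Toward embodied intelligence: prediction and action in the physical world, with applications to autonomous driving and robotics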

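Lineage of world-model research
• Starting from the 2018 World Models paper, research has spread into application domains
1. Foundational research phase
  - World Models (Ha & Schmidhuber): learning inside a dream; proposed the V-M-C structure
  - PlaNet (Hafner et al.): proposed the RSSM; planning in latent space
  - Dreamer (Hafner et al.): policy learning in latent space; evolved into V2 and V3
2. Application to autonomous driving
  - MILE (Hu et al., Wayve): first to bring a world model into autonomous driving
  - DriveDreamer (Wang et al.): a generative world model that produces driving videos
3. Latest JEPA-family research
  - I-JEPA / V-JEPA (Assran et al., Meta): a non-generative approach that predicts in latent space
  - LeWorldModel (Maes et al.): a lightweight, stably trainable JEPA-style world model
Research trends
  ① Observation space → latent space: efficient learning on compressed representations
  ② Games → the real world: applications to autonomous driving and robotics
  ③ Generative → predictive: the emergence of lightweight, stable JEPA-family models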

World Models [Ha & Schmidhuber, NeurIPS]
• The paper that first brought the concept of a world model into deep learning
• Proposal: a modular V-M-C design
  - V (Vision): a VAE compresses each image into a low-dimensional latent vector z
  - M (Memory): an MDN-RNN predicts the probability distribution of the next z
  - C (Controller): a single-layer linear model outputs the action
  → V and M are trained without supervision; only C is optimized with an evolution strategy (CMA-ES)
• Results
  ① CarRacing-v0: reached state-of-the-art performance at the time, learning steering directly from pixels
  ② VizDoom: Take Cover: trained purely by simulation inside the world model (that is, inside a "dream") without using the real environment at all
  → Still performs well when transferred to the real environment
[Figure: World Models paper, Figure 4: "Our agent consists of three components that work closely together: Vision (V), Memory (M), and Controller (C)."]

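Outline of this tutorial
• What are world models? Basic concepts and representative architectures: World Models, PlaNet (RSSM), Dreamer
• World models in autonomous driving: applying world models to driving scenes, and video-generation models: MILE, DriveDreamer, DriveWorld-VLA
• LeWorldModel: latest research on a lightweight and stable JEPA-style world model: prior methods and their problems, model architecture, loss computation, evaluation experiments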

World Models architecture
• World Models [Ha & Schmidhuber, NeurIPS] proposes a framework in which an agent acquires a model of its surrounding environment by learning from observations
• Architecture
  - Vision: converts high-dimensional image data into a compact, low-dimensional representation
  - Memory: learns future states from past and current states
  - Controller: outputs the optimal action based on Vision and Memory

World Models architecture: Vision
• Vision: converts high-dimensional image data into a compact, low-dimensional representation
• Uses a Variational Autoencoder (VAE)
  - Compresses the agent's visual input into a latent space
  - Serves as the foundation on which the Memory module predicts future states
• Advantages of the VAE
  - Unlike a standard autoencoder (AE), it can model randomness
  - It can generate new variations of the data
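The two VAE properties listed above (randomness in the latent, and a latent kept close to a simple prior) come down to two small formulas. The sketch below is a minimal pure-Python illustration, assuming a diagonal Gaussian encoder that outputs a mean `mu` and a log-variance `logvar` per latent dimension; the function names and the toy numbers are hypothetical and not taken from the World Models code.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Hypothetical encoder outputs for one image (a 4-dimensional latent for brevity).
mu = [0.5, -1.0, 0.0, 2.0]
logvar = [0.0, -0.5, 0.2, -1.0]
z = reparameterize(mu, logvar, random.Random(0))
kl = kl_to_standard_normal(mu, logvar)
```

Writing the sample as `mu + sigma * eps` keeps it differentiable with respect to the encoder outputs; the KL term is what pulls the codes toward a standard normal and is added to the reconstruction loss during training.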

World Models architecture: Memory
• Memory: learns future states from past and current states
• Uses an RNN
  - Predicts future states in the latent space
• Introduces an MDN (mixture density network)
  - Can predict "multiple possible futures" probabilistically
  - Outputs a probability distribution over what the next latent state will be: $P(z_{t+1} \mid a_t, z_t, h_t)$
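Sampling from the MDN output is what turns "multiple possible futures" into a concrete next latent. The sketch below draws one value for a single dimension of $z_{t+1}$, assuming the RNN has already produced mixture logits, means, and log standard deviations from $(a_t, z_t, h_t)$. The temperature handling (logits divided by τ, standard deviations scaled by √τ) is an assumption following a common MDN-RNN convention; the paper only says that a temperature τ controls model uncertainty.

```python
import math
import random


def sample_mdn(logits, means, log_sigmas, temperature=1.0, rng=random):
    """Draw one sample of a single latent dimension from an MDN's Gaussian mixture.

    Temperature handling (an assumption): mixture logits are divided by tau and
    each component's standard deviation is scaled by sqrt(tau).
    """
    scaled = [l / temperature for l in logits]
    peak = max(scaled)                                    # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]                  # softmax -> mixture weights pi_k
    k = rng.choices(range(len(probs)), weights=probs)[0]  # pick one "possible future"
    sigma = math.exp(log_sigmas[k]) * math.sqrt(temperature)
    return means[k] + sigma * rng.gauss(0.0, 1.0)


# Hypothetical MDN head output for one dimension of z_{t+1}: two equally likely futures.
rng = random.Random(0)
samples = [sample_mdn([0.0, 0.0], [-1.0, 1.0], [-3.0, -3.0], temperature=1.0, rng=rng)
           for _ in range(1000)]
```

Lowering τ makes the sampled dream more deterministic and raising it makes it noisier; the paper notes that adjusting τ turned out to be useful when training the controller.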

World Models architecture: Controller
• Controller: outputs the optimal action based on Vision and Memory
• Uses a simple linear model
  - Takes the Vision and Memory information as input
  - Decides the optimal action
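The controller in the paper is a single linear map, $a_t = W_c\,[z_t\; h_t] + b_c$. The sketch below implements exactly that with plain lists and adds a parameter counter. For the count, it assumes a 32-dimensional $z$, a 256-dimensional hidden state, and a 3-dimensional action (steering, gas, brake); these sizes are illustrative assumptions, used only to show why the controller stays in the "few hundred parameters" range.

```python
def linear_controller(z, h, W, b):
    """a_t = W_c [z_t ; h_t] + b_c : one linear layer on the concatenated input."""
    x = list(z) + list(h)                      # concatenate latent code z_t and RNN hidden state h_t
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def num_controller_params(z_dim, h_dim, action_dim):
    """Weights plus biases of the linear controller."""
    return action_dim * (z_dim + h_dim) + action_dim


# Toy example: 2-dim latent, 1-dim hidden state, 2-dim action.
action = linear_controller([1.0, 2.0], [3.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.5, -0.5])
```

Keeping C this small is a deliberate design choice: the credit-assignment problem is pushed into a tiny search space, while the large V and M models carry the representational capacity.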

What World Models can do
• The three modules are fully decoupled and trained one after another
  - V (Vision): a VAE compresses images into latent vectors z (unsupervised)
  - M (Memory): an MDN-RNN predicts "the probability distribution of the next z" (unsupervised, action-conditioned)
  - C (Controller): a single-layer linear model that outputs actions from z and the RNN hidden state h (only a few hundred parameters, optimized with the evolution strategy CMA-ES)
  - CMA-ES: Covariance Matrix Adaptation Evolution Strategy
• Practice "inside the dream," then drive in reality as-is
  - The Controller is trained in the virtual environment (the dream) generated by the world model
  - Inputs are z (the current view) and h (memory); the output is the action
  → In Doom, the agent learned without using a single step of the real environment and succeeded after transfer to it
From the paper: the MDN-RNN models $P(z_{t+1} \mid a_t, z_t, h_t)$ as a mixture of Gaussians, where $a_t$ is the action and $h_t$ the RNN hidden state at time $t$; during sampling, a temperature $\tau$ controls model uncertainty. The controller is
$$a_t = W_c\,[z_t\; h_t] + b_c \qquad (1)$$
where $W_c$ and $b_c$ map the concatenated input $[z_t\; h_t]$ to the action $a_t$. At each time step, V encodes the raw observation into $z_t$, C outputs $a_t$, and M takes $z_t$ and $a_t$ to update its hidden state to $h_{t+1}$ for time $t+1$.
Effects shown in the experiments (see the tables in the paper): CarRacing-v0, DoomTakeCover-v0
※ With a small Controller plus an internal model, the paper demonstrates policy learning from visual input and transfer to the real environment
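CMA-ES itself is too long to show here, so the sketch below swaps in a much simpler isotropic (μ, λ) evolution strategy with a fixed step size. It only conveys the idea of optimizing controller parameters by sampling, ranking, and recombining, without gradients. In World Models the fitness would be the cumulative reward of rollouts inside the dream; `toy_fitness` and `TARGET` are hypothetical placeholders.

```python
import random


def simple_es(fitness, dim, iterations=200, popsize=32, elite=8, sigma=0.3, seed=0):
    """Maximize `fitness` with a basic isotropic (mu, lambda) evolution strategy.

    Simplified stand-in for CMA-ES: CMA-ES also adapts a full covariance matrix
    and the step size, whereas sigma stays fixed here.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(iterations):
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(popsize)]
        population.sort(key=fitness, reverse=True)                    # best candidates first
        parents = population[:elite]
        mean = [sum(p[i] for p in parents) / elite for i in range(dim)]  # recombine the elite
    return mean


TARGET = [1.0, -2.0, 0.5]   # hypothetical "ideal" controller parameters


def toy_fitness(params):
    """Stand-in for the dream-rollout return: higher is better, peak at TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


best_params = simple_es(toy_fitness, dim=3)
```

Because only a few hundred controller parameters are searched, a derivative-free method like this (or CMA-ES) remains practical, which is exactly why World Models keeps C small.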

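Deep Planning Network (PlaNet) [Hafner, ICML]
• Evolves the "separate training" of World Models into end-to-end (E2E) training
  - World Models: V, M, and C are trained independently, one after another
  - PlaNet: latent states, future prediction, and reward prediction are learned from images together under a single objective
  → Inconsistencies between modules disappear and prediction accuracy improves
• Another difference: no policy; planning happens on the spot
  - World Models: the trained Controller outputs an action immediately
  - PlaNet: at every step, it simulates the future in latent space and searches over action sequences
• Architecture (everything runs in latent space)
  - Latent dynamics model (RSSM): predicts future states conditioned on actions
  - Reward model: predicts the reward from each state
  - Planner: searches for the action sequence that maximizes the predicted reward (CEM)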

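Future-prediction model: RSSM (Recurrent State Space Model)
• The "latent dynamics model" at the heart of PlaNet is the RSSM
  - The core module responsible for "predicting future states conditioned on actions"
  - It is exactly the part that builds latent states from images and predicts the future in latent space
• Prior approaches and their problems
  - Deterministic model (RNN): the same input always gives the same output → cannot represent branching futures
  - Stochastic model (SSM): the next state is determined by sampling from a Gaussian → weak at long-term memory
• The RSSM is a state-space model that combines a deterministic RNN with a stochastic SSM
  - Deterministic memory update: a GRU remembers past states and actions, $h_t = f(h_{t-1}, s_{t-1}, a_{t-1})$
  - Stochastic state generation: a probability distribution is output from the memory and sampled, $s_t \sim p(s_t \mid h_t)$
  - Observation and reward generation: images and rewards are reconstructed from the state $(h_t, s_t)$
  → Deterministic memory ($h_t$) plus a stochastic state ($s_t$) achieves both long-term memory and uncertainty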
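The two RSSM equations above can be made concrete with a toy transition. The sketch below is not PlaNet's implementation: it replaces the GRU with a single tanh RNN cell, uses random weights, and assumes a softplus-plus-0.1 parameterization for the standard deviation; all sizes and helper names are hypothetical. It does preserve the key structure: $h_t$ is deterministic, and $s_t$ is sampled from a Gaussian whose parameters depend on $h_t$.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init_params(h_dim, s_dim, a_dim, rng):
    """Random weights for a toy RSSM (hypothetical sizes and initialization)."""
    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]
    return {"W_h": rand(h_dim, h_dim + s_dim + a_dim),
            "W_mu": rand(s_dim, h_dim),
            "W_sigma": rand(s_dim, h_dim)}


def rssm_step(h_prev, s_prev, a_prev, params, rng=random):
    """One RSSM transition.

    Deterministic path: h_t = f(h_{t-1}, s_{t-1}, a_{t-1})  (tanh RNN cell here; PlaNet uses a GRU)
    Stochastic path:    s_t ~ p(s_t | h_t) = N(mu(h_t), diag(sigma(h_t)^2))
    """
    x = list(h_prev) + list(s_prev) + list(a_prev)
    h = [math.tanh(v) for v in matvec(params["W_h"], x)]                           # deterministic memory
    mu = matvec(params["W_mu"], h)
    sigma = [math.log1p(math.exp(v)) + 0.1 for v in matvec(params["W_sigma"], h)]  # softplus + min std
    s = [m + sd * rng.gauss(0.0, 1.0) for m, sd in zip(mu, sigma)]                 # sampled stochastic state
    return h, s


# Imagine three steps ahead from a zero state under a hypothetical action sequence.
params = init_params(h_dim=8, s_dim=4, a_dim=2, rng=random.Random(0))
rng = random.Random(1)
h, s = [0.0] * 8, [0.0] * 4
for a in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    h, s = rssm_step(h, s, a, params, rng)
```

Running the same step twice with different random seeds gives the same `h` but different `s`, which is the "long-term memory plus uncertainty" split described on the slide.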

Dreamer [Hafner, ICLR]
• Uses the same RSSM as PlaNet but moves from planning to policy learning
  - Environment dynamics are learned with the RSSM from the previous slide (a shared foundation)
  - The difference lies in how actions are chosen: Dreamer learns a policy directly
  - Reinforcement learning takes place inside imagined trajectories (imagination)
• PlaNet's problem (recap from two slides back)
  - Actions are optimized on the spot (planning-based, CEM)
  - Optimization is needed at every step → slow inference, and long-horizon planning is difficult
• Dreamer's solution
  - Predicts the future in latent space and learns a policy (actor-critic)
  - The critic estimates value beyond the imagination horizon → strong at long-horizon planning
  - At inference time only the policy runs → no planning is needed, so it is fast
[Figure: Dreamer paper (arXiv:1912.01603), Figure 1: ① collect experience data; ② learn latent dynamics; ③ learn the policy and value by imagination in latent space. The abstract reports that on 20 challenging visual control tasks, Dreamer exceeds existing approaches in data efficiency, computation time, and final performance.]
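The critic lets Dreamer account for rewards beyond the imagination horizon by bootstrapping with learned values. A standard way to build those targets is the TD(λ) return, computed backward over an imagined trajectory; the recursive form below is a minimal sketch, assuming reward and value predictions are already available as plain lists. The `gamma` and `lam` defaults are illustrative, not values taken from the paper.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    """TD(lambda) targets over an imagined trajectory of length H.

    rewards[t]     : predicted reward r_t,           t = 0..H-1
    next_values[t] : critic value v(s_{t+1}),        t = 0..H-1
    V_t = r_t + gamma * ((1 - lam) * v(s_{t+1}) + lam * V_{t+1}),  with V_H := v(s_H)
    """
    H = len(rewards)
    returns = [0.0] * H
    bootstrap = next_values[-1]                  # value at the imagination horizon
    for t in reversed(range(H)):
        returns[t] = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * bootstrap)
        bootstrap = returns[t]
    return returns


# Imagined 3-step trajectory: rewards r_0..r_2 and critic values v(s_1)..v(s_3).
targets = lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 10.0], gamma=0.5, lam=0.95)
```

With `lam=1.0` this reduces to the discounted reward sum plus the bootstrapped value at the horizon; with `lam=0.0` it becomes the one-step TD target. The actor is trained to maximize these targets, and the critic regresses toward them.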

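Evolution of Dreamer: V1 → V2 → V3
• Dreamer, which learns its policy in latent space, became general-purpose through three generations of improvements
  - Dreamer (ICLR), V1 (original)
    - Main contribution: learns the policy directly in latent space
    - Latent representation: Gaussian (continuous)
    - Target tasks: DeepMind Control Suite
    - Characteristics: RSSM-based; actor-critic in imagination
  - DreamerV2 (ICLR), V2 (discretization)
    - Main contribution: discrete latent variables + KL balancing
    - Latent representation: categorical (32 × 32)
    - Target tasks: Atari (surpassing human performance)
    - Characteristics: discrete representations improve expressiveness and stability
  - DreamerV3 (arXiv), V3 (generalization)
    - Main contribution: symlog predictions + fixed hyperparameters
    - Latent representation: categorical + symlog
    - Target tasks: a wide range of tasks with a single configuration
    - Characteristics: the first AI to collect diamonds in Minecraft
→ "Introducing discrete representations" and "scale-invariant learning" influenced the latent-space design of later autonomous-driving and JEPA-family models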
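The "scale-invariant learning" credited to DreamerV3 relies on the symlog transform: networks predict targets in symlog space, so rewards and values of very different magnitudes share one set of hyperparameters. The sketch below shows only the transform and its inverse; how DreamerV3 wires it into its individual losses is not shown, and the sample rewards are hypothetical.

```python
import math


def symlog(x):
    """symlog(x) = sign(x) * ln(|x| + 1): compresses large magnitudes, ~identity near 0."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y):
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Rewards of very different scales map into a narrow, sign-preserving range.
squashed = [symlog(r) for r in (-1000.0, -1.0, 0.0, 0.5, 10000.0)]
```

Unlike a plain logarithm, symlog is defined for negative values and zero and is close to the identity for small inputs, so small rewards are not distorted while huge ones are compressed.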

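Summary: What are world models?
• A world model is "a predictor that has internalized how the environment behaves"
  - By realizing inside an AI the same mechanism humans use to run simulations in their heads, it enables future prediction and learning from virtual experience
• Representative architectures: starting from the Vision, Memory, Controller design of World Models (VAE, RNN, MLP), PlaNet achieved latent-space prediction with the RSSM, and Dreamer achieved policy learning inside latent space
• The essence of this evolution is "observation space → latent space" and "planning → policy learning"
  - The resulting gains in computational efficiency and training stability made it possible to move from games to the real world (autonomous driving and robots)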

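Why autonomous driving needs world models
• World models address four fundamental problems faced by conventional autonomous-driving technology
  - Scarcity of dangerous scenarios: rare events such as accidents, sudden braking, and pedestrians darting out are underrepresented in real driving data
    → A world model generates scenarios never experienced, contributing to data augmentation and long-tail coverage
  - Risk of real-vehicle trials: running reinforcement learning on a real car is impractical in terms of both accident risk and cost
    → Virtual driving runs inside the world model, so policies can be evaluated and improved safely
  - Limits of imitation learning: merely copying expert data does not capture cause and effect in the environment, so the model fails in unseen situations
    → A world model acquires the dynamics of the environment, strengthening generalization to out-of-distribution scenarios
  - Integrating prediction and planning: conventional E2E models map perception directly to action and have no internal representation for predicting the future
    → A world model integrates "prediction + planning," gaining interpretability and the ability to look ahead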

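MILE [Hu, NeurIPS] (Wayve)
• An autonomous-driving world model that learns the driving environment with an RSSM while learning a policy by imitation from expert data
  - Learns environment dynamics with the RSSM
  - Learns the policy by imitation from expert data
• Architecture
  - Vision: image → BEV → latent representation
  - Memory: models state transitions with the RSSM and predicts the future conditioned on actions
  - Controller: policy network (MLP)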

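MILE [Hu, NeurIPS] (Wayve)
• Training phase (learns from previously collected driving data)
  - Vision: compresses images into latent features $x_t$
  - Memory (RSSM): learns the current state $(h_t, s_t)$ from past states and actions
  - Decoder: reconstructs images and BEV from the state → embeds the structure of the environment into the latent
  - Policy (imitation learning): predicts actions from the state and minimizes the error against the expert's actions
• Inference phase (driving in CARLA)
  - Observing mode: estimates the state from camera images → the policy outputs an action → the vehicle executes it
  - Imagining mode: without observations, the RSSM predicts future states → the decoder generates future BEV
→ By combining environment-dynamics learning with imitation learning, a driving policy is obtained safely without reinforcement learning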

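Autonomous-driving simulation with MILE
• Example run in CARLA
  - Demonstrates that urban driving scenes can be accurately predicted and driven using only the latent BEV representation learned by the world model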

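DriveDreamer [Wang, ECCV]
• A generative world model built from real-world driving scenarios
  - Designed as a framework that integrates video generation and action prediction
• Diffusion model
  - Used to generate and understand the environment
  - Generates videos from HD maps, 3D boxes, and text
• ActionFormer
  - Time-series modeling with a GRU and attention
  - Estimates future structural states from past actions
• Action decoder
  - Takes diffusion features and past actions as input and produces the policy with an MLP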

• Generating driving videos from traffic conditions and various text prompts (sunny, rainy, night)
Generated by DriveDreamer

• Generating future driving videos
Generated by DriveDreamer

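MILE and DriveDreamer
• Even among "world models for autonomous driving," the purpose, output, and usage differ
MILE: control-oriented
  - Output: latent state + action (steering, speed)
  - Purpose: closed-loop policy learning
  - Strengths: lightweight, real-time operation; drives directly in CARLA
  - Weaknesses: requires reconstructing observations; the latent is of limited interpretability
  - Example use: end-to-end driving inside a simulator
DriveDreamer: generative
  - Output: high-quality driving videos
  - Purpose: data generation and scenario augmentation
  - Strengths: synthesizes diverse situations from HD maps and text
  - Weaknesses: high inference cost; real-time control is difficult
  - Example use: generating rare scenarios, creating evaluation data
Division of roles: the control-oriented type is "for driving," the generative type is "for learning"
  - Roles are split between the on-vehicle AI and the data-generation pipeline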

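Driving world models with 4DGS: image-space approach
• DriveDreamer4D [Zhao, CVPR]
  - A video-generation world model (DriveDreamer) combined with 4D Gaussian Splatting
  - Uses the world model as a "data machine" to generate videos of novel trajectories (lane changes, acceleration, deceleration, etc.)
  - Feeds the generated videos into 4DGS, reconstructing at high quality even viewpoints outside the training-data distribution
  - The future is generated in image space; 4DGS ensures spatial consistency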

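Gaussian-based driving world models: latent-space approach
• Uses 3D Gaussians as the internal representation and predicts future occupancy in 3D space
  - The problem is reformulated as 3D occupancy forecasting conditioned on the current observation
• GaussianWorld [Zuo, CVPR]
  - Decomposes scene evolution into three factors and predicts them in 3D Gaussian space
    ① Ego-motion compensation: align the entire static scene using ego motion
    ② Local motion of dynamic objects: predict only the moving Gaussians, individually
    ③ Completion of new observations: add regions that have newly become visible in the current frame
  - Result: higher mIoU on nuScenes than single-frame prediction, with no additional computation
• Dual Latent World Models (DLWM) [Zhu, CVPR]
  - Two-stage Gaussian-centric pre-training
    ① Guided by Gaussian flow → for occupancy prediction
    ② Guided by ego planning → for motion planning
  → SoTA on three tasks: 3D occupancy perception, 4D occupancy forecasting, and motion planning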
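Step ① of GaussianWorld (aligning the static scene with ego motion) is, at its core, a rigid transform of Gaussian centers from the previous ego frame into the current one. The 2-D (bird's-eye-view) sketch below assumes a convention in which the ego pose change (translation and yaw) is expressed in the previous ego frame. It moves only the centers; a full implementation would also rotate each Gaussian's covariance and handle height, and steps ② and ③ are not shown. The function name is hypothetical.

```python
import math


def to_current_ego_frame(points, yaw, translation):
    """Re-express static BEV points (e.g. Gaussian centers) in the current ego frame.

    Assumed convention: between frames the ego moved by `translation` = (dx, dy) and turned
    by `yaw` radians, both expressed in the previous ego frame, so p_cur = R(yaw)^T (p_prev - t).
    """
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = translation
    aligned = []
    for x, y in points:
        px, py = x - dx, y - dy                                # undo the ego translation
        aligned.append((c * px + s * py, -s * px + c * py))    # undo the ego rotation (R^T)
    return aligned


# The ego car drove 2 m forward: a static point 10 m ahead is now 8 m ahead.
moved = to_current_ego_frame([(10.0, 0.0), (5.0, 3.0)], yaw=0.0, translation=(2.0, 0.0))
```

Because static structure can be carried over by this transform alone, the model only has to spend capacity on the moving Gaussians and on newly revealed regions, which is the point of the three-factor decomposition.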

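Autonomous driving with VLA × world models
• VLA (Vision-Language-Action)
  ○ Strengths: high-level reasoning through language, instruction understanding, scene understanding, interpretability
  × Limitations: weak at modeling temporal dynamics → supervision deficit (only sparse action signals)
• World model
  ○ Strengths: future prediction, learning from virtual experience, internal acquisition of environment dynamics
  × Limitations: no high-level reasoning through language → cannot "evaluate" the futures it generates
• VLA × world model
  ① Imagine, then drive: internally simulate "what happens if I move like this" before acting
  ② Resolving the supervision deficit: future prediction supplies a dense self-supervised signal and amplifies the data scaling law
  ③ Action-conditioned causal reasoning: reason about how the future changes depending on one's own action (what-if)
  ④ Handling the long tail: complement rare scenarios with the world knowledge of LLMs, breaking through the limits of imitation learning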

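Four coupling architecture patterns
• Couplings of a VLA and a world model fall into four broad patterns according to how deeply they are coupled
  - Pattern A, decoupled (External Simulator): VLA and WM are external and independent; the world model runs on its own as an external simulator. Representative method: IRL-VLA (reward evaluation, RL training)
  - Pattern B, feature sharing (Feature Sharing): a shared encoder feeds the WM and the VLA, which produce outputs in parallel from a shared representation. Representative method: DriveVLA-W0 (pre-training, data scaling)
  - Pattern C, interleaved (Interleaved): VLA and WM alternate in a closed loop, executing prediction and planning in turn. Representative method: VLA-World (reflective reasoning)
  - Pattern D, unified latent space (Unified Latent): WM and VLA are integrated for what-if reasoning; the WM latent is optimized as a decision variable in a unified space. Representative method: DriveWorld-VLA (causal what-if reasoning)
• Depth of coupling (deepening from left to right): decoupled → parallel → alternating → unified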

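Pattern A: decoupled
• The world model acts as an independent external simulator and verifies the VLA's actions after the fact
  - Flow: input (image, instruction) → VLA → proposed action → world-model rollout → verification → accept / reject
  - The world model sits outside the VLA and verifies action safety post hoc
○ Advantages
  - Simple to implement; can be added to an existing VLA
  - Modules are independent, so they are easy to develop and maintain
  - Clear division of roles as a safety verifier
× Disadvantages
  - Structural separation makes it hard to transfer latent knowledge
  - Inference time increases (cost of rollouts)
  - Does not improve the VLA's representational power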

Pattern A: decoupled
• IRL-VLA [Jiang, arXiv]
  - Trains the VLA policy with a reward world model
  - Uses the world model independently as an external reward evaluator
  - Optimizes safety, comfort, and efficiency through reinforcement learning with PPO
  - SoTA on NAVSIM; 1st runner-up in the CVPR AGC
[Figure: IRL-VLA paper, Figure 2: overview of the IRL-VLA framework, a three-stage pipeline of closed-loop reinforcement learning via a reward world model for VLA in autonomous driving, beginning with imitation policy learning.]

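Pattern B: feature sharing
• A shared encoder outputs world-model predictions and action predictions in parallel, giving both a common representation
  - Flow: input → shared encoder → world model / VLA expert → future images; action expert (lightweight) → trajectory
  - ✱ Future-image prediction supplies a dense self-supervised signal
○ Advantages
  - Trainable end to end
  - Makes full use of the representational power of large-scale data
  - Achieves SoTA on the NAVSIM benchmark
× Disadvantages
  - Lacks action-conditioned causal reasoning
  - Counterfactual (what-if) imagination is limited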

Pattern B: feature sharing
• DriveVLA-W0 [Li, ICLR]
  - Supervision deficit (insufficient supervisory signal): the imbalance of training a VLA with hundreds of millions of parameters using only a few-dimensional action signal
  → Uses not only sparse action signals (trajectories) but also dense future-image prediction (self-supervision), amplifying the data scaling law (verified at larger data scales)
  - Code: https://github.com/BraveGroup/DriveVLA-W0
[Figure: DriveVLA-W0 paper, Figure 1, "World modeling as a catalyst for VLA data scalability": (a) action prediction with sparse action supervision vs. world modeling with visual and action supervision; (b) collision rate versus number of training frames (700K, 7M, 70M) for TransFuser, a VLA trained on action prediction, and DriveVLA-W0 (world modeling).]

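Pattern C: interleaved
• Executes prediction and planning alternately in a closed loop, realizing an imagine → reflect → revise-trajectory cycle
  ① VLA: generates an initial trajectory
  ② World model: imagines the next frame based on the action
  ③ VLA: reflects on and reasons about the imagined future
  ④ VLA: revises and outputs the trajectory (closed loop)
○ Advantages
  - Consistency through the closed loop
  - Greatly improved look-ahead ability
× Disadvantages
  - Long inference time
  - Multiple rollouts required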

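Pattern C: interleaved
• VLA-World [Wang, CVPR]
  - Predictive imagination + reflective reasoning → generates the next-frame image along the trajectory derived from the action
  - The VLA reasons about and evaluates the generated imagined future
  - Improves the predicted trajectory and the ability to look ahead while driving
  - Mimics the cognitive process of human driving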

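Pattern D: unified latent space
• Directly optimizes the world model's latent state as the VLA's decision variable, realizing action-conditioned causal reasoning
  - Flow: input → WM latent (action-conditioned imagination) → VLA reasoning → action
  - WM latent as decision variable: internally evaluates how each candidate action changes the future
○ Advantages
  - Strong at long-horizon planning thanks to action-conditioned causal reasoning
  - Fast and scalable because processing happens in latent space
× Disadvantages
  - Complex architecture design
  - Learning the latent representation tends to be unstable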

Pattern D: unified latent space
• DriveWorld-VLA [Jia, arXiv]
  - Optimizes the world-model latent as a decision variable and realizes unified causal what-if reasoning through controllable imagination in a shared latent space
  - Avoids pixel-level rollouts (fast)
  - Results: NAVSIM v1 (PDMS), NAVSIM v2 (EPDMS), and collision rate on nuScenes (SoTA)
[Figure: DriveWorld-VLA paper, Figure 1, "Comparison of VLA & World Model Coupling Strategies": (a) disentangled interaction, where the world model acts outside the VLA as a reward model; (b) feature sharing; (c) DriveWorld-VLA with unified feature sharing and causal reasoning; (d) performance: PDMS 91.3 on NAVSIM and average collision rate 0.16 on nuScenes.]

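Pattern D: unified latent space (inference loop)
• What-if reasoning at inference time (DriveWorld-VLA)
  - Evaluates multiple candidate actions in parallel in latent space before outputting one
  ① Observe: sensor input → latent representation $z_t$
  ② VLA candidate generation: candidate actions $\{a_1, a_2, \dots, a_k\}$
  ③ WM what-if reasoning: a latent rollout for each $a_k$, $z^{k}_{t+1} = \mathrm{WM}(z_t, a_k)$
  ④ Evaluation: score each latent future for safety and efficiency
  ⑤ VLA output: send the best action $a^*$ to the vehicle
• Concrete example: approaching an intersection on a yellow light
  - $a_1$ hard braking → risk of being rear-ended by the following car
  - $a_2$ gentle deceleration → safe stop before the signal ★
  - $a_3$ continue as is → entering on red, risk of a violation
  - $a_4$ accelerate through → passes while still yellow, but speeding
  - Selection: $a_2$ (gentle deceleration) is output to the vehicle
※ Prediction happens in latent space rather than in pixels → real-time execution is possible (avoids generating millions of pixels)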
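The five-step loop above can be imitated with a toy version of the yellow-light example. In the sketch below, a hand-written kinematic function stands in for the learned latent world model (a real system would roll out latents $z^{k}_{t+1} = \mathrm{WM}(z_t, a_k)$ instead), and every number is hypothetical: an 11 m/s approach speed (about 40 km/h), 25 m to the stop line, 2 s until the light turns red, a 13.9 m/s speed limit, and the penalty weights. The values were chosen only so that the four candidates reproduce the qualitative outcomes listed on the slide.

```python
def rollout(v0, accel, t):
    """Toy stand-in for a world-model rollout: position and speed after t seconds at
    constant acceleration; the car stays stopped once its speed reaches zero."""
    if accel < 0 and t > v0 / -accel:
        return v0 * v0 / (2 * -accel), 0.0
    return v0 * t + 0.5 * accel * t * t, v0 + accel * t


def score(accel, v0=11.0, dist_to_line=25.0, t_red=2.0, speed_limit=13.9,
          horizon=4.0, dt=0.05):
    """Lower is better. All thresholds and penalty weights are hypothetical."""
    cost = 0.1 * abs(accel)                                    # comfort term
    if accel < -6.0:
        cost += 5.0                                            # hard braking: rear-end risk
    times = [i * dt for i in range(1, int(round(horizon / dt)) + 1)]
    states = [rollout(v0, accel, t) for t in times]
    crossing = next((t for t, (pos, _) in zip(times, states) if pos >= dist_to_line), None)
    if crossing is not None and crossing > t_red:
        cost += 10.0                                           # enters the intersection on red
    if any(speed > speed_limit for _, speed in states):
        cost += 3.0                                            # exceeds the speed limit
    return cost


candidates = {"a1 hard brake": -8.0, "a2 gentle decel": -3.0,
              "a3 keep speed": 0.0, "a4 accelerate": 2.0}
costs = {name: score(accel) for name, accel in candidates.items()}
best = min(costs, key=costs.get)        # the action a* sent to the vehicle
```

All candidates are scored against the same predicted futures before anything is executed, which is the essence of what-if reasoning; doing this with compact latents instead of rendered frames is what makes the loop fast enough for real time.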

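Summary: autonomous driving with VLA × world models
• The deeper the coupling, the better the performance, but design and training also grow more complex
  - A. Decoupled: coupling depth: shallow (independent); representative method: IRL-VLA; role of the WM: safety verification; strengths: easy to implement, reuses existing VLAs; weaknesses: no knowledge transfer, longer inference
  - B. Feature sharing: coupling depth: medium (shared representation); representative method: DriveVLA-W0; role of the WM: self-supervised signal; strengths: data scaling, E2E trainable; weaknesses: no causal reasoning, what-if is hard
  - C. Interleaved: coupling depth: deep (closed loop); representative method: VLA-World; role of the WM: imagine → reflect; strengths: high consistency, look-ahead ability; weaknesses: long inference time, many rollouts
  - D. Unified latent space: coupling depth: deepest (integrated); representative method: DriveWorld-VLA; role of the WM: decision variable; strengths: causal reasoning, fast, SoTA; weaknesses: complex design, unstable training
• Directions for development
  ① Resolving the supervision deficit → unlocking the scaling laws of large-scale data
  ② A closed "imagine → reflect → act" loop → driving cognition closer to that of humans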

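Summary: world models in autonomous driving
• Autonomous driving requires "a mechanism that safely predicts the future"
  - World models offer answers to the scarcity of rare scenarios, the risk of real-vehicle trials, and the insufficient generalization of imitation learning
• By use case, they divide broadly into control-oriented, generative, and 3D-occupancy types
  - Control-oriented (MILE): latent-space control with an RSSM
  - Generative (DriveDreamer): generates driving videos with a diffusion model
  - 3D occupancy (GaussianWorld, DLWM): predicts the future geometrically with 3D Gaussians
• Next-generation architectures combining VLAs and world models have emerged
  - Four patterns by coupling depth (decoupled → feature sharing → interleaved → unified latent space)
  - Keys to progress: resolving the supervision deficit and closing the "imagine → reflect → act" loop
• Remaining challenges: computational cost, training stability, and scalability
  - Generative models are high quality but expensive at inference; unified-latent models are complex to design and train
  - The future focus is achieving both real-time performance and generalization for deployment on real vehicles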

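Reconstruction-based and JEPA-based models
• Two learning strategies for world models
  - Prediction target: reconstruction (generative): pixels and video frames; JEPA (predictive): latent feature representations (abstract vectors)
  - Representative methods: reconstruction: PlaNet, Dreamer, GAIA-1, DriveDreamer; JEPA: I-JEPA, V-JEPA, LeWorldModel
  - Loss: reconstruction: pixel-level MSE, KL, diffusion loss; JEPA: latent-space MSE + regularization
  - Computational cost: reconstruction: high (computed in pixel space); JEPA: low (latent space only)
  - Risk of representation collapse: reconstruction: low (reconstruction acts as a constraint); JEPA: high → countermeasures are essential
  - Retention of detail: reconstruction: strong (predicts even irrelevant details); JEPA: weak (details may be ignored)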

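Joint-Embedding Predictive Architecture (JEPA) [Assran, CVPR]
• Self-supervised learning that captures the structure and regularities inherent in data by predicting the feature representations of the parts not given as input
  (a) Generative: reconstructs in observation space; energy is computed in pixel space. Representative methods: VAE, MAE, Dreamer. Problem: predicts even irrelevant details
  (b) Joint-Embedding: aligns latent representations; energy is computed in latent space. Representative methods: SimCLR, BYOL, DINO. Problem: has no prediction mechanism
  (c) JEPA: predicts in latent space; the prediction error is computed in latent space. Representative methods: I-JEPA, V-JEPA, LeWM. Problem: prone to representation collapse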

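Prior methods and their problems
• Joint-Embedding Predictive Architecture (JEPA) [LeCun, OpenReview]
  - Self-supervised learning that captures the structure and regularities inherent in data by predicting the feature representations of the parts not given as input
• Problems with JEPA
  - To prevent representation collapse, it relies on complex multi-term losses, EMA, pre-trained encoders, and auxiliary signals
  - Representation collapse: the phenomenon in which differences between inputs are lost and the feature representation becomes nearly constant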
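One simple way to see the collapse defined above is to watch how much the embeddings vary across a batch: if every input maps to nearly the same vector, the per-dimension standard deviation drops toward zero. The diagnostic below is a common heuristic rather than part of JEPA itself; the function name and the toy batches are hypothetical.

```python
import statistics


def mean_feature_std(embeddings):
    """Average over feature dimensions of the per-dimension standard deviation across a batch.

    Values near 0 indicate representation collapse: all inputs map to (almost) the same vector.
    """
    per_dim = list(zip(*embeddings))          # transpose: one tuple per feature dimension
    return sum(statistics.pstdev(d) for d in per_dim) / len(per_dim)


collapsed = [[0.3, -1.2, 0.7]] * 5                             # every input -> same embedding
healthy = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
```

A collapsed encoder trivially minimizes a latent prediction loss, because predicting a constant is easy; this is why JEPA-style models need an extra mechanism (EMA targets, auxiliary losses, or a regularizer) to keep this spread from vanishing.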

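Development of JEPA-family research
• JEPA has advanced step by step: images → video → action-conditioned → lightweight
  - I-JEPA (images, CVPR): predicts the latent representations of masked image regions; established self-supervised learning without reconstruction
  - V-JEPA (video): extends the approach to video, learning video representations with spatiotemporal masks; does not yet condition on actions
  - V-JEPA 2 (action-conditioned): extended to action-conditioned prediction, enabling action planning; however, it requires large-scale pre-training
  - LeWorldModel (lightweight, E2E): lightweight end-to-end training from raw pixels; no pre-training required; stabilized with SIGReg
• Key point: breaking away from dependence on scale. LeWorldModel is the first study to show that "JEPA works even without pre-training"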

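LeWorldModel (LeWM) [L. Maes, arXiv]
• A JEPA-style world model that predicts the next time step's feature representation conditioned on the action
  - Can be trained end to end from raw pixels in a lightweight, stable way
  - Goal: build a lightweight, simple, end-to-end JEPA-style world model
[Figure: the LeWM training pipeline]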

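Model architecture: Encoder
• Encoder: maps observations to compact, low-dimensional feature representations
  $z_t = \mathrm{enc}_\theta(o_t)$
  - $z_t$: low-dimensional feature representation at time $t$
  - $o_t$: observation at time $t$
  - $\theta$: learnable parameters of the encoder
• Base architecture
  - Configuration: ViT-tiny
  - Parameters: about … M
  - Layers: …
  - Attention heads: …
  - Hidden dimension: …
  - Patch size: …
[Figure: the LeWM training architecture]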

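Model architecture: Predictor
• Predictor: given $(z_t, a_t)$, predicts the feature representation at the next time step
  $\hat{z}_{t+1} = \mathrm{pred}_\phi(z_t, a_t)$
  - $\hat{z}_{t+1}$: predicted feature representation at time $t+1$
  - $a_t$: action at time $t$
  - $\phi$: learnable parameters of the predictor
• Base architecture
  - Configuration: ViT-S
  - Parameters: about … M
  - Layers: …
  - Attention heads: …
  - Dropout: about …
• How the action is injected
  - AdaLN is applied in every layer
  - The AdaLN parameters are zero-initialized
[Figure: the LeWM training architecture]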
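The effect of zero-initializing AdaLN can be seen in a few lines: if the layers that map the action to AdaLN's scale and shift start at zero, every block initially behaves like plain LayerNorm, and the action's influence is learned gradually from a stable starting point. The modulation form used below, $y = \mathrm{LN}(x)\,(1 + \gamma(a)) + \beta(a)$, follows the common adaLN formulation popularized by DiT; whether LeWM adds further terms (for example, a gate) is not stated on the slide, so treat this as an assumption. All names and values are hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def adaln(x, action, W_scale, W_shift):
    """AdaLN: y = LN(x) * (1 + gamma(a)) + beta(a), with gamma and beta linear in the action."""
    gamma = [sum(w * a for w, a in zip(row, action)) for row in W_scale]
    beta = [sum(w * a for w, a in zip(row, action)) for row in W_shift]
    return [n * (1.0 + g) + b for n, g, b in zip(layer_norm(x), gamma, beta)]


x = [0.5, -1.0, 2.0, 0.0]          # one token's features inside the predictor
action = [0.3, -0.7]               # hypothetical 2-D action a_t
zeros = [[0.0, 0.0] for _ in x]    # zero-initialized modulation weights
y_init = adaln(x, action, zeros, zeros)
```

At initialization `y_init` equals `layer_norm(x)` for any action, so the untrained predictor starts as an action-independent model instead of one perturbed by random action projections.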

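Loss computation
• Training objective: model the environment dynamics
  - Environment dynamics: the transition law that determines the next state from the observed state and the action
$$\mathcal{L}_{\mathrm{LeWM}} \triangleq \mathcal{L}_{\mathrm{pred}} + \lambda\,\mathrm{SIGReg}(Z)$$
(prediction error + regularization)
$$\mathcal{L}_{\mathrm{pred}} \triangleq \left\lVert \hat{z}_{t+1} - z_{t+1} \right\rVert_2^2, \qquad \hat{z}_{t+1} = \mathrm{pred}_\phi(z_t, a_t)$$
  - $\hat{z}_{t+1}$: predicted feature representation at time $t+1$
  - $z_{t+1}$: ground-truth feature representation at time $t+1$
  - $z_t$: ground-truth feature representation at time $t$
  - $a_t$: action at time $t$
  - $\phi$: learnable parameters of the predictor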

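Loss computation: Sketched Isotropic Gaussian Regularizer (SIGReg)
• A regularization term that prevents representation collapse
  - Projects the feature representations onto random one-dimensional directions
  - Uses the Epps-Pulley test statistic to regularize them toward a standard normal distribution
$$\mathrm{SIGReg}(Z) \triangleq \frac{1}{M}\sum_{m=1}^{M} T\big(h^{(m)}\big)$$
  - $Z$: features of the input data
  - $M$: number of projections
  - $h^{(m)}$: data projected onto one dimension
  - $T$: Epps-Pulley test statistic
  - $N$: history length
  - $B$: batch size
  - $d$: embedding dimension
[Figure: normalization of the representation space by SIGReg]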
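The SIGReg formula can be evaluated directly for small batches. The sketch below projects every embedding onto $M$ random unit directions and scores each 1-D projection with an Epps-Pulley-style statistic: the weighted, integrated squared gap between the sample's empirical characteristic function and that of $N(0, 1)$. The Gaussian weight $w(t) = e^{-t^2/2}$, the integration range, and the trapezoidal grid are assumptions of this sketch, not LeWM's exact settings, and in training the statistic would be used as a differentiable loss rather than merely evaluated.

```python
import cmath
import math
import random


def epps_pulley(h, t_max=5.0, knots=41):
    """Epps-Pulley-style normality statistic for a 1-D sample h against N(0, 1).

    n * integral of |ecf(t) - exp(-t^2/2)|^2 * w(t) dt with w(t) = exp(-t^2/2),
    via the trapezoidal rule on [-t_max, t_max] (weight and grid are assumptions).
    """
    n = len(h)
    ts = [-t_max + 2.0 * t_max * k / (knots - 1) for k in range(knots)]
    vals = []
    for t in ts:
        ecf = sum(cmath.exp(1j * t * x) for x in h) / n      # empirical characteristic function
        target = math.exp(-0.5 * t * t)                      # characteristic function of N(0, 1)
        vals.append(abs(ecf - target) ** 2 * math.exp(-0.5 * t * t))
    step = ts[1] - ts[0]
    return n * step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def sigreg(Z, num_projections=16, seed=0):
    """SIGReg(Z) = (1/M) sum_m T(h^(m)); h^(m) projects each embedding onto a random unit direction."""
    rng = random.Random(seed)
    d = len(Z[0])
    total = 0.0
    for _ in range(num_projections):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        h = [sum(zi * ui for zi, ui in zip(z, u)) for z in Z]   # 1-D projection of the batch
        total += epps_pulley(h)
    return total / num_projections


data_rng = random.Random(1)
Z_gauss = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]
Z_collapsed = [[0.0] * 8 for _ in range(256)]
```

Isotropic Gaussian embeddings give a small value, while collapsed embeddings (every row identical) are heavily penalized. Testing random 1-D projections instead of the full $d$-dimensional distribution is what keeps the regularizer cheap ("sketched").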

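Latent-space planning with LeWorldModel
• Uses the trained model to run "predict → evaluate → optimize" in a closed loop in latent space
  ① Encode: the encoder converts the observation $o_1$ and the goal $o_g$ into latent representations $z_1, z_g$
  ② Latent rollout: the predictor predicts $\hat{z}_2, \dots, \hat{z}_H$ conditioned on actions $a_1, \dots, a_H$
  ③ Cost evaluation: compute the latent distance between the final prediction $\hat{z}_H$ and the goal $z_g$
  ④ Action optimization: a CEM solver searches for the action sequence that minimizes the cost (steps ② to ④ are repeated)
  ⑤ Execute and replan: execute the best action $a_1$ in the real environment and return to ① with the new observation
• CEM solver: a sampling-based optimization algorithm for finding the best action sequence
[Figure: LeWorldModel paper, Figure 4, "LeWorldModel Latent Planning": given an initial observation $o_1$ and a goal $o_g$, the trained world model performs planning in the LeWM latent space.]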
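Steps ② to ④ can be sketched as a plain cross-entropy method (CEM) over action sequences. In the sketch below, `toy_predictor` is a hypothetical stand-in for the trained LeWM predictor (the latent simply moves by the action), the latents are 2-D, and the sample counts, elite size, and iteration count are illustrative choices, not the paper's settings.

```python
import random


def cem_plan(z1, z_goal, predictor, horizon=5, action_dim=2, iterations=30,
             samples=64, elites=8, seed=0):
    """Cross-entropy method over action sequences a_1..a_H in latent space.

    Cost = squared distance between the final predicted latent z_hat_H and the goal latent z_g.
    Returns the mean of the final sampling distribution (MPC executes only its first action).
    """
    rng = random.Random(seed)
    mean = [[0.0] * action_dim for _ in range(horizon)]
    std = [[1.0] * action_dim for _ in range(horizon)]

    def cost(actions):
        z = z1
        for a in actions:                                  # latent rollout z_{t+1} = pred(z_t, a_t)
            z = predictor(z, a)
        return sum((zi - gi) ** 2 for zi, gi in zip(z, z_goal))

    for _ in range(iterations):
        cands = [[[m + s * rng.gauss(0.0, 1.0) for m, s in zip(m_row, s_row)]
                  for m_row, s_row in zip(mean, std)]
                 for _ in range(samples)]
        cands.sort(key=cost)                               # lowest latent cost first
        top = cands[:elites]
        for t in range(horizon):                           # refit a diagonal Gaussian to the elites
            for j in range(action_dim):
                vals = [c[t][j] for c in top]
                mu = sum(vals) / elites
                mean[t][j] = mu
                std[t][j] = max((sum((v - mu) ** 2 for v in vals) / elites) ** 0.5, 1e-3)
    return mean


def toy_predictor(z, a):
    """Hypothetical stand-in for a trained LeWM predictor: the latent moves by the action."""
    return [zi + ai for zi, ai in zip(z, a)]


plan = cem_plan([0.0, 0.0], [3.0, -2.0], toy_predictor)
first_action = plan[0]             # step 5: execute this, observe again, and replan
```

Because every candidate is scored purely in latent space, no images are decoded during planning, which is where the speed advantage of JEPA-style world models comes from.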

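Evaluating control performance: experimental setup
• Scale of LeWM
  - A lightweight model with … M parameters
  - Training and execution on a single GPU
• Baselines
  - Two methods are compared: LeWM and DINO-WM [G. Zhou, ICML]
  - DINO-WM: a method that relies on a pre-trained model
• Evaluation protocol
  - Tasks: manipulation, navigation, and locomotion in 2D and 3D environments
  - PushT (2D environment): robot-arm manipulation
  - OGBCube (3D environment): cube manipulation
  - Evaluation axes: planning speed and planning success rate
[Figure: the PushT and OGBench Cube environments]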

Evaluating control performance: quantitative results
  - Comparison of planning speed
  - Comparison of planning success rate under fixed FLOPs

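Evaluating control performance: experimental setup
• Setup
  - Linear and nonlinear probes are trained to predict physical quantities from the feature representations obtained in the PushT environment
• Baselines
  - Three methods are compared: LeWM, PLDM [V. Sobal, NeurIPS], and DINO-WM
• Evaluation protocol
  - Task: PushT
  - Physical quantities to predict: agent position, block position, block angle
  - Metrics: MSE (mean squared error), r (correlation coefficient)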

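Evaluating control performance: results
• LeWM consistently outperforms PLDM and is on par with large-scale pre-trained models such as DINOv2
[Figure: probing physical quantities in PushT]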

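Limitations of LeWorldModel and future work
• Constraints on tasks and data
  - Evaluation covers only simple control tasks: PushT (2D) and OGBCube (3D)
  - Autonomous driving, complex scenes, and long-horizon prediction remain unverified
  - Direct application to video (time series) is left for future work
  - Transfer to real-world robots remains unverified
• Scale and generalization
  - Verification is limited to small models (ViT-tiny / ViT-S)
  - Behavior and scaling laws at larger scales are not yet understood
  - Extension to real-world sensors (LiDAR, radar) has not been attempted
  - Effectiveness on large-scale datasets has yet to be shown
• Positioning of LeWorldModel: the stage of showing that "a lightweight model works"
  - Application to driving data, video modeling, and large-scale scaling are the keys to future progress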

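Summary: LeWorldModel
• Solves JEPA's problem with SIGReg
  - A new design that prevents representation collapse by regularizing with a statistical test quantity instead of relying on EMA or auxiliary losses
• DINOv2-level representational power with only … M parameters
  - Planning success rate improves substantially on PushT and OGBCube, and physical-quantity probing is on par with large-scale pre-trained models
• Remaining challenges: video, scale, and real-world applications
  - This study centers on experiments with still images; extending to video prediction models, applying to driving data, and scaling up further remain open challenges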

References
[1] Y. LeCun, "A Path Towards Autonomous Machine Intelligence," OpenReview.
[2] D. Ha and J. Schmidhuber, "Recurrent World Models Facilitate Policy Evolution," NeurIPS.
[3] D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, "Learning Latent Dynamics for Planning from Pixels," ICML.
[4] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, "Dream to Control: Learning Behaviors by Latent Imagination," ICLR.
[5] D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, "Mastering Atari with Discrete World Models," ICLR.
[6] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, "Mastering Diverse Domains through World Models," arXiv.
[7] A. Hu, G. Corrado, N. Griffiths, Z. Murez, C. Gurau, H. Yeo, A. Kendall, R. Cipolla, and J. Shotton, "Model-Based Imitation Learning for Urban Driving," NeurIPS.
[8] X. Wang, Z. Zhu, G. Huang, X. Chen, J. Zhu, and J. Lu, "DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving," ECCV.
[9] G. Zhao, C. Ni, X. Wang, Z. Zhu, X. Zhang, Y. Wang, G. Huang, X. Chen, B. Wang, et al., "DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation," CVPR.
[10] S. Zuo, W. Zheng, Y. Huang, J. Zhou, and J. Lu, "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction," CVPR.
[11] Y. Zhu, Y. Xue, H. Zhang, G. Jiang, W. Zhou, X. Yan, J. Gao, Y. Cai, B. Liu, Z. Li, and S. Shen, "DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving," CVPR.
[12] A. Jiang, Y. Gao, Y. Wang, Z. Sun, S. Wang, Y. Heng, H. Sun, S. Tang, L. Zhu, J. Chai, J. Wang, Z. Gu, H. Jiang, and L. Sun, "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model," arXiv.
[13] Y. Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y. Wang, Y. Chen, X. Wang, Y. An, C. Tang, L. Hou, L. Fan, and Z. Zhang, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving," ICLR.
[14] G. Wang, P. Tang, X. Ren, G. Zhao, B. Feng, and C. Ma, "Learning Vision-Language-Action World Models for Autonomous Driving," CVPR Findings.
[15] Q. Liu, H. Xu, J. Li, B. Sun, Z. Hao, D. She, X. Zhu, and L. Zhang, "UniWorld-VLA: Interleaved World Modeling and Planning for Autonomous Driving," arXiv.
[16] F. Jia, L. Liu, Z. Song, C. Jia, H. Ye, X. Hao, and L. Chen, "DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving," arXiv.
[17] M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture," CVPR.
[18] A. Bardes, Q. Garrido, J. Ponce, X. Chen, M. Rabbat, Y. LeCun, M. Assran, and N. Ballas, "Revisiting Feature Prediction for Learning Visual Representations from Video," TMLR.
[19] M. Assran, A. Bardes, D. Fan, Q. Garrido, R. Howes, M. Komeili, M. Muckley, A. Rizvi, C. Roberts, K. Sinha, et al., "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning," arXiv.
[20] L. Maes, Q. Le Lidec, D. Scieur, Y. LeCun, and R. Balestriero, "LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels," arXiv.
[21] G. Zhou, H. Pan, Y. LeCun, and L. Pinto, "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning," ICML.
[22] V. Sobal, W. Zhang, K. Cho, R. Balestriero, T. G. J. Rudner, and Y. LeCun, "Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models," NeurIPS.
