Tutorial: World Models

1. What is a world model?
2. World models in autonomous driving
3. LeWorldModel

Hironobu Fujiyoshi

May 30, 2026

Transcript

  1. Tutorial: World Models
    Hironobu Fujiyoshi (Chubu University, Machine Perception and Robotics Group)
    http://mprg.jp
    [Figure: block diagram with modules labeled percept, action, Actor, World Model, Intrinsic cost, Perception, Short-term memory, configurator, Critic, Cost]
    "Figure 2: A system architecture for autonomous intelligence. All modules in this model are assumed to be 'differentiable', in that a module feeding into another one (through an arrow connecting them) can get gradient estimates of the cost's scalar output with respect to its own output. The configurator module takes inputs (not represented for clarity) from all other modules and configures them to perform the task at hand. The perception module estimates the current state of the world. The world model module predicts possible future world states as a function of imagined actions sequences proposed by the actor. The cost module computes a single scalar output called 'energy' that measures the level of discomfort of the agent. It is composed of two sub-modules, the intrinsic cost, which is immutable (not trainable) and computes the immediate energy of the current state (pain, pleasure, hunger, etc), and ..."
    Figure source: A system architecture for autonomous intelligence [LeCun, "A Path Towards Autonomous Machine Intelligence"]
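  2. What is a world model (World Models)?
    • An internal model of an AI that learns how the environment behaves and can predict future states
    • What does having a world model make possible?
      Future prediction: the outcome can be predicted before an action is taken
      Learning from virtual experience: behavior is learned inside the model, without using the real environment
      Generalization from little data: because the structure of the environment is understood, new situations are easier to handle
    Example: when a person catches a ball
      The ball's trajectory is "simulated" in the brain to decide where to put the hand
      → Humans carry "knowledge of physics" in their heads
    An AI with a world model works the same way
      It learns the environment from past experience, predicts "what will happen next", and decides on the best action
      → The neural network acquires the "laws of the world" internally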
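  3. Why world models, and why now?
    • World models are drawing attention as a key to solving the problems of existing AI and achieving more human-like intelligence
    Problems: the walls that existing AI runs into
      Reinforcement learning: requires an enormous amount of trial and error
        Trials in the real environment are dangerous and costly
        Sample efficiency is extremely low
      Imitation learning: weak in unseen situations
        Fails outside the distribution of the training data
        Does not understand the causal structure of the environment
      LLM: closed within linguistic representation
        Tasks that involve embodiment and action are difficult
    Solutions: what world models open up
      ✓ Learning inside imagination
        Virtual experience is generated within the model
        Real-environment trials are greatly reduced
      ✓ Understanding the structure of the environment
        Causal relations are acquired as internal representations
        Generalization to unseen scenarios improves
      ✓ Toward embodied intelligence
        Prediction and action in the physical world become possible
        Applications to autonomous driving and robotics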
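  4. Research lineage of world models
    • The line of development from the original World Models paper out into application domains
    1. Basic research phase
      World Models (Ha & Schmidhuber): learning inside a dream; proposes the V-M-C structure
      PlaNet (Hafner et al.): proposes the RSSM; planning in latent space
      Dreamer (Hafner et al.): policy learning in latent space; later extended to V2 and V3
    2. Application to autonomous driving
      MILE (Hu et al., Wayve): first introduction of a world model into autonomous driving
      DriveDreamer (Wang et al.): a generative world model that generates driving videos
    3. Latest JEPA-family research
      I-JEPA / V-JEPA (Assran et al., Meta): a non-generative approach that predicts in latent space
      LeWorldModel (Maes et al.): a lightweight JEPA-type world model that can be trained stably
    Research trends
      ① Observation space → latent space: efficient learning on compressed representations
      ② Games → the real world: applications to autonomous driving and robotics
      ③ Generative → predictive: the arrival of lightweight, stable JEPA-family models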
  5. World Models [Ha & Schmidhuber, NeurIPS]
    • The paper that first brought the concept of a world model into deep learning
    • Proposal: a three-module V-M-C structure
      V (Vision): a VAE compresses the image into a low-dimensional latent vector z
      M (Memory): an MDN-RNN predicts the probability distribution of the next z
      C (Controller): a single-layer linear model outputs the action
      → V and M are trained without supervision; only C is optimized with an evolution strategy (CMA-ES)
    • Results
      ① CarRacing-v0: achieved state-of-the-art performance at the time, learning steering directly from pixels
      ② VizDoom Take Cover: trained only by simulation inside the world model (the "dream"), with no use of the real environment
        → High performance even after transfer to the real environment
    [Paper excerpt: introduction of the World Models paper, describing an agent split into a large world model and a small controller model. "We first train a large neural network to learn a model of the agent's world in an unsupervised manner, and then train the smaller controller model to learn to perform a task using this world model."]
    [Figure 4: "Our agent consists of three components that work closely together: Vision (V), Memory (M), and Controller (C)"]
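  6. Outline of this tutorial
    1. What is a world model?
      Basic concepts of world models and representative architectures
      World Models / PlaNet / RSSM / Dreamer
    2. World models in autonomous driving
      Applying world models to driving scenes, and video-generation models
      MILE / DriveDreamer / DriveWorld-VLA
    3. LeWorldModel
      The latest research on a lightweight and stable JEPA-type world model
      Prior methods and their problems / model architecture / loss computation / evaluation experiments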
  7. Architecture of World Models
    • World Models [Ha & Schmidhuber, NeurIPS]
      Proposes a framework in which the environment surrounding the agent is acquired as a model by learning from observations
    • Architecture
      Vision: converts high-dimensional image data into a compact low-dimensional representation
      Memory: learns future states from past and current states
      Controller: outputs the best action based on Vision and Memory
    [Figure: Vision (V), Memory (M), and Controller (C)]
  8. Architecture of World Models: Vision
    • Vision
      Converts high-dimensional image data into a compact low-dimensional representation
    • Uses a Variational Autoencoder (VAE)
      Compresses the agent's visual information into a latent space
      Serves as the basis on which the Memory module predicts future states
    • Advantages of a VAE
      Unlike a standard Autoencoder (AE), it can account for randomness
      It can generate new variations of the data
    [Figure: Vision (V), Memory (M), and Controller (C)]
  9. Architecture of World Models: Memory
    • Memory
      Learns future states from past and current states
    • Uses an RNN
      Predicts future states using the latent space
    • Introduces an MDN (mixture density network)
      Can probabilistically predict "multiple possible futures"
      Outputs the probability distribution that the next latent state follows: $P(z_{t+1} \mid a_t, z_t, h_t)$
    [Figure: Vision (V), Memory (M), and Controller (C)]
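The mixture-density output described on this slide can be illustrated with a short sampling sketch. It is a minimal example under stated assumptions, not the authors' code: the function name `sample_mdn`, the one-dimensional parameter layout, and the way the temperature `tau` is applied (dividing the mixture logits and scaling the standard deviation) are hypothetical choices made for illustration.

```python
import numpy as np

def sample_mdn(logits, mu, sigma, tau=1.0, rng=None):
    """Draw one scalar from a Gaussian mixture described by MDN outputs.

    logits, mu, sigma: arrays of shape (K,) for K mixture components.
    tau: temperature; its exact use here is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = logits / tau
    weights = np.exp(scaled - scaled.max())   # numerically stable softmax
    weights /= weights.sum()
    k = rng.choice(len(weights), p=weights)   # pick a mixture component
    return mu[k] + sigma[k] * np.sqrt(tau) * rng.standard_normal()
```

A larger `tau` flattens the mixture weights and widens each component, so sampled futures become more varied; a small `tau` makes the prediction close to deterministic.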
  10. Architecture of World Models: Controller
    • Controller
      Outputs the best action based on Vision and Memory
    • Uses a simple linear model
      Takes the information from Vision and Memory as input
      Decides the best action
    [Figure: Vision (V), Memory (M), and Controller (C)]
  11. What World Models can do
    • The three modules are fully separated and trained one after another
      V (Vision): a VAE compresses the image into a latent vector z (unsupervised)
      M (Memory): an MDN-RNN predicts "the probability distribution of the next z" (unsupervised, action-conditioned)
      C (Controller): a single-layer linear model that outputs the action from z and the RNN hidden state h
        (only a few hundred parameters, optimized with the evolution strategy CMA-ES)
      CMA-ES: Covariance Matrix Adaptation Evolution Strategy
    • Practice "inside the dream" → drive in reality as is
      The Controller is trained in the virtual environment (the dream) generated by the world model
      Input is z (the current view) + h (memory); output is the action a
      → In Doom, training used not a single step of the real environment, and transfer to the real environment succeeded
    [Paper excerpt] "Figure 6. RNN with a Mixture Density Network output layer. The MDN outputs the parameters of a mixture of Gaussian distribution used to sample a prediction of the next latent vector z."
    "In our approach, we approximate p(z) as a mixture of Gaussian distribution, and train the RNN to output the probability distribution of the next latent vector $z_{t+1}$ given the current and past information made available to it. More specifically, the RNN will model $P(z_{t+1} \mid a_t, z_t, h_t)$, where $a_t$ is the action taken at time t and $h_t$ is the hidden state of the RNN at time t. During sampling, we can adjust a temperature parameter $\tau$ to control model uncertainty, as done in (Ha & Eck, 2017) – we will find adjusting $\tau$ to be useful for training our controller later on. This approach is known as a Mixture Density Network (Bishop, 1994) combined with a RNN (MDN-RNN) (Graves, 2013; Ha, 2017a), and has been applied in the past for sequence generation problems such as generating handwriting (Graves, 2013) and sketches (Ha & Eck, 2017)."
    $a_t = W_c [z_t \; h_t] + b_c$  (1)
    "In this linear model, $W_c$ and $b_c$ are the weight matrix and bias vector that maps the concatenated input vector $[z_t \; h_t]$ to the output action vector $a_t$."
    "Figure 8. Flow diagram of our Agent model. The raw observation is first processed by V at each time step t to produce $z_t$. The input into C is this latent vector $z_t$ concatenated with M's hidden state $h_t$ at each time step. C will then output an action vector $a_t$ for motor control, and will affect the environment. M will then take the current $z_t$ and action $a_t$ as an input to update its own hidden state to produce $h_{t+1}$ to be used at time t + 1."
    Effects shown experimentally (tables in the paper): CarRacing-v0 and DoomTakeCover-v0
    ※ With a small Controller plus an internal model, policy learning from visual input and transfer were demonstrated
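Equation (1) quoted on this slide is small enough to write out directly. The sketch below is an illustration rather than the paper's code: the helper names are hypothetical, and the dimensions used in the usage note (a 32-dimensional z, a 256-dimensional h, a 3-dimensional action) are assumptions chosen only to show why such a controller has just a few hundred parameters.

```python
import numpy as np

def controller_action(z_t, h_t, W_c, b_c):
    """Linear controller: a_t = W_c [z_t h_t] + b_c."""
    x = np.concatenate([z_t, h_t])   # concatenated input vector [z_t h_t]
    return W_c @ x + b_c

def num_params(z_dim, h_dim, a_dim):
    """Weights plus biases of the single linear layer."""
    return a_dim * (z_dim + h_dim) + a_dim
```

With the assumed sizes, `num_params(32, 256, 3)` gives 867 parameters, which is small enough for a gradient-free optimizer such as CMA-ES to search.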
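  12. Deep Planning Network (PlaNet) [Hafner, ICML]
    • Evolves the "separate training" of World Models into E2E (end-to-end) training
      World Models: V, M, and C are trained independently, one after another
      PlaNet: latent states, future prediction, and reward prediction are learned from images together under a single objective function
      → Inconsistencies between modules disappear and prediction accuracy improves
    • One more difference: it has no policy and plans on the spot
      World Models: the trained Controller outputs an action immediately
      PlaNet: at every step it simulates the future in latent space and searches over action sequences
    • Architecture (everything operates in latent space)
      Latent dynamics model: predicts future states conditioned on actions
      Reward model: predicts the reward from each state
      Planner: searches for the action sequence that maximizes predicted reward (CEM)
    [Figure: Deep Planning Network, RSSM]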
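  13. Future prediction model: RSSM (Recurrent State Space Model)
    • The heart of PlaNet, the "latent dynamics model", is in fact the RSSM
      The core module responsible for "predicting future states conditioned on actions"
      It is exactly the part that builds latent states from images and predicts the future in latent space
    • Prior methods and their problems
      Deterministic model (RNN): the same input always gives the same output → cannot represent branching futures
      Stochastic model (SSM): the next state is decided by sampling from a normal distribution → weak long-term memory
    • The RSSM is a state-space model that integrates a deterministic RNN with a stochastic SSM
      Deterministic memory update: a GRU remembers past states and actions, $h_t = f(h_{t-1}, s_{t-1}, a_{t-1})$
      Stochastic state generation: a probability distribution is output from the memory and sampled, $s_t \sim p(s_t \mid h_t)$
      Observation and reward generation: the image and reward are reconstructed from the state $(h_t, s_t)$
      → Deterministic memory $(h_t)$ plus stochastic state $(s_t)$ gives both long-term memory and uncertainty
    [Figure: Recurrent State Space Model]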
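The two RSSM update equations on this slide can be traced in a few lines. This is a toy sketch under stated assumptions, not the PlaNet implementation: the class name `TinyRSSM` is hypothetical, a plain tanh recurrent cell is swapped in for the GRU named on the slide, the weights are random rather than trained, and the stochastic state is a diagonal Gaussian.

```python
import numpy as np

class TinyRSSM:
    """Toy recurrent state-space step: deterministic h_t, stochastic s_t."""

    def __init__(self, h_dim, s_dim, a_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        in_dim = h_dim + s_dim + a_dim
        # Untrained random weights; a real model would learn these.
        self.W_h = self.rng.normal(0.0, 0.1, (h_dim, in_dim))
        self.W_mu = self.rng.normal(0.0, 0.1, (s_dim, h_dim))
        self.W_logstd = self.rng.normal(0.0, 0.1, (s_dim, h_dim))

    def step(self, h_prev, s_prev, a_prev):
        # Deterministic path: h_t = f(h_{t-1}, s_{t-1}, a_{t-1})
        # (tanh cell used here in place of a GRU).
        h = np.tanh(self.W_h @ np.concatenate([h_prev, s_prev, a_prev]))
        # Stochastic path: s_t ~ p(s_t | h_t), a diagonal Gaussian.
        mu = self.W_mu @ h
        std = np.exp(self.W_logstd @ h)
        s = mu + std * self.rng.standard_normal(mu.shape)
        return h, s
```

Calling `step` twice with the same inputs returns the same `h` but different samples `s`, which is the split between long-term memory and uncertainty that the slide describes.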
  14. Dreamer [Hafner, ICLR]
    • Uses the same RSSM as PlaNet while evolving from "planning" to "policy learning"
      Environment dynamics are learned with the RSSM of the previous slide (the shared foundation)
      The difference is how actions are chosen: Dreamer learns a policy directly
      Reinforcement learning is carried out inside imagined trajectories (imagination)
    • Problems of PlaNet (recap from two slides back)
      Actions are optimized on the spot (planning-based, CEM)
      Optimization is needed at every step → inference is slow, and long-horizon planning is also weak
    • Dreamer's solution
      Predicts the future in latent space and learns a policy (Actor-Critic)
      The Critic estimates value beyond the imagination horizon → strong at long-horizon planning
      Only the policy is used at inference → no planning required, hence fast
    [Paper excerpt: abstract and introduction of the Dreamer paper, arXiv:1912.01603v3 [cs.LG] 17 Mar 2020. "We present Dreamer, a reinforcement learning agent that solves long-horizon tasks from images purely by latent imagination. We efficiently learn behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the compact state space of a learned world model."]
    [Figure 1: Dreamer. Panels: Dataset of Experience; Learned Latent Dynamics; Value and Action Learned by Latent Imagination]
      ① Collecting experience data
      ② Learning the latent dynamics
      ③ Learning the policy and value by imagination in latent space
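  15. Evolution of Dreamer: V1 → V2 → V3
    • Dreamer, which learns a policy in latent space, became general-purpose through three generations of improvements
    Dreamer (ICLR), V1 (first generation)
      Main contribution: learns the policy directly in latent space
      Latent representation: Gaussian distribution (continuous)
      Target tasks: DeepMind Control Suite
      Characteristics: RSSM-based; actor-critic in imagination
    DreamerV2 (ICLR), V2 (discretization)
      Main contribution: discrete latent variables + KL balancing
      Latent representation: categorical
      Target tasks: Atari (surpassed human level)
      Characteristics: the discrete representation improves expressiveness and stability
    DreamerV3 (arXiv), V3 (generalization)
      Main contribution: symlog prediction + fixed hyperparameters
      Latent representation: categorical + symlog
      Target tasks: many tasks (single configuration)
      Characteristics: the first AI to obtain diamonds in Minecraft
    "Introduction of discrete representations" and "scale-invariant learning" → influenced the latent-space design of later autonomous-driving and JEPA-family models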
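The "symlog prediction" credited to DreamerV3 on this slide refers, as far as I understand the paper, to squashing targets with a symmetric logarithm so that very large and very small magnitudes can share one set of hyperparameters. The sketch below states that transform and its inverse; the function names follow the paper's terminology, but the code itself is only an illustration.

```python
import math

def symlog(x):
    """Symmetric log: sign(x) * ln(|x| + 1)."""
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)
```

A network trained to predict `symlog(target)` is decoded with `symexp`, so rewards or values that differ by orders of magnitude across tasks are compressed into a similar numeric range.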
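  16. Summary: what is a world model?
    • A world model is "a predictor that has internalized the behavior of the environment"
      By realizing inside an AI the same mechanism with which humans simulate in their heads, it enables future prediction and learning from virtual experience
    • The representative architecture is Vision + Memory + Controller
      Starting from World Models (VAE + RNN + MLP), PlaNet achieved latent-space prediction with the RSSM, and Dreamer achieved policy learning inside latent space
    • The essence of the evolution is "observation space → latent space" and "planning → policy learning"
      As computational efficiency and training stability improved, application moved from games to the real world (autonomous driving, robots)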
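  18. Why does autonomous driving need world models?
    • World models address four fundamental problems that conventional autonomous-driving technology faces
    Rarity of dangerous scenarios
      Rare events such as accidents, hard braking, and pedestrians darting out are scarce in real driving data
      → The world model generates unexperienced scenarios, contributing to data augmentation and long-tail coverage
    Risk of real-vehicle trials
      Running a real vehicle under reinforcement learning is unrealistic in terms of both accident risk and cost
      → Virtual driving is carried out inside the world model, so the policy is evaluated and improved safely
    Limits of imitation learning
      Merely copying expert data does not yield an understanding of the environment's causality, so it fails in unseen situations
      → The world model acquires the dynamics of the environment, strengthening generalization to out-of-distribution scenarios
    Integration of prediction and planning
      Conventional E2E systems connect perception directly to action and have no internal representation for future prediction
      → The world model integrates "prediction + planning", gaining interpretability and look-ahead ability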
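  19. MILE [Hu, NeurIPS] (Wayve)
    • An autonomous-driving world model that learns the driving environment with an RSSM while learning a policy from expert data by imitation
      Environment dynamics are learned with the RSSM
      The policy is learned from Expert data by imitation
    • Architecture
      Vision: image → BEV → latent representation
      Memory: state transitions are modeled with the RSSM; the future is predicted conditioned on actions
      Controller: policy network (MLP)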
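  20. MILE [Hu, NeurIPS] (Wayve)
    • Training phase (learning from previously collected driving data)
      Vision: compresses the image into a latent feature $x_t$
      Memory (RSSM): learns the current state $(h_t, s_t)$ from past states and actions
      Decoder: reconstructs the image and BEV from the state → embeds the structure of the environment in the latent
      Policy (imitation learning): predicts the action from the state and minimizes the error against the expert action
    • Inference phase (driving the vehicle in CARLA)
      Observing (observation mode): the state is estimated from the camera image → the Policy outputs the action → the vehicle executes it
      Imagining (imagination mode): with no observation, the RSSM predicts future states → the decoder generates the future BEV
    → Combining environment-dynamics learning with imitation learning yields a driving policy safely, without reinforcement learning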
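  21. DriveDreamer [Wang, ECCV]
    • A generative World Model built from real-world driving scenarios
      Designed as a framework that integrates video generation and action prediction
    • Diffusion Model
      Used for generating and understanding the environment
      Generates video from HDMap, 3D boxes, and text
    • ActionFormer
      Time-series modeling with GRU + attention
      Estimates future structural states from past actions
    • Action Decoder
      Takes diffusion features + past actions as input and produces the policy with an MLP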
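  22. MILE and DriveDreamer
    • Even among "world models for autonomous driving", the purpose, output, and usage differ
    MILE: control-oriented
      Output: latent state + action (steering, speed)
      Purpose: closed-loop policy learning
      Strengths: lightweight, runs in real time, drives directly in CARLA
      Weaknesses: requires reconstruction of observations; limited interpretability of the latent
      Example use: end-to-end driving inside a simulator
    DriveDreamer: generative
      Output: high-quality driving video
      Purpose: data generation and scenario expansion
      Strengths: synthesizes diverse situations from HDMap and text
      Weaknesses: high inference cost; real-time control is difficult
      Example use: rare-scenario generation, creation of evaluation data
    Division of roles: the control-oriented type is "for driving", the generative type is "for training"
      The roles are split between the on-vehicle AI and the data-generation pipeline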
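  23. Gaussian-Splatting-based world models for autonomous driving: image-space approach
    • DriveDreamer4D [Zhao, CVPR]
      A video-generation world model (DriveDreamer) + Gaussian Splatting
      Uses the world model as a "data machine" to generate videos of novel trajectories (lane change, acceleration and deceleration, etc.)
      The generated videos are fed into Gaussian Splatting so that viewpoints outside the training-data distribution are also reconstructed in high quality
      The future is generated in image space, and Gaussian Splatting ensures spatial consistency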
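  24. Gaussian-Splatting-based world models for autonomous driving: latent-space approach
    • 3D Gaussians are used as the internal representation, and future occupancy is predicted in 3D space
      Redefinition of the problem: occupancy forecasting conditioned on the current observation
    • GaussianWorld [Zuo, CVPR]
      Decomposes scene evolution into three factors and predicts in 3D Gaussian space
      ① Ego-motion compensation: the whole static scene is aligned using ego-motion
      ② Local motion of dynamic objects: only the moving Gaussians are predicted individually
      ③ Completion of new observations: regions newly visible in the current frame are added
      Result: on nuScenes, mIoU improves over the single-frame baseline (margin missing in this transcript) with no additional computation
    • Dual Latent World Models (DLWM) [Zhu, CVPR]
      Two-stage Gaussian-centric pre-training
      ① Gaussian-flow guidance → for occupancy prediction
      ② Ego-planning guidance → for motion planning
      → State of the art on the three tasks of occupancy perception, occupancy forecasting, and motion planning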
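  25. Autonomous driving with VLA × world models
    • VLA (Vision-Language-Action)
      ◎ Strengths: high-level reasoning through language, instruction understanding, scene understanding, interpretability
      × Limits: weak modeling of temporal dynamics → Supervision Deficit (only sparse action signals)
    • World Model
      ◎ Strengths: future prediction, learning from virtual experience, internal acquisition of environment dynamics
      × Limits: cannot perform high-level reasoning through language → cannot "evaluate" the futures it generates
    • VLA × world model
      ① Imagine, then drive: internally simulate "what happens if I move like this" before acting
      ② Resolving the Supervision Deficit: future prediction provides dense self-supervised signals and amplifies the data scaling law
      ③ Action-conditioned causal reasoning: can reason about how the future changes depending on its own action (What-if)
      ④ Handling the long tail: the LLM's world knowledge fills in rare scenarios, breaking through the limits of imitation learning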
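  26. Four coupling architecture patterns
    • Couplings of a VLA and a world model fall into four broad classes according to the depth of the coupling
    Pattern A: Decoupled (External Simulator)
      [Diagram: VLA, WM; external, independent]
      The world model runs independently as an external simulator
      Representative method: IRL-VLA (reward evaluation, RL training)
    Pattern B: Feature Sharing
      [Diagram: Enc → WM, VLA; shared representation, parallel]
      WM and VLA outputs are produced in parallel from a shared encoder
      Representative method: DriveVLA-W0 (pre-training, data scaling)
    Pattern C: Interleaved
      [Diagram: VLA → WM → VLA → WM; closed loop, prediction and planning alternate]
      Prediction and planning are executed alternately in a closed loop
      Representative method: VLA-World (reflective reasoning)
    Pattern D: Unified Latent
      [Diagram: Unified Latent of WM and VLA; unified, What-if reasoning]
      The WM latent is optimized in a unified space as the decision variable
      Representative method: DriveWorld-VLA (causal What-if reasoning)
    Depth of coupling (deepening from left to right): decoupled → parallel → alternating → unified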
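  27. Pattern A: Decoupled
    • The world model runs as an independent external simulator and verifies the VLA's action after the fact
    [Diagram: input (image + instruction) → VLA → proposed action → verification by World Model rollout → accept / reject]
    The world model is placed outside the VLA and verifies the safety of the action after the fact
    ◎ Advantages
      ・Simple to implement; can be added to an existing VLA
      ・Modules are independent, so development and maintenance are easy
      ・A clear division of roles as safety verification
    × Disadvantages
      ・Structural separation makes transfer of latent knowledge difficult
      ・Inference time increases (cost of the rollouts)
      ・Does not help improve the VLA's representational power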
  28. Pattern A: Decoupled
    • IRL-VLA [Jiang, arXiv]
      Trains the VLA policy with a Reward World Model
      The world model is used independently as an external reward evaluator
      Reinforcement learning with PPO optimizes safety, comfort, and efficiency
      State of the art on NAVSIM v2; first runner-up in the CVPR AGC
    [Figure caption: "Figure 2. Overview of the IRL-VLA Framework. This figure illustrates the three-stage pipeline of our close-loop Reinforcement Learning via Reward World Model framework for Vision-Language-Action (VLA) in autonomous driving. a) Imitation Policy Learning initializes the ..."]
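  29. Pattern B: Feature Sharing
    • World-model prediction and action prediction are output in parallel from a shared encoder, so the representation is shared
    ◎ Advantages
      ・End-to-end training is possible
      ・Makes full use of the representational power of large-scale data
      ・Achieves state of the art on the NAVSIM benchmark
    × Disadvantages
      ・Lacks action-conditioned causal reasoning
      ・Counterfactual (What-if) imagination is limited
    [Diagram: input → shared Encoder → World Model (VLA Expert) → future image; Action Expert (lightweight) → trajectory]
    ✱ Future-image prediction provides dense self-supervised signals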
  30. Pattern B: Feature Sharing
    • DriveVLA-W0 [Li, ICLR]
      Supervision Deficit (shortage of supervisory signal): the imbalance of training a VLA with hundreds of millions of parameters using only action signals of a few dimensions
      → In addition to sparse action signals (trajectories), dense future-image prediction (self-supervision) is used
      Amplifies the data scaling law (validated across data scales)
    [Paper header: "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving", NLPR, Institute of Automation, Chinese Academy of Sciences (CASIA) and Yinwang Intelligent Technology Co. Ltd.; code: https://github.com/BraveGroup/DriveVLA-W0; [cs.CV] 18 Dec 2025]
    [Figure 1: "World modeling as a catalyst for VLA data scalability." (a) Action Prediction vs. World Modeling: a VLA with sparse action supervision versus DriveVLA-W0 with visual and action supervision. (b) Scaling with Data Size: collision rate (‱) against number of frames (700K, 7M, 70M) for TransFuser, VLA (Action Prediction), and DriveVLA-W0 (World Modeling), annotated "Decrease 20.4%".]
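  31. Pattern C: Interleaved
    • Prediction and planning are executed alternately in a closed loop, realizing a cycle of imagination → reflection → trajectory correction
    ◎ Advantages
      ・Consistency through the closed loop
      ・Look-ahead ability improves greatly
    × Disadvantages
      ・Inference time is long
      ・Multiple rollouts are needed
    Closed loop
      ① VLA: the VLA generates an initial trajectory
      ② World Model: imagines the next frame based on the action
      ③ VLA: the VLA reflects on and reasons about the imagined future
      ④ VLA: the VLA corrects and outputs the trajectory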
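  32. Pattern D: Unified Latent
    • The latent state of the world model is optimized directly as the VLA's decision variable, realizing action-conditioned causal reasoning
    ◎ Advantages
      ・Strong at long-horizon planning thanks to action-conditioned causal reasoning
      ・Fast and scalable because processing stays in latent space
    × Disadvantages
      ・Architecture design is complex
      ・Training of the latent representation tends to be unstable
    WM latent = decision variable: how each candidate action changes the future is evaluated internally
    [Diagram: input → (action-conditioned imagination) WM latent → VLA reasoning → action]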
  33. Pattern D: Unified Latent
    • DriveWorld-VLA [Jia, arXiv]
      Optimizes the world-model latent as the decision variable and realizes unified causal What-if reasoning through controllable imagination in a shared latent space
      Avoids pixel-level rollout (fast)
      State of the art: PDMS on NAVSIM v1, EPDMS on NAVSIM v2, and collision rate (CR) on nuScenes
    [Paper header: "DriveWorld-VLA: Unified Latent-Space World Modeling with Vision–Language–Action for Autonomous Driving", Feiyang Jia, Lin Liu, Ziying Song, Caiyan Jia, Hangjun Ye, Xiaoshuai Hao, Long Chen; [cs.CV] 6 Feb 2026]
    [Figure 1: "Comparison of VLA & World Model Coupling Strategies." (a) Disentangled Interaction (limited knowledge transfer); (b) Feature-Sharing (absence of prospective rollout); (c) Our DriveWorld-VLA (unified feature sharing and causal reasoning); (d) Performance: PDMS in NAVSIM and average collision rate in nuScenes (values shown: 91.3 and 0.16).]
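  34. Pattern D: Unified Latent
    • What-if reasoning at inference time (DriveWorld-VLA)
      Multiple action candidates are evaluated in parallel in latent space before one is output
    Inference loop
      ① Observe: sensor input → latent representation $z_t$
      ② VLA candidate generation: action candidates $\{a_1, a_2, \ldots, a_k\}$
      ③ WM What-if reasoning: a latent rollout for each $a_k$, $z^{k}_{t+1} = \mathrm{WM}(z_t, a_k)$
      ④ Evaluation: each latent future is scored for safety and efficiency
      ⑤ VLA output: the best action $a^{*}$ is sent to the vehicle
    Concrete example: an intersection with a yellow light
      Hard braking $a_1$ → risk of a rear-end collision from the following vehicle
      Gentle deceleration $a_2$ → safe stop before the signal
      Pass through as is $a_3$ → entering on red; risk of a violation
      Accelerate and pass $a_4$ → passes while still yellow, but exceeds the speed limit
      Selection: $a_2$ (gentle deceleration) is output to the vehicle ★
    ※ Prediction is done in latent space rather than in pixels → real-time execution is possible (generating millions of pixels is avoided)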
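The five-step inference loop on this slide reduces to "roll every candidate forward in latent space, score the imagined futures, take the best". The sketch below shows only that selection logic. Everything concrete in it is a hypothetical stand-in: `toy_world_model`, `toy_score`, the two-number latent, and the candidate values are invented for illustration and are not part of DriveWorld-VLA.

```python
import numpy as np

def select_action(z_t, candidates, world_model, score):
    """Roll each candidate one step in latent space and pick the best-scoring one."""
    futures = [world_model(z_t, a) for a in candidates]     # z^k_{t+1} = WM(z_t, a_k)
    scores = np.array([score(z_next) for z_next in futures])
    best = int(np.argmax(scores))
    return best, scores

# Toy stand-ins (hypothetical): latent = [speed], action = change in speed.
def toy_world_model(z, a):
    return np.array([z[0] + a, abs(a)])   # [next speed, magnitude of the maneuver]

def toy_score(z_next):
    speed, effort = z_next
    # Prefer ending near a target speed, and penalize abrupt maneuvers.
    return -abs(speed - 30.0) - 0.5 * max(0.0, effort - 15.0)

candidates = [-30.0, -10.0, 0.0, 10.0]   # hard brake, gentle brake, hold, accelerate
best, scores = select_action(np.array([40.0]), candidates, toy_world_model, toy_score)
```

In this toy setup the gentle deceleration wins because the hard brake is penalized for abruptness and the other two end far from the target speed. The design point carried over from the slide is that no pixels are generated: only small latent vectors are rolled out and scored.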
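  35. Summary: autonomous driving with VLA × world models
    • Performance improves as the coupling deepens, but the complexity of design and training also grows
    Aspect | A: Decoupled | B: Feature Sharing | C: Interleaved | D: Unified Latent
    Depth of coupling | Shallow (independent) | Medium (shared representation) | Deep (closed loop) | Deepest (unified)
    Representative method | IRL-VLA | DriveVLA-W0 | VLA-World | DriveWorld-VLA
    Role of the WM | Safety verification | Self-supervised signal | Imagination → reflection | Decision variable
    Strengths | Easy to implement; reuses an existing VLA | Data scaling; E2E training possible | High consistency; look-ahead ability | Causal reasoning possible; fast, state of the art
    Weaknesses | No knowledge transfer; longer inference | No causal reasoning; What-if is difficult | Long inference time; many rollouts | Complex design; unstable training
    Directions of development
      ① Resolving the Supervision Deficit → activating the scaling law for large-scale data
      ② The closed loop of "imagination → reflection → action" → realizing driving cognition closer to that of humans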
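  36. Summary: world models in autonomous driving
    • Autonomous driving requires "a mechanism for predicting the future safely"
      World models offer an answer to the rarity of rare scenarios, the risk of real-vehicle trials, and the insufficient generalization of imitation learning
    • By use, they fall into control-oriented, generative, and occupancy-based types
      Control-oriented (MILE): latent-space control with an RSSM
      Generative (DriveDreamer): driving videos generated with a diffusion model
      Occupancy-based (GaussianWorld / DLWM): the future is predicted geometrically with 3D Gaussians
    • Next-generation architectures based on VLA × world models have appeared
      Four patterns by depth of coupling (decoupled → feature sharing → interleaved → unified latent)
      Resolving the Supervision Deficit and the closed loop of "imagination → reflection → action" are the keys to further development
    • The remaining problems are computational cost, training stability, and scalability
      Generative models are high-quality but costly at inference; the unified-latent type is complex to design and train
      Reconciling real-time operation with generalization performance for deployment on real vehicles is the focus from here on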
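  38. Reconstruction-based and JEPA-based models
    • Two learning strategies for world models
    Aspect | Reconstruction-based (Generative) | JEPA-based (Predictive)
    Prediction target | Pixels / video frames | Latent feature representation (abstract vectors)
    Representative methods | PlaNet, Dreamer, GAIA-1, DriveDreamer | I-JEPA, V-JEPA, LeWorldModel
    Loss | Pixel-level MSE / KL / diffusion loss | Latent-space MSE + regularization
    Computational cost | High (computed in pixel space) | Low (latent space only)
    Risk of representation collapse | Low (reconstruction acts as a constraint) | High → countermeasures are essential
    Retention of detail | Strong (predicts even irrelevant details) | Weak (details may be ignored)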
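  39. Joint Embedding Predictive Architecture (JEPA) [Assran, CVPR]
    • Self-supervised learning that learns the structure and regularities inherent in the data by predicting the feature representation of the part that was not given as input
    (a) Generative: reconstructs in observation space
      Energy is computed in pixel space
      Representative methods: VAE, MAE, Dreamer
      Problem: predicts even irrelevant details
    (b) Joint Embedding: aligns latent representations
      Energy is computed in latent space
      Representative methods: SimCLR, BYOL, DINO
      Problem: has no prediction mechanism
    (c) JEPA: predicts in latent space
      Prediction error is computed in latent space
      Representative methods: I-JEPA, V-JEPA, LeWM
      Problem: prone to representation collapse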
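  40. Prior methods and their problems
    • Joint Embedding Predictive Architecture (JEPA) [LeCun, OpenReview]
      Self-supervised learning that learns the structure and regularities inherent in the data by predicting the feature representation of the part that was not given as input
    • Problems of JEPA
      To prevent representation collapse, it relies on complicated multi-term losses, EMA, pre-trained encoders, and auxiliary signals
      Representation collapse: the phenomenon in which differences between inputs are lost and the feature representation becomes almost constant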
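  41. Development of JEPA-family research
    • JEPA developed in stages: images → video → action-conditioned → lightweight
    I-JEPA (images), CVPR
      Predicts the latent representation of masked image regions
      Established self-supervised learning that needs no reconstruction
    V-JEPA (video)
      Extension to video; learns video representations with spatio-temporal masks
      Not yet conditioned on actions
    V-JEPA 2 (action-conditioned)
      Extended to be action-conditioned, making even action planning possible
      However, it requires large-scale pre-training
    LeWorldModel (lightweight, E2E)
      Lightweight end-to-end training from raw pixels
      No pre-training needed; stabilized with SIGReg
    Point of note: moving away from dependence on scale. LeWorldModel is the first study to show that "JEPA works even without pre-training"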
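  42. Model architecture
    • Encoder
      Maps the observation to a compact low-dimensional feature representation: $z_t = \mathrm{enc}_\theta(o_t)$
      Base architecture: ViT-tiny. The slide also lists the parameter count (in millions), number of layers, number of attention heads, hidden dimension, and patch size; the numeric values are missing in this transcript.
    [Figure: training architecture of LeWM]
      $z_t$: the low-dimensional feature representation at time $t$
      $o_t$: the observation at time $t$
      $\theta$: the trainable parameters of the Encoder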
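  43. Model architecture
    • Predictor
      Given $z_t$ and $a_t$, predicts the feature representation at the next time step: $\hat{z}_{t+1} = \mathrm{pred}_\phi(z_t, a_t)$
      Base architecture: ViT-S. The slide also lists the parameter count (in millions), number of layers, number of attention heads, and dropout rate; the numeric values are missing in this transcript.
      Mechanism for integrating the action
        AdaLN is applied in every layer
        The AdaLN parameters are zero-initialized
    [Figure: training architecture of LeWM]
      $\hat{z}_{t+1}$: the predicted feature representation at time $t+1$
      $a_t$: the action at time $t$
      $\phi$: the trainable parameters of the Predictor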
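The slide says the action enters the Predictor through AdaLN with zero-initialized parameters. The sketch below shows what that combination implies in general. It is an assumption-laden illustration, not LeWM's code: the class name `AdaLN`, the `(1 + scale)` modulation form, and the single linear projection of the action are choices borrowed from common adaptive-LayerNorm designs.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Plain LayerNorm over the last axis, without learned affine terms."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLN:
    """Adaptive LayerNorm: scale and shift are predicted from the action."""

    def __init__(self, dim, a_dim):
        # Zero initialization: at the start of training the action has no effect.
        self.W = np.zeros((2 * dim, a_dim))
        self.b = np.zeros(2 * dim)

    def __call__(self, x, a):
        scale, shift = np.split(self.W @ a + self.b, 2)
        return layer_norm(x) * (1.0 + scale) + shift
```

Because the modulation starts at zero, the layer initially behaves exactly like an ordinary LayerNorm for any action, and the influence of the action is learned gradually, which is one common motivation for this initialization.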
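  44. Loss computation
    • Training objective: model the environment dynamics
      Environment dynamics: the transition law describing how the next state is determined by the observed state and the action
    $\mathcal{L}_{\mathrm{LeWM}} \triangleq \mathcal{L}_{\mathrm{pred}} + \lambda \, \mathrm{SIGReg}(Z)$  (prediction error + regularization)
    $\mathcal{L}_{\mathrm{pred}} \triangleq \lVert \hat{z}_{t+1} - z_{t+1} \rVert_2^2$
    $\hat{z}_{t+1} = \mathrm{pred}_\phi(z_t, a_t)$
      $\hat{z}_{t+1}$: the predicted feature representation at time $t+1$
      $z_{t+1}$: the ground-truth feature representation at time $t+1$
      $\phi$: the trainable parameters of the Predictor
      $z_t$: the ground-truth feature representation at time $t$
      $a_t$: the action at time $t$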
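The two-term objective on this slide can be assembled as follows. This is a minimal sketch with assumptions stated up front: the function names are hypothetical, the squared error is averaged over the batch (the slide does not say how it is reduced), and the regularizer is passed in as a callable so the example stays independent of any particular SIGReg implementation.

```python
import numpy as np

def prediction_loss(z_hat_next, z_next):
    """Squared L2 distance between predicted and target latents, averaged over the batch."""
    return float(np.mean(np.sum((z_hat_next - z_next) ** 2, axis=-1)))

def lewm_loss(z_hat_next, z_next, Z, regularizer, lam):
    """L_LeWM = L_pred + lambda * regularizer(Z)."""
    return prediction_loss(z_hat_next, z_next) + lam * regularizer(Z)
```

The prediction term alone can be driven to zero by mapping every input to the same vector, which is why the second term, weighted by `lam`, is needed to keep the latent space from collapsing.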
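  45. Loss computation: Sketched Isotropic Gaussian Regularizer (SIGReg)
    • A regularization term that prevents representation collapse
      The feature representation is projected onto random one-dimensional directions
      Each projection is regularized with the Epps-Pulley test statistic so that it follows a standard normal distribution
    [Figure: the process by which SIGReg normalizes the representation space]
    $\mathrm{SIGReg}(Z) \triangleq \frac{1}{M} \sum_{m=1}^{M} T(h^{(m)})$
      $Z$: the features of the input data
      $M$: the number of projections
      $h^{(m)}$: the data projected onto one dimension
      $T$: the Epps-Pulley test statistic
      $N$: the length of the history
      $B$: the batch size
      $d$: the embedding dimension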
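The definition above, an average of a univariate normality statistic over random one-dimensional projections, can be sketched numerically. The code is an illustration under explicit assumptions, not the LeWM implementation: the Epps-Pulley statistic is written as a weighted squared distance between the empirical characteristic function and that of a standard normal, the Gaussian weight and the integration grid `T_GRID` are choices made here, and the function names are hypothetical.

```python
import numpy as np

T_GRID = np.linspace(-5.0, 5.0, 101)   # assumed integration grid

def epps_pulley(h):
    """Distance between the empirical characteristic function of h and that of N(0, 1)."""
    ecf = np.exp(1j * np.outer(T_GRID, h)).mean(axis=1)   # empirical CF at each t
    target = np.exp(-0.5 * T_GRID ** 2)                   # CF of the standard normal
    weight = np.exp(-0.5 * T_GRID ** 2)                   # assumed weight function
    dt = T_GRID[1] - T_GRID[0]
    return float(len(h) * np.sum(np.abs(ecf - target) ** 2 * weight) * dt)

def sigreg(Z, num_proj=64, seed=0):
    """Average the statistic over random unit directions: (1/M) * sum_m T(h^(m))."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((Z.shape[1], num_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    H = Z @ dirs                                           # (n, M) one-dimensional projections
    return float(np.mean([epps_pulley(H[:, m]) for m in range(num_proj)]))
```

A collapsed representation, where every row of `Z` is the same vector, gives a much larger value than isotropic Gaussian features, so minimizing this term pushes the embeddings away from collapse.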
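  46. Latent-space planning with LeWorldModel
    • Using the trained model, "predict → evaluate → optimize" is executed in a closed loop in latent space
      ① Encode: the Encoder converts the observation $o_1$ and the goal $o_g$ into latent representations $z_1, z_g$
      ② Latent rollout: the Predictor predicts $\hat{z}_2, \ldots, \hat{z}_H$ conditioned on the actions $a_1, \ldots, a_H$
      ③ Cost evaluation: the latent distance between the final prediction $\hat{z}_H$ and the goal $z_g$ is computed
      ④ Action optimization: the CEM solver searches for the action sequence that minimizes the cost (② to ④ are repeated)
      ⑤ Execute and re-plan: the best action $a_1$ is executed in the real environment, and the loop returns to ① with the new observation
    [Figure 4: "LeWorldModel Latent Planning. Given an initial observation o1 and a goal og, the world model learned in Fig. 2 performs planning in the LeWM latent space. The initial state embedding z and the goal ..." Diagram: Encoder, Predictor (repeated over the horizon), Cost, solver updating the actions.]
    CEM solver: a sampling-based optimization algorithm for finding the best action sequence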
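Steps ② to ④ of the loop above can be sketched with a generic cross-entropy method. This is a toy illustration under stated assumptions, not the LeWM planner: `cem_plan` and its hyperparameters (population, elite count, iterations) are hypothetical, the cost is the squared latent distance to the goal, and the predictor is supplied by the caller.

```python
import numpy as np

def cem_plan(z1, zg, predictor, horizon, a_dim, iters=10, pop=200, elite=20, seed=0):
    """Cross-entropy method over action sequences, minimizing the final latent distance."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, a_dim))
    std = np.ones((horizon, a_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        actions = mean + std * rng.standard_normal((pop, horizon, a_dim))
        costs = np.empty(pop)
        for i in range(pop):
            z = z1
            for a in actions[i]:
                z = predictor(z, a)            # latent rollout
            costs[i] = np.sum((z - zg) ** 2)   # distance of the final latent to the goal
        # Refit the sampling distribution to the lowest-cost sequences.
        elites = actions[np.argsort(costs)[:elite]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6
    return mean
```

With a toy predictor such as `lambda z, a: z + a`, the returned sequence sums to approximately the goal offset. In a receding-horizon loop only the first action of the returned sequence would be executed before re-encoding the new observation and planning again, which corresponds to step ⑤.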
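  47. Evaluation of control performance: experiment overview
    • Model scale of LeWM
      A lightweight model (parameter count, in millions, missing in this transcript)
      Training and execution on a single GPU
    • Compared methods
      Two methods are compared: LeWM and DINO-WM [G. Zhou, ICML]
      DINO-WM: a method that depends on a pre-trained model
    • Evaluation protocol
      Tasks: manipulation, navigation, and locomotion in 2D and 3D environments
        Push-T (2D environment): robot-arm manipulation
        OGB-Cube (3D environment): cube manipulation
      Evaluation axes: planning speed, planning success rate
    [Figures: Push-T, OGBench-Cube]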
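  48. Evaluation of control performance: experiment overview
    • Experimental setup
      Linear and nonlinear probes are trained to predict physical quantities from the feature representations obtained in the Push-T environment
    • Compared methods
      Three methods are compared: LeWM, PLDM [V. Sobal, NeurIPS], and DINO-WM
    • Evaluation protocol
      Task: Push-T
      Physical quantities to predict: agent position, block position, block angle
      Metrics: MSE (mean squared error), r (correlation coefficient)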
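The probing protocol on this slide, fitting a probe on frozen features and reporting MSE and r, can be sketched for the linear case. This is an illustrative example with assumptions: the helper names are hypothetical, the probe is fitted by ordinary least squares with a bias term, and r is taken to be the Pearson correlation between prediction and target.

```python
import numpy as np

def fit_linear_probe(Z, y):
    """Least-squares linear probe (with bias) from frozen features Z to a scalar target y."""
    X = np.hstack([Z, np.ones((len(Z), 1))])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def evaluate_probe(w, Z, y):
    """Return (MSE, Pearson r) of the probe's predictions against y."""
    pred = np.hstack([Z, np.ones((len(Z), 1))]) @ w
    mse = float(np.mean((pred - y) ** 2))
    r = float(np.corrcoef(pred, y)[0, 1])
    return mse, r
```

In practice the probe would be fitted on one split and evaluated on a held-out split; a low MSE and a high r then indicate that a quantity such as block position is linearly decodable from the learned representation.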
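  49. Limits of LeWorldModel and future work
    • Constraints on tasks and data
      Evaluation covers only simple control tasks: Push-T (2D) and OGB-Cube (3D)
      Autonomous driving, complex scenes, and long-horizon prediction are not yet verified
      Direct application to video (time series) is future work
      Transfer to robots in real environments is not yet verified
    • Scale and generalization
      Verification is limited to small models (ViT-tiny / ViT-S)
      Behavior at larger scale and the scaling law are not yet understood
      Extension to real-world sensors (LiDAR, radar) has not been started
      The effect on large-scale datasets remains to be seen
    Positioning of LeWorldModel: the stage at which it has been shown to "work while lightweight"
      Application to autonomous-driving data, video modeling, and large-scale scaling are the keys to further development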
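  50. Summary: LeWorldModel
    • Solves the problems of JEPA with SIGReg
      A new design that regularizes with a statistical test statistic, without relying on EMA or auxiliary losses to prevent representation collapse
    • Representational power on the level of DINOv2 even with a small parameter count (value missing in this transcript)
      Planning success rate improves greatly on Push-T / OGB-Cube, and physical-quantity probing is also on par with large-scale pre-trained models
    • The remaining problems are video, scale, and real-world application
      This study centers on still-image-based experiments. Future work includes extension to video prediction models, application to autonomous-driving data, and larger-scale scaling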
  51. References
    [1] Y. LeCun, "A Path Towards Autonomous Machine Intelligence," OpenReview.
    [2] D. Ha and J. Schmidhuber, "Recurrent World Models Facilitate Policy Evolution," NeurIPS.
    [3] D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, "Learning Latent Dynamics for Planning from Pixels," ICML.
    [4] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, "Dream to Control: Learning Behaviors by Latent Imagination," ICLR.
    [5] D. Hafner, T. Lillicrap, M. Norouzi, and J. Ba, "Mastering Atari with Discrete World Models," ICLR.
    [6] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, "Mastering Diverse Domains through World Models," arXiv.
    [7] A. Hu, G. Corrado, N. Griffiths, Z. Murez, C. Gurau, H. Yeo, A. Kendall, R. Cipolla, and J. Shotton, "Model-Based Imitation Learning for Urban Driving," NeurIPS.
    [8] X. Wang, Z. Zhu, G. Huang, X. Chen, J. Zhu, and J. Lu, "DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving," ECCV.
    [9] G. Zhao, C. Ni, X. Wang, Z. Zhu, X. Zhang, Y. Wang, G. Huang, X. Chen, B. Wang, et al., "DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation," CVPR.
    [10] S. Zuo, W. Zheng, Y. Huang, J. Zhou, and J. Lu, "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction," CVPR.
    [11] Y. Zhu, Y. Xue, H. Zhang, G. Jiang, W. Zhou, X. Yan, J. Gao, Y. Cai, B. Liu, Z. Li, and S. Shen, "DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving," CVPR.
    [12] A. Jiang, Y. Gao, Y. Wang, Z. Sun, S. Wang, Y. Heng, H. Sun, S. Tang, L. Zhu, J. Chai, J. Wang, Z. Gu, H. Jiang, and L. Sun, "IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model," arXiv.
    [13] Y. Li, S. Shang, W. Liu, B. Zhan, H. Wang, Y. Wang, Y. Chen, X. Wang, Y. An, C. Tang, L. Hou, L. Fan, and Z. Zhang, "DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving," ICLR.
    [14] G. Wang, P. Tang, X. Ren, G. Zhao, B. Feng, and C. Ma, "Learning Vision-Language-Action World Models for Autonomous Driving," CVPR Findings.
    [15] Q. Liu, H. Xu, J. Li, B. Sun, Z. Hao, D. She, X. Zhu, and L. Zhang, "UniWorld-VLA: Interleaved World Modeling and Planning for Autonomous Driving," arXiv.
    [16] F. Jia, L. Liu, Z. Song, C. Jia, H. Ye, X. Hao, and L. Chen, "DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving," arXiv.
    [17] M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture," CVPR.
    [18] A. Bardes, Q. Garrido, J. Ponce, X. Chen, M. Rabbat, Y. LeCun, M. Assran, and N. Ballas, "Revisiting Feature Prediction for Learning Visual Representations from Video," TMLR.
    [19] M. Assran, A. Bardes, D. Fan, Q. Garrido, R. Howes, M. Komeili, M. Muckley, A. Rizvi, C. Roberts, K. Sinha, et al., "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning," arXiv.
    [20] L. Maes, Q. Le Lidec, D. Scieur, Y. LeCun, and R. Balestriero, "LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels," arXiv.
    [21] G. Zhou, H. Pan, Y. LeCun, and L. Pinto, "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning," ICML.
    [22] V. Sobal, W. Zhang, K. Cho, R. Balestriero, T. G. J. Rudner, and Y. LeCun, "Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models," NeurIPS.