Presentation slides for the 12th International Conference of the International Society for Integrated Disaster Risk Management (IDRiM 2022), entitled "Evacuation choice modeling using reinforcement learning based on the multi-armed bandit problem."
Satoki MASUDA, Eiji HATO, Department of Civil Engineering, The University of Tokyo ([email protected]). Presented at IDRiM 2022, Thursday 22 September, Young Scientist Session Group I.
The effect of information differs from person to person.
• If we model the information learning process, we can predict, encourage, and control evacuation behavior through information.
• [Figure: a custom-made evacuation learning system that optimizes, for each person (A, B, C), the contents of information and the distribution sources of information]
Discrete choice model
[Figure: evacuation choices (home, shelter A, shelter B) along the time to disaster (48h, 24h, 12h, 6h)]
$\Pr_{evac}(\boldsymbol{\theta}) = \mathrm{logit}(age,\ hazard,\ evacuation\ order,\ etc.;\ \boldsymbol{\theta})$
The dynamics of the learning and oblivion process are not included.
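To make the static choice model concrete, here is a minimal multinomial logit sketch. The attribute names, coefficient values, and alternatives below are illustrative assumptions, not the estimated model from the slides.

```python
import numpy as np

def logit_choice_prob(V):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    expV = np.exp(V - V.max())          # subtract the max for numerical stability
    return expV / expV.sum()

# Illustrative systematic utilities for {stay home, shelter A, shelter B};
# the attributes and theta coefficients are made-up examples.
theta = {"age": -0.02, "hazard": -1.5, "evac_order": 1.0}
age, hazard_at_home, evac_order = 70, 1.0, 1.0      # one hypothetical person

V = np.array([
    theta["age"] * age + theta["hazard"] * hazard_at_home,   # stay home
    theta["evac_order"] * evac_order,                         # shelter A
    theta["evac_order"] * evac_order - 0.3,                   # shelter B (farther)
])
print(logit_choice_prob(V))    # choice probabilities over the three alternatives
```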
[Figure: panel survey design with four waves (wave 1 to wave 4); each wave is a questionnaire (Questionnaire ① to ④) recording evacuation choices (home, shelter A, shelter B) along the time to disaster (48h, 24h, 12h, 6h), with information such as congestion info. provided between waves]
Repeat the observation of evacuation choices and information provision; focus on the change within each person.
[Figure: the same four-wave panel survey design]
- Intra-wave: dynamic evacuation choice model
- Inter-wave: dynamic learning model
Dynamic learning model → Reinforcement learning

$$v_k(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}},\boldsymbol{\lambda}) = v_1(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}}) + \sum_{k'=2}^{k}\gamma^{k-k'}\, g_{k'}(s_{t+1}\mid s_t;\boldsymbol{\lambda})$$

- $v_k$: utility of wave $k$ (= reward)
- $\gamma^{k-k'}$: memory rate
- $g_{k'}$: utility of newly acquired information at wave $k'$ (e.g., drill, hazard map, congestion info.)
[Figure: four-wave survey design (Questionnaires ① to ④, waves 1 to 4) with information such as a drill, a hazard map, and congestion info. provided between waves]
- $\gamma$ = how much people forget previous information
- $\lambda$ = how much people explore or exploit information
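As a worked instance of the update (following the formula as reconstructed above), the utility at wave 3 expands to

$$v_3(s_{t+1}\mid s_t) = v_1(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}}) + \gamma\, g_2(s_{t+1}\mid s_t;\boldsymbol{\lambda}) + g_3(s_{t+1}\mid s_t;\boldsymbol{\lambda}),$$

so information received at wave 2 is discounted once by the memory rate $\gamma$ (with $0 < \gamma < 1$, older information contributes less), while the most recent wave-3 information enters at full weight.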
[Figure: estimated λ and γ by group]
The same information can have opposite effects; heterogeneity of response to information is shown. Estimation using the EM algorithm shows that the same information can have opposite effects on different groups of people.
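The slides do not give the estimation details, but the following is a minimal latent-class sketch of the idea on synthetic data: an EM loop over a two-class mixture of binary logit responses, in which the same information variable ends up with coefficients of opposite sign in the two classes. All names, data, and initialization choices are illustrative assumptions, not the authors' model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: x = received congestion information (0/1),
# y = changed evacuation choice (0/1). Half the population responds
# positively to the information, half negatively (opposite signs).
n = 400
x = rng.integers(0, 2, size=n).astype(float)
hidden_class = rng.integers(0, 2, size=n)
beta_true = np.where(hidden_class == 0, 2.0, -2.0)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-beta_true * x)))
X = x.reshape(-1, 1)

# EM for a two-class mixture of logit models
pi = np.array([0.5, 0.5])                      # class shares
models = [LogisticRegression() for _ in range(2)]
resp = np.full((n, 2), 0.5)                    # responsibilities (posterior class probs)
resp[(x == 1) & (y == 1)] = [0.9, 0.1]         # rough, data-informed initialization
resp[(x == 1) & (y == 0)] = [0.1, 0.9]

for _ in range(30):
    # M-step: weighted logit fit per class, then update class shares
    for c in range(2):
        models[c].fit(X, y, sample_weight=resp[:, c])
    pi = resp.mean(axis=0)
    # E-step: posterior class membership from each class's likelihood
    like = np.column_stack([
        np.where(y == 1,
                 models[c].predict_proba(X)[:, 1],
                 models[c].predict_proba(X)[:, 0])
        for c in range(2)
    ])
    resp = pi * like
    resp /= resp.sum(axis=1, keepdims=True)

print("class shares:", pi.round(2))
print("information coefficient per class:",
      [round(float(m.coef_[0, 0]), 2) for m in models])  # expect opposite signs
```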
✓ We observed the relation between information learning and behavioral change through the four-wave panel survey.
✓ We modeled the dynamics of the information learning process using the idea of reinforcement learning.
✓ Parameter estimation using the EM algorithm shows the heterogeneity of the learning process.
✓ The dynamic learning model will lead to predicting, encouraging, and controlling evacuation behavior through information.
★ Point to discuss: What kind of application can we design if we understand and predict the effect of DRR education on each person?
Estimation results (all data)
  Variable                                                        Estimate   t-value
  Change in departure time choice utility
    congestion                                                     -12.820     -0.03
  Change in destination choice utility
    whether the destination is in the hazard map                    -0.222     -0.44
    participation in evacuation drill (home alternative)             1.693      3.18**
    whether the destination is congested (non-home alternatives)    -2.676     -4.85**
  Memory rate                                                        0.928      5.16**
  Number of samples: 144
  Initial log-likelihood: -290.0
  Final log-likelihood: -215.9
  Likelihood ratio: 0.255
  Adjusted likelihood ratio: 0.238
  *: significant at 5%, **: significant at 1%
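As a quick consistency check on the fit statistics, the reported ratios follow from the usual likelihood ratio indices, assuming ρ = 1 − LL_final/LL_initial and adjusted ρ̄ = 1 − (LL_final − K)/LL_initial with K = 5 estimated parameters (this definition and K are my reading of the table, not stated on the slide):

```python
# Reproduce the reported likelihood ratio indices from the log-likelihoods.
LL0, LLf, K = -290.0, -215.9, 5     # initial LL, final LL, number of parameters

rho = 1 - LLf / LL0                 # likelihood ratio index
rho_adj = 1 - (LLf - K) / LL0       # adjusted for the number of parameters

print(round(rho, 3), round(rho_adj, 3))
# ~0.256 and 0.238; the reported 0.255 likely reflects unrounded log-likelihoods.
```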
[Figure: slot machines with current success rates per arm, e.g., 60% (3/5), 0% (0/1), 20% (1/5), 50% (1/2); after choosing an arm at wave k and observing a success, the rates are updated at wave k+1; which arm should be chosen next?]
A gambler tries to maximize the sum of rewards earned through a sequence of lever pulls. There is a trade-off between
- "exploitation" of the machine that has the highest expected payoff
- "exploration" to get more information about the expected payoffs of the other machines
In the multi-armed bandit problem, a limited amount of resources (trial times) must be allocated between competing choices (slot machines) in a way that maximizes the expected gain (money), when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or by allocating resources to the choice.
• In evacuation learning,
- trial times = learning cost
- slot machines = evacuation choices (departure time, destination, …)
- money = utility
An evacuee tries to maximize the sum of utility earned through a sequence of evacuation choices. There is a trade-off between
- "exploitation" of the choice that has the highest expected payoff
- "exploration" to get more information about the expected payoffs of the other choices
[Figure: current utilities of the evacuation choices (e.g., 20, 60, 0) at wave k are updated (e.g., 20, 60, 50) at wave k+1]
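The slides only give the analogy; as a minimal sketch of the exploration–exploitation trade-off itself, here is an ε-greedy bandit learner over a few evacuation alternatives. The alternative names, reward probabilities, and ε value are illustrative assumptions, not quantities from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical evacuation alternatives and their expected utilities,
# unknown to the learner.
arms = ["stay home", "shelter A", "shelter B"]
true_mean_utility = np.array([0.2, 0.6, 0.5])

n_arms = len(arms)
counts = np.zeros(n_arms)          # times each alternative was tried
values = np.zeros(n_arms)          # running mean utility per alternative
epsilon = 0.1                      # exploration probability

for t in range(200):
    if rng.random() < epsilon:                        # explore
        a = rng.integers(n_arms)
    else:                                             # exploit the current best
        a = int(np.argmax(values))
    reward = rng.binomial(1, true_mean_utility[a])    # observed utility (0/1)
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]     # incremental mean update

for name, v, c in zip(arms, values, counts):
    print(f"{name}: estimated utility {v:.2f} after {int(c)} trials")
```

With ε = 0.1 the learner mostly exploits its current best estimate but still occasionally samples the other alternatives, which is exactly the trade-off described on this slide.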
• Reinforcement learning / multi-armed bandit: exploring the optimal choice under partial knowledge about the reward
• Our study: exploring the reward system under the policy of utility maximization (inverse reinforcement learning)
• Update of the utility function (reward)

$$v_k(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}},\boldsymbol{\lambda}) = v_1(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}}) + \sum_{k'=2}^{k}\gamma^{k-k'}\, g_{k'}(s_{t+1}\mid s_t;\boldsymbol{\lambda})$$

- $v_k(s_{t+1}\mid s_t;\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\alpha}},\boldsymbol{\lambda})$: utility of wave $k$
- $\gamma^{k-k'}$: memory rate
- $g_{k'}(s_{t+1}\mid s_t;\boldsymbol{\lambda})$: utility of newly acquired information at wave $k'$
- Variables include: risk of residence area, socio-demographic variables, congestion, hazard map
• The utility of an evacuation choice is updated when one receives disaster information
• But the information is forgotten at the memory rate $\gamma$
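A minimal numerical sketch of this update rule follows; the baseline utility, the information utilities g, and the memory rate used here are made-up illustrative values (the estimated memory rate in the results table is 0.928).

```python
# Sketch of the wave-k utility update: v_k = v_1 + sum_{k'=2..k} gamma**(k-k') * g_k'
def updated_utility(v1, g_by_wave, gamma, k):
    """v1: baseline utility from wave 1; g_by_wave: {wave: information utility}."""
    v = v1
    for k_prime, g in g_by_wave.items():
        if 2 <= k_prime <= k:
            v += gamma ** (k - k_prime) * g   # older information is discounted
    return v

# Hypothetical example for one alternative (e.g., "go to shelter A"):
v1 = 0.5                       # baseline utility from wave 1
g = {2: -0.8, 3: 0.4}          # utility of information received at waves 2 and 3
gamma = 0.9                    # illustrative memory rate (0 < gamma < 1)

for k in range(1, 5):
    print(f"wave {k}: utility = {updated_utility(v1, g, gamma, k):.3f}")
# wave 1: 0.500, wave 2: -0.300, wave 3: 0.180 (0.5 + 0.9*(-0.8) + 0.4),
# wave 4: 0.212 (0.5 + 0.81*(-0.8) + 0.9*0.4) -- old information fades with gamma.
```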