Supervised learning for decision making
   a. Does direct imitation work?
   b. How can we make it work more often?
3. Case studies of recent work in (deep) imitation learning
4. What is missing from imitation learning?
(e.g., position, momentum, cat, mouse)
2. Observation: image pixels. The underlying state of the world is hidden inside the image; you have to process the image to get it out.
$\pi_\theta(a_t \mid o_t)$ - policy
$\pi_\theta(a_t \mid s_t)$ - policy (fully observed)
$s_t$ - state
$o_t$ - observation
$a_t$ - action

[Figure: graphical model over $o_1, s_1, a_1$, $o_2, s_2, a_2$, $o_3, s_3, a_3$, with transitions $p(s_{t+1} \mid s_t, a_t)$]

1. Drawing a graphical model relates state, observation, and action.
2. Observing previous observations might give you more information.
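The graphical model above can be sketched as a short rollout loop. This is only an illustration: the dynamics, observation model, and policy below are hypothetical stand-ins (none of them come from the notes), chosen so that the next state depends only on the current state and action, and the policy sees only the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(s, a):
    # Hypothetical dynamics p(s_{t+1} | s_t, a_t): depends only on (s_t, a_t).
    return s + a + 0.1 * rng.normal(size=s.shape)

def observe(s):
    # Hypothetical observation model: a noisy view of the first state component.
    return s[:1] + 0.05 * rng.normal(size=1)

def policy(o):
    # Hypothetical policy pi_theta(a_t | o_t): acts on the observation only.
    return np.array([-0.5 * o[0], 0.0])

def rollout(s0, horizon):
    # Unroll s_1 -> o_1 -> a_1 -> s_2 -> ... as in the graphical model.
    s, traj = s0, []
    for _ in range(horizon):
        o = observe(s)
        a = policy(o)
        traj.append((s, o, a))
        s = transition(s, a)
    return traj

trajectory = rollout(np.array([1.0, 0.0]), horizon=3)
```

Each element of `trajectory` is one (state, observation, action) triple; the second state component never reaches the policy, which is the sense in which the system is only partially observed.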
Record images from the dashboard.
4. Put an encoder on the steering wheel and record its position.
5. Get the data set (observation: recorded image; action: recorded steering wheel position).
6. Store the data.
7. Use it with a supervised learning algorithm.
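The steps above amount to fitting a function from observations to actions. A minimal sketch, assuming synthetic data in place of real recordings and a linear least-squares model in place of whatever learner is actually used (both are assumptions for illustration, not the method from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the stored data set: 200 flattened 8x8 "images"
# (observations) and the steering value recorded for each one (actions).
observations = rng.normal(size=(200, 64))
true_weights = rng.normal(size=64)
actions = observations @ true_weights + 0.01 * rng.normal(size=200)

# Supervised learning step: fit a linear policy by least squares.
weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)

def policy(observation):
    # Predict a steering value from a single flattened observation.
    return float(observation @ weights)

# Mean squared error on the training data.
train_error = float(np.mean((observations @ weights - actions) ** 2))
```

A real system would replace the linear model with an image model such as a convolutional network, but the structure (observations in, recorded actions as labels) stays the same.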
are sampling an action given an observation.
2. The policy $\pi_\theta$ was trained on a distribution of observations, called $p_{\text{data}}$.
3. $p_{\text{data}}$: the distribution of observations in the data set, collected while a human drove the car.
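One standard way to write the supervised training objective implied here is maximum likelihood under $p_{\text{data}}$. The notation below is an assumption chosen to match the symbols in these notes, with $a_t$ denoting the human's recorded action for observation $o_t$:

```latex
\[
\max_\theta \; \mathbb{E}_{o_t \sim p_{\text{data}}(o_t)}
\left[ \log \pi_\theta(a_t \mid o_t) \right]
\]
```

The expectation is taken over observations from the human's driving data, so the objective says nothing directly about observations the policy encounters when it drives on its own.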