Slide 42
@nyghtowl
Main Functions
1. model
2. value function
- DP (value & policy iteration) = full model, bootstrapping
- MC = model-free sampling, episodic
- TD-learning (e.g. DQN) = sampling, bootstrapping & online
3. policy search
- Policy search = gradient-based or gradient-free
4. value function & policy search
- A3C = TD-learning & policy gradient (actor-critic)
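
The three value-function approaches listed under item 2 can be contrasted on a toy problem. The sketch below is illustrative only and is not taken from the talk: the 3-state chain, the `step` function, the fixed "always move right" policy, and the constants `GAMMA` and `alpha` are all assumptions made up for this example. Because the chain is deterministic, a single sample tells you as much as the full model, so the code shows how each update is structured rather than how the methods behave on noisy problems.

```python
GAMMA = 0.9      # discount factor (assumed for this toy example)
N_STATES = 3     # non-terminal states 0, 1, 2
TERMINAL = 3     # entering state 3 ends the episode


def step(state):
    """Hypothetical toy environment: always move right, reward 1 on reaching TERMINAL."""
    next_state = state + 1
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def dp_evaluate(sweeps=50):
    """DP: sweep over every state using the known model, bootstrapping from V[next]."""
    V = [0.0] * (N_STATES + 1)
    for _ in range(sweeps):
        for s in range(N_STATES):
            s2, r = step(s)          # here the model is queried for every state
            V[s] = r + GAMMA * V[s2]
    return V[:N_STATES]


def mc_evaluate(episodes=10):
    """MC: sample complete episodes, then average the observed returns (no bootstrapping)."""
    totals = [0.0] * N_STATES
    counts = [0] * N_STATES
    for _ in range(episodes):
        state, trajectory = 0, []
        while state != TERMINAL:
            next_state, r = step(state)
            trajectory.append((state, r))
            state = next_state
        G = 0.0
        for visited, r in reversed(trajectory):   # accumulate the return backwards
            G = r + GAMMA * G
            totals[visited] += G
            counts[visited] += 1
    return [totals[s] / counts[s] for s in range(N_STATES)]


def td0_evaluate(episodes=200, alpha=0.5):
    """TD(0): sample one step at a time and update online, bootstrapping from V[next]."""
    V = [0.0] * (N_STATES + 1)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            s2, r = step(s)
            V[s] += alpha * (r + GAMMA * V[s2] - V[s])   # TD error update
            s = s2
    return V[:N_STATES]


if __name__ == "__main__":
    print("DP:", dp_evaluate())
    print("MC:", mc_evaluate())
    print("TD:", td0_evaluate())
```

All three estimates should approach the same values for this chain (1 for the last state, discounted by `GAMMA` for each step further back). What differs is what each method needs: DP needs the model for every state, MC needs an episode to finish before it can update, and TD updates after each sampled step.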