algorithms: Q-learning, policy gradients, actor-critic
3. Advanced model learning and prediction
4. Exploration
5. Transfer and multi-task learning, meta-learning
6. Open problems, research talks, invited lectures
In order to make a decision -> mathematical formalism
3. Reinforcement learning gives us the mathematical framework for dealing with decision making
an World
2. Agent makes a decision
3. World responds to that decision with consequences: observation, reward
[Figure: agent-environment loop with labels Agent, Environment, action A_t, state S_t, reward R_t, next state S_{t+1}, next reward R_{t+1}]
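The loop on this slide (agent decides, world responds with an observation and a reward) can be sketched in a few lines of Python. This is only an illustration: `MatchWorld`, `run_episode`, and the matching rule for rewards are hypothetical names and assumptions, not part of the original slides.

```python
import random


class MatchWorld:
    """Hypothetical toy world: reward 1 when the action equals the current state."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = self.rng.choice([0, 1])

    def step(self, action):
        # The world responds to the decision with consequences:
        # a reward for this action and a new observation (the next state).
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.choice([0, 1])
        return self.state, reward


def run_episode(world, policy, steps=10):
    """Run the agent-world loop and return the total reward collected."""
    observation = world.state
    total_reward = 0.0
    for _ in range(steps):
        action = policy(observation)               # agent makes a decision
        observation, reward = world.step(action)   # world responds
        total_reward += reward
    return total_reward


# A policy that copies the observation is always rewarded in this toy world.
total = run_episode(MatchWorld(seed=0), lambda obs: obs, steps=10)
```

In this toy setting the observation is the full state, so a good policy is trivial. In the problems deep RL targets, the policy would instead be a learned function of high-dimensional observations.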
general form in the most complex setting
2. Deep models are what allow reinforcement learning algorithms to solve complex problems end to end
3. Reinforcement learning provides the formalism (the algorithmic framework)
4. Deep learning provides the representations that allow us to apply that formalism to very complex problems with high-dimensional observations and complicated action spaces
RL x DL
to perform a multitude of tasks
2. Controlling: challenging due to high dimensionality and the large number of potential contacts
3. Success of DRL in robotics has thus far been limited to simpler manipulators and tasks
Summary
1. Model-free DRL can effectively scale up to complex manipulation tasks (in simulated experiments)
2. Use of a small number of human demonstrations -> sample complexity can be significantly reduced
3. Use of demonstrations -> natural movements
4. Successful policies -> object relocation, in-hand manipulation, tool use, door opening
reward function from examples (inverse RL)
3. Transferring knowledge between domains (transfer learning, meta-learning)
4. Learning to predict and using prediction to act
behavior (imitation)
   - Inferring rewards from observed behavior
2. Learning from observing the world
   - Learning to predict
   - Unsupervised learning
3. Learning from other tasks
   - Transfer learning
   - Meta-learning: learning to learn
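As a rough illustration of the imitation idea on this slide, the simplest tabular form of copying observed behavior is to pick, for each state, the action the demonstrator chose most often. The function name `clone_behavior` and the example states and actions are hypothetical. Practical imitation learning in deep RL fits a neural network policy rather than a lookup table.

```python
from collections import Counter, defaultdict


def clone_behavior(demonstrations):
    """Build a state -> action table from (state, action) demonstration pairs.

    For each state, the most frequently demonstrated action is selected.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical demonstrations of a door-opening task.
demos = [
    ("door_closed", "open"),
    ("door_closed", "open"),
    ("door_closed", "wait"),
    ("door_open", "walk"),
]
policy = clone_behavior(demos)
```

The resulting table only covers states seen in the demonstrations, which is one reason learned representations matter when observations are high-dimensional.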
Deep RL methods are usually slow
2. Humans can reuse past knowledge
   - Transfer learning in deep RL is an open problem
3. Not clear what the reward function should be
4. Not clear what the role of prediction should be