How to Coach Robots to Play Soccer: Learning + Advising

People use different methods to make decisions. "Trial and error", for instance, is widely used to learn good decisions from experience. Arguing with other people, or even with oneself, can also help identify the advantages and disadvantages of each choice. In this talk, I will show how these two techniques can be used jointly to help computers (autonomous agents) make decisions.

I will first motivate my research and then give a high-level description of our integration technique. I will also compare the performance of my technique against standard Reinforcement Learning on RoboCup Soccer games, in both single-agent and multi-agent settings. Potential application domains and future work will be briefly discussed as well. Videos and concrete examples run throughout the talk to illustrate the ideas and techniques.


Imperial ACM

March 14, 2014

Transcript

  1. How to Coach Robots to Play Soccer: Learning + Advising

     Yang (Alex) Gao
     DoC Student Seminar
     14/03/2014
  2. How do we teach them to play?

     • How do we play?
     • How do we learn to play?
     • How do we teach people to play?
     • Can we teach robots the way we teach people?
  3. How do we learn to play?

     • Arguing/Discussion
       – List out all possible options
       – Check their relations
       – Check each option's advantages and disadvantages
       – Find the best option
     • Trial and Error
       – When some actions turn out well, repeat them
       – When some actions turn out badly, avoid them
  4. Reinforcement Training for People

  5. Reinforcement Training for Robots

  6. Argumentation for Decision Making

  7. Problem of Reinforcement Learning

  8. Learning Setting of the Game

     • Keepers
       – Only the ball holder is learning
       – Actions: HoldBall, PassTo(K2), PassTo(K3), …
     • Takers
       – Each taker is learning independently
       – Actions: TackleBall, Mark(K2), Mark(K3), …
  9. Reinforcement Learning for RoboCup

     • Learning by receiving rewards/punishments
     • Rewards for Keepers:
       – Lose the ball: -10
       – Any other action: + duration of the action
     • Rewards for Takers:
       – Get the ball: +10
       – Any other action: - duration of the action

     (A minimal code sketch of this reward scheme follows.)
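To make the reward scheme above concrete, here is a minimal Python sketch. The function names and boolean event flags are hypothetical illustrations, not code from the talk:

    def keeper_reward(lost_ball: bool, action_duration: float) -> float:
        # Keepers want to keep possession as long as possible: losing
        # the ball is punished, any other action earns its duration.
        return -10.0 if lost_ball else action_duration

    def taker_reward(got_ball: bool, action_duration: float) -> float:
        # Takers want to win the ball as fast as possible: getting the
        # ball is rewarded, time spent without it is punished.
        return 10.0 if got_ball else -action_duration

Note that the two schemes are roughly zero-sum between the teams, which is consistent with the Nash-equilibrium question raised on slide 21.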
 10. Argumentation for Keepers

     • What do we think K1 should do now?
       – Hold the ball
         • attracts takers to come closer
       – Pass to K2
         • K2 is far from K1
       – Pass to K3
         • K3 is in an open position
 11. Argumentation Framework for Keepers

     [Diagram: attack graph over the arguments Pass(2) "K2 is far",
     Hold "attract", and Pass(3) "K3 is open".]

     (A code sketch of evaluating such a framework follows.)
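The transcript does not preserve the attack arrows of this diagram, but the evaluation step is standard abstract argumentation in the sense of Dung [4]: the arguments that survive the attack relation are the acceptable ones. Below is a minimal Python sketch of the grounded extension; the attack relation shown is an assumption for illustration and may differ from the actual slide:

    def grounded_extension(arguments, attacks):
        # Compute the grounded extension of a finite argumentation
        # framework. `attacks` is a set of (attacker, target) pairs.
        accepted, rejected = set(), set()
        changed = True
        while changed:
            changed = False
            for a in arguments:
                if a in accepted or a in rejected:
                    continue
                attackers = {b for (b, t) in attacks if t == a}
                if attackers <= rejected:     # all attackers defeated
                    accepted.add(a)
                    changed = True
                elif attackers & accepted:    # defeated by an accepted argument
                    rejected.add(a)
                    changed = True
        return accepted

    # Hypothetical attacks for the keepers' framework on slide 11:
    keeper_arguments = {"Hold", "Pass(2)", "Pass(3)"}
    keeper_attacks = {("Pass(3)", "Hold"), ("Pass(3)", "Pass(2)")}
    print(grounded_extension(keeper_arguments, keeper_attacks))
    # -> {'Pass(3)'}: pass to the open K3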
 12. Argumentation for Takers

     • What should T1 do?
       – Tackle the ball
         • closest to the ball
       – Mark K2
         • blocks the passway
       – Mark K3
         • closest to K3
     • What should T2 do?
       – Mark K2
         • closest to K2
       – Mark K3
         • K3 is in an open position
 13. Argumentation Framework for Takers

     [Diagram: attack graph over the arguments T1:Tackle "closest to
     the ball", T1:Mark K2 "block passway", T1:Mark K3 "close",
     T2:Mark K2 "closest to K2", and T2:Mark K3 "open".]

     (Evaluated with the same grounded-extension sketch; see below.)
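The same grounded_extension sketch applies to the takers; again, the attack arrows below are my assumption, since the transcript only preserves the argument labels:

    # Hypothetical attacks for the takers' framework on slide 13:
    taker_arguments = {"T1:Tackle", "T1:Mark(K2)", "T1:Mark(K3)",
                       "T2:Mark(K2)", "T2:Mark(K3)"}
    taker_attacks = {("T1:Tackle", "T1:Mark(K2)"),    # T1 does one thing only
                     ("T1:Tackle", "T1:Mark(K3)"),
                     ("T2:Mark(K3)", "T2:Mark(K2)")}  # the open K3 is more urgent
    print(grounded_extension(taker_arguments, taker_attacks))
    # -> {'T1:Tackle', 'T2:Mark(K3)'}: T1 tackles, T2 marks the open K3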
 14. Overview

     1. Describe domain knowledge in natural language
     2. Rank the importance of the arguments
     3. Delete attacks from weak arguments to strong arguments
     4. Find the recommended action
     5. Give extra rewards to the recommended actions

     (Steps 3-5 are sketched in code below.)
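Steps 3-5 of the pipeline can be sketched in a few lines of Python, continuing the grounded_extension example above. The ranking values and the bonus size are illustrative assumptions; the shaping step follows the advice-as-extra-reward idea of Wiewiora et al. [3] rather than verbatim code from the talk:

    def remove_weak_attacks(attacks, rank):
        # Step 3: delete attacks launched by a strictly weaker argument
        # against a stronger one (value-based argumentation, cf. [2]).
        return {(a, b) for (a, b) in attacks if rank[a] >= rank[b]}

    def recommended_actions(arguments, attacks, rank):
        # Step 4: evaluate the pruned framework; the surviving
        # arguments are the recommended actions.
        return grounded_extension(arguments,
                                  remove_weak_attacks(attacks, rank))

    def shaped_reward(env_reward, action, recommendations, bonus=1.0):
        # Step 5: add an extra reward whenever the learner follows the
        # argumentation-based advice.
        return env_reward + (bonus if action in recommendations else 0.0)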
  15. Keeper’s  Performance  Comparison In  the  beginning  of  the  learning Standard

     RL ArgumentaLon  +  RL
  16. Takers’  Performance  Comparison A`er  20  hours  of  learning Standard  RL

    ArgumentaLon  +  RL
 17. Learning Curves (3v2 keeper's)

  18. Takers’  Performance  Comparison In  the  beginning  of  the  learning Standard

     RL ArgumentaLon  +  RL
  19. Takers’  Performance  Comparison A`er  60  hours  of  learning Standard  RL

    ArgumentaLon  +  RL
 20. Learning Curves (3v2 takers)

 21. Conclusion and Discussion

     • When incorporated with Argumentation, RL performs better in:
       – single-agent learning (only one keeper learns)
       – multi-agent cooperative learning (only the takers learn)
     • When both sides use Argumentation-Accelerated RL, the
       convergence time is shorter, but the final performance is
       similar (do they reach a Nash equilibrium?)
 22. References

     [1] P. Stone et al., 'Reinforcement learning for RoboCup soccer
         keepaway', Adaptive Behavior, 2005.
     [2] T. Bench-Capon and K. Atkinson, 'Abstract argumentation and
         values', in Argumentation in AI, 2009.
     [3] E. Wiewiora et al., 'Principled methods for advising
         reinforcement learning agents', in Proceedings of ICML 2003.
     [4] P. M. Dung, 'On the acceptability of arguments and its
         fundamental role in nonmonotonic reasoning, logic programming
         and n-person games', Artificial Intelligence, 321-357, 1995.
     [5] Y. Gao et al., 'Argumentation-Based Reinforcement Learning for
         RoboCup Soccer Keepaway', ECAI 2012.
 23. THANK YOU