How to Coach Robots to Play Soccer: Learning + Advising

People use different methods to make decisions. "Trial and error", for instance, is widely used to learn good decisions from experience. Arguing with other people, or even with oneself, can also help identify the advantages and disadvantages of each choice. In this talk, I will show how these two techniques can be used jointly to help computers (autonomous agents) make decisions.

First I will motivate my research, followed by a high-level description of our integration technique. I will then compare the performance of my technique with standard Reinforcement Learning on RoboCup Soccer games, under both single-agent and multi-agent settings. Potential application domains and future work will also be briefly discussed. Videos and concrete examples will run throughout the talk to illustrate the ideas and techniques.

Imperial ACM

March 14, 2014


Transcript

  1. How to Coach Robots to Play Soccer: Learning + Advising
     Yang (Alex) Gao, DoC Student Seminar, 14/03/2014
  2. How do we teach them to play?
     • How do we play?
     • How do we learn to play?
     • How do we teach people to play?
     • Can we teach robots as we teach people?
  3. How do we learn to play?
     • Arguing/Discussion
       – List out all possible options
       – Check their relations
       – Check each option's advantages and disadvantages
       – Find the best option
     • Trial and Error
       – When some actions turn out to be good, repeat them
       – When some actions turn out to be bad, avoid them
  4. Learning setting of the game
     • Keepers
       – Only the ball holder is learning
       – Actions: HoldBall, PassTo(K2), PassTo(K3), …
     • Takers
       – Each taker is learning independently
       – Actions: TackleBall, Mark(K2), Mark(K3), …
  5. Reinforcement Learning for RoboCup
     • Learning by receiving rewards/punishments
     • Rewards for Keepers:
       – Lose the ball: -10
       – Other actions: + duration of the action
     • Rewards for Takers:
       – Get the ball: +10
       – Other actions: - duration of the action
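As a minimal sketch, the reward scheme on this slide can be written down directly; the function names and signatures here are illustrative, not taken from the actual keepaway codebase:

```python
# Sketch of the slide's reward scheme (function names are illustrative).
# Keepers: -10 for losing the ball, otherwise +duration of the action.
# Takers:  +10 for getting the ball, otherwise -duration of the action.

def keeper_reward(lost_ball: bool, duration: float) -> float:
    """Reward for the keeper holding the ball after one action."""
    return -10.0 if lost_ball else duration

def taker_reward(got_ball: bool, duration: float) -> float:
    """Reward for a taker after one action."""
    return 10.0 if got_ball else -duration
```

The asymmetry is deliberate: keepers are rewarded for surviving longer (longer action durations), while takers are penalized for every second the keepers keep possession.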
  6. Argumentation for Keepers
     • What do we think K1 should do now?
       – Hold the ball
         • attract takers to come closer
       – Pass to K2
         • K2 is far from K1
       – Pass to K3
         • K3 is in an open position
  7. Argumentation for Takers
     • What should T1 do?
       – Tackle the ball
         • closest to the ball
       – Mark K2
         • block the passway
       – Mark K3
         • closest to K3
     • What should T2 do?
       – Mark K2
         • closest to K2
       – Mark K3
         • K3 is in an open position
  8. Argumentation Framework for Takers
     • T1:Tackle (closest to the ball)
     • T1:Mark K2 (block the passway)
     • T1:Mark K3 (close)
     • T2:Mark K2 (closest to K2)
     • T2:Mark K3 (open)
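A Dung-style abstract argumentation framework [4] is just a set of arguments plus an attack relation. Below is a minimal sketch that computes the grounded extension; the attack relation is an assumption for illustration (arguments recommending different actions for the same taker are taken to attack each other), not the deck's actual framework:

```python
# Minimal Dung-style abstract argumentation framework [4].
# The attack relation below is illustrative: arguments proposing
# different actions for the same taker are assumed to conflict.

def grounded_extension(arguments, attacks):
    """Least fixpoint of the characteristic function:
    an argument is in iff every one of its attackers is
    counter-attacked by the current extension."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        defended = {
            a for a in arguments
            if all(attackers[b] & extension for b in attackers[a])
        }
        if defended == extension:
            return extension
        extension = defended

args = {"T1:Tackle", "T1:MarkK2", "T1:MarkK3", "T2:MarkK2", "T2:MarkK3"}
atts = {
    ("T1:Tackle", "T1:MarkK2"), ("T1:MarkK2", "T1:Tackle"),
    ("T1:Tackle", "T1:MarkK3"), ("T1:MarkK3", "T1:Tackle"),
    ("T1:MarkK2", "T1:MarkK3"), ("T1:MarkK3", "T1:MarkK2"),
    ("T2:MarkK2", "T2:MarkK3"), ("T2:MarkK3", "T2:MarkK2"),
}
print(grounded_extension(args, atts))  # prints set(): no argument survives
```

With purely symmetric attacks the grounded extension is empty, i.e. no action can be recommended. This is exactly why the pipeline on the next slide ranks arguments by importance and deletes attacks from weaker to stronger ones before evaluating the framework.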
  9. Overview
     • Describe domain knowledge in natural language
     • Rank the importance of arguments
     • Delete attacks from weak arguments to strong arguments
     • Find the recommended action
     • Give extra rewards to the recommended actions
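The last three steps of the pipeline can be sketched as follows. The ranking, the all-vs-all conflicts, and the size of the shaping bonus are all illustrative assumptions, not values from the talk:

```python
# Sketch of the pipeline's last steps: prune attacks using an
# (illustrative) importance ranking, then boost the reward of
# recommended actions. Ranks and bonus size are assumptions.

def prune_attacks(attacks, rank):
    """Delete attacks launched by a strictly weaker argument
    against a stronger one (higher rank = more important)."""
    return {(a, b) for (a, b) in attacks if rank[a] >= rank[b]}

def shaped_reward(base_reward, action, recommended, bonus=1.0):
    """Add an extra shaping reward when the chosen action is recommended."""
    return base_reward + (bonus if action in recommended else 0.0)

rank = {"T1:Tackle": 3, "T1:MarkK2": 2, "T1:MarkK3": 1}  # illustrative
atts = {(a, b) for a in rank for b in rank if a != b}    # all-vs-all conflict
kept = prune_attacks(atts, rank)
unattacked = {a for a in rank if not any(b == a for (_, b) in kept)}
print(unattacked)                                    # {'T1:Tackle'}
print(shaped_reward(2.0, "T1:Tackle", unattacked))   # 3.0
```

After pruning, only attacks from stronger against weaker arguments survive, so the top-ranked argument is left unattacked and its action receives the extra reward, in the spirit of the advising scheme of [3].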
  10. Conclusion and Discussion
     • When incorporated with Argumentation, RL performs better in:
       – Single-agent learning (only one Keeper learns)
       – Multi-agent cooperative learning (only Takers learn)
     • When both sides use Argumentation Accelerated RL, the convergence time is shorter, but the final performance is similar (reaching a Nash Equilibrium?)
  11. References
     [1] P. Stone et al., 'Reinforcement learning for RoboCup soccer keepaway', Adaptive Behaviour, 2005
     [2] T. Bench-Capon and K. Atkinson, 'Abstract argumentation and values', Argumentation in AI, 2009
     [3] E. Wiewiora et al., 'Principled methods for advising reinforcement learning agents', Proceedings of ICML-2003
     [4] P. M. Dung, 'On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games', Artificial Intelligence, 321–357, 1995
     [5] Y. Gao et al., 'Argumentation-Based Reinforcement Learning for RoboCup Soccer Keepaway', ECAI-2012