How to Coach Robots to Play Soccer: Learning + Advising

People use different methods to make decisions. "Trial and error", for instance, is widely used to learn good decisions from experience. Arguing with other people, or even with oneself, can also help identify the advantages and disadvantages of each choice. In this talk, I will show how these two techniques can be used jointly to help computers (autonomous agents) make decisions.

I will first motivate my research and then give a high-level description of our integration technique. I will also compare the performance of my technique against standard Reinforcement Learning on RoboCup Soccer games, in both single-agent and multi-agent settings. Potential application domains and future work will be briefly discussed as well. Videos and concrete examples run throughout the talk to illustrate the ideas and techniques.


Imperial ACM

March 14, 2014

Transcript

  1. How to Coach Robots to Play Soccer: Learning + Advising

     Yang (Alex) Gao
     DoC Student Seminar
     14/03/2014
  2. How do we teach them to play?

     • How do we play?
     • How do we learn to play?
     • How do we teach people to play?
     • Can we teach robots the way we teach people?
  3. How do we learn to play?

     • Arguing/Discussion
       – List out all possible options
       – Check their relations
       – Check each option's advantages and disadvantages
       – Find the best option
     • Trial and Error
       – When some actions turn out well, repeat them
       – When some actions turn out badly, avoid them
  4. Reinforcement Training for People

  5. Reinforcement Training for Robots

  6. Argumentation for Decision Making

  7. Problem of Reinforcement Learning

  8. Learning Setting of the Game

     • Keepers
       – Only the ball holder is learning
       – Actions: HoldBall, PassTo(K2), PassTo(K3), …
     • Takers
       – Each taker is learning independently
       – Actions: TackleBall, Mark(K2), Mark(K3), …
  9. Reinforcement Learning for RoboCup

     • Learning by receiving rewards/punishments
     • Rewards for Keepers:
       – Lose the ball: -10
       – Any other action: + duration of the action
     • Rewards for Takers:
       – Get the ball: +10
       – Any other action: - duration of the action

     (A minimal code sketch of this reward scheme follows.)
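To make the reward scheme above concrete, here is a minimal Python sketch. The function names and boolean event flags are hypothetical illustrations, not code from the talk:

    def keeper_reward(lost_ball: bool, action_duration: float) -> float:
        # Keepers want to keep possession as long as possible: losing
        # the ball is punished, any other action earns its duration.
        return -10.0 if lost_ball else action_duration

    def taker_reward(got_ball: bool, action_duration: float) -> float:
        # Takers want to win the ball as fast as possible: getting the
        # ball is rewarded, time spent without it is punished.
        return 10.0 if got_ball else -action_duration

Note that the two schemes are roughly zero-sum between the teams, which is consistent with the Nash-equilibrium question raised on slide 21.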
 10. Argumentation for Keepers

     • What do we think K1 should do now?
       – Hold the ball
         • attracts takers to come closer
       – Pass to K2
         • K2 is far from K1
       – Pass to K3
         • K3 is in an open position
 11. Argumentation Framework for Keepers

     [Diagram: attack graph over the arguments Pass(2) "K2 is far",
     Hold "attract", and Pass(3) "K3 is open".]

     (A code sketch of evaluating such a framework follows.)
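The transcript does not preserve the attack arrows of this diagram, but the evaluation step is standard abstract argumentation in the sense of Dung [4]: the arguments that survive the attack relation are the acceptable ones. Below is a minimal Python sketch of the grounded extension; the attack relation shown is an assumption for illustration and may differ from the actual slide:

    def grounded_extension(arguments, attacks):
        # Compute the grounded extension of a finite argumentation
        # framework. `attacks` is a set of (attacker, target) pairs.
        accepted, rejected = set(), set()
        changed = True
        while changed:
            changed = False
            for a in arguments:
                if a in accepted or a in rejected:
                    continue
                attackers = {b for (b, t) in attacks if t == a}
                if attackers <= rejected:     # all attackers defeated
                    accepted.add(a)
                    changed = True
                elif attackers & accepted:    # defeated by an accepted argument
                    rejected.add(a)
                    changed = True
        return accepted

    # Hypothetical attacks for the keepers' framework on slide 11:
    keeper_arguments = {"Hold", "Pass(2)", "Pass(3)"}
    keeper_attacks = {("Pass(3)", "Hold"), ("Pass(3)", "Pass(2)")}
    print(grounded_extension(keeper_arguments, keeper_attacks))
    # -> {'Pass(3)'}: pass to the open K3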
 12. Argumentation for Takers

     • What should T1 do?
       – Tackle the ball
         • closest to the ball
       – Mark K2
         • blocks the passway
       – Mark K3
         • closest to K3
     • What should T2 do?
       – Mark K2
         • closest to K2
       – Mark K3
         • K3 is in an open position
 13. Argumentation Framework for Takers

     [Diagram: attack graph over the arguments T1:Tackle "closest to
     the ball", T1:Mark K2 "block passway", T1:Mark K3 "close",
     T2:Mark K2 "closest to K2", and T2:Mark K3 "open".]

     (Evaluated with the same grounded-extension sketch; see below.)
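The same grounded_extension sketch applies to the takers; again, the attack arrows below are my assumption, since the transcript only preserves the argument labels:

    # Hypothetical attacks for the takers' framework on slide 13:
    taker_arguments = {"T1:Tackle", "T1:Mark(K2)", "T1:Mark(K3)",
                       "T2:Mark(K2)", "T2:Mark(K3)"}
    taker_attacks = {("T1:Tackle", "T1:Mark(K2)"),    # T1 does one thing only
                     ("T1:Tackle", "T1:Mark(K3)"),
                     ("T2:Mark(K3)", "T2:Mark(K2)")}  # the open K3 is more urgent
    print(grounded_extension(taker_arguments, taker_attacks))
    # -> {'T1:Tackle', 'T2:Mark(K3)'}: T1 tackles, T2 marks the open K3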
 14. Overview

     1. Describe domain knowledge in natural language
     2. Rank the importance of the arguments
     3. Delete attacks from weak arguments to strong arguments
     4. Find the recommended action
     5. Give extra rewards to the recommended actions

     (Steps 3-5 are sketched in code below.)
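Steps 3-5 of the pipeline can be sketched in a few lines of Python, continuing the grounded_extension example above. The ranking values and the bonus size are illustrative assumptions; the shaping step follows the advice-as-extra-reward idea of Wiewiora et al. [3] rather than verbatim code from the talk:

    def remove_weak_attacks(attacks, rank):
        # Step 3: delete attacks launched by a strictly weaker argument
        # against a stronger one (value-based argumentation, cf. [2]).
        return {(a, b) for (a, b) in attacks if rank[a] >= rank[b]}

    def recommended_actions(arguments, attacks, rank):
        # Step 4: evaluate the pruned framework; the surviving
        # arguments are the recommended actions.
        return grounded_extension(arguments,
                                  remove_weak_attacks(attacks, rank))

    def shaped_reward(env_reward, action, recommendations, bonus=1.0):
        # Step 5: add an extra reward whenever the learner follows the
        # argumentation-based advice.
        return env_reward + (bonus if action in recommendations else 0.0)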
  15. Keeper’s  Performance  Comparison In  the  beginning  of  the  learning Standard

     RL ArgumentaLon  +  RL
  16. Takers’  Performance  Comparison A`er  20  hours  of  learning Standard  RL

    ArgumentaLon  +  RL
 17. Learning Curves (3v2 keeper's)

  18. Takers’  Performance  Comparison In  the  beginning  of  the  learning Standard

     RL ArgumentaLon  +  RL
  19. Takers’  Performance  Comparison A`er  60  hours  of  learning Standard  RL

    ArgumentaLon  +  RL
 20. Learning Curves (3v2 takers)

 21. Conclusion and Discussion

     • When incorporated with Argumentation, RL performs better in:
       – single-agent learning (only one keeper learns)
       – multi-agent cooperative learning (only the takers learn)
     • When both sides use Argumentation-Accelerated RL, the
       convergence time is shorter, but the final performance is
       similar (do they reach a Nash equilibrium?)
 22. References

     [1] P. Stone et al., 'Reinforcement learning for RoboCup soccer
         keepaway', Adaptive Behavior, 2005.
     [2] T. Bench-Capon and K. Atkinson, 'Abstract argumentation and
         values', in Argumentation in AI, 2009.
     [3] E. Wiewiora et al., 'Principled methods for advising
         reinforcement learning agents', in Proceedings of ICML 2003.
     [4] P. M. Dung, 'On the acceptability of arguments and its
         fundamental role in nonmonotonic reasoning, logic programming
         and n-person games', Artificial Intelligence, 321-357, 1995.
     [5] Y. Gao et al., 'Argumentation-Based Reinforcement Learning for
         RoboCup Soccer Keepaway', ECAI 2012.
 23. THANK YOU