Slide 1

optimization of probabilistic argumentation with markov processes

E. Hadoux (1), A. Beynier (1), N. Maudet (1), P. Weng (2) and A. Hunter (3)
Tue., Sept. 29th

(1) Sorbonne Universités, UPMC Univ Paris 6, UMR 7606, LIP6, F-75005, Paris, France
(2) SYSU-CMU Joint Institute of Engineering, Guangzhou, China; SYSU-CMU Shunde International Joint Research Institute, Shunde, China
(3) Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK

Slide 2

Introduction ∙ Debate argumentation problems between two agents 1

Slide 3

Introduction ∙ Debate argumentation problems between two agents ∙ Probabilistic executable logic to improve expressivity 1

Slide 4

Introduction ∙ Debate argumentation problems between two agents ∙ Probabilistic executable logic to improve expressivity ∙ New class of problems: Argumentation Problem with Probabilistic Strategies (APS) (Hunter, 2014) 1

Slide 5

Introduction ∙ Debate argumentation problems between two agents ∙ Probabilistic executable logic to improve expressivity ∙ New class of problems: Argumentation Problem with Probabilistic Strategies (APS) (Hunter, 2014) ∙ Purpose of this work: optimize the sequence of arguments of one agent 1

Slide 6

Introduction ∙ Debate argumentation problems between two agents ∙ Probabilistic executable logic to improve expressivity ∙ New class of problems: Argumentation Problem with Probabilistic Strategies (APS) (Hunter, 2014) ∙ Purpose of this work: optimize the sequence of arguments of one agent There will be abuse of the word predicate! 1

Slide 7

formalization

Slide 8

Formalization of a debate problem ∙ Turn-based game between two agents ∙ Rules to fire in order to attack arguments of the opponent and revise knowledge 3

Slide 9

Formalization of a debate problem ∙ Turn-based game between two agents ∙ Rules to fire in order to attack arguments of the opponent and revise knowledge Let us define a debate problem with: ∙ A, the set of arguments 3

Slide 10

Formalization of a debate problem ∙ Turn-based game between two agents ∙ Rules to fire in order to attack arguments of the opponent and revise knowledge Let us define a debate problem with: ∙ A, the set of arguments ∙ E, the set of attacks 3

Slide 11

Formalization of a debate problem ∙ Turn-based game between two agents ∙ Rules to fire in order to attack arguments of the opponent and revise knowledge Let us define a debate problem with: ∙ A, the set of arguments ∙ E, the set of attacks ∙ P = 2^A × 2^E, the public space gathering voiced arguments 3

Slide 12

Formalization of a debate problem ∙ Turn-based game between two agents ∙ Rules to fire in order to attack arguments of the opponent and revise knowledge Let us define a debate problem with: ∙ A, the set of arguments ∙ E, the set of attacks ∙ P = 2^A × 2^E, the public space gathering voiced arguments ∙ Two agents: agent 1 and agent 2 3

Slide 13

Notation ∙ Arguments: literals (e.g., a, b, c) 4

Slide 14

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y 4

Slide 15

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y ∙ Args. in public (resp. private) space: a(x) (resp. hi(x)) 4

Slide 16

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y ∙ Args. in public (resp. private) space: a(x) (resp. hi(x)) ∙ Goals: ∧k g(xk) (resp. g(¬xk)) if xk is (resp. is not) accepted in the public space (Dung, 1995) 4

Slide 17

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y ∙ Args. in public (resp. private) space: a(x) (resp. hi(x)) ∙ Goals: ∧k g(xk) (resp. g(¬xk)) if xk is (resp. is not) accepted in the public space (Dung, 1995) ∙ Rules: prem ⇒ Pr(Acts) 4

Slide 18

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y ∙ Args. in public (resp. private) space: a(x) (resp. hi(x)) ∙ Goals: ∧k g(xk) (resp. g(¬xk)) if xk is (resp. is not) accepted in the public space (Dung, 1995) ∙ Rules: prem ⇒ Pr(Acts) ∙ Premises: conjunctions of e(·, ·), a(·), hi(·) 4

Slide 19

Notation ∙ Arguments: literals (e.g., a, b, c) ∙ Attacks: e(x, y) if x attacks y ∙ Args. in public (resp. private) space: a(x) (resp. hi(x)) ∙ Goals: ∧k g(xk) (resp. g(¬xk)) if xk is (resp. is not) accepted in the public space (Dung, 1995) ∙ Rules: prem ⇒ Pr(Acts) ∙ Premises: conjunctions of e(·, ·), a(·), hi(·) ∙ Acts: conjunctions of ⊞, ⊟ on e(·, ·), a(·) and ⊕, ⊖ on hi(·) 4

Slide 20

Formalization of an APS An APS is characterized (from the point of view of agent 1) by ⟨A, E, G, S1, g1, g2, S2, P, R1, R2⟩: ∙ A, E, P as specified above ∙ G, the set of all possible goals ∙ Si , the set of private states for agent i ∙ gi ∈ G, the given goal for agent i ∙ Ri , the set of rules for agent i 5
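To make the tuple concrete, here is a minimal sketch of how an APS could be held in memory; all names (Rule, APS, the predicate encoding) are illustrative choices, not notation from the paper.

from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Predicate = Tuple  # e.g. ("a", "x"), ("e", "x", "y"), ("h", 1, "x")

@dataclass(frozen=True)
class Rule:
    premises: FrozenSet[Predicate]                         # conjunction of e/a/hi predicates
    acts: Tuple[Tuple[float, FrozenSet[Predicate]], ...]   # probabilistic acts [(prob, updates), ...]

@dataclass
class APS:
    arguments: FrozenSet[str]                # A
    attacks: FrozenSet[Tuple[str, str]]      # E, pairs (attacker, attacked)
    goal1: FrozenSet[Tuple[str, bool]]       # g1: (argument, must it be accepted?)
    goal2: FrozenSet[Tuple[str, bool]]       # g2
    rules1: List[Rule]                       # R1
    rules2: List[Rule]                       # R2

# First rule of R1 in the e-sport example: h1(a) => add a(a) to the public space.
r = Rule(premises=frozenset({("h", 1, "a")}),
         acts=((1.0, frozenset({("add_a", "a")})),))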

Slide 21

Example: Arguments Is e-sport a sport? 6

Slide 22

Example: Arguments Is e-sport a sport? a E-sport is a sport b E-sport requires focusing, precision and generates tiredness c Not all sports are physical d Sports not referenced by IOC exist e Chess is a sport f E-sport is not a physical activity g E-sport is not referenced by IOC h Working requires focusing and generates tiredness but is not a sport 6

Slide 23

Example: Formalization ∙ A = {a, b, c, d, e, f, g, h} 7

Slide 24

Example: Formalization ∙ A = {a, b, c, d, e, f, g, h} ∙ E = { e(f, a), e(g, a), e(b, f), e(c, f), e(h, b), e(g, c), e(d, g), e(e, g)} 7

Slide 25

Example: Formalization ∙ A = {a, b, c, d, e, f, g, h} ∙ E = { e(f, a), e(g, a), e(b, f), e(c, f), e(h, b), e(g, c), e(d, g), e(e, g)} ∙ g1 = g(a) 7

Slide 26

Example: Formalization ∙ A = {a, b, c, d, e, f, g, h} ∙ E = { e(f, a), e(g, a), e(b, f), e(c, f), e(h, b), e(g, c), e(d, g), e(e, g)} ∙ g1 = g(a) ∙ R1 = {h1(a) ⇒ ⊞a(a), h1(b) ∧ a(f) ∧ h1(c) ∧ e(b, f) ∧ e(c, f) ⇒ 0.5 : ⊞a(b) ∧ ⊞e(b, f) ∨ 0.5 : ⊞a(c) ∧ ⊞e(c, f), h1(d) ∧ a(g) ∧ h1(e) ∧ e(d, g) ∧ e(e, g) ⇒ 0.8 : ⊞a(e) ∧ ⊞e(e, g) ∨ 0.2 : ⊞a(d) ∧ ⊞e(d, g)} 7

Slide 27

Example: Formalization ∙ R2 = {h2(h) ∧ a(b) ∧ e(h, b) ⇒ ⊞a(h) ∧ ⊞e(h, b), h2(g) ∧ a(c) ∧ e(g, c) ⇒ ⊞a(g) ∧ ⊞e(g, c), a(a) ∧ h2(f) ∧ h2(g) ∧ e(f, a) ⇒ 0.8 : ⊞a(f) ∧ ⊞e(f, a) ∨ 0.2 : ⊞a(g) ∧ ⊞e(g, a)} ∙ Initial state: h1(a, b, c, d, e), {}, h2(f, g, h) 8

Slide 28

Attacks graph Figure: Graph of arguments of Example e-sport (nodes a–h, attacks as listed in E) 9

Slide 29

Probabilistic Finite State Machine: Graph APS → Probabilistic Finite State Machine Figure: PFSM of Example e-sport (12 states σ1–σ12, transitions labelled with the rule probabilities) 10

Slide 30

Probabilistic Finite State Machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 11

Slide 31

Probabilistic Finite State Machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. depends on the initial state 11

Slide 32

Probabilistic Finite State Machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. depends on the initial state 2. requires knowledge of the private state of the opponent 11

Slide 33

Probabilistic Finite State Machine To optimize the sequence of arguments for agent 1, we could optimize the PFSM but: 1. depends on the initial state 2. requires knowledge of the private state of the opponent Using Markov models, we can relax assumptions 1 and 2. Moreover, the APS formalization can be modified in order to comply with the Markov assumption. 11

Slide 34

Markov Decision Process A Markov Decision Process (MDP) (Puterman, 1994) is characterized by a tuple ⟨S, A, T, R⟩: ∙ S, a set of states, ∙ A, a set of actions, ∙ T : S × A → Pr(S), a transition function, ∙ R : S × A → R, a reward function. 12
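For a finite MDP like this one, an optimal policy can be computed by standard dynamic programming; the snippet below is a generic textbook value iteration sketch with an assumed dict-based representation, not code from the paper.

# Generic value iteration for a finite MDP (textbook sketch).
def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
    """T[s][a]: dict {s_next: prob}; R[s][a]: real-valued reward."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            best = max(R[s][a] + gamma * sum(p * V[sn] for sn, p in T[s][a].items())
                       for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V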

Slide 35

Partially-Observable Markov Decision Process A Partially-Observable MDP (POMDP) (Puterman, 1994) is characterized by a tuple ⟨S, A, T, R, O, Q⟩: ∙ S, a set of states, ∙ A, a set of actions, ∙ T : S × A → Pr(S), a transition function, ∙ R : S × A → R, a reward function, ∙ O, an observation set, ∙ Q : S × A → Pr(O), an observation function. 13
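Because the state is hidden, a POMDP policy is defined over beliefs, updated by Bayes' rule after each action/observation pair; here is a generic sketch with illustrative dict-based signatures.

# Generic POMDP belief update b' = tau(b, a, o) (illustrative sketch).
def belief_update(b, a, o, S, T, Q):
    """b: dict state -> prob; T[s][a]: dict s_next -> prob; Q[s_next][a]: dict obs -> prob."""
    new_b = {}
    for s_next in S:
        predicted = sum(b[s] * T[s][a].get(s_next, 0.0) for s in S)
        new_b[s_next] = Q[s_next][a].get(o, 0.0) * predicted
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()} if norm > 0 else new_b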

Slide 36

Mixed-Observability Markov Decision Process A Mixed-Observability MDP (MOMDP) (Ong et al., 2010) is characterized by a tuple ⟨Sv, Sh, A, T, R, Ov, Oh, Q⟩: ∙ Sv, Sh, the visible and hidden parts of the state, ∙ A, a set of actions, ∙ T : Sv × A × Sh → Pr(Sv × Sh), a transition function, ∙ R : Sv × A × Sh → R, a reward function, ∙ Ov = Sv, an observation set on the visible part of the state, ∙ Oh, an observation set on the hidden part of the state, ∙ Q : Sv × A × Sh → Pr(Ov × Oh), an observation function. 14
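The gain over a plain POMDP is that the belief only has to be maintained over the hidden component Sh, since the visible part is observed exactly; a sketch of the factored update, under the assumption (as in the transformation used later in the talk) that the hidden observation carries no extra information:

# Factored MOMDP belief update: belief over the hidden part only (sketch).
def momdp_belief_update(b_h, sv, a, sv_next, Sh, T):
    """b_h: dict hidden part -> prob; T[(sv, sh)][a]: dict (sv', sh') -> prob."""
    new_b = {}
    for sh_next in Sh:
        new_b[sh_next] = sum(b_h[sh] * T[(sv, sh)][a].get((sv_next, sh_next), 0.0)
                             for sh in Sh)
    norm = sum(new_b.values())
    return {sh: p / norm for sh, p in new_b.items()} if norm > 0 else new_b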

Slide 37

transformation to a momdp

Slide 38

Transformation to a MOMDP An APS from the point of view of agent 1 can be transformed into a MOMDP: ∙ Sv = S1 × P, Sh = S2 ∙ A = {prem(r) ⇒ m | r ∈ R1 and m ∈ acts(r)} ∙ Ov = Sv and Oh = ∅ ∙ Q(⟨sv, sh⟩, a, ⟨sv⟩) = 1, otherwise 0 ∙ T, defined on the next slide 16
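The second bullet turns agent 1's own probabilistic choice into a decision: every act of every rule in R1 becomes one deterministic MOMDP action. A small illustrative enumeration, assuming rules are encoded as (premises, [(prob, act), ...]) pairs:

# One MOMDP action per (rule of agent 1, act of that rule): prem(r) => m (sketch).
def build_actions(rules1):
    actions = []
    for premises, prob_acts in rules1:
        for _, act in prob_acts:             # the probability is dropped: the act becomes a choice
            actions.append((premises, act))  # deterministic rule: premises => act
    return actions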

Slide 39

Transformation to a MOMDP: Transition function Application set Let Cs(Ri) be the set of rules of Ri that can be fired in state s. The application set Fr(m, s) is the set of predicates resulting from the application of act m of a rule r on s. If r cannot be fired in s, Fr(m, s) = s. ∙ s, a state and r : p ⇒ m, an action s.t. r ∈ A ∙ s′ = Fr(m, s) ∙ r′ ∈ Cs′(R2) s.t. r′ : p′ ⇒ [π1/m1, . . . , πn/mn] ∙ s′′i = Fr′(mi, s′) ∙ T(s, r, s′′i) = πi 17
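Putting the bullets together: agent 1's chosen act is applied deterministically, then an applicable rule of agent 2 fires and its probabilistic acts define the successor distribution. A sketch, with the extra assumption (not fixed by the slide) that every applicable opponent rule is equally likely:

# Successor distribution T(s, r, .) induced by an APS (illustrative sketch).
def transitions(s, act1, applicable2, apply_act):
    """act1: the act chosen by agent 1; applicable2(state): agent 2's firable rules,
    each given as [(prob, act), ...]; apply_act(act, state): the application set Fr(m, s)."""
    s1 = apply_act(act1, s)                   # agent 1's act is deterministic
    rules2 = applicable2(s1)
    if not rules2:                            # no reply from agent 2: the state stays at s1
        return {s1: 1.0}
    dist = {}
    for prob_acts in rules2:
        for prob, act2 in prob_acts:          # probabilistic acts [pi_1/m_1, ..., pi_n/m_n]
            s2 = apply_act(act2, s1)
            dist[s2] = dist.get(s2, 0.0) + prob / len(rules2)  # uniform choice over firable rules (assumed)
    return dist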

Slide 40

Reward function For the reward function: ∙ with Dung's semantics: a positive reward for each conjunct of the goal that holds ∙ can be generalized with a General Gradual Valuation (Cayrol and Lagasquie-Schiex, 2005) 18
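With Dung's semantics, checking whether a goal conjunct holds amounts to computing the grounded extension of the voiced attack graph; the sketch below does exactly that and grants one point per satisfied conjunct (the +1 shape is an illustrative choice, not necessarily the paper's exact reward).

# Grounded extension of the public attack graph, plus a simple goal-based reward (sketch).
def grounded_extension(args, attacks):
    """args: set of voiced arguments; attacks: set of (attacker, attacked) pairs."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    ext = set()
    while True:
        # an argument enters when each of its attackers is itself attacked by the current set
        new = {a for a in args
               if all(any((d, b) in attacks for d in ext) for b in attackers[a])}
        if new == ext:
            return ext
        ext = new

def reward(public_args, public_attacks, goal):
    """goal: set of (argument, should_be_accepted) pairs; +1 per satisfied conjunct."""
    ext = grounded_extension(public_args, public_attacks)
    return sum(1 for arg, wanted in goal if (arg in ext) == wanted)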

Slide 41

Transformation to a MOMDP Model sizes: APS: 8 arguments, 8 attacks, 6 rules POMDP: 4 294 967 296 states MOMDP: 16 777 216 states Intractable instances → need to optimize at the root (i.e., on the APS itself, before building the MOMDP) 19
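The counts are exact powers of two (2^32 = 4 294 967 296 and 2^24 = 16 777 216), which suggests one boolean per predicate instance. One plausible, but assumed, reading: 8 public arguments + 8 public attacks + 8 private arguments per agent gives 32 booleans for the flat POMDP, while the 2^24 figure matches the visible component Sv = S1 × P alone.

# Quick arithmetic check of the reported model sizes (decomposition assumed, not from the slide).
n_args, n_attacks = 8, 8
pomdp_states = 2 ** (n_args + n_attacks + n_args + n_args)   # public args + attacks + both private sides
visible_states = 2 ** (n_args + n_attacks + n_args)          # Sv = S1 x P, if that is what the slide counts
print(pomdp_states, visible_states)                          # 4294967296 16777216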

Slide 42

solving an aps

Slide 43

Solving an APS Two algorithms to solve MOMDPs: ∙ MO-IP (Araya-López et al., 2010), the IP algorithm for POMDPs adapted to MOMDPs (exact method) ∙ MO-SARSOP (Ong et al., 2010), the SARSOP algorithm for POMDPs adapted to MOMDPs (approximate method, albeit very efficient) Two kinds of optimizations: with or without dependencies on the initial state 21

Slide 44

Optimizations without dependencies Irr. Prunes irrelevant arguments 22

Slide 45

Optimizations without dependencies Irr. Prunes irrelevant arguments Enth. Infers attacks 22

Slide 46

Optimizations without dependencies Irr. Prunes irrelevant arguments Enth. Infers attacks Dom. Removes dominated arguments 22

Slide 47

Optimizations without dependencies Irr. Prunes irrelevant arguments Enth. Infers attacks Dom. Removes dominated arguments Guarantee on the uniqueness and optimality of the solution. 22

Slide 48

Attacks graph Argument dominance If an argument is attacked by an unattacked argument, it is dominated. Figure: Attacks graph of Example e-sport 23
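The dominance test is cheap to run on the attack graph: an argument attacked by an unattacked argument can never be defended, so it can be pruned. A minimal sketch:

# Dominated arguments: those attacked by an unattacked argument (sketch).
def dominated_arguments(args, attacks):
    """args: set of arguments; attacks: set of (attacker, attacked) pairs."""
    attacked = {y for (_, y) in attacks}
    unattacked = args - attacked
    return {y for (x, y) in attacks if x in unattacked}

# On the e-sport graph, d, e and h are unattacked, so b and g come out dominated.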

Slide 49

Optimization with dependencies Irr(s0) has to be reapplied each time the initial state changes. 24

Slide 50

Optimization with dependencies Irr(s0) has to be reapplied each time the initial state changes. 1. For each predicate that is never modified but is used in premises: 1.1 Remove all the rules that are not compatible with the value of this predicate in the initial state. 1.2 For all remaining rules, remove the predicate from the premises. 24

Slide 51

Optimization with dependencies Irr(s0) has to be reapplied each time the initial state changes. 1. For each predicate that is never modified but is used in premises: 1.1 Remove all the rules that are not compatible with the value of this predicate in the initial state. 1.2 For all remaining rules, remove the predicate from the premises. 2. For each remaining action of agent 1, track the rules of agent 2 compatible with the application of this action. If a rule of agent 2 is not compatible with any application of an action of agent 1, remove it. 24
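A sketch of step 1, reusing the illustrative (premises, acts) rule encoding from earlier; static_predicates is the set of predicates no act ever modifies and initial_state the set of predicates true in s0 (both names assumed):

# Step 1: specialize the rules w.r.t. predicates that are never modified (sketch).
def specialize_rules(rules, static_predicates, initial_state):
    kept = []
    for premises, acts in rules:
        static_prems = premises & static_predicates
        if not static_prems <= initial_state:              # 1.1: a static premise is false in s0, drop the rule
            continue
        kept.append((premises - static_predicates, acts))  # 1.2: remove the now-constant premises
    return kept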

Slide 52

experiments

Slide 53

Experiments We computed a solution for the e-sport problem with: ∙ MO-IP, which did not finish after tens of hours ∙ MO-SARSOP without optimizations, which did not finish either ∙ MO-SARSOP with optimizations, which found the optimal solution in 4 seconds 26

Slide 54

Experiments: Policy graph Figure: Policy graph for Example e-sport (nodes labelled with actions of agent 1, such as r1 with rule/act indices, and ∅; edges labelled with observations o1–o8) 27

Slide 55

Experiments: More examples

        None   Irr.   Enth.  Dom.   Irr(s0)  All
Ex 1    —      —      —      —      —        0.56
Ex 2    3.3    0.3    0.3    0.4    0        0
Dv.     —      —      —      —      —        32
6       1313   22     43     7      2.4      0.9
7       —      180    392    16     20       6.7
8       —      —      —      —      319      45
9       —      —      —      —      —        —

Table: Computation time (in seconds) 28

Slide 56

conclusion and discussions

Slide 57

Conclusion We presented: 1. A new framework to represent more complex debate problems (APS) 2. A method to transform those problems to a MOMDP 3. Several optimizations that can be used outside of the context of MOMDP 4. A method to optimize actions of an agent in an APS 30

Slide 58

Perspectives We are currently working on using POMCP (Silver and Veness, 2010). We are also using HS3MDPs (Hadoux et al., 2014). 31

Slide 59

Questions? 32

Slide 60

Bibliography I Araya-López, M., Thomas, V., Buffet, O., and Charpillet, F. (2010). A closer look at MOMDPs. In 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI). Cayrol, C. and Lagasquie-Schiex, M.-C. (2005). Graduality in argumentation. Journal of Artificial Intelligence Research (JAIR), 23:245–297. Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–358. 33

Slide 61

Bibliography II Hadoux, E., Beynier, A., and Weng, P. (2014). Solving Hidden-Semi-Markov-Mode Markov Decision Problems. In Straccia, U. and Calì, A., editors, Scalable Uncertainty Management, volume 8720 of Lecture Notes in Computer Science, pages 176–189. Springer International Publishing. Hunter, A. (2014). Probabilistic strategies in dialogical argumentation. In International Conference on Scalable Uncertainty Management (SUM’14), LNCS volume 8720. Ong, S. C., Png, S. W., Hsu, D., and Lee, W. S. (2010). Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research. 34

Slide 62

Bibliography III Puterman, M. L. (1994). Markov Decision Processes: discrete stochastic dynamic programming. John Wiley & Sons. Silver, D. and Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS), pages 2164–2172. 35