Optimization of probabilistic argumentation with Markov processes
Talk given at the International Joint Conference on Artificial Intelligence (IJCAI15) and at the Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes (JFPDA15), on 29/09/15.
Beynier (1), N. Maudet (1), P. Weng (2) and A. Hunter (3). Tue., Sept. 29th
(1) Sorbonne Universités, UPMC Univ Paris 6, UMR 7606, LIP6, F-75005, Paris, France
(2) SYSU-CMU Joint Institute of Engineering, Guangzhou, China, and SYSU-CMU Shunde International Joint Research Institute, Shunde, China
(3) Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
∙ Executable logic to improve expressivity
∙ New class of problems: Argumentation Problem with Probabilistic Strategies (APS) (Hunter, 2014)
∙ Purpose of this work: optimize the sequence of arguments of one agent
(There will be some abuse of the word "predicate"!)
∙ Rules to fire in order to attack the opponent's arguments and revise the agents' knowledge
Let us define a debate problem with:
∙ A, the set of arguments
∙ E, the set of attacks
∙ P = 2^A × 2^E, the public space gathering voiced arguments
∙ Two agents: agent 1 and agent 2
∙ Attacks: e(x, y) if x attacks y
∙ Args. in the public (resp. private) space: a(x) (resp. h_i(x))
∙ Goals: ∧_k g(x_k) (resp. g(¬x_k)) if x_k is (resp. is not) accepted in the public space (Dung, 1995)
∙ Rules: prem ⇒ Pr(Acts)
∙ Premises: conjunctions of e(·, ·), a(·), h_i(·)
∙ Acts: conjunctions of ⊞, ⊟ on e(·, ·), a(·) and ⊕, ⊖ on h_i(·)
An APS is defined (from the point of view of agent 1) by ⟨A, E, G, S_1, g_1, g_2, S_2, P, R_1, R_2⟩:
∙ A, E, P as specified above
∙ G, the set of all possible goals
∙ S_i, the set of private states for agent i
∙ g_i ∈ G, the given goal for agent i
∙ R_i, the set of rules for agent i
Example arguments:
a … sport
b E-sport requires focusing and precision, and generates tiredness
c Not all sports are physical
d Sports not referenced by the IOC exist
e Chess is a sport
f E-sport is not a physical activity
g E-sport is not referenced by the IOC
h Working requires focusing and generates tiredness, but is not a sport
For agent 1, we could optimize the PFSM, but this:
1. depends on the initial state
2. requires knowledge of the private state of the opponent
Using Markov models, we can relax assumptions 1 and 2. Moreover, the APS formalization can be modified in order to comply with the Markov assumption.
An MDP is characterized by a tuple ⟨S, A, T, R⟩:
∙ S, a set of states,
∙ A, a set of actions,
∙ T : S × A → Pr(S), a transition function,
∙ R : S × A → ℝ, a reward function.
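To make the ⟨S, A, T, R⟩ tuple concrete, here is a minimal sketch, not from the talk: a hypothetical two-state, two-action MDP written as plain dicts and solved by standard value iteration.

```python
from typing import Dict

# Hypothetical toy MDP with two states and two actions, written as plain dicts.
# T[s][a] maps each successor state to its probability; R[s][a] is the reward.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]
T: Dict[str, Dict[str, Dict[str, float]]] = {
    "s0": {"stay": {"s0": 1.0}, "move": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "move": {"s0": 1.0}},
}
R: Dict[str, Dict[str, float]] = {
    "s0": {"stay": 0.0, "move": 1.0},
    "s1": {"stay": 2.0, "move": 0.0},
}

def value_iteration(gamma: float = 0.9, eps: float = 1e-6) -> Dict[str, float]:
    """Iterate V(s) = max_a [ R(s, a) + gamma * sum_s' T(s, a, s') * V(s') ]."""
    v = {s: 0.0 for s in STATES}
    while True:
        v_new = {
            s: max(
                R[s][a] + gamma * sum(p * v[s2] for s2, p in T[s][a].items())
                for a in ACTIONS
            )
            for s in STATES
        }
        if max(abs(v_new[s] - v[s]) for s in STATES) < eps:
            return v_new
        v = v_new
```

Under these toy numbers, staying in s1 forever is worth 2/(1 − 0.9) = 20, and moving there from s0 is worth 1 + 0.9 × 20 = 19.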
A POMDP is characterized by a tuple ⟨S, A, T, R, O, Q⟩:
∙ S, a set of states,
∙ A, a set of actions,
∙ T : S × A → Pr(S), a transition function,
∙ R : S × A → ℝ, a reward function,
∙ O, an observation set,
∙ Q : S × A → Pr(O), an observation function.
A MOMDP (Araya-López et al., 2010) is characterized by a tuple ⟨S_v, S_h, A, T, R, O_v, O_h, Q⟩:
∙ S_v, S_h, the visible and hidden parts of the state,
∙ A, a set of actions,
∙ T : S_v × A × S_h → Pr(S_v × S_h), a transition function,
∙ R : S_v × A × S_h → ℝ, a reward function,
∙ O_v = S_v, an observation set on the visible part of the state,
∙ O_h, an observation set on the hidden part of the state,
∙ Q : S_v × A × S_h → Pr(O_v × O_h), an observation function.
An APS from the point of view of agent 1 can be transformed into a MOMDP:
∙ S_v = S_1 × P, S_h = S_2
∙ A = {prem(r) ⇒ m | r ∈ R_1 and m ∈ acts(r)}
∙ O_v = S_v and O_h = ∅
∙ Q(⟨s_v, s_h⟩, a, ⟨s_v⟩) = 1, otherwise 0
∙ T, see below
Let C_s(R_i) be the set of rules of R_i that can be fired in state s. The application set F_r(m, s) is the set of predicates resulting from the application of act m of a rule r on s. If r cannot be fired in s, F_r(m, s) = s.
∙ s, a state, and r : p ⇒ m, an action such that r ∈ A
∙ s′ = F_r(m, s)
∙ r′ ∈ C_{s′}(R_2) such that r′ : p′ ⇒ [π_1/m_1, …, π_n/m_n]
∙ s′′_i = F_{r′}(m_i, s′)
∙ T(s, r, s′′_i) = π_i
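This two-step transition (agent 1 fires a rule, then the opponent's applicable rule answers probabilistically) can be sketched as follows. The encoding is an assumption made for illustration: states are sets of ground predicates, rules are (premises, adds, deletes) triples, and the APS acts ⊞, ⊟, ⊕, ⊖ are collapsed into plain adds/deletes.

```python
from typing import Dict, FrozenSet, List, Tuple

# Illustrative encoding (an assumption, not the talk's notation):
# a state is a frozenset of ground predicates, e.g. {"a(x)", "h1(y)"};
# a rule is a (premises, adds, deletes) triple of predicate sets.
State = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

def fireable(rule: Rule, s: State) -> bool:
    """A rule can be fired in s when all of its premises hold in s."""
    return rule[0] <= s

def apply_act(rule: Rule, s: State) -> State:
    """F_r(m, s): apply the act of a fireable rule; otherwise F_r(m, s) = s."""
    if not fireable(rule, s):
        return s
    _premises, adds, dels = rule
    return (s - dels) | adds

def transition(s: State, r1: Rule,
               opponent: List[Tuple[Rule, float]]) -> Dict[State, float]:
    """T(s, r1, s''_i) = pi_i: agent 1 fires r1 to reach s', then the
    opponent's act m_i fires with probability pi_i, giving s''_i.
    `opponent` flattens r' : p' => [pi_1/m_1, ..., pi_n/m_n] into
    (single-act rule, probability) pairs."""
    s_prime = apply_act(r1, s)
    dist: Dict[State, float] = {}
    for r2, pi in opponent:
        s_next = apply_act(r2, s_prime)
        dist[s_next] = dist.get(s_next, 0.0) + pi
    return dist
```

For instance, if agent 1 voices a(y) and the opponent attacks it with probability 0.7 or retracts it with probability 0.3, `transition` returns the corresponding two-point distribution over successor states.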
∙ MO-IP (Araya-López et al., 2010), the IP algorithm for POMDPs adapted to MOMDPs (exact method)
∙ MO-SARSOP (Ong et al., 2010), the SARSOP algorithm for POMDPs adapted to MOMDPs (approximate, albeit very efficient, method)
Two kinds of optimizations: with or without dependency on the initial state
These optimizations must be recomputed each time the initial state changes.
1. For each predicate that is never modified but is used as a premise:
1.1 Remove all the rules that are not compatible with the value of this predicate in the initial state.
1.2 For all remaining rules, remove the predicate from the premises.
2. For each remaining action of agent 1, track the rules of agent 2 compatible with the application of this action. If a rule of agent 2 is not compatible with any application of an action of agent 1, remove it.
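Step 1 of this preprocessing can be sketched as follows, under an illustrative encoding where a rule is a (premises, adds, deletes) triple over predicate names (an assumption, not the talk's notation):

```python
from typing import FrozenSet, List, Tuple

# Illustrative encoding (an assumption): a rule is a
# (premises, adds, deletes) triple over predicate names.
Rule = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

def prune_static_predicates(rules: List[Rule],
                            initial_state: FrozenSet[str]) -> List[Rule]:
    """Step 1: a predicate no rule ever adds or deletes keeps its initial
    truth value forever, so premises on it can be resolved once and for
    all against the initial state."""
    # Predicates that some rule can modify.
    modified = set().union(*(adds | dels for _, adds, dels in rules))
    pruned: List[Rule] = []
    for prems, adds, dels in rules:
        static = frozenset(p for p in prems if p not in modified)
        # 1.1 Drop rules whose static premises do not hold initially.
        if not static <= initial_state:
            continue
        # 1.2 Strip the now-vacuous static premises from surviving rules.
        pruned.append((prems - static, adds, dels))
    return pruned
```

Rules conditioned on a static predicate that is false initially are dropped outright; the rest lose that premise, shrinking the state space the solver has to consider.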
∙ MO-IP, which did not finish after tens of hours
∙ MO-SARSOP without optimizations: idem
∙ MO-SARSOP with optimizations: 4 seconds for the optimal solution
1. A way to model complex debate problems (APS)
2. A method to transform those problems into a MOMDP
3. Several optimizations that can be used outside of the context of MOMDPs
4. A method to optimize the actions of an agent in an APS
Araya-López, M., Thomas, V., Buffet, O., and Charpillet, F. (2010). A closer look at MOMDPs. In 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI).
Cayrol, C. and Lagasquie-Schiex, M.-C. (2005). Graduality in argumentation. Journal of Artificial Intelligence Research (JAIR), 23:245–297.
Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–358.
Hadoux, E., Beynier, A., and Weng, P. (2014). Solving hidden-semi-Markov-mode Markov decision problems. In Straccia, U. and Calì, A., editors, Scalable Uncertainty Management, volume 8720 of Lecture Notes in Computer Science, pages 176–189. Springer International Publishing.
Hunter, A. (2014). Probabilistic strategies in dialogical argumentation. In International Conference on Scalable Uncertainty Management (SUM'14), LNCS volume 8720.
Ong, S. C., Png, S. W., Hsu, D., and Lee, W. S. (2010). Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons.
Silver, D. and Veness, J. (2010). Monte-Carlo planning in large POMDPs. In Proceedings of the 24th Conference on Neural Information Processing Systems (NIPS), pages 2164–2172.