Slide 1

Slide 1 text

Lazy Abstraction for Markov Decision Processes
Dániel Szekeres
https://ftsrg.mit.bme.hu/

Slide 2

Slide 2 text

Context: Reliability analysis
(Diagram: the system under analysis interacts with external systems, user behavior, the physical environment, and component failures; these give rise to both probabilistic and non-deterministic behavior.)

Slide 3

Slide 3 text

Markov Decision Processes (MDPs)
• Discrete set of states
• Multiple actions available in each state → non-deterministic behavior
• Resulting state sampled from a distribution → probabilistic behavior
• Commonly described through higher-level formalisms
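The structure above can be sketched as a small data type; a minimal illustration (the example model and all names are invented here, not taken from the talk):

```python
# Minimal MDP encoding (illustrative): each state maps to its enabled
# actions, and each action to a probability distribution over successors.
mdp = {
    "s0": {                            # two enabled actions: non-determinism
        "a": {"s1": 0.5, "s2": 0.5},   # probabilistic outcome
        "b": {"s2": 1.0},
    },
    "s1": {"a": {"s1": 1.0}},          # absorbing
    "s2": {"a": {"s2": 1.0}},          # absorbing
}

def enabled_actions(state):
    return list(mdp[state])

def successor_distribution(state, action):
    return mdp[state][action]

print(enabled_actions("s0"))              # ['a', 'b']
print(successor_distribution("s0", "a"))  # {'s1': 0.5, 's2': 0.5}
```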

Slide 4

Slide 4 text

Probabilistic Guarded Commands
• A set of state variables
• A set of commands, each having:
– A Boolean guard expression over the state variables
– A probability distribution over effects changing the variables
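A probabilistic guarded command is then a guard plus a distribution over effects; a toy sketch (the command itself is invented for this example):

```python
# Toy probabilistic guarded command over integer state variables,
# following the structure described above (invented example).
command = {
    "guard": lambda s: s["x"] < 2,                  # Boolean guard
    "effects": [                                    # distribution over effects
        (0.8, lambda s: {**s, "x": s["x"] + 1}),
        (0.2, lambda s: {**s, "y": s["y"] + 1}),
    ],
}

def is_enabled(cmd, state):
    return cmd["guard"](state)

def outcomes(cmd, state):
    """All (probability, successor state) pairs of an enabled command."""
    return [(p, eff(state)) for p, eff in cmd["effects"]]

state = {"x": 0, "y": 0}
print(is_enabled(command, state))   # True
print(outcomes(command, state))     # [(0.8, {'x': 1, 'y': 0}), (0.2, {'x': 0, 'y': 1})]
```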

Slide 5

Slide 5 text

State-space explosion
• The state space can be exponential in the size of the description
• Hinders verifying complex systems in practice
• Exacerbated by the numerical computations needed in probabilistic model checking

Slide 6

Slide 6 text

Counteracting state-space explosion
Partial state-space exploration:
• Stop exploring new states when enough information is available
Abstraction:
• Merges similar concrete states into abstract states
• Needs to be conservative

Slide 7

Slide 7 text

Counteracting state-space explosion: partial state-space exploration + abstraction
• Explore only a part of the abstract state space
• Already used in non-probabilistic abstraction-based model checking
• Not yet in probabilistic model checking:
– Existing MDP abstraction-refinement algorithms rely on the whole abstract state space
– Lazy abstraction synergizes much better with partial exploration → needs to be adapted for MDPs

Slide 8

Slide 8 text

Partial state-space exploration for MDPs: BRTDP

Slide 9

Slide 9 text

Bounded Real-Time Dynamic Programming (BRTDP)
• Maintain both a lower and an upper value approximation
• Simulate traces → update only simulated states
• Iterate until convergence: the initial state has a small enough interval
(Diagram: a small MDP with a branch of p = 0.5; the target state starts at [1.0, 1.0], the sink at [0.0, 0.0], and every other state at the trivial interval [0.0, 1.0].)
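A heavily simplified BRTDP-style sketch for maximum reachability on a toy model (the model, names, and action-selection heuristic are illustrative; a full BRTDP implementation needs additional care, e.g. for end components):

```python
import random

# Toy BRTDP sketch: L/U are lower/upper bounds on the probability of
# reaching "goal". Invented example model, for illustration only.
mdp = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}},
    "s1": {"a": {"goal": 1.0}},
    "s2": {"a": {"sink": 1.0}},
}
L = {"goal": 1.0, "sink": 0.0}
U = {"goal": 1.0, "sink": 0.0}
for s in mdp:
    L.setdefault(s, 0.0)   # trivial lower bound
    U.setdefault(s, 1.0)   # trivial upper bound

def bellman(bound, s):
    # max over actions of the expected bound value
    return max(sum(p * bound[t] for t, p in dist.items())
               for dist in mdp[s].values())

def simulate_and_update(init, rng, max_len=10):
    trace, s = [], init
    while s in mdp and len(trace) < max_len:
        trace.append(s)
        # follow the action maximising the upper bound (BRTDP heuristic)
        act = max(mdp[s], key=lambda a: sum(p * U[t] for t, p in mdp[s][a].items()))
        s = rng.choices(list(mdp[s][act]), weights=mdp[s][act].values())[0]
    for s in reversed(trace):          # update only the simulated states
        L[s], U[s] = bellman(L, s), bellman(U, s)

rng = random.Random(0)
while U["s0"] - L["s0"] > 1e-6:        # iterate until the interval is tight
    simulate_and_update("s0", rng)
print(L["s0"], U["s0"])                # both converge to 0.5
```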

Slide 11

Slide 11 text

BRTDP (continued)
(Diagram: after the first simulated trace, the intervals along the trace tighten; a successor state updates to [0.5, 1.0] and the initial state to [0.25, 1.0].)

Slide 13

Slide 13 text

BRTDP (continued)
(Diagram: a further trace reaches the target; a state on the trace tightens to [1.0, 1.0] and the initial state to [0.5, 1.0].)

Slide 15

Slide 15 text

BRTDP (continued)
(Diagram: after convergence, the initial state's interval is [1.0, 1.0], so iteration stops.)

Slide 16

Slide 16 text

Lazy abstraction for MDPs

Slide 17

Slide 17 text

CounterExample-Guided Abstraction Refinement (CEGAR)
• Starts with a trivial abstraction
• Loop:
– Construct the abstract model, then check it
– Property satisfied? Yes → output: property satisfied
– No → concretize the counterexample
– Concretizable? Yes → output: property violated
– No → refine the precision based on the counterexample and repeat

Slide 18

Slide 18 text

Lazy abstraction
• Builds on the idea of CEGAR
• Merges abstract exploration and refinement
• Precision is local to each node in the abstract state graph
• Refinement is performed locally on the required nodes
• Better suited for combination with BRTDP than non-lazy probabilistic CEGAR approaches
(Diagram: an abstract state graph whose nodes carry local precisions Π0 and Π1.)

Slide 19

Slide 19 text

Lazy abstraction for MDPs
• Several different lazy abstraction implementations exist (BLAST, Impact, etc.) → we use an Adaptive Simulation Graph-based version
• Abstract model: Probabilistic Adaptive Simulation Graph (PASG)
• Domain-agnostic in general
• Currently implemented with Explicit Value Abstraction: some variables are tracked exactly, others are unknown
(Diagram: concrete states over x and y merged into abstract states x=0, y=? and x=1, y=?.)
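Explicit Value Abstraction from the last bullet can be illustrated in a few lines (a sketch with invented names; Theta's actual data structures differ):

```python
# Explicit-value abstraction sketch: tracked variables keep their exact
# value, untracked ones become unknown ("?"). Illustrative only.
def abstract_state(concrete, tracked):
    return {v: (concrete[v] if v in tracked else "?") for v in concrete}

def contains(abstr, concrete):
    """True iff the concrete state lies in the set the abstract state describes."""
    return all(val == "?" or concrete[var] == val
               for var, val in abstr.items())

a = abstract_state({"x": 1, "y": 0}, tracked={"x"})
print(a)                              # {'x': 1, 'y': '?'}
print(contains(a, {"x": 1, "y": 5}))  # True: y is unknown
print(contains(a, {"x": 2, "y": 0}))  # False: x differs
```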

Slide 20

Slide 20 text

Probabilistic Adaptive Simulation Graph (PASG):
– Nodes are labeled by a concrete state
– and an abstract state (describing a set of concrete states) that contains it
– The concrete state represents all states in the abstract state w.r.t. available "behaviors" (action sequences)
Initial node:
– concrete label is the concrete initial state
– abstract label is as coarse as possible
(Diagram: node n0 with Lc: x = 0, y = 0 and La: x = 0.)

Slide 21

Slide 21 text

Expansion:
– Select an action enabled in the concrete state
– Compute the image of the concrete state
– Overapproximate the image of the abstract state
(Diagram: n0 (Lc: x = 0, y = 0; La: x = 0) expands via command c1 to n2 (Lc: x = 1, y = 0; La: x = 1) and n3 (Lc: x = 1, y = 2; La: x = 1) with probabilities 0.8 and 0.2, and via c2 to n1 (Lc: x = 0, y = 0; La: x = 0) with probability 1.0.)

Slide 22

Slide 22 text

If an action is not enabled in any part of the abstract state, it is ignored.
(Diagram: command c3 is disabled (X) in n0, so no edge is created for it.)

Slide 23

Slide 23 text

Covering:
– If the new concrete state after expansion is already contained in another abstract state,
– a cover edge is created,
– and expansion of the covered node can be skipped.
(Diagram: n1's concrete state is contained in n0's abstract label, so a cover edge is added from n1 to n0.)
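The covering step can be sketched as a containment check against already-expanded nodes (invented names, explicit-value labels with "?" for unknown values):

```python
# Covering sketch: a fresh concrete state is covered by a node whose
# abstract label contains it, so no new expansion is needed there.
def label_contains(abstract_label, concrete_state):
    return all(v == "?" or concrete_state[k] == v
               for k, v in abstract_label.items())

def find_cover(new_concrete, nodes):
    """Return an expanded node whose abstract label covers the state, if any."""
    for node in nodes:
        if label_contains(node["abstract"], new_concrete):
            return node
    return None

nodes = [{"id": "n0", "abstract": {"x": 0, "y": "?"}}]
print(find_cover({"x": 0, "y": 0}, nodes))  # covered by n0
print(find_cover({"x": 1, "y": 0}, nodes))  # None: must be expanded
```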

Slide 24

Slide 24 text

(Diagram: exploration continues; n2 is expanded via c1 into n4 (Lc: x = 1, y = 0; La: x = 1) and n5 (Lc: x = 2, y = 0; La: x = 2) with branch probabilities 0.8 and 0.2.)

Slide 25

Slide 25 text

(Diagram: refinement strengthens the abstract labels of the affected nodes with y = 0; the disabled command c3 is marked (X) and the invalidated cover edge is removed.)

Slide 26

Slide 26 text

PASG versions
Upper-cover:
• Direct adaptation of the original ASG for MDPs
• An action that might be enabled somewhere in the abstract label must be enabled in the concrete state
• Upper approximation

Slide 27

Slide 27 text

PASG versions
Lower-cover:
• Inverted representativity requirement
• An action disabled somewhere in the abstract label must be disabled in the concrete state
• Lower approximation

Slide 28

Slide 28 text

PASG versions
Bi-cover:
• Combines the upper- and lower-cover constraints
• Provides exact numerical results
• Resulting value is independent of the order of exploration

Slide 29

Slide 29 text

Quantitative Analysis – Full Exploration
• Construct the full PASG → analyze it as an MDP
• Cover edges are deterministic actions
• Any MDP analysis algorithm can be applied (value iteration variants, policy iteration, linear programming, …)
• Provable guarantees for the target probability
(Diagram: a scale from 0 to 1 relating the lower-cover, bi-cover, original (concrete), and upper-cover results.)
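As one of the analyses listed above, plain value iteration for maximum reachability can run directly on the finished PASG viewed as an MDP, with cover edges modeled as deterministic actions (toy sketch; the model and all names are invented):

```python
# Value iteration for max reachability on an MDP view of a PASG.
# Cover edges appear as actions with a single probability-1 successor.
def value_iteration(mdp, goal, iters=1000):
    V = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(iters):
        for s in mdp:
            if s in goal or not mdp[s]:
                continue           # goal and terminal values stay fixed
            V[s] = max(sum(p * V[t] for t, p in dist.items())
                       for dist in mdp[s].values())
    return V

mdp = {
    "n0": {"c1": {"n2": 0.8, "n3": 0.2}, "c2": {"n1": 1.0}},
    "n1": {"cover": {"n0": 1.0}},   # cover edge: deterministic action
    "n2": {},                       # goal
    "n3": {},                       # sink
}
V = value_iteration(mdp, goal={"n2"})
print(V["n0"])  # 0.8: always choosing c1 is optimal
```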

Slide 30

Slide 30 text

Lazy abstraction + BRTDP

Slide 31

Slide 31 text

BRTDP reminder
• Maintain both a lower and an upper value approximation
• Simulate traces → update only simulated states
• Iterate until convergence: the initial state has a small enough interval
(Diagram: the earlier BRTDP example mid-convergence, with the initial state at [0.5, 1.0].)

Slide 32

Slide 32 text

Quantitative Analysis – On-the-fly
• Uses BRTDP for analysis
• Merges PASG construction and numeric computations
• PASG nodes are constructed during trace simulation
• Fewer states explored → fewer inconsistencies → coarser abstract labels → smaller abstract state space

Slide 33

Slide 33 text

Quantitative Analysis – On-the-fly
• Provable guarantees:
– Convergence for finite state spaces: the PASG is finished after a finite number of traces, and the BRTDP convergence results apply to the finished PASG
– Guarantees for the target probability
(Diagram: the same 0-to-1 scale relating the lower-cover, bi-cover, original (concrete), and upper-cover results.)

Slide 34

Slide 34 text

Correctness of the on-the-fly analysis
(Diagram: the PASG from the running example, nodes n0 to n5, with the cover edge from n1 to n0.)

Slide 35

Slide 35 text

Correctness of the on-the-fly analysis
• Refinement when n5 is expanded → the cover edge is removed → the (green) trace simulated earlier does not exist in the finished PASG
• But an equivalent one exists!
(Diagram: the refined PASG; abstract labels strengthened with y = 0, the cover edge removed, and a copy of the n4/n5 subgraph reached through the formerly covered node.)

Slide 36

Slide 36 text

(Preliminary) measurements on the QComp benchmarks

Slide 37

Slide 37 text

Current state:
• Upper-/lower-/bi-cover PASG
• Full construction / BRTDP
• Only for maximal probability
• Explicit Value Domain
→ Implemented in the Theta model checker
Future work: Predicate domain

Slide 38

Slide 38 text

Thank you for your attention