Slide 1

Slide 1 text

How to Evaluate Process Mining Algorithms
Prof. Dr. Jan Mendling

Slide 2

Slide 2 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 3

Slide 3 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 4

Slide 4 text

The BPM Lifecycle

Slide 5

Slide 5 text

What is Process Mining?

Slide 6

Slide 6 text

Process Mining
[Diagram: Discovery: event log → discovered model; Conformance: event log + input model → difference/deviance diagnostics; Performance: event log′ + input model → enhanced model]

Slide 7

Slide 7 text

Process Mining Tools
• Open-source: Apromore, ProM
• Lightweight: Disco
• Mid-range: Minit, myInvenio, QPR Process Analyzer, Signavio Process Intelligence, StereoLOGIC Discovery Analyst, Lana Labs
• Heavyweight: ARIS Process Performance Manager, Celonis Process Mining, Perceptive Process Mining (Lexmark), Interstage Process Discovery (Fujitsu)

Slide 8

Slide 8 text

Automated Process Discovery

CID    Task                        Time Stamp
13219  Enter Loan Application      2007-11-09 T 11:20:10
13219  Retrieve Applicant Data     2007-11-09 T 11:22:15
13220  Enter Loan Application      2007-11-09 T 11:22:40
13219  Compute Installments        2007-11-09 T 11:22:45
13219  Notify Eligibility          2007-11-09 T 11:23:00
13219  Approve Simple Application  2007-11-09 T 11:24:30
13220  Compute Installments        2007-11-09 T 11:24:35
…      …                           …
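Such a log is turned into one trace per case by grouping on the case identifier and sorting by timestamp. A minimal Python sketch, re-typing the slide's loan-application records (the helper name `traces_from_log` is ours):

```python
from collections import defaultdict

# Events re-typed from the slide's example: (case id, task, ISO timestamp).
events = [
    (13219, "Enter Loan Application",     "2007-11-09T11:20:10"),
    (13219, "Retrieve Applicant Data",    "2007-11-09T11:22:15"),
    (13220, "Enter Loan Application",     "2007-11-09T11:22:40"),
    (13219, "Compute Installments",       "2007-11-09T11:22:45"),
    (13219, "Notify Eligibility",         "2007-11-09T11:23:00"),
    (13219, "Approve Simple Application", "2007-11-09T11:24:30"),
    (13220, "Compute Installments",       "2007-11-09T11:24:35"),
]

def traces_from_log(events):
    """Group events by case id, then order each case by timestamp."""
    cases = defaultdict(list)
    for cid, task, ts in events:
        cases[cid].append((ts, task))
    return {cid: [task for _, task in sorted(recs)]
            for cid, recs in cases.items()}

traces = traces_from_log(events)
# traces[13220] == ["Enter Loan Application", "Compute Installments"]
```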

Slide 9

Slide 9 text

α-miner Basic Idea: Ordering Relations
• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: x→y iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x.
• Unrelated: x#y iff not x>y and not y>x.

Event stream: case 1: task A, case 2: task A, case 3: task A, case 3: task B, case 1: task B, case 1: task C, case 2: task C, case 4: task A, case 2: task B, ...
Example traces: ⟨A,B,C,D⟩, ⟨A,C,B,D⟩, ⟨E,F⟩
Direct succession: A>B, A>C, B>C, B>D, C>B, C>D, E>F
Causality: A→B, A→C, B→D, C→D, E→F
Parallel: B||C, C||B
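The four relations fall out directly from the set of direct-succession pairs. A minimal Python sketch (the function name and the string encodings of →, ||, # are our own):

```python
def alpha_relations(traces):
    """Derive the α-miner ordering relations between all task pairs."""
    tasks = {t for trace in traces for t in trace}
    # Direct succession x>y: x immediately followed by y in some trace.
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    rel = {}
    for x in tasks:
        for y in tasks:
            if (x, y) in succ and (y, x) not in succ:
                rel[(x, y)] = "->"    # causality x→y
            elif (x, y) in succ and (y, x) in succ:
                rel[(x, y)] = "||"    # parallel x||y
            elif (x, y) not in succ and (y, x) not in succ:
                rel[(x, y)] = "#"     # unrelated x#y
            else:
                rel[(x, y)] = "<-"    # causality in the other direction
    return rel

# The slide's example log: traces ABCD, ACBD, EF.
rel = alpha_relations([list("ABCD"), list("ACBD"), list("EF")])
# rel[("A", "B")] == "->", rel[("B", "C")] == "||", rel[("A", "D")] == "#"
```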

Slide 10

Slide 10 text

Heuristics Miner

Slide 11

Slide 11 text

Process Model discovered with Heuristics Miner

Slide 12

Slide 12 text

Process Model discovered with Inductive Miner

Slide 13

Slide 13 text

Process Model discovered with Structured Miner

Slide 14

Slide 14 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 15

Slide 15 text

Augusto et al.: Automated Discovery of Process Models from Event Logs: Review and Benchmark, TKDE (2019)
Classification dimensions: Model Type, Model Language, Constructs, Implementation, Evaluation

Slide 16

Slide 16 text

Type of Evaluation Data
• 31 of 35 algorithms evaluated with real-life logs
• 11 of 35 with synthetic logs
• 13 of 35 with artificial logs

“... we observed a growing trend in employing publicly-available logs, as opposed to private logs which hamper the replicability of the results due to not being accessible.”
“… several methods used a selection of the logs made available by the Business Process Intelligence Challenge (BPIC).”
“… great majority of the methods do not have a working or available implementation.”
(Augusto et al. 2017)

Slide 17

Slide 17 text

Threats to Validity
“Another limitation is the use of only 24 event logs, which to some extent limits the generalizability of the conclusions. However, the event logs included in the evaluation are all real-life logs of different sizes and features, including different application domains. To mitigate this limitation, we have structured the released benchmarking toolset in such a way that the benchmark can be seamlessly rerun with additional datasets.” (Augusto et al. 2017)
Can we do better?

Slide 18

Slide 18 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 19

Slide 19 text

Science of the Artificial

Slide 20

Slide 20 text

Design Science Process (Peffers et al. 2008)

Slide 21

Slide 21 text

Setting the Philosophical Ground (Gregor/Hevner 2013)
• Artefact: something that is, or can be transformed into, matter
• Design theory: prescriptive → how we should do something (Λ Knowledge)
• Kernel theory: descriptive → why it works (Ω Knowledge)
• Grand theories are all-encompassing theories

Slide 22

Slide 22 text

What can DSR contribute?

Slide 23

Slide 23 text

What is the Claim? Gregor, Hevner 2013

Slide 24

Slide 24 text

Improvement: Compare with best Alternatives Gregor, Hevner 2013

Slide 25

Slide 25 text

Exaptation: Show that it works Gregor, Hevner 2013

Slide 26

Slide 26 text

Invention: Start a Business Gregor, Hevner 2013

Slide 27

Slide 27 text

Algorithm Engineering: Hypotheses and Falsification (Sanders 2009)

Slide 28

Slide 28 text

Engineering in the Philosophy of Science (Staples 2014)

Slide 29

Slide 29 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 30

Slide 30 text

Epistemological Tasks in Process Mining
Staples: x, S; a ⊢ R(x, a)
with algorithm a, input data x, system design S, requirements R
Satisfaction Question: x, S; a ⊢ R(x, a)
Improvement Question: x, S′; a′ > x, S; a

Slide 31

Slide 31 text

Validity of Satisfaction (Wohlin et al. 2000)
• Internal Validity: causality of the relationship between treatment and outcome
• External Validity: generalizability to the scope of the study
• Construct Validity: correctness of the operationalization

For x, S; a ⊢ R(x, a):
• Internal Validity: causality of x, S, a and satisfaction
• External Validity: generalizability of x, S, a to the scope of the study
• Construct Validity: correctness of operationalizing satisfaction

Slide 32

Slide 32 text

Validity of Improvement
• Internal Validity of x, S′; a′ > x, S; a: causality of the relationship between x, S′, a′ and improvement
• External Validity: generalizability of x, S′, a′ to the scope of the study
• Construct Validity: correctness of operationalizing improvement

Slide 33

Slide 33 text

Generalization Question
• Classes of input data C(·)
• System design S
• Requirements R

C(x), S; a ⊢ R(C(x), a)

Slide 34

Slide 34 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 35

Slide 35 text

Properties of Algorithm Design
• Using behavioural relationships
• Using decomposition
• Using frequencies
• Using payload data
• Using resource information

Slide 36

Slide 36 text

Measuring Performance Requirements
• Fitness
• Precision
• Simplicity
• Generalization
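As one concrete (and deliberately naive) illustration of the first two criteria, fitness and precision can be sketched at the trace level, with the model represented simply as the set of traces it accepts. Real measures use token replay or alignments and are far more fine-grained; the function names and toy data here are ours:

```python
def trace_fitness(log_traces, model_language):
    """Naive fitness: share of log traces the model can replay exactly."""
    ok = sum(1 for t in log_traces if tuple(t) in model_language)
    return ok / len(log_traces)

def trace_precision(model_language, log_traces):
    """Naive precision: share of model behaviour actually seen in the log."""
    observed = {tuple(t) for t in log_traces}
    return sum(1 for t in model_language if t in observed) / len(model_language)

# Toy example: the model allows one trace that the log never exhibits,
# so fitness is perfect but precision is not.
log = [("A", "B"), ("A", "B"), ("A", "C")]
model = {("A", "B"), ("A", "C"), ("A", "D")}
fit = trace_fitness(log, model)        # 1.0
prec = trace_precision(model, log)     # 2/3
```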

Slide 37

Slide 37 text

Metrics for Classes of Input Data
• Size of the log
• Number of variants
• Size of alphabet
• …more to be added
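The three named descriptors are cheap to compute for any log given as a list of traces. A minimal sketch (the dictionary keys are our own labels; the slide leaves the metric list open):

```python
def log_metrics(traces):
    """Three simple dataset descriptors for a log given as a list of traces."""
    return {
        "size": len(traces),                                    # number of cases
        "variants": len({tuple(t) for t in traces}),            # distinct traces
        "alphabet": len({task for t in traces for task in t}),  # distinct tasks
    }

m = log_metrics([list("ABCD"), list("ACBD"), list("ABCD"), list("EF")])
# m == {"size": 4, "variants": 3, "alphabet": 6}
```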

Slide 38

Slide 38 text

Agenda
1. What is Process Mining?
2. Current Evaluation Practices in Process Mining
3. Methodological Frameworks for Evaluating Process Mining Algorithms
4. Epistemological Tasks for Process Mining Algorithms
5. Properties of Algorithms and Datasets
6. Research Framework for Process Mining Algorithms

Slide 39

Slide 39 text

Take Aways
• C(x), S; a ⊢ R(C(x), a)
• Better understanding of C(x), informed by demographics of input data
• Better understanding of a, informed by theories on design principles
• Extending the spectrum of R
• How to find hypotheses: Tukey on exploratory data analysis, or meta-heuristics

Slide 40

Slide 40 text

Why BPM in Vienna is the best location to meet again: 1–6 September 2019 at WU Vienna
Prof. Dr. Stefanie Rinderle-Ma, Dean, University of Vienna
Prof. Dr. Jan Mendling, Deputy Head of Department, WU Vienna
https://bpm2019.ai.wu.ac.at