Slide 1

Slide 1 text

Applicability of Process Mining Techniques in Business Environments
Annual Meeting, IEEE Task Force on Process Mining
Andrea Burattin, andreaburattin
September 8, 2014

Slide 2

Slide 2 text

Brief Curriculum Vitæ
2009: M.Sc. in Computer Science (A.I. program), University of Padova
2009–2012: Ph.D.
  Supervisor: Prof. Alessandro Sperduti
  Joint school University of Bologna–Padova
  Thesis defended in April 2013
2013–2014: Postdoc
  Prompt project (prompt.processmining.it), University of Padova
Image: Specola, Padova. http://flic.kr/p/cEW5bo

Slide 3

Slide 3 text

Ph.D. Inception
Ph.D. background
  Inception during M.Sc. thesis
  Companies: study on process mining
  A company (Siav S.p.A., www.siav.it) funded my Ph.D.
  Aim: investigate the applicability of process mining techniques in business scenarios
  Interaction with companies: interesting! (but sometimes...)
Outcome
  Applicability of Process Mining Techniques in Business Environments

Slide 4

Slide 4 text

Quick Recap of Process Mining
[Diagram labels: Imagination; Process Mining; Incarnation/Environment; Observation; Operational Model; Analytical Model; Event Logs; Information System; Operational Incarnation; support; protocol/audit; Discovery; Conformance; Extension; control; augment; compare; analyze; mine; basis; create; (re-)design; implement; describe]
Source: C. Günther, Process Mining in Flexible Environments. PhD thesis, TU/e, Eindhoven, 2009.

Slide 10

Slide 10 text

Theoretical vs. Industrial-related Open Problems
Some open problems from the literature
  Duplicate tasks
  Exploiting all data available
  Holistic mining
  Different perspectives from different sources
  Noise and incompleteness
Open problems from case studies
  Using process mining tools and configuring algorithms
  Results interpretation
  Readable results
  Computational power and storage capacity required
The two sets do not overlap

Slide 11

Slide 11 text

Possible Industry Scenarios
Four possible industry scenarios
  Process-aware vs. process-unaware companies
  Process-aware software vs. process-unaware software
[Diagram: Company 1, Company 2, Company 3, Company 4, arranged by Process Unaware vs. Process Aware Information Systems and Process Aware vs. Process Unaware Companies]

Slide 12

Slide 12 text

Thesis Structure and Organization
[Diagram: thesis structure map. Nodes: Process Mining Capable Event Logs; Process Mining Capable Event Stream; Process Representation; Model Evaluation; Data Preparation; Control-flow Mining; Stream Control-flow Mining; Results Evaluation; Process Extension]

Slide 13

Slide 13 text

Overview: Data Preparation
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 16

Slide 16 text

Problems with Data Preparation
Problems at different complexity and abstraction levels. Examples:
  Adaptation of existing data (syntax problem, easy)
  Introduction of new information (difficult)
Typical set of required fields
  (case-id; activity; timestamp; [process-name]; [originator])
Our context: company process-aware; information system (IS) process-unaware
Structure of the available log
  (activity; timestamp; originator; info_1; ...; info_n)

Slide 18

Slide 18 text

Problems with Data Preparation (cont.)
Case-id from info_i fields
  Candidate case-id fields
    A-priori knowledge
  Event chains
    String similarity functions
  Selection of maximal chain
    Most activities or simplest chain
Process name is not a problem
  All events belong to the same process
Example log
  Act.  info_1  info_2
  a1    AB-01   BB-01
  a2    AA-02   AB-01
  a3    AB-01   BB-02
  a4    AB-01   BB-03
  a1    AA-03   BB-04
  a5    AA-03   BB-05
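The chain idea above can be tried on the example table. This is only a minimal sketch, not the thesis implementation. It assumes exact string equality in place of the string similarity functions, and the rule "assign each event to the longest chain it belongs to" is a hypothetical stand-in for the selection of the maximal chain. All function names are illustrative.

```python
from collections import defaultdict

# Example log from the slide: (activity, info_1, info_2)
LOG = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]

def build_chains(log):
    """Map every value seen in any info field to the indices of the events carrying it."""
    chains = defaultdict(list)
    for idx, (_activity, *infos) in enumerate(log):
        for value in set(infos):
            chains[value].append(idx)
    return chains

def assign_cases(log):
    """Assumed rule: an event takes as case-id the value whose chain is longest."""
    chains = build_chains(log)
    cases = {}
    for idx, (_activity, *infos) in enumerate(log):
        # Ties are broken alphabetically so the result is deterministic.
        cases[idx] = max(set(infos), key=lambda v: (len(chains[v]), v))
    return cases

def traces(log):
    """Group activities by their reconstructed case-id, keeping log order."""
    cases = assign_cases(log)
    result = defaultdict(list)
    for idx, (activity, *_rest) in enumerate(log):
        result[cases[idx]].append(activity)
    return dict(result)
```

On the example table, `traces(LOG)` groups a1, a2, a3, a4 under AB-01 (the value appears in info_1 for three events and in info_2 for one) and a1, a5 under AA-03.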

Slide 19

Slide 19 text

Overview: Control-flow Mining
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 21

Slide 21 text

Exploiting Data Available
Events with duration instead of instantaneous events
Generalization of Heuristics Miner to exploit this new information
[Figure: time axis with the Start and End of a Main activity and Sub-activity 1, Sub-activity 2, ..., Sub-activity n-1, Sub-activity n]
[Figure: activities A, B, C, D over time, shown as a process with events as time intervals and as a process with instantaneous events]
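One way to see what interval events add is to derive relations from a single trace. The sketch below is an assumption about how durations could be exploited, not the generalized Heuristics Miner itself: overlapping intervals are read as parallel activities, and an activity directly follows another when it starts after the first one ends with no complete activity in between. The trace and all names are hypothetical.

```python
# Hypothetical trace: (activity, start, end)
TRACE = [("A", 0, 2), ("B", 2, 5), ("C", 3, 6), ("D", 6, 8)]

def overlaps(x, y):
    """True when the two time intervals share more than a boundary point."""
    return x[1] < y[2] and y[1] < x[2]

def relations(trace):
    """Return (parallel pairs, direct-succession pairs) for one trace."""
    events = sorted(trace, key=lambda e: (e[1], e[2]))
    parallel, follows = set(), set()
    for i, x in enumerate(events):
        for y in events[i + 1:]:
            if overlaps(x, y):
                parallel.add(frozenset((x[0], y[0])))
        # Candidates: events that start once x has ended.
        later = [y for y in events if y[1] >= x[2]]
        for y in later:
            # y directly follows x unless another event fits entirely in between.
            blocked = any(z[2] <= y[1] for z in later if z is not y)
            if not blocked:
                follows.add((x[0], y[0]))
    return parallel, follows
```

With instantaneous events only, B and C would appear in a fixed order; with intervals, their overlap is visible and both are treated as successors of A.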

Slide 24

Slide 24 text

Non-expert Users
Our users: not experts in process mining, with notions of BPM
Observations
  Process mining algorithms require configuration
  Typically, algorithm configurations are thresholds on measures
  The mining log is finite
  Only a finite number of configurations is possible
We are able to discretize the parameter values
[Figure: process models over activities A to F, with unknown threshold values τ1 = ?, τ2 = ?, τ3 = ?, τ4 = ?]
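The discretization argument can be made concrete: a threshold only changes the mined model when it crosses a value that actually occurs in the log. The sketch below assumes a single threshold applied to a per-relation measure. The measure values and function names are hypothetical.

```python
# Hypothetical measure values computed from a finite log, one per candidate relation.
DEPENDENCIES = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.75,
    ("C", "D"): 0.75,
    ("A", "D"): 0.2,
}

def candidate_thresholds(measure_values):
    """Distinct observed values: the only points where the kept relations can change."""
    return sorted(set(measure_values.values()))

def relations_kept(measure_values, threshold):
    """Relations whose measure reaches the threshold."""
    return {pair for pair, value in measure_values.items() if value >= threshold}

def distinct_models(measure_values):
    """Map each distinct set of kept relations to the smallest threshold producing it."""
    models = {}
    for threshold in candidate_thresholds(measure_values):
        kept = frozenset(relations_kept(measure_values, threshold))
        models.setdefault(kept, threshold)
    return models
```

Here four relations yield only three candidate thresholds, and any threshold strictly between 0.2 and 0.75 gives the same model as 0.75, so a user never needs to explore the continuous range.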

Slide 26

Slide 26 text

Model Selection Approaches
User-guided approach
  Hierarchical clustering of models
    Average linkage
    Any model-to-model metric
  Navigation of the dendrogram
  [Figure: dendrogram with leaves Process 1, Process 10, Process 9, Process 8, Process 5, Process 6, Process 4, Process 7, Process 2, Process 3; values 0.34, 0.45, 0.63, 0.69, 0.76, 0.49, 0.71, 0.74, 0.84; axis from 0 to 1]
Automatic approach
  Hill climbing with
    Maximum plateau steps
    Random restarts (local optimum)
  $h_{MDL} = \arg\min_{h \in H} \; L(h) + L(D \mid h)$
  MDL encodings
    MDL by Calders et al.
    Simplified heuristics
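The automatic approach can be sketched as a search over the discretized parameter values. This is a minimal sketch under stated assumptions: a single parameter indexed from 0 to n-1, and made-up encoding lengths standing in for L(h) and L(D | h). It does not reproduce the MDL encoding by Calders et al. or the simplified heuristics.

```python
import random

# Hypothetical encoding lengths (in bits) for six candidate configurations.
MODEL_BITS = [1, 2, 3, 4, 5, 6]   # stands in for L(h)
DATA_BITS = [9, 6, 3, 1, 1, 1]    # stands in for L(D | h)

def mdl_cost(i):
    """Two-part description length of candidate i."""
    return MODEL_BITS[i] + DATA_BITS[i]

def hill_climb(cost, n_candidates, restarts=5, max_plateau_steps=3, seed=0):
    """Minimise cost(i) over indices 0..n_candidates-1 with random restarts."""
    rng = random.Random(seed)
    best_i, best_cost = None, float("inf")
    for _ in range(restarts):
        i = rng.randrange(n_candidates)
        current, plateau = cost(i), 0
        while True:
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_candidates]
            if not neighbours:
                break
            j = min(neighbours, key=cost)
            if cost(j) < current:
                i, current, plateau = j, cost(j), 0      # strict improvement
            elif cost(j) == current and plateau < max_plateau_steps:
                i, plateau = j, plateau + 1              # bounded walk on a plateau
            else:
                break                                    # local optimum reached
        if current < best_cost:
            best_i, best_cost = i, current
    return best_i, best_cost

BEST = hill_climb(mdl_cost, len(MODEL_BITS))
```

The example cost has a single minimum, so every restart ends at the same configuration; with several local optima, the restarts are what gives the search a chance to escape them.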

Slide 27

Slide 27 text

Overview: Results Evaluation
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 29

Slide 29 text

Evaluation Metrics
Model-to-model metric
  Complex process converted into
    Permitted relations
    Forbidden relations
  Generation rules (based on the Alpha algorithm)
    A → B ⇒ A > B, B ≯ A
    A ∥ B ⇒ A > B, B > A
    A # B ⇒ A ≯ B, B ≯ A
  Comparison as Jaccard similarity on the two sets (> and ≯)
Model-to-log metric
  Declare constraint π and a trace σ ⇒ healthiness measures
    Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$
    Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$
    Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$
    Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$
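The model-to-model comparison can be sketched directly from the generation rules. The slide does not say how the two Jaccard values are combined, so the weighted average with `alpha` below is an assumption, as are the string codes used for the three relation types and the two example models.

```python
def expand(relations):
    """Turn Alpha-style relations into permitted (>) and forbidden (≯) pairs."""
    permitted, forbidden = set(), set()
    for (a, b), kind in relations.items():
        if kind == "->":      # A → B ⇒ A > B, B ≯ A
            permitted.add((a, b))
            forbidden.add((b, a))
        elif kind == "||":    # A ∥ B ⇒ A > B, B > A
            permitted.update({(a, b), (b, a)})
        elif kind == "#":     # A # B ⇒ A ≯ B, B ≯ A
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation kind: {kind}")
    return permitted, forbidden

def jaccard(x, y):
    """Jaccard similarity; two empty sets count as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def similarity(model_1, model_2, alpha=0.5):
    """Assumed combination: weighted average of the two Jaccard similarities."""
    p1, f1 = expand(model_1)
    p2, f2 = expand(model_2)
    return alpha * jaccard(p1, p2) + (1 - alpha) * jaccard(f1, f2)

# Hypothetical models: they differ only in the relation between B and C.
M1 = {("A", "B"): "->", ("B", "C"): "->"}
M2 = {("A", "B"): "->", ("B", "C"): "||"}
```

For `M1` and `M2` the permitted sets share 2 of 3 pairs and the forbidden sets share 1 of 2, so the models are similar but not identical.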

Slide 30

Slide 30 text

Overview: Process Extension
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 32

Slide 32 text

Multiperspective Mining
Given
  Log with information on originators
  Process model
We add roles to the model
Assumption
  Roles are characterized by consistent sets of originators
1. Dependencies as handovers of roles
2. Remove dependencies below threshold; connected components are candidate roles
3. Merge candidate roles if the similarity of their user sets is above threshold
Entropy-based metric to tune thresholds
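The three steps can be sketched on a toy model. Several things here are assumptions rather than the thesis method: each dependency carries a precomputed score for which low values suggest a handover between roles, user-set similarity is measured with Jaccard, and the activities, scores and originators are invented. The entropy-based tuning of the thresholds is not covered.

```python
def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def candidate_roles(activities, dependencies, threshold):
    """Step 2: drop dependencies below threshold, return connected components."""
    adjacent = {a: set() for a in activities}
    for (a, b), score in dependencies.items():
        if score >= threshold:
            adjacent[a].add(b)
            adjacent[b].add(a)
    roles, seen = [], set()
    for start in activities:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(adjacent[current] - component)
        seen |= component
        roles.append(component)
    return roles

def merge_roles(roles, originators, threshold):
    """Step 3: merge candidate roles whose user sets are similar enough."""
    def users(role):
        return set().union(*(originators[a] for a in role))
    roles = [set(r) for r in roles]
    changed = True
    while changed:
        changed = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                if jaccard(users(roles[i]), users(roles[j])) >= threshold:
                    other = roles.pop(j)
                    roles[i] |= other
                    changed = True
                    break
            if changed:
                break
    return roles

# Hypothetical input data.
ACTIVITIES = ["A", "B", "C", "D", "E"]
DEPENDENCIES = {("A", "B"): 0.9, ("B", "C"): 0.1, ("C", "D"): 0.8, ("D", "E"): 0.2}
ORIGINATORS = {"A": {"u1", "u2"}, "B": {"u1", "u2"}, "C": {"u3"},
               "D": {"u3", "u4"}, "E": {"u1", "u2"}}

CANDIDATES = candidate_roles(ACTIVITIES, DEPENDENCIES, 0.5)
ROLES = merge_roles(CANDIDATES, ORIGINATORS, 0.5)
```

Activity E ends up isolated after step 2, but step 3 merges it with A and B because the same users perform all three.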

Slide 33

Slide 33 text

Overview: Stream Control-flow Mining
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 36

Slide 36 text

Stream Context
Stream mining peculiarities
  Cannot store the entire stream
    Approximation
  Backtracking not feasible
    One pass over data
  Variable system conditions
    E.g. fluctuating stream rates
  Adapt the model to new data
    Concept drifts
Completely new problems!
Principle
  Recent observations are more important than older ones
Three versions of Heuristics Miner
  Based on Sliding Window
  Based on Lossy Counting
  Based on Budget Lossy Counting
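To illustrate the one-pass, bounded-memory constraint, here is a sketch of classic Lossy Counting (Manku and Motwani) for approximate frequencies. How the stream versions of Heuristics Miner use it is not detailed on the slide. Treating the counted items as directly-follows pairs such as "ab" is an assumption, and the class and variable names are illustrative.

```python
import math

class LossyCounter:
    """Approximate item frequencies over a stream, with error at most epsilon * n."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # items seen so far
        self.entries = {}                     # item -> [frequency, max undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been dropped in earlier buckets.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: forget items that are too infrequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def count(self, item):
        return self.entries.get(item, [0, 0])[0]

# Hypothetical stream of directly-follows pairs.
STREAM = ["ab", "ab", "bc", "ab", "cd", "ab", "ab", "ab"]
COUNTER = LossyCounter(epsilon=0.25)
for pair in STREAM:
    COUNTER.add(pair)
```

After the eight observations only the frequent pair "ab" is still stored; the rare pairs were discarded at the bucket boundaries, which is what keeps memory bounded.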

Slide 37

Slide 37 text

Overview
[Diagram: thesis structure map, as on the Thesis Structure and Organization slide]

Slide 39

Slide 39 text

Extra: Processes and Logs Generator
Companies are reluctant to share their data
Researchers need to do tests (no BPI challenges at that time)
Processes and Logs Generator
  Stochastic context-free grammar generates random processes
  Rules to simulate a process and produce an event log
  Reference model used for evaluating control-flow mining algorithms
[Figure: example grammar derivation with symbols P, G, A, the patterns (G; G), (G ∧ G) and (G G), and activities a_start, a, b, c, d, e, f, g, a_end]
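The generator idea can be sketched with a small stochastic grammar. The rule G -> A | (G; G) | (G ∧ G) | (G ⊗ G), its probabilities, the depth limit and the tuple encoding of processes are all assumptions made for illustration. The actual grammar of the tool is not recoverable from the slide.

```python
import random

KINDS = ["act", "seq", "and", "xor"]
WEIGHTS = [0.4, 0.3, 0.15, 0.15]   # hypothetical production probabilities

def generate(rng, depth=0, max_depth=3, counter=None):
    """Randomly expand G into a process tree of nested tuples."""
    counter = counter if counter is not None else [0]
    kind = "act" if depth >= max_depth else rng.choices(KINDS, WEIGHTS)[0]
    if kind == "act":
        counter[0] += 1
        return ("act", f"a{counter[0]}")
    left = generate(rng, depth + 1, max_depth, counter)
    right = generate(rng, depth + 1, max_depth, counter)
    return (kind, left, right)

def simulate(tree, rng):
    """Produce one trace (list of activity names) allowed by the tree."""
    kind = tree[0]
    if kind == "act":
        return [tree[1]]
    left, right = tree[1], tree[2]
    if kind == "seq":
        return simulate(left, rng) + simulate(right, rng)
    if kind == "xor":
        return simulate(rng.choice([left, right]), rng)
    # "and": random interleaving that keeps each branch's internal order.
    a, b = simulate(left, rng), simulate(right, rng)
    merged = []
    while a or b:
        source = a if (a and (not b or rng.random() < 0.5)) else b
        merged.append(source.pop(0))
    return merged

def generate_log(tree, n_traces, rng):
    """Simulate the process n_traces times, wrapping each trace in start/end events."""
    return [["a_start"] + simulate(tree, rng) + ["a_end"] for _ in range(n_traces)]

RNG = random.Random(1)
PROCESS = generate(RNG)
LOG = generate_log(PROCESS, 5, RNG)
```

Because the generated tree is known, it can serve as the reference model when a mining algorithm is evaluated on the simulated log.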

Slide 40

Slide 40 text

Detailed Map of Performed Activities
[Diagram: detailed map of the thesis activities. Nodes: Legacy, Process-unaware Information Systems; Data Preparation; Process Mining Capable Event Logs; Process Mining Capable Event Stream; Event Logs Generator; Random Process Generator; Control-flow Mining Algorithm Exploiting More Data; User-guided Discovery Algorithm Configuration; Automatic Algorithm Configuration; Stream Control-flow Mining Framework; Process Representation (e.g. Dependency Graph, Petri Net); Model Evaluation (wrt Log / Original Model); Model-to-model Metric; Model-to-log Metric; Extension of Process Models with Organizational Roles]

Slide 41

Slide 41 text

Thanks!
Doing the Ph.D. has been amazing!
A huge "Thank you!" to
  My supervisor, Alessandro Sperduti
  Siav S.p.A. and Roberto Pinelli
  My internal examiners: Tullio Vardanega, Paolo Baldan
  My external examiners: Barbara Weber, Diogo Ferreira
  All the process mining community!