Applicability of Process Mining Techniques in Business Environments (Best Process Mining Dissertation Award)
Presentation given at the annual meeting of the IEEE Task Force on Process Mining, for the Best Dissertation Award, during BPM 2014 (Eindhoven, the Netherlands, http://bpm2014.haifa.ac.il).
…of Padova. 2009-2012: Ph.D., supervisor Prof. Alessandro Sperduti; joint school University of Bologna/Padova; thesis defended in April 2013. 2013-2014: Postdoc, Prompt project (prompt.processmining.it), University of Padova. Photo: Specola, Padova (http://flic.kr/p/cEW5bo).
…study on process mining. A company (Siav S.p.A., www.siav.it) funded my PhD. Aim: investigate the applicability of process mining techniques in business scenarios. Interaction with companies: interesting! (but sometimes...). Outcome: "Applicability of Process Mining Techniques in Business Environments".
Figure: the process mining framework, relating Information System, Operational Incarnation, Operational Model, Analytical Model, Observation and Event Logs through Discovery, Conformance and Extension (arrows labelled support, control, protocol/audit, augment, compare, analyze, mine, basis, create, (re-)design, implement, describe). Source: C. Günther, Process Mining in Flexible Environments. PhD thesis, TU/e, Eindhoven, 2009.
…tasks: exploiting all available data (holistic mining: different perspectives from different sources); noise and incompleteness. Open problems from case studies: using process mining tools and configuring algorithms; results interpretation; readable results; computational power and storage capacity required. The two sets do not overlap.
Process-aware vs. process-unaware software. Figure: Companies 1 to 4 placed on a grid whose axes are process-aware vs. process-unaware information systems and process-aware vs. process-unaware companies.
Thesis roadmap (figure), with boxes: Data Preparation, Process Mining Capable Event Stream, Control-flow Mining, Stream Control-flow Mining, Results Evaluation, Model Evaluation, Process Extension, Representation.
…levels. Examples: adaptation of existing data (a syntax problem, easy); introduction of new information (difficult). Typical set of required fields: (case-id; activity; timestamp; [process-name]; [originator]), where bracketed fields are optional. Our context: the company is process aware, while its information system is process unaware. Structure of the available log: (activity; timestamp; originator; info_1; ...; info_n), with no explicit case-id field.
Candidate case-id fields: a-priori knowledge; event chains; string similarity functions. Selection of the maximal chain: the one with the most activities, or the simplest chain. The process name is not a problem: all events belong to the same process. Example log:
Act.  info1  info2
a1    AB-01  BB-01
a2    AA-02  AB-01
a3    AB-01  BB-02
a4    AB-01  BB-03
a1    AA-03  BB-04
a5    AA-03  BB-05
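A minimal sketch of the "most activities" selection criterion, applied to the example log on this slide. It assumes a hypothetical score (the mean number of distinct activities in the groups that a candidate field induces); this is an illustration, not necessarily the scoring used in the thesis, and it deliberately ignores chains across fields (e.g., AB-01 also appearing as info2 of a2) and string-similarity matching.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

# Example log from the slide: activity plus two candidate case-id fields
EVENTS: List[Dict[str, str]] = [
    {"activity": "a1", "info1": "AB-01", "info2": "BB-01"},
    {"activity": "a2", "info1": "AA-02", "info2": "AB-01"},
    {"activity": "a3", "info1": "AB-01", "info2": "BB-02"},
    {"activity": "a4", "info1": "AB-01", "info2": "BB-03"},
    {"activity": "a1", "info1": "AA-03", "info2": "BB-04"},
    {"activity": "a5", "info1": "AA-03", "info2": "BB-05"},
]


def score_case_field(events: Sequence[Dict[str, str]], field: str) -> float:
    """Hypothetical score: mean number of distinct activities per group
    when events are grouped by the value of `field`."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        groups[event[field]].add(event["activity"])
    return sum(len(acts) for acts in groups.values()) / len(groups)


def pick_case_field(events: Sequence[Dict[str, str]], candidates: List[str]) -> str:
    """Choose the candidate field whose groups chain together the most activities."""
    return max(candidates, key=lambda field: score_case_field(events, field))


if __name__ == "__main__":
    for field in ("info1", "info2"):
        print(field, score_case_field(EVENTS, field))
    print("chosen case id:", pick_case_field(EVENTS, ["info1", "info2"]))
```

Here info1 groups a1, a3 and a4 under AB-01, whereas every info2 value isolates a single event, so info1 is preferred.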
Generalization of Heuristics Miner to exploit this new information (events as time intervals, with a start and an end). Figure: a main activity spanning from Start to End, made of sub-activities 1 to n along a time axis; and a comparison between a process with events as time intervals and a process with instantaneous events (activities A, B, C, D).
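One way to picture the generalization is a sketch that derives a directly-follows relation from activity intervals and then applies the classic Heuristics Miner dependency measure $(|a>b| - |b>a|) / (|a>b| + |b>a| + 1)$. The interval rule used here (a precedes b directly if a ends before b starts and no third interval fits entirely between them; overlapping intervals count as concurrent) is an assumption made for illustration, not the thesis's exact definition.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Interval = Tuple[str, float, float]  # (activity, start time, end time)


def interval_follows(trace: Sequence[Interval]) -> List[Tuple[str, str]]:
    """Directly-follows pairs for a trace of activity intervals.

    Assumed rule: a precedes b directly if a ends no later than b starts and no
    third interval lies entirely between them; overlapping intervals are treated
    as concurrent and produce no pair."""
    pairs = []
    for i, (a, _, a_end) in enumerate(trace):
        for j, (b, b_start, _) in enumerate(trace):
            if i == j or a_end > b_start:
                continue  # same event, or overlapping/inverted order
            between = any(a_end <= s and e <= b_start
                          for k, (_, s, e) in enumerate(trace) if k not in (i, j))
            if not between:
                pairs.append((a, b))
    return pairs


def dependency(log: Sequence[Sequence[Interval]], a: str, b: str) -> float:
    """Classic Heuristics Miner dependency measure on the interval-based relation."""
    counts = Counter(p for trace in log for p in interval_follows(trace))
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


if __name__ == "__main__":
    # B and C overlap in time, so they are concurrent: no B-C dependency
    trace = [("A", 0, 2), ("B", 3, 5), ("C", 4, 6), ("D", 7, 8)]
    print(interval_follows(trace))
    print(dependency([trace], "A", "B"), dependency([trace], "B", "C"))
```

With instantaneous events (start equal to end) the same code reduces to an ordinary timestamp-based directly-follows relation.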
…of BPM. Observations: process mining algorithms require configuration; typically, algorithm configurations are thresholds on measures; the mining log is finite, so only a finite number of configurations is possible; hence we are able to discretize the parameter values. Figure: candidate models over activities A to F, each produced by unknown threshold values τ1 = ?, τ2 = ?, τ3 = ?, τ4 = ?.
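The discretization argument can be made concrete with a small sketch, assuming a Heuristics Miner-style dependency measure as the thresholded quantity: a finite log yields a finite set of measure values, and any threshold strictly between two consecutive values accepts the same dependencies. The function name is hypothetical, and self-loops (which Heuristics Miner scores with a separate formula) are skipped.

```python
from collections import Counter
from fractions import Fraction
from typing import List


def dependency_values(log: List[str]) -> List[Fraction]:
    """Distinct dependency values in a log; the only thresholds worth trying."""
    # Directly-follows counts |a > b| over all traces (one character = one activity)
    df = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))
    values = set()
    for (a, b), ab in df.items():
        if a == b:
            continue  # self-loops use a different formula in Heuristics Miner
        ba = df[(b, a)]  # Counter returns 0 for unseen pairs
        values.add(Fraction(ab - ba, ab + ba + 1))
    return sorted(values)


if __name__ == "__main__":
    print(dependency_values(["ABCD", "ACBD", "ABCD"]))
```

Exact fractions avoid spurious distinct values caused by floating-point rounding.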
…linkage, with any model-to-model metric. Figure: dendrogram of Processes 1 to 10 on an axis from 0 to 1; navigation of the dendrogram. Automatic approach: hill climbing with a maximum number of plateau steps and random restarts (to escape local optima), selecting $h_{MDL} = \arg\min_{h \in H} \left( L(h) + L(D \mid h) \right)$. MDL encodings: the MDL by Calders et al.; simplified heuristics.
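A sketch of the automatic approach: hill climbing over the discretized parameter grid, with bounded sideways moves on plateaus and random restarts. The cost function is a placeholder standing in for an MDL score $L(h) + L(D \mid h)$; the grid encoding and all names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Config = Tuple[int, ...]  # one index per parameter into its list of discretized values


def neighbours(cfg: Config, sizes: Sequence[int]) -> List[Config]:
    """Configurations that move one parameter one step up or down."""
    result = []
    for i, size in enumerate(sizes):
        for step in (-1, 1):
            j = cfg[i] + step
            if 0 <= j < size:
                result.append(cfg[:i] + (j,) + cfg[i + 1:])
    return result


def hill_climb(cost: Callable[[Config], float], sizes: Sequence[int],
               restarts: int = 5, max_plateau: int = 3, seed: int = 0) -> Optional[Config]:
    """Minimize `cost` with random restarts and a bounded number of plateau moves."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for _ in range(restarts):
        cfg = tuple(rng.randrange(size) for size in sizes)
        current = cost(cfg)
        plateau = 0
        while True:
            candidates = neighbours(cfg, sizes)
            if not candidates:
                break
            nxt = min(candidates, key=cost)
            nxt_cost = cost(nxt)
            if nxt_cost < current:                 # strict improvement
                cfg, current, plateau = nxt, nxt_cost, 0
            elif nxt_cost == current and plateau < max_plateau:
                cfg, plateau = nxt, plateau + 1    # sideways move across a plateau
            else:
                break                              # local optimum reached
        if current < best_cost:                    # keep the best over all restarts
            best_cfg, best_cost = cfg, current
    return best_cfg


if __name__ == "__main__":
    # Toy stand-in for L(h) + L(D|h): lowest at indices (3, 7)
    toy_cost = lambda c: (c[0] - 3) ** 2 + (c[1] - 7) ** 2
    print(hill_climb(toy_cost, sizes=(10, 10)))
```

The plateau limit guarantees termination even when neighbouring configurations produce the same model, and hence the same score.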
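…relations. Generation rules (based on the Alpha algorithm): A → B ⇒ A > B, B ≯ A; A ∥ B ⇒ A > B, B > A; A # B ⇒ A ≯ B, B ≯ A. Comparison as Jaccard similarity on the two sets (> and ≯). Model-to-log metric: given a Declare constraint $\pi$ and a trace $\sigma$, the healthiness measures are activation sparsity $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$, violation ratio $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$, fulfillment ratio $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$ and conflict ratio $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$, where $n(\sigma)$ is the number of events in $\sigma$ and $n_a$, $n_v$, $n_f$, $n_c$ count the activations, violations, fulfillments and conflicts of $\pi$ in $\sigma$.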
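A sketch of the model-to-model comparison: apply the generation rules to each model's relations, then compare the resulting > and ≯ sets with the Jaccard similarity. Two choices here are assumptions for illustration: every ordered pair not derived as > is placed in ≯, and the two Jaccard values are combined by a plain average.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def behavioural_sets(relations: Dict[Pair, str], activities: Set[str]) -> Tuple[Set[Pair], Set[Pair]]:
    """Apply the generation rules and return the (>, not >) sets of ordered pairs.
    `relations` maps a pair to "->" (causality), "||" (parallel) or "#" (choice)."""
    follows: Set[Pair] = set()
    for (a, b), rel in relations.items():
        if rel == "->":        # A -> B  gives  A > B, B not> A
            follows.add((a, b))
        elif rel == "||":      # A || B  gives  A > B, B > A
            follows.update({(a, b), (b, a)})
        # "#": A not> B and B not> A, so nothing is added to >
    # Assumption: every ordered pair that is not in > belongs to not>
    not_follows = set(product(activities, repeat=2)) - follows
    return follows, not_follows


def jaccard(x: Set[Pair], y: Set[Pair]) -> Fraction:
    union = x | y
    return Fraction(len(x & y), len(union)) if union else Fraction(1)


def model_similarity(rel1: Dict[Pair, str], rel2: Dict[Pair, str], activities: Set[str]) -> Fraction:
    """Hypothetical combination: average of the Jaccard similarities of the two sets."""
    f1, nf1 = behavioural_sets(rel1, activities)
    f2, nf2 = behavioural_sets(rel2, activities)
    return (jaccard(f1, f2) + jaccard(nf1, nf2)) / 2


if __name__ == "__main__":
    m1 = {("A", "B"): "->", ("B", "C"): "->"}
    m2 = {("A", "B"): "->", ("B", "C"): "||"}
    print(model_similarity(m1, m2, {"A", "B", "C"}))
```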
We add roles to the model. Assumption: roles are characterized by a consistent set of originators. Steps: (1) consider dependencies as handovers of roles; (2) remove dependencies below a threshold, so that the connected components are candidate roles; (3) merge candidate roles whose user sets have a similarity above a threshold. An entropy-based metric is used to tune the thresholds.
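A sketch of the three steps, assuming that each dependency is weighted by the Jaccard similarity of the originator sets of its two activities (a hypothetical weighting), that connected components are found with union-find, and that candidate roles are merged greedily. The entropy-based threshold tuning is not shown; the thresholds are passed in directly.

```python
from typing import Dict, List, Set, Tuple


def jaccard(x: Set[str], y: Set[str]) -> float:
    return len(x & y) / len(x | y) if x | y else 1.0


def candidate_roles(originators: Dict[str, Set[str]],
                    dependencies: List[Tuple[str, str]],
                    t_edge: float, t_merge: float) -> List[Set[str]]:
    """Group activities into roles; `originators` maps activity -> users."""
    parent = {a: a for a in originators}  # union-find over activities

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Steps 1-2: keep only dependencies whose (assumed) weight reaches t_edge
    for a, b in dependencies:
        if jaccard(originators[a], originators[b]) >= t_edge:
            parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for a in originators:
        groups.setdefault(find(a), set()).add(a)
    roles = list(groups.values())  # connected components = candidate roles

    def users(role: Set[str]) -> Set[str]:
        return set().union(*(originators[a] for a in role))

    # Step 3: repeatedly merge two roles whose user sets are similar enough
    merged = True
    while merged:
        merged = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                if jaccard(users(roles[i]), users(roles[j])) >= t_merge:
                    other = roles.pop(j)
                    roles[i] |= other
                    merged = True
                    break
            if merged:
                break
    return roles
```

Usage: with originators A:{u1,u2}, B:{u1,u2}, C:{u3}, D:{u3,u4}, E:{u1}, dependencies A-B, B-C, C-D, D-E and both thresholds at 0.5, step 2 yields {A,B}, {C,D}, {E}, and step 3 merges E into the role of A and B.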
Approximation; backtracking is not feasible (one pass over the data); variable system conditions (e.g., fluctuating stream rates); the model must adapt to new data (concept drifts). Completely new problems! Principle: recent observations are more important than older ones. Three versions of Heuristics Miner: based on Sliding Window, based on Lossy Counting, and based on Budget Lossy Counting.
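A sketch of the counting layer behind a Lossy Counting-based variant: the standard Lossy Counting scheme (Manku and Motwani) keeps approximate directly-follows counts in bounded memory, in one pass, and prunes relations that are not reinforced by new events. The class names are hypothetical, and, as a simplification, the last activity of each case is stored in an unbounded dictionary; a bounded-memory miner would approximate that map too.

```python
import math
from typing import Dict, Hashable, Optional, Tuple


class LossyCounter:
    """Lossy Counting: approximate frequencies with bounded memory."""

    def __init__(self, epsilon: float):
        self.width = math.ceil(1 / epsilon)   # bucket width
        self.n = 0                            # items seen so far
        self.entries: Dict[Hashable, Tuple[int, int]] = {}  # item -> (count, max error)

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        count, delta = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (count + 1, delta)
        if self.n % self.width == 0:  # bucket boundary: drop infrequent entries
            self.entries = {k: (c, d) for k, (c, d) in self.entries.items()
                            if c + d > bucket}


class StreamDirectlyFollows:
    """Counts directly-follows relations from a stream of (case id, activity) events."""

    def __init__(self, epsilon: float):
        self.relations = LossyCounter(epsilon)
        # Simplification: unbounded map from case id to its last activity
        self.last_activity: Dict[Hashable, str] = {}

    def observe(self, case_id: Hashable, activity: str) -> None:
        previous: Optional[str] = self.last_activity.get(case_id)
        if previous is not None:
            self.relations.add((previous, activity))
        self.last_activity[case_id] = activity

    def count(self, a: str, b: str) -> int:
        return self.relations.entries.get((a, b), (0, 0))[0]
```

The approximate counts can then feed the usual dependency measure. A smaller epsilon means wider buckets, less pruning and more memory.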
…their data. Researchers need to run tests (there were no BPI Challenges at that time). Processes and Logs Generator: a stochastic context-free grammar generates random processes; rules simulate a process and produce an event log; the reference model is used to evaluate control-flow mining algorithms. Figure: example derivation from the grammar, starting at P with a_start, expanding G through sequence (G; G) and parallel (G ∧ G) constructs into activities a to g, and ending with a_end.
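A much-simplified sketch of the generator idea, not the thesis's actual grammar or tool: a random process tree is grown from P → a_start G a_end by expanding G into sequence, parallel (AND) or exclusive-choice (XOR) nodes or activities, and a log is produced by simulating the tree. The tree encoding, the three operators and the 0.4 leaf probability are all assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical process-tree encoding: ("act", name) or (op, left, right),
# with op in {"seq", "and", "xor"} for sequence, parallel and exclusive choice.
Node = Tuple


def generate(rng: random.Random, depth: int, counter: List[int]) -> Node:
    """Randomly expand G; the 0.4 leaf probability is a made-up parameter."""
    if depth == 0 or rng.random() < 0.4:
        name = chr(ord("a") + counter[0])  # activities a, b, c, ...
        counter[0] += 1
        return ("act", name)
    op = rng.choice(["seq", "and", "xor"])
    return (op, generate(rng, depth - 1, counter), generate(rng, depth - 1, counter))


def random_process(seed: int, max_depth: int = 3) -> Node:
    """P -> a_start G a_end"""
    rng = random.Random(seed)
    body = generate(rng, max_depth, [0])
    return ("seq", ("act", "a_start"), ("seq", body, ("act", "a_end")))


def simulate(node: Node, rng: random.Random) -> List[str]:
    """Produce one trace by executing the process tree."""
    kind = node[0]
    if kind == "act":
        return [node[1]]
    if kind == "xor":  # pick one branch uniformly at random
        return simulate(node[1] if rng.random() < 0.5 else node[2], rng)
    left, right = simulate(node[1], rng), simulate(node[2], rng)
    if kind == "seq":
        return left + right
    # "and": random interleaving that keeps each branch's internal order
    trace: List[str] = []
    while left or right:
        take_left = bool(left) and (not right or rng.random() < 0.5)
        trace.append((left if take_left else right).pop(0))
    return trace


if __name__ == "__main__":
    model = random_process(seed=7)
    rng = random.Random(42)
    print(model)
    for _ in range(5):
        print(simulate(model, rng))
```

Because the generating tree is known, it can serve as the reference model against which mined models are compared.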
…Petri Net). Figure: overview of the thesis contributions, with boxes: Legacy, Process-unaware Information Systems; Data Preparation; Process Mining Capable Event Logs; Process Mining Capable Event Stream; Control-flow Mining Algorithm Exploiting More Data; User-guided Discovery; Algorithm Configuration; Automatic Algorithm Configuration; Stream Control-flow Mining Framework; Model Evaluation (wrt Log / Original Model): Model-to-model Metric, Model-to-log Metric; Event Logs Generator; Random Process Generator; Extension of Process Models with Organizational Roles.
Thank you! Thanks to: my supervisor, Alessandro Sperduti; Siav S.p.A. and Roberto Pinelli; my internal examiners, Tullio Vardanega and Paolo Baldan; my external examiners, Barbara Weber and Diogo Ferreira; and the whole process mining community!