Applicability of Process Mining Techniques in B...

Andrea Burattin
September 08, 2014

Applicability of Process Mining Techniques in Business Environments (Best Process Mining Dissertation Award)

Presentation provided at the annual meeting of the IEEE Task Force on Process Mining, for the Best Dissertation Award, during BPM 2014 (in Eindhoven, the Netherlands, http://bpm2014.haifa.ac.il).

Transcript

  1. Applicability of Process Mining Techniques in Business Environments

    Annual Meeting, IEEE Task Force on Process Mining. Andrea Burattin (andreaburattin). September 8, 2014.
  2. Brief Curriculum Vitæ

    2009: M.Sc. in Computer Science (A.I. program), University of Padova. 2009-2012: Ph.D.; supervisor: Prof. Alessandro Sperduti; joint school of the Universities of Bologna and Padova; thesis defended in April 2013. 2013-2014: Postdoc, Prompt project (prompt.processmining.it), University of Padova. Photo: Specola, Padova. http://flic.kr/p/cEW5bo
  3. Ph.D. Inception

    Ph.D. background: inception during the M.Sc. thesis. Companies: study on process mining. A company (Siav S.p.A., www.siav.it) funded my Ph.D. Aim: investigate the applicability of process mining techniques in business scenarios. Interaction with companies: interesting! (but sometimes...) Outcome: "Applicability of Process Mining Techniques in Business Environments".
  4. Quick Recap of Process Mining

    [Figure: process mining overview. An information system supports the operational incarnation/environment and records event logs; process mining connects the event logs with the operational and analytical models through discovery, conformance, and extension.] Source: C. Günther, Process mining in Flexible Environments. PhD thesis, TU/e, Eindhoven, 2009.
  10. Theoretical vs. Industrial-related Open Problems

    Some open problems from the literature: duplicate tasks; exploiting all data available; holistic mining; different perspectives from different sources; noise and incompleteness. Open problems from case studies: using process mining tools and configuring algorithms; results interpretation; readable results; computational power and storage capacity required. The two sets do not overlap.
  11. Possible Industry Scenarios

    Four possible industry scenarios: process-aware vs. process-unaware companies, and process-aware vs. process-unaware software. [Figure: Companies 1 to 4 placed on a grid of process-aware/process-unaware information systems against process-aware/process-unaware companies.]
  12. Thesis Structure and Organization

    [Figure: thesis structure diagram with the blocks Data Preparation, Process Mining Capable Event Logs, Process Mining Capable Event Stream, Control-flow Mining, Stream Control-flow Mining, Process Representation, Model Evaluation, Results Evaluation, and Process Extension.]
  13. Overview: Data Preparation

    [Figure: the thesis structure diagram from slide 12.]
  16. Problems with Data Preparation

    Problems arise at different complexity and abstraction levels. Examples: adaptation of existing data (a syntax problem, easy); introduction of new information (difficult). Typical set of required fields: (case-id; activity; timestamp; [process-name]; [originator]). Our context: the company is process aware, the information system is process unaware. Structure of the available log: (activity; timestamp; originator; info_1; ...; info_n).
  18. Problems with Data Preparation (cont.)

    Case-id from the info_i fields: candidate case-id fields (a-priori knowledge); event chains (string similarity functions); selection of the maximal chain (most activities or simplest chain). Process name is not a problem: all events belong to the same process.

    Act.  info1  info2
    a1    AB-01  BB-01
    a2    AA-02  AB-01
    a3    AB-01  BB-02
    a4    AB-01  BB-03
    a1    AA-03  BB-04
    a5    AA-03  BB-05
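
The chain idea on slide 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the thesis: exact string equality stands in for the string similarity functions, every value seen in any info field is treated as a candidate case-id, and the "maximal" chain is taken to be the one covering the most distinct activities. The names `build_chains` and `maximal_chain` are hypothetical.

```python
from collections import defaultdict

# Example log from the slide: (activity, info1, info2).
LOG = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]


def build_chains(log):
    """Group activities by every value found in any info field.

    Assumption: two events belong to the same chain when they share an
    identical info value (exact match instead of a similarity function).
    """
    chains = defaultdict(list)
    for activity, *infos in log:
        for value in set(infos):
            chains[value].append(activity)
    return dict(chains)


def maximal_chain(chains):
    """Return (value, activities) for the chain with the most distinct activities.

    Ties are broken by chain length, then by the value itself, so the
    result is deterministic.
    """
    return max(chains.items(), key=lambda kv: (len(set(kv[1])), len(kv[1]), kv[0]))


if __name__ == "__main__":
    value, activities = maximal_chain(build_chains(LOG))
    print(value, activities)
```

On the slide's table, the value AB-01 links four events across both info columns, which is why a per-column grouping alone would miss part of the chain.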
  19. Overview: Control-flow Mining

    [Figure: the thesis structure diagram from slide 12.]
  21. Exploiting Data Available

    Events with a duration instead of instantaneous events. Generalization of Heuristics Miner to exploit this new information. [Figure: a main activity spanning from Start to End on a time axis, containing sub-activities 1 to n.] [Figure: the same process over activities A, B, C, D shown once with events as time intervals and once with instantaneous events.]
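
A minimal sketch of why durations help, assuming each event is a triple (activity, start, end) and each activity occurs once per trace: two activities whose intervals overlap can be flagged as parallel directly, while direct succession is kept only when no other activity fits entirely between the two. The function `interval_relations` is a hypothetical helper for illustration; it is not the generalized Heuristics Miner from the thesis.

```python
def interval_relations(trace):
    """Derive relations from one trace of (activity, start, end) events.

    Returns a pair (follows, parallel):
      follows  - set of (a, b) where b starts after a ends and no third
                 activity lies completely between them
      parallel - set of frozenset({a, b}) whose intervals overlap
    Assumption: every activity appears at most once in the trace.
    """
    follows, parallel = set(), set()
    for a, a_start, a_end in trace:
        for b, b_start, b_end in trace:
            if a == b:
                continue
            if a_start < b_end and b_start < a_end:
                # Overlapping intervals: the two activities ran concurrently.
                parallel.add(frozenset((a, b)))
            elif b_start >= a_end:
                # b comes after a; check that nothing sits in the gap.
                blocked = any(
                    c not in (a, b) and c_start >= a_end and c_end <= b_start
                    for c, c_start, c_end in trace
                )
                if not blocked:
                    follows.add((a, b))
    return follows, parallel


if __name__ == "__main__":
    example = [("A", 0, 2), ("B", 2, 5), ("C", 3, 6), ("D", 6, 8)]
    print(interval_relations(example))
```

With instantaneous events, B and C in the example would simply appear in some order; with intervals, their overlap is observable in a single trace.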
  23. Non-expert Users

    Our users: non-experts in process mining, with notions of BPM. Observations: process mining algorithms require configuration; typically, algorithm configurations are thresholds on measures; the mined log is finite, so only a finite number of configurations is possible. We are able to discretize the parameter values. [Figure: candidate models over activities A to F obtained with different, unknown thresholds τ1, τ2, τ3, τ4.]
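
The discretization argument can be made concrete with the standard Heuristics Miner dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). Since a finite log yields finitely many distinct values of the measure, only those values matter as thresholds. The sketch below assumes a log given as a list of activity sequences and covers a single threshold; the function names are hypothetical and the thesis may discretize more parameters than this.

```python
from collections import Counter


def direct_follows(log):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Heuristics Miner dependency measure between two distinct activities."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def candidate_thresholds(log):
    """Return the sorted distinct dependency values observed in the log.

    Any threshold between two consecutive values selects the same set of
    dependencies, so these values are the only ones worth trying.
    """
    counts = direct_follows(log)
    activities = sorted({x for trace in log for x in trace})
    values = {
        dependency(counts, a, b)
        for a in activities
        for b in activities
        if a != b
    }
    return sorted(values)


if __name__ == "__main__":
    print(candidate_thresholds([["A", "B", "C"], ["A", "C", "B"]]))
```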
  25. Model Selection Approaches

    User-guided approach: hierarchical clustering of models (average linkage, any model-to-model metric), followed by navigation of the dendrogram. [Figure: dendrogram over Process 1 to Process 10, with a distance axis from 0 to 1.] Automatic approach: hill climbing with a maximum number of plateau steps and random restarts (against local optima), selecting $h_{MDL} = \arg\min_{h \in H} L(h) + L(D \mid h)$. MDL encodings: MDL by Calders et al.; simplified heuristics.
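
The automatic approach can be sketched as a search over an index into the discretized parameter values. The code below is an illustration under assumptions: the search space is one-dimensional, `cost(i)` is a caller-supplied stand-in for $L(h) + L(D \mid h)$ of the model mined with the i-th configuration, and the function names and default limits are hypothetical.

```python
import random


def hill_climb(cost, n, start, rng, max_plateau_steps=3):
    """Minimize cost over indices 0..n-1 by moving to neighbouring indices.

    Sideways moves onto equally good neighbours are allowed, but at most
    max_plateau_steps times in a row.
    """
    current, plateau = start, 0
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < n]
        if not neighbours:
            return current
        best_cost = min(cost(i) for i in neighbours)
        candidates = [i for i in neighbours if cost(i) == best_cost]
        if best_cost < cost(current):
            current, plateau = rng.choice(candidates), 0
        elif best_cost == cost(current) and plateau < max_plateau_steps:
            current, plateau = rng.choice(candidates), plateau + 1
        else:
            return current  # local optimum reached


def hill_climb_with_restarts(cost, n, restarts, seed=0):
    """Run hill climbing from random starting points and keep the best result."""
    rng = random.Random(seed)
    results = [hill_climb(cost, n, rng.randrange(n), rng) for _ in range(restarts)]
    return min(results, key=cost)


if __name__ == "__main__":
    costs = [5, 4, 3, 4, 1, 0, 2]  # made-up costs: local optimum at 2, global at 5
    print(hill_climb_with_restarts(costs.__getitem__, len(costs), restarts=20))
```

A single run started at index 0 stops at the local optimum (index 2), which is the situation the random restarts are meant to mitigate.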
  26. Overview: Results Evaluation

    [Figure: the thesis structure diagram from slide 12.]
  28. Evaluation Metrics

    Model-to-model metric: a complex process is translated into permitted relations and forbidden relations. Generation rules (based on the Alpha algorithm): $A \rightarrow B \Rightarrow A > B,\ B \ngtr A$; $A \parallel B \Rightarrow A > B,\ B > A$; $A \# B \Rightarrow A \ngtr B,\ B \ngtr A$. Comparison as Jaccard similarity on the two sets ($>$ and $\ngtr$).

    Model-to-log metric: a Declare constraint $\pi$ and a trace $\sigma$ yield healthiness measures. Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$. Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$. Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$. Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$.
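
The model-to-model comparison can be sketched as follows. A model is assumed to be given as a dictionary from activity pairs to one of the Alpha-style relations `"->"`, `"||"`, or `"#"`; the generation rules from the slide turn it into a permitted and a forbidden set. How the two Jaccard values are combined is not stated on the slide, so the weighted average used here (parameter `weight`) is an assumption, as are the function names.

```python
def relation_sets(model):
    """Translate {(a, b): relation} into (permitted, forbidden) pair sets."""
    permitted, forbidden = set(), set()
    for (a, b), relation in model.items():
        if relation == "->":      # A -> B  =>  A > B, B !> A
            permitted.add((a, b))
            forbidden.add((b, a))
        elif relation == "||":    # A || B  =>  A > B, B > A
            permitted.update({(a, b), (b, a)})
        elif relation == "#":     # A # B   =>  A !> B, B !> A
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return permitted, forbidden


def jaccard(x, y):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def model_similarity(m1, m2, weight=0.5):
    """Weighted average of the Jaccard similarities of both relation sets."""
    p1, f1 = relation_sets(m1)
    p2, f2 = relation_sets(m2)
    return weight * jaccard(p1, p2) + (1 - weight) * jaccard(f1, f2)


if __name__ == "__main__":
    first = {("A", "B"): "->", ("B", "C"): "->"}
    second = {("A", "B"): "->", ("B", "C"): "||"}
    print(model_similarity(first, second))
```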
  29. Overview: Process Extension

    [Figure: the thesis structure diagram from slide 12.]
  31. Multiperspective Mining

    Given: a log with information on originators, and a process model. We add roles to the model. Assumption: roles are characterized by a consistent set of originators. Steps: (1) treat dependencies as handovers of roles; (2) remove dependencies below a threshold, so that the connected components are candidate roles; (3) merge candidate roles if the similarity of their user sets is above a threshold. An entropy-based metric is used to tune the thresholds.
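
Step (2) above is a connected-components computation. The sketch below assumes each dependency already carries a numeric score, where a higher score means the two activities more likely belong to the same role; how that score is derived from the originators is not shown on the slide and is left to the caller. The function name `candidate_roles` is hypothetical, and the merge step (3) is not covered.

```python
def candidate_roles(activities, scores, threshold):
    """Return candidate roles as sets of activities.

    activities - ordered list of activity names
    scores     - {(a, b): score} for the dependencies of the model
    threshold  - dependencies scoring below it are removed
    """
    adjacency = {a: set() for a in activities}
    for (a, b), score in scores.items():
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    roles, seen = [], set()
    for a in activities:
        if a in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component, stack = set(), [a]
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            stack.extend(adjacency[x] - component)
        seen |= component
        roles.append(component)
    return roles


if __name__ == "__main__":
    scores = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.8}
    print(candidate_roles(["A", "B", "C", "D"], scores, threshold=0.5))
```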
  32. Overview: Stream Control-flow Mining

    [Figure: the thesis structure diagram from slide 12.]
  35. Stream Context

    Stream mining peculiarities: the entire stream cannot be stored (approximation); backtracking is not feasible (one pass over the data); system conditions vary, e.g. fluctuating stream rates; the model must adapt to new data (concept drifts). Completely new problems! Principle: recent observations are more important than older ones. Three versions of Heuristics Miner: based on Sliding Window; based on Lossy Counting; based on Budget Lossy Counting.
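
For readers unfamiliar with Lossy Counting, here is a minimal sketch of the generic algorithm (Manku and Motwani) for approximate frequency counts over a stream in one pass and bounded memory. It is not the stream-aware Heuristics Miner itself: in that setting the counted items would presumably be activities or direct-succession pairs, which is an assumption here. The class name `LossyCounter` is hypothetical.

```python
import math


class LossyCounter:
    """Approximate item frequencies with maximum error epsilon * n."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # A new item may have been dropped earlier, at most bucket - 1 times.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop items that are too infrequent to matter.
            self.entries = {
                key: value
                for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }


if __name__ == "__main__":
    counter = LossyCounter(epsilon=0.25)
    for item in "abacaada":
        counter.add(item)
    print(counter.entries)
```

Infrequent items are forgotten at each bucket boundary, which keeps memory bounded and gives newly observed behaviour a chance to replace stale entries.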
  36. Overview

    [Figure: the thesis structure diagram from slide 12.]
  38. Extra: Processes and Logs Generator

    Companies are reluctant to share their data, yet researchers need to run tests (there were no BPI Challenges at that time). Processes and Logs Generator: a stochastic context-free grammar generates random processes; rules simulate a process and produce an event log; the reference model is used to evaluate control-flow mining algorithms. [Figure: example derivation with the grammar symbols P, G, and A, using the sequence pattern (G; G) and the AND pattern A; (G ∧ G); A, over activities a_start, a to g, and a_end.]
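
To give a feel for grammar-based generation, the sketch below expands a toy stochastic grammar into a nested process description and then simulates one trace from it. Everything specific is an assumption: the three productions (single activity, sequence, AND block), their probabilities, the depth limit, and the simplified AND semantics (branches run in random order rather than truly interleaved). It is not the grammar of the actual generator.

```python
import random
from itertools import count


def generate(rng, depth=3):
    """Expand the toy grammar into a nested tuple such as ("seq", x, y)."""
    names = count(1)

    def activity():
        return f"a{next(names)}"

    def expand(d):
        # Hypothetical production probabilities; depth 0 forces an activity
        # so that the expansion always terminates.
        kind = "A" if d == 0 else rng.choices(
            ["A", "seq", "and"], weights=[0.4, 0.4, 0.2]
        )[0]
        if kind == "A":
            return activity()
        if kind == "seq":
            return ("seq", expand(d - 1), expand(d - 1))
        return ("seq", activity(), ("and", expand(d - 1), expand(d - 1)), activity())

    return ("seq", "a_start", expand(depth), "a_end")


def activities(tree):
    """List all activity names of a process, in definition order."""
    if isinstance(tree, str):
        return [tree]
    return [a for child in tree[1:] for a in activities(child)]


def simulate(tree, rng):
    """Produce one trace; AND branches are executed in a random order."""
    if isinstance(tree, str):
        return [tree]
    operator, *children = tree
    if operator == "and":
        children = rng.sample(children, len(children))
    return [a for child in children for a in simulate(child, rng)]


if __name__ == "__main__":
    rng = random.Random(42)
    process = generate(rng)
    print(process)
    print(simulate(process, rng))
```

Because the generating process is known, a model mined from the simulated traces can be compared against it, which is the evaluation use mentioned on the slide.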
  39. Detailed Map of Performed Activities

    [Figure: map of the performed activities. Legacy, process-unaware information systems; Data Preparation; Process Mining Capable Event Logs; Process Mining Capable Event Stream; Event Logs Generator; Random Process Generator; Control-flow Mining Algorithm Exploiting More Data; User-guided Discovery Algorithm Configuration; Automatic Algorithm Configuration; Stream Control-flow Mining Framework; Process Representation (e.g. Dependency Graph, Petri Net); Model Evaluation (wrt Log / Original Model) with the Model-to-model Metric and the Model-to-log Metric; Extension of Process Models with Organizational Roles.]
  40. Thanks!

    Doing the Ph.D. has been amazing! A huge "Thank you!" to: my supervisor, Alessandro Sperduti; Siav S.p.A. and Roberto Pinelli; my internal examiners, Tullio Vardanega and Paolo Baldan; my external examiners, Barbara Weber and Diogo Ferreira; and all the process mining community!