A Business Process Metric Based on the Alpha Algorithm Relations

Slide 1

Slide 1 text

A Business Process Metric Based on the Alpha Algorithm Relations Fabio Aiolli, Andrea Burattin, and Alessandro Sperduti Department of Pure and Applied Mathematics University of Padua, Italy August 29th, 2011

Slide 2

Slide 2 text

Introduction
Typical situation:
Process mining algorithms and tools are designed to deal with real-world data.
Real-world data contain noise and can be incomplete.
Problem statement:
Many mining techniques address noise through parameters: thresholds on specific values of the algorithm that are used to discriminate noisy behavior.
Process mining users are not necessarily technicians, so they cannot be expected to have deep knowledge of the algorithms.
Process mining algorithms are implemented in tools.
Non-expert users do not understand the algorithms, which is why they can have difficulties using the tools.

Slide 4

Slide 4 text

Process mining for non-expert users
Possible solutions to help non-expert users with process mining techniques and tools:
1. Simplify the algorithms (no parameters required).
2. Build a system that can choose the algorithm and its configuration.
3. Do not ask the user for any parameters, but let them interactively choose the solution they consider best.
Observations:
Solution 1: extremely hard (flexibility/abstraction impossible).
Solution 2: hard; we tried applying the MDL principle (Burattin and Sperduti, IEEE WCCI 2010).
Solution 3: the final aim of this work.

Slide 5

Slide 5 text

Our proposed solution
Approach to allow non-expert users to benefit from process mining techniques (we consider Heuristics Miner++):
1. Discretize the space of the parameter values.
2. Generate all possible models (Cartesian product of the parameter values).
3. Cluster the models into a “hierarchy” that can be explored.
4. Let the user navigate the hierarchy to find the model that fits the requirements / describes the reality.
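
Steps 1 and 2 above can be sketched as follows. This is only an illustration: the parameter names and value grids are hypothetical (the slides do not list the actual Heuristics Miner++ parameters), and the call to the mining algorithm itself is left out.

```python
from itertools import product

# Hypothetical parameter names and discretized values, for illustration only;
# the slides do not state which Heuristics Miner++ parameters are used.
PARAMETER_GRID = {
    "dependency_threshold": [0.5, 0.7, 0.9],
    "relative_to_best": [0.05, 0.1],
    "positive_observations": [1, 10],
}


def all_configurations(grid):
    """Yield one dict per point of the Cartesian product of the value lists."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


configurations = list(all_configurations(PARAMETER_GRID))
# Each configuration would then be passed to the mining algorithm to obtain
# one candidate model; steps 3 and 4 operate on the resulting models.
```

With this grid, 3 x 2 x 2 = 12 configurations are generated, so the number of candidate models grows multiplicatively with the number of parameters and values.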

Slide 7

Slide 7 text

New problems
To perform clustering, it is necessary to compare business process models.
Problems:
Which perspectives are relevant for the comparison?
Is it possible to define a metric that measures the given perspectives?

Slide 8

Slide 8 text

Comparison of process models
Our metric is designed to work on the results of control-flow discovery algorithms.
We consider two perspectives for our metric:
A “trace equivalence” point of view.
The structure of the model (which workflow templates are involved).
Many metrics have been proposed in the literature (e.g. van der Aalst et al., BPM 2006; Ehrig et al., APCCM 2007; Bae et al., JWSR 2007; van Dongen et al., AISE 2008; Dijkman, BPM 2008; Wang et al., OTM 2010; Zha et al., Comp. in Ind. 2010; Weidlich et al., TSE 2011; ...).

Slide 10

Slide 10 text

Trace equivalence point of view
Example process with an infinite firing sequence (activities A, B, C, D).
The TAR metric (Zha et al., Comp. in Ind. 2010) aims to solve the problem of comparing two processes in terms of their firing sequences.

Slide 11

Slide 11 text

How the TAR metric works
TAR (Transition Adjacency Relations) is a kind of “local firing sequence”: the set of all pairs of activities that can occur in sequence (one directly after the other).
Example process with activities A, B, C, D.
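
A minimal sketch of how a TAR set could be collected. It assumes the behavior of the process is given as a finite list of traces, which is a simplification: the TAR set is defined on everything the model can do, not on a sample. The example traces are made up and are not the process shown on the slide.

```python
def tar_set(traces):
    """Return the set of ordered pairs (a, b) where b directly follows a."""
    pairs = set()
    for trace in traces:
        # zip(trace, trace[1:]) enumerates every directly adjacent pair
        pairs.update(zip(trace, trace[1:]))
    return pairs


# Hypothetical traces over the activities A, B, C, D
example_traces = [("A", "B", "C", "D"), ("A", "C", "B", "D")]
tars = tar_set(example_traces)
```

Because only adjacent pairs are recorded, a process with loops still yields a finite TAR set even though it has infinitely many firing sequences.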

Slide 13

Slide 13 text

How the TAR metric works II
Once the TAR sets for the two processes have been generated, they are compared using the Jaccard similarity / distance:
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
$J_\delta(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$
The similarity of two processes coincides with the similarity of their TAR sets.
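
The two formulas translate directly into code. The sketch below makes one assumption that the slides do not address: two empty sets are treated as identical (similarity 1), to avoid a division by zero.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets count as identical
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """1 - J(A, B)."""
    return 1.0 - jaccard_similarity(a, b)
```

For example, the sets {1, 2, 3} and {2, 3, 4} share two of four distinct elements, so both the similarity and the distance are 0.5.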

Slide 15

Slide 15 text

A problem with the TAR metric
It does not consider differences in the “structure” of the models.
Example: two different processes (in terms of workflow patterns) with the same TAR sets.

Slide 17

Slide 17 text

Our approach for the comparison
Same approach as the TAR metric:
1. Conversion of the processes into new representations.
2. Comparison of the processes in terms of their new representations.
But a different representation for processes:
1. Conversion of a process into “derived relations” (workflow pattern instances).
2. Conversion of the derived relations into “primitive relations”.

Slide 20

Slide 20 text

Our approach for the comparison II
Target representations based on the relations of the Alpha algorithm.
[Diagram labels: Process model P1, Derived relations, Primitive relations, Traces; Process model P2, Derived relations, Primitive relations, Traces; Actual comparison. Filled lines: Alpha algorithm. Dotted lines: our approach.]

Slide 22

Slide 22 text

Proposed relations
Primitive relations: A > B and A ≯ B.
Same conditions as in the Alpha algorithm, but considering all possible traces (not only an observed log).
Derived relations:
A → B generates A > B and B ≯ A
A # B generates A ≯ B and B ≯ A
A ∥ B generates A > B and B > A
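
The conversion from derived to primitive relations can be sketched as below, following the standard Alpha algorithm definitions (A → B means A > B and B ≯ A; A # B means A ≯ B and B ≯ A; A ∥ B means A > B and B > A). The triple encoding and the strings "->", "#" and "||" are hypothetical choices made for this example.

```python
def primitive_relations(derived):
    """Convert derived relations into the two primitive relation sets.

    `derived` is an iterable of (a, symbol, b) triples, where symbol is
    "->" (sequence), "#" (no direct succession) or "||" (parallel).
    Returns (r_plus, r_minus): pairs in the > relation and in the ≯ relation.
    """
    r_plus, r_minus = set(), set()
    for a, symbol, b in derived:
        if symbol == "->":
            r_plus.add((a, b))
            r_minus.add((b, a))
        elif symbol == "#":
            r_minus.add((a, b))
            r_minus.add((b, a))
        elif symbol == "||":
            r_plus.add((a, b))
            r_plus.add((b, a))
        else:
            raise ValueError(f"unknown derived relation: {symbol!r}")
    return r_plus, r_minus
```

Each derived relation contributes exactly two primitive pairs, which is what lets structurally different models end up with different R− sets even when their directly-follows pairs agree.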

Slide 25

Slide 25 text

The proposed metric
Steps of the proposed metric:
1. Generation of the derived relations for two processes P1 and P2.
2. Conversion of the derived relations into two sets of primitive relations (R+ for > and R− for ≯).
3. Comparison of the processes in terms of their new representation: P1 = (R+(P1), R−(P1)) and P2 = (R+(P2), R−(P2)).
We use the Jaccard similarity / distance (as in TAR).
The final metric proposed in this work:
$d(P_1, P_2) = \alpha J_\delta(R^+(P_1), R^+(P_2)) + (1 - \alpha) J_\delta(R^-(P_1), R^-(P_2))$
where α is a weighting factor that balances the importance of the two primitive relations.
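
A minimal sketch of the final formula, assuming each process has already been converted into a pair (R+, R−) of sets of activity pairs. The function names and the two example processes are hypothetical, and treating two empty sets as identical is an assumption of this sketch.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are assumed to be at distance 0."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def process_distance(p1, p2, alpha=0.5):
    """Weighted sum of the Jaccard distances on R+ and on R-.

    p1 and p2 are (r_plus, r_minus) pairs; alpha must lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    d_plus = jaccard_distance(p1[0], p2[0])
    d_minus = jaccard_distance(p1[1], p2[1])
    return alpha * d_plus + (1.0 - alpha) * d_minus


# Made-up example: identical R+ sets, different R- sets
p1 = ({("A", "B"), ("B", "C")}, {("B", "A"), ("C", "B")})
p2 = ({("A", "B"), ("B", "C")}, {("B", "A"), ("C", "B"), ("A", "C"), ("C", "A")})
```

In this example the R+ distance is 0 and the R− distance is 0.5, so `process_distance(p1, p2, 0.5)` evaluates to 0.25, while α = 1 hides the difference entirely.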

Slide 26

Slide 26 text

Comparison of metrics
Given these processes, their distance measures are:
TAR: 0
Proposed metric: α = 1: 0; α = 0.5: 0.165; α = 0: 0.33
We have proven that, under typical process mining conditions, our metric recognizes processes that are structurally different.
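
As a consistency check on the reported values: α = 1 isolates the R+ term of the metric and α = 0 isolates the R− term, so the α = 0.5 value follows from the other two.

```latex
\begin{align*}
J_\delta\bigl(R^+(P_1), R^+(P_2)\bigr) &= d(P_1, P_2)\big|_{\alpha = 1} = 0 \\
J_\delta\bigl(R^-(P_1), R^-(P_2)\bigr) &= d(P_1, P_2)\big|_{\alpha = 0} = 0.33 \\
d(P_1, P_2)\big|_{\alpha = 0.5} &= 0.5 \cdot 0 + 0.5 \cdot 0.33 = 0.165
\end{align*}
```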

Slide 27

Slide 27 text

Parameters configuration
Recap of our proposed approach:
1. Discretize the space of the parameter values.
2. Generate all possible models (Cartesian product of the parameter values).
3. Cluster the models into a “hierarchy” that can be explored.
4. Let the user navigate the hierarchy to find the model that fits the requirements / describes the reality.

Slide 28

Slide 28 text

Clustering process models
Given:
A set of process models
A metric for processes
we can perform clustering, for example hierarchical agglomerative clustering with average linkage:
$s(c_1, c_2) = \frac{1}{|c_1||c_2|} \sum_{p_i \in c_1} \sum_{p_j \in c_2} d(p_i, p_j)$
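
The linkage formula and a naive agglomerative loop can be sketched as follows. This is an illustration rather than the implementation used in the work: plain numbers with an absolute-difference distance stand in for process models and the proposed metric, and the dendrogram is encoded as nested 2-tuples (so the items themselves must not be tuples).

```python
from itertools import combinations


def average_linkage(c1, c2, d):
    """s(c1, c2): mean pairwise distance between members of two clusters."""
    return sum(d(pi, pj) for pi in c1 for pj in c2) / (len(c1) * len(c2))


def agglomerative_clustering(items, d):
    """Build a dendrogram by repeatedly merging the two closest clusters.

    Leaves are the items themselves; internal nodes are (left, right) tuples.
    """
    if not items:
        raise ValueError("at least one item is required")
    # Each entry is (members, subtree)
    clusters = [([item], item) for item in items]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(
                clusters[ij[0]][0], clusters[ij[1]][0], d
            ),
        )
        merged = (
            clusters[i][0] + clusters[j][0],
            (clusters[i][1], clusters[j][1]),
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]


# Stand-in data: numbers instead of process models
dendrogram = agglomerative_clustering(
    [0.0, 0.1, 1.0, 1.2], lambda x, y: abs(x - y)
)
```

The loop recomputes every linkage at each step, which is simple but slow; for a few hundred models an established library routine would be the more practical choice.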

Slide 29

Slide 29 text

Clustering example
Clusters of 350 process models generated starting from a log (with Heuristics Miner++).

Slide 30

Slide 30 text

Exploration of the hierarchy
A representative can be extracted for each cluster, for example the medoid (i.e. a process whose average dissimilarity to all the elements in the same cluster is minimal).
A dendrogram is a binary tree that can be “explored” from the root to a leaf.
The exploration is performed by considering the representatives of the two children of the current node and deciding which of the two to move to.
Important: the representative of each cluster is always an element of the dataset (i.e. a “leaf” of the dendrogram).
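
The medoid and the root-to-leaf exploration can be sketched as below. The dendrogram encoding (nested 2-tuples with non-tuple leaves) and the `choose` callback, which stands in for the user's decision and returns 0 for the left child or 1 for the right child, are assumptions of this example.

```python
def medoid(cluster, d):
    """Element with the minimal average dissimilarity to the cluster's elements."""
    return min(cluster, key=lambda p: sum(d(p, q) for q in cluster) / len(cluster))


def leaves(node):
    """Collect the leaves below a dendrogram node."""
    if isinstance(node, tuple):
        left, right = node
        return leaves(left) + leaves(right)
    return [node]


def explore(dendrogram, d, choose):
    """Walk from the root to a leaf, asking `choose` at every internal node."""
    node = dendrogram
    while isinstance(node, tuple):
        left, right = node
        rep_left = medoid(leaves(left), d)
        rep_right = medoid(leaves(right), d)
        node = (left, right)[choose(rep_left, rep_right)]
    return node


def distance(x, y):
    # Stand-in distance on numbers instead of the process metric
    return abs(x - y)


def prefer_near_target(rep_left, rep_right, target=1.25):
    # Simulated user: picks the representative closest to a target value
    return 0 if abs(rep_left - target) <= abs(rep_right - target) else 1


selected = explore(((0.0, 0.1), (1.0, 1.2)), distance, prefer_near_target)
```

Since every representative is a leaf, the user only ever compares concrete models, and a dendrogram over n models is explored with at most n − 1 decisions.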

Slide 31

Slide 31 text

Implementation in PLG
The clustering and exploration procedure is implemented in the PLG (Processes Logs Generator) tool:
A tool for the generation of random processes.
Freely available at http://www.processmining.it
It is possible to cluster the generated models.
A prototype ProM plugin is planned.

Slide 32

Slide 32 text

Exploration example
Exploration prototype

Slide 33

Slide 33 text

Conclusions and future work
Conclusions:
The paper presents a new metric for the comparison of business process models.
The new metric is based on local firing sequences but also takes into account the “structure” of the model.
With this metric it is possible to perform hierarchical clustering on business process models.
Future work:
Improve the metric (for example by considering multisets).
Implement the procedure in ProM.
Work on the usability of the interface to allow non-expert users to interact with the system.