METHOD AND SYSTEM FOR CLUSTERING, MODELING, AND VISUALIZING PROCESS MODELS FROM NOISY LOGS
Abstract
A process discovery system that includes an offline system training module configured to cluster similar process log traces using Non-negative Matrix Factorization (NMF), with each cluster representing a process model, and to learn a Conditional Random Field (CRF) model for each process model; and an online system usage module configured to decode new incoming log traces and construct a process graph in which transitions are shown or hidden according to a tuning parameter.
19 Claims
1. A computer-implemented process discovery method, comprising:
receiving as input at least one noisy log file that contains a plurality of labeled log traces from a plurality of process models;
clustering similar log traces using non-negative matrix factorization (NMF) into a plurality of clusters, wherein each cluster represents a different process model;
learning a Conditional Random Field (CRF) model for each of the process models;
decoding new incoming log traces; and
constructing a tunable process graph, wherein one or more transitions are shown or hidden according to a tuning parameter.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
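The last step of claim 1 can be illustrated with a minimal sketch. The claim does not say how the tuning parameter is applied, so the sketch assumes it is a probability threshold: a transition is shown when its estimated probability is at least the threshold and hidden otherwise. The transition probabilities here are estimated from simple counts over already-decoded traces rather than taken from a trained CRF, and the function names and sample traces are hypothetical.

```python
from collections import Counter, defaultdict

def transition_probabilities(traces):
    """Estimate P(next activity | current activity) from decoded traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        for nxt, n in nexts.items():
            probs[(current, nxt)] = n / total
    return probs

def tunable_graph(probs, threshold):
    """Keep only transitions whose probability is at least `threshold` (assumed rule)."""
    return {edge: p for edge, p in probs.items() if p >= threshold}

# Hypothetical decoded traces; the last one contains a rare shortcut.
traces = [
    ["receive", "review", "approve"],
    ["receive", "review", "approve"],
    ["receive", "review", "reject"],
    ["receive", "approve"],
]
probs = transition_probabilities(traces)
full = tunable_graph(probs, 0.0)    # every observed transition is shown
pruned = tunable_graph(probs, 0.3)  # low-probability transitions are hidden
for edge in sorted(pruned):
    print(edge, round(pruned[edge], 3))
```

Raising the threshold hides the rare `receive -> approve` edge while the dominant path stays visible, which is one plausible reading of "tunable" in the claim.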
9. A process discovery system comprising:
an offline system training module configured to receive as input at least one noisy log file that contains a plurality of labeled log traces from a plurality of process models, cluster similar log traces using Non-negative Matrix Factorization (NMF) with each cluster representing a different process model, and learn a Conditional Random Field (CRF) model for each process model;
an online system usage module configured to decode new incoming log traces and to construct a tunable process graph in which transitions are shown or hidden according to a tuning parameter.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
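The decoding performed by the online module can be sketched as follows. The claim does not name a decoding algorithm, so this assumes a linear-chain model decoded with the Viterbi algorithm, a standard choice for CRFs of that shape. The per-entry scores, transition scores, and activity names are hypothetical stand-ins for what a trained CRF would supply.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence for one trace.

    emissions[t][label] is the score of `label` at log entry t;
    transitions[(prev, cur)] is the score of moving from prev to cur.
    Scores are additive (log-space); unseen pairs get a large penalty.
    """
    missing = -1e9
    best = [{lab: emissions[0].get(lab, missing) for lab in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in labels:
            prev_best = max(
                labels,
                key=lambda prev: best[-1][prev] + transitions.get((prev, cur), missing),
            )
            scores[cur] = (
                best[-1][prev_best]
                + transitions.get((prev_best, cur), missing)
                + emissions[t].get(cur, missing)
            )
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

labels = ["receive", "review", "approve"]
# Hypothetical scores: the noisy middle entry looks most like "receive" on its own.
emissions = [
    {"receive": 0.0, "review": -2.0, "approve": -2.0},
    {"receive": -0.5, "review": -0.9, "approve": -1.1},
    {"receive": -2.0, "review": -2.0, "approve": 0.0},
]
transitions = {
    ("receive", "review"): -0.1,
    ("review", "approve"): -0.1,
    ("receive", "approve"): -1.5,
    ("receive", "receive"): -2.0,
}
decoded = viterbi(emissions, transitions, labels)
print(decoded)
```

In this toy trace the transition scores override the misleading evidence from the noisy middle entry, which is the reason a sequence model is useful on noisy logs.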
17. A computer-implemented process discovery method comprising:
receiving as input at least one noisy log file that contains a plurality of labeled trace activity log entries from a plurality of process models, wherein each trace in the log comprises a document;
calculating a term frequency-inverse document frequency (TF-IDF) vector score for each document in the log file, wherein words appearing in the document comprise the features of a vector for which the TF-IDF vector score is calculated;
obtaining a term-document matrix, wherein each cell contains the TF-IDF score of a given term in a given document;
applying non-negative matrix factorization (NMF) to cluster similar documents;
obtaining a plurality of clusters of noisy process documents via NMF, wherein each cluster contains the documents of different instances of the same process model;
for each cluster and for each activity log entry in a document, associating a TF-IDF vector is performed as follows:
  a label for each activity log entry is assigned according to a reference annotation;
  the features of the vector are words occurring in the entry;
  for each feature, a TF-IDF score is computed by taking into account all the entries in this cluster only;
  a Boolean feature comprising the name of the previous activity is added;
computing feature matrices, wherein the feature matrices comprise term-document matrices in which each document is a trace activity entry and is augmented with at least one Boolean feature that represents the previous activity;
training a conditional random field (CRF);
obtaining as output a plurality of CRFs, wherein each CRF is configured to model one or more transition probabilities between activities of one process model;
storing a plurality of inverse document frequency (IDF) vectors of terms, wherein each vector is the size of a feature vocabulary for a given cluster.
Dependent claims: 18, 19
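The per-entry feature construction in claim 17 can be sketched in the same spirit. Each activity log entry gets TF-IDF features computed from the entries of its own cluster only, plus a Boolean feature naming the previous activity, and the cluster's IDF vector is kept for later use. The smoothed IDF formula, the `START` sentinel for the first entry, the feature-name prefixes, and the sample trace are assumptions made for illustration; training the CRF itself is left to a CRF library and is not shown.

```python
import math
from collections import Counter

def cluster_idf(entries):
    """IDF per term, computed over the log entries of one cluster only."""
    n = len(entries)
    df = Counter(w for text in entries for w in set(text.lower().split()))
    return {w: math.log((1 + n) / (1 + df[w])) + 1 for w in sorted(df)}

def entry_features(text, prev_activity, idf):
    """TF-IDF features for one entry plus a Boolean previous-activity feature."""
    counts = Counter(text.lower().split())
    feats = {f"w={w}": c * idf[w] for w, c in counts.items() if w in idf}
    feats[f"prev={prev_activity}"] = 1.0
    return feats

# One hypothetical trace from one cluster: (entry text, reference label).
trace = [
    ("user opened ticket 4411", "open"),
    ("ticket 4411 assigned to engineer", "assign"),
    ("engineer closed ticket 4411", "close"),
]
idf = cluster_idf([text for text, _ in trace])  # stored per cluster
X, y, prev = [], [], "START"
for text, label in trace:
    X.append(entry_features(text, prev, idf))
    y.append(label)
    prev = label  # during training the reference label is the previous activity
print(X[1])
```

The stored `idf` dictionary has one value per term in the cluster vocabulary, matching the claim's statement that each IDF vector is the size of the feature vocabulary for its cluster.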
Specification