TAXONOMY-DRIVEN LUMPING FOR SEQUENCE MINING
First Claim
1. A computer implemented method for modeling event data using a pre-existing taxonomy of events, the event data representing a plurality of sequences of events, each sequence comprising an order of events initiated by a corresponding user, each event mapping to a leaf node of the taxonomy, the method comprising:
- identifying a plurality of candidate Markov models, each Markov model representing probabilities of a user transitioning from any first node in the Markov model to any second node in the Markov model according to the sequences of events, each Markov model formed from a subset of nodes in the taxonomy by merging selected nodes of the taxonomy into corresponding ancestor nodes of the taxonomy, wherein each event is represented by a node in each Markov model, and further wherein no Markov model contains both a particular node and an ancestor of that particular node;
measuring the fitness of the candidate Markov models with a fitness policy;
selecting at least some of the plurality of candidate Markov models with reference to the fitness measure and one or more resource constraints; and
choosing a preferred Markov model from the selected candidate Markov models with reference to an objective function.
10 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus are described for modeling sequences of events with Markov models whose states correspond to nodes in a provided taxonomy. Each state represents the events in the subtree under the corresponding node. By lumping observed events into states that correspond to internal nodes in the taxonomy, more compact models are achieved that are easier to understand and visualize, at the expense of a decrease in the data likelihood. The decision for selecting the best model is taken on the basis of two competing goals: maximizing the data likelihood, while minimizing the model complexity (i.e., the number of states).
25 Citations
20 Claims
-
1. A computer implemented method for modeling event data using a pre-existing taxonomy of events, the event data representing a plurality of sequences of events, each sequence comprising an order of events initiated by a corresponding user, each event mapping to a leaf node of the taxonomy, the method comprising:
-
identifying a plurality of candidate Markov models, each Markov model representing probabilities of a user transitioning from any first node in the Markov model to any second node in the Markov model according to the sequences of events, each Markov model formed from a subset of nodes in the taxonomy by merging selected nodes of the taxonomy into corresponding ancestor nodes of the taxonomy, wherein each event is represented by a node in each Markov model, and further wherein no Markov model contains both a particular node and an ancestor of that particular node; measuring the fitness of the candidate Markov models with a fitness policy; selecting at least some of the plurality of candidate Markov models with reference to the fitness measure and one or more resource constraints; and choosing a preferred Markov model from the selected candidate Markov models with reference to an objective function. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A system for modeling event data using a pre-existing taxonomy of events, the event data representing a plurality of sequences of events, each sequence comprising an order of events initiated by a corresponding user, each event mapping to a leaf node of the taxonomy, the system comprising one or more computing devices configured to:
-
identify a plurality of candidate Markov models representing the probabilities of a user transitioning from any first node in the model to any second node in the model according to the sequences of events, each Markov model formed from a subset of nodes in the taxonomy by merging selected nodes of the taxonomy into corresponding ancestor nodes of the taxonomy, wherein each event is represented by a node in each Markov model, further wherein no Markov model contains both a particular node and an ancestor of that particular node; measure the fitness of the candidate Markov models with a fitness policy; select at least some of the plurality of candidate Markov models with reference to the fitness measure and one or more resource constraints; and choose a preferred Markov model from the selected candidate Markov models with reference to an objective function. - View Dependent Claims (9, 10, 11, 12, 13, 14)
-
-
15. A computer program product for modeling event data using a pre-existing taxonomy of events, the event data representing a plurality of sequences of events, each sequence comprising an order of events initiated by a corresponding user, each event mapping to a leaf node of the taxonomy, comprising at least one computer-readable medium having computer instructions stored therein which are configured to cause one or more computing devices to:
-
identify a plurality of candidate Markov models representing the probabilities of a user transitioning from any first node in the model to any second node in the model according to the sequences of events, each Markov model formed from a subset of nodes in the taxonomy by merging selected nodes of the taxonomy into corresponding ancestor nodes of the taxonomy, wherein each event is represented by a node in each Markov model, further wherein no Markov model contains both a particular node and an ancestor of that particular node; measure the fitness of the candidate Markov models with a fitness policy; select at least some of the plurality of candidate Markov models with reference to the fitness measure and one or more resource constraints; and choose a preferred Markov model from the selected candidate Markov models with reference to an objective function. - View Dependent Claims (16, 17, 18, 19, 20)
-
Specification