Methods for identifying partial periodic patterns of infrequent events in an event sequences
First Claim
Patent Images
1. A method for mining partial periodic patterns in an event sequence, each pattern comprising a list of events from the event sequence, the method comprising the steps of:
- identifying at least one subsequence of the event sequence and at least one pattern in the at least one subsequence, such that an information gain of the at least one subsequence with respect to the at least one pattern exceeds a predefined threshold; and
storing at least one of the at least one pattern and the at least one subsequence.
1 Assignment
0 Petitions
Accused Products
Abstract
A method is provided for mining partial periodic patterns in an event sequence, wherein each pattern includes a list of events from the event sequence. At least one subsequence of the event sequence and at least one pattern in the at least one subsequence are identified, such that an information gain of the at least one subsequence with respect to the at least one pattern exceeds a predefined threshold. At least one of the at least one pattern and the at least one subsequence is stored.
11 Citations
25 Claims
-
1. A method for mining partial periodic patterns in an event sequence, each pattern comprising a list of events from the event sequence, the method comprising the steps of:
-
identifying at least one subsequence of the event sequence and at least one pattern in the at least one subsequence, such that an information gain of the at least one subsequence with respect to the at least one pattern exceeds a predefined threshold; and
storing at least one of the at least one pattern and the at least one subsequence. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
dividing the at least one subsequence into contiguous segments, wherein each contiguous segment is a same length as the at least one pattern;
for each contiguous segment;
calculating the information gain of the contiguous segment, when all of the events of the contiguous segment match the corresponding events of the at least one pattern;
calculating the information loss of the contiguous segment, when at least one of the events of the contiguous segment does not match the corresponding event of the at least one pattern; and
calculating the information gain of the at least one subsequence, by summing the information gain and the information loss of each of the contiguous segments comprised therein.
-
-
4. The method according to claim 3, wherein said step of calculating the information gain of each of the contiguous segments comprises the steps of:
-
assigning an information value to the at least one pattern; and
assigning an information loss to at least one deviated pattern from the at least one pattern.
-
-
5. The method according to claim 4, wherein said step of assigning an information value to the at least one pattern comprises the steps of:
-
determining an information value of each of the events in the at least one pattern; and
summing the information values of each of the events in the at least one pattern.
-
-
6. The method according to claim 5, wherein said step of determining the information value of each of the events in the pattern comprises the step of determining a probability of occurrence in the event sequence of each of the events in the pattern.
-
7. The method according to claim 1, further comprising the step replacing at least one of the events in the at least one pattern, to obtain a higher degree of repetitiveness of the events in the at least one pattern with respect to the at least one subsequence.
-
8. The method according to claim 1, wherein the at least one subsequence maximizes the information gain of the at least one pattern with respect to any other subsequences of the event sequence.
-
9. The method according to claim 1, wherein said step of identifying the at least one subsequence comprises the step of identifying a subsequence of the event sequence that maximizes the information gain with respect to a pattern contained therein.
-
10. The method according to claim 1, wherein said method is implemented by a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform said method steps.
-
11. A method for mining partial periodic patterns in an event sequence, each pattern comprising a list of events from the event sequence, the method comprising the steps of:
-
identifying singular patterns in the event sequence, wherein each singular pattern has only one position filled by one of the events in the event sequence;
identifying complex patterns in the event sequence, wherein each complex pattern has more than one position filled by one of the events in the event sequence;
identifying the complex patterns and subsequences of the event sequence that comprise the complex patterns, such that an information gain of each of the subsequences with respect to the complex pattern comprised therein exceeds a predefined threshold; and
storing at least one of the complex patterns and the subsequences that contain the complex patterns. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
identifying the singular patterns and subsequences of the event sequence that comprise the singular patterns, such that an information gain of each of the subsequences with respect to the singular pattern comprised therein exceeds a predefined threshold; and
storing at least one of the singular patterns and the subsequences that contain the singular patterns.
-
-
14. The method according to claim 11, further comprising the step of respectively identifying the event subsequences that maximize the information gain for at least some of the singular and complex patterns.
-
15. The method according to claim 11, wherein said step of identifying the singular patterns comprises the steps of:
-
for each singular pattern, computing an optimal information surplus (OIS) of each of the events in the singular pattern with respect to each possible period of the singular pattern in a given subsequence of the event sequence;
computing a maximum information gain (MIG) of each of the singular patterns; and
identifying at least one subsequence of the event sequence and at least one singular pattern in the at least one subsequence, such that an information gain of the at least one subsequence with respect to the at least one singular pattern exceeds a predefined threshold.
-
-
16. The method according to claim 15, wherein the step of computing the optimal information surplus (OIS) comprises the steps of:
-
for a given event ak and a given period l;
transforming the event sequence into a sequence X of real numbers, wherein each real number in the sequence X represents the information loss associated with each position in the event sequence;
transforming the event sequence into a sequence Y of real numbers, wherein each real number in the sequence Y represents the information gain associated with each position in the event sequence;
transforming the event sequence into a sequence Z of real numbers, wherein each real number in the sequence Z represents the information adjustment associated with each position in the event sequence;
transforming the event sequence into a sequence E of real numbers, wherein each real number in sequence E represents a number of occurrences of the given event ak within the previous l positions; and
computing the optimal information surplus of each of the events in the event sequence from the sequences X, Y, Z, and E.
-
-
17. The method according to claim 13, wherein the step of identifying the singular patterns further comprises the steps of:
-
for a given position in the event sequence;
identifying a set T of events, wherein each event in the set T has a positive optimal information surplus for the given position;
identifying a set E of events, wherein each event in the set E is in a singular pattern of period l, the singular pattern of period l being contained in a containing subsequence such that an information gain of the containing subsequence with respect to the singular pattern of period l exceeds the predefined threshold;
for each of the events in the set E, initialing a MIG counter for an event-position combination.
-
-
18. The method according to claim 17, wherein said step of identifying the set E comprises the step of sequentially evaluating each of the events in the set T.
-
19. The method according to claim 18, wherein said step of
sequentially evaluating each of the events in the set T comprises the step of inserting an event currently being evaluated into the set E, when there exists l− - 1 events in the set T such that the sum of the optimal information surplus of the l−
1 events and an optimal information surplus of the event currently being evaluated exceeds a predefined threshold.
- 1 events in the set T such that the sum of the optimal information surplus of the l−
-
20. The method according to claim 15, wherein said step of computing the maximum information gain (MIG) of each of the singular patterns comprises the step of:
-
for a given singular pattern having a period l;
transforming the event sequence into a sequence of real numbers Q, wherein each real number in the sequence Q represents one of the information gain and the information loss associated with each segment of l events in the event sequence;
computing the information gain associated with event subsequences of the event sequence; and
identifying a subsequence that maximizes the information gain.
-
-
21. The method according to claim 11, wherein said step of identifying complex patterns comprises the step of:
identifying a minimum common superpattern (MCSP) in a set D of singular patterns having a same period, the minimum common superpattern being such that each of the singular patterns in the set D are a subpattern of the minimum common superpattern and there does not exist another subpattern AP of the minimum common superpattern such that each of the singular patterns in the set D is also a subpattern of the other subpattern AP.
-
22. The method according to claim 21, wherein said step of identifying complex patterns further comprises the step of:
identifying the minimum common superpattern as a complex pattern, when a sum of maximum information gain values of the singular patterns in the set D exceed a predefined threshold.
-
23. The method according to claim 11, wherein said method is implemented by a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform said method steps.
-
24. A method for mining partial periodic patterns in an event sequence, each pattern comprising a list of events from the event sequence, the method comprising the steps of:
-
specifying a minimum information gain for a pattern of events with respect to a subsequence of the event sequence that comprises the pattern;
identifying patterns of events in the event sequence and subsequences of the event sequence that comprise the patterns; and
storing only the patterns and subsequences such that the information gain of the stored patterns with respect to the stored subsequences exceeds a predefined threshold. - View Dependent Claims (25)
-
Specification