System and method for mining surprising temporal patterns
First Claim
Patent Images
1. A method for identifying unexpected temporal patterns in a dataset containing a plurality of transactions, each transaction containing one or more items with a time-stamp, the one or more items being grouped into itemsets containing a pattern of items, the method comprising:
- generating one or more support values at predetermined times for each pattern, each support value indicating a portion of the transactions in the dataset containing each-such pattern at a predetermined time;
selecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than a predetermined threshold value; and
identifying an unexpected temporal pattern from the selected unexpected patterns, the unexpected temporal pattern having support values over a predetermined time period which change more than a second predetermined threshold value.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for data mining is provided in which temporal patterns of itemsets in transactions having unexpected support values are identified. A surprising temporal pattern is an itemset whose support changes over time. The method may use a minimum description length formulation to discover these surprising temporal patterns.
-
Citations
44 Claims
-
1. A method for identifying unexpected temporal patterns in a dataset containing a plurality of transactions, each transaction containing one or more items with a time-stamp, the one or more items being grouped into itemsets containing a pattern of items, the method comprising:
-
generating one or more support values at predetermined times for each pattern, each support value indicating a portion of the transactions in the dataset containing each-such pattern at a predetermined time;
selecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than a predetermined threshold value; and
identifying an unexpected temporal pattern from the selected unexpected patterns, the unexpected temporal pattern having support values over a predetermined time period which change more than a second predetermined threshold value. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
wherein the selection of unexpected patterns includes selecting a support value with the first number of bits.
-
-
5. The method of claim 1 wherein identifying an unexpected temporal pattern from the selected unexpected patterns includes the second predetermined threshold value being formulated in a minimum description length (MDL) to determine the change in surprise over time.
-
6. The method of claim 5 wherein a first description length indicates a pattern which changes over time, and a second description length, shorter than the first description length, indicates a pattern that does not change over time.
-
7. The method of claim 1 further comprising:
-
following the identification of unexpected temporal patterns, determining if the additional temporal patterns are to be identified;
incrementing the first predetermined threshold; and
reselecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than the incremented threshold value.
-
-
8. The method of claim 1 in which the itemsets are scored by the set of items in the itemset, and the method further comprising:
offsetting the score of a set of items by unexpected temporal patterns that are already known.
-
9. A method for identifying unexpected temporal patterns in a dataset containing a plurality of transactions, each transaction containing one or more time-stamped items, the one or more items being grouped into itemsets having a pattern of items, the method comprising:
-
generating a support value at a predetermined time for each pattern indicating the portion of the transactions in the dataset containing the pattern of items at the predetermined time;
selecting candidate patterns from the dataset based on the variations in the support value of the pattern over a predetermined time period;
generating an unconstrained description length for each candidate pattern, the unconstrained description length corresponding to a value indicating changes in the support of the pattern over the predetermined time period;
generating a constrained description length for each candidate pattern, the constrained description length corresponding to a compressed value indicating the changes in the support of the pattern over the predetermined time period; and
identifying patterns, based on a comparison of the unconstrained description length and the constrained description length, which have a unexpected support value which changes during the predetermined time period to generate unexpected temporal patterns. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
the method further comprising;
forming a weighted graph to discover the segments lengths by reducing the description length to the shortest path.
-
-
11. The method of claim 10 wherein the generation of a constrained description length for each candidate pattern includes restricting the support value model to have the same value of surprise over all the segments.
-
12. The method of claim 9 wherein the identification of patterns based on a comparison of the unconstrained description length and the constrained description length includes:
-
forming a ration of the unconstrained segment length to the constrained segment lengths;
if the ratio between the unconstrained and restrained segments is large, identifying the pattern as a surprising pattern; and
if the ratio between the unconstrained and restrained segments is not large, identifying the pattern as an unsurprising pattern.
-
-
13. The method of claim 9 wherein the generation of a support value at a predetermined time includes encoding a sequence of items in a coding scheme where it is inexpensive to encode sequences that have a well-known correlation between items, and expensive to encode sequences with unexpected patterns.
-
14. The method of claim 13 in which the selection of candidate patterns from the dataset based on the variations in the support value of the pattern over a predetermined time period includes selecting expensively coded sequences.
-
15. The method of claim 14 wherein the identification of patterns includes ordering items by the difference, ratio, and relative difference between segment lengths.
-
16. The method of claim 9 wherein the identification of patterns, based on a comparison of the unconstrained description length and the constrained description length includes the difference between the unconstrained and constrained description lengths corresponding to the unexpectedness of a temporal pattern.
-
17. A computer system for data mining unexpected temporal patterns, the system comprising:
-
a data storage medium to store a database of a plurality of transactions, each transaction containing one or more time-stamped items, the one or more items being grouped into itemsets having a pattern of items;
an output module to output the identity of the surprising temporal patterns; and
an operating system stored in a computer readable medium with a surprising temporal identifier kernel, which accesses the data storage database, to execute a series of machine-executable instructions that identify surprising temporal patterns as follows;
generating one or more support values at predetermined times for each pattern, each support value indicating a portion of the transactions in the dataset containing each such pattern at a predetermined time;
selecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than a predetermined threshold value; and
identifying an unexpected temporal pattern from the selected unexpected patterns, the unexpected temporal pattern having support values over a predetermined time period which change more than a second predetermined threshold value. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24)
wherein the selection of unexpected patterns includes selecting a support value with the first number of bits.
-
-
21. The system of claim 17 wherein identifying an unexpected temporal pattern from the selected unexpected patterns includes the second predetermined threshold value being formulated in a minimum description length (MDL) to determine the change in surprise over time.
-
22. The system of claim 21 wherein a first description length indicates a pattern which changes over time, and a second description length, shorter than the first description length, indicates a pattern that does not change over time.
-
23. The system of claim 17 in which the machine-executable instructions further include:
-
following the identification of unexpected temporal patterns, determining if the additional temporal patterns are to be identified;
incrementing the first predetermined threshold; and
reselecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than the incremented threshold value.
-
-
24. The system of claim 17 in which the itemsets are scored by the set of items in the itemset, and the machine-executable instructions further include:
offsetting the score of the set of items by unexpected temporal patterns that are already known.
-
25. A computer system for data mining unexpected temporal patterns, the system comprising:
-
a data storage medium to store a database of a plurality of transactions, each transaction containing one or more time-stamped items, the one or more items being grouped into itemsets having a pattern of items;
an output module to output the identity of the surprising temporal patterns; and
an operating system stored in a computer readable medium with a surprising temporal identifier kernel, which accesses the data storage database, to execute a series of machine-executable instructions that identify surprising temporal patterns as follows;
generating a support value at a predetermined time for each pattern indicating the portion of the transactions in the dataset containing the pattern of items at the predetermined time;
selecting candidate patterns from the dataset based on the variations in the support value of the pattern over a predetermined time period;
generating an unconstrained description length for each candidate pattern, the unconstrained description length corresponding to a value indicating changes in the support of the pattern over the predetermined time period;
generating a constrained description length for each candidate pattern, the constrained description length corresponding to a compressed value indicating the changes in the support of the pattern over the predetermined time period; and
identifying patterns, based on a comparison of the unconstrained description length and the constrained description length, which have a unexpected support value which changes during the predetermined time period to generate unexpected temporal patterns. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32)
the surprising temporal identifier kernel executing further machine-executable instructions as follow;
forming a weighted graph to discover the segments lengths by reducing the description length to the shortest path.
-
-
27. The system of claim 26 wherein the generation of a constrained description length for each candidate pattern includes restricting the support value model to have the same value of surprise over all the segments.
-
28. The system of claim 25 wherein the identification of patterns based on a comparison of the unconstrained description length and the constrained description length includes:
-
forming a ration of the unconstrained segment length to the constrained segment lengths;
if the ratio between the unconstrained and restrained segments is large, identifying the pattern as a surprising pattern; and
if the ratio between the unconstrained and restrained segments is not large, identifying the pattern as an unsurprising pattern.
-
-
29. The system of claim 25 wherein the generation of a support value at a predetermined time includes encoding a sequence of items in a coding scheme where it is inexpensive to encode sequences that have a well-known correlation between items, and expensive to encode sequences with unexpected patterns.
-
30. The system of claim 29 in which the selection of candidate patterns from the dataset based on the variations in the support value of the pattern over a predetermined time period includes selecting expensively coded sequences.
-
31. The system of claim 30 wherein the identification of patterns includes ordering items by the difference, ratio, and relative difference between segment lengths.
-
32. The system of claim 25 wherein the identification of patterns, based on a comparison of the unconstrained description length and the constrained description length includes the difference between the unconstrained and constrained description lengths corresponding to the unexpectedness of a temporal pattern.
-
33. A computer system for data mining unexpected temporal patterns, the system comprising:
-
a client computer including;
an operating system stored in a computer readable medium with a surprising temporal identifier kernel to execute a series of machine-executable instructions that identify surprising temporal patterns;
an output module to output the identity of the surprising temporal patterns; and
a server computer including;
a processor;
a data storage medium to store a database of a plurality of transactions, each transaction containing one or more time-stamped items, the one or more items being grouped into itemsets having a pattern of items;
an operating system stored in a computer readable medium with a surprising temporal identifier kernel, which accesses the data storage database, to execute a series of machine-executable instructions which identify surprising temporal patterns; and
wherein the surprising temporal identifier kernel executes instructions as follows;
generating one or more support values at predetermined times for each pattern, each support value indicating a portion of the transactions in the dataset containing each such pattern at a predetermined time;
selecting unexpected patterns from the dataset which have a support value at a predetermined time that is higher than a predetermined threshold value; and
identifying an unexpected temporal pattern from the selected unexpected patterns, the unexpected temporal pattern having support values over a predetermined time period which change more than a second predetermined threshold value. - View Dependent Claims (34, 35, 36, 37, 38)
-
-
39. A computer system for data mining unexpected temporal patterns, the system comprising:
-
a client computer including;
an operating system stored in a computer readable medium with a surprising temporal identifier kernel to execute a series of machine-executable instructions which identify surprising temporal patterns;
an output module to output the identity of the surprising temporal patterns; and
a server computer including;
a processor;
a data storage medium to store a database of a plurality of transactions, each transaction containing one or more time-stamped items, the one or more items being grouped into itemsets having a pattern of items;
an operating system stored in a computer readable medium with a surprising temporal identifier kernel, which accesses the data storage database, to execute a series of machine-executable instructions that identify surprising temporal patterns; and
wherein the surprising temporal identifier kernel executes instructions as follows;
generating a support value at a predetermined time for each pattern indicating the portion of the transactions in the dataset containing the pattern of items at the predetermined time;
selecting candidate patterns from the dataset based on the variations in the support value of the pattern over a predetermined time period;
generating an unconstrained description length for each candidate pattern, the unconstrained description length corresponding to a value indicating changes in the support of the pattern over the predetermined time period;
generating a constrained description length for each candidate pattern, the constrained description length corresponding to a compressed value indicating the changes in the support of the pattern over the predetermined time period; and
identifying patterns, based on a comparison of the unconstrained description length and the constrained description length, which have a unexpected support value which changes during the predetermined time period to generate unexpected temporal patterns. - View Dependent Claims (40, 41, 42, 43, 44)
-
Specification