Method and system for mining quantitative association rules in large relational tables
First Claim
1. A method for identifying quantitative association rules from a table of records, each record having a plurality of attributes associated therewith, the attributes including quantitative and categorical attributes, each attribute having a value, the method comprising the steps of:
partitioning the values of each quantitative attribute from a selected group of quantitative attributes into a respective plurality of intervals;
determining a support for each value of the categorical attributes and the non-partitioned quantitative attributes, and a support for each interval of the partitioned quantitative attributes, the support for a value being a number of records in the table whose attribute values include the value, the support for an interval being a number of records in the table whose attribute values are part of the interval;
for each quantitative attribute, combining adjacent values of the attribute if the attribute is not partitioned, or adjacent intervals of the attribute if the attribute is partitioned, into ranges, as long as the support for each range is less than a maximum support;
identifying items with at least a minimum support, each item representing a quantitative attribute and a range, or a categorical attribute and a value, the items with at least the minimum support making up a seed set;
generating candidate itemsets from the seed set, each itemset being a set of items and having a support, the support of the itemset being a number of records in the table which support the itemset;
determining frequent itemsets from the candidate itemsets, the frequent itemsets being those itemsets whose support is more than the minimum support, the determined frequent itemsets becoming the next seed set;
repeating the steps of generating candidate itemsets and determining frequent itemsets until all the frequent itemsets are found; and
outputting an association rule when the support of a selected frequent itemset bears a predetermined relationship to the support of a subset of the selected frequent itemset, thereby satisfying a minimum confidence constraint, the association rule being an expression of the form X ⇒ Y where X and Y are itemsets.
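The steps of claim 1 can be illustrated with a small sketch. This is not the patented implementation: the toy table, the attribute names, the function names (`partition`, `ranges_for`, `mine`), the chunk-based partitioning, and the use of a plain ratio of supports as the "predetermined relationship" are all assumptions made for illustration. Supports are record counts, as in the claim.

```python
from itertools import combinations

def partition(values, n_intervals):
    """Split the sorted distinct values into chunks; return (lo, hi) intervals."""
    distinct = sorted(set(values))
    size = -(-len(distinct) // n_intervals)  # ceiling division
    chunks = [distinct[i:i + size] for i in range(0, len(distinct), size)]
    return [(c[0], c[-1]) for c in chunks]

def matches(record, item):
    """An item is (attribute, value) or (attribute, (lo, hi))."""
    attr, target = item
    if isinstance(target, tuple):
        return target[0] <= record[attr] <= target[1]
    return record[attr] == target

def support(records, itemset):
    """Number of records that satisfy every item in the itemset."""
    return sum(all(matches(r, it) for it in itemset) for r in records)

def ranges_for(records, attr, intervals, max_support):
    """Keep base intervals; merge adjacent runs while support < max_support."""
    out = list(intervals)
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            merged = (intervals[i][0], intervals[j][1])
            if support(records, [(attr, merged)]) >= max_support:
                break
            out.append(merged)
    return out

def mine(records, quantitative, categorical, n_intervals, min_sup, max_sup, min_conf):
    items = []
    for attr in quantitative:
        base = partition([r[attr] for r in records], n_intervals)
        items += [(attr, rng) for rng in ranges_for(records, attr, base, max_sup)]
    for attr in categorical:
        items += [(attr, v) for v in sorted({r[attr] for r in records})]

    # Assumption: ">=" is used for items and itemsets alike; the claim says
    # "at least" for items and "more than" for itemsets.
    seed = [frozenset([it]) for it in items if support(records, [it]) >= min_sup]
    frequent = {s: support(records, s) for s in seed}
    while seed:
        k = len(next(iter(seed))) + 1
        # Join pairs from the seed set; an itemset uses each attribute once.
        candidates = {a | b for a in seed for b in seed
                      if len(a | b) == k and len({attr for attr, _ in a | b}) == k}
        seed = []
        for c in candidates:
            sup = support(records, c)
            if sup >= min_sup:
                frequent[c] = sup
                seed.append(c)

    # Rule X => Y is output when support(X ∪ Y) / support(X) >= min_conf.
    rules = []
    for z, sup_z in frequent.items():
        for n in range(1, len(z)):
            for x in map(frozenset, combinations(z, n)):
                conf = sup_z / frequent[x]
                if conf >= min_conf:
                    rules.append((x, z - x, conf))
    return frequent, rules

# Hypothetical toy table; names and values are illustrative only.
RECORDS = [
    {"age": 23, "married": "no", "cars": 1},
    {"age": 25, "married": "yes", "cars": 1},
    {"age": 29, "married": "no", "cars": 0},
    {"age": 34, "married": "yes", "cars": 2},
    {"age": 38, "married": "yes", "cars": 2},
]

frequent, rules = mine(RECORDS, ["age"], ["married", "cars"],
                       n_intervals=2, min_sup=2, max_sup=5, min_conf=0.9)
for x, y, conf in rules:
    print(sorted(x, key=str), "=>", sorted(y, key=str), round(conf, 2))
```

In this sketch "cars" is treated as categorical to keep the code short; a non-partitioned quantitative attribute could be handled by passing one interval per distinct value to `ranges_for`.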
Abstract
A method and apparatus are disclosed for mining quantitative association rules from a relational table of records. The method comprises the steps of: partitioning the values of selected quantitative attributes into intervals, combining adjacent attribute values and intervals into ranges, generating candidate itemsets, determining frequent itemsets, and outputting an association rule when the support for a frequent itemset bears a predetermined relationship to the support for a subset of the frequent itemset. Preferably, the partitioning step includes determining whether to partition and the number of partitions based on a partial incompleteness measure. The candidate generation includes discarding those itemsets not meeting a user-specified interest level and those having a subset which is not a frequent itemset. The frequent itemsets are determined using super-candidates that include information of the candidate itemsets. Preferably, each super-candidate has a data structure, such as a multi-dimensional tree or array, representing quantitative attributes common to the replaced candidate itemsets.
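The abstract's pruning of candidates "having a subset which is not a frequent itemset" can be sketched as a join followed by a subset test. This is a minimal illustration under assumptions: the function name `generate_candidates` and the single-letter items are hypothetical, and the interest-level filter and super-candidate structures mentioned in the abstract are not modeled.

```python
from itertools import combinations

def generate_candidates(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then drop any candidate
    that has a k-item subset which is not itself frequent."""
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    return {c for c in joined
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# Hypothetical frequent 2-itemsets over placeholder items a, b, c, d.
frequent_2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
candidates_3 = generate_candidates(frequent_2)
print(candidates_3)
```

Here {a, b, d} and {b, c, d} are joined but then discarded, because {a, d} and {c, d} are not frequent; only {a, b, c} survives to be counted against the table.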
30 Claims
11. A computer program product for use with a computer system for identifying quantitative association rules from a table of records, each record having a plurality of attributes associated therewith, the attributes including quantitative and categorical attributes, each attribute having a value, the computer program product comprising:
a recording medium;
means, recorded on the recording medium, for directing the computer system to partition the values of each quantitative attribute from a selected group of quantitative attributes into a respective plurality of intervals;
means, recorded on the recording medium, for directing the computer system to determine a support for each value of the categorical attributes and the non-partitioned quantitative attributes, and a support for each interval of the partitioned quantitative attributes, the support for a value being a number of records in the table whose attribute values include the value, the support for an interval being a number of records in the table whose attribute values are part of the interval;
means, recorded on the recording medium, for directing the computer system, for each quantitative attribute, to combine adjacent values of the attribute if the attribute is not partitioned, or adjacent intervals of the attribute if the attribute is partitioned, into ranges, as long as the support for each range is less than a maximum support;
means, recorded on the recording medium, for directing the computer system to identify items with at least a minimum support, each item representing a quantitative attribute and a range, or a categorical attribute and a value, the items with at least the minimum support making up a seed set;
means, recorded on the recording medium, for directing the computer system to generate candidate itemsets from the seed set, each itemset being a set of items and having a support, the support of the itemset being a number of records in the table which support the itemset;
means, recorded on the recording medium, for directing the computer system to determine frequent itemsets from the candidate itemsets, the frequent itemsets being those itemsets whose support is more than the minimum support, the determined frequent itemsets becoming the next seed set;
means, recorded on the recording medium, for directing the computer system to repeat the steps of generating candidate itemsets and determining frequent itemsets until all the frequent itemsets are found; and
means, recorded on the recording medium, for directing the computer system to output an association rule when the support of a selected frequent itemset bears a predetermined relationship to the support of a subset of the selected frequent itemset, thereby satisfying a minimum confidence constraint, the association rule being an expression of the form X ⇒ Y where X and Y are itemsets.
21. A computer-based system for identifying quantitative association rules from a table of records, each record having a plurality of attributes associated therewith, the attributes including quantitative and categorical attributes, each attribute having a value, the system comprising:
means for partitioning the values of each quantitative attribute from a selected group of quantitative attributes into a respective plurality of intervals;
means for determining a support for each value of the categorical attributes and the non-partitioned quantitative attributes, and a support for each interval of the partitioned quantitative attributes, the support for a value being a number of records in the table whose attribute values include the value, the support for an interval being a number of records in the table whose attribute values are part of the interval;
means for combining, for each quantitative attribute, adjacent values of the attribute if the attribute is not partitioned, or adjacent intervals of the attribute if the attribute is partitioned, into ranges, as long as the support for each range is less than a maximum support;
means for identifying items with at least a minimum support, each item representing a quantitative attribute and a range, or a categorical attribute and a value, the items with at least the minimum support making up a seed set;
means for generating candidate itemsets from the seed set, each itemset being a set of items and having a support, the support of the itemset being a number of records in the table which support the itemset;
means for determining frequent itemsets from the candidate itemsets, the frequent itemsets being those itemsets whose support is more than the minimum support, the determined frequent itemsets becoming the next seed set;
means for repeating the operation of the means for generating candidate itemsets and means for determining frequent itemsets until all the frequent itemsets are found; and
means for outputting an association rule when the support of a selected frequent itemset bears a predetermined relationship to the support of a subset of the selected frequent itemset, thereby satisfying a minimum confidence constraint, the association rule being an expression of the form X ⇒ Y where X and Y are itemsets.
Specification