System and method for quickly mining association rules in databases
First Claim
1. A method executable by a computer having a computer program storage device readable by the computer and a program means on the program storage device and including instructions executable by the computer for performing method steps identifying association rules in itemsets in transactions which are stored in a database, each itemset characterized by one or more items, the method comprising:
- entering an itemset into a set of large itemsets when the number of times the itemset is present in the database exceeds a predefined minimum support value;
concatenating itemsets in the set of large itemsets in accordance with a predetermined concatenation regime to generate a next set of candidate large itemsets and discarding all candidate large itemsets whose subsets are not large itemsets;
comparing each itemset in the next set of candidate large itemsets to the itemsets in the database to determine the number of times the candidate large itemset is present in the database;
entering a candidate large itemset into a next forward set of large itemsets only when the number of times the candidate large itemset is present in the database is greater than the minimum support value;
for at least some of the itemsets in the next forward set of large itemsets, determining the number of times selected subsets of the itemsets appear in the database; and
outputting an association rule when the ratio of the number of times a selected subset having a plurality of items appears in the database to the number of times the associated itemset appears in the database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint.
1 Assignment
0 Petitions
Accused Products
Abstract
A system and method for discovering consumer purchasing tendencies includes a computer-implemented program which identifies consumer transaction itemsets that are stored in a database and which appear in the database a user-defined minimum number of times, referred to as minimum support. Then, the system discovers association rules in the itemsets by comparing the ratio of the number of times each of the large itemsets appears in the database to the number of times particular subsets of the itemset appear in the database. When the ratio exceeds a predetermined minimum confidence value, the system outputs an association rule which is representative of purchasing tendencies of consumers.
-
Citations
19 Claims
-
1. A method executable by a computer having a computer program storage device readable by the computer and a program means on the program storage device and including instructions executable by the computer for performing method steps identifying association rules in itemsets in transactions which are stored in a database, each itemset characterized by one or more items, the method comprising:
-
entering an itemset into a set of large itemsets when the number of times the itemset is present in the database exceeds a predefined minimum support value; concatenating itemsets in the set of large itemsets in accordance with a predetermined concatenation regime to generate a next set of candidate large itemsets and discarding all candidate large itemsets whose subsets are not large itemsets; comparing each itemset in the next set of candidate large itemsets to the itemsets in the database to determine the number of times the candidate large itemset is present in the database; entering a candidate large itemset into a next forward set of large itemsets only when the number of times the candidate large itemset is present in the database is greater than the minimum support value; for at least some of the itemsets in the next forward set of large itemsets, determining the number of times selected subsets of the itemsets appear in the database; and outputting an association rule when the ratio of the number of times a selected subset having a plurality of items appears in the database to the number of times the associated itemset appears in the database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer program product for use with a computer system, a central processing unit and means coupled to the central processing unit for storing a database to identify association rules in itemsets in transactions which are stored in the database, each itemset characterized by one or more items, comprising:
a data storage device including a computer usable medium having computer readable program means for identifying association rules in itemsets in transactions which are stored in a database, the computer usable code means having; computer readable code means for entering an itemset into a set of large itemsets when the number of times the itemset is present in the database exceeds a predefined minimum support value; computer readable code means for concatenating itemsets in the set of large itemsets in accordance with a predetermined concatenation regime to generate a next set of candidate large itemsets; computer readable code means for comparing each itemset in the next set of candidate large itemsets to the itemsets in the database to determine the number of times the candidate large itemset is present in the database; computer readable code means for entering a candidate large itemset into a next forward set of large itemsets only when the number of times the candidate large itemset is present in the database is greater than the minimum support value; computer readable code means for, at least for some of the itemsets in the next forward set of large itemsets, determining the number of times selected subsets of the itemsets appear in the database; and computer readable code means for outputting an association rule when the ratio of the number of times a selected subset having a plurality of items appears in the database to the number of times the associated itemset appears in the database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint. - View Dependent Claims (7, 8, 9, 10)
-
11. A method executable by a digital processing apparatus in response to a program of instructions executable by the digital processing apparatus for identifying association rules of itemsets stored in a database and having more than one item so as to discover customer purchasing tendencies, the method comprising:
-
identifying large itemsets in the database as those itemsets recurring with at least a user-defined minimum support; discovering association rules between the large itemsets and subsets thereof when the ratio of subset recurrence to itemset recurrence at least equals a user-defined confidence; and outputting the association rules as representative of customer purchasing tendencies each association rule including, as a consequent, a subset having a plurality of items. - View Dependent Claims (12, 13, 14, 15, 16, 17)
-
-
18. A database mining system for discovering association rules in itemsets stored in a database, each itemset including a plurality of items, the system comprising:
-
means for entering an itemset into a set of large itemsets when the number of times the itemset is present in the database exceeds a predefined minimum support value; means for concatenating itemsets in the set of large itemsets in accordance with a predetermined concatenation regime to generate a next set of candidate large itemsets; means for comparing each itemset in the next set of candidate large itemsets to the itemsets in the database to determine the number of times the candidate large itemset is present in the database; means for entering a candidate large itemset into a next forward set of large itemsets when the number of times the candidate large itemset is present in the database is greater than the minimum support value; means for entering at least some of the itemsets in the next forward set of large itemsets, determining the number of times selected subsets of the itemsets appear in the database; and means for outputting an association rule when the ratio of the number of times a selected subset having a plurality of items appears in the database to the number of times the associated itemset appears in the database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint.
-
-
19. A computer-based system for discovering purchasing tendencies of consumers by identifying association rules between itemsets of transactions and subsets of the itemsets, the subsets including one or more items, comprising:
-
a multi-level hash tree data structure accessible by the computer for electronically storing the itemsets; a large itemset generator accessing the database for determining a first number of times an itemset appears in the database and for designating the itemset as a large itemset when the first number of times exceeds a minimum support value; and an association rule discoverer accessing the large itemset generator for determining a second number of times at least one subset having a plurality of items of an itemset appears in the database, the association rule discoverer outputting an association rule representative of purchasing tendencies of consumers when the ratio of the first number of times to the second number of times equals a predetermined minimum confidence value.
-
Specification