System and method for parallel mining of association rules in databases
First Claim
1. A digital multiprocessor comprising a plurality of processing systems, each processing system including a respective local database having distributed therein data from a transaction database of itemsets purchased in consumer transactions, each processing system including:
- means for determining the number of times a candidate itemset appears in the associated local database to establish a local count for the candidate itemset;
means for using the local count to determine whether the number of times a candidate itemset appears in the transaction database exceeds a predefined minimum support value;
means for entering a candidate itemset into a set of large itemsets when the number of times exceeds a predetermined minimum support value, such that the set of large itemsets can be designated as frequently occurring itemsets in transactions;
means for determining, for at least some of the itemsets in the set of large itemsets the number of times selected subsets of the itemsets appear in the transaction database;
means for outputting an association rule when the ratio of the number of times a selected subset appears in the transaction database to the number of times the associated itemset appears in the transaction database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint; and
means for exchanging the local count with the other processing systems such that each processing system determines whether the number of times a candidate itemset appears in the transaction database exceeds the predefined minimum support value.
1 Assignment
0 Petitions
Accused Products
Abstract
A multiprocessor including a plurality of processing systems is disclosed for discovering consumer purchasing tendencies. Each processing system of the multiprocessor identifies consumer transaction itemsets that are stored in a database that is distributed among the processing systems and which appear in the database a user-defined minimum number of times, referred to as minimum support. Then, the system discovers association rules in the itemsets by comparing the ratio of the number of times each of the large itemsets appears in the database to the number of times particular subsets of the itemset appear in the database. When the ratio exceeds a predetermined minimum confidence value, the system outputs an association rule which is representative of purchasing tendencies of consumers.
105 Citations
19 Claims
-
1. A digital multiprocessor comprising a plurality of processing systems, each processing system including a respective local database having distributed therein data from a transaction database of itemsets purchased in consumer transactions, each processing system including:
-
means for determining the number of times a candidate itemset appears in the associated local database to establish a local count for the candidate itemset; means for using the local count to determine whether the number of times a candidate itemset appears in the transaction database exceeds a predefined minimum support value; means for entering a candidate itemset into a set of large itemsets when the number of times exceeds a predetermined minimum support value, such that the set of large itemsets can be designated as frequently occurring itemsets in transactions; means for determining, for at least some of the itemsets in the set of large itemsets the number of times selected subsets of the itemsets appear in the transaction database; means for outputting an association rule when the ratio of the number of times a selected subset appears in the transaction database to the number of times the associated itemset appears in the transaction database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint; and means for exchanging the local count with the other processing systems such that each processing system determines whether the number of times a candidate itemset appears in the transaction database exceeds the predefined minimum support value. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-implemented method for identifying frequently recurring itemsets in transactions which are stored in a transaction database, the transaction database being distributed among a first local database associated with a first digital processing system and at least a second local database associated with a second digital processing system, comprising:
-
(a) determining the number of times a candidate itemset appears in the first local database to establish a local count for the candidate itemset; (b) exchanging the local counts with the at least second processing system; (c) using the local count to determine whether the number of times a candidate itemset appears in the transaction database exceeds a predefined minimum support value; (d) entering a candidate itemset into a set of large itemsets when the number of times exceeds a predetermined minimum support value; (e) for at least some of the itemsets in the set of large itemsets, determining the number of times selected subsets of the itemsets appear in the transaction database; and (f) outputting an association rule when the ratio of the number of times a selected subset appears in the transaction database to the number of times the associated itemset appears in the transaction database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint. - View Dependent Claims (7, 8)
-
-
9. A system for determining association rules in a transaction database of itemsets distributed among a plurality of local databases, each being associated with a respective processing system, each processing system including:
-
a large itemset generator for generating candidate large itemsets in accordance with a predetermined concatenation regime; a counter for counting the number of times selected of the candidate large itemsets appear in the local database to establish a local count; an exchanger for exchanging data representative of the local count or local database between the processing system and the other processing systems, wherein the system includes; at least one rule generator for generating association rules based on the local counts. - View Dependent Claims (10, 11, 12, 13, 14)
-
-
15. A computer program device comprising:
-
a computer program storage device readable by a first digital processing system; and a program means on the program storage device and including instructions executable by the digital processing system for performing method steps for identifying frequently recurring itemsets in transactions which are stored in a transaction database, the transaction database being distributed among a first local database associated with the first digital processing system and at least a second local database associated with a second digital processing system, the method steps comprising; determining the number of times a candidate itemset appears in the first local database to establish a local count for the candidate itemset; using the local count to determine whether the number of times a candidate itemset appears in the transaction database exceeds a predefined minimum support value; entering a candidate itemset into a set of large itemsets when the number of times exceeds a predetermined minimum support value, such that the set of large itemsets can be designated as frequently occurring itemsets in transactions; for at least some of the itemsets in the set of large itemsets, determining the number of times selected subsets of the itemsets appear in the transaction database; outputting an association rule when the ratio of the number of times a selected subset appears in the transaction database to the number of times the associated itemset appears in the transaction database exceeds a predetermined minimum confidence value and thereby satisfies a minimum confidence constraint; and after establishing the local count, exchanging the local counts with the at least second processing system such that each processing system determines whether the number of times a candidate itemset appears in the transaction database exceeds the predefined minimum support value. - View Dependent Claims (16, 17, 18, 19)
-
Specification