System and method for mining generalized association rules in databases
First Claim
1. A computer program device comprising:
- a computer program storage device readable by a digital processing apparatus; and
a program means on the program storage device and including instructions executable by the digital processing apparatus for performing method steps for identifying association rules in itemsets with a hierarchical taxonomy on items of the itemsets, the taxonomy defining descendant and ancestor relationships between the items, the method steps comprising;
entering an itemset into a set of large itemsets when the number of times the itemset is present in a database of transactions establishes a support value that exceeds a predefined minimum support value;
for at least some of the itemsets in the set of large itemsets, determining the number of times selected subsets of the itemsets appear in transactions in the database; and
(d) outputting an association rule when the number of times a selected subset appears in the database bears a predetermined relationship to the number of times the associated itemset appears in the database and thereby satisfies a minimum confidence constraint.
3 Assignments
0 Petitions
Accused Products
Abstract
A system and method for discovering consumer purchasing tendencies includes a computer-implemented program which identifies consumer transaction itemsets that are stored in a database and which appear in the database a user-defined minimum number of times, referred to as minimum support. The itemsets contain items that are characterized by a hierarchical taxonomy. Then, the system discovers association rules, potentially across different levels of the taxonomy, in the itemsets by comparing the number of times each of the large itemsets appears in the database to the number of times particular subsets of the itemset appear in the database. When the relationship exceeds a predetermined minimum confidence value, the system outputs a generalized association rule which is representative of purchasing tendencies of consumers. The set of generalized association rules can be pruned of uninteresting rules, i.e., association rules which do not occur at a frequency that is significantly different than what is expected based upon the frequency of occurrence of the rule'"'"'s ancestors.
-
Citations
24 Claims
-
1. A computer program device comprising:
-
a computer program storage device readable by a digital processing apparatus; and a program means on the program storage device and including instructions executable by the digital processing apparatus for performing method steps for identifying association rules in itemsets with a hierarchical taxonomy on items of the itemsets, the taxonomy defining descendant and ancestor relationships between the items, the method steps comprising; entering an itemset into a set of large itemsets when the number of times the itemset is present in a database of transactions establishes a support value that exceeds a predefined minimum support value; for at least some of the itemsets in the set of large itemsets, determining the number of times selected subsets of the itemsets appear in transactions in the database; and (d) outputting an association rule when the number of times a selected subset appears in the database bears a predetermined relationship to the number of times the associated itemset appears in the database and thereby satisfies a minimum confidence constraint. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A computer program product for use with a computer system, a central processing unit and means coupled to the central processing unit for storing a database to identify association rules in itemsets of transactions which are stored in the database, the itemsets being characterized by items in a hierarchical taxonomy, comprising:
a data storage device including a computer usable medium having computer readable program means for identifying association rules in itemsets having items anywhere in the hierarchy of the taxonomy, the computer usable code means having; computer readable code means for accessing an itemset; computer readable code means for entering the itemset into a set of large itemsets when the number of times the itemset is present in the database exceeds a predefined minimum support value; computer readable code means for determining for at least some of the itemsets in the set of large itemsets, the number of times selected subsets of the itemsets appear in transactions in the database; and computer readable code means for outputting an association rule when the number of times a selected subset appears in the database bears a predetermined minimum confidence relationship to the number of times the associated itemset appears in the database, thereby satisfying a minimum confidence constraint. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
20. A program storage device readable by a digital processing apparatus and tangibly embodying a program of instructions executable by the digital processing apparatus to perform method steps for identifying generalized association rules of itemsets in transactions stored in a database and having more than one item, the items being characterized by a taxonomic structure, so as to discover customer purchasing tendencies, the method steps comprising:
-
identifying as large itemsets those itemsets having items located anywhere in the taxonomic structure and recurring with at least a user-defined minimum support; discovering association rules between the large itemsets and subsets thereof when the subset recurrence bears a predetermined relationship to itemset recurrence; and outputting the association rules as representative of customer purchasing tendencies. - View Dependent Claims (21, 22)
-
-
23. A database mining system for discovering association rules in itemsets having items stored in a taxonomically structured database, comprising:
-
a large itemset generator for generating large itemsets when the itemsets have a support in a transaction database at least equal to a user-defined minimum support value; an association rule generator for receiving the large itemsets and outputting an association rule when an itemset bears a confidence relationship to at least one of its subsets equal to or greater than a predetermined confidence relationship; and a rule pruner for identifying an association rule as interesting when the support and the confidence relationship respectively exceed an expected support and an expected confidence relationship by a preselected factor.
-
-
24. A computer-based system for discovering purchasing tendencies of consumers by identifying association rules between itemsets of transactions and subsets of the itemsets, the subsets including one or more items, comprising:
-
a multi-level taxonomic structure accessible by the computer for storing the items in a hierarchical relationship; a large itemset generator accessing the taxonomic structure and the transactions for determining a first number of times an itemset appears in the transactions and for designating the itemset as a large itemset when the first number of times exceeds a minimum support value; and an association rule discoverer accessing the large itemset generator for determining a second number of times at least one subset of an itemset appears in the transactions, the association rule discoverer outputting an association rule representative of purchasing tendencies of consumers when the first number of times bears a predetermined minimum confidence relationship to the second number of times.
-
Specification