Method and apparatus for generating weighted association rules
Abstract
The present invention discloses a data mining method and apparatus that assigns weight values to items and/or transactions based on the value to the user, thereby resulting in association rules of greater importance. A conservative method, aggressive method, or a combination of the two can be used when generating supersets.
87 Citations
20 Claims
1. In a data mining system, a method for identifying the presence of selected items and transactions contained in a plurality of records collectively stored in an electronic database, wherein said method comprises:
assigning preselected value weights to items and transactions;
reading each record in the electronic database in a substantially sequential flow;
counting the number of times each item appears throughout the plurality of records;
for each item counted, comparing a fraction of the cumulative weight of the records that include such item divided by the cumulative weight of all items in all records (weighted support), to a preselected support threshold;
generating sets of items including at least some of the items having said weighted support exceeding said preselected support threshold;
reading said records having a set of items whose weighted support exceeds said preselected support threshold;
counting the number of times each of said set of items appears throughout the plurality of records;
for each generated set of items, comparing the weighted support of said generated set of items to said preselected support threshold;
repeating the steps of generating new sets of items, reading records and comparing the weighted support of said generated set of items to said preselected support threshold until no new sets of items exceeding said preselected support threshold can be detected.
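The level-wise procedure recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each record already carries a single numeric weight, and it takes weighted support as the summed weight of the records containing an itemset divided by the summed weight of all records, which simplifies the claim's wording. The names `weighted_counts` and `frequent_weighted_itemsets`, and the sample data, are hypothetical.

```python
from itertools import combinations


def weighted_counts(records, candidates):
    """One sequential pass: add each record's weight to every candidate it contains."""
    counters = {c: 0.0 for c in candidates}
    for items, weight in records:
        for c in candidates:
            if c <= items:
                counters[c] += weight
    return counters


def frequent_weighted_itemsets(records, threshold):
    """Return {itemset: weighted support} for itemsets whose support exceeds threshold.

    records is a list of (frozenset_of_items, record_weight) pairs (an assumed layout).
    """
    total = sum(weight for _, weight in records)
    all_items = set().union(*(items for items, _ in records))
    candidates = [frozenset([i]) for i in sorted(all_items)]
    active = list(records)  # records still worth re-reading
    result = {}
    while candidates:
        counts = weighted_counts(active, candidates)
        passed = {c: counts[c] / total for c in candidates
                  if counts[c] / total > threshold}
        result.update(passed)
        # Later passes read only records that contain a passing itemset.
        active = [r for r in active if any(c <= r[0] for c in passed)]
        # Generate supersets one item larger by joining passing itemsets.
        size = len(candidates[0]) + 1
        supersets = {a | b for a, b in combinations(passed, 2)
                     if len(a | b) == size}
        candidates = sorted(supersets, key=sorted)
    return result


records = [
    (frozenset({"bread", "milk"}), 2.0),
    (frozenset({"bread", "beer"}), 1.0),
    (frozenset({"milk", "beer"}), 1.0),
    (frozenset({"bread", "milk", "beer"}), 4.0),
]
result = frequent_weighted_itemsets(records, threshold=0.7)
```

With this sample data the three single items and the pair {bread, milk} exceed the 0.7 threshold; the loop stops once no new supersets can be generated.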
4. In a data mining system, the method according to claim 1 comprising the assignment of weights to items, further comprises scaling said weighted support of a set of items by a ratio of the cumulative weight of the items in the itemset, divided by the cumulative weight of the items in the largest weight of the generated sets of items having at least some of the items of a weighted support exceeding said preselected support threshold.
5. In a data mining system, the method according to claim 4 comprising the assignment of weights to items, wherein said scaling of said weighted support is applied to generated sets of items up to a preselected number of items per set.
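One possible reading of the scaling in claims 4 and 5 is sketched below in Python. It assumes that "the largest weight of the generated sets of items" means the heaviest generated itemset, measured as the sum of its item weights, and that the size limit of claim 5 simply leaves larger itemsets unscaled. The function name, parameters, and sample weights are hypothetical.

```python
def scaled_weighted_support(itemset, support, item_weights,
                            generated_itemsets, max_size=None):
    """Scale support by weight(itemset) / weight(heaviest generated itemset).

    If max_size is given, itemsets with more items than max_size are
    returned unscaled (an assumed reading of the claim 5 size limit).
    """
    if max_size is not None and len(itemset) > max_size:
        return support

    def cumulative_weight(items):
        return sum(item_weights[i] for i in items)

    heaviest = max(cumulative_weight(s) for s in generated_itemsets)
    return support * cumulative_weight(itemset) / heaviest


item_weights = {"bread": 1.0, "milk": 2.0, "caviar": 5.0}
generated = [frozenset({"bread", "milk"}), frozenset({"milk", "caviar"})]

scaled = scaled_weighted_support(
    frozenset({"bread", "milk"}), 0.75, item_weights, generated)
unscaled = scaled_weighted_support(
    frozenset({"bread", "milk"}), 0.75, item_weights, generated, max_size=1)
```

Here the itemset {bread, milk} has cumulative weight 3 and the heaviest generated itemset has weight 7, so its support of 0.75 is scaled by 3/7; with `max_size=1` the two-item set is left unchanged.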
6. In a data mining system, the method according to claim 1 comprising the assignment of weights to items, further comprises scaling said weighted support of a set of items by a ratio of the cumulative weight of the items in the itemset, divided by the cumulative weight of the items in the largest weight of the generated sets of items having at least some of the items of a weighted support exceeding said preselected support threshold and which includes additional items of lower weights.
7. In a data mining system, the method according to claim 6 comprising the assignment of weights to items, wherein said scaling of said weighted support is applied to generated sets of items up to a preselected number of items per set.
8. In a data mining system, the method according to claim 1 comprising the assignment of weights to items, further comprises scaling the weighted support of a set of items for all sets below a preselected item size by a ratio of the cumulative weight of the items in the set of items, divided by the cumulative weight of the items in the largest weight of the generated sets of items, and scaling the weighted support of a set of items for all sets of items above the preselected item size by a ratio of the cumulative weight of the items in the set of items, divided by the cumulative weight of the items in the largest weight of a direct generated set of items up to a preselected item size.
9. In a data mining system, the method according to claim 1 comprising the assignment of weights to items, further comprises storing said value weights in an electronic storage means.
10. A method of searching a collection of data records to detect records having sets of items (itemsets) and/or selected transactions, to form association rules corresponding to the itemsets detected based on weighted values assigned to the items and transactions, said method comprising the steps of:
a. reading data records in a seriatim manner;
b. assigning said weights to items and transactions;
c. incrementing a separate weight counter for each itemset and for each selected transaction detected in a record;
d. comparing the weighted support of an itemset and a selected transaction to a preselected support threshold;
e. generating new supersets from itemsets having a weighted support greater than the preselected support threshold;
f. reading the records identified as containing itemsets with a weighted support greater than the preselected support threshold;
g. incrementing a separate weight counter for each superset detected in a record;
h. comparing the weighted support of each superset to the preselected support threshold;
i. repeating steps a through h until every itemset has been counted; and
j. creating association rules from the itemsets and selected transactions in steps a through i.
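Step j can be illustrated with a short Python sketch that turns a table of weighted supports into rules. The claim does not say how rules are scored; the sketch assumes the conventional confidence measure, support(X ∪ Y) / support(X), computed on weighted supports, and a hypothetical `min_confidence` cutoff. The function name and the sample supports are also assumptions.

```python
from itertools import combinations


def rules_from_itemsets(supports, min_confidence):
    """Build (antecedent, consequent, confidence) rules from weighted supports.

    supports maps frozenset itemsets to their weighted support.
    Confidence is the assumed standard ratio support(X | Y) / support(X).
    """
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                if antecedent not in supports:
                    continue
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


supports = {
    frozenset({"bread"}): 0.875,
    frozenset({"milk"}): 0.875,
    frozenset({"beer"}): 0.75,
    frozenset({"bread", "milk"}): 0.75,
}
rules = rules_from_itemsets(supports, min_confidence=0.8)
```

With these sample supports the sketch yields two rules, bread → milk and milk → bread, each with confidence 0.75 / 0.875.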
14. A programmable general purpose computer apparatus for searching a file of records collectively stored in an electronic database, wherein said records contain at least one item, the search determining sets of items (itemsets) and searching for the generated itemsets among the records based on pre-selected weighted values assigned to the items and/or records, said apparatus comprising:
a processor means for performing decision making, control operations and data manipulation;
an array of memory storage means having address inputs and data inputs and outputs, for storing said records within said memory storage means during the search;
an address generation means having address outputs coupled to the address inputs of said memory storage means, for generating addresses to access different locations within said memory storage means; and
an interface means having address inputs connected to the address outputs of said address generation unit.
17. A method of searching a collection of data records to detect records having sets of items (itemsets) and/or selected transactions, to form association rules corresponding to the itemsets detected based on weighted values assigned to the items and transactions, said method comprising the steps of:
a) assigning said weights to items and transactions;
b) reading data records in a seriatim manner;
c) incrementing a separate weight counter for each itemset and a selected transaction detected in a record;
d) comparing the weighted support of an itemset and for each selected transaction to a preselected support threshold;
e) generating new supersets from itemsets having a weighted support greater than the preselected support threshold;
f) reading the records identified as containing itemsets with a weighted support greater than the preselected support threshold;
g) incrementing a separate weight counter for each superset detected in a record;
h) comparing the weighted support of each superset to the preselected support threshold;
i) repeating steps a through h until every itemset has been counted; and
j) creating association rules from the itemsets and selected transactions in steps a through i.
Specification