Method and apparatus for dynamically counting large itemsets
First Claim
Patent Images
1. A method for selectively retrieving records that contain recognizable items from a plurality of records collectively stored seriatim in an electronic database, wherein said method comprises:
- reading each record in the electronic database in a substantially sequential flow;
detecting each record that contains a first recognized item;
incrementing a first register to keep track of the records identified as containing said first recognized item;
continuously comparing the number of records in the first register to a preset value to determine if a first threshold has been reached;
reading each record that has been previously identified as containing a first recognized item, wherein the reading of said records containing a first recognized item begins at any record that has been previously identified as containing a first recognized item, and continues in a substantially sequential flow;
detecting in said records containing a first recognized item those records that also contain a second recognized item;
incrementing a second register to keep track of the records identified as containing a set having both of said first and said second recognized items;
continuously comparing the number of records in the second register to a preset value to determine if a second threshold has been reached; and
repeating the above recited steps for sets having a plurality of recognized items until all recognized sets of items having a record count exceeding preset thresholds are detected.
1 Assignment
0 Petitions
Accused Products
Abstract
The present invention is directed to a data mining method and apparatus that dynamically initiates the counting of sets of items (itemsets) at any time during the pass over the records of a database and terminates the counting at the same location in the next pass. In this manner, the present invention begins to count itemsets early and finishes counting early while keeping the number of different itemsets which are being counted in any pass relatively low.
-
Citations
20 Claims
-
1. A method for selectively retrieving records that contain recognizable items from a plurality of records collectively stored seriatim in an electronic database, wherein said method comprises:
-
reading each record in the electronic database in a substantially sequential flow;
detecting each record that contains a first recognized item;
incrementing a first register to keep track of the records identified as containing said first recognized item;
continuously comparing the number of records in the first register to a preset value to determine if a first threshold has been reached;
reading each record that has been previously identified as containing a first recognized item, wherein the reading of said records containing a first recognized item begins at any record that has been previously identified as containing a first recognized item, and continues in a substantially sequential flow;
detecting in said records containing a first recognized item those records that also contain a second recognized item;
incrementing a second register to keep track of the records identified as containing a set having both of said first and said second recognized items;
continuously comparing the number of records in the second register to a preset value to determine if a second threshold has been reached; and
repeating the above recited steps for sets having a plurality of recognized items until all recognized sets of items having a record count exceeding preset thresholds are detected. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method of dynamically searching a collection of data records to detect records having sets of items (itemsets) and to form association rules corresponding to the itemsets detected, said method comprising the steps of:
-
a. reading data records in a seriatim manner;
b. incrementing a separate counter for each itemset detected in a record;
c. reading records that have already been detected as containing an itemset in step a, starting with any record having a first itemset and continuing in a sequential manner through records identified as having a first item set;
d. generating new itemsets;
e. incrementing separate counters for each itemset detected in a record;
f. returning to the beginning of any unread records after reading the last record in the collection of data records;
g. repeating steps a through f until every itemset has been counted; and
h. creating association rules from the itemsets counted in steps a through g. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17)
-
-
18. A programmable general purpose computer apparatus for dynamically searching a file of records collectively stored in an electronic database, wherein said records contain at least one item, the search dynamically determining sets of items (itemsets) and searching for the generated itemsets among the records by starting at any record and reading the records in a sequential flow until all records before and after the initial record in the file are read, said apparatus comprising:
-
a processor means for performing decision making, control operations and data manipulation;
an array of memory storage means having address inputs and data inputs and outputs, for storing said records in different locations within said memory storage means during the search;
an address generation means having address outputs coupled to the address inputs of said memory storage means, for generating addresses to access different locations within said memory storage means; and
an interface means having address inputs connected to the address outputs of said address generation unit. - View Dependent Claims (19, 20)
-
Specification