Methods and system for mining frequent patterns
First Claim
Patent Images
1. A method for identifying patterns from a database of records, each record having a plurality of items, the method comprising:
- constructing an FP-tree for the database; and
, mining the FP-tree to obtain frequent patterns wherein constructing the FP-tree comprises;
scanning the database to obtain an ordered list of frequent items in the database;
for each record in the database;
creating a list of any frequent items occurring in that record in the same order as the frequent items occur in the ordered list;
setting a root node of the FP-tree as a current node; and
, for each item in the list of any frequent items, determining whether there is a nods directly linked to the current node which corresponds to the item, if so, incrementing a counter for the node and setting the node as the current node; and
, if not, creating a node corresponding to the item and linked to the current node and setting the created node as the current node.
2 Assignments
0 Petitions
Accused Products
Abstract
This invention provides methods apparatus and data structures useful for mining databases for frequent items. The invention uses a frequent pattern tree to represent the contents of a database in a manner which is conducive to data mining. The frequent pattern tree tends to be smaller than the original database. A frequent pattern tree can be mined recursively. The frequent pattern tree and associated methods and apparatus of this invention is relatively fast, efficient and scalable and can be used to mine both long and short frequent patterns.
-
Citations
34 Claims
-
1. A method for identifying patterns from a database of records, each record having a plurality of items, the method comprising:
-
constructing an FP-tree for the database; and
,mining the FP-tree to obtain frequent patterns wherein constructing the FP-tree comprises;
scanning the database to obtain an ordered list of frequent items in the database;
for each record in the database;
creating a list of any frequent items occurring in that record in the same order as the frequent items occur in the ordered list;
setting a root node of the FP-tree as a current node; and
,for each item in the list of any frequent items, determining whether there is a nods directly linked to the current node which corresponds to the item, if so, incrementing a counter for the node and setting the node as the current node; and
,if not, creating a node corresponding to the item and linked to the current node and setting the created node as the current node. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 33)
a) for each frequent item constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base;
b) recursively constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base on each newly created conditional FP-tree until the resulting FP-tree is empty; and
,c) after creating each FP-tree, collecting frequent itemsets from the FP-tree.
-
-
7. The method of claim 6 comprising, determining whether a conditional FP-tree contains only one path and, of so, generating all combinations of sub-paths of the FP-tree and recording each sub-paths as a frequent pattern.
-
8. The method of claim 7 comprising counting the occurrences of each frequent itemset.
-
9. The method of claim 6 wherein collecting frequent itemsets from each FP-tree comprises:
if the FP-tree contains a single path, P, then, for each combination, β
, of the nodes in path P generating the pattern β
∪
α
.
-
10. The method of claim 9 comprising assigning to the pattern a support equal to the minimum support of the nodes in path P.
-
11. The method of claim 9 comprising, for each item ai in the header data structure generating a pattern β
- =ai ∪
α
;constructing a conditional pattern base for β and
constructing a conditional FP-tree based on the conditional pattern base; and
,if the conditional FP-tree is not empty, collecting frequent itemsets from the conditional FP-tree.
- =ai ∪
-
33. A program product comprising a medium carrying a set of computer-readable signals containing computer-executable instructions which, when run by a computer, cause the computer to execute the method of claim 1.
-
12. A method for identifying patterns from a database of records, each record having a plurality of items, the method comprising:
-
constructing an FP-tree for the database; and
,mining the FP-tree to obtain frequent patterns wherein mining the FP-tree to obtain frequent patterns comprises;
a) for each frequent item constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base;
b) recursively constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base on each newly created conditional FP-tree until the resulting FP-tree is empty; and
,c) after creating each FP-tree, collecting frequent itemsets from the FP-tree. - View Dependent Claims (13, 14, 15, 16, 17)
if the FP-tree contains a single path, P, then, for each combination, β
, of the nodes in path P generating the pattern β
∪
α
.
-
-
16. The method of claim 15 comprising assigning to the pattern a support equal to the minimum support of the nodes in path P.
-
17. The method of claim 15 comprising, for each item ai in the header data structure generating a pattern β
- =ai ∪
α
;constructing a conditional pattern base for β and
constructing a conditional FP-tree based on the conditional pattern base; and
,if the conditional FP-tree is not empty, collecting frequent itemsets from the conditional FP-tree.
- =ai ∪
-
18. A method for identifying patterns from a database of records, each record having a plurality of items, the method comprising providing an FP-tree corresponding to the database and mining the FP-tree to obtain frequent patterns by:
-
a) for each frequent item constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base;
b) recursively constructing a conditional pattern-base, and constructing a conditional FP-tree from the conditional pattern-base on each newly created conditional FP-tree until the resulting FP-tree is empty; and
,c) after creating each FP-tree, collecting frequent itemsets from the FP-tree. - View Dependent Claims (19, 20, 21, 22, 23)
if the FP-tree contains a single path, P, then, for each combination, β
, of the nodes in path P generating the pattern β
∪
α
.
-
-
22. The method of claim 21 comprising assigning to the pattern a support equal to the minimum support of the nodes in path P.
-
23. The method of claim 21 comprising, for each item ai in the header data structure generating a pattern β
- =ai ∪
α
;constructing a conditional pattern base for β and
constructing a conditional FP-tree based on the conditional pattern base; and
,if the conditional FP-tree is not empty, collecting frequent itemsets from the conditional FP-tree.
- =ai ∪
-
24. A method for constructing an FP-tree corresponding to a database and containing information useful for identifying frequent patterns in the database, the method comprising:
-
scanning the database to obtain an ordered list of frequent items in the database;
for each record in the database;
creating a list of any frequent items occurring in that record in the same order as the frequent items occur in the ordered list;
setting a root node of the FP-tree as a current node; and
,for each item in the list of any frequent items, determining whether there is a node directly linked to the current node which corresponds to the item, if so, incrementing a counter for the node and setting the node as the current node; and
,if not, creating a node corresponding to the item and linked to the current node and setting the created node as the current node. - View Dependent Claims (25, 26, 27, 28)
-
-
29. An FP-tree data structure for use in mining frequent patterns from a database containing a plurality of records, the FP-tree data structure resident in a storage device accessible to a computer and comprising
a plurality of linked nodes and a header structure; -
the header structure comprising an ordered list of frequent items from a database and a pointer to at least one node associated with each of the frequent items;
each of the linked nodes associated with one of the frequent items of the header structure, the nodes linked to form a plurality of paths, the nodes on each of the paths corresponding to frequent items present in a record of the database;
wherein nodes associated with a selected one of the frequent items of the header structure are accessible by traversing the nodes beginning at the pointer in the header structure corresponding to the selected one of the frequent items, each of the nodes comprises a pointer capable of identifying another one of the nodes associated with the same one of the frequent items and traversing the nodes comprises sequentially following the pointers in the nodes discovered by beginning at the pointer in the header structure corresponding to the selected one of the frequent items. - View Dependent Claims (30, 31)
-
-
32. Apparatus for mining frequent patterns from information in a database comprising a plurality of records, the apparatus comprising:
-
a) a computer processor;
b) a database having records accessible to the computer processor;
c) a program store accessible to the computer processor;
d) a data store accessible to the computer processor;
e) software instructions recorded in the program store, the software instructions, executable by the computer processor, the software instructions, when executed, causing the computer processor to;
i) scan the database to obtain a list of frequent items in the database;
ii) create an ordered list of the frequent items ordered in order of frequency in the database; and
,iii) based on the ordered list of the frequent items and information from the database, create an FP-tree data structure corresponding to the database in the data store wherein a set of the software instructions which cause the processor to create an FP-tree data structure in the data store cause the processor to retrieve each record in the database;
select any frequent items occurring in that record in the same order as the frequent items occur in the ordered list;
set a root node of the FP-tree as a current node; and
,for each selected frequent item, determine whether the FP-tree includes a node directly linked to the current node which corresponds to the item, if so, increment a counter for the node and set the node as the current node; and
,if not, create a new node corresponding to the item and linked to the current node and setting the new node as the current node.
-
-
34. A program product comprising a medium carrying a set of computer-readable signals containing computer-executable instructions which, when run by a computer, cause the computer to scan a database to obtain an ordered list of frequent items in the database;
- and, to create an FP-tree in a storage medium accessible to the computer by;
for each record in the database;
creating a list of any frequent items occurring in that record in the same order as the frequent items occur in the ordered list;
setting a root node of the FP-tree as a current node; and
,for each item in the list of any frequent items, determining whether there is a node directly linked to the current node which corresponds to the item, if so, incrementing a counter for the node and setting the node as the current node; and
,if not, creating a node corresponding to the item and linked to the current node and setting the created node as the current node.
- and, to create an FP-tree in a storage medium accessible to the computer by;
Specification