Method and apparatus for finding maximal frequent itemsets over data streams
First Claim
1. A method for finding maximal frequent itemsets over data streams including continuously generated transactions, the method comprising:
- when itemsets included in previously generated transactions and appearance frequencies of the itemsets are managed by a prefix tree and each of nodes of the prefix tree has information about an appearance frequency of an itemset corresponding to each node, a maximum lifetime which is a maximum point in time that allows the corresponding itemset to remain in a frequent state even if the corresponding itemset does not appear later, and a mark indicating whether the corresponding itemset is a maximal frequent itemset,(a) receiving transaction Tk generated at a current point in time;
(b) updating the information owned by each node corresponding to the itemset of the transaction Tk among the nodes of the prefix tree;
(c) adding each node, which is not managed in the prefix tree among nodes corresponding to the itemset of the transaction Tk, to the prefix tree, and setting the information on the added nodes; and
(d) finding maximal frequent itemsets by visiting each node of the prefix tree which has the mark indicating the maximal frequent itemset, and checking whether the corresponding itemset is frequent;
wherein the step (b) allows the mark owned by the node corresponding to the maximal frequent itemset, which belongs to subsets of the itemset of the transaction Tk, among the nodes of the prefix tree, to indicate the maximal frequent itemset only when a maximum error generated in a process of estimating a support of the maximal frequent itemset is within a maximum error threshold, andwherein the node corresponding to the maximal frequent itemset is a node whose maximum lifetime is updated from before the current point in time k to after the current point in time k or a node not having children nodes corresponding to the itemset of the transaction Tk, whose support according to the appearance frequency is equal to or more than a minimum support having a predetermined value.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus to find maximal frequent itemsets over data streams. A prefix tree manages itemsets and appearance frequencies of the itemsets, and each of nodes of the prefix tree has information about an appearance frequency, a maximum lifetime, and a mark indicating whether the corresponding itemset is a maximal frequent itemset. The method includes: receiving transaction Tk generated at a current point in time; updating the information owned by each node corresponding to the itemset of the transaction Tk among the nodes of the prefix tree; adding each node that is not managed in the prefix tree among nodes corresponding to the itemset of the transaction Tk, to the prefix tree and setting the information on the added nodes; and finding maximal frequent itemsets by visiting each node of the prefix tree that has the mark indicating the maximal frequent itemset and checking whether the corresponding itemset is frequent.
-
Citations
13 Claims
-
1. A method for finding maximal frequent itemsets over data streams including continuously generated transactions, the method comprising:
-
when itemsets included in previously generated transactions and appearance frequencies of the itemsets are managed by a prefix tree and each of nodes of the prefix tree has information about an appearance frequency of an itemset corresponding to each node, a maximum lifetime which is a maximum point in time that allows the corresponding itemset to remain in a frequent state even if the corresponding itemset does not appear later, and a mark indicating whether the corresponding itemset is a maximal frequent itemset, (a) receiving transaction Tk generated at a current point in time; (b) updating the information owned by each node corresponding to the itemset of the transaction Tk among the nodes of the prefix tree; (c) adding each node, which is not managed in the prefix tree among nodes corresponding to the itemset of the transaction Tk, to the prefix tree, and setting the information on the added nodes; and (d) finding maximal frequent itemsets by visiting each node of the prefix tree which has the mark indicating the maximal frequent itemset, and checking whether the corresponding itemset is frequent; wherein the step (b) allows the mark owned by the node corresponding to the maximal frequent itemset, which belongs to subsets of the itemset of the transaction Tk, among the nodes of the prefix tree, to indicate the maximal frequent itemset only when a maximum error generated in a process of estimating a support of the maximal frequent itemset is within a maximum error threshold, and wherein the node corresponding to the maximal frequent itemset is a node whose maximum lifetime is updated from before the current point in time k to after the current point in time k or a node not having children nodes corresponding to the itemset of the transaction Tk, whose support according to the appearance frequency is equal to or more than a minimum support having a predetermined value. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A non-transitory recording medium readable by a computer recording a program for running a method for finding maximal frequent itemsets over data streams, the method comprising:
-
when itemsets included in previously generated transactions and appearance frequencies of the itemsets are managed by a prefix tree and each of nodes of the prefix tree has information about an appearance frequency of an itemset corresponding to each node, a maximum lifetime which is a maximum point in time that allows the corresponding itemset to remain in a frequent state even if the corresponding itemset does not appear later, and a mark indicating whether the corresponding itemset is a maximal frequent itemset, (a) receiving transaction Tk generated at a current point in time; (b) updating the information owned by each node corresponding to the itemset of the transaction Tk among the nodes of the prefix tree; (c) adding each node, which is not managed in the prefix tree among nodes corresponding to the itemset of the transaction Tk, to the prefix tree, and setting the information on the added nodes; and (d) finding maximal frequent itemsets by visiting each node of the prefix tree which has the mark indicating the maximal frequent itemset, and checking whether the corresponding itemset is frequent, wherein the step (b) allows the mark owned by the node corresponding to the maximal frequent itemset, which belongs to subsets of the itemset of the transaction Tk, among the nodes of the prefix tree, to indicate the maximal frequent itemset only when a maximum error generated in a process of estimating a support of the maximal frequent itemset is within a maximum error threshold, and wherein the node corresponding to the maximal frequent itemset is a node whose maximum lifetime is updated from before the current point in time k to after the current point in time k or a node not having children nodes corresponding to the itemset of the transaction Tk, whose support according to the appearance frequency is equal to or more than a minimum support having a predetermined value.
-
-
8. An apparatus for finding maximal frequent itemsets over data streams including continuously generated transactions including:
-
a memory for storing a prefix tree which manages itemsets included in previously generated transactions and appearance frequencies of the itemsets, wherein each of nodes of the prefix tree has information about an appearance frequency of an itemset corresponding to each node, a maximum lifetime which is a maximum point in time that allows the corresponding itemset to remain in a frequent state even if the corresponding itemset does not appear later, and a mark indicating whether the corresponding itemset is a maximal frequent itemset; an information updating unit configured to update the information owned by each node corresponding to an itemset of a transaction Tk generated at a current point in time among the nodes of the prefix tree; an information setting unit configured to add each node which is not managed in the prefix tree among nodes corresponding to the itemset of the transaction Tk, to the prefix tree, and to set the information on the added nodes; and a maximal frequent itemsets finding unit configured to find maximal frequent itemsets by visiting each node of the prefix tree which has the mark indicating the maximal frequent itemset and to check whether the corresponding itemset is frequent, wherein the information updating unit allows the mark owned by the node corresponding to the maximal frequent itemset, which belongs to subsets of the itemset of the transaction Tk, among the nodes of the prefix tree, to indicate the maximal frequent itemset only when a maximum error generated in a process of estimating a support of the maximal frequent itemset is within a maximum error threshold, and wherein the node corresponding to the maximal frequent itemset is a node whose maximum lifetime is updated from before the current point in time k to after the current point in time k or a node not having children nodes corresponding to the itemset of the transaction Tk, whose support according to the appearance frequency is equal to or more than a minimum support having a predetermined value. - View Dependent Claims (9, 10, 11, 12, 13)
-
Specification