Frequent itemset counting using clustered prefixes and index support
First Claim
1. A method for performing a frequent itemset operation, the method comprising the steps of:
- performing the frequent itemset operation in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items;
during at least one phase of the plurality of phases, performing the steps of determining candidate combinations that are to be evaluated during the phase;
grouping the candidate combinations into clusters, wherein each cluster corresponds to a common combination of items, and wherein all candidate combinations in a given cluster include the common combination of items associated with the cluster;
processing said candidate combinations, based on said clusters, to determine whether the candidate combinations satisfy a frequency criteria associated with said frequent itemset operation; and
storing, in a computer-readable medium, data that indicates which candidate combinations satisfy the frequency criteria associated with said frequent itemset operation.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques are provided for (1) extending SQL to support direct invocation of frequent itemset operations, (2) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (3) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change the occurrence counting technique that it is using midstream, based on cost considerations and/or environmental conditions.
-
Citations
14 Claims
-
1. A method for performing a frequent itemset operation, the method comprising the steps of:
-
performing the frequent itemset operation in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items; during at least one phase of the plurality of phases, performing the steps of determining candidate combinations that are to be evaluated during the phase; grouping the candidate combinations into clusters, wherein each cluster corresponds to a common combination of items, and wherein all candidate combinations in a given cluster include the common combination of items associated with the cluster; processing said candidate combinations, based on said clusters, to determine whether the candidate combinations satisfy a frequency criteria associated with said frequent itemset operation; and storing, in a computer-readable medium, data that indicates which candidate combinations satisfy the frequency criteria associated with said frequent itemset operation. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
-
Specification