FREQUENT ITEMSET COUNTING USING CLUSTERED PREFIXES AND INDEX SUPPORT
First Claim
1. A method for performing a frequent itemset operation, the method comprising the steps of:
- performing the frequent itemset operation in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items;
during at least one phase of the plurality of phases, performing the steps of;
determining candidate combinations that are to be evaluated during said at least one phase;
processing said candidate combinations to determine whether said candidate combinations satisfy a frequency criteria associated with said frequent itemset operation, wherein the step of processing the candidate combinations includes generating bitmaps for said candidate combinations; and
storing a set of bitmaps that are generated during said at least one phase;
during a subsequent phase of said plurality of phases, performing the steps of;
retrieving bitmaps, from said set of bitmaps, from storage; and
using the retrieved bitmaps to generate bitmaps for candidate combinations of said subsequent phase;
wherein the steps of the method are performed by one or more computing devices.
0 Assignments
0 Petitions
Accused Products
Abstract
Techniques are provided for (2) extending SQL to support direct invocation of frequent itemset operations, (3) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (4) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change the occurrence counting technique that it is using midstream, based on cost considerations and/or environmental conditions.
31 Citations
22 Claims
-
1. A method for performing a frequent itemset operation, the method comprising the steps of:
-
performing the frequent itemset operation in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items; during at least one phase of the plurality of phases, performing the steps of; determining candidate combinations that are to be evaluated during said at least one phase; processing said candidate combinations to determine whether said candidate combinations satisfy a frequency criteria associated with said frequent itemset operation, wherein the step of processing the candidate combinations includes generating bitmaps for said candidate combinations; and storing a set of bitmaps that are generated during said at least one phase; during a subsequent phase of said plurality of phases, performing the steps of; retrieving bitmaps, from said set of bitmaps, from storage; and using the retrieved bitmaps to generate bitmaps for candidate combinations of said subsequent phase; wherein the steps of the method are performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A computer-readable storage medium storing one or more sequences of instructions comprising instructions which, when executed by one or more processors, cause:
-
performing a frequent itemset operation in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items; during at least one phase of the plurality of phases, performing instructions that cause; determining candidate combinations that are to be evaluated during said at least one phase; processing said candidate combinations to determine whether said candidate combinations satisfy a frequency criteria associated with said frequent itemset operation, wherein processing the candidate combinations includes generating bitmaps for said candidate combinations; and storing a set of bitmaps that are generated during said at least one phase; during a subsequent phase of said plurality of phases, performing instructions that cause; retrieving bitmaps, from said set of bitmaps, from storage; and using the retrieved bitmaps to generate bitmaps for candidate combinations of said subsequent phase. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
Specification