Dynamic selection of frequent itemset counting technique
First Claim
1. A machine-implemented method comprising the steps of:
- dynamically selecting which occurrence counting technique to use from a plurality of available occurrence counting techniques by performing the steps of;
generating cost estimates for each of the plurality of available occurrence counting techniques based on an estimated I/O cost of using the available occurrence counting technique,wherein generating cost estimates comprises performing;
determining a size of a candidate prefix tree;
determining an amount of memory that can be used for the candidate prefix tree;
comparing the size of the candidate prefix tree to the amount of memory that can be used to store the candidate prefix tree; and
generating an I/O cost estimate for a prefix tree technique based, at least in part, on the size of the candidate prefix tree and the amount of memory that can be used to store the candidate prefix tree;
selecting the occurrence counting technique that has the lowest cost estimate; and
during a frequent itemset operation, using said selected occurrence counting technique to count occurrences of at least one combination to determine whether said at least one combination satisfies frequency criteria associated with said frequent itemset operation;
wherein the steps of dynamically selecting and using said selected occurrence counting technique are performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
Techniques are provided for (1) extending SQL to support direct invocation of frequent itemset operations, (2) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (3) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change the occurrence counting technique that it is using midstream, based on cost considerations and/or environmental conditions.
30 Citations
18 Claims
-
1. A machine-implemented method comprising the steps of:
-
dynamically selecting which occurrence counting technique to use from a plurality of available occurrence counting techniques by performing the steps of; generating cost estimates for each of the plurality of available occurrence counting techniques based on an estimated I/O cost of using the available occurrence counting technique, wherein generating cost estimates comprises performing; determining a size of a candidate prefix tree; determining an amount of memory that can be used for the candidate prefix tree; comparing the size of the candidate prefix tree to the amount of memory that can be used to store the candidate prefix tree; and generating an I/O cost estimate for a prefix tree technique based, at least in part, on the size of the candidate prefix tree and the amount of memory that can be used to store the candidate prefix tree; selecting the occurrence counting technique that has the lowest cost estimate; and during a frequent itemset operation, using said selected occurrence counting technique to count occurrences of at least one combination to determine whether said at least one combination satisfies frequency criteria associated with said frequent itemset operation; wherein the steps of dynamically selecting and using said selected occurrence counting technique are performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A machine-implemented method comprising the steps of:
-
dynamically selecting which occurrence counting technique to use from a plurality of available occurrence counting techniques based on conditions existing before a frequent itemset operation is performed in a computing environment in which the frequent itemset operation is to be performed, wherein the conditions include how busy a computer system in which the frequent itemset operation is to be performed currently is; and during said frequent itemset operation, using said selected occurrence counting technique to count occurrences of at least one combination to determine whether said at least one combination satisfies frequency criteria associated with said frequent itemset operation; wherein the steps of dynamically selecting and using said selected occurrence counting technique are performed by one or more computing devices; wherein; the frequent itemset operation is performed in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items; the step of dynamically selecting includes dynamically selecting which occurrence counting technique to use for at least one phase of said plurality of phases; and the step of using includes using said selected occurrence counting technique to determine whether candidate combinations for said at least one phase satisfy said frequency criteria; said at least one phase is a phase during which combinations having N items are processed; a first occurrence counting technique is selected for said phase of said frequent itemset operation; dynamically selecting a second occurrence counting technique in the phase of a subsequent frequent itemset operation during which combinations having N items are processed; and wherein the first occurrence counting technique is different from said second occurrence counting technique. - View Dependent Claims (8, 9)
-
-
10. A volatile or non-volatile computer-readable storage medium storing one or more sequences of instruction, wherein execution of the one or more sequences of instruction by one or more processors causes the one or more processors to perform:
-
dynamically selecting which occurrence counting technique to use from a plurality of available occurrence counting techniques by performing the steps of; generating cost estimates for each of the plurality of available occurrence counting techniques based on an estimated I/O cost of using the available occurrence counting technique, wherein generating cost estimates comprises performing; determining a size of a candidate prefix tree; determining an amount of memory that can be used for the candidate prefix tree; comparing the size of the candidate prefix tree to the amount of memory that can be used to store the candidate prefix tree; and generating an I/O cost estimate for a prefix tree technique based, at least in part, on the size of the candidate prefix tree and the amount of memory that can be used to store the candidate prefix tree; selecting the occurrence counting technique that has the lowest cost estimate; and during a frequent itemset operation, using said selected occurrence counting technique to count occurrences of at least one combination to determine whether said at least one combination satisfies frequency criteria associated with said frequent itemset operation. - View Dependent Claims (11, 12, 13, 14, 15)
-
-
16. A volatile or non-volatile computer-readable storage medium storing one or more sequences of instruction, wherein execution of the one or more sequences of instruction by one or more processors causes the one or more processors to perform:
-
dynamically selecting which occurrence counting technique to use from a plurality of available occurrence counting techniques based on conditions existing before a frequent itemset operation is performed in a computing environment in which the frequent itemset operation is to be performed, wherein the conditions include how busy a computer system in which the frequent itemset operation is to be performed currently is, and an amount of volatile memory available to store a candidate prefix tree; and during said frequent itemset operation, using said selected occurrence counting technique to count occurrences of at least one combination to determine whether said at least one combination satisfies frequency criteria associated with said frequent itemset operation; wherein; the frequent itemset operation is performed in a plurality of phases, wherein each phase is associated with combinations that have a particular number of items; the dynamically selecting includes dynamically selecting which occurrence counting technique to use for at least one phase of said plurality of phases; and the using includes using said selected occurrence counting technique to determine whether candidate combinations for said at least one phase satisfy said frequency criteria; said at least one phase is a phase during which combinations having N items are processed; a first occurrence counting technique is selected for said phase of said frequent itemset operation; dynamically selecting a second occurrence counting technique in the phase of a subsequent frequent itemset operation during which combinations having N items are processed; and wherein the first occurrence counting technique is different from said second occurrence counting technique. - View Dependent Claims (17, 18)
-
Specification