Expressing frequent itemset counting operations
First Claim
1. A method for performing a frequent itemset operation, the method comprising the steps of:
- receiving a database statement that specifies (1) a function name of a table function that identifies which itemsets occur together most frequently in a particular item group population and (2) a plurality of input parameters that are input parameters to the table function;
wherein the plurality of input parameters includes a parameter for a support threshold that indicates a ratio, a parameter for a cursor that indicates the particular item group population, and a parameter for a minimum length that indicates a minimum length for frequent itemsets that are identified by the table function;
wherein the ratio indicates what percentage of transactions, of a particular set of transactions, must contain a given itemset for the given itemset to qualify as a frequent itemset;
in response to receiving the database statement, calling the table function and passing, as input to the table function, values for each of the plurality of input parameters;
wherein results returned by the table function, in response to calling the table function, identify which itemsets occur together most frequently in the particular item group population, excluding all itemsets that (a) include fewer items than the minimum length and (b) do not satisfy the support threshold;
wherein the method is performed by one or more computing devices.
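The filtering behavior recited in the claim can be illustrated with a minimal Python sketch. It is not the patented table function: the name `frequent_itemsets`, the in-memory list standing in for the cursor, and the brute-force enumeration are all assumptions made for illustration. The sketch reads the claim's exclusion language as keeping only itemsets that have at least the minimum length and whose support ratio meets the threshold.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, support_threshold, min_length=1):
    """Hypothetical stand-in for the claimed table function.

    transactions      -- iterable of item collections (plays the role of the cursor)
    support_threshold -- ratio in [0, 1]: fraction of transactions that must
                         contain an itemset for it to qualify as frequent
    min_length        -- smallest itemset size to report
    Returns a dict mapping each qualifying itemset (sorted tuple) to its support.
    """
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)
    if total == 0:
        return {}

    counts = Counter()
    for t in transactions:
        items = sorted(t)
        # Brute force: enumerate every subset of this transaction that is at
        # least min_length long. Exponential in transaction size, so this is
        # only suitable for small illustrative inputs.
        for k in range(max(1, min_length), len(items) + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1

    # Keep only itemsets whose support ratio satisfies the threshold.
    return {s: c / total for s, c in counts.items() if c / total >= support_threshold}


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(baskets, support_threshold=0.5, min_length=2)
```

With a threshold of 0.5 and a minimum length of 2, each pair appears in two of the four baskets and qualifies, while the single three-item set appears in only one basket and is excluded.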
Abstract
Techniques are provided for (1) extending SQL to support direct invocation of frequent itemset operations, (2) improving the performance of frequent itemset operations by clustering itemset combinations to more efficiently use previously produced results, and (3) making on-the-fly selection of the occurrence counting technique to use during each phase of a multiple phase frequent itemset operation. When directly invoked in an SQL statement, a frequent itemset operation may receive input from results of operations specified in the SQL statement, and provide its results directly to other operations specified in the SQL statement. By clustering itemset combinations, resources may be used more efficiently by retaining intermediate information as long as it is useful, and then discarding it to free up volatile memory. Dynamically selecting an occurrence counting technique allows a single frequent itemset operation to change the occurrence counting technique that it is using midstream, based on cost considerations and/or environmental conditions.
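The third technique in the abstract, switching the occurrence counting technique between phases, can be sketched as follows. Everything here is an assumption for illustration: the abstract does not name the counting techniques or the cost rule, so the sketch uses two generic ones (scanning transactions, and intersecting per-item sets of transaction ids) and a hypothetical `scan_limit` heuristic. Each phase handles itemsets of one size, in the usual level-wise fashion.

```python
from itertools import combinations


def count_by_scan(candidates, transactions):
    """Technique A (assumed): test each candidate against every transaction."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def count_by_tid_lists(candidates, tid_lists):
    """Technique B (assumed): intersect per-item sets of transaction ids."""
    return {c: len(set.intersection(*(tid_lists[i] for i in c))) for c in candidates}


def phased_frequent_itemsets(transactions, support_threshold, scan_limit=1000):
    """Level-wise operation that picks a counting technique per phase.

    scan_limit is a hypothetical cost knob: a phase scans transactions only
    when (number of candidates * number of transactions) stays within it.
    Returns (itemset -> support ratio, list of techniques chosen per phase).
    """
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)
    min_count = support_threshold * total

    # Item -> set of ids of the transactions that contain it.
    tid_lists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tid_lists.setdefault(item, set()).add(tid)

    candidates = [frozenset([item]) for item in tid_lists]
    frequent, chosen = {}, []
    size = 1
    while candidates:
        # Choose the technique for this phase from the estimated scan cost.
        if len(candidates) * total <= scan_limit:
            chosen.append("scan")
            counts = count_by_scan(candidates, transactions)
        else:
            chosen.append("tid_lists")
            counts = count_by_tid_lists(candidates, tid_lists)

        survivors = [c for c, n in counts.items() if n >= min_count]
        frequent.update({c: counts[c] / total for c in survivors})

        # Next phase: join surviving itemsets into candidates one item larger.
        size += 1
        candidates = list(
            {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        )
    return frequent, chosen


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
mixed, techniques = phased_frequent_itemsets(baskets, 0.5, scan_limit=10)
```

Both techniques produce the same counts, so the choice affects only cost. In this toy run the first two phases exceed the assumed limit and use transaction-id intersection, and the final phase, with a single candidate, falls back to scanning.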
28 Claims
1. A method for performing a frequent itemset operation. This independent claim is recited in full under "First Claim" above.
Dependent claims: 2 through 14.
15. One or more non-transitory storage media storing instructions for performing a frequent itemset operation, wherein the instructions, when executed by one or more processors cause:
- receiving a database statement that specifies (1) a function name of a table function that identifies which itemsets occur together most frequently in a particular item group population and (2) a plurality of input parameters that are input parameters to the table function;
wherein the plurality of input parameters includes a parameter for a support threshold that indicates a ratio, a parameter for a cursor that indicates the particular item group population, and a parameter for a minimum length that indicates a minimum length for frequent itemsets that are identified by the table function;
wherein the ratio indicates what percentage of transactions, of a particular set of transactions, must contain a given itemset for the given itemset to qualify as a frequent itemset;
in response to receiving the database statement, calling the table function and passing, as input to the table function, values for each of the plurality of input parameters;
wherein results returned by the table function, in response to calling the table function, identify which itemsets occur together most frequently in the particular item group population, excluding all itemsets that (a) include fewer items than the minimum length and (b) do not satisfy the support threshold.
Dependent claims: 16 through 28.
Specification