Frequent itemset counting using subsets of bitmaps
First Claim
1. A machine-implemented method, comprising:
- assigning each item group, within a set of item groups, to one of a plurality of item group subsets based at least in part on an amount of volatile memory available in a computer system, wherein each item group subset includes one or more entire item groups,wherein each item group is associated with an item group bitmap that indicates which items, of a set of items, belong in said item group;
determining a set of candidate combinations of two or more items from the set of items;
for each candidate combination of two or more items in said set of candidate combinations, performing the step of;
in each item group subset, comparing an item bitmap associated with an item in the candidate combination with each other item bitmap associated with one or more other items in the candidate combination, to determine a partial frequent itemset count for the candidate combination in the item group subset, wherein the partial frequent itemset count comprises the total number of item groups in the item group subset that contain the candidate combination;
combining the partial frequent itemset counts of the item group subsets to derive a total frequent itemset count for the set of candidate combinations; and
storing said total frequent itemset count in a volatile or non-volatile computer-readable storage medium;
wherein the method is performed by one or more computing devices.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and mechanism for performing improved frequent itemset operations is provided. A set of item groups are divided into a plurality of subsets. Each item group is composed of a set of data items. Possible combinations of data items that may frequently appear together in the same item group are referred to as candidate combinations. Candidate combinations comprising a first set of data items are identified, and thereafter the occurrence of each candidate combination in any item group in each subset is counted by comparing item bitmaps, associated with items in the candidate combination, in each subset in turn. The comparison of item bitmaps is performed in volatile memory. A total frequent itemset count that describes the frequency of candidate combinations in items groups across all subsets is obtained. Thereafter, the total frequent itemset count for candidate combinations having a larger number of data items may be determined.
20 Citations
12 Claims
-
1. A machine-implemented method, comprising:
-
assigning each item group, within a set of item groups, to one of a plurality of item group subsets based at least in part on an amount of volatile memory available in a computer system, wherein each item group subset includes one or more entire item groups, wherein each item group is associated with an item group bitmap that indicates which items, of a set of items, belong in said item group; determining a set of candidate combinations of two or more items from the set of items; for each candidate combination of two or more items in said set of candidate combinations, performing the step of; in each item group subset, comparing an item bitmap associated with an item in the candidate combination with each other item bitmap associated with one or more other items in the candidate combination, to determine a partial frequent itemset count for the candidate combination in the item group subset, wherein the partial frequent itemset count comprises the total number of item groups in the item group subset that contain the candidate combination; combining the partial frequent itemset counts of the item group subsets to derive a total frequent itemset count for the set of candidate combinations; and storing said total frequent itemset count in a volatile or non-volatile computer-readable storage medium; wherein the method is performed by one or more computing devices. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A non-transitory machine-readable storage medium carrying one or more sequences of instructions, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
-
assigning each item group, within a set of item groups, to one of a plurality of item group subsets based at least in part on an amount of volatile memory available in a computer system, wherein each item group subset includes one or more entire item groups, wherein each item group is associated with an item group bitmap that indicates which items, of a set of items, belong in said item group; determining a set of candidate combinations of two or more items from the set of items; for each candidate combination of two or more items in the set of candidate combinations, performing the step of; in each item group subset, comparing an item bitmap associated with an item in the candidate combination with each other item bitmap associated with one or more other items in the candidate combination, to determine a partial frequent itemset count for the candidate combination in the item group subset, wherein the partial frequent itemset count comprises the total number of item groups in the item group subset that contain the candidate combination; combining the partial frequent itemset counts of the item group subsets to derive a total frequent itemset count for the set of candidate combinations; and storing said total frequent itemset count in a volatile or non-volatile computer-readable storage medium. - View Dependent Claims (8, 9, 10, 11, 12)
-
Specification