Method and apparatus for determining distinct cardinality dual hash bitmaps
First Claim
1. In a computer system having a processor coupled to a bus and a computer readable memory unit coupled to said bus, a method for determining distinct cardinality of a data sample, said method comprising the steps of:
- (a) receiving said data sample, said data sample containing C entries;
(b) determining a first distinct cardinality value of said data sample using a hashing function and a first bitmap, said first bitmap comprising x entries;
(c) determining a second distinct cardinality value of said data sample using a hashing function and a fractional bitmap that is a fraction of a second bitmap, said second bitmap comprising y logical entries and said fractional bitmap comprising z entries wherein y is larger than x; and
(d) selecting between said first distinct cardinality value and said second distinct cardinality value as said distinct cardinality of said data sample, wherein said first bitmap and said fractional bitmap are separate bitmaps and are stored within said computer readable memory unit.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus for determining distinct cardinality of a data sample using dual hash bitmaps. Two different bitmaps determine the distinct cardinality of the data sample (e.g., a column of data within a data table of a database). A small sized bitmap, M*sqrt(C*K), is used where M is a programmable value that reduces collision error, C is the size of the column ("data sample size"), and K is a key density value. Once selected, both M and K are constant. Sample entry values are hashed by a hash function and a modulo function determines an entry into the first bitmap. Based on the bitmap'"'"'s bit entries, a first counter is updated, or not, to maintain a first distinct cardinality value. A large bitmap is used having a size, M*C, but only a small fraction is actually used, M*sqrt(C*K). Only hashed column entries falling inside the fraction are processed as above to maintain a second counter. At the end of the data sample entry processing, the second counter is extrapolated to the large bitmap size. Expected collision error compensation is computed and compensated for the first and second counters. Distribution error is computed for the second counter and added with its compensated collision error. The total errors of the first and second counters are compared to select an output distinct cardinality. The distinct cardinality measurement requires sub-linear increase in memory for each linear increase in sample size.
-
Citations
23 Claims
-
1. In a computer system having a processor coupled to a bus and a computer readable memory unit coupled to said bus, a method for determining distinct cardinality of a data sample, said method comprising the steps of:
-
(a) receiving said data sample, said data sample containing C entries; (b) determining a first distinct cardinality value of said data sample using a hashing function and a first bitmap, said first bitmap comprising x entries; (c) determining a second distinct cardinality value of said data sample using a hashing function and a fractional bitmap that is a fraction of a second bitmap, said second bitmap comprising y logical entries and said fractional bitmap comprising z entries wherein y is larger than x; and (d) selecting between said first distinct cardinality value and said second distinct cardinality value as said distinct cardinality of said data sample, wherein said first bitmap and said fractional bitmap are separate bitmaps and are stored within said computer readable memory unit. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. In a computer system having a processor coupled to a bus and a computer readable memory unit coupled to said bus, a method for determining distinct cardinality of a data sample, said method comprising the steps of:
-
(a) receiving said data sample, said data sample containing C entries; (b) determining a first distinct cardinality value of said data sample using a hashing function and a first bitmap, said first bitmap comprising M* sqrt(C*K) ! entries where M is a multiplier constant value and K is a key density constant value; (c) determining a second distinct cardinality value of said data sample using a hashing function and a fractional bitmap that is a fraction of a second bitmap, said second bitmap comprising M*C logical entries and said fractional bitmap containing M* sqrt(C*K)! entries; and (d) selecting between said first distinct cardinality value and said second distinct cardinality value as said distinct cardinality of said data sample, wherein said first bitmap and said fractional bitmap are separate bitmaps stored within said computer readable memory unit. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
-
-
17. A computer system comprising a processor coupled to a bus and a computer readable memory unit coupled to said bus, said computer readable memory unit containing a set of instructions that when executed by said processor causing said computer system to implement a method for determining distinct cardinality of a data sample, said method comprising the steps of:
-
(a) receiving said data sample, said data sample containing C entries; (b) determining a first distinct cardinality value of said data sample using a hash function and a first bitmap, said first bitmap comprising x entries; (c) determining a second distinct cardinality value of said data sample using a hash function and a fractional bitmap that is a fraction of a second bitmap, said second bitmap comprising y logical entries and said fractional bitmap comprising x entries wherein y is larger than x; and (d) selecting between said first distinct cardinality value and said second distinct cardinality value as said distinct cardinality of said data sample, wherein said first bitmap and said fractional bitmap are separate bitmaps and are stored within said computer readable memory unit using different memory locations. - View Dependent Claims (18, 19, 20, 21, 22, 23)
-
Specification