Method and apparatus determining and using hash functions and hash values
First Claim
Patent Images
1. A method of determining a pair of hash values, by a data processing system having a memory, comprising:
- choosing four 32-bit random values and storing them in the memory;
determining two hash values from pairwise independent hash functions h1(x)=cx+d (mod p) and h2(x)=dx+c (mod p), where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number, wherein said hash functions are determined in accordance with the four 32-bit random values from the memory and a 32-bit value x also from the memory, using only linear arithmetic and 4-byte machine register operations.
4 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus that determines and uses two nearly uniform independent hash functions. The hash functions are created using only linear arithmetic and 4-byte machine register operations and, thus, can be created very quickly. The first uniform hashing function hi and the second uniform hashing function h2 are pairwise independent;
117 Citations
27 Claims
-
1. A method of determining a pair of hash values, by a data processing system having a memory, comprising:
-
choosing four 32-bit random values and storing them in the memory;
determining two hash values from pairwise independent hash functions h1(x)=cx+d (mod p) and h2(x)=dx+c (mod p), where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number, wherein said hash functions are determined in accordance with the four 32-bit random values from the memory and a 32-bit value x also from the memory, using only linear arithmetic and 4-byte machine register operations. - View Dependent Claims (2, 3, 4, 5)
determining the 32-bit value x for a value that is larger than 32-bits using a cipher block chaining method.
-
-
3. The method of claim 1, wherein determining two hash values includes:
-
determining a product of a first one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
exclusive ORing the 32-bit values in the machine registers to yield a result R1 and exclusive ORing the result R1 with a second one of the 32-bit random values to yield the first 32-bit result value c.
-
-
4. The method of claim 3, wherein determining two hash values further includes:
-
determining a product of a third one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
exclusive ORing the 32-bit values in the machine registers to yield a result R2; and
exclusive ORing the result R2 with a fourth one of the 32-bit random values to yield the second 32-bit result value d.
-
-
5. The method of claim 1, further comprising determining a number of unique values in a large number of data items stored in the memory of the data processing system, which includes:
-
providing in the memory a linear bitmap of a first predetermined size and a second plurality of bitmaps of a second predetermined size;
using a first one of the hash functions to update the first bitmap;
using both hash functions to update the second plurality of bitmaps;
determining a first cardinality from the linear bitmap;
if the cardinality is less than or equal to a predetermined number, deeming the first cardinality to be the number of unique items; and
if the cardinality is greater than the predetermined number, determining a second cardinality in accordance with the plurality of bitmaps.
-
-
6. A method of determining a number of unique values in a large number of data items stored in a memory of a data processing system, the method comprising:
-
providing in the memory a linear bitmap of a first predetermined size and a second plurality of bitmaps of a second predetermined size;
creating, using only linear arithmetic and 4-byte machine register operations, a first uniform hashing function h1 and a second uniform hashing function h2, which are pairwise independent;
using a first hash function h1(x)=cx+d (mod p) to update the first bitmap;
using a second hash function h2(x)=dx+c (mod p) to update the second plurality of bitmaps, where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number;
determining a first cardinality from the linear bitmap;
if the cardinality is less than or equal to a predetermined number, deeming the first cardinality to be the number of unique items; and
if the cardinality is greater than the predetermined number, determining a second cardinality in accordance with the plurality of bitmaps and deeming the second cardinality to be the number of unique items. - View Dependent Claims (7, 8, 9, 10)
if the cardinality is small, using a larger of a pair of linear bitmap counts; and
if the expected cardinality is small but larger than a square root of a bitmap size, using a mean of the plurality of linear bitmaps.
-
-
10. The method of claim 6, wherein the items are divided into p partitions and a plurality of bitmaps is stored in memory for each partition p.
-
11. An apparatus, having a memory and determining a pair of hash values, comprising a processor executing the following software portions:
-
a software portion configured to choose four 32-bit random values and store them in the memory;
a software portion configured to determine two hash values from pairwise independent hash functions h1(x)=cx+d (mod p) and h2(x)=dx+c (mod p), where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number, wherein said hash functions are determined in accordance with the four 32-bit random values from the memory and a 32-bit value x also from the memory, using only linear arithmetic and 4-byte machine register operations. - View Dependent Claims (12, 13, 14, 15, 16)
a software portion configured to determine the 32-bit value x for a value that is larger than 32-bits using a cipher block chaining method.
-
-
13. The apparatus of claim 11, wherein the software portion executed by the processor and configured to determine two hash values includes:
-
a software portion configured to determine a product of a first one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
a software portion configured to exclusive OR the 32-bit values in the machine registers to yield a result R1; and
a software portion configured to exclusive OR the result R1 with a second one of the 32-bit random values to yield the first 32-bit result value c.
-
-
14. The apparatus of claim 13, wherein the software portion executed by the processor and configured to determine two hash values further includes:
-
a software portion configured to determine a product of a third one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
a software portion configured to exclusive OR the 32-bit values in the machine registers to yield a result R2; and
a software portion configured to exclusive OR the result R2 with a fourth one of the 32-bit random values to yield the second 32-bit result value d.
-
-
15. The apparatus of claim 14, wherein the software portion executed by the processor and configured to determine two hash values includes:
-
a software portion configured to determine the first hash function h1(x)=cx+d (mod p); and
a software portion configured to determine the second hash function h2(x)=dx+c (mod p), where c=the first 32-bit result value c and where d=the second 32-bit result value d and where p is a prime number.
-
-
16. The apparatus of claim 11, further comprising a software portion executed by the processor and configured to determine a number of unique values in a large number of data items stored in the memory of the data processing system, including:
-
a software portion configured to provide in the memory a linear bitmap of a first predetermined size and a second plurality of bitmaps of a second predetermined size;
a software portion configured to use the first hash function to update the first bitmap;
a software portion configured to use the second hash function to update the second plurality of bitmaps;
a software portion configured to determine a first cardinality from the linear bitmap;
a software portion configured to, if the cardinality is less than or equal to a predetermined number, deem the first cardinality to be the number of unique items; and
a software portion configured to, if the cardinality is greater than the predetermined number, determine a second cardinality in accordance with the plurality of bitmaps.
-
-
17. An apparatus having a memory and determining a number of unique values in a large number of data items stored in a memory of a data processing system, the apparatus comprising:
-
a software portion configured to provide in the memory a linear bitmap of a first predetermined size and a second plurality of bitmaps of a second predetermined size;
a software portion configured to create, using only linear arithmetic and 4-byte machine register operations, a first uniform hashing function h1 and a second uniform hashing function h2, which are pairwise independent;
a software portion configured to use a first hash function h1(x)=cx+d (mod p) to update the first bitmap;
a software portion configured to use a second hash function h2(x)=dx+c (mod p) to update the second plurality of bitmaps, where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number;
a software portion configured to determine a first cardinality from the linear bitmap;
a software portion configured to, if the cardinality is less than or equal to a predetermined number, deem the first cardinality to be the number of unique items; and
a software portion configured to, if the cardinality is greater than the predetermined number, determine a second cardinality in accordance with the plurality of bitmaps and deem the second cardinality to be the number of unique items. - View Dependent Claims (18, 19, 20, 21)
a software portion configured to, if the cardinality is small, use a larger of a pair of linear bitmap counts; and
a software portion configured to, if the expected cardinality is small but larger than a square root of a bitmap size, use a mean of the plurality of linear bitmaps.
-
-
21. The apparatus of claim 17, wherein the items are divided into p partitions and a plurality of bitmaps are stored in memory for each partition p.
-
22. A computer program product that determines a pair of hash values, comprising:
-
a computer readable medium, including program instructions, the program instructions including;
computer program code devices configured to choose four 32-bit random values and store them in a memory;
computer program code devices configured to determine two hash values from pairwise independent hash functions h1(x)=cx+d (mod p) and h2(x)=dx+c (mod p), where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number, wherein said hash functions are determined in accordance with the four 32-bit random values from the memory and a 32-bit value x also from the memory, using only linear arithmetic and 4-byte machine register operations.
-
-
23. A computer program product that determines a number of unique values in a large number of data items stored in a memory of a data processing system, comprising:
-
a computer readable medium, including program instructions, the program instructions including;
computer program code devices configured to provide in a memory a linear bitmap of a first predetermined size and a second plurality of bitmaps of a second predetermined size;
computer program code devices configured to create, using only linear arithmetic and 4-byte machine register operations, a first uniform hashing function h1 and a second uniform hashing function h2, which are pairwise independent;
computer program code devices configured to use a first hash function h1(x)=cx+d (mod p) to update the first bitmap;
computer program code devices configured to use a second hash function h2(x)=dx+c (mod p) to update the second plurality of bitmaps, where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number;
computer program code devices configured to determine a first cardinality from the linear bitmap;
computer program code devices configured to, if the cardinality is less than or equal to a predetermined number, deem the first cardinality to be the number of unique items; and
computer program code devices configured to, if the cardinality is greater than the predetermined number, determine a second cardinality in accordance with the plurality of bitmaps and deem the second cardinality to be the number of unique items.
-
-
24. A method of determining a pair of hash functions, by a data processing system having a memory, comprising:
-
choosing four 32-bit random values and storing them in the memory;
determining two pairwise independent hash functions h1(x)=cx+d (mod p) and h2(x)=dx+c (mod p), where c=a first 32-bit result value c and where d=a second 32-bit result value d and where p is a prime number, wherein said hash functions are determined in accordance with the four 32-bit random values from the memory and a 32-bit value x also from the memory, using only linear arithmetic and 4-byte machine register operations. - View Dependent Claims (25, 26, 27)
determining a product of a first one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
exclusive ORing the 32-bit values in the machine registers to yield a result R1 and exclusive ORing the result R1 with a second one of the 32-bit random values to yield the first 32-bit result value c.
-
-
26. The method of claim 25, wherein determining hash function h2 includes:
-
determining a product of a third one of the 32-bit random values and the 32-bit value x to yield a 64-bit value, which is stored in two 32-bit machine registers;
exclusive ORing the 32-bit values in the machine registers to yield a result R2; and
exclusive ORing the result R2 with a fourth one of the 32-bit random values to yield the second 32-bit result value d.
-
-
27. The method of claim 26, wherein determining two hash functions h1 and h2 includes:
-
determining the first hash function h1(x)=cx+d (mod p); and
determining the second hash function h2(x)=dx+c (mod p), where c=the first 32-bit result value c and where d=the second 32-bit result value d and where p is a prime number.
-
Specification