Estimation of unique database values
First Claim
Patent Images
1. A method for estimating unique values in a database, comprising:
- providing a data field in the database, the data field comprising a plurality of information values;
partitioning the data field into a plurality of intervals, each interval comprising a range of information values from the plurality of information values;
computing an interval specific Bloom filter for each of the plurality of intervals using a processor;
applying a hash function on an information value in an interval;
evaluating whether a result of the hash function returns a value of zero;
calculating a binary Bloom filter value for an information value within an interval specific Bloom filter, wherein the binary Bloom filter value represents whether information values are unique;
constructing credible sets to determine how many samples to use to determine a number of unique values; and
determining the number of unique values in the database based on calculated binary Bloom filter values.
2 Assignments
0 Petitions
Accused Products
Abstract
Estimation of unique values in a database can be performed where a data field having multiple information values is provided in the database. The data field can be partitioned into multiple intervals such that each interval includes a range of information values. An interval specific Bloom filter can be calculated for each of the multiple intervals. A binary Bloom filter value can be calculated for an information value within an interval specific Bloom filter. The binary Bloom filter value can represent whether the information value is unique. A number of unique values in the database can be determined based on calculated binary Bloom filter values.
-
Citations
20 Claims
-
1. A method for estimating unique values in a database, comprising:
-
providing a data field in the database, the data field comprising a plurality of information values; partitioning the data field into a plurality of intervals, each interval comprising a range of information values from the plurality of information values; computing an interval specific Bloom filter for each of the plurality of intervals using a processor; applying a hash function on an information value in an interval; evaluating whether a result of the hash function returns a value of zero; calculating a binary Bloom filter value for an information value within an interval specific Bloom filter, wherein the binary Bloom filter value represents whether information values are unique; constructing credible sets to determine how many samples to use to determine a number of unique values; and determining the number of unique values in the database based on calculated binary Bloom filter values. - View Dependent Claims (2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20)
-
-
8. A system implemented at least in part in program code stored in a non-transitory machine-readable medium and executable by a computer to estimate unique values in a database, the system comprising:
-
a database in communication with the information server and configured to store the information values received over a computing network; a data loader executed by the computer and configured to load the information values from the computing network to the database to be stored; a data partitioning module executed by the computer and configured to partition the information values into intervals; applying a hash function on an information value in an interval; evaluating whether a result of the hash function returns a value of zero; a Bloom filter module executed by the computer and configured to compute an interval specific Bloom filter for the intervals; a unique value calculation module executed by the computer and configured to calculate a binary Bloom filter value for an information value within an interval specific Bloom filter, wherein the binary Bloom filter value represents whether the information value is unique; and an estimation module executed by the computer and configured to estimate a number of unique values in the database based on calculated binary Bloom filter values, wherein the estimation module constructs credible sets to determine how many samples to use. - View Dependent Claims (9, 10, 11, 12)
-
-
13. A method for estimating unique values in a database, comprising:
-
storing a data field in the database, the data field comprising a plurality of information values; partitioning the data field into a plurality of intervals, each interval comprising a range of information values from the plurality of information values; applying a hash function on an information value in an interval; evaluating whether a result of the hash function returns a value of zero; constructing credible sets to determine how many samples to use to determine a unique value; determining whether the information, value When subjected to a Bloom filter is the unique value based on the result of the hash function returning a value of one; calculating a false positive probability of the information value being the unique value. - View Dependent Claims (14, 15)
-
Specification