Data partitioning via bucketing bloom filters
First Claim
1. One or more computer storage media having stored thereon instructions that, when executed by one or more processors of a computing device, cause the one or more processors to:
- identify a first data set of elements and a second data set of elements, wherein the first data set and the second data set are disjoint;
assign each element in the first data set to a bucket of a first set of buckets;
assign each element in the second data set to a bucket of a second set of buckets; and
generate, for each bucket of the first set of buckets, a Bloom filter that indicates that each element assigned to the bucket of the first set of buckets is part of the first data set, and that indicates that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set.
2 Assignments
0 Petitions
Accused Products
Abstract
Multiple Bloom filters are generated to partition data between first and second disjoint data sets of elements. Each element in the first data set is assigned to a bucket of a first set of buckets, and each element in the second data set is assigned to a bucket of a second set of buckets. A Bloom filter is generated for each bucket of the first set of buckets. The Bloom filter generated for a bucket indicates that each element assigned to that bucket is part of the first data set, and that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set. Additionally, a Bloom filter corresponding to a subsequently received element can be determined and used to identify whether that subsequently received element is part of the first data set or the second data set.
27 Citations
20 Claims
-
1. One or more computer storage media having stored thereon instructions that, when executed by one or more processors of a computing device, cause the one or more processors to:
-
identify a first data set of elements and a second data set of elements, wherein the first data set and the second data set are disjoint; assign each element in the first data set to a bucket of a first set of buckets; assign each element in the second data set to a bucket of a second set of buckets; and generate, for each bucket of the first set of buckets, a Bloom filter that indicates that each element assigned to the bucket of the first set of buckets is part of the first data set, and that indicates that each element assigned to a corresponding bucket of the second set of buckets is not part of the first data set. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method of creating a set of Bloom filters to partition data into two disjoint data sets, the method comprising:
-
assigning elements of each of the two disjoint data sets to a bucket of a corresponding set of buckets; and generating the set of Bloom filters, each Bloom filter corresponding to one bucket corresponding to a first data set of the two disjoint data sets, each Bloom filter indicating that each element in the first data set is part of the first data set, and further indicating that each element in a second data set of the two data sets is not part of the first data set. - View Dependent Claims (12, 13, 14, 15, 16, 17)
-
-
18. One or more computer storage media having stored thereon instructions that, when executed by one or more processors of a computing device, cause the one or more processors to:
-
obtain data that is an element of either a first data set of elements or a second data set of elements, wherein the first data set and the second data set are disjoint; determine a single Bloom filter that corresponds to the element, the single Bloom filter being one of multiple Bloom filters associated with the first data set, each of the multiple Bloom filters corresponding to different elements of the first data set; and use the single Bloom filter to determine whether the element is part of the first data set of elements or the second data set of elements. - View Dependent Claims (19, 20)
-
Specification