Parallel data redundancy removal
First Claim
1. A computer implemented method for parallel data redundancy removal, the computer implemented method comprising:
computing a plurality of values for a record in a plurality of records stored in a storage device;
distributing the plurality of values for the record to corresponding queues in a plurality of queues, wherein each of the plurality of queues is associated with a corresponding section of a Bloom filter;
determining whether each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter; and
identifying the record as a redundant record in response to a determination that each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter.
Abstract
A method, system, and computer usable program product for parallel data redundancy removal are provided in the illustrative embodiments. A plurality of values is computed for a record in a plurality of records stored in a storage device. The plurality of values for the record is distributed to corresponding queues in a plurality of queues, wherein each of the plurality of queues is associated with a corresponding section of a Bloom filter. A determination is made whether each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter. The record is identified as a redundant record in response to a determination that each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter.
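The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one plausible reading of the steps. The names `section_values`, `section_worker`, and `find_redundant`, the use of salted BLAKE2b hashes, the section size, the one-thread-per-queue layout, and the choice to set each bit immediately after testing it are all assumptions made for illustration and do not come from the source.

```python
import hashlib
import queue
import threading

NUM_SECTIONS = 4         # assumed: one value, one queue, one worker per section
SECTION_BITS = 1 << 16   # assumed size of each Bloom filter section, in bits


def section_values(record: bytes) -> list[int]:
    """Compute one bit position per Bloom filter section for a record."""
    values = []
    for i in range(NUM_SECTIONS):
        # A different salt per section yields independent-looking hash values.
        digest = hashlib.blake2b(record, digest_size=8, salt=bytes([i])).digest()
        values.append(int.from_bytes(digest, "big") % SECTION_BITS)
    return values


def section_worker(index, section, work, hits):
    """Drain one queue in FIFO order, testing then setting bits in one section."""
    while True:
        item = work.get()
        if item is None:  # sentinel: no more values for this section
            return
        record_index, position = item
        byte, mask = position // 8, 1 << (position % 8)
        # Record whether the section already indicated this value ...
        hits[record_index][index] = bool(section[byte] & mask)
        # ... then set the bit so later records see it (an assumption).
        section[byte] |= mask


def find_redundant(records: list[bytes]) -> list[bool]:
    """Return one flag per record: True if every section already held its value."""
    sections = [bytearray(SECTION_BITS // 8) for _ in range(NUM_SECTIONS)]
    queues = [queue.Queue() for _ in range(NUM_SECTIONS)]
    hits = [[False] * NUM_SECTIONS for _ in records]
    workers = [
        threading.Thread(target=section_worker, args=(s, sections[s], queues[s], hits))
        for s in range(NUM_SECTIONS)
    ]
    for worker in workers:
        worker.start()
    # Distribute each record's values to the queue of the matching section.
    for record_index, record in enumerate(records):
        for s, position in enumerate(section_values(record)):
            queues[s].put((record_index, position))
    for q in queues:
        q.put(None)
    for worker in workers:
        worker.join()
    # A record is treated as redundant only if every section reported a hit.
    return [all(row) for row in hits]


if __name__ == "__main__":
    sample = [b"alice", b"bob", b"alice", b"carol", b"bob"]
    print(find_redundant(sample))
```

Because each section has its own queue and is touched by only one worker, the sections can be checked in parallel without locking the filter. As with any Bloom filter, a record flagged as redundant may occasionally be a false positive, while an exact repeat of an earlier record is always flagged.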
393 Citations
20 Claims
1. A computer implemented method for parallel data redundancy removal, the computer implemented method comprising:
computing a plurality of values for a record in a plurality of records stored in a storage device;
distributing the plurality of values for the record to corresponding queues in a plurality of queues, wherein each of the plurality of queues is associated with a corresponding section of a Bloom filter;
determining whether each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter; and
identifying the record as a redundant record in response to a determination that each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A computer usable program product comprising a computer usable storage medium including computer usable code for parallel data redundancy removal, the computer usable code comprising:
computer usable code for computing a plurality of values for a record in a plurality of records stored in a storage device;
computer usable code for distributing the plurality of values for the record to corresponding queues in a plurality of queues, wherein each of the plurality of queues is associated with a corresponding section of a Bloom filter;
computer usable code for determining whether each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter; and
computer usable code for identifying the record as a redundant record in response to a determination that each value distributed to the corresponding queues for the record is indicated by the corresponding value in the corresponding section of the Bloom filter.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18.
19. A data processing system for parallel data redundancy removal, the data processing system comprising:
a storage device including a storage medium, wherein the storage device stores computer usable program code; and
a processor, wherein the processor executes the computer usable program code, and wherein the computer usable program code comprises:
computer usable code for computing a plurality of values for a record in a plurality of records;
computer usable code for distributing the plurality of values for the record to corresponding queues in a plurality of queues, wherein each of the plurality of queues is associated with a corresponding section of a Bloom filter;
computer usable code for determining whether each value distributed to the corresponding queues for the record is indicated by a corresponding value in the corresponding section of the Bloom filter; and
computer usable code for identifying the record as a redundant record in response to a determination that each value distributed to the corresponding queues for the record is indicated by the corresponding value in the corresponding section of the Bloom filter.
Dependent claim: 20.
Specification