Deduplicating sets of data blocks
Abstract
A method is used in deduplicating sets of data blocks. A candidate data object is identified for deduplicating a data object. A digest associated with the candidate data object matches a digest associated with the data object. Digest information of a set of data objects is evaluated. The set of data objects is selected for evaluation based on an association between the location of the set of data objects and the location of the candidate data object. Based on the evaluation, a deduplicating technique is applied for deduplicating the data object.
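The candidate identification step described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the SHA-256 digest, the 4-byte block size, the in-memory dictionary standing in for an index table, and the names `split_blocks`, `digest`, and `find_candidates` are all hypothetical choices made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for the demo


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[k:k + size] for k in range(0, len(data), size)]


def digest(block: bytes) -> str:
    """Digest of one block (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


def find_candidates(blocks: list[bytes]) -> list[tuple[int, int]]:
    """Return (position, candidate_position) pairs whose digests match."""
    index = {}    # digest -> position of the first block seen with that digest
    matches = []
    for pos, block in enumerate(blocks):
        d = digest(block)
        if d in index:
            matches.append((pos, index[d]))  # earlier block is the candidate
        else:
            index[d] = pos
    return matches


blocks = split_blocks(b"AAAABBBBCCCCAAAABBBB")
pairs = find_candidates(blocks)
print(pairs)
```

Each pair names a data object and an earlier candidate with the same digest. Whether the pair is actually deduplicated is a separate decision, which the abstract ties to the digests of nearby data objects.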
20 Claims
1. A method for use in deduplicating sets of data blocks, the method comprising:
- identifying a candidate data object for deduplicating a data object, wherein a digest associated with the candidate data object matches a digest associated with the data object, wherein an index table stores digest information for the candidate data object and digest information for at least one data object residing adjacent to the candidate data object;
- upon detecting a digest collision, determining whether to apply a deduplicating technique to the candidate data object and the data object by evaluating digest information of a set of data blocks residing adjacent to the candidate data block, wherein evaluating the digest information of the set of data blocks residing adjacent to the candidate data block includes determining whether a first set of data objects residing adjacent to the data object is identical to a second set of data objects residing adjacent to the candidate data object, wherein the first and second sets of data objects are selected based on a reference of locality indicating a likelihood of the data object being identical to the candidate data object, wherein whether the first set of data objects is identical to the second set of data objects is determined by evaluating respective digest information of the first and second sets of data objects, wherein each data object of a set of data objects is associated with a respective digest, wherein the digest collision indicates that at least two data objects storing different contents are associated with the same digest; and
- based on the determination that the first and second sets of data objects are identical to each other, applying a deduplicating technique to the data object and the candidate data object for deduplicating the data object to the candidate data object.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
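The neighbor evaluation recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the fixed window `radius`, the deliberately weak byte-sum digest (chosen only so that a collision is easy to produce), and the names `weak_digest`, `neighbor_digests`, and `should_deduplicate` are hypothetical. The claim itself does not say how large the adjacent sets are or how they are laid out.

```python
def weak_digest(block: bytes) -> int:
    """Deliberately weak digest (byte sum mod 256) so collisions are easy to show."""
    return sum(block) % 256


def neighbor_digests(digests: list[int], pos: int, radius: int) -> list:
    """Digests of blocks within `radius` of `pos`, excluding `pos` itself.

    Positions outside the list are recorded as None.
    """
    out = []
    for off in range(-radius, radius + 1):
        if off == 0:
            continue
        k = pos + off
        out.append(digests[k] if 0 <= k < len(digests) else None)
    return out


def should_deduplicate(digests: list[int], pos: int, cand: int, radius: int = 1) -> bool:
    """Deduplicate only if the digests match AND the adjacent digests match too."""
    if digests[pos] != digests[cand]:
        return False
    return neighbor_digests(digests, pos, radius) == neighbor_digests(digests, cand, radius)


# b"ab" and b"ba" hold different contents but share the same weak digest.
blocks = [b"hd", b"ab", b"ft", b"xx", b"ba", b"yy", b"hd", b"ab", b"ft"]
digests = [weak_digest(b) for b in blocks]

print(should_deduplicate(digests, 7, 1))  # same neighbors on both sides
print(should_deduplicate(digests, 4, 1))  # colliding digest, different neighbors
```

In this sketch the block at position 7 is deduplicated to the candidate at position 1 because the surrounding digests agree, while the block at position 4 is left alone even though its digest collides with the candidate's. Agreement between neighboring digests is treated here as a likelihood signal, in the spirit of the claim's "reference of locality", rather than as proof that the contents are identical.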
11. A system for use in deduplicating sets of data blocks, the system comprising a processor configured to:
- identify a candidate data object for deduplicating a data object, wherein a digest associated with the candidate data object matches a digest associated with the data object, wherein an index table stores digest information for the candidate data object and digest information for at least one data object residing adjacent to the candidate data object;
- upon detecting a digest collision, determine whether to apply a deduplicating technique to the candidate data object and the data object by evaluating digest information of a set of data blocks residing adjacent to the candidate data block, wherein evaluating the digest information of the set of data blocks residing adjacent to the candidate data block includes determining whether a first set of data objects residing adjacent to the data object is identical to a second set of data objects residing adjacent to the candidate data object, wherein the first and second sets of data objects are selected based on a reference of locality indicating a likelihood of the data object being identical to the candidate data object, wherein whether the first set of data objects is identical to the second set of data objects is determined by evaluating respective digest information of the first and second sets of data objects, wherein each data object of a set of data objects is associated with a respective digest, wherein the digest collision indicates that at least two data objects storing different contents are associated with the same digest; and
- apply, based on the determination that the first and second sets of data objects are identical to each other, a deduplicating technique to the data object and the candidate data object for deduplicating the data object to the candidate data object.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification