Removing overlapping ranges from a flat sorted data structure
First Claim
1. A computer implemented method for removing one or more stale fingerprints from a fingerprint database the method comprising:
- sorting a list of stale fingerprints stored in a stale fingerprint data structure into an inode number order;
performing an attributes intersect range calculation on the sorted list that computes a non-overlapping and latest consistency point (CP) range, wherein the attributes intersect range calculation comprises determining a tuple that defines the non-overlapping and latest CP range, wherein the tuple includes a starting value, an ending value, and a CP count associated with a given inode; and
comparing the non-overlapping and latest CP range with the fingerprint database wherein the comparing includes;
determining whether an identifier, associated with a block of data, is located between the starting value and the ending value,in response to determining that the identifier is located between the starting value and the ending value, determining whether one or more selected fingerprints associated with the block of data and in the fingerprint database have a specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, andremoving from the fingerprint database the one or more selected fingerprints associated with the block of data and in the fingerprint database that have the specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, wherein the one or more selected fingerprints that have the specific CP count that is less than or equal to the CP count are stale fingerprints.
1 Assignment
0 Petitions
Accused Products
Abstract
A system can efficiently removes ranges of entries from a flat sorted data structure that represent stale fingerprints As part of fingerprint verification during deduplication, the system performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. During the AIRC procedure, an inode associated with a data container is selected and the FBN tuple of each deleted data block in the file is sorted in a predefined FBN order. The AIRC procedure then identifies the most recent fingerprint associated with a deleted data block. The set of non-overlapping and latest CP ranges is then used to remove stale fingerprints associated with that deleted block from the fingerprint database. A single pass through the fingerprint database identifies the set of non-overlapping and latest CP ranges, thereby improving efficiency of the storage system.
-
Citations
20 Claims
-
1. A computer implemented method for removing one or more stale fingerprints from a fingerprint database the method comprising:
-
sorting a list of stale fingerprints stored in a stale fingerprint data structure into an inode number order; performing an attributes intersect range calculation on the sorted list that computes a non-overlapping and latest consistency point (CP) range, wherein the attributes intersect range calculation comprises determining a tuple that defines the non-overlapping and latest CP range, wherein the tuple includes a starting value, an ending value, and a CP count associated with a given inode; and comparing the non-overlapping and latest CP range with the fingerprint database wherein the comparing includes; determining whether an identifier, associated with a block of data, is located between the starting value and the ending value, in response to determining that the identifier is located between the starting value and the ending value, determining whether one or more selected fingerprints associated with the block of data and in the fingerprint database have a specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, and removing from the fingerprint database the one or more selected fingerprints associated with the block of data and in the fingerprint database that have the specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, wherein the one or more selected fingerprints that have the specific CP count that is less than or equal to the CP count are stale fingerprints. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A system comprising:
-
a processor; and a computer-readable medium comprising program instructions executable by the processor to cause the system to, perform an attributes intersection range calculation on a stale fingerprint data structure, wherein the program instructions to perform the attributes intersection range calculation comprise instructions executable to cause the system to determine a tuple that defines a non-overlapping and latest consistency point (CP) range including a starting value, an ending value, and a CP count associated with a given inode, compare a fingerprint database with the non-overlapping and latest CP range, wherein the program instructions to compare comprise program instructions executable by the processor to cause the system to; determine whether an identifier, associated with a block of data, is located between the starting value and the ending value, in response to determining that the identifier is located between the starting value and the ending value, determine whether one or more selected fingerprints associated with the block of data and in the fingerprint database have a specific CP count that is less than or equal to the CP count of the non-overlapping and latest CP range, and remove from the fingerprint database the one or more selected fingerprints associated with the block of data and in the fingerprint database that have the specific CP count that is less than or equal to the CP count of the non-overlapping and latest CP range, wherein the one or more selected fingerprints that have the specific CP count that is less than or equal to the CP count are stale fingerprints. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
-
-
20. A non-transitory computer readable storage medium containing executable program instructions for execution by a processor, the program instructions executable to:
-
sort a list of stale fingerprints into a predefined order, wherein the stale fingerprints are associated with file block numbers to which a specified operation has been performed; perform an attributes intersect range calculation on the sorted list, wherein the program instructions to perform the attributes intersect range calculation comprises program instructions executable to determine a tuple that defines a non-overlapping and latest consistency point (CP) range, wherein the tuple includes a starting value, an ending value, and a CP count associated with a given inode; and compare the non-overlapping and latest CP range with a fingerprint database, wherein the program instructions that compare comprise program instructions executable to; determine whether an identifier, associated with a block of data, is located between the starting value and the ending value, in response to determining that the identifier is located between the starting value and the ending value, determine whether one or more selected fingerprints associated with the block of data and in the fingerprint database have a specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, remove from the fingerprint database the one or more selected fingerprints associated with the block of data and in the fingerprint database that have the specific CP count that is less than or equal to the CP count contained in the non-overlapping and latest CP range, wherein the one or more selected fingerprints that have the specific CP count that is less than or equal to the CP count are stale fingerprints.
-
Specification