Computer storage deduplication
First Claim
1. A method of performing deduplication operations in a computer system having multiple host systems connected to a common storage system, the method comprising:
- maintaining a hierarchical data structure including a low-level data structure and one or more higher level data structures, each containing hash values and being maintained as a sorted data structure according to the hash values,wherein the low-level data structure contains hash values of storage blocks and the one or more higher level data structures contain some of, but not all of, the hash values in the low-level data structure that are to be used when locating hash values in the low-level data structure or adding new hash values into the low-level data structure; and
at each host system, tracking write operations to the common storage system during a period of time and asynchronously performing deduplication operations on storage blocks that are written in connection with the write operations using the hierarchical data structure,wherein the low-level data structure is divided into pages including a first page and a second page, and one of the higher-level data structures that is at a next higher level than the low-level data structure contains a first hash value contained in the first page, a first pointer to the first page, a second hash value contained in the second page, and a second pointer to the second page.
2 Assignments
0 Petitions
Accused Products
Abstract
Decentralized deduplication operations in a computer system employ a hash index that is a variant of a B+ tree to support both efficient sequential updates as well as efficient random updates. Sequential update is selected when deduplication is infrequently performed, such as on the order of days, and random update is selected when deduplication is performed more frequently, such as on the order of seconds. More frequent deduplication may be beneficial during periods when large amounts of temporary duplicate data are created, and the system may not have enough storage space to accommodate the temporary spike in demand.
59 Citations
17 Claims
-
1. A method of performing deduplication operations in a computer system having multiple host systems connected to a common storage system, the method comprising:
-
maintaining a hierarchical data structure including a low-level data structure and one or more higher level data structures, each containing hash values and being maintained as a sorted data structure according to the hash values, wherein the low-level data structure contains hash values of storage blocks and the one or more higher level data structures contain some of, but not all of, the hash values in the low-level data structure that are to be used when locating hash values in the low-level data structure or adding new hash values into the low-level data structure; and at each host system, tracking write operations to the common storage system during a period of time and asynchronously performing deduplication operations on storage blocks that are written in connection with the write operations using the hierarchical data structure, wherein the low-level data structure is divided into pages including a first page and a second page, and one of the higher-level data structures that is at a next higher level than the low-level data structure contains a first hash value contained in the first page, a first pointer to the first page, a second hash value contained in the second page, and a second pointer to the second page. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A non-transitory computer readable storage medium having stored therein instructions for causing a computer system to maintain a hierarchical data structure including a low-level data structure and one or more higher level data structures, each containing hash values and being maintained as a sorted data structure according to the hash values, and to track write operations to a storage system during a period of time and asynchronously perform deduplication operations on storage blocks that are written in connection with the write operations using the hierarchical data structure,
wherein the low-level data structure contains hash values of storage blocks and the one or more higher level data structures contain some of, but not all of, the hash values in the low-level data structure that are to be used when locating hash values in the low-level data structure or adding new hash values into the low-level data structure, and wherein the low-level data structure is divided into pages including a first page and a second page, and one of the higher-level data structures that is at a next higher level than the low-level data structure contains a first hash value contained in the first page, a first pointer to the first page, a second hash value contained in the second page, and a second pointer to the second page.
-
16. A computer system comprising:
-
a plurality of host systems connected to a common storage system, wherein each host system is programmed to track, via one or more processors, write operations to the common storage system during a period of time and asynchronously perform deduplication operations on storage blocks that are written in connection with the write operations using a hierarchical data structure that is stored in the common storage system, wherein the hierarchical data structure includes a low-level data structure and one or more higher level data structures, each containing hash values and being maintained as a sorted data structure according to the hash values, the low-level data structure containing hash values of storage blocks and the one or more higher level data structures containing some of, but not all of, the hash values in the low-level data structure that are to be used when locating hash values in the low-level data structure or adding new hash values into the low-level data structure, and wherein the low-level data structure is divided into pages including a first page and a second page, and one of the higher-level data structures that is at a next higher level than the low-level data structure contains a first hash value contained in the first page, a first pointer to the first page, a second hash value contained in the second page, and a second pointer to the second page. - View Dependent Claims (17)
-
Specification