Computer storage deduplication
First Claim
1. A data center comprising:
a plurality of host computers including a first host computer; and
a storage system external to and accessible by the plurality of host computers, wherein the storage system includes a plurality of storage blocks, a hash table, a write log and a merge log stored therein;
wherein each storage block in the plurality of storage blocks stores a data block and a reference count indicating a number of references in the storage system to the data block;
wherein the hash table contains hashes corresponding to used storage blocks, wherein a used storage block is a storage block with a reference count greater than zero;
wherein the write log contains write records, wherein each write record includes a reference to a storage block storing a data block written by the first host computer and a hash for the written data block; and
wherein the merge log is configured to store one or more merge requests;
wherein the first host computer is configured to:
retrieve one of the hashes from one of the write records in the write log;
determine a match between the retrieved hash and one of the hashes in the hash table for a used storage block other than the storage block storing the written data block corresponding to the retrieved hash;
determine that one of the plurality of host computers other than the first host computer has exclusive access to the storage block corresponding to the matching hash in the hash table, the other host having exclusive access by having a lock on a file containing the storage block; and
store a merge request in the merge log instead of performing a deduplication of the written data block and continue with deduplication operations on other files accessible to the first host computer, wherein the other host computer discovers the merge request stored in the merge log and based on the stored merge request performs the deduplication of the written data block by increasing the reference count for the storage block matching the hash in the hash table and freeing for reuse by the storage system the storage block containing the written data block.
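The first host's behavior recited above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `StorageBlock`, `Storage`, `write_block`, `deduplicate`, the `file_id` field, and the `locks` map are all hypothetical, SHA-256 stands in for whatever hash the system actually uses, and the repointing of file metadata from the freed block to the kept block is omitted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StorageBlock:
    data: bytes
    ref_count: int      # greater than zero means the block is "used"
    file_id: str        # hypothetical: the file that contains this block


@dataclass
class Storage:
    blocks: dict = field(default_factory=dict)      # block id -> StorageBlock
    hash_table: dict = field(default_factory=dict)  # hash -> used block id
    write_log: list = field(default_factory=list)   # (block id, hash) write records
    merge_log: list = field(default_factory=list)   # (duplicate id, target id) requests
    locks: dict = field(default_factory=dict)       # file id -> host holding the lock


def write_block(storage, block_id, file_id, data):
    """Store a data block and append a write record carrying its hash."""
    digest = hashlib.sha256(data).hexdigest()
    storage.blocks[block_id] = StorageBlock(data, 1, file_id)
    storage.write_log.append((block_id, digest))


def deduplicate(storage, host):
    """Process the write log on behalf of `host`."""
    while storage.write_log:
        block_id, digest = storage.write_log.pop(0)
        match = storage.hash_table.get(digest)
        if match is None or match == block_id:
            # No other used block has this hash: record it and move on.
            storage.hash_table[digest] = block_id
            continue
        owner = storage.locks.get(storage.blocks[match].file_id)
        if owner is not None and owner != host:
            # Another host holds the lock: defer the merge instead of waiting.
            storage.merge_log.append((block_id, match))
            continue
        # No conflicting lock: merge directly.
        storage.blocks[match].ref_count += 1
        storage.blocks[block_id].ref_count = 0  # freed for reuse


# A used block in file_b already exists, and host2 holds file_b exclusively.
storage = Storage()
existing = b"same bytes"
storage.blocks[1] = StorageBlock(existing, 1, "file_b")
storage.hash_table[hashlib.sha256(existing).hexdigest()] = 1
storage.locks["file_b"] = "host2"

# host1 writes an identical block to file_a, then runs deduplication.
write_block(storage, 2, "file_a", b"same bytes")
deduplicate(storage, "host1")
print(storage.merge_log)  # [(2, 1)]
```

Because host2 holds the lock, host1 leaves both blocks untouched and records the request `(2, 1)`; it is then free to continue with other write records rather than blocking on the lock.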
Abstract
A data center comprising plural computer hosts and a storage system external to said hosts is disclosed. The storage system includes storage blocks for storing tangibly encoded data blocks. Each of said hosts includes a deduplicating file system for identifying and merging identical data blocks stored in respective storage blocks into one of said storage blocks so that a first file exclusively accessed by a first host of said hosts and a second file accessed exclusively by a second host of said hosts concurrently refer to the same one of said storage blocks.
14 Claims
8. A method for performing a deduplication operation in a storage system connected to a plurality of host computers including a first host computer, the storage system including a plurality of storage blocks, a hash table, a write log, and a merge log stored therein,
wherein each storage block in the plurality of storage blocks stores a data block and a reference count indicating a number of references in the storage system to the data block,
wherein the hash table contains hashes corresponding to used storage blocks, wherein a used storage block is a storage block with a reference count greater than zero,
wherein the write log contains write records, wherein each write record includes a reference to a storage block storing a data block written by the first host computer and a hash for the written data block, and
wherein the merge log is configured to store one or more merge requests;
the method comprising:
retrieving by the first host computer one of the hashes from one of the write records in the write log;
determining a match between the retrieved hash and one of the hashes in the hash table for a used storage block other than the storage block storing the written data block corresponding to the retrieved hash;
determining that one of the plurality of host computers other than the first host computer has exclusive access to the storage block corresponding to the matching hash in the hash table, the other host having exclusive access by having a lock on a file containing the storage block; and
storing a merge request in the merge log instead of performing a deduplication of the written data block and continuing with deduplication operations on other files accessible to the first host computer, wherein the other host computer discovers the merge request stored in the merge log and based on the stored merge request performs the deduplication of the written data block by increasing the reference count for the storage block matching the hash in the hash table and freeing for reuse by the storage system the storage block containing the written data block.
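The final step of the method, in which the other host discovers a merge request and performs the deferred deduplication, might look roughly like the sketch below. It is an assumption-laden illustration rather than the claimed implementation: `Block`, `apply_merge_requests`, the `locks` map, and the `free_list` are hypothetical names, merge requests are modeled as `(duplicate id, target id)` pairs, and updating file metadata to point at the kept block is left out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    ref_count: int   # greater than zero means the block is "used"
    file_id: str     # hypothetical: the file that contains this block


def apply_merge_requests(blocks, merge_log, locks, host, free_list):
    """Perform the merges `host` holds the lock for; return the requests left over."""
    remaining = []
    for duplicate_id, target_id in merge_log:
        if locks.get(blocks[target_id].file_id) != host:
            remaining.append((duplicate_id, target_id))  # belongs to another host
            continue
        blocks[target_id].ref_count += 1     # one more reference to the kept block
        blocks[duplicate_id].ref_count = 0   # the duplicate is no longer used
        free_list.append(duplicate_id)       # available for reuse by the storage system
    return remaining


blocks = {
    1: Block(1, "file_b"),
    2: Block(1, "file_a"),
    3: Block(1, "file_c"),
    4: Block(1, "file_a"),
}
locks = {"file_b": "host2", "file_c": "host3"}
merge_log = [(2, 1), (4, 3)]
free_list = []

merge_log = apply_merge_requests(blocks, merge_log, locks, "host2", free_list)
print(merge_log)   # [(4, 3)]
print(free_list)   # [2]
```

Here host2 handles only the request whose target block lies in the file it has locked; the request targeting host3's file stays in the log until host3 discovers it.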
Specification