Methods and apparatus for deduplication in storage system
Abstract
In one implementation, a data storage system comprises host computers, a management terminal, and a storage system having a block interface for communicating with the host computers (clients). The storage system incorporates a deduplication capability that operates on chunks (divided storage areas). It maintains a threshold (upper limit) on the degree of deduplication, i.e., the number of virtual data items that may refer to one real data item, as specified by users or by management software. The storage system counts the number of links for each chunk and does not perform deduplication when the number of reduced data items (links) for a chunk exceeds the threshold, even if duplication is detected. In another implementation, the storage system additionally incorporates a data migration capability and migrates physical data to a high-reliability area, such as an area protected with double parity (e.g., RAID 6), when the deduplication level for a chunk exceeds the threshold.
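The threshold-limited behavior described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `Chunk`, `CappedDedupStore`, `write`, and `read` are hypothetical, SHA-256 is an assumed way to detect duplicates, and the threshold is interpreted as the maximum number of links one physical chunk may hold, so a new link is created only if the count would not exceed the threshold afterwards. Directing later duplicates to the newest physical copy is also an assumption of this sketch.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    data: bytes
    links: int = 0  # number of virtual chunks pointing at this physical chunk


@dataclass
class CappedDedupStore:
    threshold: int                               # assumed: max links per physical chunk
    chunks: list = field(default_factory=list)   # physical chunks
    by_hash: dict = field(default_factory=dict)  # digest -> index of newest copy
    virtual: dict = field(default_factory=dict)  # virtual address -> physical index

    def write(self, vaddr: int, data: bytes) -> int:
        # Release the link held by whatever was previously written here.
        # (Reclaiming chunks whose link count drops to zero is omitted.)
        old = self.virtual.pop(vaddr, None)
        if old is not None:
            self.chunks[old].links -= 1

        digest = hashlib.sha256(data).digest()
        idx = self.by_hash.get(digest)
        is_duplicate = idx is not None and self.chunks[idx].data == data

        # Deduplicate only while the existing chunk is below the threshold;
        # otherwise store a new physical copy even though duplication
        # was detected.
        if not (is_duplicate and self.chunks[idx].links < self.threshold):
            self.chunks.append(Chunk(data))
            idx = len(self.chunks) - 1
            self.by_hash[digest] = idx

        self.chunks[idx].links += 1
        self.virtual[vaddr] = idx
        return idx

    def read(self, vaddr: int) -> bytes:
        return self.chunks[self.virtual[vaddr]].data
```

Under these assumptions, writing the same block to five virtual addresses with `threshold=2` leaves three physical chunks with link counts 2, 2, and 1, so the loss of any single physical chunk affects at most two virtual chunks.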
21 Claims
1. A computerized data storage system comprising:
a. At least one host computer;
b. A management terminal; and
c. A storage system comprising:
i. An interface operable to communicate with the at least one host computer;
ii. A storage device comprising a plurality of data objects; and
iii. A deduplication controller operable to perform a deduplication of data stored in the storage device, wherein the deduplication controller maintains a threshold with respect to allowed degree of deduplication, counts a number of links for each data object and does not perform deduplication when the counted number of links for the data object exceeds the threshold even if duplication is detected;
wherein performing a deduplication comprises creating a plurality of links pointing from a plurality of virtual objects to one data object;
wherein the data object is stored without performing deduplication when the counted number of links for the data object exceeds the threshold even if duplication is detected.
6. A computerized data storage system comprising:
a. At least one host computer;
b. A management terminal; and
c. A storage system comprising:
i. An interface operable to communicate with the at least one host computer;
ii. A normal reliability storage area;
iii. A high reliability storage area;
iv. A data migration controller operable to migrate data between the normal reliability storage area and the high reliability storage area; and
v. A deduplication controller operable to perform deduplication of data stored in the normal reliability storage area or the high reliability data storage area, wherein the deduplication controller maintains a threshold with respect to allowed degree of deduplication and counts a number of links for each object; and
wherein the deduplication controller is operable to cause the data migration controller to migrate a data object to the high reliability storage area when the counted number of links for the data object exceeds the threshold;
wherein performing a deduplication comprises creating a plurality of links pointing from a plurality of virtual objects to one data object.
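As an illustration of the migration variant recited in claim 6, the following Python sketch always deduplicates and moves a data object to the high reliability area once its link count exceeds the threshold. All names (`Area`, `DataObject`, `MigratingDedupStore`, `migrate`) are hypothetical, duplicates are detected by comparing content directly, and migration is modeled as a simple field change rather than an actual copy between storage areas. Overwrites of a virtual address are not handled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Area(Enum):
    NORMAL = "normal"  # normal reliability storage area
    HIGH = "high"      # high reliability area, e.g. double parity


@dataclass
class DataObject:
    data: bytes
    links: int = 0
    area: Area = Area.NORMAL


@dataclass
class MigratingDedupStore:
    threshold: int
    objects: dict = field(default_factory=dict)  # content -> DataObject
    virtual: dict = field(default_factory=dict)  # virtual address -> DataObject

    def migrate(self, obj: DataObject, target: Area) -> None:
        # Stand-in for the data migration controller; a real system
        # would copy the data between storage areas here.
        obj.area = target

    def write(self, vaddr: int, data: bytes) -> DataObject:
        # Deduplication is always performed: identical content shares
        # one data object, and each virtual object adds one link.
        obj = self.objects.get(data)
        if obj is None:
            obj = DataObject(data)
            self.objects[data] = obj
        obj.links += 1
        self.virtual[vaddr] = obj

        # Once the counted number of links exceeds the threshold,
        # protect the heavily shared object in the high reliability area.
        if obj.links > self.threshold and obj.area is Area.NORMAL:
            self.migrate(obj, Area.HIGH)
        return obj
```

The design trade-off in this sketch is that space savings are never given up; instead, objects shared by many virtual objects receive stronger protection.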
12. A method performed by a storage system comprising an interface operable to communicate with at least one host computer and at least one storage device comprising a plurality of data objects, the method comprising:
a. Determining whether a first data is duplicated in at least one duplicate data object;
b. Maintaining a threshold with respect to allowed degree of deduplication;
c. Counting a number of links for the at least one duplicate data object;
d. If the first data is duplicated in the at least one duplicate data object and if the counted number of links does not exceed the threshold, performing deduplication of the data in the at least one duplicate data object; and
e. If the counted number of links exceeds the threshold, not performing the deduplication of the data in the at least one duplicate data object;
wherein performing a deduplication comprises creating a plurality of links pointing from a plurality of virtual objects to one data object;
wherein the at least one duplicate data object is stored without performing deduplication when the counted number of links for the data object exceeds the threshold even if deduplication is detected.
17. A method performed by a storage system comprising an interface operable to communicate with at least one host computer and at least one storage device comprising a plurality of data objects, the method comprising:
a. Determining whether a first data is duplicated in at least one duplicate data object of the plurality of data objects;
b. Maintaining a threshold with respect to allowed degree of deduplication;
c. Counting a number of links for the at least one duplicate data object;
d. If the first data is duplicated in the at least one duplicate data object, performing deduplication of the data in the at least one duplicate data object; and
e. If the counted number of links exceeds the threshold, migrating the at least one duplicate data object to a high reliability storage area;
wherein performing a deduplication comprises creating a plurality of links pointing from a plurality of virtual objects to the at least one duplicate data object.
Specification