Integrated approach for deduplicating data in a distributed environment that involves a source and a target
Abstract
One aspect of the present invention includes a configuration of a storage management system that enables the performance of deduplication activities at both the client (source) and at the server (target) locations. The location of deduplication operations can then be optimized based on system conditions or predefined policies. In one embodiment, seamless switching of deduplication activities between the client and the server is enabled by utilizing uniform deduplication process algorithms and accessing the same deduplication index (containing information on the hashed data chunks). Additionally, any data transformations on the chunks are performed subsequent to identification of the data chunks. Accordingly, with use of this storage configuration, the storage system can find and utilize matching chunks generated with either client- or server-side deduplication.
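The role of the shared index described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `SharedIndex` class, the `identify_chunks` helper, the tiny fixed chunk size (used here in place of fingerprint-based chunk identification), SHA-256 as the hash, and zlib compression as the example data transformation. The point it tries to show is that when both sides identify and hash raw chunks the same way, and transform them only afterwards, chunks stored by one side can be matched by the other.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def identify_chunks(data: bytes) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks and hash each raw (untransformed) chunk."""
    pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [(hashlib.sha256(p).hexdigest(), p) for p in pieces]


class SharedIndex:
    """Hypothetical index shared by client and server: chunk hash -> stored chunk."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}

    def add(self, data: bytes) -> int:
        """Store chunks not yet indexed and return how many were new."""
        new_chunks = 0
        for digest, raw in identify_chunks(data):
            if digest not in self.entries:
                # The transformation (compression) happens only after the
                # chunk has been identified and hashed in its raw form.
                self.entries[digest] = zlib.compress(raw)
                new_chunks += 1
        return new_chunks


index = SharedIndex()
client_new = index.add(b"AAAABBBBCCCC")  # deduplicated on the client side
server_new = index.add(b"BBBBCCCCDDDD")  # deduplicated on the server side
```

In this run the client-side pass stores three chunks, while the server-side pass stores only one, because two of its chunks were already indexed by the client-side pass.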
18 Claims
1. A method for deduplicating a data file at each of a source and a target location in a distributed storage management system, the storage management system containing a source computing system connected to a target computing system and a target data store located within the target computing system, the method comprising:
maintaining, by use of a processor, a shared index for tracking deduplicated data chunks stored within the target data store, wherein the shared index is accessed by the source computing system and the target computing system;
deduplicating a data file that is located at the source computing system into a set of deduplicated data chunks with the source computing system and transmitting the data file to the target system as a result of determining that the data file satisfies a policy;
transmitting another data file to the target computing system and deduplicating the other data file into the set of deduplicated data chunks with the target computing system as a result of determining that the other data file does not satisfy the policy, wherein the policy is not satisfied if the data file contains sensitive data;
wherein deduplicating at the source computer comprises fingerprinting and hashing the data chunks with a first set of fingerprinting and hashing algorithms on the source computing system if the data file satisfies the policy;
wherein deduplicating at the target computing system comprises fingerprinting and hashing the data chunks with a second set of fingerprinting and hashing algorithms on the target computing system if the data file does not satisfy the policy;
storing the set of deduplicated data chunks within the target data store; and
updating deduplication information for the set of deduplicated data chunks within the shared index.
10. A method of enabling deduplication of a data file at a selected source or target location in a distributed storage management system, the storage management system containing a source computing system connected to a target computing system and a target data store located within the target computing system, the method comprising:
tracking, by use of a processor, deduplication information for deduplicated data chunks stored within the target data store with a shared index, wherein the shared index is accessed by the source computing system and the target computing system;
deduplicating the data file into a set of deduplicated data chunks at the source computing system and transmitting the data file to the target system as a result of determining that the data file satisfies a policy;
transmitting another data file to the target computing system and deduplicating the other data file into the set of deduplicated data chunks with the target computing system as a result of determining that the other data file does not satisfy the policy, wherein the policy is not satisfied if the data file contains sensitive data;
wherein deduplicating at the source computer comprises fingerprinting and hashing the data chunks with a first set of fingerprinting and hashing algorithms on the source computing system if the data file satisfies the policy;
wherein deduplicating at the target computing system comprises fingerprinting and hashing the data chunks with a second set of fingerprinting and hashing algorithms on the target computing system if the data file does not satisfy the policy; and
updating the tracked deduplication information for the data file and the other data file.
12. A storage management system, comprising:
a source computing system;
a target computing system connected to the source computing system;
a target data store located within the target computing system;
at least one processor within the storage management system;
at least one memory within the storage management system storing instructions operable with the at least one processor for enabling deduplication of a data file at each of a source and a target location in the storage management system, the instructions being executed for:
maintaining a shared index for tracking deduplicated data chunks stored within the target data store, wherein the shared index is accessed by the source computing system and the target computing system;
deduplicating a data file that is located at the source computing system into a set of deduplicated data chunks with the source computing system and transmitting the data file to the target system as a result of determining that the data file satisfies a policy;
transmitting another data file to the target computing system and deduplicating the other data file into the set of deduplicated data chunks with the target computing system as a result of determining that the other data file does not satisfy the policy, wherein the policy is not satisfied if the data file contains sensitive data;
wherein deduplicating at the source computer comprises fingerprinting and hashing the data chunks with a first set of fingerprinting and hashing algorithms on the source computing system if the data file satisfies the policy;
wherein deduplicating at the target computing system comprises fingerprinting and hashing the data chunks with a second set of fingerprinting and hashing algorithms on the target computing system if the data file does not satisfy the policy;
storing the set of deduplicated data chunks within the target data store; and
updating deduplication information for the set of deduplicated data chunks within the shared index.
17. A storage management system, comprising:
a source computing system;
a target computing system connected to the source computing system;
a target data store located within the target computing system;
at least one processor within the storage management system;
at least one memory within the storage management system storing instructions operable with the at least one processor for enabling deduplication of a data file at a selected source or target location in the storage management system, the instructions being executed for:
tracking deduplication information for deduplicated data chunks stored within the target data store with a shared index, wherein the shared index is accessed by the source computing system and the target computing system;
deduplicating the data file into a set of deduplicated data chunks at the source computing system and transmitting the data file to the target system as a result of determining that the data file satisfies a policy;
transmitting another data file to the target computing system and deduplicating the other data file into the set of deduplicated data chunks with the target computing system as a result of determining that the other data file does not satisfy the policy, wherein the policy is not satisfied if the data file contains sensitive data;
wherein deduplicating at the source computer comprises fingerprinting and hashing the data chunks with a first set of fingerprinting and hashing algorithms on the source computing system if the data file satisfies the policy; and
wherein deduplicating at the target computing system comprises fingerprinting and hashing the data chunks with a second set of fingerprinting and hashing algorithms on the target computing system if the data file does not satisfy the policy; and
updating the tracked deduplication information for the data file and the other data file.
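The policy-driven choice of deduplication location recited in the independent claims can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the names `DataFile`, `Target`, `satisfies_policy`, `chunk_and_hash` and `back_up`, the `sensitive` flag, the fixed 4-byte chunks and the SHA-256 hash are all hypothetical, and a single chunking and hashing routine stands in for both the first and second sets of fingerprinting and hashing algorithms.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK = 4  # hypothetical chunk size, kept tiny for readability


@dataclass
class DataFile:
    content: bytes
    sensitive: bool = False  # hypothetical marker for "contains sensitive data"


@dataclass
class Target:
    index: dict = field(default_factory=dict)  # shared index: hash -> reference count
    store: dict = field(default_factory=dict)  # target data store: hash -> chunk bytes

    def add_chunk(self, digest: str, chunk: bytes) -> None:
        self.store.setdefault(digest, chunk)  # keep only one copy of each chunk
        self.index[digest] = self.index.get(digest, 0) + 1


def chunk_and_hash(data: bytes):
    """Yield (hash, chunk) pairs; stands in for fingerprinting and hashing."""
    for i in range(0, len(data), CHUNK):
        piece = data[i:i + CHUNK]
        yield hashlib.sha256(piece).hexdigest(), piece


def satisfies_policy(f: DataFile) -> bool:
    # Per the claims, the policy is not satisfied if the file contains sensitive data.
    return not f.sensitive


def back_up(f: DataFile, target: Target) -> tuple[str, int]:
    """Return (where deduplication ran, number of content bytes sent to the target)."""
    if satisfies_policy(f):
        sent = 0
        for digest, piece in chunk_and_hash(f.content):  # work done at the source
            if digest not in target.index:  # source consults the shared index
                sent += len(piece)  # only chunks unknown to the index are sent
            target.add_chunk(digest, piece)
        return "source", sent
    # Policy not satisfied: the whole file is sent and the target deduplicates it.
    for digest, piece in chunk_and_hash(f.content):
        target.add_chunk(digest, piece)
    return "target", len(f.content)


target = Target()
first = back_up(DataFile(b"AAAABBBB"), target)
second = back_up(DataFile(b"AAAACCCC"), target)
third = back_up(DataFile(b"AAAACCCC", sensitive=True), target)
```

Here `first` and `second` are deduplicated at the source, and `second` sends only the one chunk the index has not seen. `third` is sent in full and deduplicated at the target, yet it adds no new chunks to the store because both paths update the same index.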
Specification