Policy based tiered data deduplication strategy
First Claim
1. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
defining a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classifying the data object within a selected data storage policy;
dividing the data object into a plurality of data chunks, each data chunk having reference count data to track a number of references thereto;
storing each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
performing deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object;
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
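The three per-chunk outcomes recited in the claim can be sketched as a single decision function. This is only an illustrative reading, not the patented implementation: the names `StoredCopy` and `place_chunk` are hypothetical, and the sketch assumes that "initializing" the reference count means starting it at 1. Note that the first and third outcomes (no identical copy exists, or every identical copy is at the maximum) both end in storing a fresh copy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoredCopy:
    """One physically stored copy of a chunk (hypothetical structure)."""
    data: bytes
    ref_count: int = 1  # assumption: an initialized count starts at 1


def place_chunk(copies: List[StoredCopy], data: bytes, max_refs: int) -> Tuple[str, StoredCopy]:
    """Decide where a chunk goes, given the previously stored identical copies.

    `copies` holds every identical copy already in the pool; it is
    extended in place when a new copy has to be stored.
    """
    # Outcome 2: an identical copy exists and is below the policy maximum,
    # so update its reference count and point to it.
    for copy in copies:
        if copy.ref_count < max_refs:
            copy.ref_count += 1
            return "referenced", copy
    # Outcomes 1 and 3: no identical copy exists, or every identical copy
    # has reached the maximum, so store a new copy with a fresh count.
    new_copy = StoredCopy(data)
    copies.append(new_copy)
    return "stored", new_copy


copies: List[StoredCopy] = []
actions = [place_chunk(copies, b"chunk", max_refs=2)[0] for _ in range(5)]
print(actions)
print([c.ref_count for c in copies])
```

With a maximum of 2, five writes of the same chunk alternate between storing and referencing, so no single stored copy ever carries more than two references.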
Abstract
The present invention provides for a method, system, and computer program for the application of data deduplication according to a policy-based strategy of tiered data. The method operates by defining a plurality of data storage policies for data in a deduplication system, policies which may be arranged in tiers. Data objects are classified according to a selected data storage policy and are split into data chunks. If the selected data storage policy for the data object does not allow deduplication, the data chunks are stored in a deduplication pool. If the selected data storage policy for the data object allows deduplication, deduplication is performed. The data storage policy may specify a maximum number of references to data chunks, facilitating storage of new copies of the data chunks when the maximum number of references is met.
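As a rough end-to-end illustration of the flow summarized above, the following sketch defines tiered policies, splits an object into chunks, and either stores or deduplicates each chunk. Several details are assumptions that the abstract does not specify: the tier names and limits in `POLICIES`, the fixed chunk size, the use of SHA-256 digests to find identical chunks, and the choice to let copies stored under a non-deduplicating policy be referenced later by deduplicating ones. All class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Policy:
    name: str
    allow_dedup: bool
    max_refs: int = 1  # only consulted when allow_dedup is True


@dataclass
class Copy:
    data: bytes
    refs: int = 1  # assumption: a newly stored copy starts with one reference


# Hypothetical tiers; the real settings would come from the storage administrator.
POLICIES = {
    "critical": Policy("critical", allow_dedup=False),
    "standard": Policy("standard", allow_dedup=True, max_refs=2),
    "archive": Policy("archive", allow_dedup=True, max_refs=1000),
}


class DedupPool:
    def __init__(self) -> None:
        # digest -> every stored copy of the chunk with that digest
        self.copies: Dict[str, List[Copy]] = {}

    def store(self, data: bytes, policy: Policy, chunk_size: int = 4) -> List[Copy]:
        """Store one data object; return the copy each chunk points to."""
        pointers = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            # Assumption: identical chunks are detected by SHA-256 digest.
            same = self.copies.setdefault(hashlib.sha256(chunk).hexdigest(), [])
            target = None
            if policy.allow_dedup:
                # Reuse an identical copy only while it is under the policy maximum.
                target = next((c for c in same if c.refs < policy.max_refs), None)
            if target is None:
                target = Copy(chunk)  # store a new copy of the chunk
                same.append(target)
            else:
                target.refs += 1      # reference the existing copy
            pointers.append(target)
        return pointers

    def physical_chunks(self) -> int:
        return sum(len(group) for group in self.copies.values())


pool = DedupPool()
for _ in range(3):
    pool.store(b"AAAABBBB", POLICIES["archive"])
print(pool.physical_chunks())   # archive tier shares one copy per chunk
pool.store(b"AAAABBBB", POLICIES["critical"])
print(pool.physical_chunks())   # critical tier always writes its own copies
```

A lower `max_refs` trades storage efficiency for redundancy: fewer objects depend on any one stored copy, so the loss of a single chunk affects fewer objects.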
25 Claims
1. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
defining a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classifying the data object within a selected data storage policy;
dividing the data object into a plurality of data chunks, each data chunk having reference count data to track a number of references thereto;
storing each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
performing deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object;
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
defining a first data storage policy for a deduplication pool;
classifying a first data object within the first data storage policy;
dividing the first data object into a plurality of data chunks, each data chunk having reference count data to track a number of references by storage policy thereto;
storing each data chunk of the first data object in the deduplication pool in accordance with the first data storage policy;
defining a second data storage policy for the deduplication pool, the second data storage policy including a maximum reference count for data chunks;
classifying a second data object within the second data storage policy;
dividing the second data object into a plurality of data chunks, each data chunk having reference count data to track a number of references by storage policy thereto; and
performing deduplication on the second data object in accordance with the second data storage policy, including for each data chunk of the second data object;
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a reference to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the second data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the second data storage policy maximum reference count.
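Claim 10 has reference count data that tracks references "by storage policy". One possible reading, sketched below, keeps a separate counter per policy on each stored chunk and compares the total across policies against the second policy's maximum. Both choices are assumptions, since the claim text does not say whether the limit applies to the total or to a per-policy count. The sketch also assumes the first policy stores without deduplicating (modeled as `max_refs=None`) and finds identical chunks by direct byte comparison. `StoredChunk` and `store_chunk` are hypothetical names.

```python
from collections import Counter
from typing import List, Optional


class StoredChunk:
    """A stored chunk whose references are counted per storage policy."""

    def __init__(self, data: bytes, policy: str) -> None:
        self.data = data
        # Assumption: the storing policy holds the first reference.
        self.refs_by_policy = Counter({policy: 1})

    def total_refs(self) -> int:
        return sum(self.refs_by_policy.values())


def store_chunk(pool: List[StoredChunk], data: bytes, policy: str,
                max_refs: Optional[int]) -> StoredChunk:
    """Store or reference one chunk; max_refs=None means 'do not deduplicate'."""
    if max_refs is not None:
        for chunk in pool:
            # Assumption: the limit is checked against the total across policies.
            if chunk.data == data and chunk.total_refs() < max_refs:
                chunk.refs_by_policy[policy] += 1
                return chunk
    chunk = StoredChunk(data, policy)
    pool.append(chunk)
    return chunk


pool: List[StoredChunk] = []
first = store_chunk(pool, b"x", "first", max_refs=None)   # stored under the first policy
shared = store_chunk(pool, b"x", "second", max_refs=3)    # references the existing copy
print(shared is first, dict(first.refs_by_policy))
```

Keeping the counts separate per policy makes it possible to see which tiers depend on a given stored chunk, which a single integer count would not show.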
11. A method for applying a deduplication strategy to a data object based on a data storage policy, comprising:
performing deduplication on the data object if the data storage policy associated with the data object allows deduplication, including for each data chunk of the data object;
updating reference count data of and creating a reference to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk has a reference count less than a maximum reference count of the data storage policy, and
storing the data chunk and initializing the reference count data of the data chunk if each previously stored identical copy of the data chunk contains a reference count equal to or greater than the maximum reference count of the data storage policy.
12. A computer program product comprising a computer useable medium having a computer readable program for applying a deduplication strategy to a data object based on a data storage policy, wherein the computer readable program when executed on a computer causes the computer to:
define a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classify the data object within a selected data storage policy;
divide the data object into a plurality of data chunks, each data chunk having reference count data to track a number of references thereto;
store each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
perform deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object;
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
Dependent claims: 13, 14, 15, 16, 17, 18.
19. A system, comprising:
at least one processor; and
at least one memory storing instructions operable with the at least one processor for applying a deduplication strategy to a data object based on a data storage policy, the instructions being executed for;
defining a plurality of data storage policies for a deduplication pool, each data storage policy containing settings including a maximum reference count for data chunks;
classifying the data object within a selected data storage policy;
dividing the data object into a plurality of data chunks, each data chunk having a reference count data to track a number of references thereto;
storing each data chunk of the data object in the deduplication pool if the selected data storage policy does not allow deduplication of the data object; and
performing deduplication on the data object if the selected data storage policy allows deduplication of the data object, including for each data chunk of the data object;
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if a previously stored identical copy of the data chunk does not exist in the deduplication pool,
updating the reference count data of and creating a pointer to a previously stored identical copy of the data chunk if the previously stored identical copy of the data chunk exists in the deduplication pool and has a reference count less than the selected data storage policy maximum reference count, and
initializing the data chunk reference count data and storing the data chunk in the deduplication pool if each previously stored identical copy of the data chunk existing within the deduplication pool contains a reference count equal to or greater than the selected data storage policy maximum reference count.
Dependent claims: 20, 21, 22, 23, 24, 25.
Specification