Selective deduplication
Abstract
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
232 Citations
27 Claims
1. A method comprising:
- determining in a storage system a probability of deduplication for a data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication;
- determining in the storage system a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the storage system and adjusted based on availability of resources of the storage system and recent performance of the storage system relative to the performance metric;
- determining in the storage system whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and
- performing a deduplication operation on the data object in the storage system prior to the data object being stored in a persistent storage of the storage system in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
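The four steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the file-type lookup table, the latency-based threshold formula, its weights, and every name below (`DataObject`, `HISTORICAL_HIT_RATE`, `dedup_probability`, `dedup_threshold`, `Store`) are assumptions chosen for illustration only. The claim does not specify which characteristic or performance metric is used.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    file_type: str  # assumed "characteristic"; the claim does not name one
    payload: bytes


# Hypothetical statistics: fraction of past objects of each type that
# saved space when deduplicated.
HISTORICAL_HIT_RATE = {"vmdk": 0.85, "docx": 0.60, "jpg": 0.05}


def dedup_probability(obj: DataObject) -> float:
    """Step 1: statistical projection of a space benefit from deduplication."""
    return HISTORICAL_HIT_RATE.get(obj.file_type, 0.5)


def dedup_threshold(target_latency_ms: float,
                    recent_latency_ms: float,
                    cpu_free_fraction: float) -> float:
    """Step 2: threshold from a performance metric, adjusted by resources
    and recent performance. The formula and weights are illustrative."""
    latency_pressure = recent_latency_ms / target_latency_ms - 1.0
    resource_slack = cpu_free_fraction - 0.5
    raw = 0.5 + 0.5 * latency_pressure - 0.3 * resource_slack
    return min(1.0, max(0.0, raw))


class Store:
    def __init__(self):
        self.blocks = {}   # fingerprint -> payload (deduplicated data)
        self.objects = {}  # name -> ("ref", fingerprint) or ("raw", payload)

    def write(self, obj: DataObject, threshold: float) -> str:
        # Step 3: compare the probability with the threshold.
        if dedup_probability(obj) >= threshold:
            # Step 4: deduplicate before the object reaches persistent storage.
            fp = hashlib.sha256(obj.payload).hexdigest()
            self.blocks.setdefault(fp, obj.payload)
            self.objects[obj.name] = ("ref", fp)
            return "inline-dedup"
        self.objects[obj.name] = ("raw", obj.payload)
        return "stored-raw"
```

In this sketch, a system running at its latency target with half its CPU free gets a threshold of 0.5, so a `vmdk` object is deduplicated inline while a `jpg` object is written as-is. When recent latency is double the target, the threshold rises to 1.0 and no object qualifies for inline deduplication.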
13. A data storage system comprising:
- a processor; and
- a memory coupled with the processor and including a storage manager that directs the processor to:
  - determine a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the data storage system and adjusted based on availability of resources of the data storage system and recent performance of the data storage system relative to the performance metric;
  - determine, prior to a data object being stored in a persistent storage, a probability of deduplication for the data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication;
  - determine whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and
  - perform a deduplication operation on the data object prior to the data object being stored in the persistent storage in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. A method of performing selective deduplication comprising:
- adaptively adjusting a deduplication probability threshold based on availability of resources in a data storage system and recent performance of the data storage system relative to a performance metric for the data storage system, the deduplication probability threshold determined based on the performance metric;
- determining a probability of deduplication for a data object based on one or more characteristics of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication;
- determining whether the probability of deduplication for the data object exceeds the deduplication probability threshold;
- performing a deduplication operation on the data object prior to storing the data object in persistent storage of the data storage system in an event it is determined that the deduplication probability for the data object exceeds the deduplication probability threshold; and
- performing the deduplication operation on the data object after storing the data object in the persistent storage in an event it is determined that the deduplication probability for the data object does not exceed the deduplication probability threshold.
Dependent claims: 23, 24, 25, 26, 27
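Claim 22 adds two ideas to the basic comparison: the threshold is adjusted adaptively, and objects that do not exceed it are still deduplicated, only later. The sketch below illustrates one way this could look. The IOPS metric, the fixed step size, the 20% CPU cutoff, and the names `AdaptiveThreshold` and `SelectiveDedupStore` are all hypothetical and are not taken from the patent.

```python
import hashlib
from collections import deque


class AdaptiveThreshold:
    """Nudges the threshold up when the system is behind its target or
    short on resources, and down otherwise (illustrative policy only)."""

    def __init__(self, target_iops: float, initial: float = 0.5,
                 step: float = 0.125):
        self.target_iops = target_iops
        self.value = initial
        self.step = step

    def adjust(self, recent_iops: float, cpu_free_fraction: float) -> float:
        if recent_iops < self.target_iops or cpu_free_fraction < 0.2:
            # Under pressure: make inline deduplication rarer.
            self.value = min(1.0, self.value + self.step)
        else:
            # Headroom available: allow more inline deduplication.
            self.value = max(0.0, self.value - self.step)
        return self.value


class SelectiveDedupStore:
    def __init__(self):
        self.blocks = {}        # fingerprint -> payload
        self.refs = {}          # name -> fingerprint (deduplicated objects)
        self.raw = {}           # name -> payload stored without deduplication
        self.pending = deque()  # names awaiting post-process deduplication

    def _dedup(self, name: str, payload: bytes) -> None:
        fp = hashlib.sha256(payload).hexdigest()
        self.blocks.setdefault(fp, payload)
        self.refs[name] = fp

    def write(self, name: str, payload: bytes,
              probability: float, threshold: float) -> str:
        if probability > threshold:  # claim 22 uses "exceeds"
            self._dedup(name, payload)  # before persistent storage
            return "inline"
        self.raw[name] = payload        # store first, deduplicate later
        self.pending.append(name)
        return "deferred"

    def post_process(self) -> None:
        """Deduplicate objects that were stored without inline deduplication."""
        while self.pending:
            name = self.pending.popleft()
            self._dedup(name, self.raw.pop(name))
```

A caller would typically invoke `adjust` once per monitoring interval and run `post_process` during idle periods, so that low-probability objects do not slow the write path but are still deduplicated eventually.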
Specification