Deduplicating data for a data storage system using similarity determinations
Abstract
A method and system for deduplicating data for a data storage system using similarity determinations are described. A tape library is arranged in a hierarchy of tape groups and tape plexes. Tape groups are administrator-visible entities, each composed of multiple tape plexes (at least as many as the number of replicas kept in the tape group). Tape plexes, in turn, comprise multiple tape cartridges. Data files and objects received within a time period are first staged in a disk cache, where they are logically segregated into cliques based on their expected deduplication ratios. Each clique is then evaluated for the amount of duplication it shares with data already stored in the tape plexes. Depending on the number of replicas being written, that many of the top-ranked tape plexes are selected from within the tape group. The cliques are deduplicated against data on the selected tape plexes, compressed, and written to tape.
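The abstract does not say how expected deduplication ratios are estimated or how cliques are formed. The following is a minimal sketch, assuming fixed-size chunking with SHA-256 fingerprints and a greedy overlap threshold (the names `CHUNK_SIZE`, `chunk_fingerprints`, `overlap`, `form_cliques`, and `threshold` are hypothetical). It shows one way staged files could be grouped so that files likely to deduplicate against each other share a clique.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; the abstract does not specify one


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> set[str]:
    """Split data into fixed-size chunks and return the set of SHA-256 fingerprints."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def overlap(a: set[str], b: set[str]) -> float:
    """Fraction of a's chunks already present in b (a rough expected dedup ratio)."""
    return len(a & b) / len(a) if a else 0.0


def form_cliques(staged: dict[str, bytes], threshold: float = 0.5) -> list[list[str]]:
    """Greedily group staged files into cliques of files that deduplicate well together."""
    cliques: list[tuple[list[str], set[str]]] = []
    for name, data in staged.items():
        fps = chunk_fingerprints(data)
        for members, clique_fps in cliques:
            if overlap(fps, clique_fps) >= threshold:
                members.append(name)
                clique_fps |= fps  # grow the clique's chunk set in place
                break
        else:
            # No existing clique is similar enough: start a new one.
            cliques.append(([name], set(fps)))
    return [members for members, _ in cliques]
```

For example, two files that share half of their chunks land in the same clique, while a file with no chunks in common with either starts a new clique. A greedy single pass is only one option; it depends on arrival order, which a production system might want to avoid.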
20 Claims
1. A computing device comprising:
a memory resource to store instructions;
one or more processors using the instructions stored in the memory resource to:
receive data to be stored at a data storage system;
determine similarities between the received data and data stored on each of a plurality of storage elements at the data storage system;
select a first storage element and a second storage element of the plurality of storage elements based on the first storage element having a highest similarity and the second storage element having a next highest similarity;
write the received data to the first storage element, wherein the received data is deduplicated with data stored on the first storage element; and
write a replica of the received data to the second storage element, wherein the replica is deduplicated with data stored on the second storage element.
(Dependent claims: 2, 3, 4, 5, 6, 7.)
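Claim 1 does not fix a similarity measure. A minimal sketch of the select step, assuming fixed-size SHA-256 chunk fingerprints and a containment score (the fraction of the received chunks already on an element); `fingerprints`, `similarity`, `select_elements`, and the element names are hypothetical:

```python
import hashlib


def fingerprints(data: bytes, chunk_size: int = 4096) -> set[str]:
    """Hypothetical fixed-size chunking; the claim does not fix a chunking scheme."""
    return {hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)}


def similarity(received: set[str], stored: set[str]) -> float:
    """Fraction of the received chunks that already exist on a storage element."""
    return len(received & stored) / len(received) if received else 0.0


def select_elements(data: bytes, elements: dict[str, set[str]]) -> tuple[str, str]:
    """Return (first, second): the elements with the highest and next-highest similarity."""
    if len(elements) < 2:
        raise ValueError("need at least two storage elements for the data and its replica")
    fps = fingerprints(data)
    # sorted() is stable, so ties keep the elements' insertion order.
    ranked = sorted(elements, key=lambda name: similarity(fps, elements[name]), reverse=True)
    return ranked[0], ranked[1]
```

With received data made of chunks A, B, and C, an element already holding A and B (similarity 2/3) is chosen first and an element holding only A (similarity 1/3) second.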
8. A method of writing data in a data storage system, the method being implemented by one or more processors and comprising:
receiving data to be stored at the data storage system;
determining similarities between the received data and data stored on each of a plurality of storage elements at the data storage system;
selecting a first storage element and a second storage element of the plurality of storage elements based on the first storage element having a highest similarity and the second storage element having a next highest similarity;
writing the received data to the first storage element, wherein the received data is deduplicated with data stored on the first storage element; and
writing a replica of the received data to the second storage element, wherein the replica is deduplicated with data stored on the second storage element.
(Dependent claims: 9, 10, 11, 12, 13, 14.)
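The writing steps of claim 8 deduplicate the data and its replica independently, each against its own storage element. A minimal sketch, assuming each element keeps one zlib-compressed copy of every unique fixed-size chunk keyed by SHA-256 fingerprint (compression is taken from the abstract; `StorageElement`, `write_deduplicated`, `read`, and `write_with_replica` are hypothetical names):

```python
import hashlib
import zlib


class StorageElement:
    """Hypothetical storage element that keeps one compressed copy of each unique chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}  # SHA-256 fingerprint -> zlib-compressed chunk

    def write_deduplicated(self, data: bytes, chunk_size: int = 4096) -> tuple[list[str], int]:
        """Store only chunks not already on this element; return (recipe, bytes written)."""
        recipe: list[str] = []
        written = 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # new chunk: compress and store it
                self.chunks[fp] = zlib.compress(chunk)
                written += len(self.chunks[fp])
            recipe.append(fp)  # duplicate chunks are only referenced
        return recipe, written

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble the original data from its chunk recipe."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)


def write_with_replica(data: bytes, first: StorageElement, second: StorageElement):
    """Write data to the first element and a replica to the second, each deduplicated locally."""
    return first.write_deduplicated(data), second.write_deduplicated(data)
```

Because each copy is deduplicated only against its own element, either element can rebuild the data without the other. The trade-off is that the two copies usually cost different numbers of new bytes, which is why the claim prefers the elements with the highest and next-highest similarity.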
15. A non-transitory computer-readable medium that stores instructions, executable by one or more processors, to cause the one or more processors to perform operations that comprise:
receiving data to be stored at a data storage system;
determining similarities between the received data and data stored on each of a plurality of storage elements at the data storage system;
selecting a first storage element and a second storage element of the plurality of storage elements based on the first storage element having a highest similarity and the second storage element having a next highest similarity;
writing the received data to the first storage element, wherein the received data is deduplicated with data stored on the first storage element; and
writing a replica of the received data to the second storage element, wherein the replica is deduplicated with data stored on the second storage element.
(Dependent claims: 16, 17, 18, 19, 20.)
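When storage elements hold many chunks, comparing full fingerprint sets for every element on every write can be expensive. The claims do not say how similarity is determined; one common technique, offered here only as an assumption, is a MinHash sketch that estimates the Jaccard similarity of two chunk sets from fixed-size signatures. A minimal sketch (`NUM_HASHES`, `minhash_signature`, and `estimated_jaccard` are hypothetical names; each salted hash function is built from SHA-256):

```python
import hashlib

NUM_HASHES = 256  # hypothetical sketch size; more hashes lower the estimate's variance


def minhash_signature(fingerprints: set[str], num_hashes: int = NUM_HASHES) -> list[int]:
    """Keep, for each salted hash function, the minimum 64-bit hash over the set."""
    if not fingerprints:
        raise ValueError("cannot sketch an empty fingerprint set")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # the salt turns SHA-256 into a family of hashes
        signature.append(min(
            int.from_bytes(hashlib.sha256(salt + fp.encode()).digest()[:8], "big")
            for fp in fingerprints
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the minima agree estimates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same hash functions")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

A storage element's signature can be kept up to date incrementally: when chunks are added, each position becomes the minimum of its old value and the new chunks' hashes, so the full chunk set never has to be rescanned.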
Specification