DEDUPLICATING DATA FOR A DATA STORAGE SYSTEM USING SIMILARITY DETERMINATIONS
First Claim
1. A data storage system comprising:
a memory resource to store instructions;
one or more processors using the instructions stored in the memory resource to:
receive data to be stored at the data storage system;
determine a similarity between the received data and data stored on each of a plurality of storage elements at the data storage system;
select one or more of the plurality of storage elements based on the determined similarity; and
write the received data to the one or more selected storage elements, including, for each of the selected storage elements, deduplicating the received data with the data stored on that storage element.
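The receive, determine, select, and write steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the fixed-size chunking, the SHA-256 fingerprints, the "fraction of chunks already present" similarity measure, and the names `StorageElement`, `store`, and `copies` are all assumptions introduced for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


def fingerprints(data: bytes) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    ]


class StorageElement:
    """Hypothetical storage element that keeps unique chunks keyed by fingerprint."""

    def __init__(self, name: str):
        self.name = name
        self.chunks: dict[str, bytes] = {}

    def similarity(self, data: bytes) -> float:
        """Fraction of the incoming chunks that already exist on this element."""
        fps = fingerprints(data)
        if not fps:
            return 0.0
        return sum(fp in self.chunks for fp in fps) / len(fps)

    def write_deduplicated(self, data: bytes) -> int:
        """Store only chunks not already present; return the bytes newly written."""
        written = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:
                self.chunks[fp] = chunk
                written += len(chunk)
        return written


def store(data: bytes, elements: list[StorageElement], copies: int = 1) -> dict[str, int]:
    """Rank elements by similarity, pick the top `copies`, and write with dedup."""
    ranked = sorted(elements, key=lambda e: e.similarity(data), reverse=True)
    selected = ranked[:copies]
    return {e.name: e.write_deduplicated(data) for e in selected}
```

In this sketch, data that shares most of its chunks with one element is routed to that element, so only the chunks that are new to it consume additional space.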
Abstract
A method and system for deduplicating data for a data storage system using similarity determinations are described. A tape library is arranged in a hierarchy of tape groups and tape plexes. A tape group is an administrator-visible entity comprising multiple tape plexes (at least as many as the number of replicas in the tape group). Each tape plex in turn comprises multiple tape cartridges. Data files and objects received within a time period are initially staged in a disk cache, where they are logically segregated into cliques based on their expected deduplication ratios. Each clique is then evaluated for the amount of duplication it has with data existing in the tape plexes. Based on the number of replicas being written, the top few tape plexes are selected from within the tape group. The cliques are deduplicated with data on the selected tape plexes, compressed, and written to tape.
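The plex-selection step described in the abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the method of the specification: it assumes each tape plex exposes a set of chunk fingerprints, that "amount of duplication" is measured in duplicate bytes, and that exactly one plex is chosen per replica. The names `TapePlex`, `TapeGroup`, `Clique`, `duplicate_bytes`, and `select_plexes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TapePlex:
    name: str
    # Fingerprints of chunks already written to the cartridges of this plex.
    fingerprints: set[str] = field(default_factory=set)


@dataclass
class TapeGroup:
    name: str
    replicas: int            # number of copies to write for this group
    plexes: list[TapePlex]   # expected to hold at least `replicas` plexes


@dataclass
class Clique:
    label: str
    chunks: dict[str, int]   # chunk fingerprint -> chunk size in bytes


def duplicate_bytes(clique: Clique, plex: TapePlex) -> int:
    """Bytes of the clique whose chunks already exist on the given plex."""
    return sum(size for fp, size in clique.chunks.items() if fp in plex.fingerprints)


def select_plexes(clique: Clique, group: TapeGroup) -> list[TapePlex]:
    """Pick the plexes with the most duplication, one per replica (assumption)."""
    ranked = sorted(
        group.plexes,
        key=lambda plex: duplicate_bytes(clique, plex),
        reverse=True,
    )
    return ranked[:group.replicas]
```

For a tape group with two replicas, a clique would be deduplicated against, and written to, the two plexes that already hold the most of its data.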
20 Claims
1. A data storage system comprising:
a memory resource to store instructions;
one or more processors using the instructions stored in the memory resource to:
receive data to be stored at the data storage system;
determine a similarity between the received data and data stored on each of a plurality of storage elements at the data storage system;
select one or more of the plurality of storage elements based on the determined similarity; and
write the received data to the one or more selected storage elements, including, for each of the selected storage elements, deduplicating the received data with the data stored on that storage element.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method of writing data in a data storage system, the method being implemented by one or more processors and comprising:
receiving data to be stored at the data storage system;
determining a similarity between the received data and data stored on each of a plurality of storage elements at the data storage system;
selecting one or more of the plurality of storage elements based on the determined similarity; and
writing the received data to the one or more selected storage elements, including, for each of the selected storage elements, deduplicating the received data with the data stored on that storage element.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium that stores instructions, executable by one or more processors, to cause the one or more processors to perform operations that comprise:
receiving data to be stored at a data storage system;
determining a similarity between the received data and data stored on each of a plurality of storage elements at the data storage system;
selecting one or more of the plurality of storage elements based on the determined similarity; and
writing the received data to the one or more selected storage elements, including, for each of the selected storage elements, deduplicating the received data with the data stored on that storage element.
Dependent claims: 16, 17, 18, 19, 20
Specification