SYSTEM AND METHOD FOR STORING REDUNDANT INFORMATION
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
20 Claims
1. A method in a source computer for reducing redundant storage of a data object, the method comprising:
receiving at the source computer a first request from a server computer to perform a storage operation on multiple data objects;
processing at the source computer the data objects specified by the request to produce a hash of each data object, wherein the hash of each data object provides an identifier of the data object that can be compared with identifiers of other data objects to determine if the data objects match;
sending from the source computer in response to the first request the hash of each data object produced by the source computer;
receiving at the source computer a second request from the server computer to send each data object for which the hash sent does not identify a data object previously processed by the server computer and not to send each data object for which the hash sent identifies a data object previously processed by the server computer.
Dependent claims: 2, 3, 4, 5, 6, 7
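The exchange recited in claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the claim does not name a hash function, so SHA-256 is assumed here, and all function names, the in-memory dictionaries, and the server-side helper are hypothetical stand-ins for the two requests and the reply.

```python
import hashlib


def hash_objects(objects):
    """Source side, after the first request: hash every data object.

    `objects` maps an object name to its bytes. SHA-256 is an assumption;
    the claim only requires a hash usable as a comparable identifier.
    """
    return {name: hashlib.sha256(data).hexdigest() for name, data in objects.items()}


def server_unknown_hashes(sent_hashes, known_hashes):
    """Hypothetical server side: pick the hashes it has not processed before."""
    return {h for h in sent_hashes.values() if h not in known_hashes}


def select_objects_to_send(objects, hashes, requested_hashes):
    """Source side, after the second request: send only the requested objects."""
    return {
        name: data
        for name, data in objects.items()
        if hashes[name] in requested_hashes
    }


# Example: the server has already processed an object with the content b"world".
objects = {"a.txt": b"hello", "b.txt": b"world"}
hashes = hash_objects(objects)
known = {hashlib.sha256(b"world").hexdigest()}
requested = server_unknown_hashes(hashes, known)
to_send = select_objects_to_send(objects, hashes, requested)
```

In this sketch only `a.txt` is transferred, because the hash of `b.txt` already identifies an object the server has processed.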
8. A system for reducing redundant copies of files in a storage environment, the system comprising:
a hash receiving component configured to receive digest values from client computer systems, wherein the digest values provide a summary of one or more files stored on the client computer systems, wherein the digest values are computed before a request to perform a data storage operation on the one or more files is received;
a hash indexing component configured to maintain an index of digest values for files managed by the system;
a hash comparison component configured to compare received digest values from a client computer with digest values maintained by the index; and
a storage operation component configured to perform storage operations based on the result of the comparison of the digest values.
Dependent claims: 9, 10, 11, 12, 13
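One possible reading of the indexing and comparison components of claim 8 is sketched below. The class `DigestIndex`, the helper names, the storage locations, and the choice of SHA-256 are all assumptions made for illustration; the claim itself does not prescribe any of them.

```python
import hashlib


def digest_of(data):
    """Hypothetical client-side helper: digest computed ahead of any storage request."""
    return hashlib.sha256(data).hexdigest()


class DigestIndex:
    """Hypothetical hash indexing component: digest -> location of the stored file."""

    def __init__(self):
        self._locations = {}

    def add(self, digest, location):
        # Keep the location of the first instance only.
        self._locations.setdefault(digest, location)

    def lookup(self, digest):
        return self._locations.get(digest)


def compare_digests(index, received):
    """Hypothetical comparison component.

    `received` maps a client path to its digest. Returns the paths that match
    an indexed digest (with the stored location) and the paths that do not.
    """
    matched = {}
    unmatched = []
    for path, digest in received.items():
        location = index.lookup(digest)
        if location is None:
            unmatched.append(path)
        else:
            matched[path] = location
    return matched, unmatched


index = DigestIndex()
index.add(digest_of(b"report"), "store/0001")
received = {
    "docs/report.doc": digest_of(b"report"),
    "docs/notes.txt": digest_of(b"notes"),
}
matched, unmatched = compare_digests(index, received)
```

A storage operation component could then copy only the `unmatched` files and record the `matched` ones against their existing locations.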
14. A computer-readable medium containing instructions for controlling a computer system to reduce redundant data, by a method comprising:
receiving a list of files from a client computer, wherein the list contains information for each file for determining if other instances of the file are stored within the system, wherein the information comprises at least a hash value;
comparing the list of files and the hash values received with an index of files stored by the system, wherein the index contains hash values for the first instance of each of the files stored by the system;
for each file in the list of files for which the hash value of the file matches a hash value in the index, storing a reference to the file at a destination location; and
for each file in the list of files for which the hash value of the file does not match any hash value in the index, storing the file at the destination location and updating the index.
Dependent claims: 15, 16, 17, 18, 19, 20
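The store-or-reference decision in claim 14 might look like the following sketch. It assumes the index and the destination are plain dictionaries and that a reference is simply the name of the first stored instance; the function name `store_files` and the tuple layout are hypothetical.

```python
import hashlib


def store_files(file_list, index, destination):
    """Store each file once; later instances become references.

    file_list   -- iterable of (name, hash_value, data) tuples from the client
    index       -- dict mapping hash_value -> name of the first stored instance
    destination -- dict standing in for the destination location
    """
    for name, hash_value, data in file_list:
        if hash_value in index:
            # Hash matches the index: store only a reference to the first instance.
            destination[name] = ("ref", index[hash_value])
        else:
            # No match: store the file itself and update the index.
            destination[name] = ("data", data)
            index[hash_value] = name


def sha256_hex(data):
    # SHA-256 is an assumption; the claim only requires "a hash value".
    return hashlib.sha256(data).hexdigest()


index = {}
destination = {}
file_list = [
    ("orig.txt", sha256_hex(b"same"), b"same"),
    ("copy.txt", sha256_hex(b"same"), b"same"),
    ("other.txt", sha256_hex(b"diff"), b"diff"),
]
store_files(file_list, index, destination)
```

Here `copy.txt` ends up as a reference to `orig.txt`, while `orig.txt` and `other.txt` are stored in full and added to the index.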
Specification