System and method for storing redundant information
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
23 Claims
1. A method in a source computer for reducing redundant storage of a data object, the method comprising:
- receiving at the source computer a first request from a server computer to perform a storage operation on multiple data objects;
- processing at the source computer the data objects specified by the first request to produce a hash, a size, and security information of each data object, wherein the hash of each data object provides an identifier of the data object, and wherein the identifier, the size, and the security information is compared with identifiers, sizes, and security information of other data objects to determine if the data objects match;
- sending from the source computer in response to the first request the hash, the size, and the security information of each data object produced by the source computer; and
- receiving at the source computer a second request from the server computer to send each data object for which the hash, the size and the security information sent does not identify a data object previously processed by the server computer and not to send each data object for which the hash, the size and the security information sent identifies a data object previously processed by the server computer,
- wherein, for every data object, the server computer utilizes the hash, the size, and the security information to determine whether the server computer previously processed the data object,
- wherein the likelihood of collisions, which occur when two data objects containing different data have the same hash, is reduced.
Dependent claims: 2, 3, 4, 5, 6, 7
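The two-request exchange recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Signature`, `make_signature`, `Server`, and `objects_to_send` are hypothetical, SHA-256 is an assumed choice of hash, and the security information is modeled as a plain string.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical record of what the source computer sends per object."""
    digest: str    # hash that identifies the object's contents
    size: int      # object size in bytes
    security: str  # stand-in for security information (assumed: a string)


def make_signature(data: bytes, security: str) -> Signature:
    # SHA-256 is an assumption; the claim only requires "a hash".
    return Signature(hashlib.sha256(data).hexdigest(), len(data), security)


class Server:
    """Hypothetical server that remembers signatures it has processed."""

    def __init__(self) -> None:
        self.processed: set[Signature] = set()

    def objects_to_send(self, signatures: dict[str, Signature]) -> list[str]:
        """Build the second request: names of objects not seen before."""
        wanted = []
        for name, sig in signatures.items():
            if sig not in self.processed:
                wanted.append(name)
                # Simplification: mark as processed as soon as it is requested.
                self.processed.add(sig)
        return wanted


# Source computer side: produce signatures in response to the first request.
objects = {
    "a.txt": (b"hello", "owner=alice"),
    "b.txt": (b"hello", "owner=alice"),  # same data and security as a.txt
    "c.txt": (b"hello", "owner=bob"),    # same data, different security
}
signatures = {n: make_signature(d, s) for n, (d, s) in objects.items()}
to_send = Server().objects_to_send(signatures)
```

In this sketch only `a.txt` and `c.txt` would be requested, because `b.txt` matches `a.txt` on all three fields while `c.txt` differs in its security information.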
8. A system for reducing redundant copies of files in a storage environment, the system comprising:
- a hash receiving component configured to receive digest values, file sizes, and security information from client computer systems, wherein the digest values provide a summary of one or more files stored on the client computer systems, wherein the digest values are computed before a request to perform a data storage operation on the one or more files is received;
- a hash indexing component configured to maintain an index of digest values, file sizes, and security information for files managed by the system;
- a hash comparison component configured to compare received digest values, file sizes, and security information from a client computer with digest values, file sizes, and security information maintained by the index; and
- a storage operation component configured to perform storage operations based on the result of the comparison of the digest values,
- wherein the hash comparison component compares a received digest value, a file size, and security information with an index digest value, a file size, and security information to determine whether a file stored on a client computer is an instance of a file tracked within the index,
- wherein each comparison performed by the hash comparison component utilizes the digest value, the file size, and the security information,
- wherein the hash comparison component determines that a file stored on a client computer is an instance of a file tracked within the index if the received digest value, file size, and security information match the index digest value, file size, and security information, and
- wherein the hash comparison component determines that a file stored on a client computer is not an instance of a file tracked within the index if the received digest value matches the index digest value but the received file size does not match the index file size or the received security information does not match the index security information,
- wherein the likelihood of collisions, which occur when two files containing different data have the same digest value, is reduced, and
- wherein the storage operation component stores data based at least in part on the result of the comparison of the digest values.
Dependent claims: 9, 10, 11, 12, 19, 20, 21
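The matching rule of the hash comparison component in claim 8 can be sketched in Python as follows. This is an illustrative assumption about one possible data layout, not the claimed system: `FileRecord`, `HashIndex`, `add`, and `is_instance` are hypothetical names, and digests and security information are represented as plain strings.

```python
from __future__ import annotations

from typing import NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical index entry: the three values compared by claim 8."""
    digest: str
    size: int
    security: str


class HashIndex:
    """Hypothetical index grouping tracked files by digest value."""

    def __init__(self) -> None:
        self._by_digest: dict[str, list[FileRecord]] = {}

    def add(self, record: FileRecord) -> None:
        self._by_digest.setdefault(record.digest, []).append(record)

    def is_instance(self, received: FileRecord) -> bool:
        """True only if digest, size, and security information all match."""
        candidates = self._by_digest.get(received.digest, [])
        return any(
            c.size == received.size and c.security == received.security
            for c in candidates
        )


index = HashIndex()
index.add(FileRecord("abc123", 10, "acl-1"))

same_file = index.is_instance(FileRecord("abc123", 10, "acl-1"))
# Same digest but a different size: treated as a different file.
size_mismatch = index.is_instance(FileRecord("abc123", 11, "acl-1"))
# Same digest but different security information: also not an instance.
security_mismatch = index.is_instance(FileRecord("abc123", 10, "acl-2"))
```

Grouping entries by digest keeps the common lookup cheap, while the additional size and security checks reject a digest-only match, which is how the claim describes reducing the likelihood of collisions.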
13. A non-transitory computer-readable storage medium containing instructions for controlling a computer system to reduce redundant data, by a method comprising:
- receiving a list of files from a client computer, wherein the list contains information for each file for determining if other instances of the file are stored within the system, wherein the information comprises at least a hash value, a file size, and security information;
- comparing the list of files and the hash values, file sizes, and security information received with an index of files stored by the system, wherein the index contains hash values, file sizes, and security information for the first instance of each of the files stored by the system, wherein the comparison utilizes the hash value, file size, and security information for each file in the list;
- for each file in the list of files for which the hash value, the file size, and the security information of the file matches a hash value, a file size, and security information in the index, storing a reference to the file at a destination location; and
- for each file in the list of files for which the hash value, the file size, and the security information of the file does not match any hash value, file size, and security information in the index, storing the file at the destination location and updating the index,
- wherein the likelihood of collisions, which occur when two files containing different data have the same digest value, is reduced.
Dependent claims: 14, 15, 16, 17, 18, 22, 23
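The store-or-reference decision in claim 13 can be sketched as below. The function `store_files`, the tuple key layout, the in-memory `destination` dictionary, and the `read_file` callback are all hypothetical choices made for illustration; the claim does not specify how the destination location or the reference is represented.

```python
from __future__ import annotations

from typing import Callable

# Assumed key layout: (hash value, file size, security information).
Key = tuple[str, int, str]


def store_files(
    file_list: list[tuple[str, Key]],
    index: dict[Key, str],
    destination: dict[str, tuple[str, object]],
    read_file: Callable[[str], bytes],
) -> None:
    """Store each listed file once; later matches become references."""
    for name, key in file_list:
        first_instance = index.get(key)
        if first_instance is not None:
            # All three values match an index entry: store only a reference.
            destination[name] = ("ref", first_instance)
        else:
            # No match: store the file itself and record it in the index.
            destination[name] = ("data", read_file(name))
            index[key] = name


# Hypothetical client contents and the list the client would send.
contents = {"report.doc": b"Q3 numbers", "copy.doc": b"Q3 numbers"}
file_list = [
    ("report.doc", ("h1", 10, "acl-1")),
    ("copy.doc", ("h1", 10, "acl-1")),
]
index: dict[Key, str] = {}
destination: dict[str, tuple[str, object]] = {}
store_files(file_list, index, destination, contents.__getitem__)
```

After the call, `destination` holds the data for `report.doc` and a reference from `copy.doc` to `report.doc`, and `index` records `report.doc` as the first instance for that key.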
Specification