SYSTEM AND METHOD FOR STORING REDUNDANT INFORMATION

US 20080243957A1
Filed: 03/28/2008
Published: 10/02/2008
Est. Priority Date: 12/22/2006
Status: Active Grant

Abstract

A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.

20 Claims

1. A method in a source computer for reducing redundant storage of a data object, the method comprising:
- receiving at the source computer a first request from a server computer to perform a storage operation on multiple data objects;
  
  processing at the source computer the data objects specified by the request to produce a hash of each data object, wherein the hash of each data object provides an identifier of the data object that can be compared with identifiers of other data objects to determine if the data objects match;
  
  sending from the source computer in response to the first request the hash of each data object produced by the source computer;
  
  receiving at the source computer a second request from the server computer to send each data object for which the hash sent does not identify a data object previously processed by the server computer and not to send each data object for which the hash sent identifies a data object previously processed by the server computer.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 wherein the processing comprises detecting that the data object has changed and computing a new hash of the data object.
  - 3. The method of claim 1 wherein receiving the first request comprises receiving a request to create a single instanced copy of the data object by copying those data objects that differ from data objects already copied and not copying those data objects that are the same as data objects already copied.
  - 4. The method of claim 1 wherein receiving the first request comprises receiving a request to perform a storage operation from a storage manager that manages copying files from multiple source computer systems to one or more storage servers.
  - 5. The method of claim 1 wherein the hash is computed using the SHA hashing algorithm, and wherein the server determines whether the data object is an instance of a previously processed data object by comparing the hash values.
  - 6. The method of claim 1 wherein the second request includes a request to send a reference to each data object for which the hash sent identifies a data object previously processed by the server computer.
  - 7. The method of claim 1 wherein the processing at the client computer of the data object to produce a hash of the data object is performed before the first request is received to spread out the resource load of responding the first request.
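
The two-request exchange in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the in-memory dictionaries, and the choice of SHA-256 (claim 5 says only "the SHA hashing algorithm") are assumptions made for illustration. Skipping duplicates within a single batch is also an assumption that goes slightly beyond the claim wording.

```python
import hashlib


def hash_objects(objects):
    """Source side (hypothetical helper): map each object name to a hex digest."""
    # SHA-256 is an assumed choice; the claims name only "SHA".
    return {name: hashlib.sha256(data).hexdigest() for name, data in objects.items()}


def select_objects_to_send(hashes, known_hashes):
    """Server side (hypothetical helper): names of objects the source should send.

    An object is requested only if its hash does not identify an object that
    was already processed. Duplicates inside the same batch are also skipped,
    which is an assumption of this sketch.
    """
    seen = set(known_hashes)
    wanted = []
    for name, digest in hashes.items():
        if digest not in seen:
            wanted.append(name)
            seen.add(digest)
    return wanted


# First request: the source hashes its objects and replies with the hashes.
objects = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"hello"}
hashes = hash_objects(objects)

# Second request: the server asks only for objects it has not processed before.
known = {hashlib.sha256(b"world").hexdigest()}
to_send = select_objects_to_send(hashes, known)
```

Here `b.txt` is skipped because its hash is already known to the server, and `c.txt` is skipped because it has the same content as `a.txt`.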

8. A system for reducing redundant copies of files in a storage environment, the system comprising:
- a hash receiving component configured to receive digest values from client computer systems, wherein the digest values provide a summary of one or more files stored on the client computer systems, wherein the digest values are computed before a request to perform a data storage operation on the one or more files is received;
  
  a hash indexing component configured to maintain an index of digest values for files managed by the system;
  
  a hash comparison component configured to compare received digest values from a client computer with digest values maintained by the index; and
  
  a storage operation component configured to perform storage operations based on the result of the comparison of the digest values.
- Dependent claims (9, 10, 11, 12, 13):
  - 9. The system of claim 8 wherein the storage operation component performs a copy operation and wherein when the digest value for a file to be copied matches a digest value in the index, the copy operation only creates a reference to the file.
  - 10. The system of claim 8 wherein the hash receiving component precomputes the digest values as each file is modified on the client computer systems.
  - 11. The system of claim 8 wherein the hash indexing component receives the digest values produced by a client computer system when the hash comparison component indicates that a particular file does not match any file within the index.
  - 12. The system of claim 8 wherein the hash comparison component compares a hash value and file size to determine whether a file stored on a client computer is an instance of a file tracked within the index.
  - 13. The system of claim 8 wherein the hash indexing component maintains an index of digest values for files stored by at least one of a server managed by the system, a tape media, a client computer system, and an offsite storage location.
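
As a rough illustration of how the indexing, comparison, and storage operation components of claims 8, 9, and 12 might interact, the sketch below keys an in-memory index on the digest value together with the file size. The class name `DigestIndex`, the function `copy_file`, and the string locations are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DigestIndex:
    """Hypothetical index mapping (digest, size) to the first stored location."""
    entries: dict = field(default_factory=dict)

    def lookup(self, digest, size):
        # Comparing digest and size together follows the idea in claim 12.
        return self.entries.get((digest, size))

    def add(self, digest, size, location):
        # Keep the first instance; later duplicates do not overwrite it.
        self.entries.setdefault((digest, size), location)


def copy_file(index, digest, size, new_location):
    """Return ("reference", existing_location) on a match, else store and index."""
    existing = index.lookup(digest, size)
    if existing is not None:
        return ("reference", existing)
    index.add(digest, size, new_location)
    return ("stored", new_location)


index = DigestIndex()
first = copy_file(index, "abc", 10, "vol1/f1")
second = copy_file(index, "abc", 10, "vol1/f2")
third = copy_file(index, "abc", 11, "vol1/f3")
```

The second call yields a reference to `vol1/f1` rather than a new copy, while the third call is stored because the size differs even though the digest string is the same.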

14. A computer-readable medium containing instructions for controlling a computer system to reduce redundant data, by a method comprising:
- receiving a list of files from a client computer, wherein the list contains information for each file for determining if other instances of the file are stored within the system, wherein the information comprises at least a hash value;
  
  comparing the list of files and the hash values received with an index of files stored by the system, wherein the index contains hash values for the first instance of each of the files stored by the system;
  
  for each file in the list of files for which the hash value of the file matches a hash value in the index, storing a reference to the file at a destination location; and
  
  for each file in the list of files for which the hash value of the file does not match any hash value in the index, storing the file at the destination location and updating the index.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The computer-readable medium of claim 14 wherein updating the index comprises adding the hash value of the file and information describing the destination location where the file can be located to the index.
  - 16. The computer-readable medium of claim 14 wherein receiving a list of files comprises receiving additional information that in combination with the hash value reduces the possibility of collisions between two hash values of different files.
  - 17. The computer-readable medium of claim 14 wherein receiving a list of files from a client computer comprises receiving a list of files that have changed after a specified event.
  - 18. The computer-readable medium of claim 14 including storing the files at the destination location on sequential media.
  - 19. The computer-readable medium of claim 14 wherein receiving a list of files comprises receiving hash values for portions of each file and wherein comparing the list of files and the hash values received with an index of files stored by the system comprises comparing hash values for portions of each file with hash values for portions of the files stored by the system to determine if at least part of two files match.
  - 20. The computer-readable medium of claim 14 wherein storing a reference to the file at a destination location comprises determining that the file is stored on sequential media and storing an identifier of the sequential media and offset within the sequential media to the file.
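
Claim 19 describes comparing hash values for portions of files to find partial matches. A minimal sketch of that idea follows, assuming fixed-size portions and SHA-256. The claims specify neither, and the helper names are hypothetical.

```python
import hashlib


def portion_hashes(data, portion_size=4):
    """Hash each fixed-size portion of data (fixed-size splitting is an assumption)."""
    return [
        hashlib.sha256(data[i:i + portion_size]).hexdigest()
        for i in range(0, len(data), portion_size)
    ]


def matching_portions(new_hashes, stored_hashes):
    """Return indexes of portions in the new file whose hash is already stored."""
    stored = set(stored_hashes)
    return [i for i, digest in enumerate(new_hashes) if digest in stored]


stored = portion_hashes(b"AAAABBBBCCCC")
incoming = portion_hashes(b"AAAAXXXXCCCC")
matches = matching_portions(incoming, stored)
```

With these inputs the first and third portions match, so only the middle portion of the incoming file would need to be stored in full.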

Current Assignee: CommVault Systems Incorporated
Original Assignee: CommVault Systems Incorporated
Inventors: Gokhale, Parag; Attarde, Deepak R.; Prahlad, Anand; Kottomtharayil, Rajiv; Retnamma, Manoj Kumar Vijayan

Granted Patent: US 7,953,706 B2
US Class Current: 707/204

CPC Class Codes
- G06F 11/1453: using de-duplication of the...
- G06F 16/1748: De-duplication implemented ...
- G06F 3/061: Improving I/O performance
- G06F 3/0638: Organizing or formatting or...
- G06F 3/065: Replication mechanisms
- G06F 3/067: Distributed or networked st...
- G11B 5/86: Re-recording, i.e. transcri...
