System and method for storing redundant information
First Claim
1. A method performed by a computer system for storing data objects to sequential media, wherein the computer system includes a processor and memory, the method comprising:
- receiving a set of data objects from multiple computing systems, wherein the set of data objects includes at least two similar data objects;
- for at least some of the data objects in the set:
  - determining if a copy of a data object is already stored on random-access media, wherein the determining includes accessing an index that contains, for data objects already stored on the random-access media:
    - an identifier of the data object; and
    - a location of the data object on the random-access media,
    wherein the index is stored on the random-access media;
  - if the data object is already stored on the random-access media, then identifying a location of the data object on the random-access media and storing on the random-access media a reference to the identified location;
  - if the data object is not already stored on the random-access media, then storing the data object on the random-access media;
- receiving a request to copy the data objects and the references stored on the random-access media to sequential media; and
- copying to the sequential media, by the computer system, the data objects and reference data related to the references from the random-access media, wherein the reference data on the sequential media refer to locations of the data objects on the sequential media.
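The steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; it only mirrors the claimed flow under simplifying assumptions. The names `Reference`, `RandomAccessStore`, `put`, and `copy_to_sequential` are hypothetical, the random-access media are modeled as a dictionary of named slots, the sequential media as a list whose positions stand in for tape locations, and the identifier is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Reference:
    """Pointer to the single stored copy of a data object (hypothetical type)."""
    location: Union[str, int]


class RandomAccessStore:
    """Toy model of random-access media holding objects, references, and the index."""

    def __init__(self) -> None:
        self.slots: dict = {}   # location -> bytes or Reference, in arrival order
        self.index: dict = {}   # identifier -> location of the stored copy

    def put(self, identifier: str, data: bytes) -> str:
        location = f"slot-{len(self.slots)}"
        if identifier in self.index:
            # Already stored: keep only a reference to the existing location.
            self.slots[location] = Reference(self.index[identifier])
        else:
            # Not stored yet: store the object and record it in the index.
            self.slots[location] = data
            self.index[identifier] = location
        return location


def copy_to_sequential(store: RandomAccessStore) -> list:
    """Copy objects and references so that references point at tape positions."""
    tape: list = []
    tape_position: dict = {}  # random-access location -> position on the tape
    for location, entry in store.slots.items():
        if isinstance(entry, Reference):
            # Rewrite the reference to the object's location on the tape.
            tape.append(Reference(tape_position[entry.location]))
        else:
            tape_position[location] = len(tape)
            tape.append(entry)
    return tape


store = RandomAccessStore()
for identifier, data in [("a", b"alpha"), ("b", b"beta"), ("a", b"alpha")]:
    store.put(identifier, data)
tape = copy_to_sequential(store)
```

In this sketch the third object is stored as a reference to `slot-0` on the random-access side and as a reference to position 0 on the sequential side, which is the remapping that the final "wherein" clause describes.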
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more data objects to which to apply a storage operation. For each data object, the storage system determines if the data object contains data that matches another data object to which the storage operation was previously applied. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation.
16 Claims
1. A method performed by a computer system for storing data objects to sequential media, wherein the computer system includes a processor and memory, the method comprising:
- receiving a set of data objects from multiple computing systems, wherein the set of data objects includes at least two similar data objects;
- for at least some of the data objects in the set:
  - determining if a copy of a data object is already stored on random-access media, wherein the determining includes accessing an index that contains, for data objects already stored on the random-access media:
    - an identifier of the data object; and
    - a location of the data object on the random-access media,
    wherein the index is stored on the random-access media;
  - if the data object is already stored on the random-access media, then identifying a location of the data object on the random-access media and storing on the random-access media a reference to the identified location;
  - if the data object is not already stored on the random-access media, then storing the data object on the random-access media;
- receiving a request to copy the data objects and the references stored on the random-access media to sequential media; and
- copying to the sequential media, by the computer system, the data objects and reference data related to the references from the random-access media, wherein the reference data on the sequential media refer to locations of the data objects on the sequential media.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for reducing redundant copies of files in a storage environment having sequential media, the system comprising:
- a storage operation request component configured to receive requests to perform storage operations on files received from multiple computing systems, wherein files are stored on random-access media;
- a digest generation component configured to compute a digest that provides a summary of a file that distinguishes it from other files referred to by storage operation requests;
- a digest comparison component configured to compare computed digests with previously stored digests to determine if a copy of a file already exists, wherein the comparing includes accessing an index that contains, for files already stored on the random-access media:
  - the digest of the file; and
  - a location of the file on the random-access media,
  wherein the index is stored on the random-access media, and
  - when the file is not already stored on the random-access media, then storing the file on the random-access media;
  - when the file is already stored on the random-access media, then identifying a location of the data object on the random-access media and storing on the random-access media a reference to the identified location; and
- a single instance data store configured to store computed digests and files on sequential media such that only one instance of the same file is stored in a set of sequential media that contains multiple references to the same file, wherein the single instance data store receives a request to copy files and references stored on the random-access media to the sequential media, and copies to the sequential media files and reference data related to the references from the random-access media, wherein the reference data on the sequential media refer to locations of the files on the sequential media.
Dependent claims: 10, 11, 12, 13
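As a rough illustration of the digest generation and digest comparison components of claim 9, the sketch below computes a digest per file and consults an index of digests and locations. The claim does not name a digest algorithm; SHA-256 is an assumption made here for illustration. The names `compute_digest`, `DigestIndex`, and `handle_store_request` are hypothetical, and the random-access media are modeled as a plain list.

```python
import hashlib
from typing import Optional


def compute_digest(file_bytes: bytes) -> str:
    """Digest generation: summarize file content (SHA-256 is an assumption)."""
    return hashlib.sha256(file_bytes).hexdigest()


class DigestIndex:
    """Index holding, for each stored file, its digest and its location."""

    def __init__(self) -> None:
        self._locations: dict = {}  # digest -> location on random-access media

    def lookup(self, digest: str) -> Optional[int]:
        return self._locations.get(digest)

    def record(self, digest: str, location: int) -> None:
        self._locations[digest] = location


def handle_store_request(index: DigestIndex, media: list, file_bytes: bytes) -> tuple:
    """Digest comparison: store the file once, otherwise store a reference."""
    digest = compute_digest(file_bytes)
    existing = index.lookup(digest)
    if existing is not None:
        entry = ("ref", existing)        # file already stored: reference its location
        media.append(entry)
        return entry
    location = len(media)                # file not stored yet: store it and index it
    media.append(("file", file_bytes))
    index.record(digest, location)
    return ("file", location)


index = DigestIndex()
media: list = []
first = handle_store_request(index, media, b"quarterly report")
second = handle_store_request(index, media, b"holiday photo")
third = handle_store_request(index, media, b"quarterly report")
```

Treating equal digests as equal files is a simplification; a production system might also compare sizes or contents before discarding a duplicate.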
14. A non-transitory computer-readable medium containing instructions for controlling a computer system to recover data, wherein the computer system includes a processor and memory, by a method comprising:
- retrieving a backup object identifying a data object, wherein the backup object includes at least one sequential data storage medium, and wherein the at least one sequential data storage medium includes a data structure containing header information identifying whether the backup object contains the data object or contains a reference to the data object;
- determining whether the backup object contains the reference to the data object or contains the data object itself based on the data structure of the at least one sequential data storage medium;
- when the backup object refers to the data object stored in a location outside of the at least one sequential data storage medium, then locating the data object at the location outside of the at least one sequential data storage medium and copying the data object to a recovery location, wherein the recovery location includes random-access data storage media; and
- when the backup object contains the data object, copying the data object from the at least one sequential data storage medium to the recovery location,
  wherein a first instance of each data object is stored as a copy of the data object and each additional instance is stored as a reference to the copy of the data object, wherein the reference is stored on the at least one sequential data storage medium in the data structure, and wherein the reference refers to the location on the at least one sequential data storage medium where the copy of the data object is stored.
Dependent claims: 15, 16
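The recovery flow of claim 14 can be sketched as follows. This is a simplified illustration, not the claimed implementation: it covers only the case in which a reference points at a position on the same sequential medium, as in the final "wherein" clause. The record layout (a `header` with a `kind` field and an optional `location`) and the function name `recover` are hypothetical.

```python
def recover(backup_object: dict, tape: list, recovery_location: dict, name: str) -> None:
    """Copy the data object named by a backup object to the recovery location."""
    header = backup_object["header"]
    if header["kind"] == "reference":
        # The header says this record holds a reference: follow it to the
        # position on the tape where the single copy is stored.
        data = tape[header["location"]]["payload"]
    else:
        # The header says this record holds the data object itself.
        data = backup_object["payload"]
    recovery_location[name] = data


# Tape with one stored copy and one additional instance kept as a reference.
tape = [
    {"header": {"kind": "object"}, "payload": b"alpha"},
    {"header": {"kind": "reference", "location": 0}, "payload": None},
]
recovered: dict = {}  # stands in for random-access recovery media
recover(tape[0], tape, recovered, "first.txt")
recover(tape[1], tape, recovered, "second.txt")
```

Both recovered files end up with the same content even though the tape holds that content only once.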
Specification