Application-aware and remote single instance data management
First Claim
1. A system for copying files from a computer system at a first location to a second location, the system comprising:
a processor;
a storage operation manager component, coupled to the processor, that is configured to receive a request to copy a file or data object from a computer system at a first location to a second location, wherein the first location and the second location are geographically remote from each other;
a file cache component at the first location configured to:
receive the file or data object to be copied from the computer system; and
store the file or data object before it is copied to the second location; and
a single instance database component at the first location configured to:
generate a substantially unique identifier for the file or data object;
extract metadata associated with the file or data object, wherein the extracted metadata describes at least three of the following:
permissions for the file or data object, a property of the file or data object, an access control list for the file or data object, an identifier for the file or data object, a size of the file or data object, a creation date for the file or data object, and an access date for the file or data object;
query the second location to determine whether the file or data object is already stored at the second location, including sending the generated substantially unique identifier and extracted metadata to the second location;
in response to the query, receive a single response from the second location that indicates whether the file or data object is already stored at the second location and that indicates whether the generated substantially unique identifier and extracted metadata match a substantially unique identifier and extracted metadata from any files or data objects stored at the second location;
when the file or data object is not already stored at the second location, copy the file or data object from the file cache component to the second location; and
when the file or data object is already stored at the second location and the extracted metadata does not match extracted metadata from the stored file or data object, copy the extracted metadata to the second location, thereby resulting in storing two instances of metadata at the second location for a single stored instance for that file or data object;
wherein the single instance database component copies a file or data object from the file cache component to the second location as part of a continuous data replication operation that automatically saves copies of all changes made to the file or data object.
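The control flow recited in this claim can be illustrated with a short Python sketch. It is not the patented implementation: `QueryResponse`, `RemoteStore`, and `replicate` are hypothetical names, SHA-256 stands in for the "substantially unique identifier", and the second location is simulated in memory rather than reached over a network.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class QueryResponse:
    """Single response from the second location (hypothetical shape)."""
    object_stored: bool     # is the data already at the second location?
    metadata_matches: bool  # does a stored metadata instance match?


@dataclass
class RemoteStore:
    """In-memory stand-in for the second location (assumption)."""
    objects: dict = field(default_factory=dict)   # identifier -> bytes
    metadata: dict = field(default_factory=dict)  # identifier -> list of dicts

    def query(self, identifier, meta):
        stored = identifier in self.objects
        matches = stored and meta in self.metadata[identifier]
        return QueryResponse(stored, matches)

    def put_object(self, identifier, data, meta):
        self.objects[identifier] = data
        self.metadata[identifier] = [meta]

    def put_metadata(self, identifier, meta):
        self.metadata[identifier].append(meta)


def replicate(data, meta, remote):
    """Copy one cached object, sending only what the remote side lacks."""
    identifier = hashlib.sha256(data).hexdigest()
    response = remote.query(identifier, meta)  # one query, one response
    if not response.object_stored:
        remote.put_object(identifier, data, meta)
        return "copied object"
    if not response.metadata_matches:
        # Same data, different metadata: store a second metadata instance.
        remote.put_metadata(identifier, meta)
        return "copied metadata only"
    return "nothing to copy"
```

Replicating the same bytes twice with different metadata leaves one stored object and two metadata instances, which is the outcome the final "when" clause describes.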
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more files or data objects to which to apply a storage operation. For each file or data object, the storage system determines if the file or data object contains data that matches another file or data object to which the storage operation was previously applied, based on awareness of the application that created the data object. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation with respect to the particular file or data object.
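The abstract does not say how application awareness is achieved. One plausible reading, assumed here purely for illustration, is that the system understands the container format an application produces and single-instances the objects inside it. The Python sketch below treats an email message as the container and hashes each attachment separately, so two different messages carrying the same attachment store it once. The function names and the in-memory `store` dictionary are hypothetical.

```python
import hashlib
from email.message import EmailMessage


def build_message(to, body, attachment):
    """Build a message with one binary attachment (demo helper)."""
    msg = EmailMessage()
    msg["To"] = to
    msg.set_content(body)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg


def store_attachments(msg, store):
    """Store each attachment once; return how many were newly stored."""
    newly_stored = 0
    for part in msg.iter_attachments():
        data = part.get_content()  # decoded attachment bytes
        key = hashlib.sha256(data).hexdigest()
        if key not in store:       # no match: perform the storage operation
            store[key] = data
            newly_stored += 1
        # match: skip the storage operation for this object
    return newly_stored
```

Hashing the whole message would miss the redundancy, because the recipients and body differ even when the attachment is identical.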
17 Claims
1. A system for copying files from a computer system at a first location to a second location, the system comprising:
a processor;
a storage operation manager component, coupled to the processor, that is configured to receive a request to copy a file or data object from a computer system at a first location to a second location, wherein the first location and the second location are geographically remote from each other;
a file cache component at the first location configured to:
receive the file or data object to be copied from the computer system; and
store the file or data object before it is copied to the second location; and
a single instance database component at the first location configured to:
generate a substantially unique identifier for the file or data object;
extract metadata associated with the file or data object, wherein the extracted metadata describes at least three of the following:
permissions for the file or data object, a property of the file or data object, an access control list for the file or data object, an identifier for the file or data object, a size of the file or data object, a creation date for the file or data object, and an access date for the file or data object;
query the second location to determine whether the file or data object is already stored at the second location, including sending the generated substantially unique identifier and extracted metadata to the second location;
in response to the query, receive a single response from the second location that indicates whether the file or data object is already stored at the second location and that indicates whether the generated substantially unique identifier and extracted metadata match a substantially unique identifier and extracted metadata from any files or data objects stored at the second location;
when the file or data object is not already stored at the second location, copy the file or data object from the file cache component to the second location; and
when the file or data object is already stored at the second location and the extracted metadata does not match extracted metadata from the stored file or data object, copy the extracted metadata to the second location, thereby resulting in storing two instances of metadata at the second location for a single stored instance for that file or data object;
wherein the single instance database component copies a file or data object from the file cache component to the second location as part of a continuous data replication operation that automatically saves copies of all changes made to the file or data object.
Dependent claims: 2, 3, 4
5. A method for copying files from a computer system at a first location to a second location, the method comprising:
receiving a request to copy a file or data object from a computer system at a first location to a second location, wherein the first location and the second location are geographically remote from each other;
at a file cache component at the first location:
receiving the file or data object to be copied from the computer system;
storing the file or data object before it is copied to the second location;
at a single instance database component at the first location:
generating a substantially unique identifier for the file or data object;
extracting metadata associated with the file or data object, wherein the extracted metadata describes at least three of the following:
permissions for the file or data object, a property of the file or data object, an access control list for the file or data object, an identifier for the file or data object, a size of the file or data object, a creation date for the file or data object, and an access date for the file or data object;
querying the second location to determine whether the file or data object is already stored at the second location, including sending the generated substantially unique identifier and extracted metadata to the second location;
in response to the query, receiving a single response from the second location that indicates whether the file or data object is already stored at the second location and that indicates whether the generated substantially unique identifier and extracted metadata match a substantially unique identifier and extracted metadata from any files or data objects stored at the second location;
when the file or data object is not already stored at the second location, copying the file or data object from the file cache component to the second location as part of a continuous data replication operation that automatically saves copies of all changes made to the file or data object; and
when the file or data object is already stored at the second location and the extracted metadata does not match extracted metadata from the file or data object already stored at the second location, copying the extracted metadata to the second location, thereby resulting in storing two instances of metadata for a single stored instance for that file or data object.
Dependent claims: 6, 7, 8
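The metadata extraction step can be illustrated with Python's standard `os.stat` call. This is a sketch under assumptions: the helper name `extract_metadata` and the dictionary keys are hypothetical, and only a subset of the attributes listed in the claim is gathered (identifier, size, permissions, access date, and a creation date where the platform reports one).

```python
import os
import stat


def extract_metadata(path):
    """Collect several of the file attributes listed in the claim."""
    st = os.stat(path)
    meta = {
        "identifier": os.path.basename(path),        # file name as identifier
        "size": st.st_size,                          # size in bytes
        "permissions": stat.filemode(st.st_mode),    # e.g. "-rw-r--r--"
        "access_date": st.st_atime,                  # seconds since epoch
    }
    # Creation time is platform dependent: st_birthtime exists on some
    # systems (e.g. macOS); elsewhere it is omitted instead of guessed.
    birthtime = getattr(st, "st_birthtime", None)
    if birthtime is not None:
        meta["creation_date"] = birthtime
    return meta
```

On systems without a reported creation time the dictionary still describes four attributes, so the "at least three" condition is met either way.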
9. A non-transitory computer-readable storage medium containing instructions for controlling a computer system to copy files from a computer system at a first location to a second location, by a method comprising:
receiving a request to copy a file or data object from a computer system at a first location to a second location, wherein the first location and the second location are geographically remote from each other;
at a file cache component at the first location:
receiving the file or data object to be copied from the computer system;
storing the file or data object before it is copied to the second location;
at a single instance database component at the first location:
generating a substantially unique identifier for the file or data object;
extracting metadata associated with the file or data object, wherein the extracted metadata describes at least three of:
permissions for the file or data object, a property of the file or data object, an access control list for the file or data object, an identifier for the file or data object, a size of the file or data object, a creation date for the file or data object, and an access date for the file or data object;
querying the second location to determine whether the file or data object is already stored at the second location, including sending the generated substantially unique identifier and extracted metadata to the second location;
in response to the query, receiving a single response from the second location that indicates whether the file or data object is already stored at the second location and that indicates whether the generated substantially unique identifier and extracted metadata match a substantially unique identifier and extracted metadata from any files or data objects stored at the second location;
when the file or data object is not already stored at the second location, copying the file or data object from the file cache component to the second location as part of a continuous data replication operation that automatically saves copies of all changes made to the file or data object; and
when the file or data object is already stored at the second location and the extracted metadata does not match extracted metadata from the file or data object already stored at the second location, copying the extracted metadata to the second location, thereby resulting in storing two instances of metadata for a single stored instance for that file or data object.
Dependent claims: 10, 11, 12
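The claim refers to a continuous data replication operation that saves copies of all changes, without stating how changes are detected. As a rough illustration only, the Python sketch below assumes a simple scheme in which each observed version of a file is hashed and replicated only when its hash differs from the last replicated version. The `ChangeTracker` class and its in-memory `replicated` log are hypothetical stand-ins.

```python
import hashlib


class ChangeTracker:
    """Replicate a file version only when its content has changed."""

    def __init__(self):
        self._last_seen = {}  # path -> identifier of last replicated version
        self.replicated = []  # (path, identifier) log for the second location

    def observe(self, path, data):
        """Return True if this version was replicated, False if unchanged."""
        identifier = hashlib.sha256(data).hexdigest()
        if self._last_seen.get(path) == identifier:
            return False      # no change since the last replicated copy
        self._last_seen[path] = identifier
        self.replicated.append((path, identifier))
        return True
```

Calling `observe` after every write would capture each change; calling it on a timer would capture only the state at each poll, so the trigger is a design choice this sketch leaves open.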
13. A system for copying data objects from a computer system at a first location to a second location, the system comprising:
a processor;
a storage operation manager component, coupled to the processor, that is configured to receive a request to copy a data object from a computer system at a first location to a second location;
a file cache component at the first location configured to:
receive the data object to be copied from the computer system; and
store the data object before it is copied to the second location; and
a single instance database component at the first location configured to:
generate a hash for the data object;
extract metadata associated with the data object;
query the second location to determine whether the data object is already stored at the second location, including sending the hash and extracted metadata to the second location;
receive, in response to the query, a response from the second location that indicates whether the data object is already stored at the second location and that indicates whether the generated hash and extracted metadata match hashes and extracted metadata from any data objects stored at the second location;
copy, when the data object is not already stored at the second location, the data object from the file cache component to the second location; and
copy, when the data object is already stored at the second location and the extracted metadata does not match extracted metadata from the stored file or data object, the extracted metadata to the second location, thereby resulting in storing two instances of metadata at the second location for a single stored instance for that data object.
Dependent claims: 14, 15, 16, 17
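This claim recites generating a hash for the data object but does not name a hash function. The Python sketch below assumes SHA-256 and reads the object in fixed-size chunks so that a large cached object need not be held in memory at once. The function name `hash_stream` and the default chunk size are illustrative choices, not taken from the claim.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # iter() with a sentinel stops when read() returns empty bytes (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Usage: identical content yields identical hashes regardless of chunking.
same = hash_stream(io.BytesIO(b"abc"), chunk_size=1) == hash_stream(io.BytesIO(b"abc"))
```

Because the digest depends only on the bytes, two objects with the same content produce the same hash even when their metadata differs, which is what lets the metadata be compared and stored separately.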
Specification