Application-aware and remote single instance data management
First Claim
1. A method of storing application-specific data objects included within a file in a data storage system, the method comprising:
- receiving a request to store data contained in a file generated by an application, wherein the data includes multiple discrete application-specific data objects having differing sizes;
- determining the application that generated the file that includes the multiple discrete application-specific data objects;
- based on the determination of the application, identifying at least some of the multiple discrete application-specific data objects within the data, wherein the identifying includes parsing the file using an already existing module particular to the application that generated the file, and wherein the module does not expose the format of the file; and
- for at least a first one of the identified multiple discrete application-specific data objects:
  - generating a substantially unique identifier that represents the first discrete application-specific data object;
  - based on the generated substantially unique identifier, determining whether an instance of the first discrete application-specific data object is already stored in a data storage system; and
  - if an instance of the first discrete application-specific data object is not already stored in the data storage system, then storing the first discrete application-specific data object in the data storage system.
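The steps of this claim can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the `.mbx` extension, the `--MSG--` delimiter, the `parse_mailbox` stand-in for an application-specific module, and the use of SHA-256 as the substantially unique identifier.

```python
import hashlib
from typing import Callable, Dict, List


def parse_mailbox(data: bytes) -> List[bytes]:
    """Hypothetical application-specific module: one object per message."""
    return [part for part in data.split(b"\n--MSG--\n") if part]


# Hypothetical registry mapping an application name to its parsing module.
PARSERS: Dict[str, Callable[[bytes], List[bytes]]] = {"mail": parse_mailbox}


def determine_application(filename: str) -> str:
    """Assumption: the generating application is inferred from the extension."""
    return {"mbx": "mail"}.get(filename.rsplit(".", 1)[-1], "unknown")


def store_file(filename: str, data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Store each object at most once; return the identifiers for this file."""
    parser = PARSERS.get(determine_application(filename))
    if parser is None:
        raise ValueError(f"no parsing module registered for {filename!r}")
    identifiers = []
    for obj in parser(data):
        # SHA-256 digest used as the substantially unique identifier.
        digest = hashlib.sha256(obj).hexdigest()
        if digest not in store:  # no instance stored yet
            store[digest] = obj
        identifiers.append(digest)
    return identifiers
```

Calling `store_file` on two files that share one embedded object leaves a single copy of that object in `store`, while each file's identifier list still refers to it.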
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more files or data objects to which to apply a storage operation. For each file or data object, the storage system determines if the file or data object contains data that matches another file or data object to which the storage operation was previously applied, based on awareness of the application that created the data object. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation with respect to the particular file or data object.
23 Claims
1. A method of storing application-specific data objects included within a file in a data storage system, the method comprising:
- receiving a request to store data contained in a file generated by an application, wherein the data includes multiple discrete application-specific data objects having differing sizes;
- determining the application that generated the file that includes the multiple discrete application-specific data objects;
- based on the determination of the application, identifying at least some of the multiple discrete application-specific data objects within the data, wherein the identifying includes parsing the file using an already existing module particular to the application that generated the file, and wherein the module does not expose the format of the file; and
- for at least a first one of the identified multiple discrete application-specific data objects:
  - generating a substantially unique identifier that represents the first discrete application-specific data object;
  - based on the generated substantially unique identifier, determining whether an instance of the first discrete application-specific data object is already stored in a data storage system; and
  - if an instance of the first discrete application-specific data object is not already stored in the data storage system, then storing the first discrete application-specific data object in the data storage system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for managing application-generated data objects, the system comprising:
- a processor;
- a storage operation manager component, coupled to the processor, configured to receive a request to perform a storage operation on a logical data container, wherein the logical data container includes data objects generated by one or more applications;
- a data object identification component configured to identify the application-generated data objects included within the logical data container;
- an application data extraction component configured to extract the identified application-generated data objects from the logical data container;
- an identifier generation component configured to generate substantially unique identifiers for the extracted application-generated data objects;
- an index configured to store substantially unique identifiers;
- an identifier comparison component configured to determine whether the generated substantially unique identifiers are already stored in the index; and
- a single instance data store configured to communicate with the identifier comparison component and store a subset of the extracted application-generated data objects, the subset including the extracted application-generated data objects whose substantially unique identifiers were not determined to be stored in the index, wherein only a single instance of an extracted application-generated data object is stored in the single instance data store; and
- wherein the data object identification component is further configured to determine the application that created the logical data container; and
- wherein the data object identification component is further configured to utilize the results of the determination made by the data object identification component to invoke an already existing module to parse the logical data container in order to identify the application-generated data objects included within the logical data container, wherein the module is particular to the application that created the logical data container and wherein the module is configured to avoid exposing the format of the logical data container.
Dependent claims: 11, 12, 13, 14
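As a rough illustration of how the components recited in claim 10 might fit together, the sketch below treats a ZIP archive as the logical data container and uses Python's standard `zipfile` module as the already existing module, so the caller never handles the container format directly. The class name `SingleInstanceSystem`, its method names, and the choice of SHA-256 are assumptions made for this example, not details from the patent.

```python
import hashlib
import io
import zipfile
from typing import Dict, List, Set


class SingleInstanceSystem:
    """Hypothetical grouping of the claimed components into one class."""

    def __init__(self) -> None:
        self.index: Set[str] = set()             # index of stored identifiers
        self.data_store: Dict[str, bytes] = {}   # single instance data store

    def identify_and_extract(self, container: bytes) -> List[bytes]:
        """Identify the container type, then let zipfile parse and extract."""
        buf = io.BytesIO(container)
        if not zipfile.is_zipfile(buf):
            raise ValueError("unsupported container type")
        with zipfile.ZipFile(buf) as archive:
            return [archive.read(name) for name in archive.namelist()]

    def handle_storage_request(self, container: bytes) -> int:
        """Store objects whose identifiers are new; return how many were stored."""
        stored = 0
        for obj in self.identify_and_extract(container):
            identifier = hashlib.sha256(obj).hexdigest()  # identifier generation
            if identifier not in self.index:              # identifier comparison
                self.index.add(identifier)
                self.data_store[identifier] = obj
                stored += 1
        return stored
```

Keeping the index separate from the data store mirrors the claim's wording; in this toy version both live in memory, whereas a real system would presumably persist them independently.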
15. A non-transitory computer-readable storage medium whose contents cause a computer system to perform operation of storing application-specific data objects, the operation comprising:
- receiving a first file that was created by a first application in a first format, wherein the first file contains multiple data objects;
- receiving a second file that was created by a second, different application in a second format, wherein the second file contains multiple data objects, and wherein the second format differs from the first format;
- determining the application that created the first file and based on that determination, selecting an object model that is particular to the first application;
- determining the application that created the second file and based on that determination, selecting an already created module that is particular to the second application;
- utilizing the object model that is particular to the first application to identify the data objects within the first file;
- utilizing the already created module that is particular to the second application to parse the second file in order to identify the data objects within the second file, wherein the already created module differs from the object model, and wherein the module avoids exposing the second format of the second file;
- generating substantially unique identifiers for the identified data objects within the first and second files;
- determining whether the identified data objects in the first and second files are already stored in a single instance data store;
- for each of the identified data objects in the first and second files that are already stored in the single instance data store, adding a reference in an index to the already stored data object;
- utilizing the object model that is particular to the first application to extract the identified data objects in the first file that are not already stored in the single instance data store from the first file;
- utilizing the already created object model that is particular to the second application module to extract the identified data objects in the second file that are not already stored in the single instance data store from the second file; and
- storing the extracted data objects in the single instance data store.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23
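The two-format flow of claim 15 can be sketched as follows. The formats are assumptions chosen only to make the example runnable: the first file is taken to be a JSON array, whose decoded list plays the role of the object model, and the second is taken to be blank-line-separated records handled by a hypothetical module. The `Deduplicator` class and its `references` index are likewise illustrative names.

```python
import hashlib
import json
from collections import defaultdict
from typing import Callable, Dict, List


def objects_via_object_model(data: bytes) -> List[bytes]:
    """First format (assumed JSON array): the decoded list is the object model."""
    return [json.dumps(item, sort_keys=True).encode() for item in json.loads(data)]


def objects_via_module(data: bytes) -> List[bytes]:
    """Second format (assumed): records separated by a blank line."""
    return [record for record in data.split(b"\n\n") if record]


class Deduplicator:
    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}                           # single instance data store
        self.references: Dict[str, List[str]] = defaultdict(list)  # index of references

    def ingest(self, filename: str, data: bytes,
               identify: Callable[[bytes], List[bytes]]) -> None:
        for obj in identify(data):
            identifier = hashlib.sha256(obj).hexdigest()
            if identifier not in self.store:
                self.store[identifier] = obj  # extract and store new objects only
            # Simplification: a reference is recorded for every object, not
            # only for those already stored, so each file can be rebuilt later.
            self.references[identifier].append(filename)
```

An object that appears in both files is stored once and ends up with two entries in `references`, one per file. Objects match across formats only when their extracted bytes are identical.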
Specification