APPLICATION-AWARE AND REMOTE SINGLE INSTANCE DATA MANAGEMENT
First Claim
1. A method for use with a data storage system, the method comprising:
determining an application that generated a file, wherein the file includes multiple discrete application-specific data objects;
based on the determination of the application, identifying at least some of the multiple discrete application-specific data objects within the data, wherein the identifying includes parsing the file using an already existing module particular to the application that generated the file, and wherein the module does not expose the format of the file; and
for at least a first one of the identified multiple discrete application-specific data objects;
generating a substantially unique identifier that represents the first discrete application-specific data object;
based on the generated substantially unique identifier, determining whether an instance of the first discrete application-specific data object is already stored in a data storage system; and
if an instance of the first discrete application-specific data object is not already stored in the data storage system, then storing the first discrete application-specific data object in the data storage system.
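The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The file extension lookup, the toy file format, the `PARSERS` registry, and every function name are hypothetical, and SHA-256 is assumed as one possible way to produce a substantially unique identifier.

```python
import hashlib
from typing import Callable, Dict, List


def parse_mail_archive(data: bytes) -> List[bytes]:
    # Hypothetical stand-in for an application's own parsing module: it
    # returns the discrete objects without revealing the file layout to
    # the caller. Assumed toy format: objects separated by a blank line.
    return [part for part in data.split(b"\n\n") if part]


# Hypothetical registry mapping an application name to its parsing module.
PARSERS: Dict[str, Callable[[bytes], List[bytes]]] = {
    "mailapp": parse_mail_archive,
}


def determine_application(filename: str) -> str:
    # Assumption: the generating application is inferred from the extension.
    # An unknown extension raises KeyError.
    extension = filename.rsplit(".", 1)[-1].lower()
    return {"mbx": "mailapp"}[extension]


def identifier_for(obj: bytes) -> str:
    # SHA-256 digest used as the substantially unique identifier (assumed).
    return hashlib.sha256(obj).hexdigest()


def store_file(filename: str, data: bytes, store: Dict[str, bytes]) -> int:
    """Store only objects not already present; return how many were added."""
    app = determine_application(filename)
    added = 0
    for obj in PARSERS[app](data):
        key = identifier_for(obj)
        if key not in store:  # no instance stored yet
            store[key] = obj
            added += 1
    return added
```

In this sketch, an attachment that appears in two different files is parsed out of both but written to the store only once, because the second lookup finds its identifier already present.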
Abstract
A method and system for reducing storage requirements and speeding up storage operations by reducing the storage of redundant data includes receiving a request that identifies one or more files or data objects to which to apply a storage operation. For each file or data object, the storage system determines if the file or data object contains data that matches another file or data object to which the storage operation was previously applied, based on awareness of the application that created the data object. If the data objects do not match, then the storage system performs the storage operation in a usual manner. However, if the data objects do match, then the storage system may avoid performing the storage operation with respect to the particular file or data object.
24 Claims
1. A method for use with a data storage system, the method comprising:
determining an application that generated a file, wherein the file includes multiple discrete application-specific data objects;
based on the determination of the application, identifying at least some of the multiple discrete application-specific data objects within the data, wherein the identifying includes parsing the file using an already existing module particular to the application that generated the file, and wherein the module does not expose the format of the file; and
for at least a first one of the identified multiple discrete application-specific data objects;
generating a substantially unique identifier that represents the first discrete application-specific data object;
based on the generated substantially unique identifier, determining whether an instance of the first discrete application-specific data object is already stored in a data storage system; and
if an instance of the first discrete application-specific data object is not already stored in the data storage system, then storing the first discrete application-specific data object in the data storage system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for managing application-generated data objects, the system comprising:
a processor;
a data object identification component configured to identify application-generated data objects included within a logical data container;
an application data extraction component configured to extract the identified application-generated data objects from the logical data container;
an index configured to store substantially unique identifiers; and
a data store configured to store a subset of the application-generated data objects, wherein the subset includes extracted application-generated data objects whose substantially unique identifiers were not determined to be stored in the index, wherein only a single instance of an extracted application-generated data object is stored in the data store; and
wherein the data object identification component is further configured to determine the application that created the logical data container; and
wherein the data object identification component is further configured to utilize results of the determination made by the data object identification component to invoke an already existing module to parse the logical data container in order to identify the application-generated data objects included within the logical data container, wherein the module is particular to the application that created the logical data container, and wherein the module is configured to avoid exposing the format of the logical data container.
Dependent claims: 11, 12, 13, 14
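The components recited in claim 10 might be arranged roughly as in the following Python sketch. It is an assumption-laden illustration, not the claimed system: the `Container` fields, the class names, and the `mail_module` parser are hypothetical, identification and extraction are merged into one call for brevity, and SHA-256 stands in for the substantially unique identifier.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Container:
    """Hypothetical logical data container: a creator tag plus opaque bytes."""
    created_by: str
    payload: bytes


def mail_module(payload: bytes) -> List[bytes]:
    # Hypothetical application-specific module; callers receive objects
    # but never see how the container is laid out (toy "|" format assumed).
    return [chunk for chunk in payload.split(b"|") if chunk]


@dataclass
class DataObjectIdentifier:
    """Determines the creating application and invokes its module."""
    modules: Dict[str, Callable[[bytes], List[bytes]]]

    def identify(self, container: Container) -> List[bytes]:
        module = self.modules[container.created_by]
        return module(container.payload)


@dataclass
class SingleInstanceStore:
    """Index of identifiers plus a data store holding one copy per object."""
    index: Set[str] = field(default_factory=set)
    data: Dict[str, bytes] = field(default_factory=dict)

    def add(self, obj: bytes) -> bool:
        """Store obj if its identifier is not indexed; report whether stored."""
        key = hashlib.sha256(obj).hexdigest()
        if key in self.index:
            return False
        self.index.add(key)
        self.data[key] = obj
        return True
```

Passing two containers that share one object through `identify` and then `add` would leave a single copy of the shared object in `data`, with `add` returning `False` for the second occurrence.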
15. A non-transitory computer-readable storage medium whose contents cause a computer system to perform an operation of storing application-specific data objects, the operation comprising:
determining a first application that created a first file;
based on the determination for the first file, selecting an object model that is particular to the first application;
determining a second application that created a second file;
based on the determination for the second file, selecting an already created module that is particular to the second application;
utilizing the object model that is particular to the first application to identify data objects within the first file;
utilizing the already created module that is particular to the second application to parse the second file in order to identify data objects within the second file, wherein the already created module differs from the object model;
determining whether the identified data objects in the first and second files are already stored in a data store;
utilizing the object model that is particular to the first application to extract the identified data objects in the first file that are not already stored in the data store;
utilizing the already created object model that is particular to the second application module to extract the identified data objects in the second file that are not already stored in the data store; and
causing the extracted data objects to be stored in the data store.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24
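Claim 15 distinguishes two access mechanisms, an object model for one application and an already created module for another. The Python sketch below shows one way such a dispatch could look. All names, the extension-based detection, and both toy file formats are hypothetical, and SHA-256 is assumed as the means of checking whether an object is already stored.

```python
import hashlib
from typing import Dict, List


class SpreadsheetObjectModel:
    """Hypothetical object model: exposes a file's objects through an API."""

    def __init__(self, data: bytes) -> None:
        self._sheets = data.split(b";")  # toy format, assumed

    def sheets(self) -> List[bytes]:
        return list(self._sheets)


def mail_module(data: bytes) -> List[bytes]:
    """Hypothetical pre-built module: parses a file and returns its objects."""
    return data.split(b"|")  # toy format, assumed


def identify_objects(filename: str, data: bytes) -> List[bytes]:
    # Assumption: the creating application is inferred from the extension.
    if filename.endswith(".sheet"):
        return SpreadsheetObjectModel(data).sheets()  # object model path
    if filename.endswith(".mbx"):
        return mail_module(data)  # already created module path
    raise ValueError(f"no object model or module for {filename}")


def store_new_objects(files: Dict[str, bytes], store: Dict[str, bytes]) -> None:
    """Add to store every identified object that is not already present."""
    for filename, data in files.items():
        for obj in identify_objects(filename, data):
            key = hashlib.sha256(obj).hexdigest()
            if key not in store:
                store[key] = obj
```

Here an object that occurs in both a spreadsheet file and a mail file is identified through two different mechanisms yet stored once, since both paths feed the same identifier check.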
Specification