DE-DUPLICATION SYSTEMS AND METHODS FOR APPLICATION-SPECIFIC DATA
Abstract
Content-aware systems and methods for improving de-duplication, or single instancing, in storage operations. In certain examples, backup agents on client devices parse application-specific data to identify data objects that are candidates for de-duplication. The backup agents can then insert markers or other indicators in the data that identify the location(s) of the particular data objects. Such markers can, in turn, assist a de-duplication manager to perform object-based de-duplication and increase the likelihood that like blocks within the data are identified and single instanced. In other examples, the agents can further determine if a data object of one file type can or should be single-instanced with a data object of a different file type. Such processing of data on the client side can provide for more efficient storage and back-end processing.
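The client-side marking step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the record layout (a 1-byte kind, a 4-byte length, then the payload), the choice of attachments as the de-duplication candidates, and the one-line JSON header used as the "indicator" are all hypothetical.

```python
import json
import struct
from typing import List, Tuple

# Hypothetical record layout (an assumption, not from the source):
# 1-byte kind + 4-byte big-endian payload length + payload bytes.
KIND_BODY = b"B"
KIND_ATTACHMENT = b"A"


def pack_record(kind: bytes, payload: bytes) -> bytes:
    """Serialize one record in the hypothetical layout."""
    return kind + struct.pack(">I", len(payload)) + payload


def find_candidates(data: bytes) -> List[Tuple[int, int]]:
    """Parse the data and return (offset, length) for each attachment payload.

    Only attachments are treated as candidates, so the candidates cover
    less than the entire data.
    """
    spans = []
    pos = 0
    while pos < len(data):
        kind = data[pos:pos + 1]
        (length,) = struct.unpack(">I", data[pos + 1:pos + 5])
        start = pos + 5
        if kind == KIND_ATTACHMENT:
            spans.append((start, length))
        pos = start + length
    return spans


def add_markers(data: bytes) -> bytes:
    """Prepend a one-line JSON header that records where the candidates are."""
    header = json.dumps({"dedup_spans": find_candidates(data)}).encode() + b"\n"
    return header + data


def read_marked_objects(marked: bytes) -> List[bytes]:
    """Server side: use the header to slice out the candidates without re-parsing."""
    header, _, data = marked.partition(b"\n")
    spans = json.loads(header)["dedup_spans"]
    return [data[offset:offset + length] for offset, length in spans]


message = (
    pack_record(KIND_BODY, b"hello")
    + pack_record(KIND_ATTACHMENT, b"report.pdf bytes")
    + pack_record(KIND_ATTACHMENT, b"x")
)
objects = read_marked_objects(add_markers(message))
```

In this sketch the receiving side never needs to understand the application format; it only follows the offsets that the agent recorded, which is the division of labor the abstract describes.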
20 Claims
1. A system for managing application-generated data objects, the system comprising:
a first backup agent executing on a first client device, the first backup agent being configured to, in response to a storage operation request, parse first application-specific data of the first client device that is the subject of the storage operation request, the first application-specific data comprising a plurality of first data objects having different sizes, identify, based at least on an application that generated the first application-specific data, one or more of the plurality of first data objects of the first application-specific data to be considered for de-duplication, wherein the one or more first data objects comprises less than the entire first application-specific data, and insert at least one indicator in the first application-specific data that identifies at least one location of the one or more first data objects within the first application-specific data; and
a de-duplication module executing on a computing device, the de-duplication module being in communication with the first backup agent to receive the first application-specific data and to, process each of the one or more data objects, and based on said processing, determine if a duplicate copy of any of the one or more first data objects exists in a storage device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method for managing application-generated data objects, the method comprising:
receiving a first storage operation request for first data generated by a first application executing on a first client device, the first data comprising a plurality of first data objects having non-uniform sizes;
parsing the first data to identify one or more of the plurality of first data objects to be considered for de-duplication, wherein the one or more first data objects comprises less than the entire first data;
inserting at least one indicator in the first data that identifies at least one location of the one or more first data objects within the first data;
processing each of the one or more first data objects to determine if a duplicate copy of the one or more first data objects exists in at least one storage device; and
for each of the one or more first data objects, if a duplicate copy does not exist in the at least one storage device, storing the first data object in the at least one storage device, otherwise, storing at least one of a stub file and a pointer in place of the first data object in the at least one storage device.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
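The final two steps of claim 9 (check for a duplicate, then store either the object or a pointer) can be sketched as follows. This is a minimal illustration under stated assumptions: the claim does not say how duplicates are detected, so the use of a SHA-256 digest as the lookup key, the in-memory dictionary standing in for the storage device, and the names DedupStore and back_up are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


class DedupStore:
    """In-memory stand-in for the storage device (hypothetical)."""

    def __init__(self) -> None:
        # Maps a content digest to the single stored instance of that object.
        self.objects: Dict[str, bytes] = {}

    def put(self, obj: bytes) -> Tuple[str, str]:
        """Store obj if it is new; otherwise return a pointer to the existing copy.

        Returns ("stored", digest) or ("pointer", digest).
        Assumption: equal SHA-256 digests are treated as equal content.
        """
        digest = hashlib.sha256(obj).hexdigest()
        if digest in self.objects:
            return ("pointer", digest)
        self.objects[digest] = obj
        return ("stored", digest)


def back_up(store: DedupStore, candidates: List[bytes]) -> List[Tuple[str, str]]:
    """Process each candidate object in order and record what was written for it."""
    return [store.put(obj) for obj in candidates]


store = DedupStore()
results = back_up(store, [b"logo.png bytes", b"chart bytes", b"logo.png bytes"])
```

Here the third object repeats the first, so only two objects are held in the store and the third entry is recorded as a pointer to the first object's digest.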
17. A method for managing application-generated data objects, the method comprising:
receiving a first storage operation request for first data generated by a first application executing on a first client device, the first data comprising a plurality of first data objects;
parsing the first data to identify one or more of the plurality of first data objects to be considered for de-duplication;
receiving a second storage operation request for second data generated by a second application, the second data comprising a plurality of second data objects, and the second data having a different file format than the first data;
parsing the second data to identify one or more of the plurality of second data objects to be considered for de-duplication; and
inserting in at least one of a copy of the first data and a copy of the second data an indicator that denotes, based on the first and second file formats, that the one or more first data objects and the one or more second data objects should not be considered together for de-duplication.
Dependent claims: 18
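The format-based indicator in the last step of claim 17 might look like the sketch below. The claim does not define which formats may be de-duplicated together or how the indicator is encoded, so the compatibility table, the format labels, the dictionary representation of a backup copy, and the "no_dedup_with" field are all hypothetical choices made for illustration.

```python
from typing import Any, Dict, FrozenSet, Set

# Hypothetical policy: unordered pairs of formats whose objects may be
# single-instanced together. Any other pair of differing formats is kept apart.
COMPATIBLE: Set[FrozenSet[str]] = {frozenset({"docx", "pptx"})}


def may_share(fmt_a: str, fmt_b: str) -> bool:
    """True if objects from the two formats may be considered together."""
    return fmt_a == fmt_b or frozenset({fmt_a, fmt_b}) in COMPATIBLE


def tag_copy(copy: Dict[str, Any], other_format: str) -> Dict[str, Any]:
    """Return a tagged version of the copy.

    If the copy's format must not be de-duplicated against other_format,
    add other_format to the copy's "no_dedup_with" indicator.
    """
    tagged = dict(copy)
    if not may_share(copy["format"], other_format):
        excluded = set(copy.get("no_dedup_with", [])) | {other_format}
        tagged["no_dedup_with"] = sorted(excluded)
    return tagged


first_copy = {"format": "docx", "objects": [b"embedded image"]}
kept_apart = tag_copy(first_copy, "pst")
shared = tag_copy(first_copy, "pptx")
```

A de-duplication module could then skip cross-matching any two copies where one names the other's format in its indicator, while still matching copies whose formats the policy allows together.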
19. A system for managing application-generated data objects, the system comprising:
means for receiving a first storage operation request for first data generated by a first application executing on a first client device, the first data comprising a plurality of first data objects having differing sizes;
means for parsing the first data to identify one or more of the plurality of first data objects to be considered for de-duplication, wherein the one or more first data objects comprises less than the entire first data;
means for inserting at least one indicator in the first data that identifies at least one location of the one or more first data objects within the first data;
means for processing each of the one or more first data objects to determine if a duplicate copy of the first data object exists in at least one storage device; and
means for (i) storing each of the one or more first data objects in the at least one storage device that does not have a duplicate copy and (ii) storing at least one of a stub and a pointer in place of each of the one or more first data objects that does have a duplicate copy in the at least one storage device.
Dependent claims: 20
Specification