De-duplication systems and methods for application-specific data
First Claim
1. A system for managing application-generated data objects, the system comprising:
a processor;
a first de-duplication database associated with first application-specific data;
a second de-duplication database associated with second application-specific data;
a first backup agent executing in one or more computer processors on a first client device, the first backup agent being configured to, in response to a storage operation request:
prior to performing block-level de-duplication, parse first and second application-specific data of the first client device that is the subject of the storage operation request, the first and second application-specific data comprising a plurality of first and second data objects having first and second formats; and
prior to performing block-level de-duplication, insert de-duplication indicators in the first and second application-specific data, wherein the inserted de-duplication indicators identify portions within the first and second data objects where de-duplication should start and stop, and wherein the inserted de-duplication indicators further identify which of the first and second de-duplication databases to use in de-duplicating the first and second application-specific data; and
a de-duplication module executing on one or more computer processors and that is configured to perform block-level de-duplication, the de-duplication module being in communication with the first backup agent to receive the first application-specific data and to:
insert the de-duplication indicators by setting or clearing a bit in at least one header of the first application-specific data, wherein the at least one de-duplication indicator comprises an offset value identifying a beginning of the first data objects within the first application-specific data;
use de-duplication indicators to identify where the de-duplication module should start and stop de-duplication of blocks in identified portions of the first and second application-specific data;
based on said inserted de-duplication indicators, determine if a duplicate copy of any of the blocks in the identified portions of the first application-specific data exists in the first de-duplication database; and
based on said inserted de-duplication indicators, determine if a duplicate copy of any of the blocks in the identified portions of the second application-specific data exists in the second de-duplication database.
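The division of labor in claim 1, where a backup agent marks portions of application-specific data and a block-level de-duplication module uses those marks to decide which blocks to check and which database to consult, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Indicator` record, `backup_agent_mark`, `DedupModule`, the 4 KiB block size, SHA-256 block fingerprints, and in-memory sets standing in for the first and second de-duplication databases are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # hypothetical fixed block size


@dataclass(frozen=True)
class Indicator:
    """Hypothetical de-duplication indicator inserted by the backup agent."""
    start: int     # offset where de-duplication should start
    stop: int      # offset where de-duplication should stop (exclusive)
    database: str  # which de-duplication database to consult


def backup_agent_mark(objects_by_db):
    """Concatenate data objects and emit one indicator per object."""
    payload = bytearray()
    indicators = []
    for db_name, objects in objects_by_db.items():
        for obj in objects:
            start = len(payload)
            payload.extend(obj)
            indicators.append(Indicator(start, len(payload), db_name))
    return bytes(payload), indicators


class DedupModule:
    """Block-level de-duplication driven by the agent's indicators."""

    def __init__(self, database_names):
        # One fingerprint set per application-specific de-duplication database.
        self.databases = {name: set() for name in database_names}

    def process(self, data, indicators):
        """Return (new_blocks, duplicate_count) for the marked portions only."""
        new_blocks, duplicates = [], 0
        for ind in indicators:
            db = self.databases[ind.database]
            for offset in range(ind.start, ind.stop, BLOCK_SIZE):
                # Never read past the stop indicator of the current portion.
                block = data[offset:min(offset + BLOCK_SIZE, ind.stop)]
                digest = hashlib.sha256(block).hexdigest()
                if digest in db:
                    duplicates += 1
                else:
                    db.add(digest)
                    new_blocks.append(block)
        return new_blocks, duplicates


module = DedupModule(["mail", "database"])
payload, marks = backup_agent_mark({
    "mail": [b"A" * 8192, b"A" * 8192],
    "database": [b"A" * 4096],
})
new_blocks, dup_count = module.process(payload, marks)
print(len(new_blocks), dup_count)  # 2 3
```

Because each database is consulted separately, the identical 4 KiB block routed to the hypothetical `database` store is kept as new even though the `mail` store already holds it; within the `mail` store, three of its four blocks are recognized as duplicates.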
6 Assignments
0 Petitions
Abstract
Content-aware systems and methods for improving de-duplication, or single instancing, in storage operations. In certain examples, backup agents on client devices parse application-specific data to identify data objects that are candidates for de-duplication. The backup agents can then insert markers or other indicators in the data that identify the location(s) of the particular data objects. Such markers can, in turn, assist a de-duplication manager in performing object-based de-duplication and increase the likelihood that like blocks within the data are identified and single-instanced. In other examples, the agents can further determine whether a data object of one file type can or should be single-instanced with a data object of a different file type. Such processing of data on the client side can provide for more efficient storage and back-end processing.
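Why marking object locations increases the likelihood that like blocks are found can be shown with a small example. The sketch assumes fixed-size blocking and SHA-256 fingerprints (neither is specified above), and the file layouts are invented for illustration: the same data object sits behind headers of different lengths, and the example compares blocking from byte 0 against blocking from the object's offset, which is the kind of position an agent-inserted marker would supply.

```python
import hashlib

BLOCK = 8  # tiny block size so the example stays readable


def fingerprints(data: bytes, start: int = 0) -> set:
    """SHA-256 fingerprints of fixed-size blocks of data beginning at `start`."""
    return {
        hashlib.sha256(data[i:i + BLOCK]).hexdigest()
        for i in range(start, len(data), BLOCK)
    }


obj = b"The same attachment payload!!!!!"  # 32 bytes: 4 full blocks
file_a = b"HDR-A:" + obj      # hypothetical 6-byte header
file_b = b"HEADER-B:" + obj   # hypothetical 9-byte header

# Blocking from byte 0: the different header lengths shift the shared object.
naive = fingerprints(file_a) & fingerprints(file_b)
# Blocking from the object offset, as a marker identifying the object allows.
aligned = fingerprints(file_a, 6) & fingerprints(file_b, 9)
print(len(naive), len(aligned))  # 0 4
```

With blocking anchored at byte 0, no fixed-size block of the two files matches; anchoring at the object offset recovers all four shared blocks.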
240 Citations
14 Claims
1. A system for managing application-generated data objects, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6.
7. A method for managing application-generated data objects, the method comprising:
storing a first de-duplication database associated with first application-specific data;
storing a second de-duplication database associated with second application-specific data;
receiving a first storage operation request for first data generated by a first application and second data generated by a second application executing on a first client device, the first and second data comprising a plurality of first and second data objects having first and second formats;
prior to performing block-level de-duplication, inserting de-duplication indicators in the first and second data that identify portions within the first and second data objects where de-duplication should start and stop, and wherein the inserted de-duplication indicators further identify which of the first and second de-duplication databases to use in de-duplicating the first and second application-specific data; and
with a de-duplication module executing on one or more computer processors that is configured to perform block-level de-duplication, the de-duplication module being in communication with a first backup agent to receive the first application-specific data:
inserting the de-duplication indicators by setting or clearing a bit in at least one header of the first application-specific data, wherein the at least one de-duplication indicator comprises an offset value identifying a beginning of the first data objects within the first application-specific data;
using the inserted de-duplication indicators to identify where the de-duplication module should start and stop de-duplication of blocks in identified portions within the first and second application-specific data;
based on said inserted de-duplication indicators, determining if a duplicate copy of any of the blocks in the identified portions of the first application-specific data exists in the first de-duplication database; and
based on said inserted de-duplication indicators, determining if a duplicate copy of any of the identified blocks in the portions of the second application-specific data exists in the second de-duplication database.
Dependent claims: 8, 9, 10, 11, 12, 13, 14.
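Claim 7 describes indicators carried as a bit that is set or cleared in a header, together with an offset identifying where the data objects begin. One possible encoding, using Python's standard `struct` module, is sketched below; the 8-byte layout, the bit position, the database-id field, and the helper names are hypothetical, since the claim does not specify a header format.

```python
import struct

# Hypothetical 8-byte header layout (not taken from the claims):
#   byte 0    : flags, bit 0 = "de-duplicate this data"
#   byte 1    : id of the de-duplication database to use
#   bytes 2-3 : padding
#   bytes 4-7 : offset of the first data object, big-endian uint32
HEADER = struct.Struct(">BBxxI")
DEDUP_BIT = 0x01


def make_header(dedup: bool, db_id: int, offset: int) -> bytes:
    """Build a header with the de-duplication bit, database id and offset."""
    flags = DEDUP_BIT if dedup else 0
    return HEADER.pack(flags, db_id, offset)


def set_dedup_bit(header: bytes, enabled: bool) -> bytes:
    """Return a copy of the header with the de-duplication bit set or cleared."""
    flags, db_id, offset = HEADER.unpack(header)
    flags = (flags | DEDUP_BIT) if enabled else (flags & ~DEDUP_BIT)
    return HEADER.pack(flags, db_id, offset)


def read_header(header: bytes):
    """Decode (dedup_enabled, db_id, offset) from a header."""
    flags, db_id, offset = HEADER.unpack(header)
    return bool(flags & DEDUP_BIT), db_id, offset


hdr = make_header(dedup=True, db_id=1, offset=512)
cleared = set_dedup_bit(hdr, False)
print(read_header(hdr), read_header(cleared))  # (True, 1, 512) (False, 1, 512)
```

Keeping the offset beside the flag lets a block-level module skip straight to the object boundary without reparsing the application's own format.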
Specification