DE-DUPLICATION SYSTEMS AND METHODS FOR APPLICATION-SPECIFIC DATA
First Claim
1. A system for managing application-generated data objects, the system comprising:
at least a first de-duplication database associated with first application-specific data, the first application-specific data comprising a plurality of data objects;
a first module comprising one or more computer processors configured to insert de-duplication indicators in the first application-specific data, wherein the inserted de-duplication indicators are inserted in at least one header of the first application-specific data, wherein one or more of the de-duplication indicators comprises an offset value identifying a beginning of at least one of the plurality of data objects within the first application-specific data, and wherein the inserted de-duplication indicators further identify that the first de-duplication database is to be used in de-duplicating the first application-specific data;
a second module executing on one or more computer processors configured to perform block-level de-duplication, the second module further configured to:
use the inserted de-duplication indicators in the at least one header to identify the beginning of at least one of the plurality of data objects based on the offset value; and
determine if a duplicate copy of any of the blocks associated with the data objects exist in the first de-duplication database.
Abstract
Content-aware systems and methods for improving de-duplication, or single instancing, in storage operations. In certain examples, backup agents on client devices parse application-specific data to identify data objects that are candidates for de-duplication. The backup agents can then insert markers or other indicators in the data that identify the location(s) of the particular data objects. Such markers can, in turn, assist a de-duplication manager in performing object-based de-duplication and increase the likelihood that like blocks within the data are identified and single-instanced. In other examples, the agents can further determine whether a data object of one file type can or should be single-instanced with a data object of a different file type. Such processing of data on the client side can provide for more efficient storage and back-end processing.
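The abstract's point that markers "increase the likelihood that like blocks within the data are identified" can be illustrated with a small sketch. It is not taken from the patent: the 8-byte block size, the SHA-256 hash, the sample byte strings, and the function name block_hashes are all assumptions chosen for readability. The same object is embedded in two streams at different offsets. Chunking from byte 0 finds no common blocks, while chunking from the marked start of the object finds all of them.

```python
import hashlib

BLOCK_SIZE = 8  # assumed, deliberately tiny so the example is easy to follow


def block_hashes(data: bytes, start: int = 0) -> list[str]:
    """Hash fixed-size blocks of `data`, beginning at byte offset `start`."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(start, len(data), BLOCK_SIZE)
    ]


# The same 24-byte object (three blocks) sits behind headers of different length.
attachment = b"ATTACHMENT-BYTES-0123456"
stream_a = b"hdrA:" + attachment       # object begins at offset 5
stream_b = b"header-B::" + attachment  # object begins at offset 10

# Content-unaware chunking: blocks are cut from byte 0, so they are misaligned.
naive_shared = set(block_hashes(stream_a)) & set(block_hashes(stream_b))

# Marker-guided chunking: blocks are cut from the object's marked start.
aligned_shared = set(block_hashes(stream_a, 5)) & set(block_hashes(stream_b, 10))

print(len(naive_shared), len(aligned_shared))
```

In this toy case the offsets 5 and 10 play the role of the inserted markers; a real agent would derive them by parsing the application-specific format.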
16 Claims
9. A method for managing application-generated data objects, the method comprising:
storing a first de-duplication database associated with first application-specific data, the first application-specific data comprising a plurality of data objects;
inserting de-duplication indicators in at least one header of the first application-specific data, wherein one or more of the de-duplication indicators comprises an offset value identifying a beginning of at least one of the plurality of data objects within the first application-specific data, and wherein the inserted de-duplication indicators further identify that the first de-duplication database is to be used in de-duplicating the first application-specific data;
using the inserted de-duplication indicators in the at least one header to identify the beginning of at least one of the plurality of data objects based on the offset value; and
determining if a duplicate copy of any of the blocks associated with the data objects exist in the first de-duplication database.
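The steps of the method claim can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the length-prefixed JSON header, the 4 KiB block size, the SHA-256 digests, the in-memory dictionaries standing in for de-duplication databases, and every function and field name (insert_indicators, read_indicators, deduplicate, "db", "offsets") are hypothetical. Offsets are taken relative to the payload that follows the header and are assumed to be sorted; bytes before the first offset are ignored in this sketch.

```python
import hashlib
import json
import struct

BLOCK_SIZE = 4096  # assumed block size; the claims do not specify one


def insert_indicators(payload: bytes, object_offsets: list[int], db_id: str) -> bytes:
    """Prepend a header holding object offsets and the database to use."""
    header = json.dumps({"db": db_id, "offsets": object_offsets}).encode()
    return struct.pack(">I", len(header)) + header + payload


def read_indicators(stream: bytes) -> tuple[dict, bytes]:
    """Split a stream into its parsed header and the remaining payload."""
    (header_len,) = struct.unpack(">I", stream[:4])
    header = json.loads(stream[4:4 + header_len])
    return header, stream[4 + header_len:]


def deduplicate(stream: bytes, databases: dict[str, dict[str, bytes]]) -> list[tuple[str, bool]]:
    """Return (digest, was_duplicate) for each block of each marked object."""
    header, payload = read_indicators(stream)
    db = databases.setdefault(header["db"], {})  # database named by the header
    bounds = header["offsets"] + [len(payload)]
    results = []
    for start, end in zip(bounds, bounds[1:]):
        # Blocks are cut from the object's start, never across object bounds.
        for i in range(start, end, BLOCK_SIZE):
            block = payload[i:min(i + BLOCK_SIZE, end)]
            digest = hashlib.sha256(block).hexdigest()
            is_dup = digest in db
            if not is_dup:
                db[digest] = block  # store the first instance only
            results.append((digest, is_dup))
    return results


# Usage: the same 5120-byte object arrives in two differently framed streams.
databases: dict[str, dict[str, bytes]] = {}
obj = bytes(range(256)) * 20
first = insert_indicators(b"mail-1|" + obj, [7], "exchange-db")
second = insert_indicators(b"mail-two|" + obj, [9], "exchange-db")
r1 = deduplicate(first, databases)
r2 = deduplicate(second, databases)
print([dup for _, dup in r1], [dup for _, dup in r2])
```

Carrying the database identifier in the header lets the de-duplication side select a store without re-parsing the application format, which is the division of labor the claim describes between the inserting step and the block-level lookup.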
Specification