DE-DUPLICATION SYSTEMS AND METHODS FOR APPLICATION-SPECIFIC DATA
First Claim
1. A system for creating a backup copy of data, the system comprising:
computer readable memory comprising at least a first de-duplication database associated with data generated by at least first and second clients;
a de-duplication module executing on one or more computer processors comprising computer hardware, the de-duplication module receives the data and performs de-duplication as part of a backup of the data, the de-duplication module further configured to:
determine if a duplicate copy of a first portion of the data from the first client exists in the first de-duplication database; and
if a duplicate copy does not exist in the first de-duplication database, storing first metadata that identifies the first client in association with the duplicate copy;
determine if a duplicate copy of a second portion of the data from the second client exists in the first de-duplication database;
if a duplicate copy of the second portion of the data exists in the first de-duplication database, removing the duplicate data in the second portion of the data;
determining whether second metadata in the second portion of the data identifies whether the second client is unique; and
if the second metadata is unique, storing the second metadata in association with the duplicate copy in the first de-duplication database, store the first and second metadata associated with the duplicate copy wherein the first metadata that identifies the first client and the second metadata that identifies the second client are stored in association with the duplicate copy.
Abstract
Content-aware systems and methods for improving de-duplication, or single instancing, in storage operations. In certain examples, backup agents on client devices parse application-specific data to identify data objects that are candidates for de-duplication. The backup agents can then insert markers or other indicators in the data that identify the location(s) of the particular data objects. Such markers can, in turn, help a de-duplication manager perform object-based de-duplication and increase the likelihood that like blocks within the data are identified and single-instanced. In other examples, the agents can further determine whether a data object of one file type can or should be single-instanced with a data object of a different file type. Such processing of data on the client side can provide for more efficient storage and back-end processing.
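The marker idea in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Marker` record, the `build_marked_stream` and `dedup_by_markers` helpers, the "meta"/"object" part labels, and the use of SHA-256 to compare objects. For simplicity the sketch keeps the markers in a side list rather than embedding them in the data stream, and it assumes the client-side agent already understands the application format well enough to tell data objects apart from surrounding metadata.

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class Marker(NamedTuple):
    offset: int  # byte position where a data object starts in the stream
    length: int  # size of the data object in bytes


def build_marked_stream(parts: List[Tuple[str, bytes]]) -> Tuple[bytes, List[Marker]]:
    """Client-side agent (hypothetical): concatenate the parts of a backup
    stream and record a marker for every part labelled "object"."""
    stream = bytearray()
    markers: List[Marker] = []
    for kind, payload in parts:
        if kind == "object":
            markers.append(Marker(len(stream), len(payload)))
        stream += payload
    return bytes(stream), markers


def dedup_by_markers(stream: bytes, markers: List[Marker], store: Dict[str, bytes]) -> int:
    """De-duplication manager (hypothetical): hash each marked object and
    store it only if no identical object is already held.
    Returns the number of objects newly stored."""
    newly_stored = 0
    for m in markers:
        obj = stream[m.offset:m.offset + m.length]
        digest = hashlib.sha256(obj).hexdigest()  # assumed comparison method
        if digest not in store:
            store[digest] = obj
            newly_stored += 1
    return newly_stored


# Two messages with different headers carry the same attachment.
attachment = b"quarterly-report contents"
store: Dict[str, bytes] = {}
s1, m1 = build_marked_stream([("meta", b"From: alice\n"), ("object", attachment)])
s2, m2 = build_marked_stream([("meta", b"From: bob; Cc: carol\n"), ("object", attachment)])
first = dedup_by_markers(s1, m1, store)
second = dedup_by_markers(s2, m2, store)
```

Because the headers differ in length, the attachment starts at a different byte offset in each stream, so fixed-size blocks cut from the two streams would not line up. Hashing the marked object instead lets the second copy be recognized as a duplicate regardless of what precedes it.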
83 Citations
20 Claims
11. A method for creating a backup copy of data, the method comprising:
storing a first de-duplication database associated with data generated by at least first and second clients;
performing de-duplication of the data as part of a backup of the data;
determining if a duplicate copy of a first portion of the data from the first client exists in the first de-duplication database;
if a duplicate copy of the first portion of the data does not exist in the first de-duplication database, storing first metadata that identifies the first client in association with the duplicate copy;
determining if a duplicate copy of a second portion of the data from the second client exists in the first de-duplication database;
if a duplicate copy associated with the second portion of the data exists in the first de-duplication database, removing the duplicate data in the second portion of the data;
determining whether second metadata in the second portion of the data identifies whether the second client is unique; and
if the second metadata is unique, storing the second metadata in association with the duplicate copy in the first de-duplication database, wherein the backup copy stores the first and second metadata associated with the duplicate copy wherein the first metadata that identifies the first client and the second metadata that identifies the second client are stored in association with the duplicate copy.
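The claimed steps can be read as the following minimal sketch. It is one possible interpretation, not the patented implementation: the `DedupDatabase` class, its `backup_portion` method, and the use of a SHA-256 digest to detect duplicates are all assumptions made for illustration, since the claims do not say how a duplicate copy is recognized or how metadata is laid out.

```python
import hashlib
from typing import Dict, Set, Tuple


class DedupDatabase:
    """Hypothetical de-duplication database shared by several clients."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}      # digest -> single stored copy
        self.clients: Dict[str, Set[str]] = {}  # digest -> client ids (metadata)

    def backup_portion(self, client_id: str, portion: bytes) -> Tuple[str, bool]:
        """Back up one portion of data from a client.
        Returns (digest, stored), where stored is False if the portion was a
        duplicate and its data was dropped rather than stored again."""
        digest = hashlib.sha256(portion).hexdigest()  # assumed duplicate check
        if digest not in self.blocks:
            # No copy yet: keep the data and record which client produced it.
            self.blocks[digest] = portion
            self.clients[digest] = {client_id}
            return digest, True
        # A copy already exists: do not store the data a second time, but
        # associate this client with the existing copy if it is not yet listed.
        if client_id not in self.clients[digest]:
            self.clients[digest].add(client_id)
        return digest, False


db = DedupDatabase()
d1, stored1 = db.backup_portion("client-1", b"shared spreadsheet contents")
d2, stored2 = db.backup_portion("client-2", b"shared spreadsheet contents")
d3, stored3 = db.backup_portion("client-2", b"shared spreadsheet contents")
```

After these calls the database holds a single copy of the data, associated with both client identifiers. The third call adds nothing, because the second client's metadata is no longer unique for that copy.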
Specification