System and method for data de-duplication
First Claim
1. A method utilizing one or more computer systems for extracting data directly from various distributed media and storing the data at a target location, the target location comprising one or more computer memory devices, without unnecessary duplication of identical data, comprising the steps of:
reading components of data in a plurality of files stored in a plurality of data storage devices, wherein the components of data include a content for each file, a location of the file as to where it is stored in the plurality of data storage devices, and a metadata for each file;
comparing the content with previously stored content, the previously stored content stored in a target location, the target location comprising one or more data storage devices, wherein comparing the content comprises generating a digital signature from the content and comparing the digital signature of the content with a digital signature of previously stored content;
storing the content at the target location only if substantially identical content is not found;
comparing the metadata to previously stored metadata, the previously stored metadata stored in a target location, the target location comprising one or more data storage devices;
storing the metadata at the target location only if substantially identical metadata is not found, wherein the stored metadata is associated with the substantially identical content if it is found, and the stored content if substantially identical content is not found;
associating the substantially identical metadata with the stored content if substantially identical metadata is found and substantially identical content is not found;
comparing the location of the file to a previously stored location, the previously stored location stored in a target location and storing the location if a substantially identical location is not found, wherein the location is associated with the metadata or content of the file; and
applying retention criteria to insure that any data stored meets the criteria for the retention criteria.
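The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `DedupStore` class, its field names, and the `retain` callback are hypothetical, SHA-256 stands in for the unspecified "digital signature", and "substantially identical" is simplified to an exact digest match.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical in-memory stand-in for the claim's target location."""
    contents: dict = field(default_factory=dict)   # content digest -> bytes
    metadata: dict = field(default_factory=dict)   # metadata digest -> dict
    locations: set = field(default_factory=set)    # source locations seen
    links: set = field(default_factory=set)        # (location, meta id, content id)

    def ingest(self, location, content, meta, retain=lambda loc, m: True):
        # Retention criteria (assumed here to be a simple predicate):
        # files that do not meet the criteria are not stored at all.
        if not retain(location, meta):
            return None

        # Generate a signature from the content and store the content
        # only if no previously stored content has the same signature.
        content_id = hashlib.sha256(content).hexdigest()
        if content_id not in self.contents:
            self.contents[content_id] = content

        # Compare metadata via a digest of its canonical JSON form and
        # store it only if identical metadata is not already present.
        canonical = json.dumps(meta, sort_keys=True).encode("utf-8")
        meta_id = hashlib.sha256(canonical).hexdigest()
        if meta_id not in self.metadata:
            self.metadata[meta_id] = meta

        # Store the location only if it is new (a set ignores repeats),
        # then associate location, metadata, and content with each other.
        self.locations.add(location)
        self.links.add((location, meta_id, content_id))
        return content_id, meta_id


store = DedupStore()
store.ingest("tape1:/docs/a.txt", b"hello", {"name": "a.txt", "size": 5})
store.ingest("tape2:/backup/a.txt", b"hello", {"name": "a.txt", "size": 5})
store.ingest("tape2:/backup/b.txt", b"hello", {"name": "b.txt", "size": 5})
print(len(store.contents), len(store.metadata), len(store.locations))  # 1 2 3
```

In this sketch three files with the same bytes produce one stored content entry, two metadata entries (the names differ), and three locations, with the `links` set recording which location refers to which metadata and content.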
Abstract
A method and system can be used to read and obtain data from a variety of media, regardless of the application used to generate the backup media. The component parts of a file may be read from a medium, including content and metadata pertaining to a file. These pieces of content and metadata may then be stored and associated. To avoid duplication of data, pieces of content and metadata may be compared to previously stored content and metadata. Furthermore, using these same methods and systems the content and metadata of a file may be associated with a location where the file resided. A database which stores these components and allows linking between the various stored components may be particularly useful in implementing embodiments of these methods and systems.
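The database mentioned at the end of the abstract could take many forms. As one hedged illustration, the sketch below uses SQLite with separate tables for content, metadata, and location, plus a link table that associates them. The table names, the columns (`name`, `mtime`, `path`), the choice of SHA-256, and the helper functions `add_file` and `locations_of` are all assumptions made for this example; the source does not specify a schema.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content  (digest TEXT PRIMARY KEY, data BLOB NOT NULL);
CREATE TABLE metadata (id INTEGER PRIMARY KEY,
                       name TEXT NOT NULL, mtime INTEGER NOT NULL,
                       UNIQUE (name, mtime));
CREATE TABLE location (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE link (
    location_id INTEGER NOT NULL REFERENCES location(id),
    metadata_id INTEGER NOT NULL REFERENCES metadata(id),
    digest      TEXT    NOT NULL REFERENCES content(digest),
    PRIMARY KEY (location_id, metadata_id, digest)
);
""")


def add_file(path, name, mtime, data):
    """Store each component once and link the components together."""
    digest = hashlib.sha256(data).hexdigest()
    # INSERT OR IGNORE skips rows that would violate a uniqueness
    # constraint, so duplicate components are not stored twice.
    conn.execute("INSERT OR IGNORE INTO content VALUES (?, ?)", (digest, data))
    conn.execute("INSERT OR IGNORE INTO metadata (name, mtime) VALUES (?, ?)",
                 (name, mtime))
    conn.execute("INSERT OR IGNORE INTO location (path) VALUES (?)", (path,))
    meta_id = conn.execute(
        "SELECT id FROM metadata WHERE name = ? AND mtime = ?",
        (name, mtime)).fetchone()[0]
    loc_id = conn.execute(
        "SELECT id FROM location WHERE path = ?", (path,)).fetchone()[0]
    conn.execute("INSERT OR IGNORE INTO link VALUES (?, ?, ?)",
                 (loc_id, meta_id, digest))


def locations_of(data):
    """Return every stored path whose file had exactly this content."""
    digest = hashlib.sha256(data).hexdigest()
    rows = conn.execute(
        "SELECT path FROM location "
        "JOIN link ON link.location_id = location.id "
        "WHERE link.digest = ? ORDER BY path", (digest,))
    return [row[0] for row in rows]
```

With a layout like this, a file that appears on several backup media occupies one row in `content`, while the link table still records every place it resided.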
212 Citations
10 Claims
Specification