
System and method for data de-duplication

  • US 8,943,024 B1
  • Filed: 01/16/2004
  • Issued: 01/27/2015
  • Est. Priority Date: 01/17/2003
  • Status: Active Grant
First Claim

1. A method utilizing one or more computer systems for extracting data directly from various distributed media and storing the data at a target location, the target location comprising one or more computer memory devices, without unnecessary duplication of identical data, comprising the steps of:

  • reading components of data in a plurality of files stored in a plurality of data storage devices, wherein the components of data include a content for each file, a location of the file as to where it is stored in the plurality of data storage devices, and a metadata for each file;

    comparing the content with previously stored content, the previously stored content stored in a target location, the target location comprising one or more data storage devices, wherein comparing the content comprises generating a digital signature from the content and comparing the digital signature of the content with a digital signature of previously stored content;

    storing the content at the target location only if substantially identical content is not found;

    comparing the metadata to previously stored metadata, the previously stored metadata stored in a target location, the target location comprising one or more data storage devices;

    storing the metadata at the target location only if substantially identical metadata is not found, wherein the stored metadata is associated with the substantially identical content if it is found, and the stored content if substantially identical content is not found;

    associating the substantially identical metadata with the stored content if substantially identical metadata is found and substantially identical content is not found;

    comparing the location of the file to a previously stored location, the previously stored location stored in a target location and storing the location if a substantially identical location is not found, wherein the location is associated with the metadata or content of the file; and

    applying retention criteria to insure that any data stored meets the criteria for the retention criteria.
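
The claimed steps can be illustrated with a small in-memory store. This is a minimal sketch under stated assumptions, not the patentee's implementation. It uses SHA-256 as the "digital signature". It uses exact signature matches in place of "substantially identical" matching. Metadata is signed over a canonical JSON form, so it is assumed to be JSON-serializable. The names `DedupStore`, `ingest`, `apply_retention` and the retention predicate are hypothetical and appear nowhere in the source.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


def signature(data: bytes) -> str:
    """Digital signature of a byte string (SHA-256 is an assumption; the claim names no algorithm)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DedupStore:
    """Hypothetical 'target location' that keeps each content, metadata record and location once."""

    contents: dict = field(default_factory=dict)         # content signature -> raw bytes
    metadata: dict = field(default_factory=dict)         # metadata signature -> metadata dict
    meta_to_content: dict = field(default_factory=dict)  # metadata signature -> set of content signatures
    locations: dict = field(default_factory=dict)        # file location -> metadata signature

    def ingest(self, content: bytes, meta: dict, location: str) -> None:
        # Compare the content's signature with stored signatures; store the bytes only if new.
        c_sig = signature(content)
        if c_sig not in self.contents:
            self.contents[c_sig] = content

        # Compare metadata (signed over canonical JSON); store the record only if new.
        m_sig = signature(json.dumps(meta, sort_keys=True).encode("utf-8"))
        if m_sig not in self.metadata:
            self.metadata[m_sig] = meta
        # Associate the (found or newly stored) metadata with the (found or newly stored) content.
        self.meta_to_content.setdefault(m_sig, set()).add(c_sig)

        # Store the location only if it is not already present, linked to the metadata.
        if location not in self.locations:
            self.locations[location] = m_sig

    def apply_retention(self, keep: Callable[[dict], bool]) -> None:
        # Hypothetical retention rule: drop every metadata record for which keep(meta) is False.
        for m_sig in [s for s, m in self.metadata.items() if not keep(m)]:
            del self.metadata[m_sig]
            del self.meta_to_content[m_sig]
        # Drop locations whose metadata no longer exists.
        self.locations = {loc: s for loc, s in self.locations.items() if s in self.metadata}
        # Drop content that no surviving metadata record refers to.
        live = set().union(*self.meta_to_content.values())
        self.contents = {s: c for s, c in self.contents.items() if s in live}


store = DedupStore()
store.ingest(b"quarterly report", {"name": "report.txt", "owner": "alice"}, "/srv/a/report.txt")
store.ingest(b"quarterly report", {"name": "report.txt", "owner": "alice"}, "/srv/b/report.txt")
store.ingest(b"draft", {"name": "draft.txt", "owner": "bob"}, "/srv/a/draft.txt")
print(len(store.contents), len(store.metadata), len(store.locations))  # 2 2 3

store.apply_retention(lambda m: m["owner"] != "bob")
print(len(store.contents), len(store.metadata), len(store.locations))  # 1 1 2
```

In this sketch the second copy of the report adds only a new location, because its content and metadata signatures already exist. Keeping the three record types in separate tables mirrors the claim's separate comparisons of content, metadata, and location.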

  • 20 Assignments