Method and system to scan data from a system that supports deduplication

  • US 8,832,042 B2
  • Filed: 03/15/2010
  • Issued: 09/09/2014
  • Est. Priority Date: 03/15/2010
  • Status: Active Grant
First Claim

1. A method of providing file information relating to data deduplication in a computer system, comprising:

    accessing a plurality of files, wherein each file comprises a plurality of segments;

    accessing a data repository storing data resultant from a file deduplication process including accessing a plurality of checksum values associated with said plurality of segments, wherein a first application program performs said file deduplication process;

    identifying segments of said plurality of segments having a same checksum value;

    generating a data association structure by associating said segments of said plurality of segments having said same checksum value, wherein a first checksum value is operable as an index into said data association structure for obtaining segments having said first checksum value;

    storing, using a deduplication database, said data association structure;

    storing a plurality of respective timestamps associated with said plurality of segments, wherein each respective timestamp indicates a last time an associated segment was altered;

    accessing said stored data association structure in said computer memory by a second application program using a received application timestamp and accessing one or more segments of said plurality of segments having an associated timestamp of said plurality of respective timestamps that is newer than the received application timestamp, wherein the second application program identifies, using only said associated timestamp, one or more segments of said plurality of segments that are not processed by the second application program;

    receiving, by the second application program, a listing of files to which a segment of said one or more segments of said plurality of segments belongs; and

    comparing said listing of files against a defined subset of files to exclude from processing to further determine whether said segment needs to be processed.
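
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not prescribe any of these details: the names `Segment`, `checksum_of`, `build_association`, and `segments_to_process` are hypothetical, SHA-256 stands in for the unspecified checksum, an in-memory dictionary stands in for the deduplication database, and a segment is assumed to be skipped only when every file it belongs to is in the exclusion set.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment occurrence (hypothetical record layout)."""
    file_name: str      # file the segment belongs to
    offset: int         # position of the segment within that file
    checksum: str       # checksum produced by the deduplication process
    modified_at: float  # last time the segment was altered


def checksum_of(data: bytes) -> str:
    # Assumption: SHA-256 as the checksum; the claim names no algorithm.
    return hashlib.sha256(data).hexdigest()


def build_association(segments):
    """Group segments that share a checksum, so that a checksum value
    works as an index into the resulting structure."""
    index = defaultdict(list)
    for seg in segments:
        index[seg.checksum].append(seg)
    return dict(index)


def segments_to_process(index, app_timestamp, excluded_files):
    """Return {checksum: sorted file listing} for segments altered after
    app_timestamp, dropping those whose files are all excluded."""
    pending = {}
    for checksum, group in index.items():
        # Timestamp test: skip segments not altered since the last pass.
        if not any(seg.modified_at > app_timestamp for seg in group):
            continue
        # Listing of files to which this segment belongs.
        files = sorted({seg.file_name for seg in group})
        # Assumption: skip only when every owning file is excluded.
        if all(name in excluded_files for name in files):
            continue
        pending[checksum] = files
    return pending
```

Under these assumptions, a second application such as a scanner would call `segments_to_process(index, last_scan_time, excluded)` and examine each returned segment once, regardless of how many files share it.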

  • 7 Assignments