System and method for creating a de-duplicated data set

  • US 8,738,668 B2
  • Filed: 12/16/2010
  • Issued: 05/27/2014
  • Est. Priority Date: 12/16/2009
  • Status: Active Grant
First Claim

1. A method utilizing one or more computer systems for creating a data set without duplication, from data taken from one or more database sources, comprising the steps of:

  • in a first phase, using the one or more computer systems to traverse files contained in one or more custodian containers of the database sources and creating indices of the custodian containers, the indices comprising (i) hash keys representing the data files and (ii) seek information for locating and handling the data files;

    in a second phase, creating at the database sources, a master key table of unique hash keys and seek information from all the data indices created; and

    in a third phase, using the one or more computer systems to query the master key table of unique hash keys and using the seek information to produce the data files associated with the hash keys to a storage system, wherein there are at least two custodian containers, and a first phase on a second container is configured to perform substantially in parallel with a second phase on a first container upon completion of a first phase on the first container.
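
The three phases recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: the in-memory `CONTAINERS` dictionary stands in for custodian containers, SHA-256 stands in for the unspecified hash function, a `(container, path)` tuple stands in for seek information, and the helper names `build_index`, `merge_index`, `produce`, and `deduplicate` are hypothetical. A thread pool is used so that indexing of a later container may overlap with merging of an earlier one.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for custodian containers: name -> {path: file bytes}.
CONTAINERS = {
    "custodian_a": {"memo.txt": b"quarterly report", "note.txt": b"hello"},
    "custodian_b": {"copy_of_memo.txt": b"quarterly report", "plan.txt": b"roadmap"},
}


def build_index(name, files):
    """Phase 1: traverse one container and emit (hash key, seek info) pairs.

    SHA-256 is an assumed hash; the seek info here is just (container, path).
    """
    return [
        (hashlib.sha256(data).hexdigest(), (name, path))
        for path, data in sorted(files.items())
    ]


def merge_index(master, index):
    """Phase 2: add entries to the master key table, one per unique hash key."""
    for key, seek in index:
        master.setdefault(key, seek)  # first seek info seen for a key wins


def produce(master, containers):
    """Phase 3: query the master table and copy each unique file to storage."""
    return {key: containers[name][path] for key, (name, path) in master.items()}


def deduplicate(containers):
    master = {}
    with ThreadPoolExecutor() as pool:
        # Phase 1 is submitted for every container up front.
        futures = [pool.submit(build_index, n, f) for n, f in containers.items()]
        # Phase 2 for an earlier container runs here as soon as its index is
        # ready, while phase 1 for later containers may still be in progress.
        for future in futures:
            merge_index(master, future.result())
    return produce(master, containers)


if __name__ == "__main__":
    storage = deduplicate(CONTAINERS)
    print(len(storage), "unique files produced")
```

In this sketch the two copies of the memo hash to the same key, so only one of them reaches the output. Merging in submission order keeps the choice of surviving seek information deterministic; merging in completion order would also satisfy the description but could pick either copy.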

  • 17 Assignments