Systems and methods for efficient data searching, storage and reduction

  • US 9,430,486 B2
  • Filed: 03/19/2009
  • Issued: 08/30/2016
  • Est. Priority Date: 09/15/2004
  • Status: Active Grant
First Claim

1. A method for searching in repository data for data that are similar to an input data, wherein the repository data comprises a plurality of repository data chunks and the input data comprises a plurality of input data chunks, the method comprising:

    for each repository data chunk, generating a corresponding set of repository distinguishing characteristics (RDCs);

    for each input data chunk, generating a corresponding set of input distinguishing characteristics (IDCs); and

    searching for data in the repository data that is similar to the input data by comparing the IDCs and RDCs, wherein each set of RDCs and IDCs is generated by:

    applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk;

    applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk;

    applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes; and

    defining the second subset of hashes as the set of respective IDCs or RDCs.
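
The steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name the hash function or the first and second functions, so every concrete choice here is an assumption made for illustration. The sketch assumes a BLAKE2b hash over each sliding 8-byte window, a first function that keeps the `count` largest hash values, and a second function that shifts each selected hash position by a fixed `offset`. The names `window_hashes`, `distinguishing_characteristics`, and `similarity` are hypothetical.

```python
import hashlib


def window_hashes(chunk: bytes, seed_size: int = 8):
    """Hash every seed_size-byte window of the chunk.

    Returns a list of (hash_value, hash_position) pairs, one per window.
    BLAKE2b is an assumed stand-in for the unspecified hash function.
    """
    return [
        (
            int.from_bytes(
                hashlib.blake2b(chunk[i:i + seed_size], digest_size=8).digest(),
                "big",
            ),
            i,
        )
        for i in range(len(chunk) - seed_size + 1)
    ]


def distinguishing_characteristics(chunk: bytes, seed_size: int = 8,
                                   count: int = 4, offset: int = 1):
    """Return a set of hash values serving as the chunk's IDCs or RDCs."""
    hashes = window_hashes(chunk, seed_size)
    if not hashes:
        return set()  # chunk is shorter than one window

    # Assumed first function: keep the `count` hashes with the largest values.
    first_subset = sorted(hashes, reverse=True)[:count]

    # Assumed second function: shift each selected position by `offset`,
    # clamped to the last valid window, and take the hashes found there.
    last = len(hashes) - 1
    second_positions = {min(pos + offset, last) for _, pos in first_subset}

    # The second subset is defined as the distinguishing characteristics.
    return {hashes[p][0] for p in second_positions}


def similarity(input_chunk: bytes, repo_chunk: bytes, **kwargs) -> int:
    """Count the characteristics shared by an input and a repository chunk."""
    idcs = distinguishing_characteristics(input_chunk, **kwargs)
    rdcs = distinguishing_characteristics(repo_chunk, **kwargs)
    return len(idcs & rdcs)
```

Under these assumptions, a repository chunk whose call to `similarity` returns a high count for a given input chunk would be treated as a candidate for similar data. The threshold for "high" is a design choice that the claim leaves open.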
