Systems and methods for efficient data searching, storage and reduction

  • US 7,523,098 B2
  • Filed: 09/15/2004
  • Issued: 04/21/2009
  • Est. Priority Date: 09/15/2004
  • Status: Expired due to Fees
First Claim

1. A computer-readable storage media encoded with computer-executable instructions to configure a processor to perform a method for identifying input data in repository data, the method comprising:

  • providing an index of the repository data comprising a plurality of repository distinguishing characteristics (RDCs) for each of a plurality of chunks of the repository data;

    partitioning the input data into a plurality of input chunks and for each input chunk, determining a plurality of input distinguishing characteristics (IDCs);

    wherein the distinguishing characteristics (DCs) of the repository and input chunks are determined by:

    selecting a seed size and calculating hash values for every seed of the chunk;

    selecting a subset of the plurality of hash values;

    determining positions of the seeds within a seed sequence of the selected subset of hash values;

    applying a function to the determined positions to determine corresponding other positions within the seed sequence;

    defining the set of distinguishing characteristics as the hash values of the seeds at the determined other positions;

    conducting a similarity search for each input chunk comprising searching the index for matches of the IDCs of the input chunk with the RDCs, wherein the similarity searching requires a threshold number of matching IDCs and RDCs for a declared similarity of an input chunk and similar repository chunk; and

    computing at least one of common and noncommon sections of the input data and repository data using the locations of pairs of matching distinguishing characteristics of an input chunk and similar repository chunk as anchors to define corresponding intervals in the input data and repository data for use in identifying said common or noncommon data sections.
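
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Every function name and parameter is hypothetical, and several choices are assumptions the claim does not fix: BLAKE2b stands in for the seed hash (a rolling hash would be the more likely choice in practice), the "subset" is taken to be the k largest hash values, and the "function" applied to positions is a shift by one seed with wrap-around.

```python
import hashlib
from collections import Counter


def seed_hashes(chunk: bytes, seed_size: int) -> list[int]:
    """Hash every seed, i.e. every contiguous seed_size-byte window of the chunk."""
    return [
        int.from_bytes(
            hashlib.blake2b(chunk[i:i + seed_size], digest_size=8).digest(), "big"
        )
        for i in range(len(chunk) - seed_size + 1)
    ]


def distinguishing_characteristics(chunk, seed_size=8, k=4, shift=1):
    """Return (hash value, seed position) pairs serving as the chunk's DCs."""
    hashes = seed_hashes(chunk, seed_size)
    if not hashes:
        return []
    # Assumption: the selected subset is the k largest hash values;
    # we keep the positions of their seeds.
    top = sorted(range(len(hashes)), key=lambda i: hashes[i], reverse=True)[:k]
    # Assumption: the position function is "move `shift` seeds to the right",
    # wrapping around at the end of the seed sequence.
    others = [(p + shift) % len(hashes) for p in top]
    # The DCs are the hash values of the seeds at those other positions.
    return [(hashes[q], q) for q in others]


def build_index(repo_chunks, **kw):
    """Map each repository DC value to the (chunk id, position) pairs holding it."""
    index = {}
    for chunk_id, data in repo_chunks.items():
        for value, pos in distinguishing_characteristics(data, **kw):
            index.setdefault(value, []).append((chunk_id, pos))
    return index


def similar_chunks(input_chunk, index, threshold=2, **kw):
    """Repository chunks sharing at least `threshold` DC values with the input."""
    votes = Counter()
    for value, _ in distinguishing_characteristics(input_chunk, **kw):
        for chunk_id in {cid for cid, _ in index.get(value, [])}:
            votes[chunk_id] += 1
    return [cid for cid, n in votes.most_common() if n >= threshold]


def anchor_pairs(input_chunk, chunk_id, index, **kw):
    """(input position, repository position) pairs of matching DCs, usable as anchors."""
    pairs = []
    for value, in_pos in distinguishing_characteristics(input_chunk, **kw):
        for cid, repo_pos in index.get(value, []):
            if cid == chunk_id:
                pairs.append((in_pos, repo_pos))
    return sorted(pairs)


repo = {"a": bytes(range(256)), "b": bytes(reversed(range(256)))}
index = build_index(repo)
matches = similar_chunks(repo["a"], index)
anchors = anchor_pairs(repo["a"], "a", index)
```

Each DC keeps its seed position because the final step of the claim uses the locations of matching DC pairs as anchors. A byte-level comparison of the intervals around those anchors, which is omitted here, would then identify the common and noncommon sections.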
