
Systems and methods for efficient data searching, storage and reduction

  • US 8,275,756 B2
  • Filed: 03/20/2009
  • Issued: 09/25/2012
  • Est. Priority Date: 09/15/2004
  • Status: Expired due to Fees
First Claim

1. A system for searching repository data for data that are similar to input data, the repository data being divided into one or more repository chunks, the system comprising:

  • means for, for each repository chunk, calculating a corresponding set of repository distinguishing characteristics (RDCs), each set of RDCs comprising a plurality of distinguishing characteristics, said means arranged to partition the respective data chunks into a plurality of seeds, each seed being a smaller part of the respective data chunk and ordered in a seed sequence and to apply a hash function to each of the seeds to generate a plurality of hash values wherein each seed yields one hash value;

    means for maintaining an index associating each set of RDCs and the corresponding repository chunk;

    means for comparing input distinguishing characteristics of an input chunk of input data to one or more sets of RDCs stored in the index to determine whether a similarity exists between the input chunk and the distinguishing repository chunk, characterized in that:

    said comparing means is configured to determine a similarity exists if a similarity threshold (j) of a set of input distinguishing characteristics is found in a set of RDCs stored in the index; and

    in that said calculating means is configured to select a subset (k) of the plurality of hash values;

    to determine positions of the seeds within the seed sequence corresponding to the selected subset of hash values;

    to apply a function to the determined positions to determine corresponding other positions within the seed sequence; and

    to define the set of distinguishing characteristics as the hash values of the seeds at the determined other positions.
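The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim leaves the seed size, the hash function, the rule for selecting the subset of k hash values, and the position function open, so the fixed 8-byte non-overlapping seeds, BLAKE2b as the hash, "select the k largest hash values", the position function p → p + 1 (wrapping), and the names `SEED_LEN`, `K`, `J` and `SimilarityIndex` are all hypothetical choices.

```python
import hashlib
from collections import defaultdict

SEED_LEN = 8  # hypothetical seed size in bytes
K = 4         # hypothetical size of the selected subset (k)
J = 2         # hypothetical similarity threshold (j)


def seed_hashes(chunk: bytes, seed_len: int = SEED_LEN) -> list[int]:
    """Partition a chunk into consecutive, non-overlapping seeds (assumed) and hash each seed."""
    seeds = [chunk[i:i + seed_len] for i in range(0, len(chunk), seed_len)]
    # One 64-bit hash value per seed, kept in seed-sequence order.
    return [int.from_bytes(hashlib.blake2b(s, digest_size=8).digest(), "big") for s in seeds]


def distinguishing_characteristics(chunk: bytes, k: int = K) -> set[int]:
    """Compute a chunk's distinguishing characteristics (DCs) following the claimed steps."""
    hashes = seed_hashes(chunk)
    n = len(hashes)
    if n == 0:
        return set()
    # Select a subset of k hash values (assumed rule: the k largest) and keep their positions.
    positions = sorted(range(n), key=lambda p: hashes[p], reverse=True)[:k]
    # Apply a function to those positions to get other positions (assumed: the next seed, wrapping).
    other_positions = [(p + 1) % n for p in positions]
    # The DCs are the hash values of the seeds at the other positions.
    return {hashes[p] for p in other_positions}


class SimilarityIndex:
    """Hypothetical index mapping each repository DC (RDC) to the ids of chunks containing it."""

    def __init__(self) -> None:
        self._index: dict[int, set[int]] = defaultdict(set)

    def add_chunk(self, chunk_id: int, chunk: bytes) -> None:
        """Compute the chunk's RDCs and associate each one with the chunk id."""
        for dc in distinguishing_characteristics(chunk):
            self._index[dc].add(chunk_id)

    def find_similar(self, chunk: bytes, j: int = J) -> list[int]:
        """Return ids of repository chunks whose RDC set contains at least j of the input DCs."""
        votes: dict[int, int] = defaultdict(int)
        for dc in distinguishing_characteristics(chunk):
            # .get() avoids inserting empty entries into the defaultdict.
            for chunk_id in self._index.get(dc, ()):
                votes[chunk_id] += 1
        return sorted(cid for cid, count in votes.items() if count >= j)
```

Because the selection depends on seed content rather than on byte offsets, a small local edit to a chunk tends to leave most of its DCs unchanged, which is what lets a threshold j smaller than k still report the edited chunk as similar. Taking the hash at a neighboring position, rather than the selected hash itself, mirrors the literal wording of the claim.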
