Systems and Methods for Efficient Data Searching, Storage and Reduction
Abstract
Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.
183 Claims
1-177. (canceled)
178. A method of searching in repository data for data that are similar to an input data, wherein the repository data comprises a plurality of repository data chunks each having a corresponding set of repository distinguishing characteristics (RDCs), the method comprising:
- calculating a set of input distinguishing characteristics (IDCs) for each input data chunk by:
  - selecting IDCs that are robust with respect to modifications of the corresponding input data chunk; and
  - ordering the selected IDCs in a first order;
- determining a similarity threshold value N; and
- for each set of IDCs:
  - determining, in the first order, if an IDC is found among the RDCs; and
  - when a number j of IDCs found among the RDCs is at least N, then determining that the input data chunk corresponding to the set of IDCs is similar to at least one repository data chunk in the repository.
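The steps of claim 178 can be illustrated with a minimal Python sketch. It is not the patented implementation. The choice of distinguishing characteristic (the k largest hashes of fixed-size byte windows), the BLAKE2b hash, the descending "first order", and every name and constant (`SEED_SIZE`, `K`, `distinguishing_characteristics`, `build_index`, `is_similar`) are assumptions made for illustration only; the claim itself requires only that the characteristics be robust to modifications and be examined in a fixed order.

```python
import hashlib

SEED_SIZE = 8  # bytes per hashed window (illustrative assumption)
K = 4          # characteristics kept per chunk (illustrative assumption)


def distinguishing_characteristics(chunk, k=K, seed_size=SEED_SIZE):
    """Return up to k distinguishing characteristics of a chunk.

    Assumption: a characteristic is one of the k largest 64-bit hashes
    taken over every seed_size-byte window. A local edit changes only the
    windows that overlap it, so most of the largest hashes tend to survive.
    The result is sorted in descending order (the "first order").
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(chunk[i:i + seed_size], digest_size=8).digest(),
            "big",
        )
        for i in range(len(chunk) - seed_size + 1)
    }
    return sorted(hashes, reverse=True)[:k]


def build_index(repository_chunks):
    """Map each repository distinguishing characteristic (RDC) to chunk ids."""
    index = {}
    for chunk_id, chunk in enumerate(repository_chunks):
        for rdc in distinguishing_characteristics(chunk):
            index.setdefault(rdc, set()).add(chunk_id)
    return index


def is_similar(input_chunk, index, n):
    """Return True when at least n IDCs of input_chunk are found among the RDCs."""
    j = 0  # number of IDCs found so far
    for idc in distinguishing_characteristics(input_chunk):  # in the first order
        if idc in index:
            j += 1
            if j >= n:  # similarity threshold N reached
                return True
    return False
```

In this sketch the index holds only K characteristics per repository chunk, so its size is a small fraction of the repository, and each input chunk costs one pass over its bytes plus K dictionary lookups. A caller would build the index once with `build_index(chunks)` and then call `is_similar(chunk, index, n)` for each incoming chunk, choosing `n` no larger than K.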
183-186. (canceled)
Specification