Systems and Methods for Efficient Data Searching, Storage and Reduction
Abstract
Systems and methods enable searching a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in the size of the input data, and in a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and the input, in a time that is linear in the segment size and in constant space.
56 Claims
1-30. (canceled)
31. A method enabling lossless data reduction by partitioning version data into:
  a) data already stored in a repository; and
  b) data not already stored in the repository;
wherein each of the repository data and the version data comprise a plurality of data chunks, and wherein the method comprises:
  storing in an index a plurality of n distinguishing characteristics (RDCs) and a position in the repository of each of a plurality of repository chunks, where n is substantially smaller than size m of the repository chunk; and
  for each version chunk:
    determining a plurality of k distinguishing characteristics (IDCs) of the version chunk, where k is greater than or equal to n;
    determining whether a similar repository chunk exists based on a plurality of matching distinguishing characteristics in the version chunk and similar repository chunk, wherein the similarity determination includes searching for each of the k distinguishing characteristics of the version chunk in the index until at most n matches are found;
    determining that one or more similar repository chunks exist where the number of matches satisfies a threshold;
    determining differences between the version chunk and similar repository chunk by comparing the full data of the respective chunks; and
    storing the differences in the repository.
Dependent claims: 33, 36, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55
32. (canceled)
34-35. (canceled)
37-38. (canceled)
53. (canceled)
56-186. (canceled)
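As a rough illustration only, the steps recited in claim 31 can be sketched in Python as follows. The claim does not state how distinguishing characteristics are computed; this sketch assumes they are the largest hash values taken over fixed-size sliding windows of a chunk. The names `SEED`, `characteristics`, `build_index`, `find_similar`, and `differences`, the window size, and the use of `difflib` for the full-data comparison are all hypothetical choices made for the example, not details taken from the claims.

```python
import difflib
import hashlib

SEED = 8  # hypothetical sliding-window size in bytes (not from the claims)


def characteristics(chunk: bytes, count: int) -> list[int]:
    """Return up to `count` largest distinct window hashes (an assumed scheme)."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(chunk[i:i + SEED], digest_size=8).digest(), "big"
        )
        for i in range(len(chunk) - SEED + 1)
    }
    return sorted(hashes, reverse=True)[:count]


def build_index(repo_chunks: list[bytes], n: int) -> dict[int, int]:
    """Store n characteristics per repository chunk, mapped to its position."""
    index: dict[int, int] = {}
    for pos, chunk in enumerate(repo_chunks):
        for dc in characteristics(chunk, n):
            index.setdefault(dc, pos)  # keep the first chunk seen for a value
    return index


def find_similar(version_chunk: bytes, index: dict[int, int],
                 n: int, k: int, threshold: int):
    """Search the k version characteristics, stopping once n matches are found.

    Returns the position of the best-matching repository chunk, or None if
    no chunk reaches `threshold` matches.
    """
    votes: dict[int, int] = {}
    matches = 0
    for dc in characteristics(version_chunk, k):
        pos = index.get(dc)
        if pos is None:
            continue
        votes[pos] = votes.get(pos, 0) + 1
        matches += 1
        if matches >= n:
            break
    if not votes:
        return None
    best = max(votes, key=votes.get)
    return best if votes[best] >= threshold else None


def differences(repo_chunk: bytes, version_chunk: bytes):
    """Compare the full data of both chunks and return only the changed spans."""
    matcher = difflib.SequenceMatcher(None, repo_chunk, version_chunk,
                                      autojunk=False)
    return [
        (tag, i1, i2, version_chunk[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

In this sketch the index holds only n small values per repository chunk, so a lookup touches the index rather than the chunk data; the full data of a chunk is read only after a candidate passes the threshold, at which point `differences` yields the spans that would be stored.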
Specification