Systems and methods for efficient data searching, storage and reduction
Abstract
A computer program product for searching a repository of binary uninterpreted data, according to one embodiment, includes a computer readable storage medium having program instructions executable by a computer to cause the computer to perform a method comprising: analyzing, by the computer, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, the analyzing including searching an index of representation values of the repository data for matching representation values of the input in a time independent of a size of the repository and linear in a size of the input data; and analyzing, by the computer, the similar repository segment with respect to the input segment to determine their common data sections while utilizing at least some of the matching representation values for data alignment, in a time linear in a size of the input segment.
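The index search described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the index is an in-memory hash table (a Python `dict`) mapping each repository representation value to a `(segment_id, repository_offset)` pair; the function name and data layout are hypothetical and are not taken from the patent. A hash lookup takes expected constant time regardless of how many entries the index holds, so one pass over the input representation values costs time proportional to the input, not to the repository.

```python
from collections import Counter

def find_similar_segment(index, input_values):
    """Look up each input representation value in the repository index.

    index:        dict mapping representation value -> (segment_id, repo_offset)
                  (hypothetical layout, assumed for this sketch)
    input_values: list of (representation_value, input_offset) pairs

    Returns (best_segment_id, matched_locations), where matched_locations is a
    list of (repo_offset, input_offset) pairs for the best-matching segment.
    """
    votes = Counter()
    matches = []
    for value, in_off in input_values:      # one pass over the input
        hit = index.get(value)              # expected O(1), independent of index size
        if hit is not None:
            seg_id, repo_off = hit
            votes[seg_id] += 1
            matches.append((seg_id, repo_off, in_off))
    if not votes:
        return None, []
    best, _ = votes.most_common(1)[0]
    return best, [(r, i) for s, r, i in matches if s == best]


# Toy data: two repository segments and four input representation values.
index = {101: ("segA", 0), 202: ("segA", 64), 303: ("segB", 10)}
input_values = [(202, 5), (999, 20), (101, 40), (303, 50)]
best, locations = find_similar_segment(index, input_values)
```

The returned `(repo_offset, input_offset)` pairs play the role of the matched locations that the later analysis step can use for data alignment.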
20 Claims
1. A computer program product for searching a data storage system repository of binary uninterpreted data for a location of common data to an input data, the computer program product comprising a computer readable storage medium having program instructions executable by a processor to cause the processor to perform a method comprising:
- analyzing, by the processor, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, wherein the analyzing the segments includes:
  - searching an index of repository representation values for representation values which match input representation values, wherein the searching the index is performed in an amount of time which is independent of a size of the repository and which is linear in a size of the input data; and
  - specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values as matched values, wherein the amount of time which the searching the index is performed reduces an amount of processing time consumed by the processor during the searching and the specifying; and
- analyzing, by the processor, the similar repository segment with respect to the input segment to determine their common data sections, wherein at least some of the matching representation values are utilized for data alignment, wherein the analyzing is performed in an amount of time which is linear in a size of the input segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer program product for searching a data storage system repository of binary uninterpreted data for a location of common data to an input data, the computer program product comprising a computer readable storage medium having program instructions executable by a processor to cause the processor to perform a method comprising:
- analyzing, by the processor, segments of each of the repository and input data to determine a repository segment that is similar to an input segment, wherein the analyzing includes:
  - searching an index of repository representation values for representation values which match input representation values, wherein the searching the index is performed in an amount of time which is independent of a size of the repository and which is linear in a size of the input data; and
  - specifying locations in the repository and input data of distinguishing characteristics corresponding to the matching representation values as matched values, wherein the amount of time which the searching the index is performed reduces an amount of processing time consumed by the processor during the searching and the specifying; and
- analyzing, by the processor, the similar repository segment with respect to the input segment to determine their common data sections, wherein the specified locations are utilized for data alignment, wherein the analyzing is performed in an amount of time which is linear in a size of the input segment; and
- using the determined common data sections to align the similar repository and input segments.
Dependent claims: 13, 14, 15, 16, 17, 18
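One way to picture how specified locations could drive data alignment is an anchor-and-extend pass, sketched below. This is an illustrative assumption, not the method disclosed in the specification: each matched `(repo_offset, input_offset)` pair is treated as an anchor, verified byte-for-byte over a hypothetical window of `width` bytes, and then extended backward and forward while the bytes agree. Extension never re-enters input bytes already assigned to a common section, which is intended to keep the work roughly proportional to the input segment size.

```python
def common_sections(repo, inp, anchors, width):
    """Grow common data sections outward from matched anchor locations.

    repo, inp: bytes objects (similar repository segment, input segment)
    anchors:   list of (repo_offset, input_offset) pairs from the index search
    width:     assumed size of the window each representation value covers

    Returns a list of (repo_start, input_start, length) tuples ordered by
    input_start.
    """
    sections = []
    covered_until = 0  # input offsets below this already belong to a section
    for r, i in sorted(anchors, key=lambda a: a[1]):
        if i < covered_until:
            continue  # anchor lies inside a section found earlier
        if repo[r:r + width] != inp[i:i + width]:
            continue  # representation values matched but the bytes differ
        # Extend backward without crossing into the covered region.
        while r > 0 and i > covered_until and repo[r - 1] == inp[i - 1]:
            r -= 1
            i -= 1
        # Extend forward until the first mismatch or the end of either buffer.
        length = 0
        while (r + length < len(repo) and i + length < len(inp)
               and repo[r + length] == inp[i + length]):
            length += 1
        sections.append((r, i, length))
        covered_until = i + length
    return sections


repo = b"xxHELLO_WORLDyy__COMMON_TAILzz"
inp = b"HELLO_WORLDbb__COMMON_TAILcc"
# Two genuine anchors plus one false positive at (0, 26).
sections = common_sections(repo, inp, [(4, 2), (20, 18), (0, 26)], width=4)
```

In this toy run the two shared runs are recovered with their differing repository and input offsets, and the false-positive anchor is discarded by the byte-level check.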
19. A data storage system comprising:
- magnetic disk;
- magnetic tape;
- a repository of binary uninterpreted data, wherein the repository data is stored on the magnetic disk and/or the magnetic tape; and
- a processor;
wherein the processor is configured to perform a method of searching for a location of repository data which is common to input data, the method comprising:
- analyzing, by the processor, segments of each of the repository data and the input data to determine a repository segment that is similar to an input segment, wherein the analyzing the segment includes:
  - searching an index of repository representation values for representation values which match input representation values, wherein the searching the index is performed in an amount of time which is independent of a size of the repository, and which is linear in a size of the input data; and
  - determining that at least one repository segment is similar to at least one input segment in response to finding at least n matches between the repository representation values of the at least one repository segment and the input representation values of the at least one input segment, wherein the amount of time which the searching the index is performed reduces an amount of processing time consumed by the processor during the searching and the determining; and
- in response to determining a repository segment that is similar to an input segment, analyzing, by the processor, the similar repository segment with respect to the input segment to determine their common data sections, wherein at least some of the matching representation values are utilized for data alignment, wherein the analyzing is performed in an amount of time which is linear in a size of the input segment.
Dependent claims: 20
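The "at least n matches" test in claim 19 can be sketched as a simple count per repository segment. The sketch assumes a hypothetical index that maps each repository representation value directly to a segment identifier; the function name and the sample values are illustrative only.

```python
from collections import Counter

def segments_meeting_threshold(index, input_values, n):
    """Return repository segments sharing at least n representation values
    with the input segment.

    index:        dict mapping representation value -> segment_id (assumed layout)
    input_values: iterable of input representation values
    n:            minimum number of matches required to call a segment similar
    """
    counts = Counter(index[v] for v in input_values if v in index)
    return sorted(seg for seg, c in counts.items() if c >= n)


index = {1: "A", 2: "A", 3: "B", 4: "A"}
input_values = [1, 2, 3, 4, 9]
similar = segments_meeting_threshold(index, input_values, n=2)
```

Raising `n` makes the similarity test stricter; with the toy data above, segment "B" shares only one value with the input and so is excluded once `n` is 2 or more.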
Specification