String searcher, and compressor using same
First Claim
1. A method for searching a vector of symbols for a predetermined target string, comprising:
- generating a pointer array having one pointer for each of the symbols in the vector;
calculating, by means of a hashing function, the hashing value of each of the symbols, and arranging the pointer array in order of hashing value of the respective symbol, the hashing function being chosen so that there are fewer hashing values than there are pointers in the pointer array;
generating an index of the pointers in the pointer array, that index having only as many entries as there are possible hashing values, and therefore substantially fewer entries than the pointer array;
applying the hashing function to the target string, to generate a target hashing value; and
if the target hashing value is not defined or is null, sending a response that the target string does not appear in the vector;
if the target hashing value is defined and not null, using the index to determine those pointers to symbols having hashing values which are equal to the target hashing value, and sequentially comparing the target string to each of the locations in the vector pointed to by each of the corresponding pointers, to determine whether each location contains a match to the target string
5 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus for string searching and data compression. In the string search method and apparatus pointers to the string to be searched are indexed via a hashing function and organized according to the hashing values of the string elements pointed to. The hashing function is also run on the string desired to be found, and the resulting hashing value is used to access the index. If the resulting hashing value is not in the index, it is known that the target string does not appear in the string being searched. Otherwise the index is used to determine the pointers which correspond to the target hashing value, these pointers pointing to likely candidates for matching the target string. The pointers are then used to sequentially compare each of the locations in the string being searched to the target string, to determine whether each location contains a match to the target string. In the method and apparatus for compressing a stream of data symbols, a fixed length search window, comprising a predetermined contiguous portion of the symbol stream, is selected as the string to be searched by the string searcher. If a string to be compressed is found in the symbol stream, a code is output designating the location within the search window of the matching string and the length of the matching string.
249 Citations
12 Claims
-
1. A method for searching a vector of symbols for a predetermined target string, comprising:
-
generating a pointer array having one pointer for each of the symbols in the vector; calculating, by means of a hashing function, the hashing value of each of the symbols, and arranging the pointer array in order of hashing value of the respective symbol, the hashing function being chosen so that there are fewer hashing values than there are pointers in the pointer array; generating an index of the pointers in the pointer array, that index having only as many entries as there are possible hashing values, and therefore substantially fewer entries than the pointer array; applying the hashing function to the target string, to generate a target hashing value; and if the target hashing value is not defined or is null, sending a response that the target string does not appear in the vector; if the target hashing value is defined and not null, using the index to determine those pointers to symbols having hashing values which are equal to the target hashing value, and sequentially comparing the target string to each of the locations in the vector pointed to by each of the corresponding pointers, to determine whether each location contains a match to the target string - View Dependent Claims (2, 3)
-
-
4. A string searcher for searching an existing vector of symbols for the existence a target string, comprising:
-
a hashing function for accepting an input and providing a hashing value, and permitting several inputs to have the same hashing value; a pointer array having one pointer for each of the symbols in the vector, said pointer array being arranged in order of the hashing values of the symbols pointed to; an index, having one entry for each value generated by the hashing function, and having substantially fewer entries than the pointer array; a target string hashing value obtained by operating the hashing function on the target string; a comparator for comparing the target string to locations in the vector pointed to by the pointers corresponding the target string hashing value as an index value, and for determining whether a match exists. - View Dependent Claims (5, 6)
-
-
7. A data compressor for compressing a stream of data symbols, comprising:
-
a finite, substantially fixed length search window, comprising a predetermined contiguous portion of the symbol stream; a hashing function for accepting an input and providing a hashing value, said hashing function permitting several inputs to have the same hashing value; a pointer array having one pointer for each of the symbols in the search window, said pointer array being arranged in order of the hashing values of the data symbols pointed to; an index into said pointer array, having one entry for each value generated by the hashing function, and having substantially fewer entries than the pointer array; means for selecting a portion of the symbol stream as the target symbol string to be compressed; a target symbol string hashing value obtained by operating the hashing function on the target symbol string; a comparator for comparing the target string to locations in the search window pointed to by the pointers corresponding to the target symbol string hashing value as an index value, and for determining whether a matching string of symbols exists in the search window; means for outputting a code such that, if a matching string is found, that code designates the location within the search window of the matching string and the length of the matching string and, if no matching string is found, that code designates the target string directly. - View Dependent Claims (8, 9)
-
-
10. A method of encoding a stream of data symbols, comprising:
-
selecting a finite, substantially fixed length search window, as a predetermined contiguous portion of the symbol stream; providing a hashing function which accepts an input and generates a hashing value, and which permits several inputs to have the same hashing value; generating a pointer array having one pointer for each of the symbols in the search window; arranging said pointer array in order of the hashing values of the symbols pointed to; generating an index having one entry for each value generated by the hashing function, that index having substantially fewer entries than pointers in the pointer array; selecting a portion of the symbol stream as a target symbol string to be encoded; operating the hashing function on the target symbol string to obtain a target symbol string hashing value; comparing the target symbol string to locations in the search window pointed to by the pointers corresponding to the target symbol string hashing value as an index value, and determining whether a matching string of symbols exists in the search window; if a matching string is found, outputting a code designating the location within the search window of the matching string and the length of the matching string; if no matching string is found, outputting the target string. - View Dependent Claims (11, 12)
-
Specification